Philipp Rudo <prudo(a)redhat.com> wrote on Mon, Dec 6, 2021 at 23:47:
On Fri, 3 Dec 2021 21:05:11 +0800
Kairui Song <ryncsn(a)gmail.com> wrote:
> Coiby Xu <coxu(a)redhat.com> wrote on Fri, Dec 3, 2021 at 6:00 PM:
> >
> > On Wed, Dec 01, 2021 at 06:15:07PM +0800, Kairui Song wrote:
> > >Coiby Xu <coxu(a)redhat.com> wrote on Fri, Nov 19, 2021 at 11:24 AM:
> > >>
> > >> The crashkernel=auto implementation in kernel space has been rejected
> > >> upstream [1]. The current user space implementation [2] [3] ships a
> > >> crashkernel.default but does not yet support the swiotlb memory
> > >> requirement, a custom crashkernel value from the user, or fadump.
> > >>
> > >> The crashkernel.default implementation seems to be overly complex:
> > >>  - the default crashkernel value rarely changes. There is no need to
> > >>    ship the same crashkernel.default for every kernel package of an
> > >>    architecture;
> > >>  - when deciding the value of crashkernel for a new kernel, the
> > >>    crashkernel.default of the existing kernel is taken into
> > >>    consideration.
> > >>
> > >> We can simply let kexec-tools maintain the default crashkernel
> > >> values and provide an API for kdump-anaconda-addon to query them. And
> > >> for a newly installed kernel, we can simply call
> > >> "kdumpctl reset-crashkernel KERNELPATH" to set its crashkernel value.
> > >>
> > >> For the unfulfilled requirements,
> > >>  - crashkernel is introduced to /etc/kdump.conf so the user can set a
> > >>    custom crashkernel value to tell kexec-tools to manage the
> > >>    crashkernel value automatically.
> > >>  - "kdumpctl reset-crashkernel" has been written for the above
> > >>    purpose.
> > >>  - "kdumpctl fadump on/off" is added for supporting fadump.
> > >>
> > >>
> > >> [1] https://lore.kernel.org/linux-mm/20210507010432.IN24PudKT%25akpm@linux-fo...
> > >> [2] https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1171
> > >> [3] https://lists.fedoraproject.org/archives/list/kexec@lists.fedoraproject.o...
> > >>
> > >
> > >Nice idea👍, this makes things so much cleaner.
> >
> > Thanks for endorsing this idea!
> >
> > >
> > >I remember the per-version crashkernel.default file was introduced
> > >with several purposes in mind.
> > >For example, a new kernel package may enable/disable something so it
> > >will need a larger/smaller crashkernel value, or, e.g., a variant
> > >kernel, like a debug kernel, can set itself a larger value.
> >
>
> Hi, thanks for the reply.
@Kairui: Nice to see you active on the mailing list :)
Nice to see you too :)
> > I have been wondering why you ship a crashkernel.default with a kernel
> > package, debug kernel could be one of the reasons. But kdump memory
> > requirement could change dynamically. For example, the user could toggle
> > SME on/off. So a static crashkernel couldn't address this dynamic
> > change.
>
> That's absolutely correct, the kernel itself isn't capable of estimating
> the crashkernel value. And actually crashkernel.default wasn't supposed
> to play the sole role in deciding the final crashkernel value.
>
> Userspace has the final say and is more capable of estimating and
> deciding the crashkernel value.
>
> I was previously planning for the crashkernel value decision to be
> composed of two parts:
> - the kernel's crashkernel.default value as a reference.
> - kexec-tools does some estimation and adjusts based on the kernel value.
>
> This way both the kernel and the kexec-tools package have a say in
> deciding the final crashkernel value.
>
> > On the other hand, we should be able to tell if a kernel is a
> > debug kernel or has something enabled/disabled in kexec-tools. So my idea
> > is the default crashkernel value could act as a baseline value and we
> > will increase/decrease the crashkernel value dynamically after evaluating
> > different cases in kexec-tools.
>
> I totally agree. I originally wanted to enhance "kdumpctl estimate"
> so that it has two ways of estimating:
>
> - Fast path: according to data collectable in the first kernel (including
> the crashkernel.default file).
> - Slow path: do an actual kdump and collect info during the kdump
> run. It needs at least one reboot, which will be really slow on some
> big servers; it's an open question whether this is really worth it.
>
> Then kexec-tools can suggest, or even set, the crashkernel value based
> on the estimate. But it's undecided how the two different estimates will
> play together in deciding the final crashkernel value. Maybe keep
> using the value estimated by the fast path unless the user manually
> triggers a slow path run. (And the slow path result will be invalidated
> after a kernel upgrade, so again I'm not sure users will really like it;
> maybe just focusing on development of the fast path estimate is simpler.)
Actually this is something I was pondering on in the last few days,
too. My trigger was looking at patch 9 from this series as it is adding
the extra memory for SEV only when setting crashkernel= but not when
estimating it. On the other hand do_estimate does consider some cases
(like the memory required for encrypted targets) that are missing in
reset_kernel. So I started thinking about how to reconcile those two
cases but couldn't find a proper solution yet.
The main problem I see is that 'estimate' and 'reset-crashkernel' run
in completely different environments. Especially 'reset-crashkernel' is
supposed to run during the installation of a kernel which leads to some
problems. For example there is no dump initramfs created yet and it
might not be possible to create one as the currently running kernel and
the one that is installed might require different modules.
So the "perfect" solution might need to look something like this
1) boot system to run updates
2) reboot new kernel to estimate and set crashkernel=
2b) reboot for slow path (optional)
3) reboot to make use of new crashkernel= value
and that every time when installing a new kernel, which I don't think
is reasonable.
That's where I'm stuck at the moment...
I'm also not very sure about this, I had some other ideas previously...
First, maybe the installer can just use the default value for a default
installation. There is no way one can know what the user will use for
kdump later, so the installer just doesn't do anything special. (There is
currently a special case for encrypted storage, because encrypted
storage is more and more common but kdump doesn't work by default...
it was supposed to be removed once the keyring can be reused across
kexec; then the default value should cover most default local dump
setups.)
And after the installation, I was thinking about two solutions:
A.
Set the crashkernel.default value to an acceptable value, covering 99%
of kdump setups.
Then kexec-tools can do a fast path estimation (which should take less
than about 1s) every time it boots/starts/restarts, and print a warning
if the crashkernel value is not suitable for the kdump config in special
cases. It's up to the user to customize the crashkernel value; kexec-tools
just provides some helpers.
A newly installed kernel either inherits the customized value or sticks
with the default value; the user is responsible for updating the value if
it is customized.
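
A minimal sketch of the check in solution A, not how kexec-tools does it
today. It assumes the fast path estimate is already available as a byte
count (for example from "kdumpctl estimate") and that the kernel command
line uses either the plain "crashkernel=size[@offset]" form or the range
form "crashkernel=start-[end]:size,...". The ",high"/",low" variants are
not handled. All function names are made up for illustration.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Convert a size such as '256M' or '1G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def reserved_from_cmdline(cmdline, system_ram):
    """Return the bytes requested by the last crashkernel= token, or 0."""
    value = None
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            value = token[len("crashkernel="):]
    if value is None:
        return 0
    value = value.split("@", 1)[0]  # drop the optional @offset
    if ":" not in value:
        return parse_size(value)  # plain form, e.g. crashkernel=256M
    # Range form: pick the entry whose range contains the system RAM size.
    for entry in value.split(","):
        span, size = entry.split(":", 1)
        start, _, end = span.partition("-")
        low = parse_size(start)
        high = parse_size(end) if end else float("inf")
        if low <= system_ram < high:
            return parse_size(size)
    return 0


def check_reservation(cmdline, system_ram, estimated):
    """Return a warning string if the reservation is below the estimate."""
    reserved = reserved_from_cmdline(cmdline, system_ram)
    if reserved < estimated:
        return (f"warning: crashkernel reserves {reserved >> 20}M but the "
                f"kdump config is estimated to need {estimated >> 20}M")
    return None
```

On a real system the command line could be read from /proc/cmdline at
service start, and the returned warning printed to the journal.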
B.
Set the crashkernel.default value to a value large enough to cover
(almost) every case.
kexec-tools does an estimation on boot and frees unneeded memory. If
kdump is disabled, it frees all of it.
One issue is how to balance the value between being big enough to cover
everything and not failing boot due to OOM. kexec-tools can still
generate a warning if memory is not enough, but this is supposed to
be very, very rare, and I doubt anyone will still want kdump if it eats
that much memory.
A newly installed kernel just uses the default value, since in this case
customized values will be very rare, and kexec-tools can warn the
user seriously to redo the crashkernel value customization after a
kernel upgrade.
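
A rough sketch of the "reserve big, then free" step in solution B. It
assumes the reserved size can be read from and shrunk through
/sys/kernel/kexec_crash_size (in bytes), which as far as I know only
works while no crash kernel is loaded. The function names and the 1 MiB
rounding are assumptions for illustration, not existing kexec-tools
behaviour.

```python
from pathlib import Path

CRASH_SIZE = Path("/sys/kernel/kexec_crash_size")


def shrink_target(reserved, estimated, kdump_enabled, align=1 << 20):
    """Return the size in bytes to shrink to, or None if nothing is freed."""
    if not kdump_enabled:
        # kdump is off: give the whole reservation back to the system.
        return 0 if reserved > 0 else None
    # Round the estimate up to the alignment so we never undershoot it.
    needed = -(-estimated // align) * align
    return needed if needed < reserved else None


def shrink_reservation(estimated, kdump_enabled, path=CRASH_SIZE):
    """Shrink the reservation if the estimate allows it; needs root."""
    reserved = int(path.read_text())
    target = shrink_target(reserved, estimated, kdump_enabled)
    if target is not None:
        path.write_text(str(target))
    return target
```

When the estimate is larger than the reservation, shrink_target returns
None; that is the rare case where a warning would be printed instead.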
For both solutions, the slow path estimation can be used to refine the
estimation result, and the user can also use it regularly to debug or
test the crashkernel value. Doing a slow path estimate on first boot
might not be a bad idea. The only issue is how the slow path estimation
result should be invalidated: invalidating it on every kernel upgrade
seems too aggressive, but invalidating it only on every major/minor
release might be too inaccurate?
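
To make the invalidation question concrete, here is a small sketch that
compares the kernel release recorded with a slow path result against the
running one at a chosen granularity. It assumes "major/minor" refers to
the leading numbers of the kernel release string (it could equally mean
the distribution release), and the level names and functions are
hypothetical.

```python
import re

# How many leading version numbers must match for each level.
LEVELS = {"major": 1, "minor": 2, "patch": 3}


def version_key(release, level):
    """Reduce a release such as '5.14.0-39.el9.x86_64' to a comparison key."""
    if level == "exact":
        return release
    numbers = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if numbers is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return numbers.groups()[:LEVELS[level]]


def slow_estimate_valid(recorded_release, running_release, level="exact"):
    """Return True if a stored slow path result may still be used."""
    return (version_key(recorded_release, level)
            == version_key(running_release, level))
```

With level="exact" every kernel upgrade invalidates the result (the
aggressive option); "patch" or "minor" keeps it across more upgrades at
the cost of accuracy. The running release could be taken from
platform.release().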