Hi Kairui,
On Fri, 21 Jan 2022 19:08:23 +0800
Kairui Song <ryncsn(a)gmail.com> wrote:
Hi,
Philipp Rudo <prudo(a)redhat.com> 于2022年1月20日周四 22:53写道:
>
> Hi Kairui,
>
> On Thu, 20 Jan 2022 14:04:51 +0800
> Kairui Song <ryncsn(a)gmail.com> wrote:
>
> > Hi Philipp,
> >
> > Thanks for the detailed review!
> >
> > Philipp Rudo <prudo(a)redhat.com> 于2022年1月20日周四 03:15写道:
> > >
> > > Hi Kairui,
> > >
> > > I think I understand what you are doing. A few comments/questions below.
> > >
> > > On Tue, 18 Jan 2022 15:05:47 +0800
> > > Kairui Song <ryncsn(a)gmail.com> wrote:
> > >
> > > > From: Kairui Song <kasong(a)tencent.com>
> > > >
> > > > Currently kdump simply uses dracut and pulls all the required
binaries
> > > > from the host directly. This keep things simple as kdump is using
same
> > > > binaries and libs from the system of fisrt kernel.
> > > >
> > > > But some binaries, like systemd, is growing too large, which
increase
> > > > the memory usage by a lot, and most of its features are never used
> > > > in kdump. So it's more reasonable to use a rebuilt-minimized
version.
> > > >
> > > > So for such binaries, introduce a "build root" overlay
here. Before
> > > > calling dracut to build initramfs, setup an overlay to let the build
> > > > root override current rootfs, so dracut can just work as usual, but
the
> > > > binaries used are from the "build root".
> > > >
> > > > All binaries/libs that need to be rebuild for smaller size can be put
in
> > > > the build root.
> > > >
> > > > Signed-off-by: Kairui Song <kasong(a)tencent.com>
> > > > ---
> > > >
> > > > Hi,
> > > >
> > > > I'm trying again to make this package into Fedora:
> > > >
https://github.com/ryncsn/kdump-initrd-systemd
> > > >
> > > > It's a minimized version of systemd, as kdump heavily depend on
> > > > systemd and so using a minimized version of systemd just for kdump
can
> > > > help a lot for saving memory.
> > > >
> > > > But it need to depend on kexec-tools for the present of this dir:
> > > > `$RPM_BUILD_ROOT%{_prefix}/lib/buildroot`
> > > >
> > > > So this patch have to be merged first, then selinux rule need to be
> > > > updated, then that kdump-initrd-systemd package can pass rpm checks
> > > > and get included in Fedora.
> > > >
> > > > I've sent the patch previously and got acked by Pingfan, but it
didn't
> > > > get merged that time:
> > > >
https://lists.fedoraproject.org/archives/list/kexec@lists.fedoraproject.o...
> > > >
> > > > There have been many updated to kexec-tools so I rebased this patch
to
> > > > latest rawhide.
> > > >
> > > > Next steps are:
> > > > 1. Discuss and get this patch merged, and make a new kexec-tools
release.
> > > > 2. Open a bug for SELinux rules to let mkdumprd have access to
> > > > buildroot path. SELinux team will need the code base from
kexec-tools
> > > > as reference the SELinux rule update.
> > > > 3. Introduce that package (and maybe other minimized package) to
Fedora.
> > > >
> > > > I've tested this on some cloud instances, which show a great
improvement
> > > > in memory usage:
> > > >
> > > > With kdump-initrd-systemd
> > > > [debug_mem] kdump saving vmcore
> > > > MemTotal: 90084 kB
> > > > MemFree: 18612 kB
> > > > MemAvailable: 49004 kB
> > > >
> > > > Without kdump-initrd-systemd
> > > > [debug_mem] kdump saving vmcore
> > > > MemTotal: 90084 kB
> > > > MemFree: 9492 kB
> > > > MemAvailable: 39800 kB
> > > >
> > > > Which is a good start point, as systemd can be minimized even more,
and
> > > > some libraries are still being depended on by other packaged, which
can also
> > > > be minimized later. Systemd is a big part of the initramfs that
can't be
> > > > stirpped easily so the repackaged version will be very helpful.
> > > >
> > > > kexec-tools.spec | 1 +
> > > > mkdumprd | 20 +++++++++++++++++++-
> > > > 2 files changed, 20 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kexec-tools.spec b/kexec-tools.spec
> > > > index 849d09c..c90a60a 100644
> > > > --- a/kexec-tools.spec
> > > > +++ b/kexec-tools.spec
> > > > @@ -173,6 +173,7 @@ mkdir -p $RPM_BUILD_ROOT%{_unitdir}
> > > > mkdir -p -m755 $RPM_BUILD_ROOT%{_bindir}
> > > > mkdir -p -m755 $RPM_BUILD_ROOT%{_libdir}
> > > > mkdir -p -m755 $RPM_BUILD_ROOT%{_prefix}/lib/kdump
> > > > +mkdir -p -m755 $RPM_BUILD_ROOT%{_prefix}/lib/buildroot
> > > > mkdir -p -m755 $RPM_BUILD_ROOT%{_sharedstatedir}/kdump
> > > > install -m 755 %{SOURCE1} $RPM_BUILD_ROOT%{_bindir}/kdumpctl
> > > >
> > > > diff --git a/mkdumprd b/mkdumprd
> > > > index 593ec77..386f55a 100644
> > > > --- a/mkdumprd
> > > > +++ b/mkdumprd
> > > > @@ -22,6 +22,7 @@ if ! dlog_init; then
> > > > exit 1
> > > > fi
> > > >
> > > > +KDUMP_BUILDROOT="/usr/lib/kdump/buildroot"
> > > > SSH_KEY_LOCATION="/root/.ssh/kdump_id_rsa"
> > > > SAVE_PATH=$(get_save_path)
> > > > OVERRIDE_RESETTABLE=0
> > > > @@ -32,11 +33,13 @@ dracut_args=(--add kdumpbase --quiet --hostonly
--hostonly-cmdline --hostonly-i1
> > > > MKDUMPRD_TMPDIR="$(mktemp -d -t mkdumprd.XXXXXX)"
> > > > [ -d "$MKDUMPRD_TMPDIR" ] || perror_exit "dracut:
mktemp -p -d -t dracut.XXXXXX failed."
> > > > MKDUMPRD_TMPMNT="$MKDUMPRD_TMPDIR/target"
> > > > +MKDUMPRD_TMPROOT="$MKDUMPRD_TMPDIR/root"
> > > >
> > > > trap '
> > > > ret=$?;
> > > > is_mounted $MKDUMPRD_TMPMNT && umount -f
$MKDUMPRD_TMPMNT;
> > > > [[ -d $MKDUMPRD_TMPDIR ]] && rm --one-file-system -rf --
"$MKDUMPRD_TMPDIR";
> > > > + [[ -d $MKDUMPRD_TMPROOT ]] && rm --one-file-system -rf
-- "$MKDUMPRD_TMPROOT";
> > >
> > > I don't think you need the line here. TMPROOT is a sub-directory of
> > > TMPDIR and thus should be removed in the line above.
> >
> > I'm not sure if we might need to write something into TMPROOT in the
> > future, and it might be a standalone file system, the previous `rm` is
> > called with "--one-file-system" so it won't be covered it. But
for now
> > it's RO so I think we can drop this line indeed.
>
> sorry, I don't get your argument here. TMPROOT is a sub-directory of
> TMPDIR. So both must be on the same fs unless we explicitly mount
> something to TMPROOT (in the same namespace). But even in that case I
> would expect that any change made to TMPDIR is reverted, i.e. the
> correct sequence would be
>
> 1) rm -rf --one-file-system TMPROOT
> 2) umount TMPROOT
> 3) rm -rf --one-file-system TMPDIR
>
> Or am I missing something?
Oh, you are right, please ignore my reply.
>
> > > > exit $ret;
> > > > ' EXIT
> > > >
> > > > @@ -457,7 +460,22 @@ if ! is_fadump_capable; then
> > > > add_dracut_arg "--no-hostonly-default-device"
> > > > fi
> > > >
> > > > -dracut "${dracut_args[@]}" "$@"
> > > > +if [ -e $KDUMP_BUILDROOT/usr ] || [ -e $KDUMP_BUILDROOT/etc ]; then
> > >
> > > just for my understanding. In $KDUMP_BUILDROOT/{usr,etc} other
> > > packages, i.e. not kexec-tools, can install binaries/configs that are
> > > supposed to be used in the kdump initrd instead of their counterparts
> > > with the same name in /usr or /etc. Is that correct?
> >
> > Yes, that's correct.
> >
> > > If the above is correct I think we should also implement a mechanism
> > > that marks an initramfs as "tainted" if anything was added to
those
> > > directories. Otherwise I see the danger that a lot of time will be
> > > wasted by debugging problems that are caused by other rpms which are
> > > not supported by our team and in worst case cannot be reproduced
> > > because they are caused by bugs in some private rpm...
> >
> > Interesting idea, how about include a new file in the initramfs (eg.
> > kdumproot-files.txt) to indicate which file is included from the kdump
> > buildroot.
>
> That was also my first idea. I would only also add the package name
> that provided the file together with the file name, i.e. something like
>
> <file> $(rpm -q --whatprovides <file>)
>
> for every file.
>
> But I'm not sure if that is sufficient as people usually only provide a
> log when they report a bug but not the iniramfs. So the log must
> contain at least an indicator that something was added so we can ask
> the reporter to provide the initramfs to get that file.
>
> What makes things even worse is that this log entry should ideally be
> printed by the kernel during boot. Otherwise there is the chance that
> kdump.sh is never executed to print the log entry. For example consider
> an adjusted systemd that crashes early during boot causing a kernel
> panic.
> The best idea I had so far is to add the indicator to the kernel
> command line. But that is quite a dirty hack in my opinion.
I think it's actually quite common to ask user to provide the
initramfs in case of kdump fail, or at least ask user to provide log
of the first kernel.
That's more or less why I want the indicator. Customers don't provide
the initramfs by themselves but we have to ask for it. The indicator
tells us that we need to ask for it.
So Kdump service can generate some extra log in first kernel as well
to indicate the initramfs contains some special binaries from the
buildroot.
Printing some extra log in the first kernel doesn't hurt. But in my
opinion having an indicator in the second kernel is more important. As
I think it's more likely that customers provide the second kernels
log rather than the first kernels one.
But yes, for debugging that will probably help.
I agree that provide an early pre-init log definitely help, but not
sure if this is neccessary. If we really need that, just adding extra
kernel should be OK.
Yes, printing a full log isn't necessary. Simply having the indicator
should be enough.
Currently we still have a
"BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.10-300.fc35.aarch64" for kdump
kernel, and kernel complains "Unknown command line parameters:
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.10-300.fc35.aarch64". Maybe add an
extra "kdump.initrd_tainted" is not that bad, and second kernel
userspace may make use of this parameter later as well. And many other
userspaces are all (ab)usesing the kernel cmdline to passthrough
infos, I think it's quite common.
True, it's common. But it's still an abuse ;)
Anyway, I believe there is no other way for what we want. Having the
"kdump.initamfs_tainted" parameter sounds like the way to go for me.
I can make the initramfs itself, mkdumprd build process, kdump load
service, and, kernel cmdline, all have some small splice of change to
indicate that the kdump initramfs is built with things from the build
root, and it's not hard to do so, and there is no performance hit.
Just maybe it's a bit too noisy? How do you think?
As said above, in my opinion it's important to have a file in the
initrd that contains which binary/config was overwritten by which rpm
and an indicator in the second kernels log that the initrd is tainted.
Everything else might be helpful but isn't really necessary in my
opinion.
Thanks
Philipp