Thin provisioning is a mechanism that lets you allocate an LVM volume with a large virtual size for file systems while it actually occupies only a small physical size. The physical size can be autoextended during use once the thin pool reaches a threshold specified in /etc/lvm/lvm.conf.
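For reference, the relevant knobs live in the activation section of /etc/lvm/lvm.conf; a minimal excerpt with example values (the numbers below are only illustrative, not something this series sets):

    activation {
        # Autoextend a thin pool once its data usage crosses this percentage;
        # 100 disables automatic extension.
        thin_pool_autoextend_threshold = 70
        # Grow the pool by this percentage of its current size on each extension.
        thin_pool_autoextend_percent = 20
    }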
There are 2 pieces of work to handle when enabling lvm2 thinp for kdump:
1) Check whether the dump target device or directory is a thinp device.
2) Monitor the thin pool and autoextend its size when it reaches the threshold during kdump.
According to my testing, the memory-consuming part of lvm2 thinp handling is the thin pool size-autoextend phase. For Fedora and RHEL9, the default crashkernel value is enough. But for RHEL8, the default crashkernel value 1G-4G:160M is not enough, so it should be handled specially.
v1 -> v2:
1) Modified the usage of the lvs cmd when checking whether the target is an lvm2 thinp device.
2) Removed the sync-flag way of mounting the lvm2 thinp target during kdump; use "sync -f vmcore" to force data sync instead, and handle the error if it fails.
v2 -> v3:
1) Removed the "sync -f vmcore" patch from the patch set, since it addresses an issue which is not specific to lvm2 thinp support for kdump.
Tao Liu (3):
  Add lvm2 thin provision dump target checker
  Add lvm2-monitor.service for kdump when lvm2 thinp enabled
  lvm.conf should be checked for modification if lvm2 thinp enabled
 dracut-lvm2-monitor.service | 15 +++++++++++++++
 dracut-module-setup.sh      | 16 ++++++++++++++++
 kdump-lib-initramfs.sh      | 20 ++++++++++++++++++++
 kdumpctl                    |  1 +
 kexec-tools.spec            |  2 ++
 5 files changed, 54 insertions(+)
 create mode 100644 dracut-lvm2-monitor.service
We need to check whether a directory or a device is an lvm2 thinp target.
First, we use get_block_dump_target() to convert the dump path into a block device, then we check whether the device is an lvm2 thinp target via the lvs cmd.
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 kdump-lib-initramfs.sh | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
diff --git a/kdump-lib-initramfs.sh b/kdump-lib-initramfs.sh
index 84e6bf7..92404f4 100755
--- a/kdump-lib-initramfs.sh
+++ b/kdump-lib-initramfs.sh
@@ -131,3 +131,22 @@ is_fs_dump_target()
 {
     [ -n "$(kdump_get_conf_val "ext[234]|xfs|btrfs|minix")" ]
 }
+
+is_lvm2_thinp_device()
+{
+    _device_path=$1
+    _lvm2_thin_device=$(lvs -S 'lv_layout=sparse && lv_layout=thin' \
+        --nosuffix --noheadings -o vg_name,lv_name "$_device_path" 2>/dev/null)
+
+    [ -n "$_lvm2_thin_device" ] && return $?
+}
+
+is_lvm2_thinp_dump_target()
+{
+    _target=$(get_block_dump_target)
+    if [ -n "$_target" ]; then
+        is_lvm2_thinp_device "$_target"
+    else
+        return 1
+    fi
+}
\ No newline at end of file
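The same lvs query can also be run by hand to see what the helper matches; a rough example against a hypothetical thin LV /dev/vg00/thinvol (a plain LV, or the thin pool itself, prints nothing):

    # Non-empty output ("vg00 thinvol") only when the LV's layout is both
    # sparse and thin, i.e. it is a thin-provisioned volume.
    lvs -S 'lv_layout=sparse && lv_layout=thin' \
        --nosuffix --noheadings -o vg_name,lv_name /dev/vg00/thinvol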
If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed to monitor and autoextend the size of the thin pool. Otherwise a vmcore dumped to a target without enough space will be incomplete and unusable for further analysis.
In this patch, lvm2-monitor.service is started before kdump-capture.service in the 2nd kernel, then stopped in the kdump post.d phase, so thin pool monitoring and size-autoextend are ensured during kdump.
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 dracut-lvm2-monitor.service | 15 +++++++++++++++
 dracut-module-setup.sh      | 16 ++++++++++++++++
 kexec-tools.spec            |  2 ++
 3 files changed, 33 insertions(+)
 create mode 100644 dracut-lvm2-monitor.service
diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service
new file mode 100644
index 0000000..88e79e1
--- /dev/null
+++ b/dracut-lvm2-monitor.service
@@ -0,0 +1,15 @@
+[Unit]
+Description=Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
+Documentation=man:dmeventd(8) man:lvcreate(8) man:lvchange(8) man:vgchange(8)
+After=initrd.target initrd-parse-etc.service sysroot.mount
+After=dracut-initqueue.service dracut-pre-mount.service dracut-mount.service dracut-pre-pivot.service
+Before=initrd-cleanup.service kdump-capture.service shutdown.target local-fs-pre.target
+DefaultDependencies=no
+Conflicts=shutdown.target
+
+[Service]
+Type=oneshot
+Environment=LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES=1
+ExecStart=/usr/sbin/lvm vgchange --monitor y
+ExecStop=/usr/sbin/lvm vgchange --monitor n
+RemainAfterExit=yes
\ No newline at end of file
diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh
index c319fc2..19c0f46 100755
--- a/dracut-module-setup.sh
+++ b/dracut-module-setup.sh
@@ -1016,6 +1016,20 @@ remove_cpu_online_rule() {
     sed -i '/SUBSYSTEM=="cpu"/d' "$file"
 }
+kdump_install_lvm2_monitor_service()
+{
+    inst "$moddir/lvm2-monitor.service" "$systemdsystemunitdir/lvm2-monitor.service"
+    systemctl -q --root "$initdir" add-wants initrd.target lvm2-monitor.service
+
+    # We should stop the lvm2-monitor service after kdump. SIGTERM is ignored
+    # by dmeventd while a device is monitored, so devices shall be unmonitored
+    # before stopping dmeventd. This saves the waiting time between
+    # systemd-shutdown sending SIGTERM and SIGKILL to the remaining processes.
+    mkdir -p "${initdir}/etc/kdump/post.d"
+    echo "systemctl stop lvm2-monitor" > "${initdir}/etc/kdump/post.d/stop-lvm2-monitor.sh"
+    chmod +x "${initdir}/etc/kdump/post.d/stop-lvm2-monitor.sh"
+}
+
 install() {
     local arch
@@ -1058,6 +1072,8 @@ install() {
     inst "$moddir/kdump.sh" "/usr/bin/kdump.sh"
     inst "$moddir/kdump-capture.service" "$systemdsystemunitdir/kdump-capture.service"
     systemctl -q --root "$initdir" add-wants initrd.target kdump-capture.service
+    is_lvm2_thinp_dump_target &&
+        kdump_install_lvm2_monitor_service
     # Replace existing emergency service and emergency target
     cp "$moddir/kdump-emergency.service" "$initdir/$systemdsystemunitdir/emergency.service"
     cp "$moddir/kdump-emergency.target" "$initdir/$systemdsystemunitdir/emergency.target"
diff --git a/kexec-tools.spec b/kexec-tools.spec
index 6673000..5f4344d 100644
--- a/kexec-tools.spec
+++ b/kexec-tools.spec
@@ -60,6 +60,7 @@ Source109: dracut-early-kdump-module-setup.sh
 Source200: dracut-fadump-init-fadump.sh
 Source201: dracut-fadump-module-setup.sh
+Source202: dracut-lvm2-monitor.service
 %ifarch ppc64 ppc64le
 Requires(post): servicelog
@@ -240,6 +241,7 @@ cp %{SOURCE102} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpb
 cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}}
 cp %{SOURCE106} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE106}}
 cp %{SOURCE107} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE107}}
+cp %{SOURCE202} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE202}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}}
 mkdir -p -m755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99earlykdump
lvm2 relies on /etc/lvm/lvm.conf to determine its behaviour. Important configs such as thin_pool_autoextend_threshold and thin_pool_autoextend_percent will be used during kdump in the 2nd kernel. So if the file is modified, the initramfs should be rebuilt to include the latest version.
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 kdump-lib-initramfs.sh | 1 +
 kdumpctl               | 1 +
 2 files changed, 2 insertions(+)
diff --git a/kdump-lib-initramfs.sh b/kdump-lib-initramfs.sh
index 92404f4..8ea2d66 100755
--- a/kdump-lib-initramfs.sh
+++ b/kdump-lib-initramfs.sh
@@ -8,6 +8,7 @@ DEFAULT_SSHKEY="/root/.ssh/kdump_id_rsa"
 KDUMP_CONFIG_FILE="/etc/kdump.conf"
 FENCE_KDUMP_CONFIG_FILE="/etc/sysconfig/fence_kdump"
 FENCE_KDUMP_SEND="/usr/libexec/fence_kdump_send"
+LVM_CONF="/etc/lvm/lvm.conf"
 # Read kdump config in well formated style
 kdump_read_conf()
diff --git a/kdumpctl b/kdumpctl
index 6188d47..b157eb8 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -383,6 +383,7 @@ check_files_modified()
 	# HOOKS is mandatory and need to check the modification time
 	files="$files $HOOKS"
+	is_lvm2_thinp_dump_target && files="$files $LVM_CONF"

 	check_exist "$files" && check_executable "$EXTRA_BINS" || return 2
for file in $files; do
On Fri, May 27, 2022 at 02:45:12PM +0800, Tao Liu wrote:
Thin provision is a mechanism that you can allocate a lvm volume which has a large virtual size for file systems but actually in a small physical size. The physical size can be autoextended in use if thin pool reached a threshold specified in /etc/lvm/lvm.conf.
So what's the core requirement? Is it that the root filesystem of the machine is on a thin device, and after a crash we are trying to save the dump to the rootfs?
Typically the rootfs might have been set up for auto-extension. But the problem here is that you are dumping the core from the initramfs context and auto-extension might not work out of the box? So you are basically trying to pack the dmeventd systemd unit file and /etc/lvm/lvm.conf into the initramfs to make auto-extension work.
Vivek
There are 2 works should be handled when enable lvm2 thinp for kdump:
- Check if the dump target device or directory is thinp device.
- Monitor the thin pool and autoextend its size when it reached the threshold during kdump.
According to my testing, the memory consumption procedure for lvm2 thinp is the thin pool size-autoextend phase. For fedora and rhel9, the default crashkernel value is enough. But for rhel8, the default crashkernel value 1G-4G:160M is not enough, so it should be handled particularly.
v1 -> v2:
- Modified the usage of lvs cmd when check if target is lvm2 thinp device.
- Removed the sync flag way of mounting for lvm2 thinp target during kdump, use "sync -f vmcore" to force sync data, and handle the error if fails.
v2 -> v3:
- Removed "sync -f vmcore" patch out of the patch set, for it is
addressing an issue which is not specifically to lvm2 thinp support for kdump.
Tao Liu (3): Add lvm2 thin provision dump target checker Add lvm2-monitor.service for kdump when lvm2 thinp enabled lvm.conf should be check modified if lvm2 thinp enabled
dracut-lvm2-monitor.service | 15 +++++++++++++++ dracut-module-setup.sh | 16 ++++++++++++++++ kdump-lib-initramfs.sh | 20 ++++++++++++++++++++ kdumpctl | 1 + kexec-tools.spec | 2 ++ 5 files changed, 54 insertions(+) create mode 100644 dracut-lvm2-monitor.service
-- 2.33.1
On Fri, May 27, 2022 at 02:45:14PM +0800, Tao Liu wrote:
If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed for monitor and autoextend the size of thin pool. Otherwise the vmcore dumped to a no-enough-space target will be incomplete and unable for further analysis.
In this patch, lvm2-monitor.service will be started before kdump-capture .service for 2nd kernel, then be stopped in kdump post.d phase. So the thin pool monitoring and size-autoextend can be ensured during kdump.
Signed-off-by: Tao Liu ltao@redhat.com
dracut-lvm2-monitor.service | 15 +++++++++++++++ dracut-module-setup.sh | 16 ++++++++++++++++ kexec-tools.spec | 2 ++ 3 files changed, 33 insertions(+) create mode 100644 dracut-lvm2-monitor.service
diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service
This seems to be a copy of /lib/systemd/system/lvm2-monitor.service. Wondering if we can directly include that file in the initramfs when generating the image. But I am fuzzy on the details of the dracut implementation. It has been too long since I played with it. So Bao and the kdump team will be best placed to comment on this.
Vivek
new file mode 100644 index 0000000..88e79e1 --- /dev/null +++ b/dracut-lvm2-monitor.service @@ -0,0 +1,15 @@ +[Unit] +Description=Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling +Documentation=man:dmeventd(8) man:lvcreate(8) man:lvchange(8) man:vgchange(8) +After=initrd.target initrd-parse-etc.service sysroot.mount +After=dracut-initqueue.service dracut-pre-mount.service dracut-mount.service dracut-pre-pivot.service +Before=initrd-cleanup.service kdump-capture.service shutdown.target local-fs-pre.target +DefaultDependencies=no +Conflicts=shutdown.target
+[Service] +Type=oneshot +Environment=LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES=1 +ExecStart=/usr/sbin/lvm vgchange --monitor y +ExecStop=/usr/sbin/lvm vgchange --monitor n +RemainAfterExit=yes \ No newline at end of file diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh index c319fc2..19c0f46 100755 --- a/dracut-module-setup.sh +++ b/dracut-module-setup.sh @@ -1016,6 +1016,20 @@ remove_cpu_online_rule() { sed -i '/SUBSYSTEM=="cpu"/d' "$file" }
+kdump_install_lvm2_monitor_service() +{
- inst "$moddir/lvm2-monitor.service" "$systemdsystemunitdir/lvm2-monitor.service"
- systemctl -q --root "$initdir" add-wants initrd.target lvm2-monitor.service
- # We should stop lvm2-monitor service after kdump. SIGTERM is ignored
- # by dmeventd when device is monitored. So before stopping dmevend, devices
- # shall be unmonitored. This can save the waiting time between systemd-shutdown
- # Sending SIGTERM and SIGKILL to remaining processes.
- mkdir -p "${initdir}/etc/kdump/post.d"
- echo "systemctl stop lvm2-monitor" > "${initdir}/etc/kdump/post.d/stop-lvm2-monitor.sh"
- chmod +x "${initdir}/etc/kdump/post.d/stop-lvm2-monitor.sh"
+}
install() { local arch
@@ -1058,6 +1072,8 @@ install() { inst "$moddir/kdump.sh" "/usr/bin/kdump.sh" inst "$moddir/kdump-capture.service" "$systemdsystemunitdir/kdump-capture.service" systemctl -q --root "$initdir" add-wants initrd.target kdump-capture.service
- is_lvm2_thinp_dump_target && kdump_install_lvm2_monitor_service
# Replace existing emergency service and emergency target cp "$moddir/kdump-emergency.service" "$initdir/$systemdsystemunitdir/emergency.service" cp "$moddir/kdump-emergency.target" "$initdir/$systemdsystemunitdir/emergency.target"
diff --git a/kexec-tools.spec b/kexec-tools.spec index 6673000..5f4344d 100644 --- a/kexec-tools.spec +++ b/kexec-tools.spec @@ -60,6 +60,7 @@ Source109: dracut-early-kdump-module-setup.sh
Source200: dracut-fadump-init-fadump.sh Source201: dracut-fadump-module-setup.sh +Source202: dracut-lvm2-monitor.service
%ifarch ppc64 ppc64le Requires(post): servicelog @@ -240,6 +241,7 @@ cp %{SOURCE102} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpb cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}} cp %{SOURCE106} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE106}} cp %{SOURCE107} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE107}} +cp %{SOURCE202} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE202}} chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}} chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}} mkdir -p -m755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99earlykdump -- 2.33.1
On Fri, May 27, 2022 at 02:45:15PM +0800, Tao Liu wrote:
lvm2 relies on /etc/lvm/lvm.conf to determine its behaviour. The important configs such as thin_pool_autoextend_threshold and thin_pool_autoextend_percent will be used during kdump in 2nd kernel. So if the file is modified, the initramfs should be rebuild to include the latest.
Signed-off-by: Tao Liu ltao@redhat.com
kdump-lib-initramfs.sh | 1 + kdumpctl | 1 + 2 files changed, 2 insertions(+)
diff --git a/kdump-lib-initramfs.sh b/kdump-lib-initramfs.sh index 92404f4..8ea2d66 100755 --- a/kdump-lib-initramfs.sh +++ b/kdump-lib-initramfs.sh @@ -8,6 +8,7 @@ DEFAULT_SSHKEY="/root/.ssh/kdump_id_rsa" KDUMP_CONFIG_FILE="/etc/kdump.conf" FENCE_KDUMP_CONFIG_FILE="/etc/sysconfig/fence_kdump" FENCE_KDUMP_SEND="/usr/libexec/fence_kdump_send" +LVM_CONF="/etc/lvm/lvm.conf"
# Read kdump config in well formated style kdump_read_conf() diff --git a/kdumpctl b/kdumpctl index 6188d47..b157eb8 100755 --- a/kdumpctl +++ b/kdumpctl @@ -383,6 +383,7 @@ check_files_modified()
# HOOKS is mandatory and need to check the modification time files="$files $HOOKS"
- is_lvm2_thinp_dump_target && files="$files $LVM_CONF"
This probably should be merged with the first patch. I was wondering how is_lvm2_thinp_dump_target() is being used. I now see that you are using it to add the /etc/lvm/lvm.conf file to the initramfs.
I would have thought that /etc/lvm/lvm.conf would be needed in the initramfs whenever the rootfs is on an LVM volume, thinp being just one of the use cases. Maybe that's not the case; no dracut module includes lvm.conf even if the rootfs is on an LVM volume.
Vivek
for file in $files; do
2.33.1
On 27. 05. 22 at 14:20, Vivek Goyal wrote:
On Fri, May 27, 2022 at 02:45:14PM +0800, Tao Liu wrote:
If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed for monitor and autoextend the size of thin pool. Otherwise the vmcore dumped to a no-enough-space target will be incomplete and unable for further analysis.
In this patch, lvm2-monitor.service will be started before kdump-capture .service for 2nd kernel, then be stopped in kdump post.d phase. So the thin pool monitoring and size-autoextend can be ensured during kdump.
Signed-off-by: Tao Liu ltao@redhat.com
dracut-lvm2-monitor.service | 15 +++++++++++++++ dracut-module-setup.sh | 16 ++++++++++++++++ kexec-tools.spec | 2 ++ 3 files changed, 33 insertions(+) create mode 100644 dracut-lvm2-monitor.service
diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service
This seems to be a copy of /lib/systemd/system/lvm2-monitor.service. Wondering if we can dirctly include that file in initramfs when generating image. But I am fuzzy on details of dracut implementation. It has been too long since I played with it. So Bao and kdump team will be best to comment on this.
This is quite interesting - monitoring should in fact never be started within the 'ramdisk', so I'm actually wondering what this service file is doing there.
Design was to start 'monitoring' of devices just after switch to 'rootfs' - since running 'dmeventd' out of ramdisk does not make any sense at all.
(Adding also Dave since he has been able to push some patches to 'dracut' recently')
Regards
Zdenek
On Fri, May 27, 2022 at 04:42:25PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 14:20, Vivek Goyal wrote:
On Fri, May 27, 2022 at 02:45:14PM +0800, Tao Liu wrote:
If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed for monitor and autoextend the size of thin pool. Otherwise the vmcore dumped to a no-enough-space target will be incomplete and unable for further analysis.
In this patch, lvm2-monitor.service will be started before kdump-capture .service for 2nd kernel, then be stopped in kdump post.d phase. So the thin pool monitoring and size-autoextend can be ensured during kdump.
Signed-off-by: Tao Liu ltao@redhat.com
dracut-lvm2-monitor.service | 15 +++++++++++++++ dracut-module-setup.sh | 16 ++++++++++++++++ kexec-tools.spec | 2 ++ 3 files changed, 33 insertions(+) create mode 100644 dracut-lvm2-monitor.service
diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service
This seems to be a copy of /lib/systemd/system/lvm2-monitor.service. Wondering if we can dirctly include that file in initramfs when generating image. But I am fuzzy on details of dracut implementation. It has been too long since I played with it. So Bao and kdump team will be best to comment on this.
This is quite interesting - monitoring should in fact never be started wthin 'ramdisk' so I'm acutlly wondering what is this service file doing there.
Design was to start 'monitoring' of devices just after switch to 'rootfs' - since running 'dmeventd' out of ramdisk does not make any sense at all.
Hi Zdenek,
In the case of kdump, we save the core dump from the initramfs context and then reboot back into the primary kernel. That's why dm monitoring (and thin pool auto-extension) needs to work from inside the initramfs context.
Thanks Vivek
(Adding also Dave since he has been able to push some patches to 'dracut' recently')
Regards
Zdenek
On 27. 05. 22 at 16:50, Vivek Goyal wrote:
On Fri, May 27, 2022 at 04:42:25PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 14:20, Vivek Goyal wrote:
On Fri, May 27, 2022 at 02:45:14PM +0800, Tao Liu wrote:
If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed for monitor and autoextend the size of thin pool. Otherwise the vmcore dumped to a no-enough-space target will be incomplete and unable for further analysis.
In this patch, lvm2-monitor.service will be started before kdump-capture .service for 2nd kernel, then be stopped in kdump post.d phase. So the thin pool monitoring and size-autoextend can be ensured during kdump.
Signed-off-by: Tao Liu ltao@redhat.com
dracut-lvm2-monitor.service | 15 +++++++++++++++ dracut-module-setup.sh | 16 ++++++++++++++++ kexec-tools.spec | 2 ++ 3 files changed, 33 insertions(+) create mode 100644 dracut-lvm2-monitor.service
diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service
This seems to be a copy of /lib/systemd/system/lvm2-monitor.service. Wondering if we can dirctly include that file in initramfs when generating image. But I am fuzzy on details of dracut implementation. It has been too long since I played with it. So Bao and kdump team will be best to comment on this.
This is quite interesting - monitoring should in fact never be started wthin 'ramdisk' so I'm acutlly wondering what is this service file doing there.
Design was to start 'monitoring' of devices just after switch to 'rootfs' - since running 'dmeventd' out of ramdisk does not make any sense at all.
Hi Zdenek,
In case of kdump, we save core dump from initramfs context and reboot back into primary kernel. And that's why this need of dm monitoring ( and thin pool auto extension) working from inside the initramfs context.
So IMHO this does not look like the best approach. AFAIK the lvm.conf within the ramdisk is also a modified version.
It looks like there should be a better alternative - like, after activation, checking that there is 'enough' room in the thin-pool for use with the thinLV - this should be 'computable', and in case the size is not good enough, try to extend the thin-pool prior to use/mount of the thinLV (the free space in the thin-pool, %DATA & %METADATA, and the occupancy of the thinLV's %DATA can be obtained with the 'lvs' tool).
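As a sketch of such a check (names are hypothetical, vg00/pool stands in for the thin pool), the occupancy can be read with lvs before deciding whether to extend:

    # data_percent/metadata_percent show how full the pool is; lv_size gives
    # its current size, so free space is roughly lv_size * (100 - data_percent) / 100.
    lvs --noheadings --nosuffix --units m \
        -o lv_size,data_percent,metadata_percent vg00/pool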
Running the very resource-hungry dmeventd (it locks all the process memory in RAM - could be many many MB) in the kdump environment is not, IMHO, the worst option here - but I'd prefer to avoid execution of dmeventd in this ramfs image.
Regards
Zdenek
On Fri, May 27, 2022 at 04:59:38PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 16:50, Vivek Goyal wrote:
On Fri, May 27, 2022 at 04:42:25PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 14:20, Vivek Goyal wrote:
On Fri, May 27, 2022 at 02:45:14PM +0800, Tao Liu wrote:
If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed for monitor and autoextend the size of thin pool. Otherwise the vmcore dumped to a no-enough-space target will be incomplete and unable for further analysis.
In this patch, lvm2-monitor.service will be started before kdump-capture .service for 2nd kernel, then be stopped in kdump post.d phase. So the thin pool monitoring and size-autoextend can be ensured during kdump.
Signed-off-by: Tao Liu ltao@redhat.com
dracut-lvm2-monitor.service | 15 +++++++++++++++ dracut-module-setup.sh | 16 ++++++++++++++++ kexec-tools.spec | 2 ++ 3 files changed, 33 insertions(+) create mode 100644 dracut-lvm2-monitor.service
diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service
This seems to be a copy of /lib/systemd/system/lvm2-monitor.service. Wondering if we can dirctly include that file in initramfs when generating image. But I am fuzzy on details of dracut implementation. It has been too long since I played with it. So Bao and kdump team will be best to comment on this.
This is quite interesting - monitoring should in fact never be started wthin 'ramdisk' so I'm acutlly wondering what is this service file doing there.
Design was to start 'monitoring' of devices just after switch to 'rootfs' - since running 'dmeventd' out of ramdisk does not make any sense at all.
Hi Zdenek,
In case of kdump, we save core dump from initramfs context and reboot back into primary kernel. And that's why this need of dm monitoring ( and thin pool auto extension) working from inside the initramfs context.
So IMHO this although does not look like the best approach. AFAIK the lvm.conf within ramdisk is also a modified version.
It looks like there should be a better alternative - like 'after' activation checking there is 'enough' room in thin-pool for use with thinLV - should be 'computable' and in case the size is not good enough - try to extend thin-pool prior use/mount of thinLV (size of space in thin-pool %DATA & %METATDATA and occupancy of %DATA thinLV could be obtained by 'lvs' tool)
One potential problem here is that we don't know the size of the vmcore in advance. It gets filtered and saved, and we don't know in advance how many kernel pages will be there.
Is that still right, Bao?
Technically speaking, one could first run makedumpfile just to determine what the size of the vmcore will be, and then actually save the vmcore in a second round. But that would double the filtering time.
Running very resource hungry dmeventd (looks all the process memory in RAM
- could be many many MB) in kdump environment is not IMHO worst option
here - I'd prefer to avoid execution of dmeventd in this ramfs image.
I understand. We also want to keep the size of kdump initramfs to the minimum.
Thanks Vivek
On 27. 05. 22 at 17:39, Vivek Goyal wrote:
On Fri, May 27, 2022 at 04:59:38PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 16:50, Vivek Goyal wrote:
On Fri, May 27, 2022 at 04:42:25PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 14:20, Vivek Goyal wrote:
On Fri, May 27, 2022 at 02:45:14PM +0800, Tao Liu wrote:
If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed for monitor and autoextend the size of thin pool. Otherwise the vmcore dumped to a no-enough-space target will be incomplete and unable for further analysis.
In this patch, lvm2-monitor.service will be started before kdump-capture .service for 2nd kernel, then be stopped in kdump post.d phase. So the thin pool monitoring and size-autoextend can be ensured during kdump.
Signed-off-by: Tao Liu ltao@redhat.com
dracut-lvm2-monitor.service | 15 +++++++++++++++ dracut-module-setup.sh | 16 ++++++++++++++++ kexec-tools.spec | 2 ++ 3 files changed, 33 insertions(+) create mode 100644 dracut-lvm2-monitor.service
diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service
This seems to be a copy of /lib/systemd/system/lvm2-monitor.service. Wondering if we can dirctly include that file in initramfs when generating image. But I am fuzzy on details of dracut implementation. It has been too long since I played with it. So Bao and kdump team will be best to comment on this.
This is quite interesting - monitoring should in fact never be started wthin 'ramdisk' so I'm acutlly wondering what is this service file doing there.
Design was to start 'monitoring' of devices just after switch to 'rootfs' - since running 'dmeventd' out of ramdisk does not make any sense at all.
Hi Zdenek,
In case of kdump, we save core dump from initramfs context and reboot back into primary kernel. And that's why this need of dm monitoring ( and thin pool auto extension) working from inside the initramfs context.
So IMHO this although does not look like the best approach. AFAIK the lvm.conf within ramdisk is also a modified version.
It looks like there should be a better alternative - like 'after' activation checking there is 'enough' room in thin-pool for use with thinLV - should be 'computable' and in case the size is not good enough - try to extend thin-pool prior use/mount of thinLV (size of space in thin-pool %DATA & %METATDATA and occupancy of %DATA thinLV could be obtained by 'lvs' tool)
One potential problem here is that we don't know what's the size of vmcore in advance. It gets filtered and saved and we dont know in advance, how many kernel pages will be there.
Is that still right, Bao?
Technically speaking, one could first run makedumpfile to just determine what will be size of vmcore and then actually save vmcore in second round. But that will double the filtering time.
You could likely 'stream/buffer' the kdump data in the form of e.g. 4MiB ~ 128MiB chunks (or any other suitable size which will be quick enough), and before each new write of such a chunk just check that there is enough free space in the thin-pool with lvs - that should still be better than running 'dmeventd' in the background - and it also gives you the best control over the deadlock in case you run completely out of space (i.e. leaving enough room in the thin-pool and avoiding a full dump so the user could still 'boot').
Since you will be the only user of the thinLV in the initramfs, this should be reasonably straightforward to achieve.
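A very rough shell rendering of that chunked idea, under assumed, hypothetical names (vg00/pool for the pool, /sysroot/var/crash/vmcore for the output) and with the dump data arriving on stdin, e.g. piped from dd or a core collector; this is only a sketch, not a proposed implementation:

    #!/bin/sh
    OUT=/sysroot/var/crash/vmcore
    POOL=vg00/pool
    : > "$OUT"
    prev=0
    while :; do
        # Refuse to write the next chunk once the pool's data area is nearly full.
        used=$(lvs --noheadings -o data_percent "$POOL" | cut -d. -f1 | tr -d ' ')
        if [ "${used:-0}" -ge 95 ]; then
            echo "thin pool nearly full, aborting dump" >&2
            exit 1
        fi
        # Append up to 64MiB from stdin; if nothing new was written,
        # the input is exhausted and we are done.
        head -c $((64 * 1024 * 1024)) >> "$OUT"
        size=$(stat -c %s "$OUT")
        [ "$size" -eq "$prev" ] && break
        prev=$size
    done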
Regards
Zdenek
On Fri, May 27, 2022 at 06:05:27PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 17:39, Vivek Goyal wrote:
On Fri, May 27, 2022 at 04:59:38PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 16:50, Vivek Goyal wrote:
On Fri, May 27, 2022 at 04:42:25PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 14:20, Vivek Goyal wrote:
On Fri, May 27, 2022 at 02:45:14PM +0800, Tao Liu wrote: > If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed for > monitor and autoextend the size of thin pool. Otherwise the vmcore > dumped to a no-enough-space target will be incomplete and unable for > further analysis. > > In this patch, lvm2-monitor.service will be started before kdump-capture > .service for 2nd kernel, then be stopped in kdump post.d phase. So > the thin pool monitoring and size-autoextend can be ensured during kdump. > > Signed-off-by: Tao Liu ltao@redhat.com > --- > dracut-lvm2-monitor.service | 15 +++++++++++++++ > dracut-module-setup.sh | 16 ++++++++++++++++ > kexec-tools.spec | 2 ++ > 3 files changed, 33 insertions(+) > create mode 100644 dracut-lvm2-monitor.service > > diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service This seems to be a copy of /lib/systemd/system/lvm2-monitor.service. Wondering if we can dirctly include that file in initramfs when generating image. But I am fuzzy on details of dracut implementation. It has been too long since I played with it. So Bao and kdump team will be best to comment on this.
This is quite interesting - monitoring should in fact never be started wthin 'ramdisk' so I'm acutlly wondering what is this service file doing there.
Design was to start 'monitoring' of devices just after switch to 'rootfs' - since running 'dmeventd' out of ramdisk does not make any sense at all.
Hi Zdenek,
In case of kdump, we save core dump from initramfs context and reboot back into primary kernel. And that's why this need of dm monitoring ( and thin pool auto extension) working from inside the initramfs context.
So IMHO this although does not look like the best approach. AFAIK the lvm.conf within ramdisk is also a modified version.
It looks like there should be a better alternative - like 'after' activation checking there is 'enough' room in thin-pool for use with thinLV - should be 'computable' and in case the size is not good enough - try to extend thin-pool prior use/mount of thinLV (size of space in thin-pool %DATA & %METATDATA and occupancy of %DATA thinLV could be obtained by 'lvs' tool)
One potential problem here is that we don't know what's the size of vmcore in advance. It gets filtered and saved and we dont know in advance, how many kernel pages will be there.
Is that still right, Bao?
Technically speaking, one could first run makedumpfile to just determine what will be size of vmcore and then actually save vmcore in second round. But that will double the filtering time.
You could likely 'stream/buffer' these kdump data in form of i.e. '4MiB ~ 128MiB' chunks (or any other suitable size which will be 'quick enough) and before each new write of such chunk just compare there is enough free space in thin-pool with lvs - should be still better then running 'dmeventd' in the background -
and gives you also the best control over the deadlock in case you run completely out-of-space (i.e. leaving enough room in thin-pool and avoiding full dump so user could still 'boot')
So if we fill up the thin pool completely, it might fail to activate over a reboot? I do remember there were issues w.r.t. filling up the thin pool completely, and that it was not desired.
So the above does not involve growing the thin pool at all? It just says: query the currently available space in the thin pool and, when it is about to be full, stop writing to it? This is suboptimal if there is free space in the underlying volume group.
Ok, this is going to be ugly given how kdump works right now. We have this config option core_collector where user can specify how vmcore should be saved (dd, cp, makedumpfile, .....)
None of these tools know about streaming and thin pool extension etc.
I guess one could think of making makedumpfile aware of thin pools. But given there can be so many dump targets, it would be really ugly from a design point of view - embedding knowledge of a target in a generic filtering tool.
Alternatively we could probably write a tool of our own and pipe makedumpfile output to it. But then user will have to specify it in core_collector for thin pool targets only.
None of the solutions look clean or fit well into the current design.
Thanks Vivek
Since you will be only a single user of thinLV in initramfs - this should be reasonable straigforward to achieve.
Regards
Zdenek
On Fri, May 27, 2022 at 01:05:57PM -0400, Vivek Goyal wrote:
So if we fill up thin pool completely, it might fail to activate over reboot? I do remember there were issues w.r.t filling up thin pool compltely and it was not desired.
So above does not involve growing thin pool at all? Above just says, query currently available space in thin pool and when it is about to be full, stop writing to it? This is suboptimal if there is free space in underlying volume group.
Ok, this is going to be ugly given how kdump works right now. We have this config option core_collector where user can specify how vmcore should be saved (dd, cp, makedumpfile, .....)
None of these tools know about streaming and thin pool extension etc.
I guess one could think of making maekdumpfile aware of thin pool. But given there can be so many dump targets, it will be really ugly from design point of view. Embedding knowledge of a target in a generic filtering tool.
Alternatively we could probably write a tool of our own and pipe makedumpfile output to it. But then user will have to specify it in core_collector for thin pool targets only.
None of the solutions look clean or fit well into the current design.
Maybe I'm not following, but all this sounds unnecessarily complicated. Roughly estimate largest possible kdump size (X MB). Check that the thin pool has X MB free. If not, lvextend -L+XMB the thin pool. If lvextend doesn't find X MB in the vg, then quit without kdump.
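A hedged sketch of that pre-flight step, with hypothetical names and numbers (vg00/pool, a 2048 MiB estimate):

    #!/bin/sh
    POOL=vg00/pool
    NEED_MB=2048
    # Current pool size in MiB and how much of it is used (integer percent).
    size_mb=$(lvs --noheadings --nosuffix --units m -o lv_size "$POOL" | cut -d. -f1 | tr -d ' ')
    used_pct=$(lvs --noheadings -o data_percent "$POOL" | cut -d. -f1 | tr -d ' ')
    free_mb=$(( size_mb * (100 - used_pct) / 100 ))
    if [ "$free_mb" -lt "$NEED_MB" ]; then
        # Grow the pool by the shortfall; if the VG cannot satisfy it,
        # give up and skip the dump.
        lvextend -L "+$((NEED_MB - free_mb))M" "$POOL" || exit 1
    fi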
On Fri, May 27, 2022 at 12:16:38PM -0500, David Teigland wrote:
On Fri, May 27, 2022 at 01:05:57PM -0400, Vivek Goyal wrote:
So if we fill up thin pool completely, it might fail to activate over reboot? I do remember there were issues w.r.t filling up thin pool compltely and it was not desired.
So above does not involve growing thin pool at all? Above just says, query currently available space in thin pool and when it is about to be full, stop writing to it? This is suboptimal if there is free space in underlying volume group.
Ok, this is going to be ugly given how kdump works right now. We have this config option core_collector where user can specify how vmcore should be saved (dd, cp, makedumpfile, .....)
None of these tools know about streaming and thin pool extension etc.
I guess one could think of making maekdumpfile aware of thin pool. But given there can be so many dump targets, it will be really ugly from design point of view. Embedding knowledge of a target in a generic filtering tool.
Alternatively we could probably write a tool of our own and pipe makedumpfile output to it. But then user will have to specify it in core_collector for thin pool targets only.
None of the solutions look clean or fit well into the current design.
Maybe I'm not following, but all this sounds unnecessarily complicated. Roughly estimate largest possible kdump size (X MB). Check that the thin pool has X MB free. If not, lvextend -L+XMB the thin pool. If lvextend doesn't find X MB in the vg, then quit without kdump.
Estimation is hard. We could just look at the raw (unfiltered) /proc/vmcore size and extend by that. But the problem there is that we also support kdump on multi-terabyte machines, and after filtering the final vmcore could be just a few GB. So extending the thin pool to, say, 12TB might very well fail, and we would fail to save the dump.
Maybe use the above trick for the dd and cp core_collectors, as they do not filter anything.
And for makedumpfile, run it twice: the first run only gives a size estimate and the second run actually saves the dump. And do this only for thin volume targets. This will almost double the dump saving time.
So ideally it would be nice if we could enable automatic thin pool extension from the initramfs.
Vivek
On 27. 05. 22 at 19:26, Vivek Goyal wrote:
On Fri, May 27, 2022 at 12:16:38PM -0500, David Teigland wrote:
On Fri, May 27, 2022 at 01:05:57PM -0400, Vivek Goyal wrote:
So if we fill up thin pool completely, it might fail to activate over reboot? I do remember there were issues w.r.t filling up thin pool compltely and it was not desired.
So above does not involve growing thin pool at all? Above just says, query currently available space in thin pool and when it is about to be full, stop writing to it? This is suboptimal if there is free space in underlying volume group.
Ok, this is going to be ugly given how kdump works right now. We have this config option core_collector where user can specify how vmcore should be saved (dd, cp, makedumpfile, .....)
None of these tools know about streaming and thin pool extension etc.
I guess one could think of making maekdumpfile aware of thin pool. But given there can be so many dump targets, it will be really ugly from design point of view. Embedding knowledge of a target in a generic filtering tool.
Alternatively we could probably write a tool of our own and pipe makedumpfile output to it. But then user will have to specify it in core_collector for thin pool targets only.
None of the solutions look clean or fit well into the current design.
Maybe I'm not following, but all this sounds unnecessarily complicated. Roughly estimate largest possible kdump size (X MB). Check that the thin pool has X MB free. If not, lvextend -L+XMB the thin pool. If lvextend doesn't find X MB in the vg, then quit without kdump.
Estimation is hard. We could just look at raw (unfiltered /proc/vmcore size) and extend it. But problem there is we also support kdump on multi terabyte machines. And after filtering final vmcore could be just few GB. So extending thin pool to say 12TB might very well fail and we fail to save dump.
May be use above trick for dd and cp core_collectors as they will not filter anything.
And for makedumpfile, run it twice. First run only gives size estimate and second run actually saves the dump. And do this only for thin volumes targets. This will almost double dump saving time.
So ideally it will be nice if we can enable automatic thin pool extension from initramfs.
For the kdump environment this is certainly not ideal, as it itself requires a lot of RAM - buffered processing should be doable even in plain bash, if you can pipe 'dd' to it.
As mentioned previously, it would also be good to make sure the thin-pool keeps some 'configured' free space - so that e.g. those multi-TiB dumps do not overfill the thin-pool and possibly make the system hard to use after such a captured kdump (although one could also imagine simply 'dropping' the kdump if the thin-pool goes over some threshold, to keep things simple).
(So e.g. if the kdump fills the thin-pool over 99%, drop it, so the user can still use the thin-pool after reboot - better to have a usable system in this case, I'd say.)
Regards
Zdenek
On 27. 05. 22 at 19:40, Zdenek Kabelac wrote:
On 27. 05. 22 at 19:26, Vivek Goyal wrote:
On Fri, May 27, 2022 at 12:16:38PM -0500, David Teigland wrote:
On Fri, May 27, 2022 at 01:05:57PM -0400, Vivek Goyal wrote:
So if we fill up thin pool completely, it might fail to activate over reboot? I do remember there were issues w.r.t filling up thin pool compltely and it was not desired.
So above does not involve growing thin pool at all? Above just says, query currently available space in thin pool and when it is about to be full, stop writing to it? This is suboptimal if there is free space in underlying volume group.
Ok, this is going to be ugly given how kdump works right now. We have this config option core_collector where user can specify how vmcore should be saved (dd, cp, makedumpfile, .....)
None of these tools know about streaming and thin pool extension etc.
I guess one could think of making maekdumpfile aware of thin pool. But given there can be so many dump targets, it will be really ugly from design point of view. Embedding knowledge of a target in a generic filtering tool.
Alternatively we could probably write a tool of our own and pipe makedumpfile output to it. But then user will have to specify it in core_collector for thin pool targets only.
None of the solutions look clean or fit well into the current design.
Maybe I'm not following, but all this sounds unnecessarily complicated. Roughly estimate largest possible kdump size (X MB). Check that the thin pool has X MB free. If not, lvextend -L+XMB the thin pool. If lvextend doesn't find X MB in the vg, then quit without kdump.
Estimation is hard. We could just look at raw (unfiltered /proc/vmcore size) and extend it. But problem there is we also support kdump on multi terabyte machines. And after filtering final vmcore could be just few GB. So extending thin pool to say 12TB might very well fail and we fail to save dump.
May be use above trick for dd and cp core_collectors as they will not filter anything.
And for makedumpfile, run it twice. First run only gives size estimate and second run actually saves the dump. And do this only for thin volumes targets. This will almost double dump saving time.
So ideally it will be nice if we can enable automatic thin pool extension from initramfs.
For kdump environment - this is certainly not ideal - is this itself requires lot of RAM - buffered processing should be doable even in plain bash - if you can pipe 'dd' to it.
As mentioned previously - it would be also good to make sure thin-pool leaves some 'configured' free space - so i.e. those multiTIB do not overfill thin-pool and make possibly system hard to use after such captured kdump (although one could imagine) just to 'drop' kdump if thin-pool runs over some threshold) to keep things simple.
(So i.e. if kdump fills thinpool over >99% - drop it - so use could use thin-pool after reboot - better to have usable system in this case I'd say)
Actually - since you are in the 'ramdisk' and you are the 'only' user - there is possibly an even simpler way -
you could write your 'very own' autoextending 'shell' - something in naive approximation:
while kdumping do
lvextend --use-policies
if no_more_free_space then kill dumping process (use 'dmsetup --force remove thinLV - replaces with error target) & lvremove thinLV ; happy_reboot_without_kdump
sleep 5
done
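For what it is worth, a slightly more concrete rendering of that loop, with hypothetical names (vg00/pool, vg00/thinvol) and assuming the dump runs in the background with its PID in $dump_pid; in practice the thin LV would also have to be unmounted/force-removed as described above before lvremove:

    #!/bin/sh
    POOL=vg00/pool
    THINLV=vg00/thinvol
    while kill -0 "$dump_pid" 2>/dev/null; do
        # Let lvm grow the pool according to the lvm.conf autoextend policy.
        lvextend --use-policies "$POOL" 2>/dev/null
        used=$(lvs --noheadings -o data_percent "$POOL" | cut -d. -f1 | tr -d ' ')
        if [ "${used:-0}" -ge 99 ]; then
            # Pool is effectively full: kill the dump and drop the thin LV
            # so the system remains usable after reboot.
            kill "$dump_pid"
            lvremove -y "$THINLV"
            break
        fi
        sleep 5
    done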
On 05/27/22 at 08:20am, Vivek Goyal wrote:
On Fri, May 27, 2022 at 02:45:14PM +0800, Tao Liu wrote:
If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed for monitor and autoextend the size of thin pool. Otherwise the vmcore dumped to a no-enough-space target will be incomplete and unable for further analysis.
In this patch, lvm2-monitor.service will be started before kdump-capture .service for 2nd kernel, then be stopped in kdump post.d phase. So the thin pool monitoring and size-autoextend can be ensured during kdump.
Signed-off-by: Tao Liu ltao@redhat.com
dracut-lvm2-monitor.service | 15 +++++++++++++++ dracut-module-setup.sh | 16 ++++++++++++++++ kexec-tools.spec | 2 ++ 3 files changed, 33 insertions(+) create mode 100644 dracut-lvm2-monitor.service
diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service
This seems to be a copy of /lib/systemd/system/lvm2-monitor.service. Wondering if we can dirctly include that file in initramfs when generating image. But I am fuzzy on details of dracut implementation. It has been too long since I played with it. So Bao and kdump team will be best to comment on this.
Thanks for looking into this series, Vivek. It seems I didn't set the filter for this ML correctly, so this mail didn't go into my inbox; sorry for the late response.
This service file is different from /lib/systemd/system/lvm2-monitor.service; its 'Before' and 'After' are customized for kdump. So including the stock lvm2 service would not work as expected.
Vivek
new file mode 100644 index 0000000..88e79e1 --- /dev/null +++ b/dracut-lvm2-monitor.service @@ -0,0 +1,15 @@ +[Unit] +Description=Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling +Documentation=man:dmeventd(8) man:lvcreate(8) man:lvchange(8) man:vgchange(8) +After=initrd.target initrd-parse-etc.service sysroot.mount +After=dracut-initqueue.service dracut-pre-mount.service dracut-mount.service dracut-pre-pivot.service +Before=initrd-cleanup.service kdump-capture.service shutdown.target local-fs-pre.target +DefaultDependencies=no +Conflicts=shutdown.target
+[Service] +Type=oneshot +Environment=LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES=1 +ExecStart=/usr/sbin/lvm vgchange --monitor y +ExecStop=/usr/sbin/lvm vgchange --monitor n +RemainAfterExit=yes \ No newline at end of file diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh index c319fc2..19c0f46 100755 --- a/dracut-module-setup.sh +++ b/dracut-module-setup.sh @@ -1016,6 +1016,20 @@ remove_cpu_online_rule() { sed -i '/SUBSYSTEM=="cpu"/d' "$file" }
+kdump_install_lvm2_monitor_service() +{
- inst "$moddir/lvm2-monitor.service" "$systemdsystemunitdir/lvm2-monitor.service"
- systemctl -q --root "$initdir" add-wants initrd.target lvm2-monitor.service
- # We should stop lvm2-monitor service after kdump. SIGTERM is ignored
- # by dmeventd when device is monitored. So before stopping dmevend, devices
- # shall be unmonitored. This can save the waiting time between systemd-shutdown
- # Sending SIGTERM and SIGKILL to remaining processes.
- mkdir -p "${initdir}/etc/kdump/post.d"
- echo "systemctl stop lvm2-monitor" > "${initdir}/etc/kdump/post.d/stop-lvm2-monitor.sh"
- chmod +x "${initdir}/etc/kdump/post.d/stop-lvm2-monitor.sh"
+}
install() { local arch
@@ -1058,6 +1072,8 @@ install() { inst "$moddir/kdump.sh" "/usr/bin/kdump.sh" inst "$moddir/kdump-capture.service" "$systemdsystemunitdir/kdump-capture.service" systemctl -q --root "$initdir" add-wants initrd.target kdump-capture.service
- is_lvm2_thinp_dump_target && kdump_install_lvm2_monitor_service
# Replace existing emergency service and emergency target cp "$moddir/kdump-emergency.service" "$initdir/$systemdsystemunitdir/emergency.service" cp "$moddir/kdump-emergency.target" "$initdir/$systemdsystemunitdir/emergency.target"
diff --git a/kexec-tools.spec b/kexec-tools.spec index 6673000..5f4344d 100644 --- a/kexec-tools.spec +++ b/kexec-tools.spec @@ -60,6 +60,7 @@ Source109: dracut-early-kdump-module-setup.sh
Source200: dracut-fadump-init-fadump.sh Source201: dracut-fadump-module-setup.sh +Source202: dracut-lvm2-monitor.service
%ifarch ppc64 ppc64le Requires(post): servicelog @@ -240,6 +241,7 @@ cp %{SOURCE102} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpb cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}} cp %{SOURCE106} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE106}} cp %{SOURCE107} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE107}} +cp %{SOURCE202} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE202}} chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}} chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}} mkdir -p -m755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99earlykdump -- 2.33.1
On 05/27/22 at 11:39am, Vivek Goyal wrote:
On Fri, May 27, 2022 at 04:59:38PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 16:50, Vivek Goyal wrote:
On Fri, May 27, 2022 at 04:42:25PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 14:20, Vivek Goyal wrote:
On Fri, May 27, 2022 at 02:45:14PM +0800, Tao Liu wrote:
If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed for monitor and autoextend the size of thin pool. Otherwise the vmcore dumped to a no-enough-space target will be incomplete and unable for further analysis.
In this patch, lvm2-monitor.service will be started before kdump-capture .service for 2nd kernel, then be stopped in kdump post.d phase. So the thin pool monitoring and size-autoextend can be ensured during kdump.
Signed-off-by: Tao Liu ltao@redhat.com
dracut-lvm2-monitor.service | 15 +++++++++++++++ dracut-module-setup.sh | 16 ++++++++++++++++ kexec-tools.spec | 2 ++ 3 files changed, 33 insertions(+) create mode 100644 dracut-lvm2-monitor.service
diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service
This seems to be a copy of /lib/systemd/system/lvm2-monitor.service. Wondering if we can dirctly include that file in initramfs when generating image. But I am fuzzy on details of dracut implementation. It has been too long since I played with it. So Bao and kdump team will be best to comment on this.
This is quite interesting - monitoring should in fact never be started wthin 'ramdisk' so I'm acutlly wondering what is this service file doing there.
Design was to start 'monitoring' of devices just after switch to 'rootfs' - since running 'dmeventd' out of ramdisk does not make any sense at all.
Hi Zdenek,
In case of kdump, we save core dump from initramfs context and reboot back into primary kernel. And that's why this need of dm monitoring ( and thin pool auto extension) working from inside the initramfs context.
So IMHO this although does not look like the best approach. AFAIK the lvm.conf within ramdisk is also a modified version.
It looks like there should be a better alternative - like 'after' activation checking there is 'enough' room in thin-pool for use with thinLV - should be 'computable' and in case the size is not good enough - try to extend thin-pool prior use/mount of thinLV (size of space in thin-pool %DATA & %METATDATA and occupancy of %DATA thinLV could be obtained by 'lvs' tool)
One potential problem here is that we don't know what's the size of vmcore in advance. It gets filtered and saved and we dont know in advance, how many kernel pages will be there.
Is that still right, Bao?
Yes, it's still right.
We have features in makedumpfile to estimate the expected disk space for vmcore dumping. E.g. if system RAM is 2TB, running makedumpfile tells you that 256GB of disk space is needed for storing the vmcore, after filtering out zero pages, unused pages, etc. However, that estimation is done in the 1st kernel, and the running kernel can dynamically allocate pages. So the estimation can only give very rough data, at the order-of-magnitude level. E.g. if you have 1TB of memory while the disk space is only 200GB, that's obviously not enough.
Technically speaking, one could first run makedumpfile to just determine what will be size of vmcore and then actually save vmcore in second round. But that will double the filtering time.
Yeah. Besides, the memory content of the system changes dynamically all the time. E.g. whether your Oracle DB is running or not, the user space data is definitely not the same. And doing the work twice involves manual effort; automation is still expected if it can be made to work.
Running very resource hungry dmeventd (looks all the process memory in RAM
- could be many many MB) in kdump environment is not IMHO worst option
here - I'd prefer to avoid execution of dmeventd in this ramfs image.
I understand. We also want to keep the size of kdump initramfs to the minimum.
Right.
I talked to Tao; he tested on a KVM guest with 500M of memory and 100M of disk space to trigger the insufficient-disk-space case. Tao said dmeventd consumes about 40MB when executing. I am not familiar with dmeventd, but if running it costs a roughly constant 40M of memory, no matter how much disk space needs to be extended at one time, we can adjust our kdump script to increase the default crashkernel= value when lvm2 thinp is detected. That looks acceptable on the kdump side.
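If bumping the reservation turns out to be needed, one illustrative way to do it would be via the kernel command line; the numbers below are made up, not a recommendation, and assume grubby is used:

    # Reserve more crash memory when the dump target is an lvm2 thinp volume.
    grubby --update-kernel=ALL --args="crashkernel=1G-4G:256M,4G-64G:320M,64G-:576M"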
On 05/27/22 at 01:05pm, Vivek Goyal wrote:
On Fri, May 27, 2022 at 06:05:27PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 17:39, Vivek Goyal wrote:
On Fri, May 27, 2022 at 04:59:38PM +0200, Zdenek Kabelac wrote:
On 27. 05. 22 at 16:50, Vivek Goyal wrote:
On Fri, May 27, 2022 at 04:42:25PM +0200, Zdenek Kabelac wrote:
Dne 27. 05. 22 v 14:20 Vivek Goyal napsal(a): > On Fri, May 27, 2022 at 02:45:14PM +0800, Tao Liu wrote: > > If lvm2 thinp is enabled in kdump, lvm2-monitor.service is needed for > > monitor and autoextend the size of thin pool. Otherwise the vmcore > > dumped to a no-enough-space target will be incomplete and unable for > > further analysis. > > > > In this patch, lvm2-monitor.service will be started before kdump-capture > > .service for 2nd kernel, then be stopped in kdump post.d phase. So > > the thin pool monitoring and size-autoextend can be ensured during kdump. > > > > Signed-off-by: Tao Liu ltao@redhat.com > > --- > > dracut-lvm2-monitor.service | 15 +++++++++++++++ > > dracut-module-setup.sh | 16 ++++++++++++++++ > > kexec-tools.spec | 2 ++ > > 3 files changed, 33 insertions(+) > > create mode 100644 dracut-lvm2-monitor.service > > > > diff --git a/dracut-lvm2-monitor.service b/dracut-lvm2-monitor.service > This seems to be a copy of /lib/systemd/system/lvm2-monitor.service. > Wondering if we can dirctly include that file in initramfs when generating > image. But I am fuzzy on details of dracut implementation. It has been > too long since I played with it. So Bao and kdump team will be best > to comment on this. > This is quite interesting - monitoring should in fact never be started wthin 'ramdisk' so I'm acutlly wondering what is this service file doing there.
Design was to start 'monitoring' of devices just after switch to 'rootfs' - since running 'dmeventd' out of ramdisk does not make any sense at all.
Hi Zdenek,
In case of kdump, we save core dump from initramfs context and reboot back into primary kernel. And that's why this need of dm monitoring ( and thin pool auto extension) working from inside the initramfs context.
So IMHO this although does not look like the best approach. AFAIK the lvm.conf within ramdisk is also a modified version.
It looks like there should be a better alternative - like 'after' activation checking there is 'enough' room in thin-pool for use with thinLV - should be 'computable' and in case the size is not good enough - try to extend thin-pool prior use/mount of thinLV (size of space in thin-pool %DATA & %METATDATA and occupancy of %DATA thinLV could be obtained by 'lvs' tool)
One potential problem here is that we don't know what's the size of vmcore in advance. It gets filtered and saved and we dont know in advance, how many kernel pages will be there.
Is that still right, Bao?
Technically speaking, one could first run makedumpfile to just determine what will be size of vmcore and then actually save vmcore in second round. But that will double the filtering time.
You could likely 'stream/buffer' these kdump data in form of i.e. '4MiB ~ 128MiB' chunks (or any other suitable size which will be 'quick enough) and before each new write of such chunk just compare there is enough free space in thin-pool with lvs - should be still better then running 'dmeventd' in the background -
and gives you also the best control over the deadlock in case you run completely out-of-space (i.e. leaving enough room in thin-pool and avoiding full dump so user could still 'boot')
So if we fill up thin pool completely, it might fail to activate over reboot? I do remember there were issues w.r.t filling up thin pool compltely and it was not desired.
So above does not involve growing thin pool at all? Above just says, query currently available space in thin pool and when it is about to be full, stop writing to it? This is suboptimal if there is free space in underlying volume group.
Ok, this is going to be ugly given how kdump works right now. We have this config option core_collector where user can specify how vmcore should be saved (dd, cp, makedumpfile, .....)
None of these tools know about streaming and thin pool extension etc.
Totally agree. makedumpfile, cp, dd - they are all unaware of streaming and thinp, unless we adapt them to make them aware of it.
I guess one could think of making makedumpfile aware of thin pools. But given there can be so many dump targets, it would be really ugly from a design point of view to embed knowledge of a target in a generic filtering tool.
Alternatively we could probably write a tool of our own and pipe the makedumpfile output to it. But then the user would have to specify it in core_collector for thin pool targets only.
None of the solutions look clean or fit well into the current design.
Yeah, the only thing we need to make clear is how much extra memory is needed for dmeventd. Is the memory consumption of dmeventd related to the amount of disk space being extended, or is it independent of it?
Otherwise, the implementation of this series looks reasonable.
Hi Vivek,
On Fri, May 27, 2022 at 8:56 PM Vivek Goyal vgoyal@redhat.com wrote:
On Fri, May 27, 2022 at 02:45:12PM +0800, Tao Liu wrote:
Thin provision is a mechanism that you can allocate a lvm volume which has a large virtual size for file systems but actually in a small physical size. The physical size can be autoextended in use if thin pool reached a threshold specified in /etc/lvm/lvm.conf.
So what's the core requirement? Is it that the root filesystem of the machine is on a thin device, and after a crash we are trying to save the dump to the rootfs?
Typically the rootfs might have been set up for auto-extension. But the problem here is that you are dumping the core from the initramfs context, and auto-extension might not work out of the box? So you are basically trying to pack the dmeventd systemd unit file and /etc/lvm/lvm.conf into the initramfs to make auto-extension work.
This patch set enables kdump to an lvm2 thinp volume. If the target is a thinp volume, then thin pool autoextension has to be considered. A thinp volume may, for example, report 1G of space for the file system within it, while only 100M of physical space has been allocated from the thin pool. As data is written, more physical space is needed by the thinp volume, so the thin pool must be able to autoextend to provide it. The autoextend threshold is defined in lvm.conf, and the extension is performed by the monitoring service defined in the dmeventd systemd unit file.
So lvm.conf and the systemd unit file should be packed into the initramfs image for lvm thinp to be usable during kdump. The requirement above is generic; it doesn't matter whether the thinp volume we are dumping to is the rootfs or not.
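For reference, the lvm.conf policy described above looks roughly like this (the values below are illustrative examples, not something the patch set sets):

  # /etc/lvm/lvm.conf (excerpt)
  activation {
      # Start auto-extending the thin pool once it is 70% full ...
      thin_pool_autoextend_threshold = 70
      # ... growing it by 20% of its current size each time.
      thin_pool_autoextend_percent = 20
  }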
Thanks, Tao Liu
Hi Tao,
On Mon, 30 May 2022 at 11:31, Tao Liu ltao@redhat.com wrote:
I think Vivek means that we should follow how dracut handles lvm in the initramfs. Vivek, please correct me if this is not what you want.
This patchset enables lvm thin in the initramfs. I think the right place to handle this is the 90lvm dracut module, or to add another 91lvmthin dracut module. In the dracut initramfs code, lvm is set up by parsing the kernel cmdline instead of adding the lvm config files; the lvm thin support could be done in a similar way.
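To make the suggestion concrete, a dedicated dracut module could look roughly like the sketch below. This is only an assumption of what such a module might contain - the 91lvmthin name, the lvm-thinp-kdump.conf file and the tool list are all hypothetical:

  # Hypothetical /usr/lib/dracut/modules.d/91lvmthin/module-setup.sh

  check() {
      require_binaries lvm || return 1
      # Only pulled in when explicitly requested (e.g. by kdump).
      return 255
  }

  depends() {
      echo lvm
  }

  install() {
      # Userspace tools the thin pool may need inside the initramfs.
      inst_multiple -o thin_check thin_repair thin_dump
      # A kdump-specific lvm.conf carrying the autoextend policy.
      inst_simple "$moddir/lvm-thinp-kdump.conf" /etc/lvm/lvm.conf
  }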
On Mon, May 30, 2022 at 12:57 PM Dave Young dyoung@redhat.com wrote:
Hi Dave,
Thanks for the info. I think solving the thinp support in dracut is a good approach. I will take a look to see if it is doable.
Thanks, Tao Liu
Dne 30. 05. 22 v 4:34 Baoquan He napsal(a):
On 05/27/22 at 11:39am, Vivek Goyal wrote:
One potential problem here is that we don't know the size of the vmcore in advance. It gets filtered and saved, and we don't know beforehand how many kernel pages there will be.
Is that still right, Bao?
Yes, it's still right.
We have features in makedumpfile to estimate the expected disk space for vmcore dumping. E.g. with 2TB of system RAM, running makedumpfile may tell us that 256GB of disk space is needed for storing the vmcore, after filtering out zero pages, unused pages, etc. However, that estimation is done in the 1st kernel, and the running kernel can dynamically allocate pages, so the estimation can only give very rough data, at the order-of-magnitude level. E.g. if you have 1TB of memory while the disk space is only 200GB, that's obviously not enough.
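As far as I recall the relevant options (worth verifying, so treat the exact invocations as an assumption), the estimate described here is obtained in the 1st kernel along these lines:

  # Page-type statistics of the running kernel, from which a rough
  # vmcore size can be derived:
  makedumpfile --mem-usage /proc/kcore
  # Newer kexec-tools wrap this up as:
  kdumpctl estimate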
Technically speaking, one could first run makedumpfile just to determine what the size of the vmcore will be, and then actually save the vmcore in a second round. But that would double the filtering time.
Yeah. Besides, the memory content of the system is changing dynamically all the time. E.g. whether your Oracle DB is running or not, the user space data is definitely not the same. And doing the work twice would involve manual effort; automation is still expected if it can be achieved.
Running the very resource-hungry dmeventd (it locks all the process memory in RAM - could be many, many MB) in the kdump environment is IMHO not the worst option here - but I'd still prefer to avoid execution of dmeventd in this ramfs image.
I understand. We also want to keep the size of kdump initramfs to the minimum.
Right.
I talked to Tao; he tested on a kvm guest with 500M of memory and 100M of disk space to trigger the insufficient-disk-space case. Tao said dmeventd consumes about 40MB when executing. I am not familiar with dmeventd; if running it costs a roughly constant 40M of memory no matter how much disk space needs to be extended at one time, we can adjust our kdump script to increase the default crashkernel= value when lvm2 thinp is detected. That looks acceptable on the kdump side.
Dmeventd runs in 'mlockall()' mode - so the whole executable with all libraries and all the memory allocations are pinned in RAM (so IMHO 40MiB is a very low number).
The reason for this is that in the normal 'running' mode lvm2 protects dmeventd from being blocked when it runs from the rootfs and has to suspend the DM device the rootfs sits on - by having the whole binary mlocked in RAM it cannot 'deadlock' waiting on itself when it suspends a given DM device.
For the kdump execution environment in the ramdisk this condition is not really relevant (but dmeventd was not designed to be executed in such an environment). However, as mentioned in my previous post, it's actually more useful to run 'lvm lvextend --use-policies' with the given thin-pool name in a plain shell loop running in parallel - it basically gives the same result with far less memory 'obstruction' and with far better control as well (i.e. leaving the user a defined minimum so the system can actually boot afterwards - dumping only when there really is some space...).
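A minimal sketch of that loop, as it might run in the kdump initramfs next to the core collector; the pool name, the 5-second interval and the backgrounding are illustrative choices, not part of the posted patches:

  # Keep growing the thin pool per the lvm.conf policies while the dump runs.
  (
      while :; do
          lvm lvextend --use-policies myvg/mythinpool >/dev/null 2>&1
          sleep 5
      done
  ) &
  _watcher=$!

  # ... run the configured core_collector here (makedumpfile/cp/dd) ...

  kill "$_watcher"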
Regards
Zdenek
On Mon, May 30, 2022 at 11:28:42AM +0200, Zdenek Kabelac wrote:
Hi Zdenek,
Is running "lvm extend --use-policies"racy as well. I mean, it is possible that dump process fills up the pool before lvm extend gets a chance to extend it? Or it is fine even if thin pool gets full. Once it is extended again, it will unblock dumping process automatically?
But this still does not protect against filling up the data LV completely and making the rootfs unusable/unbootable.
Bao mentioned that makedumpfile has the capability to estimate the size of the core dump. Maybe we should run that in the second kernel instead, extend the thin pool accordingly and then initiate the dump. For core collectors like dd/cp, we know the size of /proc/vmcore and can use that instead to make sure there is enough free space in the thin pool, otherwise abort.
So yes, I think estimating the space required for the dump and then extending the thin pool accordingly is probably the best way to go about it, given the options.
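For the simple dd/cp case, where the required space is just the size of /proc/vmcore, the estimate-then-extend flow could look roughly like this (names are placeholders, a makedumpfile-based estimate would replace the stat call, and filesystem overhead is ignored for brevity):

  # Space needed by the dump, in MiB (dd/cp copy the full /proc/vmcore).
  need_mb=$(( $(stat -c %s /proc/vmcore) / 1024 / 1024 ))

  # Space still unused inside the pool, in MiB.
  pool_mb=$(lvs --noheadings --nosuffix --units m -o lv_size myvg/mythinpool | tr -d ' ')
  used_pct=$(lvs --noheadings --nosuffix -o data_percent myvg/mythinpool | tr -d ' ')
  free_mb=$(( ${pool_mb%%.*} * (100 - ${used_pct%%.*}) / 100 ))

  # Grow the pool up front if the estimate does not fit, otherwise abort.
  if [ "$free_mb" -lt "$need_mb" ]; then
      lvm lvextend -L "+$(( need_mb - free_mb ))m" myvg/mythinpool || exit 1
  fi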
Thanks Vivek
Dne 01. 06. 22 v 14:51 Vivek Goyal napsal(a):
Hi Zdenek,
Is running "lvm extend --use-policies"racy as well. I mean, it is possible that dump process fills up the pool before lvm extend gets a chance to extend it? Or it is fine even if thin pool gets full. Once it is extended again, it will unblock dumping process automatically?
This is the *very* same command dmeventd would run internally (with the luxury of being locked in RAM).
By default there is a 60-second 'delay' before an overfilled thin-pool starts to 'reject' IO operations, so it should not present any obstacle. Yeah - the extension might happen slightly delayed (depending on the sleep value in the shell loop).
But this still does not protect against filling up the data LV completely and making the rootfs unusable/unbootable.
It actually puts you in a position where you can better 'estimate' whether you actually want to kdump or not - by calculating the kdump space and the free space and ensuring some guaranteed free space will be left (since you are the only user of the thin-pool at this moment).
Bao mentioned that makedumpfile has the capability to estimate the size of the core dump. Maybe we should run that in the second kernel instead, extend the thin pool accordingly and then initiate the dump.
Yep - if you know how much data you want to store, and ensure there is enough free space in the thin-pool to store it, that's the best case.
Regards
Zdenek