Thin provisioning is a mechanism that lets you allocate an lvm2 volume with a large virtual size for file systems while it actually occupies only a small physical size. The physical size can be autoextended while in use, once the thin pool reaches a threshold specified in /etc/lvm/lvm.conf.
Two things need to be handled to enable lvm2 thinp for kdump:
1) Check whether the dump target device or directory is a thinp device.
2) Monitor the thin pool and autoextend its size when it reaches the threshold during kdump.
According to my testing, the memory-consuming step for lvm2 thinp is the thin pool autoextend phase. For fedora and rhel9, the default crashkernel value is enough. But for rhel8, the default crashkernel value 1G-4G:160M is not enough, so it needs special handling.
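As a rough illustration of the threshold behaviour described above, the check below sketches when a pool would be autoextended. The function name and the integer-percent inputs are hypothetical; lvm2 makes this decision internally, based on thin_pool_autoextend_threshold in lvm.conf, where a value of 100 disables autoextension. The comparison is assumed here to be strictly "greater than".

```shell
#!/bin/sh
# Sketch only, not lvm2 code: decide whether a thin pool would be
# autoextended under the lvm.conf policy.
# $1: current data usage of the pool, integer percent (hypothetical input)
# $2: value of thin_pool_autoextend_threshold
thinp_should_autoextend()
{
    _used=$1
    _threshold=$2

    # A threshold of 100 disables automatic extension.
    [ "$_threshold" -ge 100 ] && return 1

    # Assumption: extension triggers once usage exceeds the threshold.
    [ "$_used" -gt "$_threshold" ]
}

if thinp_should_autoextend 80 70; then
    echo "pool would be extended"
fi
```

With a threshold of 70, a pool at 80% usage qualifies for extension, while a pool at 60% does not.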
v1 -> v2:
1) Changed how the lvs cmd is used when checking whether the target is an lvm2 thinp device.
2) Dropped mounting the lvm2 thinp target with the sync flag during kdump; use "sync -f vmcore" to force the data sync instead, and handle the error if it fails.
v2 -> v3:
1) Moved the "sync -f vmcore" patch out of the patch set, since it addresses an issue that is not specific to lvm2 thinp support for kdump.
v3 -> v4:
1) Removed lvm2-monitor.service; the monitor service is now implemented as a loop function within a shell script.
2) Added lvm2 thinp support for dump_raw, since it has a similar issue to dump_fs.
3) Dave suggested that I implement the lvm2 thin support in dracut modules instead of kexec-tools. If you are OK with the loop-function-shell-script approach, I will try to migrate it to dracut.
Tao Liu (4):
  Add lvm2 thin provision dump target checker
  Add monitor_lvm2_thinp_autoextend script for kdump when lvm2 thinp enabled
  Add lvm2 thinp support for dump_raw
  lvm.conf should be check modified if lvm2 thinp enabled
 dracut-kdump.sh                      | 24 +++++++++++++++-----
 dracut-module-setup.sh               | 11 +++++++++-
 dracut-monitor_lvm2_thinp_autoextend | 33 ++++++++++++++++++++++++++++
 kdump-lib-initramfs.sh               | 10 +++++++++
 kdump-lib.sh                         | 10 +++++++++
 kdumpctl                             |  1 +
 kexec-tools.spec                     |  2 ++
 7 files changed, 85 insertions(+), 6 deletions(-)
 create mode 100644 dracut-monitor_lvm2_thinp_autoextend
We need to check whether a directory or a device is an lvm2 thinp target.
First, we use get_block_dump_target() to convert the dump path into a block device; then we use the lvs cmd to check whether that device is an lvm2 thinp target.
is_lvm2_thinp_device is located in kdump-lib-initramfs.sh, because it will be used in the 2nd kernel. is_lvm2_thinp_dump_target is located in kdump-lib.sh, because it is only used in the 1st kernel and it depends on helpers which exist in kdump-lib.sh.
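The two-step check can be tried without LVM by stubbing its inputs. In the sketch below, the lvm() and get_block_dump_target() functions are hypothetical stand-ins (the real ones query lvm2 and kdump.conf), and DUMP_TARGET_DEV and the device paths are made up for illustration. Only the control flow mirrors the helpers this patch adds.

```shell
#!/bin/sh
# Hypothetical stub for the lvm binary: reports a thin LV only for
# /dev/mapper/vg00-thinlv and prints nothing for any other device.
lvm()
{
    for _last in "$@"; do :; done    # _last ends up as the final argument
    [ "$_last" = "/dev/mapper/vg00-thinlv" ] && echo "  vg00 thinlv"
    return 0
}

# Hypothetical stub: the real helper resolves the configured dump path
# to a block device.
get_block_dump_target()
{
    echo "$DUMP_TARGET_DEV"
}

# Step 2: a device is a thin target if the filtered lvs query prints it.
is_lvm2_thinp_device()
{
    _device_path=$1
    _lvm2_thin_device=$(lvm lvs -S 'lv_layout=sparse && lv_layout=thin' \
        --nosuffix --noheadings -o vg_name,lv_name "$_device_path" 2>/dev/null)

    [ -n "$_lvm2_thin_device" ]
}

# Step 1: resolve the dump target, then run the device check on it.
is_lvm2_thinp_dump_target()
{
    _target=$(get_block_dump_target)
    [ -n "$_target" ] && is_lvm2_thinp_device "$_target"
}

DUMP_TARGET_DEV=/dev/mapper/vg00-thinlv
is_lvm2_thinp_dump_target && echo "thin target"
DUMP_TARGET_DEV=/dev/sda1
is_lvm2_thinp_dump_target || echo "not a thin target"
```

An empty target (no block device behind the dump path) is treated the same as a non-thin device.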
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 kdump-lib-initramfs.sh |  9 +++++++++
 kdump-lib.sh           | 10 ++++++++++
 2 files changed, 19 insertions(+)
diff --git a/kdump-lib-initramfs.sh b/kdump-lib-initramfs.sh
index 84e6bf7..bcf9927 100755
--- a/kdump-lib-initramfs.sh
+++ b/kdump-lib-initramfs.sh
@@ -131,3 +131,12 @@ is_fs_dump_target()
 {
     [ -n "$(kdump_get_conf_val "ext[234]|xfs|btrfs|minix")" ]
 }
+
+is_lvm2_thinp_device()
+{
+    _device_path=$1
+    _lvm2_thin_device=$(lvm lvs -S 'lv_layout=sparse && lv_layout=thin' \
+        --nosuffix --noheadings -o vg_name,lv_name "$_device_path" 2>/dev/null)
+
+    [ -n "$_lvm2_thin_device" ] && return $?
+}
\ No newline at end of file
diff --git a/kdump-lib.sh b/kdump-lib.sh
index b137c89..534b8b6 100755
--- a/kdump-lib.sh
+++ b/kdump-lib.sh
@@ -117,6 +117,16 @@ is_dump_to_rootfs()
     [[ $(kdump_get_conf_val 'failure_action|default') == dump_to_rootfs ]]
 }
 
+is_lvm2_thinp_dump_target()
+{
+    _target=$(get_block_dump_target)
+    if [ -n "$_target" ]; then
+        is_lvm2_thinp_device "$_target"
+    else
+        return 1
+    fi
+}
+
 get_failure_action_target()
 {
     local _target
If lvm2 thinp is enabled in kdump, a thin pool monitor service is needed to monitor the thin pool and autoextend its size when it reaches the threshold.
Usually the monitoring is done by enabling lvm2-monitor.service in systemd, which starts the monitoring with dmeventd. When the threshold is reached, dmeventd performs the autoextension of the thin pool.
However, dmeventd is not designed to run in a ramdisk environment such as kdump's: it calls mlockall() to pin the whole executable, all of its libraries and its memory allocations in RAM, which consumes a lot of memory and makes it unsuitable for kdump. For example, on rhel8 with crashkernel value 1G-4G:160M, dmeventd gets OOM-killed when autoextension starts.
In this patch, we use a shell script to achieve the same. It runs in the background, in parallel with the vmcore collector and sync, and calls lvextend in a loop. Each lvextend call autoextends the thin pool if the threshold has been reached; if not, lvextend just exits. After kdump finishes, the script is terminated by a TERM signal.
Compared with the lvm2-monitor.service/dmeventd version, this version extends the thin pool at a fixed time interval (5s for now). kdump may take longer, but there is no risk of running out of memory because of mlockall().
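The polling idea can be sketched in a few lines that run without LVM. This is a simplified illustration, not the script from this patch: extend_pool() is a hypothetical stand-in for the lvextend call, and the loop tracks the collector by PID with "kill -0" rather than by name with pidof. The fractional sleep relies on a sleep implementation that accepts it, such as the one in GNU coreutils.

```shell
#!/bin/sh
# Sketch of the fixed-interval polling approach; nothing here touches LVM.

EXTEND_CALLS=0

# Hypothetical stand-in for:
#   lvm lvextend --use-policies --config "activation {monitoring=0}" "$THIN_POOL"
extend_pool()
{
    EXTEND_CALLS=$((EXTEND_CALLS + 1))
}

# $1: PID of the process writing the dump
# $2: polling interval in seconds
monitor_until_exit()
{
    # kill -0 sends no signal; it only tests that the process still exists.
    while kill -0 "$1" 2>/dev/null; do
        extend_pool
        sleep "$2"
    done
}

sleep 1 &                      # stands in for the vmcore collector
monitor_until_exit $! 0.2
echo "extend_pool was called $EXTEND_CALLS times"
```

A shorter interval reacts faster when the pool fills up, at the cost of more lvextend invocations; the patch uses 5s.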
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 dracut-kdump.sh                      |  9 ++++++++
 dracut-module-setup.sh               |  5 +++++
 dracut-monitor_lvm2_thinp_autoextend | 33 ++++++++++++++++++++++++++++
 kexec-tools.spec                     |  2 ++
 4 files changed, 49 insertions(+)
 create mode 100644 dracut-monitor_lvm2_thinp_autoextend
diff --git a/dracut-kdump.sh b/dracut-kdump.sh
index 4852c01..7ec7ebd 100755
--- a/dracut-kdump.sh
+++ b/dracut-kdump.sh
@@ -164,6 +164,12 @@ dump_fs()
 
     mkdir -p "$_dump_fs_path" || return 1
 
+    _target=$(get_target_from_path "$_dump_fs_path")
+    if is_lvm2_thinp_device "$_target"; then
+        /kdumpscripts/monitor_lvm2_thinp_autoextend "$_target" "$CORE_COLLECTOR" "${DMESG_COLLECTOR}" "cp" &
+        _mon_pid=$!
+    fi
+
     save_vmcore_dmesg_fs ${DMESG_COLLECTOR} "$_dump_fs_path"
     save_opalcore_fs "$_dump_fs_path"
 
@@ -178,12 +184,15 @@ dump_fs()
         _sync_exitcode=$?
         if [ $_sync_exitcode -eq 0 ]; then
             mv "$_dump_fs_path/vmcore-incomplete" "$_dump_fs_path/vmcore"
+            [ -n "$_mon_pid" ] && kill $_mon_pid
             dinfo "saving vmcore complete"
         else
+            [ -n "$_mon_pid" ] && kill $_mon_pid
             derror "sync vmcore failed, exitcode:$_sync_exitcode"
             return 1
         fi
     else
+        [ -n "$_mon_pid" ] && kill $_mon_pid
         derror "saving vmcore failed, exitcode:$_dump_exitcode"
         return 1
     fi
diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh
index c319fc2..db3d91b 100755
--- a/dracut-module-setup.sh
+++ b/dracut-module-setup.sh
@@ -1038,7 +1038,9 @@ install() {
     fi
     dracut_install -o /etc/adjtime /etc/localtime
     inst "$moddir/monitor_dd_progress" "/kdumpscripts/monitor_dd_progress"
+    inst "$moddir/monitor_lvm2_thinp_autoextend" "/kdumpscripts/monitor_lvm2_thinp_autoextend"
     chmod +x "${initdir}/kdumpscripts/monitor_dd_progress"
+    chmod +x "${initdir}/kdumpscripts/monitor_lvm2_thinp_autoextend"
     inst "/bin/dd" "/bin/dd"
     inst "/bin/tail" "/bin/tail"
     inst "/bin/date" "/bin/date"
@@ -1053,6 +1055,9 @@ install() {
     inst "/usr/bin/printf" "/sbin/printf"
     inst "/usr/bin/logger" "/sbin/logger"
     inst "/usr/bin/chmod" "/sbin/chmod"
+    inst "/usr/bin/pidof" "/sbin/pidof"
+    inst "/usr/bin/df" "/sbin/df"
+    inst "/usr/bin/kill" "/sbin/kill"
     inst "/lib/kdump/kdump-lib-initramfs.sh" "/lib/kdump-lib-initramfs.sh"
     inst "/lib/kdump/kdump-logger.sh" "/lib/kdump-logger.sh"
     inst "$moddir/kdump.sh" "/usr/bin/kdump.sh"
diff --git a/dracut-monitor_lvm2_thinp_autoextend b/dracut-monitor_lvm2_thinp_autoextend
new file mode 100644
index 0000000..5d5c44f
--- /dev/null
+++ b/dracut-monitor_lvm2_thinp_autoextend
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+THIN_LV=$1
+THIN_POOL=$(lvm lvs -S 'lv_layout=sparse && lv_layout=thin' \
+    --nosuffix --noheadings -o vg_name,pool_lv "$THIN_LV" | \
+    awk '{printf("%s/%s",$1,$2);}')
+shift
+COLLECTORS=()
+for collector in "$@"; do
+    COLLECTORS+=($(echo "$collector" | awk '{print $1}'))
+done
+
+while true
+do
+    # Wait when vmcore dumping or syncfs starts.
+    if [ -n "$(pidof sync ${COLLECTORS[@]})" ]; then
+        break
+    fi
+    sleep 0.5
+done
+
+while true
+do
+    # Quit when vmcore dumping and syncfs finishes.
+    if ! [ -n "$(pidof sync ${COLLECTORS[@]})" ]; then
+        break
+    fi
+
+    # Use 'monitoring=0' to override the value in lvm.conf, in case
+    # dmeventd monitoring been started after the calling.
+    lvm lvextend --use-policies --config "activation {monitoring=0}" "$THIN_POOL"
+    sleep 5
+done
diff --git a/kexec-tools.spec b/kexec-tools.spec
index 6673000..c12f6f6 100644
--- a/kexec-tools.spec
+++ b/kexec-tools.spec
@@ -60,6 +60,7 @@ Source109: dracut-early-kdump-module-setup.sh
 
 Source200: dracut-fadump-init-fadump.sh
 Source201: dracut-fadump-module-setup.sh
+Source202: dracut-monitor_lvm2_thinp_autoextend
 
 %ifarch ppc64 ppc64le
 Requires(post): servicelog
@@ -240,6 +241,7 @@ cp %{SOURCE102} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpb
 cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}}
 cp %{SOURCE106} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE106}}
 cp %{SOURCE107} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE107}}
+cp %{SOURCE202} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE202}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}}
 mkdir -p -m755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99earlykdump
With this patch, the thin pool alternates between modes during the dump: write mode -> (queue IO) mode -> write mode -> (queue IO) mode -> write mode ... Please see the dmesg log as follows:
[    3.878793] kdump[523]: saving vmcore
[    4.452306] device-mapper: thin: 253:2: reached low water mark for data device: sending event.
[    4.484872] device-mapper: thin: 253:2: switching pool to out-of-data-space (queue IO) mode
[    4.444225] kdump.sh[524]: Checking for memory holes : [ 0.0 %] / Checking for memory holes : [100.0 %] | Excluding unnecessary pages : [100.0 %] \
[    4.461538] kdump.sh[527]: WARNING: Sum of all thin volume sizes (300.00 MiB) exceeds the size of thin pools and the size of whole volume group (296.00 MiB).
[    4.491146] kdump.sh[527]: Size of logical volume vg00/thinpool_tdata changed from 12.00 MiB (3 extents) to 20.00 MiB (5 extents).
[    5.106573] device-mapper: thin: 253:2: switching pool to write mode
[    5.115955] device-mapper: thin: 253:2: growing the data device from 192 to 320 blocks
[    5.141481] device-mapper: thin: 253:2: reached low water mark for data device: sending event.
[    4.629948] kdump.sh[527]: Logical volume vg00/thinpool_tdata successfully resized.
[    5.199211] device-mapper: thin: 253:2: switching pool to out-of-data-space (queue IO) mode
[    9.783912] kdump.sh[524]: Copying data : [ 66.1 %] - eta: 1s
[    9.793645] kdump.sh[552]: WARNING: Sum of all thin volume sizes (300.00 MiB) exceeds the size of thin pools and the size of whole volume group (296.00 MiB).
[    9.807207] kdump.sh[552]: Size of logical volume vg00/thinpool_tdata changed from 20.00 MiB (5 extents) to 32.00 MiB (8 extents).
[   10.388059] device-mapper: thin: 253:2: switching pool to write mode
[   10.400524] device-mapper: thin: 253:2: growing the data device from 320 to 512 blocks
[   10.431529] device-mapper: thin: 253:2: reached low water mark for data device: sending event.
[    9.919487] kdump.sh[552]: Logical volume vg00/thinpool_tdata successfully resized.
[   10.494141] device-mapper: thin: 253:2: switching pool to out-of-data-space (queue IO) mode
[   10.067391] kdump.sh[524]: Copying data : [ 96.1 %] / eta: 0s Copying data : [100.0 %] | eta: 0s Copying data : [100.0 %] \ eta: 0s
[   10.087939] kdump.sh[524]: The dumpfile is saved to /kdumproot/mnt/var/crash/127.0.0.1-2022-06-06-14:30:24//vmcore-incomplete.
[   10.097779] kdump.sh[524]: makedumpfile Completed.
[   15.016903] kdump.sh[580]: WARNING: Sum of all thin volume sizes (300.00 MiB) exceeds the size of thin pools and the size of whole volume group (296.00 MiB).
[   15.030921] kdump.sh[580]: Size of logical volume vg00/thinpool_tdata changed from 32.00 MiB (8 extents) to 48.00 MiB (12 extents).
[   15.591895] device-mapper: thin: 253:2: switching pool to write mode
[   15.604532] device-mapper: thin: 253:2: growing the data device from 512 to 768 blocks
[   15.628054] device-mapper: thin: 253:2: reached low water mark for data device: sending event.
[   15.124263] kdump.sh[580]: Logical volume vg00/thinpool_tdata successfully resized.
[   15.221498] kdump[607]: saving vmcore complete
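The extent counts in the log (3 -> 5 -> 8 -> 12) are consistent with each extension growing the pool by 50% and rounding up to a whole extent. The sketch below reproduces that arithmetic. The 50% figure and the round-up rule are inferred from the log, not read from the lvm.conf of the test machine, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: reproduce the extent counts seen in the dmesg log.
# Assumption: each step grows the pool by a fixed percentage and rounds
# the result up to a whole extent.

# $1: current size in extents, $2: growth in percent
next_extents()
{
    echo $(( ($1 * (100 + $2) + 99) / 100 ))
}

# $1: starting extents, $2: growth in percent, $3: number of extensions
growth_sequence()
{
    _cur=$1
    _out=$_cur
    _step=0
    while [ "$_step" -lt "$3" ]; do
        _cur=$(next_extents "$_cur" "$2")
        _out="$_out $_cur"
        _step=$((_step + 1))
    done
    echo "$_out"
}

growth_sequence 3 50 3    # prints: 3 5 8 12
```

Going by the sizes in the log (12.00 MiB for 3 extents), each extent is 4 MiB, so this sequence corresponds to 12, 20, 32 and 48 MiB.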
dump_raw is similar to dump_fs when dumping to an lvm2 thinp device, except that "dd oflag=sync" is used in place of "sync -f vmcore".
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 dracut-kdump.sh        | 15 ++++++++++-----
 dracut-module-setup.sh |  6 +++++-
 2 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/dracut-kdump.sh b/dracut-kdump.sh
index 7ec7ebd..456e77d 100755
--- a/dracut-kdump.sh
+++ b/dracut-kdump.sh
@@ -389,12 +389,17 @@ dump_raw()
         /kdumpscripts/monitor_dd_progress $_src_size_mb &
     fi
 
-    dinfo "saving vmcore"
-    $CORE_COLLECTOR /proc/vmcore | dd of="$1" bs=$DD_BLKSIZE >> /tmp/dd_progress_file 2>&1 || return 1
-    sync
+    if is_lvm2_thinp_device "$1"; then
+        /kdumpscripts/monitor_lvm2_thinp_autoextend "$1" "$CORE_COLLECTOR" "dd" &
+        _mon_pid=$!
+    fi
 
-    dinfo "saving vmcore complete"
-    return 0
+    dinfo "saving vmcore"
+    $CORE_COLLECTOR /proc/vmcore | dd of="$1" bs=$DD_BLKSIZE oflag=sync >> /tmp/dd_progress_file 2>&1
+    ret=$?
+    [ -n "$_mon_pid" ] && kill $_mon_pid
+    [ $ret -eq 0 ] && dinfo "saving vmcore complete"
+    return $ret
 }
 
 # $1: ssh key file
diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh
index db3d91b..1216c4c 100755
--- a/dracut-module-setup.sh
+++ b/dracut-module-setup.sh
@@ -670,7 +670,11 @@ kdump_install_conf() {
         # remove inline comments after the end of a directive.
         case "$_opt" in
             raw)
-                _pdev=$(persistent_policy="by-id" kdump_get_persistent_dev "$_val")
+                if is_lvm2_thinp_device "$_val"; then
+                    _pdev=$(kdump_get_persistent_dev "$_val")
+                else
+                    _pdev=$(persistent_policy="by-id" kdump_get_persistent_dev "$_val")
+                fi
                 sed -i -e "s#^${_opt}[[:space:]]+$_val#$_opt $_pdev#" "${initdir}/tmp/$$-kdump.conf"
                 ;;
             ext[234] | xfs | btrfs | minix)
lvm2 relies on /etc/lvm/lvm.conf to determine its behaviour. Important settings such as thin_pool_autoextend_threshold and thin_pool_autoextend_percent are used during kdump in the 2nd kernel. So if the file is modified, the initramfs should be rebuilt to include the latest version.
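The rebuild condition comes down to "is any tracked file newer than the initramfs image". A minimal sketch of that test follows. needs_rebuild() is a hypothetical helper written for illustration, and the comparison kdumpctl actually performs in check_files_modified() may differ. The -nt operator is not required by POSIX test but is supported by common shells such as bash and dash.

```shell
#!/bin/sh
# Sketch: report whether any tracked file is newer than the initramfs image.
# Usage: needs_rebuild IMAGE FILE...
needs_rebuild()
{
    _image=$1
    shift
    for _file in "$@"; do
        # -nt is true when _file was modified more recently than _image.
        [ "$_file" -nt "$_image" ] && return 0
    done
    return 1
}
```

A caller would pass the image path first, followed by the files to track, and add /etc/lvm/lvm.conf to that list only when the dump target is an lvm2 thinp device.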
Signed-off-by: Tao Liu <ltao@redhat.com>
---
 kdump-lib-initramfs.sh | 1 +
 kdumpctl               | 1 +
 2 files changed, 2 insertions(+)
diff --git a/kdump-lib-initramfs.sh b/kdump-lib-initramfs.sh
index bcf9927..80da64a 100755
--- a/kdump-lib-initramfs.sh
+++ b/kdump-lib-initramfs.sh
@@ -8,6 +8,7 @@ DEFAULT_SSHKEY="/root/.ssh/kdump_id_rsa"
 KDUMP_CONFIG_FILE="/etc/kdump.conf"
 FENCE_KDUMP_CONFIG_FILE="/etc/sysconfig/fence_kdump"
 FENCE_KDUMP_SEND="/usr/libexec/fence_kdump_send"
+LVM_CONF="/etc/lvm/lvm.conf"
 
 # Read kdump config in well formated style
 kdump_read_conf()
diff --git a/kdumpctl b/kdumpctl
index 6188d47..b157eb8 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -383,6 +383,7 @@ check_files_modified()
 
     # HOOKS is mandatory and need to check the modification time
     files="$files $HOOKS"
+    is_lvm2_thinp_dump_target && files="$files $LVM_CONF"
     check_exist "$files" && check_executable "$EXTRA_BINS" || return 2
 
     for file in $files; do