This patch set implements firmware-assisted dump support for the kdump service. Firmware-assisted dump support depends on the existing kdump infrastructure (kdump scripts) present in userland to save the dump to disk. Though the existing kdump script will work seamlessly, it still needs to be modified to make it aware of the presence of the firmware-assisted dump feature during service start and stop. These changes were tested successfully on a Power box running Fedora 19.
---
Hari Bathini (6):
      kdump: Modify status() routine to check for firmware-assisted dump
      kdump: Modify kdump script to start the firmware assisted dump.
      kdump: Modify kdump script to stop firmware assisted dump
      kdump: Take a backup of original default initrd before rebuilding.
      kdump: Rebuild default initrd for firmware assisted dump
      kdump: Check for /proc/vmcore existence before capturing the vmcore.

 dracut-kdump.sh |   3 +
 kdumpctl        | 181 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 169 insertions(+), 15 deletions(-)
This patch enables kdump script to check if firmware-assisted dump is enabled or not by reading value from '/sys/kernel/fadump_enabled'.
Modify status() routine to check if firmware assisted dump is enabled or not by reading value from '/sys/kernel/fadump_enabled' file. If enabled and value from '/sys/kernel/fadump_registered' file is set to '1' then return status=0 else return status=1.
  0 <= Firmware assisted is enabled and running
  1 <= Firmware assisted is enabled but not running

Signed-off-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
---
 kdumpctl | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
diff --git a/kdumpctl b/kdumpctl
index 46ae633..1255089 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -9,6 +9,8 @@ MKDUMPRD="/sbin/mkdumprd -f"
 SAVE_PATH=/var/crash
 SSH_KEY_LOCATION="/root/.ssh/kdump_id_rsa"
 DUMP_TARGET=""
+FADUMP_ENABLED_SYS_NODE="/sys/kernel/fadump_enabled"
+FADUMP_REGISTER_SYS_NODE="/sys/kernel/fadump_registered"

 . /lib/kdump/kdump-lib.sh

@@ -358,8 +360,38 @@ function propagate_ssh_key()
 }

+function is_fadump_capable()
+{
+    # Check if firmware-assisted dump is enabled
+    # if yes, check fadump status Otherwise fallback to kdump check
+    if [ -f $FADUMP_ENABLED_SYS_NODE ] && \
+        [ -f $FADUMP_REGISTER_SYS_NODE ]
+    then
+        rc=`cat $FADUMP_ENABLED_SYS_NODE`
+        [ $rc -eq 1 ] && return 0
+    fi
+    return 1
+}
+
+function is_fadump_registered()
+{
+    # Check if firmware-assisted has been registered.
+    rc=`cat $FADUMP_REGISTER_SYS_NODE`
+    [ $rc -eq 1 ] && return 0
+    return 1
+}
+
 function status()
 {
+    # Check if firmware-assisted dump is enabled
+    # if yes, check fadump status Otherwise fallback to kdump check
+    if is_fadump_capable; then
+        if is_fadump_registered; then
+            return 0
+        fi
+        return 1
+    fi
+
     if [ ! -e /sys/kernel/kexec_crash_loaded ]
     then
         return 2
On Tue, Jan 21, 2014 at 10:47:59PM +0530, Hari Bathini wrote:
> +function is_fadump_capable()
> +{
> +    # Check if firmware-assisted dump is enabled
> +    # if yes, check fadump status Otherwise fallback to kdump check
> +    if [ -f $FADUMP_ENABLED_SYS_NODE ] && \
> +        [ -f $FADUMP_REGISTER_SYS_NODE ]
> +    then
> +        rc=`cat $FADUMP_ENABLED_SYS_NODE`
> +        [ $rc -eq 1 ] && return 0
> +    fi
> +    return 1
> +}
> +
> +function is_fadump_registered()
> +{
> +    # Check if firmware-assisted has been registered.
> +    rc=`cat $FADUMP_REGISTER_SYS_NODE`
> +    [ $rc -eq 1 ] && return 0
> +    return 1
> +}
> +
>  function status()
>  {
> +    # Check if firmware-assisted dump is enabled
> +    # if yes, check fadump status Otherwise fallback to kdump check
> +    if is_fadump_capable; then
> +        if is_fadump_registered; then
> +            return 0
> +        fi
> +        return 1
> +    fi
What's the difference between fadump capable and fadump enabled?
So if a machine is fadump capable, we always expect it to use fadump only?
Thanks Vivek
On 02/05/2014 11:48 PM, Vivek Goyal wrote:
> What's the difference between fadump capable and fadump enabled?
>
> So if a machine is fadump capable, we always expect it to use fadump only?
A fadump-capable machine can use either fadump or kdump. fadump is enabled by setting the "fadump" kernel parameter to "on"; otherwise we fall back to kdump.
Thanks Hari
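Editorial sketch of the point above: fadump is selected by "fadump=on" on the kernel command line, and kdump is used otherwise. The posted patches read '/sys/kernel/fadump_enabled' rather than the command line, so the helper below (dump_mode_from_cmdline, a hypothetical name that does not appear in the patches) only illustrates the selection rule and is not how kdumpctl decides.

```bash
#!/bin/bash
# Hypothetical helper, not part of the posted patches.
# Prints "fadump" if the given kernel command line contains fadump=on,
# otherwise prints "kdump". With no argument it reads /proc/cmdline.
dump_mode_from_cmdline()
{
    local cmdline args arg

    if [ $# -gt 0 ]; then
        cmdline="$1"
    else
        cmdline="$(cat /proc/cmdline 2>/dev/null)"
    fi

    # Split on whitespace without triggering pathname expansion.
    read -r -a args <<< "$cmdline"
    for arg in "${args[@]}"; do
        if [ "$arg" = "fadump=on" ]; then
            echo "fadump"
            return 0
        fi
    done

    echo "kdump"
}
```

For example, dump_mode_from_cmdline "root=/dev/sda1 fadump=on" prints "fadump", while a command line without the parameter prints "kdump".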
During service kdump start, if firmware-assisted dump is not enabled, fall back to starting the existing kexec-based kdump. If firmware-assisted dump is enabled but not running, start it by echoing 1 to the '/sys/kernel/fadump_registered' file.
Signed-off-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
---
 kdumpctl | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/kdumpctl b/kdumpctl
index 1255089..84df073 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -506,6 +506,18 @@ selinux_relabel()
     done
 }

+function start_fadump()
+{
+    echo 1 > $FADUMP_REGISTER_SYS_NODE
+    if is_fadump_registered; then
+        echo "Starting firmware assisted dump: [OK]"
+        return 0
+    else
+        echo "Starting firmware assisted dump: [FAILED]"
+        return 1
+    fi
+}
+
 function start()
 {
     check_config
@@ -547,7 +559,11 @@ function start()
         echo "Starting kdump: [FAILED]"
         return 1
     fi
-    load_kdump
+    if is_fadump_capable; then
+        start_fadump
+    else
+        load_kdump
+    fi
     if [ $? != 0 ]; then
         echo "Starting kdump: [FAILED]"
         return 1
During service kdump stop, if firmware assisted dump is enabled and running, then stop firmware assisted dump by echo'ing 0 to '/sys/kernel/fadump_registered' file.
Signed-off-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
---
 kdumpctl | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/kdumpctl b/kdumpctl
index 84df073..41e4c69 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -572,8 +572,27 @@ function start()
     echo "Starting kdump: [OK]"
 }

+function stop_fadump()
+{
+    is_fadump_registered && echo 0 > $FADUMP_REGISTER_SYS_NODE
+    if is_fadump_registered; then
+        echo "fadump: failed to un-register firmware assisted dump"
+        echo "Stopping kdump: [FAILED]"
+        return 1
+    else
+        echo "fadump: un-register firmware assisted dump"
+        echo "Stopping kdump: [OK]"
+        return 0
+    fi
+}
+
 function stop()
 {
+    if is_fadump_capable; then
+        stop_fadump
+        return $?
+    fi
+
     $KEXEC -p -u 2>/dev/null
     if [ $? == 0 ]; then
         echo "kexec: unloaded kdump kernel"
Take a backup of the original initrd when fadump is used for the first time or when the user has switched from kdump to fadump. This allows us to fall back to the original initrd when the kdump service fails to rebuild the fadump-ready default initrd. Also, if the user switches from fadump to kdump, the original initrd is restored when the kdump script is run for the first time after the switch.
Signed-off-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
Signed-off-by: Hari Bathini hbathini@linux.vnet.ibm.com
---
 kdumpctl | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 71 insertions(+), 9 deletions(-)

diff --git a/kdumpctl b/kdumpctl
index 41e4c69..f4760b5 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -15,6 +15,7 @@ FADUMP_REGISTER_SYS_NODE="/sys/kernel/fadump_registered"
 . /lib/kdump/kdump-lib.sh

 standard_kexec_args="-p"
+declare -i image_time

 if [ -f /etc/sysconfig/kdump ]; then
     . /etc/sysconfig/kdump
@@ -70,6 +71,10 @@ function save_core()

 function rebuild_initrd()
 {
+    if is_fadump_capable; then
+        backup_default_initrd
+    fi
+
     $MKDUMPRD $kdump_initrd $kdump_kver
     if [ $? != 0 ]; then
         echo "mkdumprd: failed to make kdump initrd" >&2
@@ -99,6 +104,65 @@ function check_executable()
     done
 }

+function backup_default_initrd()
+{
+    # Check if backup initrd is already present. If not, then
+    # this is the first time fadump is being used OR user
+    # has switched from kdump to fadump.
+    # Take a backup of the original default initrd before
+    # we rebuild default initrd for fadump support.
+    if [ ! -e $default_initrd_bak ];then
+        echo "Backing up default initrd"
+        cp $default_initrd $default_initrd_bak
+        sync
+    fi
+}
+
+function check_fadump()
+{
+    default_initrd_bak="$default_initrd.default.bak"
+    if is_fadump_capable; then
+        if [ -e $kdump_initrd ];then
+            # This means user has switched from kdump to fadump.
+            # Remove kdump initrd which is no longer needed
+            rm -f $kdump_initrd
+        fi
+    else
+        if [ -e $default_initrd_bak ];then
+            # !fadump and original initrd backup file exists.
+            # This means user has switched from fadump to kdump.
+            # Restore the original default initrd.
+            mv $default_initrd_bak $default_initrd
+            sync
+        fi
+    fi
+}
+
+function find_initrd_image_time()
+{
+    image_time=0
+
+    # Check to see if dependent files have been modified
+    # since last build of the image file
+    if [ -f $kdump_initrd ]; then
+        image_time=`stat -c "%Y" $kdump_initrd 2>/dev/null`
+        return
+    else
+        # if fadump is not used, image_time=0
+        if ! is_fadump_capable; then
+            return
+        fi
+    fi
+
+    # If this is the first time we are using fadump then let image_time
+    # be zero to force rebuild intital initrd. The non-existance of backup
+    # initrd means this is the first time fadump is being used. if exists
+    # then return the image time of default initrd.
+    if [ -e $default_initrd_bak ]; then
+        image_time=`stat -c "%Y" $default_initrd 2>/dev/null`
+    fi
+}
+
 function check_config()
 {
     local nr
@@ -146,7 +210,9 @@ function check_rebuild()
     fi

     kdump_kernel="${KDUMP_BOOTDIR}/${KDUMP_IMG}-${kdump_kver}${KDUMP_IMG_EXT}"
+    default_initrd="${KDUMP_BOOTDIR}/initramfs-${kdump_kver}.img"
     kdump_initrd="${KDUMP_BOOTDIR}/initramfs-${kdump_kver}kdump.img"
+    check_fadump

     _force_rebuild=`grep ^force_rebuild $KDUMP_CONFIG_FILE 2>/dev/null`
     if [ $? -eq 0 ]; then
@@ -157,17 +223,13 @@ function check_rebuild()
         fi
     fi

-    #will rebuild every time if extra_modules are specified
+    # Will rebuild every time if extra_modules are specified
     extra_modules=`grep ^extra_modules $KDUMP_CONFIG_FILE`
     [ -n "$extra_modules" ] && force_rebuild="1"

-    #check to see if dependent files has been modified
-    #since last build of the image file
-    if [ -f $kdump_initrd ]; then
-        image_time=`stat -c "%Y" $kdump_initrd 2>/dev/null`
-    else
-        image_time=0
-    fi
+    # Find initrd image time based on whether dependent files have been
+    # modified since last build of the image file
+    find_initrd_image_time

     EXTRA_BINS=`grep ^kdump_post $KDUMP_CONFIG_FILE | cut -d\  -f2`
     CHECK_FILES=`grep ^kdump_pre $KDUMP_CONFIG_FILE | cut -d\  -f2`
@@ -191,7 +253,7 @@ function check_rebuild()
     elif [ "$force_rebuild" != "0" ]; then
         echo -n "Force rebuild $kdump_initrd"; echo
     elif [ -n "$modified_files" ]; then
-        echo "Detected change(s) the following file(s):"
+        echo "Detected change(s) in the following file(s):"
         echo -n " "; echo "$modified_files" | sed 's/\s/\n /g'
     else
         return 0
On Tue, Jan 21, 2014 at 10:48:20PM +0530, Hari Bathini wrote:
[..]
> +function backup_default_initrd()
> +{
> +    # Check if backup initrd is already present. If not, then
> +    # this is the first time fadump is being used OR user
> +    # has switched from kdump to fadump.
> +    # Take a backup of the original default initrd before
> +    # we rebuild default initrd for fadump support.
> +    if [ ! -e $default_initrd_bak ];then
> +        echo "Backing up default initrd"
> +        cp $default_initrd $default_initrd_bak
> +        sync
> +    fi
> +}
> +
> +function check_fadump()
> +{
I think this function should be named better. Say handle_dump_mode_switch().
In general there are too many checks for what state we are in, whether we have transitioned modes (from fadump to kdump or vice versa).
I think we should write a top level function where we should do all this processing, instead of it being sprinkled all over.
That is, determine in which mode the kdump service is supposed to start, kdump or fadump, and store that state in a variable, say "dump_mode". Then use that variable in the rest of the places to take kdump- or fadump-specific actions.
So your top level function could be, say, determine_dump_mode(). And within that function you could also handle the mode transition and call handle_dump_mode_switch().
Once this is taken care of, reading the rest of the code will become a little easier.
Thanks Vivek
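Editorial sketch of the structure suggested in the review above: decide the mode once, store it in dump_mode, and branch on that variable everywhere else. The names determine_dump_mode and dump_mode come from the review; start_dump_service and the overridable path variable are assumptions added only so the logic can be run off-target. The mode-switch handling (handle_dump_mode_switch) is left out.

```bash
#!/bin/bash
# Sketch only, not the posted implementation.
# The sysfs path can be overridden from the environment for testing.
FADUMP_ENABLED_SYS_NODE="${FADUMP_ENABLED_SYS_NODE:-/sys/kernel/fadump_enabled}"
dump_mode="kdump"

# Decide once which mechanism the service should use and record it.
determine_dump_mode()
{
    dump_mode="kdump"
    if [ -r "$FADUMP_ENABLED_SYS_NODE" ] &&
       [ "$(cat "$FADUMP_ENABLED_SYS_NODE")" = "1" ]; then
        dump_mode="fadump"
    fi
}

# Hypothetical caller: every later decision consults $dump_mode
# instead of re-reading sysfs.
start_dump_service()
{
    determine_dump_mode
    case "$dump_mode" in
        fadump) echo "would register fadump" ;;
        kdump)  echo "would load kdump kernel" ;;
    esac
}
```

With this shape, start(), stop(), status() and rebuild_initrd() would each test "$dump_mode" rather than calling is_fadump_capable separately.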
The current kdump infrastructure builds a separate initrd, which kexec-tools then loads into memory for use by the kdump kernel. Firmware-assisted dump (FADUMP), however, does not use the kexec-based approach. After a crash, firmware reboots the partition and loads the grub loader as in the normal boot process. Hence, in the FADUMP approach, the second kernel (after the crash) always uses the default (OS-built) initrd. So, to support FADUMP, the default initrd must be changed to include the dump-capturing steps.
The current kdumpctl script implementation already has the code to build an initrd using mkdumprd. This patch uses the new '--rebuild' option introduced in dracut to incrementally build the initramfs image. Once the kdump (fadump) initrd is successfully rebuilt from the default initrd image, this patch replaces the default initrd image with the newly built initrd.
check_config()
  -> rebuild_initrd()          Rebuild default initrd with fadump support
  -> handle_fadump_initrd()    Replace default with initrd built for fadump.
The kexec-tools package in RHEL 7 is now enhanced to insert an out-of-tree kdump module for dracut, which is responsible for adding vmcore capture steps into the initrd if dracut is invoked with the "IN_KDUMP" environment variable set to 1. The mkdumprd script exports "IN_KDUMP=1" before invoking dracut to build the kdump initrd. This patch relies on this existing mechanism of the kdump init script.
Dracut patch that introduces '--rebuild' option is posted upstream, awaiting approval. Link for reference: http://www.spinics.net/lists/linux-initramfs/msg03495.html
Signed-off-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
Signed-off-by: Hari Bathini hbathini@linux.vnet.ibm.com
---
 kdumpctl | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/kdumpctl b/kdumpctl
index f4760b5..976dfe8 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -72,13 +72,25 @@ function save_core()
 function rebuild_initrd()
 {
     if is_fadump_capable; then
+        if [ ! -s "$default_initrd" ]; then
+            echo "No default initrd found to rebuild for fadump support!"
+            return 1
+        fi
         backup_default_initrd
-    fi
-
-    $MKDUMPRD $kdump_initrd $kdump_kver
-    if [ $? != 0 ]; then
-        echo "mkdumprd: failed to make kdump initrd" >&2
-        return 1
+        echo "Rebuilding $default_initrd with fadump support"
+        $MKDUMPRD --rebuild $default_initrd --kver $kdump_kver
+        if [ $? != 0 ]; then
+            echo "mkdumprd: failed to make initrd with fadump support" >&2
+            restore_default_initrd
+            return 1
+        fi
+    else
+        echo "Rebuilding $kdump_initrd"
+        $MKDUMPRD $kdump_initrd $kdump_kver
+        if [ $? != 0 ]; then
+            echo "mkdumprd: failed to make kdump initrd" >&2
+            return 1
+        fi
     fi
 }

@@ -138,6 +150,17 @@ function check_fadump()
     fi
 }

+function restore_default_initrd()
+{
+    # We have failed to rebuild initrd for fadump support.
+    # Restore the original default initrd.
+    if [ -f $default_initrd_bak ];then
+        echo "Restored default initrd"
+        mv $default_initrd_bak $default_initrd
+        sync
+    fi
+}
+
 function find_initrd_image_time()
 {
     image_time=0
@@ -259,7 +282,6 @@ function check_rebuild()
         return 0
     fi

-    echo "Rebuilding $kdump_initrd"
     rebuild_initrd
     return $?
 }
On Tue, Jan 21, 2014 at 10:48:28PM +0530, Hari Bathini wrote:
> The current kdump infrastructure builds a separate initrd, which kexec-tools then loads into memory for use by the kdump kernel. Firmware-assisted dump (FADUMP), however, does not use the kexec-based approach. After a crash, firmware reboots the partition and loads the grub loader as in the normal boot process. Hence, in the FADUMP approach, the second kernel (after the crash) always uses the default (OS-built) initrd. So, to support FADUMP, the default initrd must be changed to include the dump-capturing steps.
>
> The current kdumpctl script implementation already has the code to build an initrd using mkdumprd. This patch uses the new '--rebuild' option introduced in dracut to incrementally build the initramfs image. Once the kdump (fadump) initrd is successfully rebuilt from the default initrd image, this patch replaces the default initrd image with the newly built initrd.
>
> check_config()
>   -> rebuild_initrd()          Rebuild default initrd with fadump support
>   -> handle_fadump_initrd()    Replace default with initrd built for fadump.
I think above description is wrong. You seem to be doing reverse. Directly rebuilding initrd and if it fails, replace it with backup copy.
Can you please fix changelogs.
[..]
> diff --git a/kdumpctl b/kdumpctl
> index f4760b5..976dfe8 100755
> --- a/kdumpctl
> +++ b/kdumpctl
> @@ -72,13 +72,25 @@ function save_core()
>  function rebuild_initrd()
>  {
>      if is_fadump_capable; then
> +        if [ ! -s "$default_initrd" ]; then
> +            echo "No default initrd found to rebuild for fadump support!"
> +            return 1
> +        fi
>          backup_default_initrd
> -    fi
> -
> -    $MKDUMPRD $kdump_initrd $kdump_kver
> -    if [ $? != 0 ]; then
> -        echo "mkdumprd: failed to make kdump initrd" >&2
> -        return 1
> +        echo "Rebuilding $default_initrd with fadump support"
> +        $MKDUMPRD --rebuild $default_initrd --kver $kdump_kver
> +        if [ $? != 0 ]; then
> +            echo "mkdumprd: failed to make initrd with fadump support" >&2
> +            restore_default_initrd
> +            return 1
> +        fi
> +    else
> +        echo "Rebuilding $kdump_initrd"
> +        $MKDUMPRD $kdump_initrd $kdump_kver
> +        if [ $? != 0 ]; then
> +            echo "mkdumprd: failed to make kdump initrd" >&2
> +            return 1
> +        fi
How about splitting it in two functions. Something like.
rebuild_initrd()
{
    if [ "$dump_mode" = "kdump" ]; then
        rebuild_kdump_initrd
    else
        rebuild_fadump_initrd
    fi
}
Thanks Vivek
The script dracut-kdump.sh is responsible for capturing vmcore during second kernel boot. Currently this script gets installed into kdump initrd as part of kdumpbase dracut module. Since it's always installed into kdump initrd, this script assumes that '/proc/vmcore' will always be present when it is invoked.
With fadump support, 'dracut-kdump.sh' script also gets installed into default initrd to capture vmcore generated by firmware assisted dump. Thus in fadump case, the same initrd is going to be used for normal boot as well as boot after system crash. Hence a check is required to see if '/proc/vmcore' file exists before executing steps to capture vmcore. This check will help to bypass the vmcore capture steps during normal boot process.
Signed-off-by: Mahesh Salgaonkar mahesh@linux.vnet.ibm.com
Signed-off-by: Hari Bathini hbathini@linux.vnet.ibm.com
---
 dracut-kdump.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/dracut-kdump.sh b/dracut-kdump.sh
index 4d8616f..ce977d9 100755
--- a/dracut-kdump.sh
+++ b/dracut-kdump.sh
@@ -1,5 +1,8 @@
 #!/bin/sh

+# continue only if /proc/vmcore is present.
+[ ! -f /proc/vmcore ] && return
+
 exec &> /dev/console
 . /lib/dracut-lib.sh
 . /lib/kdump-lib.sh
On Tue, Jan 21, 2014 at 10:47:52PM +0530, Hari Bathini wrote:
> This patch set implements firmware-assisted dump support for the kdump service. Firmware-assisted dump support depends on the existing kdump infrastructure (kdump scripts) present in userland to save the dump to disk. Though the existing kdump script will work seamlessly, it still needs to be modified to make it aware of the presence of the firmware-assisted dump feature during service start and stop. These changes were tested successfully on a Power box running Fedora 19.
I have some general comments.
- Can you please write a .txt file say fadump-powerpc-howto.txt and put some details about how fadump works, how to configure it etc. For details you can refer to kernel Documentation/powerpc/fadump*.txt file. I am more interested here in putting details about how initrd is generated and managed etc.
- Please put details about how user can switch between kdump and fadump modes.
- I did not see any code which does not reboot machine after capturing dump.
- How would you handle "default" in kdump.conf. I think all these defaults don't make sense in case of fadump? IOW, if dump capturing fails what would you do? Continue normal boot?
If yes, we need to put all these details in the fadump howto file.
- If "default" are ignored in fadump mode, we might want to give a warning to user about it.
- Are you still saving dump from initramfs context? Why?
Thanks Vivek
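Editorial sketch of the warning proposed in the review above. Whether the "default" action in kdump.conf is actually ignored in fadump mode is left open in this thread, so the sketch assumes it is. The function name warn_ignored_default and its two arguments (dump mode, config file path) are hypothetical and do not come from the patches.

```bash
#!/bin/bash
# Hypothetical helper, assuming "default" actions are ignored in fadump mode.
# $1: dump mode ("kdump" or "fadump"), $2: path to the kdump config file.
# Returns 0 and prints a warning on stderr if a "default" line would be
# ignored; returns 1 otherwise.
warn_ignored_default()
{
    local mode="$1" conf="$2"

    if [ "$mode" = "fadump" ] &&
       grep -q '^default[[:space:]]' "$conf" 2>/dev/null; then
        echo "Warning: 'default' action in $conf is ignored in fadump mode" >&2
        return 0
    fi
    return 1
}
```

A caller such as check_config() could invoke it once after the dump mode is known, for example: warn_ignored_default fadump /etc/kdump.conf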