This is a patchset to add fence kdump support.
In a cluster environment, fence kdump is used to notify all the other nodes that the current node has crashed, so that it is not fenced off while the crash dump is being taken.
The patchset has the following features:
1. Rebuild the kdump initrd based on the timestamps of the fence kdump config and the cluster configuration.
2. Set up the working environment required by fence kdump in the 2nd kernel.
3. Use fence_kdump_send to notify the other nodes, before the dumping process starts, so the crashed node is not fenced off.
4. Add kdump-in-cluster-environment.txt.
WANG Chao (5):
  kdump-lib: add common variables and function for fence kdump
  kdumpctl: rebuild kdump initramfs if cluster or fence_kdump config is changed.
  kdump.sh: send fence kdump message to other nodes in the cluster
  module-setup.sh: setup fence kdump environment
  module-setup: remove duplicated ip= line
arthur (1):
  doc: Add kdump-in-cluster-environment.txt
 dracut-kdump.sh                  | 15 +++++++++++
 dracut-module-setup.sh           | 54 ++++++++++++++++++++++++++++++++++++--
 kdump-in-cluster-environment.txt | 56 ++++++++++++++++++++++++++++++++++++++++
 kdump-lib.sh                     | 17 +++++++++++-
 kdumpctl                         | 26 +++++++++++++++++++
 kexec-tools.spec                 |  3 +++
 6 files changed, 168 insertions(+), 3 deletions(-)
 create mode 100644 kdump-in-cluster-environment.txt
From: arthur zzou@redhat.com
Since kdump already supports dumping in a cluster environment, this patch adds a howto file to the RPM package that describes how to configure kdump in a cluster environment.
Signed-off-by: arthur zzou@redhat.com
---
 kdump-in-cluster-environment.txt | 56 ++++++++++++++++++++++++++++++++++++++++
 kexec-tools.spec                 |  3 +++
 2 files changed, 59 insertions(+)
 create mode 100644 kdump-in-cluster-environment.txt
diff --git a/kdump-in-cluster-environment.txt b/kdump-in-cluster-environment.txt
new file mode 100644
index 0000000..1e6a43a
--- /dev/null
+++ b/kdump-in-cluster-environment.txt
@@ -0,0 +1,56 @@
+Kdump-in-cluster-environment HOWTO
+
+Introduction
+
+Kdump is a kexec based crash dumping mechansim for Linux. This docuement
+illustrate how to configure kdump in cluster environment to allow the kdump
+crash recovery service complete without being preempted by traditional power
+fencing methods.
+
+Overview
+
+Kexec/Kdump
+
+Details about Kexec/Kdump are available in Kexec-Kdump-howto file and will not
+be described here.
+
+fence_kdump
+
+fence_kdump is an I/O fencing agent to be used with the kdump crash recovery
+service. When the fence_kdump agent is invoked, it will listen for a message
+from the failed node that acknowledges that the failed node it executing the
+kdump crash kernel. Note that fence_kdump is not a replacement for traditional
+fencing methods. The fence_kdump agent can only detect that a node has entered
+the kdump crash recovery service. This allows the kdump crash recovery service
+complete without being preempted by traditional power fencing methods.
+
+How to configure cluster environment:
+
+If we want to use kdump in cluster environment, fence-agents-kdump should be
+installed in every nodes in the cluster. You can achieve this via the following
+command:
+
+  # yum install -y fence-agents-kdump
+
+Next is to add kdump_fence to the cluster. Assuming that the cluster consists
+of three nodes, they are node1, node2 and node3, and use Pacemaker to perform
+resource management and pcs as cli configuration tool.
+
+With pcs it is easy to add a stonith resource to the cluster. For example, add
+a stonith resource named mykdumpfence with fence type of fence_kdump via the
+following commands:
+
+  # pcs stonith create mykdumpfence fence_kdump \
+      pcmk_host_check=static-list pcmk_host_list="node1 node2 node3"
+  # pcs stonith update mykdumpfence pcmk_monitor_action=metadata --force
+  # pcs stonith update mykdumpfence pcmk_status_action=metadata --force
+  # pcs stonith update mykdumpfence pcmk_reboot_action=off --force
+
+Then enable stonith
+  # pcs property set stonith-enabled=true
+
+How to configure kdump:
+
+Actually there is nothing special in configuration between normal kdump and
+cluster environment kdump. So please refer to Kexec-Kdump-howto file for more
+information.
diff --git a/kexec-tools.spec b/kexec-tools.spec
index 0b948c2..219bef1 100644
--- a/kexec-tools.spec
+++ b/kexec-tools.spec
@@ -25,6 +25,7 @@ Source17: rhcrashkernel-param
 Source18: kdump.sysconfig.s390x
 Source19: eppic_030413.tar.gz
 Source20: kdump-lib.sh
+Source21: kdump-in-cluster-environment.txt
 #######################################
 # These are sources for mkdumpramfs
@@ -162,6 +163,7 @@ export CFLAGS="-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2"
 rm -f kexec-tools.spec.in
 # setup the docs
 cp %{SOURCE10} .
+cp %{SOURCE21} .
 make
 %ifarch %{ix86} x86_64 ia64 ppc64 s390x
@@ -346,6 +348,7 @@ done
 %doc COPYING
 %doc TODO
 %doc kexec-kdump-howto.txt
+%doc kdump-in-cluster-environment.txt
 %ifarch %{ix86} x86_64 ia64 ppc64 s390x
 %files eppic
On Mon, Jan 13, 2014 at 06:23:07PM +0800, WANG Chao wrote:
From: arthur zzou@redhat.com
Since kdump already support dump in cluster environment, this patch add a howto file to RPM package to describe how to configure kdump in cluster environment.
Signed-off-by: arthur zzou@redhat.com
kdump-in-cluster-environment.txt | 56 ++++++++++++++++++++++++++++++++++++++++ kexec-tools.spec | 3 +++ 2 files changed, 59 insertions(+) create mode 100644 kdump-in-cluster-environment.txt
diff --git a/kdump-in-cluster-environment.txt b/kdump-in-cluster-environment.txt new file mode 100644 index 0000000..1e6a43a --- /dev/null +++ b/kdump-in-cluster-environment.txt @@ -0,0 +1,56 @@ +Kdump-in-cluster-environment HOWTO
+Introduction
+Kdump is a kexec based crash dumping mechansim for Linux. This docuement +illustrate how to configure kdump in cluster environment to allow the kdump +crash recovery service complete without being preempted by traditional power +fencing methods.
+Overview
+Kexec/Kdump
+Details about Kexec/Kdump are available in Kexec-Kdump-howto file and will not +be described here.
+fence_kdump
+fence_kdump is an I/O fencing agent to be used with the kdump crash recovery +service. When the fence_kdump agent is invoked, it will listen for a message +from the failed node that acknowledges that the failed node it executing the
^^^ s/it/is
+kdump crash kernel. Note that fence_kdump is not a replacement for traditional +fencing methods. The fence_kdump agent can only detect that a node has entered +the kdump crash recovery service. This allows the kdump crash recovery service +complete without being preempted by traditional power fencing methods.
Who sends the message that a node is saving the crash dump?
+How to configure cluster environment:
+If we want to use kdump in cluster environment, fence-agents-kdump should be +installed in every nodes in the cluster. You can achieve this via the following +command:
- # yum install -y fence-agents-kdump
+Next is to add kdump_fence to the cluster. Assuming that the cluster consists +of three nodes, they are node1, node2 and node3, and use Pacemaker to perform +resource management and pcs as cli configuration tool.
+With pcs it is easy to add a stonith resource to the cluster. For example, add +a stonith resource named mykdumpfence with fence type of fence_kdump via the +following commands:
- # pcs stonith create mykdumpfence fence_kdump \
pcmk_host_check=static-list pcmk_host_list="node1 node2 node3"
- # pcs stonith update mykdumpfence pcmk_monitor_action=metadata --force
- # pcs stonith update mykdumpfence pcmk_status_action=metadata --force
- # pcs stonith update mykdumpfence pcmk_reboot_action=off --force
+Then enable stonith
- # pcs property set stonith-enabled=true
+How to configure kdump:
+Actually there is nothing special in configuration between normal kdump and +cluster environment kdump. So please refer to Kexec-Kdump-howto file for more +information.
I think we need to put some information here about how kdump sends the notification to the other nodes after a crash, which configuration file is used to get the node info, etc.
Thanks Vivek
----- Original Message -----
On Mon, Jan 13, 2014 at 06:23:07PM +0800, WANG Chao wrote:
From: arthur zzou@redhat.com
Since kdump already support dump in cluster environment, this patch add a howto file to RPM package to describe how to configure kdump in cluster environment.
Signed-off-by: arthur zzou@redhat.com
kdump-in-cluster-environment.txt | 56 ++++++++++++++++++++++++++++++++++++++++ kexec-tools.spec | 3 +++ 2 files changed, 59 insertions(+) create mode 100644 kdump-in-cluster-environment.txt
diff --git a/kdump-in-cluster-environment.txt b/kdump-in-cluster-environment.txt new file mode 100644 index 0000000..1e6a43a --- /dev/null +++ b/kdump-in-cluster-environment.txt @@ -0,0 +1,56 @@ +Kdump-in-cluster-environment HOWTO
+Introduction
+Kdump is a kexec based crash dumping mechansim for Linux. This docuement +illustrate how to configure kdump in cluster environment to allow the kdump +crash recovery service complete without being preempted by traditional power +fencing methods.
+Overview
+Kexec/Kdump
+Details about Kexec/Kdump are available in Kexec-Kdump-howto file and will not +be described here.
+fence_kdump
+fence_kdump is an I/O fencing agent to be used with the kdump crash recovery +service. When the fence_kdump agent is invoked, it will listen for a message +from the failed node that acknowledges that the failed node it executing the
^^^
s/it/is
Hi Vivek,

The fence_kdump agent is invoked by Pacemaker (the cluster manager) on every node in the cluster. It runs like a daemon, listening for a message from the failed node (in our case, the one that is doing kdump). The failed node sends the message to the other nodes in the cluster to acknowledge that it has failed. In our case, that means when a node is executing the kdump crash kernel (i.e. it has failed), it should send the message to the other nodes using the fence_kdump_send command in the second kernel.
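As an illustration, the crashed node's invocation might look roughly like this (the interval option and node names below are assumptions for the example, not values taken from this thread):

  # run in the kdump initrd on the crashed node: keep announcing
  # "I am dumping" to the other cluster nodes in the background
  /usr/libexec/fence_kdump_send -i 10 node1 node3 &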
+kdump crash kernel. Note that fence_kdump is not a replacement for traditional +fencing methods. The fence_kdump agent can only detect that a node has entered +the kdump crash recovery service. This allows the kdump crash recovery service +complete without being preempted by traditional power fencing methods.
Who sends the message that a node is saving crash dump?
It is the node that is executing the kdump crash kernel.
+How to configure cluster environment:
+If we want to use kdump in cluster environment, fence-agents-kdump should be +installed in every nodes in the cluster. You can achieve this via the following +command:
- # yum install -y fence-agents-kdump
+Next is to add kdump_fence to the cluster. Assuming that the cluster consists +of three nodes, they are node1, node2 and node3, and use Pacemaker to perform +resource management and pcs as cli configuration tool.
+With pcs it is easy to add a stonith resource to the cluster. For example, add +a stonith resource named mykdumpfence with fence type of fence_kdump via the +following commands:
- # pcs stonith create mykdumpfence fence_kdump \
pcmk_host_check=static-list pcmk_host_list="node1 node2 node3"
- # pcs stonith update mykdumpfence pcmk_monitor_action=metadata --force
- # pcs stonith update mykdumpfence pcmk_status_action=metadata --force
- # pcs stonith update mykdumpfence pcmk_reboot_action=off --force
+Then enable stonith
- # pcs property set stonith-enabled=true
+How to configure kdump:
+Actually there is nothing special in configuration between normal kdump and +cluster environment kdump. So please refer to Kexec-Kdump-howto file for more +information.
I think we need to put some information here that how kdump sends the information to other nodes after crash and what configuration file is used to get node info etc.
no problem.
Thanks arthur
Thanks Vivek
Since kdump already supports dumping in a cluster environment, this patch adds a howto file to the RPM package that describes how to configure kdump in a cluster environment.
Signed-off-by: arthur zzou@redhat.com
---
 kdump-in-cluster-environment.txt | 66 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
 create mode 100644 kdump-in-cluster-environment.txt
diff --git a/kdump-in-cluster-environment.txt b/kdump-in-cluster-environment.txt
new file mode 100644
index 0000000..c27a5d7
--- /dev/null
+++ b/kdump-in-cluster-environment.txt
@@ -0,0 +1,66 @@
+Kdump-in-cluster-environment HOWTO
+
+Introduction
+
+Kdump is a kexec based crash dumping mechansim for Linux. This docuement
+illustrate how to configure kdump in cluster environment to allow the kdump
+crash recovery service complete without being preempted by traditional power
+fencing methods.
+
+Overview
+
+Kexec/Kdump
+
+Details about Kexec/Kdump are available in Kexec-Kdump-howto file and will not
+be described here.
+
+fence_kdump
+
+fence_kdump is an I/O fencing agent to be used with the kdump crash recovery
+service. When the fence_kdump agent is invoked, it will listen for a message
+from the failed node that acknowledges that the failed node is executing the
+kdump crash kernel. Note that fence_kdump is not a replacement for traditional
+fencing methods. The fence_kdump agent can only detect that a node has entered
+the kdump crash recovery service. This allows the kdump crash recovery service
+complete without being preempted by traditional power fencing methods.
+
+fence_kdump_send
+
+fence_kdump_send is a utility used to send messages that acknowledge that the
+node itself has entered the kdump crash recovery service. The fence_kdump_send
+utility is typically run in the kdump kernel after a cluster node has
+encountered a kernel panic. Once the cluster node has entered the kdump crash
+recovery service, fence_kdump_send will periodically send messages to all
+cluster nodes. When the fence_kdump agent receives a valid message from the
+failed nodes, fencing is complete.
+
+How to configure cluster environment:
+
+If we want to use kdump in cluster environment, fence-agents-kdump should be
+installed in every nodes in the cluster. You can achieve this via the following
+command:
+
+  # yum install -y fence-agents-kdump
+
+Next is to add kdump_fence to the cluster. Assuming that the cluster consists
+of three nodes, they are node1, node2 and node3, and use Pacemaker to perform
+resource management and pcs as cli configuration tool.
+
+With pcs it is easy to add a stonith resource to the cluster. For example, add
+a stonith resource named mykdumpfence with fence type of fence_kdump via the
+following commands:
+
+  # pcs stonith create mykdumpfence fence_kdump \
+      pcmk_host_check=static-list pcmk_host_list="node1 node2 node3"
+  # pcs stonith update mykdumpfence pcmk_monitor_action=metadata --force
+  # pcs stonith update mykdumpfence pcmk_status_action=metadata --force
+  # pcs stonith update mykdumpfence pcmk_reboot_action=off --force
+
+Then enable stonith
+  # pcs property set stonith-enabled=true
+
+How to configure kdump:
+
+Actually there is nothing special in configuration between normal kdump and
+cluster environment kdump. So please refer to Kexec-Kdump-howto file for more
+information.
On 01/16/14 at 02:33pm, Zhi Zou wrote:
Since kdump already support dump in cluster environment, this patch add a howto file to RPM package to describe how to configure kdump in cluster environment.
Signed-off-by: arthur zzou@redhat.com
CCing Vivek, dyoung
kdump-in-cluster-environment.txt | 66 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) create mode 100644 kdump-in-cluster-environment.txt
diff --git a/kdump-in-cluster-environment.txt b/kdump-in-cluster-environment.txt new file mode 100644 index 0000000..c27a5d7 --- /dev/null +++ b/kdump-in-cluster-environment.txt @@ -0,0 +1,66 @@ +Kdump-in-cluster-environment HOWTO
+Introduction
+Kdump is a kexec based crash dumping mechansim for Linux. This docuement +illustrate how to configure kdump in cluster environment to allow the kdump +crash recovery service complete without being preempted by traditional power +fencing methods.
+Overview
+Kexec/Kdump
+Details about Kexec/Kdump are available in Kexec-Kdump-howto file and will not +be described here.
+fence_kdump
+fence_kdump is an I/O fencing agent to be used with the kdump crash recovery +service. When the fence_kdump agent is invoked, it will listen for a message +from the failed node that acknowledges that the failed node is executing the +kdump crash kernel. Note that fence_kdump is not a replacement for traditional +fencing methods. The fence_kdump agent can only detect that a node has entered +the kdump crash recovery service. This allows the kdump crash recovery service +complete without being preempted by traditional power fencing methods.
+fence_kdump_send
+fence_kdump_send is a utility used to send messages that acknowledge that the +node itself has entered the kdump crash recovery service. The fence_kdump_send +utility is typically run in the kdump kernel after a cluster node has +encountered a kernel panic. Once the cluster node has entered the kdump crash +recovery service, fence_kdump_send will periodically send messages to all +cluster nodes. When the fence_kdump agent receives a valid message from the +failed nodes, fencing is complete.
+How to configure cluster environment:
+If we want to use kdump in cluster environment, fence-agents-kdump should be +installed in every nodes in the cluster. You can achieve this via the following +command:
- # yum install -y fence-agents-kdump
+Next is to add kdump_fence to the cluster. Assuming that the cluster consists +of three nodes, they are node1, node2 and node3, and use Pacemaker to perform +resource management and pcs as cli configuration tool.
+With pcs it is easy to add a stonith resource to the cluster. For example, add +a stonith resource named mykdumpfence with fence type of fence_kdump via the +following commands:
- # pcs stonith create mykdumpfence fence_kdump \
pcmk_host_check=static-list pcmk_host_list="node1 node2 node3"
- # pcs stonith update mykdumpfence pcmk_monitor_action=metadata --force
- # pcs stonith update mykdumpfence pcmk_status_action=metadata --force
- # pcs stonith update mykdumpfence pcmk_reboot_action=off --force
+Then enable stonith
- # pcs property set stonith-enabled=true
+How to configure kdump:
+Actually there is nothing special in configuration between normal kdump and +cluster environment kdump. So please refer to Kexec-Kdump-howto file for more
+information.
1.8.4.2
Hi, Arthur
Some cosmetic issues, see below. I think Vivek will have more thoughts on both the content and the English itself.
On 01/17/14 at 12:47pm, WANG Chao wrote:
On 01/16/14 at 02:33pm, Zhi Zou wrote:
Since kdump already support dump in cluster environment, this patch add a howto file to RPM package to describe how to configure kdump in cluster environment.
Signed-off-by: arthur zzou@redhat.com
CCing Vivek, dyoung
kdump-in-cluster-environment.txt | 66 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) create mode 100644 kdump-in-cluster-environment.txt
diff --git a/kdump-in-cluster-environment.txt b/kdump-in-cluster-environment.txt new file mode 100644 index 0000000..c27a5d7 --- /dev/null +++ b/kdump-in-cluster-environment.txt @@ -0,0 +1,66 @@ +Kdump-in-cluster-environment HOWTO
+Introduction
+Kdump is a kexec based crash dumping mechansim for Linux. This docuement +illustrate how to configure kdump in cluster environment to allow the kdump +crash recovery service complete without being preempted by traditional power +fencing methods.
+Overview
+Kexec/Kdump
+Details about Kexec/Kdump are available in Kexec-Kdump-howto file and will not +be described here.
The file name is kexec-kdump-howto.txt
+fence_kdump
+fence_kdump is an I/O fencing agent to be used with the kdump crash recovery +service. When the fence_kdump agent is invoked, it will listen for a message +from the failed node that acknowledges that the failed node is executing the +kdump crash kernel. Note that fence_kdump is not a replacement for traditional +fencing methods. The fence_kdump agent can only detect that a node has entered +the kdump crash recovery service. This allows the kdump crash recovery service +complete without being preempted by traditional power fencing methods.
+fence_kdump_send
+fence_kdump_send is a utility used to send messages that acknowledge that the +node itself has entered the kdump crash recovery service. The fence_kdump_send +utility is typically run in the kdump kernel after a cluster node has +encountered a kernel panic. Once the cluster node has entered the kdump crash +recovery service, fence_kdump_send will periodically send messages to all +cluster nodes. When the fence_kdump agent receives a valid message from the +failed nodes, fencing is complete.
+How to configure cluster environment:
+If we want to use kdump in cluster environment, fence-agents-kdump should be +installed in every nodes in the cluster. You can achieve this via the following +command:
- # yum install -y fence-agents-kdump
+Next is to add kdump_fence to the cluster. Assuming that the cluster consists +of three nodes, they are node1, node2 and node3, and use Pacemaker to perform +resource management and pcs as cli configuration tool.
+With pcs it is easy to add a stonith resource to the cluster. For example, add +a stonith resource named mykdumpfence with fence type of fence_kdump via the +following commands:
- # pcs stonith create mykdumpfence fence_kdump \
pcmk_host_check=static-list pcmk_host_list="node1 node2 node3"
- # pcs stonith update mykdumpfence pcmk_monitor_action=metadata --force
- # pcs stonith update mykdumpfence pcmk_status_action=metadata --force
- # pcs stonith update mykdumpfence pcmk_reboot_action=off --force
+Then enable stonith
- # pcs property set stonith-enabled=true
+How to configure kdump:
+Actually there is nothing special in configuration between normal kdump and +cluster environment kdump. So please refer to Kexec-Kdump-howto file for more +information.
kexec-kdump-howto.txt
BTW, there's some whitespace at the end of some lines, please remove it.
Thanks Dave
Add the following common variables and function:
$FENCE_KDUMP_CONFIG: configuration file /etc/sysconfig/fence_kdump
$FENCE_KDUMP_NODES: configuration file /etc/fence_kdump_nodes
$FENCE_KDUMP_SEND: executable /usr/libexec/fence_kdump_send
is_fence_kdump(): used to determine if the system is in a cluster and configured with fence_kdump.
Signed-off-by: WANG Chao chaowang@redhat.com
---
 kdump-lib.sh | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/kdump-lib.sh b/kdump-lib.sh
index e73ac09..aac0c5f 100755
--- a/kdump-lib.sh
+++ b/kdump-lib.sh
@@ -1,8 +1,12 @@
 #!/bin/sh
 #
-# Kdump common functions
+# Kdump common variables and functions
 #
 
+FENCE_KDUMP_CONFIG="/etc/sysconfig/fence_kdump"
+FENCE_KDUMP_SEND="/usr/libexec/fence_kdump_send"
+FENCE_KDUMP_NODES="/etc/fence_kdump_nodes"
+
 is_ssh_dump_target()
 {
     grep -q "^ssh[[:blank:]].*@" /etc/kdump.conf
@@ -22,3 +26,14 @@ strip_comments()
 {
     echo $@ | sed -e 's/\(.*\)#.*/\1/'
 }
+
+# Check if fence kdump is configured in cluster
+is_fence_kdump()
+{
+    # no pcs or fence_kdump_send executables installed?
+    type -P pcs > /dev/null || return 1
+    [ -x $FENCE_KDUMP_SEND ] || return 1
+
+    # fence kdump not configured?
+    (pcs cluster cib | grep -q 'type="fence_kdump"') &> /dev/null || return 1
+}
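As a usage sketch, a caller that has sourced kdump-lib.sh can simply branch on the helper's exit status (the source path below is an assumption about where the file is installed):

  . /lib/kdump/kdump-lib.sh    # assumed install location of kdump-lib.sh

  if is_fence_kdump; then
      echo "fence_kdump is configured for this cluster node"
  fi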
If the system is configured for fence kdump, we need to rebuild the kdump initramfs whenever the cluster or fence_kdump config is newer than it.
In RHEL7, the cluster config is no longer kept locally but is stored remotely. Fortunately we can use the pcs tool to retrieve the XML-based config and parse the last-changed time from it.
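Concretely, the last-written timestamp can be pulled out of the CIB and converted to seconds since the epoch like this (this mirrors the check_fence_kdump() hunk below):

  cib_time=$(pcs cluster cib | \
             xmllint --xpath 'string(/cib/@cib-last-written)' - | \
             xargs -0 date +%s --date)
  echo "cluster config last written: $cib_time"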
/etc/sysconfig/fence_kdump is used to configure runtime arguments for fence_kdump_send, so we have to pass those arguments on to the 2nd kernel.
When the cluster config or /etc/sysconfig/fence_kdump is newer than the local kdump initramfs, we must rebuild the initramfs to pick up the changes in the cluster.
For example:
Detected change(s) the following file(s):
  cluster-cib /etc/sysconfig/fence_kdump
Rebuilding /boot/initramfs-xxxkdump.img
[..]
Signed-off-by: WANG Chao chaowang@redhat.com --- kdumpctl | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/kdumpctl b/kdumpctl
index 46ae633..abcdffd 100755
--- a/kdumpctl
+++ b/kdumpctl
@@ -132,6 +132,25 @@ function check_config()
 	return 0
 }
 
+# check_fence_kdump <image timestamp>
+# return 0 if fence_kdump is configured and kdump initrd needs to be rebuilt
+function check_fence_kdump()
+{
+	local image_time=$1
+	local cib_time
+
+	is_fence_kdump || return 1
+
+	cib_time=`pcs cluster cib | xmllint --xpath 'string(/cib/@cib-last-written)' - | \
+		  xargs -0 date +%s --date`
+
+	if [ -z $cib_time -o $cib_time -le $image_time ]; then
+		return 1
+	fi
+
+	return 0
+}
+
 function check_rebuild()
 {
 	local extra_modules modified_files=""
@@ -167,6 +186,9 @@ function check_rebuild()
 		image_time=0
 	fi
 
+	#also rebuild when cluster conf is changed and fence kdump is enabled.
+	check_fence_kdump $image_time && modified_files="cluster-cib"
+
 	EXTRA_BINS=`grep ^kdump_post $KDUMP_CONFIG_FILE | cut -d\  -f2`
 	CHECK_FILES=`grep ^kdump_pre $KDUMP_CONFIG_FILE | cut -d\  -f2`
 	EXTRA_BINS="$EXTRA_BINS $CHECK_FILES"
@@ -174,6 +196,10 @@ function check_rebuild()
 	EXTRA_BINS="$EXTRA_BINS $CHECK_FILES"
 	files="$KDUMP_CONFIG_FILE $kdump_kernel $EXTRA_BINS"
 
+	if [ -f $FENCE_KDUMP_CONFIG ]; then
+		files="$files $FENCE_KDUMP_CONFIG"
+	fi
+
 	check_exist "$files" && check_executable "$EXTRA_BINS"
 	[ $? -ne 0 ] && return 1
On Mon, Jan 13, 2014 at 06:23:09PM +0800, WANG Chao wrote:
If the system is configured fence kdump, we need to update kdump initramfs if cluster or fence_kdump config is newer.
In RHEL7, cluster config is no longer keeping locally but stored remotely. Fortunately we can use a pcs tool to retrieve the xml based config and parse the last changed time from that.
/etc/sysconfig/fence_kdump is used to configure runtime arguments to fence_kdump_send. So We have to pass the arguments to 2nd kernel.
When cluster config or /etc/sysconfig/fence_kdump is newer than local kdump initramfs, we must rebuild initramfs to adapt changes in cluster.
For example:
Detected change(s) the following file(s):
cluster-cib /etc/sysconfig/fence_kdump
Chao,
What is cluster-cib file and what info does it contain?
Thanks Vivek
Rebuilding /boot/initramfs-xxxkdump.img [..]
Signed-off-by: WANG Chao chaowang@redhat.com
kdumpctl | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/kdumpctl b/kdumpctl index 46ae633..abcdffd 100755 --- a/kdumpctl +++ b/kdumpctl @@ -132,6 +132,25 @@ function check_config() return 0 }
+# check_fence_kdump <image timestamp> +# return 0 if fence_kdump is configured and kdump initrd needs to be rebuilt +function check_fence_kdump() +{
- local image_time=$1
- local cib_time
- is_fence_kdump || return 1
- cib_time=`pcs cluster cib | xmllint --xpath 'string(/cib/@cib-last-written)' - | \
xargs -0 date +%s --date`
- if [ -z $cib_time -o $cib_time -le $image_time ]; then
return 1
- fi
- return 0
+}
function check_rebuild() { local extra_modules modified_files="" @@ -167,6 +186,9 @@ function check_rebuild() image_time=0 fi
- #also rebuild when cluster conf is changed and fence kdump is enabled.
- check_fence_kdump $image_time && modified_files="cluster-cib"
- EXTRA_BINS=`grep ^kdump_post $KDUMP_CONFIG_FILE | cut -d\ -f2` CHECK_FILES=`grep ^kdump_pre $KDUMP_CONFIG_FILE | cut -d\ -f2` EXTRA_BINS="$EXTRA_BINS $CHECK_FILES"
@@ -174,6 +196,10 @@ function check_rebuild() EXTRA_BINS="$EXTRA_BINS $CHECK_FILES" files="$KDUMP_CONFIG_FILE $kdump_kernel $EXTRA_BINS"
- if [ -f $FENCE_KDUMP_CONFIG ]; then
files="$files $FENCE_KDUMP_CONFIG"
- fi
- check_exist "$files" && check_executable "$EXTRA_BINS" [ $? -ne 0 ] && return 1
-- 1.8.4.2
On 01/21/14 at 04:32pm, Vivek Goyal wrote:
On Mon, Jan 13, 2014 at 06:23:09PM +0800, WANG Chao wrote:
If the system is configured fence kdump, we need to update kdump initramfs if cluster or fence_kdump config is newer.
In RHEL7, cluster config is no longer keeping locally but stored remotely. Fortunately we can use a pcs tool to retrieve the xml based config and parse the last changed time from that.
/etc/sysconfig/fence_kdump is used to configure runtime arguments to fence_kdump_send. So We have to pass the arguments to 2nd kernel.
When cluster config or /etc/sysconfig/fence_kdump is newer than local kdump initramfs, we must rebuild initramfs to adapt changes in cluster.
For example:
Detected change(s) the following file(s):
cluster-cib /etc/sysconfig/fence_kdump
Chao,
What is cluster-cib file and what info does it contain?
Hi, Vivek
AFAICT cib.xml is a cluster configuration file, functioning just like /etc/cluster.conf does in RHEL6. In RHEL7, /etc/cluster.conf no longer exists; instead the user can run `pcs cluster cib > cib.xml` to retrieve it as an XML file.
I was given a sample cib.xml by Marek. Here it is:
<cib epoch="5" num_updates="11" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Wed Oct 23 12:12:32 2014" update-origin="slovan" update-client="cibadmin" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="1"> <configuration> <crm_config> <cluster_property_set id="cib-bootstrap-options"> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.9-3.fc19-781a388"/> <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/> </cluster_property_set> </crm_config> <nodes> <node id="1" uname="slovan"/> <node id="2" uname="sparta"/> </nodes> <resources> <primitive class="stonith" id="fence_kkk" type="fence_kdump"> <instance_attributes id="fence_kkk-instance_attributes"> <nvpair id="fence_kkk-instance_attributes-ipaddr" name="ipaddr" value="pdu-bar.englab.brq.redhat.com"/> <nvpair id="fence_kkk-instance_attributes-login" name="login" value="labapc"/> <nvpair id="fence_kkk-instance_attributes-passwd" name="passwd" value="labapc"/> <nvpair id="fence_kkk-instance_attributes-port" name="port" value="9"/> </instance_attributes> </primitive> </resources> <constraints/> </configuration> <status> <node_state id="2" uname="sparta" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member"> <lrm id="2"> <lrm_resources> <lrm_resource id="fence_kkk" type="fence_kdump" class="stonith"> <lrm_rsc_op id="fence_kkk_last_0" operation_key="fence_kkk_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="1:10:0:3177ca3b-4aef-4266-ab43-a8c51f9565dc" transition-magic="0:0;1:10:0:3177ca3b-4aef-4266-ab43-a8c51f9565dc" call-id="11" rc-code="0" op-status="0" interval="0" last-run="1382523178" last-rc-change="1382523178" exec-time="0" queue-time="0" op-digest="43708ee5920e76e9ae2d5bb37d67dae6"/> <lrm_rsc_op id="fence_kkk_last_failure_0" operation_key="fence_kkk_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="5:9:0:3177ca3b-4aef-4266-ab43-a8c51f9565dc" transition-magic="4:1;5:9:0:3177ca3b-4aef-4266-ab43-a8c51f9565dc" call-id="8" rc-code="1" op-status="4" interval="0" last-run="1382523164" last-rc-change="1382523164" exec-time="12519" queue-time="0" op-digest="43708ee5920e76e9ae2d5bb37d67dae6"/> </lrm_resource> </lrm_resources> </lrm> <transient_attributes id="2"> <instance_attributes id="status-2"> <nvpair id="status-2-probe_complete" name="probe_complete" value="true"/> <nvpair id="status-2-fail-count-fence_kkk" name="fail-count-fence_kkk" value="INFINITY"/> <nvpair id="status-2-last-failure-fence_kkk" name="last-failure-fence_kkk" value="1382523177"/> </instance_attributes> </transient_attributes> </node_state> <node_state id="1" uname="slovan" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member"> <lrm id="1"> <lrm_resources> <lrm_resource id="fence_kkk" type="fence_kdump" class="stonith"> <lrm_rsc_op id="fence_kkk_last_0" operation_key="fence_kkk_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="1:8:0:3177ca3b-4aef-4266-ab43-a8c51f9565dc" transition-magic="0:0;1:8:0:3177ca3b-4aef-4266-ab43-a8c51f9565dc" call-id="11" rc-code="0" op-status="0" interval="0" last-run="1382523164" last-rc-change="1382523164" exec-time="0" queue-time="0" op-digest="43708ee5920e76e9ae2d5bb37d67dae6"/> <lrm_rsc_op id="fence_kkk_last_failure_0" operation_key="fence_kkk_start_0" operation="start" crm-debug-origin="do_update_resource" 
crm_feature_set="3.0.7" transition-key="7:7:0:3177ca3b-4aef-4266-ab43-a8c51f9565dc" transition-magic="4:1;7:7:0:3177ca3b-4aef-4266-ab43-a8c51f9565dc" call-id="8" rc-code="1" op-status="4" interval="0" last-run="1382523154" last-rc-change="1382523154" exec-time="8406" queue-time="0" op-digest="43708ee5920e76e9ae2d5bb37d67dae6"/> </lrm_resource> </lrm_resources> </lrm> <transient_attributes id="1"> <instance_attributes id="status-1"> <nvpair id="status-1-probe_complete" name="probe_complete" value="true"/> <nvpair id="status-1-fail-count-fence_kkk" name="fail-count-fence_kkk" value="INFINITY"/> <nvpair id="status-1-last-failure-fence_kkk" name="last-failure-fence_kkk" value="1382523164"/> </instance_attributes> </transient_attributes> </node_state> </status> </cib>
Thanks WANG Chao
Thanks Vivek
Rebuilding /boot/initramfs-xxxkdump.img [..]
Signed-off-by: WANG Chao chaowang@redhat.com
kdumpctl | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/kdumpctl b/kdumpctl index 46ae633..abcdffd 100755 --- a/kdumpctl +++ b/kdumpctl @@ -132,6 +132,25 @@ function check_config() return 0 }
+# check_fence_kdump <image timestamp> +# return 0 if fence_kdump is configured and kdump initrd needs to be rebuilt +function check_fence_kdump() +{
- local image_time=$1
- local cib_time
- is_fence_kdump || return 1
- cib_time=`pcs cluster cib | xmllint --xpath 'string(/cib/@cib-last-written)' - | \
xargs -0 date +%s --date`
- if [ -z $cib_time -o $cib_time -le $image_time ]; then
return 1
- fi
- return 0
+}
function check_rebuild() { local extra_modules modified_files="" @@ -167,6 +186,9 @@ function check_rebuild() image_time=0 fi
- #also rebuild when cluster conf is changed and fence kdump is enabled.
- check_fence_kdump $image_time && modified_files="cluster-cib"
- EXTRA_BINS=`grep ^kdump_post $KDUMP_CONFIG_FILE | cut -d\ -f2` CHECK_FILES=`grep ^kdump_pre $KDUMP_CONFIG_FILE | cut -d\ -f2` EXTRA_BINS="$EXTRA_BINS $CHECK_FILES"
@@ -174,6 +196,10 @@ function check_rebuild() EXTRA_BINS="$EXTRA_BINS $CHECK_FILES" files="$KDUMP_CONFIG_FILE $kdump_kernel $EXTRA_BINS"
- if [ -f $FENCE_KDUMP_CONFIG ]; then
files="$files $FENCE_KDUMP_CONFIG"
- fi
- check_exist "$files" && check_executable "$EXTRA_BINS" [ $? -ne 0 ] && return 1
-- 1.8.4.2
In the 2nd kernel, to prevent the crashed node from being fenced off, the fence kdump message must be sent to the other nodes in the cluster periodically before the dumping process starts.
We preserve every node's name in /etc/fence_kdump_nodes in the initrd, so we parse this file and notify those nodes.
Signed-off-by: WANG Chao chaowang@redhat.com
---
 dracut-kdump.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/dracut-kdump.sh b/dracut-kdump.sh
index 4d8616f..324408f 100755
--- a/dracut-kdump.sh
+++ b/dracut-kdump.sh
@@ -287,6 +287,19 @@ read_kdump_conf()
     done < $conf_file
 }
 
+fence_kdump_notify()
+{
+    if [ -f $FENCE_KDUMP_NODES ]; then
+        if [ -f $FENCE_KDUMP_CONFIG ]; then
+            . $FENCE_KDUMP_CONFIG
+        fi
+
+        read nodes < $FENCE_KDUMP_NODES
+        $FENCE_KDUMP_SEND $FENCE_KDUMP_OPTS $nodes &
+    fi
+}
+
+fence_kdump_notify
 read_kdump_conf
if [ -z "$CORE_COLLECTOR" ];then
On Mon, Jan 13, 2014 at 06:23:10PM +0800, WANG Chao wrote:
In 2nd kernel, to prevent the crashed system from being fenced off, fence kdump message must be send to other nodes in the cluster periodically before dumping process.
We preserve every node's name in /etc/fence_kdump_nodes in the initrd, so we parse this file and send notify them.
Signed-off-by: WANG Chao chaowang@redhat.com
dracut-kdump.sh | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/dracut-kdump.sh b/dracut-kdump.sh index 4d8616f..324408f 100755 --- a/dracut-kdump.sh +++ b/dracut-kdump.sh @@ -287,6 +287,19 @@ read_kdump_conf() done < $conf_file }
+fence_kdump_notify() +{
- if [ -f $FENCE_KDUMP_NODES ]; then
if [ -f $FENCE_KDUMP_CONFIG ]; then
. $FENCE_KDUMP_CONFIG
fi
read nodes < $FENCE_KDUMP_NODES
$FENCE_KDUMP_SEND $FENCE_KDUMP_OPTS $nodes &
Chao,
I think make "nodes" local.
Also FENCE_KDUMP_OPTS is set by FENCE_KDUMP_CONFIG?
Thanks Vivek
On 01/22/14 at 11:44am, Vivek Goyal wrote:
On Mon, Jan 13, 2014 at 06:23:10PM +0800, WANG Chao wrote:
In 2nd kernel, to prevent the crashed system from being fenced off, fence kdump message must be send to other nodes in the cluster periodically before dumping process.
We preserve every node's name in /etc/fence_kdump_nodes in the initrd, so we parse this file and send notify them.
Signed-off-by: WANG Chao chaowang@redhat.com
dracut-kdump.sh | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/dracut-kdump.sh b/dracut-kdump.sh index 4d8616f..324408f 100755 --- a/dracut-kdump.sh +++ b/dracut-kdump.sh @@ -287,6 +287,19 @@ read_kdump_conf() done < $conf_file }
+fence_kdump_notify() +{
- if [ -f $FENCE_KDUMP_NODES ]; then
if [ -f $FENCE_KDUMP_CONFIG ]; then
. $FENCE_KDUMP_CONFIG
fi
read nodes < $FENCE_KDUMP_NODES
$FENCE_KDUMP_SEND $FENCE_KDUMP_OPTS $nodes &
Chao,
I think make "nodes" local.
Will change.
Also FENCE_KDUMP_OPTS is set by FENCE_KDUMP_CONFIG?
Yes, like we do for /etc/sysconfig/kdump.
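So /etc/sysconfig/fence_kdump would simply carry the extra arguments for fence_kdump_send, for example (the contents below are an illustrative assumption, not a shipped default):

  # /etc/sysconfig/fence_kdump (illustrative)
  # extra options passed to fence_kdump_send in the 2nd kernel
  FENCE_KDUMP_OPTS="-p 7410 -i 10"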
Thanks Vivek
This patch sets up the fence kdump environment when building the kdump initrd:
1. Check if the system is in a cluster and fence_kdump is configured.
2. Get all the nodes in the cluster and pass them to the 2nd kernel via /etc/fence_kdump_nodes.
3. Set up the network interfaces which will be used by the fence kdump notifier in the 2nd kernel.
4. Install the fence kdump notifier (/usr/libexec/fence_kdump_send) into the initrd.
Signed-off-by: WANG Chao chaowang@redhat.com
---
 dracut-module-setup.sh | 45 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh
index c013430..02f0280 100755
--- a/dracut-module-setup.sh
+++ b/dracut-module-setup.sh
@@ -20,6 +20,10 @@ depends() {
         _dep="$_dep drm"
     fi
 
+    if is_fence_kdump; then
+        _dep="$_dep network"
+    fi
+
     echo $_dep
     return 0
 }
@@ -234,9 +238,14 @@ kdump_install_net() {
     fi
 
     kdump_setup_netdev "${_netdev}"
+
     #save netdev used for kdump as cmdline
-    echo "kdumpnic=${_netdev}" > ${initdir}/etc/cmdline.d/60kdumpnic.conf
-    echo "bootdev=${_netdev}" > ${initdir}/etc/cmdline.d/70bootdev.conf
+    #fence kdump would override bootdev and kdumpnic, we should avoid that.
+    if [ ! -f ${initdir}${initdir}/etc/cmdline.d/60kdumpnic.conf ] &&
+       [ ! -f ${initdir}/etc/cmdline.d/70bootdev.conf ]; then
+        echo "kdumpnic=${_netdev}" > ${initdir}/etc/cmdline.d/60kdumpnic.conf
+        echo "bootdev=${_netdev}" > ${initdir}/etc/cmdline.d/70bootdev.conf
+    fi
 }
 
 #install kdump.conf and what user specifies in kdump.conf
@@ -263,6 +272,7 @@ kdump_install_conf() {
         esac
     done < /etc/kdump.conf
 
+    kdump_check_fence_kdump
     inst "/tmp/$$-kdump.conf" "/etc/kdump.conf"
     rm -f /tmp/$$-kdump.conf
 }
@@ -393,6 +403,37 @@ kdump_check_iscsi_targets () {
 }
 
+# setup fence_kdump in cluster
+# setup proper network and install needed files
+# also preserve '[node list]' for 2nd kernel /etc/fence_kdump_nodes
+kdump_check_fence_kdump () {
+    local nodes
+    is_fence_kdump || return 1
+
+    # get cluster nodes from cluster cib, get interface and ip address
+    nodelist=`pcs cluster cib | xmllint --xpath "/cib/status/node_state/@uname" -`
+
+    # nodelist is formed as 'uname="node1" uname="node2" ... uname="nodeX"'
+    # we need to convert each to node1, node2 ... nodeX in each iteration
+    for node in ${nodelist}; do
+        # convert $node from 'uname="nodeX"' to 'nodeX'
+        eval $node
+        nodename=$uname
+        # Skip its own node name
+        if [ "$nodename" = `hostname` ]; then
+            continue
+        fi
+        nodes="$nodes $nodename"
+
+        kdump_install_net $nodename
+    done
+    echo
+
+    echo "$nodes" > ${initdir}/$FENCE_KDUMP_NODES
+    dracut_install $FENCE_KDUMP_SEND
+    dracut_install -o $FENCE_KDUMP_CONFIG
+}
+
 install() {
     kdump_install_conf
     >"$initdir/lib/dracut/no-emergency-shell"
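To make the node-list handling concrete: for a three-node cluster, with the initrd being built on node1, the intermediate values would look roughly like this (node names are illustrative):

  # pcs cluster cib | xmllint --xpath "/cib/status/node_state/@uname" -
  #   -> uname="node1" uname="node2" uname="node3"
  #
  # after the loop skips the local hostname, the initrd contains:
  # $ cat /etc/fence_kdump_nodes
  #  node2 node3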
On Mon, Jan 13, 2014 at 06:23:11PM +0800, WANG Chao wrote:
[..]
#save netdev used for kdump as cmdline
- echo "kdumpnic=${_netdev}" > ${initdir}/etc/cmdline.d/60kdumpnic.conf
- echo "bootdev=${_netdev}" > ${initdir}/etc/cmdline.d/70bootdev.conf
- #fence kdump would override bootdev and kdumpnic, we should avoid that.
- if [ ! -f ${initdir}${initdir}/etc/cmdline.d/60kdumpnic.conf ] &&
[ ! -f ${initdir}/etc/cmdline.d/70bootdev.conf ]; then
echo "kdumpnic=${_netdev}" > ${initdir}/etc/cmdline.d/60kdumpnic.conf
echo "bootdev=${_netdev}" > ${initdir}/etc/cmdline.d/70bootdev.conf
- fi
Chao,
What's this change. Can you explain a bit.
Thanks Vivek
On 01/22/14 at 12:59pm, Vivek Goyal wrote:
On Mon, Jan 13, 2014 at 06:23:11PM +0800, WANG Chao wrote:
[..]
#save netdev used for kdump as cmdline
- echo "kdumpnic=${_netdev}" > ${initdir}/etc/cmdline.d/60kdumpnic.conf
- echo "bootdev=${_netdev}" > ${initdir}/etc/cmdline.d/70bootdev.conf
- #fence kdump would override bootdev and kdumpnic, we should avoid that.
- if [ ! -f ${initdir}${initdir}/etc/cmdline.d/60kdumpnic.conf ] &&
[ ! -f ${initdir}/etc/cmdline.d/70bootdev.conf ]; then
echo "kdumpnic=${_netdev}" > ${initdir}/etc/cmdline.d/60kdumpnic.conf
echo "bootdev=${_netdev}" > ${initdir}/etc/cmdline.d/70bootdev.conf
- fi
Chao,
What's this change. Can you explain a bit.
Hi, Vivek
In the case of a network dump, bootdev and kdumpnic should be set to the NIC that routes to the network target, so that dracut knows which NIC needs to be set up and configured as the gateway. It's fine to just set bootdev and kdumpnic without checking whether they're already set, because we know there is only one bootdev and one kdumpnic in the 2nd kernel.

However, when fence_kdump comes into the picture, there might be several NICs connecting to different targets (other cluster nodes and the network dump target). We need to ensure that the NIC connecting to the network dump target acts as the default one in the system (i.e. becomes bootdev and kdumpnic), and that the NICs connecting to the other cluster nodes are not in the default route path.

In kdump-module-setup.sh, I'm re-using the network setup function, so I figured out this way to avoid the fence_kdump NICs overriding bootdev and kdumpnic: the dump target NIC is set up earlier than the fence kdump NICs.

I know this part is a little bit tricky, but it does avoid some problems we might face in the case of a network dump in a cluster environment...
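To make the ordering concrete, in a combined network-dump + fence_kdump setup the generated cmdline.d files could end up like this (interface names and contents are illustrative only):

  60kdumpnic.conf:  kdumpnic=eth0     <- written while setting up the dump-target NIC
  70bootdev.conf:   bootdev=eth0
  40ip.conf:        ip=eth0:dhcp
                    ip=eth1:dhcp      <- added later for the cluster-facing NIC; 60kdumpnic.conf
                                         and 70bootdev.conf already exist and are left alone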
Thanks WANG Chao
On Fri, Jan 24, 2014 at 11:57:03AM +0800, WANG Chao wrote:
On 01/22/14 at 12:59pm, Vivek Goyal wrote:
On Mon, Jan 13, 2014 at 06:23:11PM +0800, WANG Chao wrote:
[..]
#save netdev used for kdump as cmdline
- echo "kdumpnic=${_netdev}" > ${initdir}/etc/cmdline.d/60kdumpnic.conf
- echo "bootdev=${_netdev}" > ${initdir}/etc/cmdline.d/70bootdev.conf
- #fence kdump would override bootdev and kdumpnic, we should avoid that.
- if [ ! -f ${initdir}${initdir}/etc/cmdline.d/60kdumpnic.conf ] &&
[ ! -f ${initdir}/etc/cmdline.d/70bootdev.conf ]; then
echo "kdumpnic=${_netdev}" > ${initdir}/etc/cmdline.d/60kdumpnic.conf
echo "bootdev=${_netdev}" > ${initdir}/etc/cmdline.d/70bootdev.conf
- fi
Chao,
What's this change. Can you explain a bit.
Hi, Vivek
In case of network dump, bootdev and kdumpnic should be set to the NIC routing to the network target, so that dracut is aware of which NIC needs to be setup and configured as the gateway. It's fine that we just set bootdev and kdumpnic without checking if these two are already set. Because we know there is only one bootdev and kdumpnic in 2nd kernel.
However when fence_kdump comes into the picture, there might be several NICs connecting to different target (different nodes and network dump target). We need to ensure that the NIC that connects to the network dump target works as the default one in the system (IE. to be bootdev and kdumpnic) and the NICs that connects to other cluster nodes not to be in the default route path.
In kdump-module-setup.sh, I'm re-using the setup network function. So I figure out this way to avoid fence_kdump NIC overriding bootdev and kdumpnic. Because setup dump target NIC is earlier then setup the fence kdump NIC.
I know this part is a little bit tricky. But it does avoid some problems we might face in case of a network dump in cluster environment...
OK, got it. So the first caller of kdump_install_net() gets to set the default gateway in the kdump kernel?

This is a little odd. I guess it will work for the time being, but I wish there were a better mechanism to determine which network interface should act as the default gateway in the second kernel.

Also, it would be good to put a few lines of comments explaining this in the code, so that the next time we read it, it is easier to understand.
Thanks Vivek
In the remote dump case, if fence kdump is also configured, it's almost 100% sure that the same network interface will be set up more than once: once for the network dump and again for fence kdump. The result is that we will have two or more duplicate ip= configurations in 40ip.conf.
These are exact duplicates; however, dracut will refuse to continue and raise a fatal error if there is duplicated configuration for the same interface. We should simply remove the duplicates to avoid this awkward situation.
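For example, 40ip.conf could end up with the same line twice; a sort | uniq pass (as in the patch below) collapses them (contents are illustrative):

  $ cat ${initdir}/etc/cmdline.d/40ip.conf
  ip=eth0:dhcp
  ip=eth0:dhcp
  $ sort ${initdir}/etc/cmdline.d/40ip.conf | uniq
  ip=eth0:dhcp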
Signed-off-by: WANG Chao chaowang@redhat.com
---
 dracut-module-setup.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh
index 02f0280..725949f 100755
--- a/dracut-module-setup.sh
+++ b/dracut-module-setup.sh
@@ -179,6 +179,13 @@ kdump_setup_znet() {
     echo rd.znet=${NETTYPE},${SUBCHANNELS}${_options} > ${initdir}/etc/cmdline.d/30znet.conf
 }
 
+# Remove duplicate ip configurations in 40ip.conf
+kdump_remove_dupliate_ip_opts() {
+    mv ${initdir}/etc/cmdline.d/40ip.conf ${initdir}/etc/cmdline.d/40ip.conf.tmp
+    sort ${initdir}/etc/cmdline.d/40ip.conf.tmp | uniq > ${initdir}/etc/cmdline.d/40ip.conf
+    rm -f ${initdir}/etc/cmdline.d/40ip.conf.tmp
+}
+
 # Setup dracut to bringup a given network interface
 kdump_setup_netdev() {
     local _netdev=$1
@@ -210,6 +217,10 @@ kdump_setup_netdev() {
         echo " ifname=$_netdev:$(kdump_get_mac_addr $_netdev)" >> ${initdir}/etc/cmdline.d/40ip.conf
     fi
 
+    # dracut doesn't allow duplicated configuration for same NIC, even they're exactly the same.
+    # so we have to filter out the duplicates.
+    kdump_remove_dupliate_ip_opts
+
     kdump_setup_dns "$_netdev"
 }
On Mon, Jan 13, 2014 at 06:23:12PM +0800, WANG Chao wrote:
In the remote dump case, and if fence kdump is configured, it's almost 100% sure that the same network interface will be setup more than once.
Don't call it 100%. It might happen that there are two network cards. One is serving cluster network and other is service other network where remote destination is.
One time for network dump, the other times for fence kdump. The result is we will have two or more duplicated ip= configuration in 40ip.conf.
These are exactly duplicates, however dracut will refuse to continue and raise a fatal error if there are duplicated configuration for the same interface. We should simply remove the duplicates to avoid this awkward situation.
Signed-off-by: WANG Chao chaowang@redhat.com
dracut-module-setup.sh | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh index 02f0280..725949f 100755 --- a/dracut-module-setup.sh +++ b/dracut-module-setup.sh @@ -179,6 +179,13 @@ kdump_setup_znet() { echo rd.znet=${NETTYPE},${SUBCHANNELS}${_options} > ${initdir}/etc/cmdline.d/30znet.conf }
+# Remove duplicate ip configurations in 40ip.conf +kdump_remove_dupliate_ip_opts() {
- mv ${initdir}/etc/cmdline.d/40ip.conf ${initdir}/etc/cmdline.d/40ip.conf.tmp
- sort ${initdir}/etc/cmdline.d/40ip.conf.tmp | uniq > ${initdir}/etc/cmdline.d/40ip.conf
- rm -f ${initdir}/etc/cmdline.d/40ip.conf.tmp
+}
Instead of removing duplicates later, why not check for duplicates while adding to the 40ip.conf file, and not add a line if the same configuration is already present?
Thanks Vivek
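A minimal sketch of that alternative, assuming all the write sites funnel through one helper (the function name and layout here are made up, not part of the posted patch):

  # append a line to 40ip.conf only if an identical line isn't already there
  kdump_add_ip_opt() {
      local _opt=$1
      local _conf=${initdir}/etc/cmdline.d/40ip.conf

      grep -qxF -- "$_opt" $_conf 2>/dev/null || echo "$_opt" >> $_conf
  }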
On 01/22/14 at 01:13pm, Vivek Goyal wrote:
On Mon, Jan 13, 2014 at 06:23:12PM +0800, WANG Chao wrote:
In the remote dump case, and if fence kdump is configured, it's almost 100% sure that the same network interface will be setup more than once.
Don't call it 100%. It might happen that there are two network cards. One is serving cluster network and other is service other network where remote destination is.
Yes, I'll change.
One time for network dump, the other times for fence kdump. The result is we will have two or more duplicated ip= configuration in 40ip.conf.
These are exactly duplicates, however dracut will refuse to continue and raise a fatal error if there are duplicated configuration for the same interface. We should simply remove the duplicates to avoid this awkward situation.
Signed-off-by: WANG Chao chaowang@redhat.com
dracut-module-setup.sh | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/dracut-module-setup.sh b/dracut-module-setup.sh index 02f0280..725949f 100755 --- a/dracut-module-setup.sh +++ b/dracut-module-setup.sh @@ -179,6 +179,13 @@ kdump_setup_znet() { echo rd.znet=${NETTYPE},${SUBCHANNELS}${_options} > ${initdir}/etc/cmdline.d/30znet.conf }
+# Remove duplicate ip configurations in 40ip.conf +kdump_remove_dupliate_ip_opts() {
- mv ${initdir}/etc/cmdline.d/40ip.conf ${initdir}/etc/cmdline.d/40ip.conf.tmp
- sort ${initdir}/etc/cmdline.d/40ip.conf.tmp | uniq > ${initdir}/etc/cmdline.d/40ip.conf
- rm -f ${initdir}/etc/cmdline.d/40ip.conf.tmp
+}
Instead of removing duplicates later, why not check for duplicates while adding it to 40ip.conf file and not add it if same configuration is already present.
I think either approach ends up with the same result. But since you and dyoung both suggested it, I'll change it.
On 01/13/2014 11:23 AM, WANG Chao wrote:
This is a patchset to add fence kdump support.
In cluster environment, fence kdump is used to notify all the other nodes that current is crashed and stop from being fenced off.
The patchset has the following features:
- rebuild kdump initrd regarding timestamp of fence kdump config or cluster configuration.
- setup a required working environment for fence kdump in 2nd kernel.
- fence_kdump_send notify other nodes to stop the crashed one being fenced off before dumping process.
- add kdump-in-cluster-environment.txt
Hi,
I have tested this patch on my cluster environment (2-node virtual cluster) and it works correctly. Great work guys.
There are some steps which are not intuitive enough, so I'm including my test scenario:
1) standard installation and configuration of kexec-tools
2) standard installation of corosync/pacemaker cluster
3) setup fence_kdump [integration is not seamless but problem is not in kexec-tools]

   pcs stonith update myfence pcmk_monitor_action=metadata --force
   pcs stonith update myfence pcmk_status_action=metadata --force
   pcs stonith update myfence pcmk_reboot_action=off --force

   if you have a lot of memory, you should set fence_kdump to wait longer (default 60 seconds)
   pcs stonith update myfence pcmk_reboot_timeout=600 --force

   this is an example output of my fence agent (pcs stonith show myfence):
    Resource: myfence (class=stonith type=fence_kdump)
     Attributes: pcmk_host_list="r7a r7b" pcmk_host_check=static-list pcmk_monitor_action=metadata pcmk_status_action=metadata pcmk_reboot_action=off
     Operations: monitor interval=60s (myfence-monitor-interval-60s)

4) kdumpctl restart (on both nodes)
5) on node B run: echo c > /proc/sysrq-trigger
6) on node A check /var/log/messages, you should find there:
   Jan 13 12:24:39 nodeA fence_kdump[10862]: waiting for message from '192.168.122.52'
   Jan 13 12:24:41 nodeA fence_kdump[10862]: received valid message from '192.168.122.52'
This means that nodeB currently executed fence_kdump_send
m,
Hi, Marek
if you have a lot of memory, you should set fence_kdump to wait longer (default 60 seconds) pcs stonith update myfence pcmk_reboot_timeout=600 --force
A large-memory system might need hours to finish capturing the vmcore, so 60 seconds is not enough. Could you help increase the default value? I think there's no side effect to setting it to INT_MAX?
Thanks Dave
On 01/16/14 at 11:35am, Dave Young wrote:
Hi, Marek
if you have a lot of memory, you should set fence_kdump to wait longer (default 60 seconds) pcs stonith update myfence pcmk_reboot_timeout=600 --force
Large memory system might need hours to finish the vmcore capturing so 60 seconds is not enough, could you help to increase the default value? I think there's no side effect to set it as INT_MAX?
I'm quite confused by the option "pcmk_reboot_timeout".

In the kdump environment, fence_kdump_send runs in the background and sends an acknowledgement message out every 10 seconds (by default), indefinitely (by default).

So how does the timeout work?
1. Will it be reset and count down again when a valid message is received from the crashed node?
2. Will the fence kdump agent wait for the timeout after the very first message is received?
Thanks WANG Chao
Hi,
On 01/16/2014 06:14 AM, WANG Chao wrote:
On 01/16/14 at 11:35am, Dave Young wrote:
Hi, Marek
if you have a lot of memory, you should set fence_kdump to wait longer (default 60 seconds) pcs stonith update myfence pcmk_reboot_timeout=600 --force
Large memory system might need hours to finish the vmcore capturing so 60 seconds is not enough, could you help to increase the default value? I think there's no side effect to set it as INT_MAX?
I'm a lot confused with the option "pcmk_reboot_timeout".
In kdump environment, fence_kdump_send is running background and send an acknowledge message out every 10 seconds (by default) indefinately (by default).
So how does the timeout works?
- Will it be reset and count down again when it receives a valid message from the crashed node?
- Will fence kdump agent wait for time out after the very first message is received?
pcmk_reboot_timeout:
* Specifies an alternate timeout to use for reboot actions. If the command is not finished in time, it is considered that fencing failed. This option is set at a higher level than the fence agent itself and it controls all fence agents, so it is not affected by a valid message from the crashed node. Such a thing could be controlled directly from fence_kdump, but the defaults here are not a problem (if no valid message is obtained in 60 seconds, then fencing failed).
* The default value is 60s (the command provided above sets it to 600 seconds, which is more than enough on our testing machines).
m,
On Wed, Jan 22, 2014 at 10:19:45AM +0100, Marek Grac wrote:
Hi,
On 01/16/2014 06:14 AM, WANG Chao wrote:
On 01/16/14 at 11:35am, Dave Young wrote:
Hi, Marek
if you have a lot of memory, you should set fence_kdump to wait longer (default 60 seconds) pcs stonith update myfence pcmk_reboot_timeout=600 --force
Large memory system might need hours to finish the vmcore capturing so 60 seconds is not enough, could you help to increase the default value? I think there's no side effect to set it as INT_MAX?
I'm a lot confused with the option "pcmk_reboot_timeout".
In kdump environment, fence_kdump_send is running background and send an acknowledge message out every 10 seconds (by default) indefinately (by default).
So how does the timeout works?
- Will it be reset and count down again when it receives a valid message from the crashed node?
- Will fence kdump agent wait for time out after the very first message is received?
pcmk_reboot_timeout: * Specify an alternate timeout to use for reboot actions|. |If the command is not finished in time, it is considered that fencing failed. This option is set on a higher level then in fence agen itself and it controls all fence agents - so it is not impacted by valid message from the crash node. Such thing is possible to control directly from fence_kdump but defaults here are not a problem (if no valid message is obtained in 60 seconds, then fencing failed)
What is "fencing failed"? What happens if fencing failed?
So if pcmk_reboot_timeout is 60 seconds, and the dump does not finish in 60 seconds, what happens? Will the crashed node be power cycled and kdump fail?
Thanks Vivek
On 01/22/2014 07:18 PM, Vivek Goyal wrote:
What is "fencing failed"? What happens if fencing failed?
So if pcmk_reboot_timeout is 60 seconds, and dump did not finish in 60 seoncds, what happens? Crashed node will be power cycled and kdump will fail?
It means that the cluster will try the next attempt to reboot/power off the machine. Usually there is an additional layer (e.g. IPMI, power switches) which just reboots the machine, so it cannot disturb the running cluster (corrupt shared data).
This situation is not fence_kdump specific; it can also happen with hardware fence devices, e.g. when you cannot log in.
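To illustrate that additional layer, fencing levels can be configured so that fence_kdump is tried first and a power-based device is used only as fallback. This is a sketch: myipmifence stands in for whatever hardware fence resource the cluster actually has, node1 for the node being protected, and mykdumpfence for the fence_kdump resource:

    # pcs stonith level add 1 node1 mykdumpfence
    # pcs stonith level add 2 node1 myipmifence

If the level 1 fence_kdump attempt fails (for example, no message arrives within its timeout), the cluster moves on to the level 2 device and power cycles the node.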
m,
On Mon, Jan 13, 2014 at 01:39:11PM +0100, Marek Grac wrote:
[..]
If you have a lot of memory, you should set fence_kdump to wait longer (default 60 seconds): pcs stonith update myfence pcmk_reboot_timeout=600 --force
Hi Marek,
I think this is a problem. How would we know in advance how long it will take for the dump to finish? And it will vary depending on so many things (size of memory, speed of network, etc.).
By default, why can't this value be very high? Or this value could act more like a watchdog: as long as you keep getting ticks, you keep resetting the internal counter. If you don't get a tick (a message from the node which is saving the vmcore) for 60 seconds, then you assume that something went wrong with the node and power cycle it.
Trying to keep an upper limit of 60 seconds and assuming the dump will finish in this time will not help.
Thanks Vivek
On 01/22/2014 07:33 PM, Vivek Goyal wrote:
On Mon, Jan 13, 2014 at 01:39:11PM +0100, Marek Grac wrote:
[..]
If you have a lot of memory, you should set fence_kdump to wait longer (default 60 seconds): pcs stonith update myfence pcmk_reboot_timeout=600 --force
Hi Marek,
I think this is a problem. How would we know in advance how long it will take for the dump to finish? And it will vary depending on so many things (size of memory, speed of network, etc.).
You don't need to know this in advance. This is set on the cluster side, and the administrator should be able to set this timeout to a proper value.
By default, why can't this value be very high? Or this value could act more like a watchdog: as long as you keep getting ticks, you keep resetting the internal counter. If you don't get a tick (a message from the node which is saving the vmcore) for 60 seconds, then you assume that something went wrong with the node and power cycle it.
Trying to keep an upper limit of 60 seconds and assuming the dump will finish in this time will not help.
This is a general fence agent setting in the cluster, and fence_kdump is the only agent that uses a 'ticking' mechanism; all others should finish in a much more fixed time. Setting this value for the kdump agent is fine, as fence_kdump itself contains a different timeout mechanism which is based on 'ticks'. I agree that it should be explained in documentation/kbase, but it is not something that can be changed at the fence agent level.
The cluster (pacemaker/corosync) accepts that some fence agents are slower than others, so it is possible to set this timeout value per instance of an agent with the given command.
On Thu, Jan 23, 2014 at 10:53:51AM +0100, Marek Grac wrote:
[..]
I think this is a problem. How would we know in advance how long it will take for the dump to finish? And it will vary depending on so many things (size of memory, speed of network, etc.).
You don't need to know this in advance. This is set on the cluster side, and the administrator should be able to set this timeout to a proper value.
How would the cluster admin know how long it will take to save the dump, and what's the right value for this parameter?
By default, why can't this value be very high? Or this value could act more like a watchdog: as long as you keep getting ticks, you keep resetting the internal counter. If you don't get a tick (a message from the node which is saving the vmcore) for 60 seconds, then you assume that something went wrong with the node and power cycle it.
Trying to keep an upper limit of 60 seconds and assuming the dump will finish in this time will not help.
This is a general fence agent setting in the cluster, and fence_kdump is the only agent that uses a 'ticking' mechanism; all others should finish in a much more fixed time. Setting this value for the kdump agent is fine, as fence_kdump itself contains a different timeout mechanism which is based on 'ticks'. I agree that it should be explained in documentation/kbase, but it is not something that can be changed at the fence agent level.
So are you saying that the 60 seconds above is not the total time taken to dump? Instead, it is the duration in which at least one message from fence_kdump should be received, after which the timer resets. And it should receive another message within 60 seconds, and it keeps going like this.
IOW, as long as fence_kdump keeps sending messages to the manager/nodes every 60 seconds, theoretically the dump could take infinitely long?
Thanks Vivek
On 01/23/2014 04:04 PM, Vivek Goyal wrote:
On Thu, Jan 23, 2014 at 10:53:51AM +0100, Marek Grac wrote:
[..]
I think this is a problem. How would we know in advance how long it will take for the dump to finish? And it will vary depending on so many things (size of memory, speed of network, etc.).
You don't need to know this in advance. This is set on the cluster side, and the administrator should be able to set this timeout to a proper value.
How would the cluster admin know how long it will take to save the dump, and what's the right value for this parameter?
Documentation, but mainly it is a matter of experience and testing. It was the same in previous versions.
By default, why can't this value be very high? Or this value could act more like a watchdog: as long as you keep getting ticks, you keep resetting the internal counter. If you don't get a tick (a message from the node which is saving the vmcore) for 60 seconds, then you assume that something went wrong with the node and power cycle it.
Trying to keep an upper limit of 60 seconds and assuming the dump will finish in this time will not help.
This is a general fence agent setting in the cluster, and fence_kdump is the only agent that uses a 'ticking' mechanism; all others should finish in a much more fixed time. Setting this value for the kdump agent is fine, as fence_kdump itself contains a different timeout mechanism which is based on 'ticks'. I agree that it should be explained in documentation/kbase, but it is not something that can be changed at the fence agent level.
So are you saying that the 60 seconds above is not the total time taken to dump? Instead, it is the duration in which at least one message from fence_kdump should be received, after which the timer resets. And it should receive another message within 60 seconds, and it keeps going like this.
IOW, as long as fence_kdump keeps sending messages to the manager/nodes every 60 seconds, theoretically the dump could take infinitely long?
Nope. The default is 60 seconds for the fence agent; after that the cluster decides that it failed - this is tunable.
If you set this value to a really high number (like 1 day), then it will work with fence_kdump, because if there is no 'tick' it will fail and the timeout will not be applied. In general we can say that the admin can set it to such a high number without risk. But if there is a problem in fence_kdump (we believe that this is not the case), it is possible that the node will continue and potentially destroy data. I wanted to add a link to a fence_kdump technical paper, but unfortunately it is not online anymore (I will contact the author)
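For example, an admin who trusts fence_kdump's tick mechanism could raise the cluster-side limit to a full day with the same command already quoted in this thread; the value below is purely illustrative and mykdumpfence is the example resource name:

    # pcs stonith update mykdumpfence pcmk_reboot_timeout=86400 --force

The fencing attempt would still fail promptly if fence_kdump stops receiving messages, since the agent's own 60-second tick timeout is unaffected by this setting.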
m,
On Thu, Jan 23, 2014 at 08:21:58PM +0100, Marek Grac wrote:
[..]
How would the cluster admin know how long it will take to save the dump, and what's the right value for this parameter?
Documentation, but mainly it is a matter of experience and testing. It was the same in previous versions.
But the dump time varies based on machine type. So if you add a machine with a large amount of memory to the cluster, it could easily take 30 minutes to dump.
And there is no documentation which explains how much time it will take to dump. Nobody knows.
[..]
IOW, as long as fence_kdump keeps sending messages to the manager/nodes every 60 seconds, theoretically the dump could take infinitely long?
Nope. The default is 60 seconds for the fence agent; after that the cluster decides that it failed - this is tunable.
If you set this value to a really high number (like 1 day), then it will work with fence_kdump, because if there is no 'tick' it will fail and the timeout will not be applied. In general we can say that the admin can set it to such a high number without risk. But if there is a problem in fence_kdump (we believe that this is not the case), it is possible that the node will continue and potentially destroy data. I wanted to add a link to a fence_kdump technical paper, but unfortunately it is not online anymore (I will contact the author)
I am sorry, I still don't understand how this timeout logic works.
- Is it a tick-based mechanism where 60 seconds represents the interval in which at least one tick should be received?
- Or is it an absolute upper limit on the time in which the dump should be completed?
Thanks Vivek
On 01/24/2014 03:48 PM, Vivek Goyal wrote:
But the dump time varies based on machine type. So if you add a machine with a large amount of memory to the cluster, it could easily take 30 minutes to dump.
And there is no documentation which explains how much time it will take to dump. Nobody knows.
Yes, that's true.
I am sorry, I still don't understand how this timeout logic works.
Is it a tick-based mechanism where 60 seconds represents the interval in which at least one tick should be received?
Or is it an absolute upper limit on the time in which the dump should be completed?
The problem is that I did not describe it precisely enough, so don't worry. There are two timeouts: * fence_kdump - a tick-based mechanism, 60 seconds to receive a valid message, upper bound is infinity - usable everywhere, even without a cluster
* using fence_kdump with a pacemaker/corosync cluster - the most usual combination - the cluster has its upper limit in which fencing has to finish, otherwise it is considered to have failed - this timeout has to be set to a value which is system-dependent - fencing will fail if the previous timeout is not enough and if no message was received in 60 seconds (fence_kdump fails -> fencing fails)
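A sketch of how the two timeouts map onto configuration, assuming the fence_kdump agent exposes its -t/--timeout option as a 'timeout' resource parameter (this is an assumption, check the agent metadata) and using the example resource name mykdumpfence:

    # pcs stonith update mykdumpfence timeout=120 --force
      (fence_kdump's own tick window: fail if no valid message arrives within 120 seconds)
    # pcs stonith update mykdumpfence pcmk_reboot_timeout=600 --force
      (the cluster's upper limit for the whole fencing action)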
m,
On Mon, Jan 27, 2014 at 09:22:39AM +0100, Marek Grac wrote:
On 01/24/2014 03:48 PM, Vivek Goyal wrote:
But the dump time varies based on machine type. So if you add a machine with a large amount of memory to the cluster, it could easily take 30 minutes to dump.
And there is no documentation which explains how much time it will take to dump. Nobody knows.
Yes, that's true.
I am sorry, I still don't understand how this timeout logic works.
Is it a tick-based mechanism where 60 seconds represents the interval in which at least one tick should be received?
Or is it an absolute upper limit on the time in which the dump should be completed?
The problem is that I did not describe it precisely enough, so don't worry. There are two timeouts:
- fence_kdump
  - a tick-based mechanism, 60 seconds to receive a valid message, upper bound is infinity - usable everywhere, even without a cluster
- using fence_kdump with a pacemaker/corosync cluster
  - the most usual combination
  - the cluster has its upper limit in which fencing has to finish, otherwise it is considered to have failed - this timeout has to be set to a value which is system-dependent
So what's the default value of this system-dependent timeout?
  - fencing will fail if the previous timeout is not enough and if no message was received in 60 seconds (fence_kdump fails -> fencing fails)
IIUC, you are saying that there are two timeouts in effect. One says that every 60 seconds a message should be received from fence_kdump. And the other timeout is a global upper limit set by the cluster admin, and the dump should finish in that time.
So the first, tick-based fence_kdump timeout should not be a problem. The only problem will be this absolute upper-limit timeout for the cluster. I am curious to know what the default value for this timeout is.
Thanks Vivek
On 01/27/2014 04:08 PM, Vivek Goyal wrote:
IIUC, you are saying that there are two timeouts in effect. One says that every 60 seconds a message should be received from fence_kdump. And the other timeout is a global upper limit set by the cluster admin, and the dump should finish in that time.
Yes, you are right.
So the first, tick-based fence_kdump timeout should not be a problem. The only problem will be this absolute upper-limit timeout for the cluster. I am curious to know what the default value for this timeout is.
The default value for this is 60 seconds (variable pcmk_reboot_timeout).
source: http://clusterlabs.org/man/stonithd.7.html
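If the installed pcs version supports it (an assumption worth verifying), the currently configured options of the resource can be inspected before deciding whether to raise this default, e.g. for the example resource name mykdumpfence:

    # pcs stonith show mykdumpfence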
m,