January 2014 - cluster-commits - Fedora Mailing-Lists

fence-agents: RHEL6 - fence_vmware_soap: Add delay option

by Marek Grác

Gitweb:        http://git.fedorahosted.org/git/?p=fence-agents.git;a=commitdiff;h=da403a...
Commit:        da403ab459474ba70303b2b46c9459c2872d6d68
Parent:        7b86da6001e65c9dc309bf4956f1bced9a894c5f
Author:        Marek 'marx' Grac <mgrac(a)redhat.com>
AuthorDate:    Fri Jan 24 13:39:36 2014 +0100
Committer:     Marek 'marx' Grac <mgrac(a)redhat.com>
CommitterDate: Fri Jan 24 13:39:36 2014 +0100

fence_vmware_soap: Add delay option

Remove duplicity of "delay" option in metadata. In general, this should be
solved in fencing library but we do not want to change that much in this
branch.

Resolves: rhbz#1051159
---
 fence/agents/vmware_soap/fence_vmware_soap.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fence/agents/vmware_soap/fence_vmware_soap.py b/fence/agents/vmware_soap/fence_vmware_soap.py
index 8f1ff6a..e6e62ac 100644
--- a/fence/agents/vmware_soap/fence_vmware_soap.py
+++ b/fence/agents/vmware_soap/fence_vmware_soap.py
@@ -172,7 +172,7 @@ def remove_tmp_dir(tmp_dir):
 def main():
     device_opt = [ "help", "version", "agent", "quiet", "verbose", "debug",
         "action", "ipaddr", "login", "passwd", "passwd_script",
-        "ssl", "port", "uuid", "separator", "ipport", "delay",
+        "ssl", "port", "uuid", "separator", "ipport",
         "power_timeout", "shell_timeout", "login_timeout", "power_wait" ]
     atexit.register(atexit_handler)


fence-agents: RHEL6 - fence_vmware_soap: Add delay option

by Marek Grác

Gitweb:        http://git.fedorahosted.org/git/?p=fence-agents.git;a=commitdiff;h=7b86da...
Commit:        7b86da6001e65c9dc309bf4956f1bced9a894c5f
Parent:        1c392c0c6fa2bfd89a349d7d618f686d07e6fa0b
Author:        Marek 'marx' Grac <mgrac(a)redhat.com>
AuthorDate:    Thu Jan 23 19:48:37 2014 +0100
Committer:     Marek 'marx' Grac <mgrac(a)redhat.com>
CommitterDate: Thu Jan 23 19:48:37 2014 +0100

fence_vmware_soap: Add delay option

Resolves: rhbz#1051159
---
 fence/agents/vmware_soap/fence_vmware_soap.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fence/agents/vmware_soap/fence_vmware_soap.py b/fence/agents/vmware_soap/fence_vmware_soap.py
index d73e323..8f1ff6a 100644
--- a/fence/agents/vmware_soap/fence_vmware_soap.py
+++ b/fence/agents/vmware_soap/fence_vmware_soap.py
@@ -1,6 +1,6 @@
 #!/usr/bin/python
-import sys, re, pexpect, exceptions
+import sys, re, pexpect, exceptions, time
 import shutil, tempfile
 sys.path.append("@FENCEAGENTSLIBDIR@")


fence-agents: RHEL6 - fence_vmware_soap: Add delay option

by Marek Grác

Gitweb:        http://git.fedorahosted.org/git/?p=fence-agents.git;a=commitdiff;h=1c392c...
Commit:        1c392c0c6fa2bfd89a349d7d618f686d07e6fa0b
Parent:        9d23663ae93316eedb5c925f719a21ea74e9f59f
Author:        Marek 'marx' Grac <mgrac(a)redhat.com>
AuthorDate:    Thu Jan 23 19:14:45 2014 +0100
Committer:     Marek 'marx' Grac <mgrac(a)redhat.com>
CommitterDate: Thu Jan 23 19:14:45 2014 +0100

fence_vmware_soap: Add delay option

Resolves: rhbz#1051159
---
 fence/agents/vmware_soap/fence_vmware_soap.py |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fence/agents/vmware_soap/fence_vmware_soap.py b/fence/agents/vmware_soap/fence_vmware_soap.py
index 6abf626..d73e323 100644
--- a/fence/agents/vmware_soap/fence_vmware_soap.py
+++ b/fence/agents/vmware_soap/fence_vmware_soap.py
@@ -16,6 +16,9 @@ BUILD_DATE="April, 2011"
 #END_VERSION_GENERATION
 def soap_login(options):
+    if options.has_key("-f"):
+        time.sleep(int(options["-f"]))
+
     if options.has_key("-z"):
         url = "https://"
     else:
@@ -169,7 +172,7 @@ def remove_tmp_dir(tmp_dir):
 def main():
     device_opt = [ "help", "version", "agent", "quiet", "verbose", "debug",
         "action", "ipaddr", "login", "passwd", "passwd_script",
-        "ssl", "port", "uuid", "separator", "ipport",
+        "ssl", "port", "uuid", "separator", "ipport", "delay",
         "power_timeout", "shell_timeout", "login_timeout", "power_wait" ]
     atexit.register(atexit_handler)
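
The soap_login() hunk in this commit sleeps for the configured number of seconds before the agent logs in. A minimal Python 3 sketch of that idea follows, assuming (as in the diff) that the delay arrives as a string under the "-f" key of an options dictionary. The helper name apply_delay and the injectable sleep parameter are illustrative only and are not part of the agent.

```python
import time


def apply_delay(options, sleep=time.sleep):
    """Sleep for options["-f"] seconds if a delay was requested.

    Mirrors the check added to soap_login(); the `in` operator replaces
    the Python 2 only dict.has_key(). Returns the delay that was applied.
    `sleep` is injectable (hypothetical parameter) so the behaviour can be
    exercised without actually waiting.
    """
    if "-f" in options:
        seconds = int(options["-f"])
        sleep(seconds)
        return seconds
    return 0


if __name__ == "__main__":
    # Record the requested sleep instead of really waiting.
    requested = []
    apply_delay({"-f": "5"}, sleep=requested.append)
    print(requested)
```

Passing the sleep function in keeps the sketch testable; the agent itself calls time.sleep() directly.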


fence-agents: master - fence_kdump: Add vendor-url to metadata

by Marek Grác

Gitweb:        http://git.fedorahosted.org/git/?p=fence-agents.git;a=commitdiff;h=849d0d...
Commit:        849d0dba262c2111446fb5a03040b22146c35726
Parent:        cc04df682a343c6627c250cffc0f4d60383a7baa
Author:        Marek 'marx' Grac <mgrac(a)redhat.com>
AuthorDate:    Thu Jan 23 18:29:35 2014 +0100
Committer:     Marek 'marx' Grac <mgrac(a)redhat.com>
CommitterDate: Thu Jan 23 18:29:35 2014 +0100

fence_kdump: Add vendor-url to metadata

Resolves: rhbz#1022529
---
 fence/agents/kdump/fence_kdump.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fence/agents/kdump/fence_kdump.c b/fence/agents/kdump/fence_kdump.c
index fa1f6a4..cae9842 100644
--- a/fence/agents/kdump/fence_kdump.c
+++ b/fence/agents/kdump/fence_kdump.c
@@ -178,6 +178,7 @@ do_action_metadata (const char *self)
     fprintf (stdout, "<longdesc>");
     fprintf (stdout, "The fence_kdump agent is intended to be used with with kdump service.");
     fprintf (stdout, "</longdesc>\n");
+    fprintf (stdout, "<vendor-url>http://www.kernel.org/pub/linux/utils/kernel/kexec/</vendor-url>\n");
     fprintf (stdout, "<parameters>\n");


fence-agents: master - fence_scsi: Change path to corosync from /sbin to /usr/sbin

by Marek Grác

Gitweb:        http://git.fedorahosted.org/git/?p=fence-agents.git;a=commitdiff;h=cc04df...
Commit:        cc04df682a343c6627c250cffc0f4d60383a7baa
Parent:        116512c174f4acef0faee4459158c45ddf6922d2
Author:        Marek 'marx' Grac <mgrac(a)redhat.com>
AuthorDate:    Thu Jan 23 17:32:25 2014 +0100
Committer:     Marek 'marx' Grac <mgrac(a)redhat.com>
CommitterDate: Thu Jan 23 17:32:25 2014 +0100

fence_scsi: Change path to corosync from /sbin to /usr/sbin

/sbin is just a symlink to /usr/sbin - so it does not impact functionality
---
 fence/agents/scsi/fence_scsi.pl |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fence/agents/scsi/fence_scsi.pl b/fence/agents/scsi/fence_scsi.pl
index 3ad0f09..6808ff5 100644
--- a/fence/agents/scsi/fence_scsi.pl
+++ b/fence/agents/scsi/fence_scsi.pl
@@ -429,7 +429,7 @@ sub get_node_id ($)
     my $self = (caller(0))[3];
     my $node = $_[0];
-    my $cmd = "/sbin/corosync-cmapctl nodelist.";
+    my $cmd = "/usr/sbin/corosync-cmapctl nodelist.";
     my @out = qx { $cmd 2> /dev/null };
     my $err = ($?>>8);
@@ -454,7 +454,7 @@ sub get_cluster_id ()
     my $self = (caller(0))[3];
     my $cluster_id;
-    my $cmd = "/sbin/corosync-cmapctl totem.cluster_name";
+    my $cmd = "/usr/sbin/corosync-cmapctl totem.cluster_name";
     my $out = qx { $cmd 2> /dev/null };
     my $err = ($?>>8);


fence-agents: master - fence_scsi: Replace automatic key generation to work with corosync clusters instead of cman

by Marek Grác

Gitweb:        http://git.fedorahosted.org/git/?p=fence-agents.git;a=commitdiff;h=116512...
Commit:        116512c174f4acef0faee4459158c45ddf6922d2
Parent:        8b127ebff6a38b0c6dd9c2a1ad738e2d7637e0fa
Author:        Marek 'marx' Grac <mgrac(a)redhat.com>
AuthorDate:    Wed Jan 22 15:35:20 2014 +0100
Committer:     Marek 'marx' Grac <mgrac(a)redhat.com>
CommitterDate: Wed Jan 22 15:35:20 2014 +0100

fence_scsi: Replace automatic key generation to work with corosync clusters instead of cman

Resolves: rhbz#994466
---
 fence/agents/scsi/fence_scsi.pl |   38 ++++++++++++++++++++++----------------
 1 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/fence/agents/scsi/fence_scsi.pl b/fence/agents/scsi/fence_scsi.pl
index c959417..3ad0f09 100644
--- a/fence/agents/scsi/fence_scsi.pl
+++ b/fence/agents/scsi/fence_scsi.pl
@@ -5,6 +5,7 @@ use File::Basename;
 use File::Path;
 use Getopt::Std;
 use POSIX;
+use B;
 #BEGIN_VERSION_GENERATION
 $RELEASE_VERSION="";
@@ -426,10 +427,10 @@ sub get_key ($)
 sub get_node_id ($)
 {
     my $self = (caller(0))[3];
-    my $node_id;
+    my $node = $_[0];
-    my $cmd = "cman_tool nodes -n $_[0] -F id";
-    my $out = qx { $cmd 2> /dev/null };
+    my $cmd = "/sbin/corosync-cmapctl nodelist.";
+    my @out = qx { $cmd 2> /dev/null };
     my $err = ($?>>8);
     if ($err != 0) {
@@ -438,11 +439,14 @@ sub get_node_id ($)
     # die "[error]: $self\n" if ($?>>8);
-    chomp ($out);
-
-    $node_id = $out;
-
-    return ($node_id);
+    foreach my $line (@out) {
+        chomp($line);
+        if ($line =~ /.(\d+?).ring._addr \(str\) = ${node}$/) {
+            return $1;
+        }
+    }
+
+    log_error("$self (unable to parse output of corosync-cmapctl or node does not exist)");
 }
 sub get_cluster_id ()
@@ -450,8 +454,8 @@ sub get_cluster_id ()
     my $self = (caller(0))[3];
     my $cluster_id;
-    my $cmd = "cman_tool status";
-    my @out = qx { $cmd 2> /dev/null };
+    my $cmd = "/sbin/corosync-cmapctl totem.cluster_name";
+    my $out = qx { $cmd 2> /dev/null };
     my $err = ($?>>8);
     if ($err != 0) {
@@ -460,12 +464,14 @@ sub get_cluster_id ()
     # die "[error]: $self\n" if ($?>>8);
-    foreach (@out) {
-        chomp;
-        my ($param, $value) = split (/\s*:\s*/, $_);
-        if ($param =~ /^cluster\s+id/i) {
-            $cluster_id = $value;
-        }
+    chomp($out);
+
+    if ($out =~ /=\s(.*?)$/) {
+        my $cluster_name = $1;
+        # tranform string to a number
+        $cluster_id = (hex B::hash($cluster_name)) % 65536;
+    } else {
+        log_error("$self (unable to parse output of corosync-cmapctl)");
     }
     return ($cluster_id);
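
The two parsing steps in this commit can be sketched in Python: find the number captured from the nodelist line whose ring address matches the node name, and turn the cluster name into a number below 65536. This is only an illustration under stated assumptions. The function names are hypothetical, the sample corosync-cmapctl lines are assumed rather than taken from the commit, and zlib.crc32 is swapped in for Perl's B::hash, so the resulting IDs will not match those produced by fence_scsi.

```python
import re
import zlib


def parse_node_id(lines, node):
    """Return the number captured from the nodelist line for `node`, or None.

    Follows the idea of the Perl regex in the diff: the digits that precede
    ".ringN_addr (str) = <node>" are returned.
    """
    pattern = re.compile(
        r"\.(\d+)\.ring\d_addr \(str\) = " + re.escape(node) + r"$"
    )
    for line in lines:
        match = pattern.search(line.rstrip("\n"))
        if match:
            return int(match.group(1))
    return None


def cluster_id_from_output(output):
    """Derive a 16-bit number from 'totem.cluster_name (str) = NAME' output.

    Assumption: CRC32 stands in for Perl's B::hash, so the value differs
    from what the agent computes; only the "hash, then modulo 65536" shape
    is the same.
    """
    match = re.search(r"=\s(.*?)$", output.strip())
    if match is None:
        return None
    cluster_name = match.group(1)
    return zlib.crc32(cluster_name.encode("utf-8")) % 65536


if __name__ == "__main__":
    # Assumed sample output, for illustration only.
    nodelist = [
        "nodelist.node.0.nodeid (u32) = 1\n",
        "nodelist.node.0.ring0_addr (str) = node1\n",
        "nodelist.node.1.nodeid (u32) = 2\n",
        "nodelist.node.1.ring0_addr (str) = node2\n",
    ]
    print(parse_node_id(nodelist, "node2"))
    print(cluster_id_from_output("totem.cluster_name (str) = mycluster\n"))
```

Unlike the Perl pattern, the sketch escapes the node name before building the regular expression, so names containing dots are matched literally.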


fence-agents: master - fencing: Fabric fence agents should have default action "off"

by Marek Grác

Gitweb:        http://git.fedorahosted.org/git/?p=fence-agents.git;a=commitdiff;h=8b127e...
Commit:        8b127ebff6a38b0c6dd9c2a1ad738e2d7637e0fa
Parent:        530e97f05e43bdd5bef9d24c75d4cc3057a491e8
Author:        Marek 'marx' Grac <mgrac(a)redhat.com>
AuthorDate:    Wed Jan 22 13:51:50 2014 +0100
Committer:     Marek 'marx' Grac <mgrac(a)redhat.com>
CommitterDate: Wed Jan 22 13:51:50 2014 +0100

fencing: Fabric fence agents should have default action "off"

Previously, when you have run fence agent without -o XYZ, reboot was
performed. Fabric fence agents do not have them so fence agent fails. This
update does not fix only this issue but also text --help and in manual pages.

Resolves: rhbz#1021392
---
 fence/agents/lib/fencing.py.py |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fence/agents/lib/fencing.py.py b/fence/agents/lib/fencing.py.py
index 9cc7407..889bb04 100644
--- a/fence/agents/lib/fencing.py.py
+++ b/fence/agents/lib/fencing.py.py
@@ -618,6 +618,10 @@ def check_input(device_opt, opt):
     else:
         all_opt["login"]["required"] = "0"
+    if device_opt.count("fabric_fencing"):
+        all_opt["action"]["default"] = "off"
+        all_opt["action"]["help"] = "-o, --action=[action] Action: status, off (default) or on"
+
     ## Set default values
     #####
     for opt in device_opt:
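
The effect of this change can be shown with a small stand-alone Python sketch: when an agent lists "fabric_fencing" among its device options, the default action and its help text switch from reboot to off. The table contents for the non-fabric case and the function name action_settings are assumptions made for illustration; only the two assignments inside the if branch come from the diff.

```python
import copy

# Hypothetical stand-in for the library's option table. The "reboot"
# default and its help text are assumed here, not quoted from fencing.py.
ALL_OPT = {
    "action": {
        "default": "reboot",
        "help": "-o, --action=[action] Action: status, reboot (default), off or on",
    },
}


def action_settings(device_opt, all_opt=ALL_OPT):
    """Return the action option as it would look for the given agent."""
    opts = copy.deepcopy(all_opt)  # leave the shared table untouched
    if "fabric_fencing" in device_opt:
        # These two assignments mirror the lines added by the commit.
        opts["action"]["default"] = "off"
        opts["action"]["help"] = (
            "-o, --action=[action] Action: status, off (default) or on"
        )
    return opts["action"]


if __name__ == "__main__":
    print(action_settings(["ipaddr", "fabric_fencing"])["default"])
    print(action_settings(["ipaddr", "login"])["default"])
```

The sketch copies the table before changing it, whereas the library code in the diff modifies all_opt in place.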


cluster: RHEL6 - dlm_controld: adjust fence time comparison

by David Teigland

Gitweb:        http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=2d06dd478c2...
Commit:        2d06dd478c27bf864ba1a5ac0cbb1ba6c3ed947f
Parent:        cca7cf733d03a58d94eb4ab3bee7dcc2e39b7ea1
Author:        David Teigland <teigland(a)redhat.com>
AuthorDate:    Fri Jan 10 16:01:35 2014 -0600
Committer:     David Teigland <teigland(a)redhat.com>
CommitterDate: Fri Jan 10 16:01:35 2014 -0600

dlm_controld: adjust fence time comparison

An unusual combination of events can cause the fence time comparison to not
work properly, leaving dlm_controld recovery stuck.

If fencing in fenced completes very quickly, and the cpg callback into
dlm_controld is delayed, the effect is that the fence_time returned from
fenced is earlier than the fail_time recorded in the cpg callback.
dlm_controld requires that the fencing time is after the fail time.

This is solved by saving the add_time when fail_time is recorded as
need_fence_after. The fencing check is then changed to also succeed if
fence_time is later than need_fence_after.

A simple comparison with add_time does not work as shown in commit
4039bf4817a96b6aab20de948389f43b89ce4a8e.

bz 843160

Signed-off-by: David Teigland <teigland(a)redhat.com>
---
 group/dlm_controld/cpg.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/group/dlm_controld/cpg.c b/group/dlm_controld/cpg.c
index 6a4023b..795efc4 100644
--- a/group/dlm_controld/cpg.c
+++ b/group/dlm_controld/cpg.c
@@ -47,6 +47,7 @@ struct node {
     uint64_t add_time;
     uint64_t fail_time;
     uint64_t fence_time; /* for debug */
+    uint64_t need_fence_after;
     uint64_t cluster_add_time;
     uint64_t cluster_remove_time;
     uint32_t fence_queries; /* for debug */
@@ -502,6 +503,7 @@ static void node_history_fail(struct lockspace *ls, int nodeid,
     node->fence_time = 0;
     node->fence_queries = 0;
     node->fail_time = time(NULL);
+    node->need_fence_after = node->add_time;
 }
 /* fenced will take care of making sure the quorum value
@@ -546,12 +548,20 @@ static int check_fencing_done(struct lockspace *ls)
            we've seen fenced_time within the same second as
            fail_time: with external fencing, e.g. fence_node */
-        if (last_fenced_time >= node->fail_time) {
+        /* the comparison with need_fence_after is to deal with
+           the odd case where fencing completes very quickly in
+           fenced and there is a delay of the delivery of the cpg
+           callback (and setting fail_time) in dlm_controld,
+           placing the fail_time after the fence_time. */
+
+        if ((last_fenced_time >= node->fail_time) ||
+            (last_fenced_time > node->need_fence_after)) {
             log_group(ls, "check_fencing %d done "
-                  "add %llu fail %llu last %llu",
+                  "add %llu fail %llu need %llu last %llu",
                   node->nodeid,
                   (unsigned long long)node->add_time,
                   (unsigned long long)node->fail_time,
+                  (unsigned long long)node->need_fence_after,
                   (unsigned long long)last_fenced_time);
             node->check_fencing = 0;
             node->add_time = 0;
@@ -560,10 +570,11 @@ static int check_fencing_done(struct lockspace *ls)
             if (!node->fence_queries ||
                 node->fence_time != last_fenced_time) {
                 log_group(ls, "check_fencing %d wait "
-                      "add %llu fail %llu last %llu",
+                      "add %llu fail %llu need %llu last %llu",
                       node->nodeid,
                       (unsigned long long)node->add_time,
                       (unsigned long long)node->fail_time,
+                      (unsigned long long)node->need_fence_after,
                       (unsigned long long)last_fenced_time);
                 node->fence_queries++;
                 node->fence_time = last_fenced_time;
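
The adjusted comparison in check_fencing_done() can be expressed as a small predicate. The Python sketch below is an illustration only: the function names and the timestamps in the example are made up, and only the two comparisons come from the diff.

```python
def old_fencing_done(last_fenced_time, fail_time):
    """The check before this commit: fencing must not predate the failure."""
    return last_fenced_time >= fail_time


def fencing_done(last_fenced_time, fail_time, need_fence_after):
    """The check after this commit.

    need_fence_after is the node's add_time, saved at the moment
    fail_time is recorded.
    """
    return (last_fenced_time >= fail_time
            or last_fenced_time > need_fence_after)


if __name__ == "__main__":
    # Hypothetical timeline: node added at t=100, fenced at t=105, but the
    # delayed cpg callback records the failure only at t=106.
    add_time, fence_time, fail_time = 100, 105, 106
    print(old_fencing_done(fence_time, fail_time))
    print(fencing_done(fence_time, fail_time, need_fence_after=add_time))
```

A fence time from before the node was added (for example t=90 in the same timeline) still fails both comparisons, so an old fencing event is not mistaken for the one being waited on.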


cluster: RHEL6 - gfs_controld: fix first recovery case 2

by David Teigland

Gitweb:        http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=cca7cf733d0...
Commit:        cca7cf733d03a58d94eb4ab3bee7dcc2e39b7ea1
Parent:        72619738e20e2627a7e5fc3268b003d33ce699b2
Author:        David Teigland <teigland(a)redhat.com>
AuthorDate:    Tue Jul 9 11:54:06 2013 -0500
Committer:     David Teigland <teigland(a)redhat.com>
CommitterDate: Fri Jan 10 14:29:37 2014 -0600

gfs_controld: fix first recovery case 2

- node A is doing first recovery
- node B joins the mount group and is waiting for A to finish
- node A sets some journals X and Y as needing recovery
  based on start message from A
  (it's not clear how/why A has journals X,Y marked as
  needing recovery if it's doing first recovery.)
- node A completes first recovery and sends first recovery done message
- node B still has X,Y journals as needing recovery, which
  prevents the mount group recovery from completing

node B should clear the needs recovery state on any journals
when it receives first recovery done.

bz 982305

Signed-off-by: David Teigland <teigland(a)redhat.com>
---
 group/gfs_controld/cpg-new.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/group/gfs_controld/cpg-new.c b/group/gfs_controld/cpg-new.c
index 8943f62..845d183 100644
--- a/group/gfs_controld/cpg-new.c
+++ b/group/gfs_controld/cpg-new.c
@@ -1544,12 +1544,20 @@ static void receive_recovery_result(struct mountgroup *mg,
 static void receive_first_recovery_done(struct mountgroup *mg,
                     struct gfs_header *hd, int len)
 {
+    struct journal *j;
     int master = mg->first_recovery_master;
     log_group(mg, "receive_first_recovery_done from %d master %d "
           "mount_client_notified %d",
           hd->nodeid, master, mg->mount_client_notified);
+    list_for_each_entry(j, &mg->journals, list) {
+        if (!j->needs_recovery)
+            continue;
+        j->needs_recovery = 0;
+        log_debug("receive_first_recovery_done clear %d needs_recovery", j->jid);
+    }
+
     if (list_empty(&mg->changes)) {
         /* everything is idle, no changes in progress */
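
The loop added here (and the matching loop in the earlier "fix first recovery case" commit) walks the mount group's journals and clears every needs_recovery flag once first recovery is known to be complete. A rough Python equivalent, with a hypothetical Journal class and function name standing in for the C structures:

```python
from dataclasses import dataclass


@dataclass
class Journal:
    """Hypothetical stand-in for gfs_controld's struct journal."""
    jid: int
    needs_recovery: bool = False


def clear_needs_recovery(journals):
    """Clear the flag on every journal that has it set.

    Returns the jids that were cleared, in list order, which is the
    information the C code writes with log_debug().
    """
    cleared = []
    for journal in journals:
        if not journal.needs_recovery:
            continue
        journal.needs_recovery = False
        cleared.append(journal.jid)
    return cleared


if __name__ == "__main__":
    # Journals 1 and 2 play the role of X and Y from the commit message.
    journals = [Journal(0), Journal(1, True), Journal(2, True)]
    print(clear_needs_recovery(journals))
```

Once the flags are cleared, no journal on node B is left marked as needing recovery, which is the condition the commit message says was blocking mount group recovery.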


cluster: RHEL6 - gfs_controld: fix first recovery case

by David Teigland

Gitweb:        http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=72619738e20...
Commit:        72619738e20e2627a7e5fc3268b003d33ce699b2
Parent:        e3f8a987f0108b0f5c1c76e8750c35f23fca2191
Author:        David Teigland <teigland(a)redhat.com>
AuthorDate:    Tue Jul 9 11:35:25 2013 -0500
Committer:     David Teigland <teigland(a)redhat.com>
CommitterDate: Fri Jan 10 14:28:38 2014 -0600

gfs_controld: fix first recovery case

- node A is doing first recovery
- node B joins the mount group and is waiting for A to finish
- node A sets some journals X and Y as needing recovery
  based on start message from A
  (it's not clear how/why A has journals X,Y marked as
  needing recovery if it's doing first recovery.)
- node A fails
- node B marks A's journal as needing recovery
- node B takes over doing first recovery
- node B successfully finishes first recovery
- node B still has X,Y,A journals as needing recovery, which
  prevents the mount group recovery from completing

First mount recovery allows the first mounter to recover all
journals without any other nodes present. This is meant to
guarantee that all journals are clean when first mount recovery
is done. So, after B completes first mount recovery it should
assume all journals are clean and it should clear any needs
recovery indication on journals.

bz 982305

Signed-off-by: David Teigland <teigland(a)redhat.com>
---
 group/gfs_controld/cpg-new.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/group/gfs_controld/cpg-new.c b/group/gfs_controld/cpg-new.c
index 537624d..8943f62 100644
--- a/group/gfs_controld/cpg-new.c
+++ b/group/gfs_controld/cpg-new.c
@@ -2304,8 +2304,8 @@ void process_recovery_uevent(char *name, int jid, int recover_status,
        to check below that we've seen uevents for all jids
        during first recovery before sending first_recovery_done. */
-    log_group(mg, "recovery_uevent jid %d first recovery done %d",
-          jid, mg->first_done_uevent);
+    log_group(mg, "recovery_uevent jid %d status %d first recovery done %d",
+          jid, recover_status, mg->first_done_uevent);
     /* ignore extraneous uevent from others_may_mount */
     if (mg->first_done_uevent)
@@ -2323,6 +2323,14 @@ void process_recovery_uevent(char *name, int jid, int recover_status,
     if (first_done) {
         log_group(mg, "recovery_uevent first_done");
         mg->first_done_uevent = 1;
+
+        list_for_each_entry(j, &mg->journals, list) {
+            if (!j->needs_recovery)
+                continue;
+            j->needs_recovery = 0;
+            log_debug("recovery_uevent first_done clear %d needs_recovery", j->jid);
+        }
+
         send_first_recovery_done(mg);
     }
 }

