cluster: RHEL6 - fenced: fix handling of startup partition merge
by David Teigland
Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=...
Commit: 5457043e975ba4c44c17138b3751150b974aa35c
Parent: af612b16a8b565a0e3543850367e0b58a43922cd
Author: David Teigland <teigland(a)redhat.com>
AuthorDate: Mon Oct 31 12:06:41 2011 -0500
Committer: David Teigland <teigland(a)redhat.com>
CommitterDate: Thu Mar 1 16:07:29 2012 -0600
fenced: fix handling of startup partition merge
The victims created on each side of a partition are cleared after
a merge in the receive_complete function, which is meant to clear
"initial victims". Clearing the victims is the correct end result,
but the code arrives there through an unintended shortcut. Change
the code so that it clears the victims in a more deliberate, and
probably safer, way:
2 is blocked doing startup fencing
1 joins fence domain
partition between 1,2
1 sees confchg for partition, adds victim 2 (and sets init_victim
  per this patch, since 1 hasn't finished a start cycle yet)
partition removed
2 completes startup fencing
2 sees confchg for partition, adds victim 1
1 sees confchg for merge, adds node 2
2 sees confchg for merge, adds node 1
1 processing the merge confchg
2 reduces victim 1 from partition, since 1 had no state (had not yet
  completed a start cycle)
2 processing the merge confchg
1,2 finish start cycle for merge confchg
2 sends complete for merge confchg
1 clears victim 2 in receive_complete because it set init_victim
bz 750314
Signed-off-by: David Teigland <teigland(a)redhat.com>
---
fence/fenced/cpg.c | 36 ++++++++++++++++++++++++++++++++++--
1 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/fence/fenced/cpg.c b/fence/fenced/cpg.c
index 99e16a0..6634f8c 100644
--- a/fence/fenced/cpg.c
+++ b/fence/fenced/cpg.c
@@ -1132,8 +1132,11 @@ static void receive_complete(struct fd *fd, struct fd_header *hd, int len)
list_for_each_entry_safe(node, safe, &fd->victims, list) {
log_debug("receive_complete clear victim nodeid %d init %d",
node->nodeid, node->init_victim);
- list_del(&node->list);
- free(node);
+
+ if (node->init_victim) {
+ list_del(&node->list);
+ free(node);
+ }
}
}
@@ -1319,6 +1322,35 @@ static void add_victims(struct fd *fd, struct change *cg)
return;
list_add(&node->list, &fd->victims);
log_debug("add_victims node %d", node->nodeid);
+
+ /*
+ * If we haven't completed a start cycle yet, set
+ * init_victim on any failed node so that receive_complete
+ * will clear it. This is a hack for one specific scenario:
+ *
+ * - node 2 joins domain, blocks in startup fencing
+ * - node 1 joins domain, waiting for messages in start cycle
+ * - partition between 1,2
+ * - 1 adds victim 2
+ * (and sets init_victim below since 1 hasn't completed
+ * a start cycle yet)
+ * - partition removed
+ * - node 2 completes startup fencing
+ * - 2 gets confchg for partition
+ * - 2 adds victim 1 (due to partition)
+ * - 2 gets confchg for merge
+ * - 2 does join for 1 (due to merge), begins start cycle
+ * - start cycle adding node 1 finishes, 2 sends complete
+ * - 2 reduces victim 1
+ * - 1 receives complete for its join start cycle,
+ * and clears victim 2 because we've set init_victim here
+ */
+
+ if (!fd->started_count) {
+ log_debug("add_victims node %d set init_victim",
+ node->nodeid);
+ node->init_victim = 1;
+ }
}
}
cluster: RHEL6 - rgmanager: Retry when config is out of sync
by Lon Hohberger
Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=...
Commit: af612b16a8b565a0e3543850367e0b58a43922cd
Parent: fdcec853307c5d9ce0517cb0088de2e970f76ae7
Author: Lon Hohberger <lhh(a)redhat.com>
AuthorDate: Thu Aug 5 16:53:22 2010 -0400
Committer: Lon Hohberger <lhh(a)redhat.com>
CommitterDate: Thu Mar 1 14:15:52 2012 -0500
rgmanager: Retry when config is out of sync
If you add a service to rgmanager v1 or v2 and that
service fails to start on the first node but succeeds
in its initial stop operation, there is a chance that
the remote instance of rgmanager has not yet reread
the configuration, causing the service to be placed
into the 'recovering' state without further action.
This patch causes the originator of the request to
retry the operation.
Later versions of rgmanager (e.g. the STABLE3 branch and
derivatives) are unlikely to have this problem, since
configuration updates are not polled, but rather
delivered to clients.
Update 22-Feb-2012: The above is incorrect; this was
reproduced on an rgmanager v3 installation.
Resolves: rhbz#796272
Signed-off-by: Lon Hohberger <lhh(a)redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto(a)redhat.com>
---
rgmanager/src/daemons/rg_state.c | 19 +++++++++++++++++++
1 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/rgmanager/src/daemons/rg_state.c b/rgmanager/src/daemons/rg_state.c
index 23a4bec..8c5af5b 100644
--- a/rgmanager/src/daemons/rg_state.c
+++ b/rgmanager/src/daemons/rg_state.c
@@ -1801,6 +1801,7 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
rg_state_t svcStatus;
int target = preferred_target, me = my_id();
int ret, x, request = orig_request;
+ int retries;
get_rg_state_local(svcName, &svcStatus);
if (svcStatus.rs_state == RG_STATE_DISABLED ||
@@ -1933,6 +1934,8 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
if (target == me)
goto exhausted;
+ retries = 0;
+retry:
ret = svc_start_remote(svcName, request, target);
switch (ret) {
case RG_ERUN:
@@ -1942,6 +1945,22 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
*new_owner = svcStatus.rs_owner;
free_member_list(allowed_nodes);
return 0;
+ case RG_ENOSERVICE:
+ /*
+ * Configuration update pending on remote node? Give it
+ * a few seconds to sync up. rhbz#568126
+ *
+ * Configuration updates are synchronized in later releases
+ * of rgmanager; this should not be needed.
+ */
+ if (retries++ < 4) {
+ sleep(3);
+ goto retry;
+ }
+ logt_print(LOG_WARNING, "Member #%d has a different "
+ "configuration than I do; trying next "
+ "member.", target);
+ /* Deliberate */
case RG_EDEPEND:
case RG_EFAIL:
/* Uh oh - we failed to relocate to this node.
cluster: STABLE32 - rgmanager: Small bug in follow-service.sl
by Lon Hohberger
Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=...
Commit: c7d3938a3856f9cb295dc6aed8b7f86762cbed7c
Parent: b2012d8fe8b6a30f16091a8c96b5665e34892160
Author: Marc Grimme <grimme(a)atix.de>
AuthorDate: Thu Mar 1 14:06:01 2012 -0500
Committer: Lon Hohberger <lhh(a)redhat.com>
CommitterDate: Thu Mar 1 14:06:01 2012 -0500
rgmanager: Small bug in follow-service.sl
Follow-service was written for use with failover
domains.
When using follow-service without a failover domain,
the available nodelist would be nil.
This patch resolves that issue.
Signed-off-by: Lon Hohberger <lhh(a)redhat.com>
---
rgmanager/src/resources/follow-service.sl | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/rgmanager/src/resources/follow-service.sl b/rgmanager/src/resources/follow-service.sl
index 4c711ec..6c17160 100644
--- a/rgmanager/src/resources/follow-service.sl
+++ b/rgmanager/src/resources/follow-service.sl
@@ -6,7 +6,7 @@
% Author: Marc Grimme, Mark Hlawatschek, October 2008
% Support: support(a)atix.de
% License: GNU General Public License (GPL), version 2 or later
-% Copyright: (c) 2008-2010 ATIX AG
+% Copyright: (c) 2008-2012 ATIX AG
debug("*** follow-service.sl");
@@ -21,7 +21,13 @@ define nodelist_online(service_name) {
(nofailback, restricted, ordered, node_list) = service_domain_info(service_name);
- return intersection(nodes, node_list);
+ if ((node_list == NULL) or (node_list == 0)) {
+ debug("service ",service_name, " has no failover domain. Taking all available nodes: ", nodes);
+ return nodes;
+ } else {
+ debug("service ",service_name, " has a failover domain. Taking intersection with available nodes: ", nodes, " => ", node_list);
+ return intersection(nodes, node_list);
+ }
}
%