Gitweb:        http://git.fedorahosted.org/git/fence.git?p=fence.git;a=commitdiff;h=73c0f58...
Commit:        73c0f58f071d713bd3b634ffcfb1d02feba84fcc
Parent:        7181608a3a497bff3d9d1306e77503dbfc4c32f5
Author:        David Teigland <teigland@redhat.com>
AuthorDate:    Tue Feb 22 15:47:24 2011 -0600
Committer:     David Teigland <teigland@redhat.com>
CommitterDate: Tue Feb 22 16:40:30 2011 -0600
fenced: don't ignore victim_done messages for reduced victims
When a victim is "reduced" (i.e. fenced skips fencing it because it rejoins the cluster cleanly before fenced fences it), it is immediately removed from the list of victims, before the "victim_done" message is sent for it. The victim_done message updates the time of the last successful fencing operation for a failed node.
The code that processes received victim_done messages was ignoring the message for the reduced victim because the node could not be found in the victims list. As a result, the latest fencing information was never recorded for the node, and dlm_controld waited indefinitely for fencing of the reduced victim to complete.
The fix is to simply record the information from a victim_done message even if the node is not in the victims list.
bz 678704
Acked-by: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
---
 fence/fenced/cpg.c | 18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/fence/fenced/cpg.c b/fence/fenced/cpg.c
index eec2ba6..716d4c4 100644
--- a/fence/fenced/cpg.c
+++ b/fence/fenced/cpg.c
@@ -654,9 +654,9 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
 
 	node = get_node_victim(fd, id->nodeid);
 	if (!node) {
+		/* see comment below about no node */
 		log_debug("receive_victim_done %d:%u no victim nodeid %d",
 			  hd->nodeid, seq, id->nodeid);
-		return;
 	}
 
 	log_debug("receive_victim_done %d:%u remove victim %d time %llu how %d",
@@ -672,9 +672,11 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
 	if (hd->nodeid == our_nodeid) {
 		/* sanity check, I don't think this should happen;
 		   see comment in fence_victims() */
-		if (!node->local_victim_done)
-			log_error("expect local_victim_done");
-		node->local_victim_done = 0;
+		if (node) {
+			if (!node->local_victim_done)
+				log_error("expect local_victim_done");
+			node->local_victim_done = 0;
+		}
 	} else {
 		/* save details of fencing operation from master, which
 		   master saves at the time it completes it */
@@ -682,8 +684,12 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
 			  id->fence_how, id->fence_time);
 	}
 
-	list_del(&node->list);
-	free(node);
+	/* we can have no node when reduce_victims() removes it, bz 678704 */
+
+	if (node) {
+		list_del(&node->list);
+		free(node);
+	}
 }
 
 /* we know that the quorum value here is consistent with the cpg events