Gitweb:        http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=c61e71766564cd...
Commit:        c61e71766564cdeaea3c46cfce061000a0aa3879
Parent:        dca24557dceec608a0e5a36efcc1005fd03f4549
Author:        David Teigland <teigland@redhat.com>
AuthorDate:    Thu Oct 10 11:58:58 2013 -0500
Committer:     Christine Caulfield <ccaulfie@redhat.com>
CommitterDate: Tue Dec 17 10:37:50 2013 +0000
gfs_controld: fix plock recovery
When there are two nodes in the cluster and the node in charge of the plock checkpoint fails, the remaining node does not unlink the checkpoint that had been created by the failed node. When the failed node returns and the remaining node attempts to transfer plock state, it fails to create a new checkpoint because it never unlinked the previous checkpoint created by the failed node. As a result, existing plock state is not transferred to the newly joined node. The newly joined node will then mistakenly grant plocks to itself that may conflict with the plocks the other node could not transfer. This leads to:
1. conflicting plocks being held concurrently
2. dangling plocks that are not held but not removed
In the explanation above, the reason the remaining node does not unlink the checkpoint created by the other node is that it does not know the other node was in charge of the checkpoint. It could only know this if it had been present before and after the previous membership change; because there are only two nodes, that was not possible. This, however, is also the point exploited to fix the problem: when there are only two members, a new node can assume that the other node is in charge of the checkpoint.
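The two-node assumption can be sketched as a standalone function. This is a simplified stand-in for the logic added to update_master_nodeid(): struct member, pick_master() and its parameters are illustrative names, not the real mountgroup structures.

```c
/* Hypothetical, simplified model of the master-selection rule.
 * "finished" stands in for a member that was present across the
 * previous membership change and so could be the checkpoint master. */
struct member {
	int nodeid;
	int finished;
};

/* Return the checkpoint master nodeid, or -1 if unknown. */
int pick_master(const struct member *membs, int count, int our_nodeid)
{
	int new = -1, other = -1;
	int i;

	for (i = 0; i < count; i++) {
		/* remember the nodeid of any member that is not us */
		if (membs[i].nodeid != our_nodeid)
			other = membs[i].nodeid;

		/* lowest finished nodeid is the master */
		if (membs[i].finished &&
		    (new == -1 || membs[i].nodeid < new))
			new = membs[i].nodeid;
	}

	/* the two-node special case from the fix: if we cannot tell
	 * who owned the checkpoint, assume the other node did */
	if (new == -1 && count == 2)
		return other;
	return new;
}
```

With more than two members, an unknown master stays unknown; only the exact two-member case justifies the assumption, which is why the fix checks total == 2.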
The following test shows the problem/fix using a program "doplock" that requests an exclusive, blocking POSIX lock on the given file.
node1: mount /gfs
node2: mount /gfs
node1: touch /gfs/test
node1: doplock /gfs/test (granted)
node2: doplock /gfs/test (blocks)
node1: killed
node2: recovery for node1
node2: doplock above granted the lock
node1: restarts
node1: mount /gfs
node1: doplock /gfs/test
In the last step, the node1 doplock should block because node2 holds the lock. Before the fix, it was granted.
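For reference, the exclusive, blocking request that doplock makes can be sketched with fcntl(2). The doplock program itself is not part of this commit, and take_exclusive_lock() is a hypothetical name used for illustration.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Request an exclusive (write) POSIX lock on the whole file,
 * blocking until it is granted -- the behavior the doplock test
 * program is described as having. */
int take_exclusive_lock(int fd)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;	/* exclusive lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 = lock the whole file */

	/* F_SETLKW waits until the lock can be granted */
	return fcntl(fd, F_SETLKW, &fl);
}
```

On a GFS mount these requests go through gfs_controld's plock machinery, which is why losing checkpoint state during recovery can let both nodes believe they hold the same lock.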
Signed-off-by: David Teigland <teigland@redhat.com>
---
 group/gfs_controld/plock.c   |    7 +++++++
 group/gfs_controld/recover.c |   14 ++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/group/gfs_controld/plock.c b/group/gfs_controld/plock.c
index 4330a2c..d96604f 100644
--- a/group/gfs_controld/plock.c
+++ b/group/gfs_controld/plock.c
@@ -2086,6 +2086,13 @@ void store_plocks(struct mountgroup *mg, int nodeid)
 	}
 	if (rv == SA_AIS_ERR_EXIST) {
 		log_group(mg, "store_plocks: ckpt already exists");
+		log_error("store_plocks: ckpt already exists");
+		/* TODO: best to unlink and retry? */
+		/*
+		_unlink_checkpoint(mg, &name);
+		sleep(1);
+		goto open_retry;
+		*/
 		return;
 	}
 	if (rv != SA_AIS_OK) {
diff --git a/group/gfs_controld/recover.c b/group/gfs_controld/recover.c
index b33b3fd..f70f798 100644
--- a/group/gfs_controld/recover.c
+++ b/group/gfs_controld/recover.c
@@ -1257,8 +1257,15 @@ void update_master_nodeid(struct mountgroup *mg)
 {
 	struct mg_member *memb;
 	int new = -1, low = -1;
+	int other_nodeid = -1;
+	int total = 0;
 
 	list_for_each_entry(memb, &mg->members, list) {
+		total++;
+
+		if (memb->nodeid != our_nodeid)
+			other_nodeid = memb->nodeid;
+
 		if (low == -1 || memb->nodeid < low)
 			low = memb->nodeid;
 		if (!memb->finished)
@@ -1268,6 +1275,9 @@ void update_master_nodeid(struct mountgroup *mg)
 	}
 	mg->master_nodeid = new;
 	mg->low_nodeid = low;
+
+	if (new == -1 && total == 2)
+		mg->master_nodeid = other_nodeid;
 }
 
 /* This can happen before we receive a journals message for our mount. */
@@ -1354,8 +1364,8 @@ void recover_members(struct mountgroup *mg, int num_nodes,
 	*pos_out = pos;
 	*neg_out = neg;
 
-	log_group(mg, "total members %d master_nodeid %d prev %d",
-		  mg->memb_count, mg->master_nodeid, prev_master_nodeid);
+	log_group(mg, "total members %d master_nodeid %d prev %d failed %d",
+		  mg->memb_count, mg->master_nodeid, prev_master_nodeid, master_failed);
 
 	/* The master failed and we're the new master, we need to:
cluster-commits@lists.fedorahosted.org