Gitweb:        http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=6d60ac310fcb22...
Commit:        6d60ac310fcb224ca921bb78c8c43fd80ec1b03e
Parent:        e675576929f09ee208f22fdfdcb750dba92fe00f
Author:        David Teigland <teigland@redhat.com>
AuthorDate:    Mon Oct 14 11:42:26 2013 -0500
Committer:     David Teigland <teigland@redhat.com>
CommitterDate: Fri Dec 13 10:17:10 2013 -0600
gfs_controld: fix plock transfer during first mount recovery
The plock checkpoint is not unlinked properly during certain first mount recovery situations (lower nodeid mounts while higher nodeid is doing first mounter recovery). This leaves a stray checkpoint that prevents the following checkpoint from being created, which causes plock state to not be transferred to mounting nodes, which can lead to a plock being granted in multiple places at once.
node2: mount /gfs (it does first mount recovery)
node1: mount /gfs (while node2 is still doing first mount recovery)
node2: creates a plock checkpoint (empty) for node1, then closes the
       checkpoint because the new low nodeid is now in charge of it
node2: sends journal info to node1
node1: gets journal info from node2
       Takes the special code path because node2 is still doing first
       recovery.  Does not call retrieve_plocks on this code path
       because there are no plocks to retrieve in this case.  But the
       retrieve_plocks function is also responsible for unlinking the
       existing checkpoint on a new low nodeid, which this is.  So,
       node1 does not unlink the checkpoint as it should.
node2: finishes first mount recovery, completes mount
node1: notified that node2's first recovery is done, completes mount
node2: do plock /gfs/test (granted)
node1: killed
node1: restarts
node1: mount /gfs
node2: tries to create a checkpoint to transfer the plock state to
       node1, but this fails because the checkpoint exists, because
       node1 did not unlink it above.  So, plock state is not
       transferred to node1.
node1: do plock /gfs/test (granted)
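To make the failure mode concrete: in the AIS checkpoint API used by
gfs_controld, creating a checkpoint whose name is still linked fails
with SA_AIS_ERR_EXIST.  A minimal sketch of the collision (the
ckpt_handle, name and attr variables here are illustrative
placeholders, not code from the patch):

    SaCkptCheckpointHandleT h;
    SaAisErrorT rv;

    /* node2 tries to create the transfer checkpoint for node1 */
    rv = saCkptCheckpointOpen(ckpt_handle, &name, &attr,
                              SA_CKPT_CHECKPOINT_CREATE |
                              SA_CKPT_CHECKPOINT_WRITE,
                              0, &h);
    if (rv == SA_AIS_ERR_EXIST) {
            /* the stray checkpoint from the earlier recovery was
               never unlinked; nothing can be created under this name
               until saCkptCheckpointUnlink() is called on it */
    }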
The result is that both nodes have the same plock granted concurrently.
The solution is for node1 to call retrieve_plocks on the first mounter code path, as it does on the normal code path. retrieve_plocks will unlink the checkpoint in this case.
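The unlink side effect lives inside retrieve_plocks itself.  Roughly,
the shape it takes (a simplified sketch, not the actual function body;
the low_nodeid_is_us condition is illustrative):

    /* Sketch of the retrieve_plocks side effect the fix relies on. */
    void retrieve_plocks(struct mountgroup *mg)
    {
            SaNameT name;

            /* the real code derives the checkpoint name from the
               mountgroup before opening it */

            /* ... open the checkpoint and read plock sections; in
               the first mounter case there are none ... */

            /* if this node has become the new low nodeid (the new
               checkpoint master), unlink the old checkpoint so the
               next store_plocks can create a fresh one */
            if (low_nodeid_is_us(mg))       /* illustrative */
                    _unlink_checkpoint(mg, &name);
    }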
This patch also adds a lower-level backup method for creating plock checkpoints in case an unlink was missed somewhere. If store_plocks finds that the checkpoint already exists, it will try once to unlink it and recreate it.
Signed-off-by: David Teigland <teigland@redhat.com>
---
 group/gfs_controld/plock.c   | 11 +++++++----
 group/gfs_controld/recover.c |  4 ++++
 2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/group/gfs_controld/plock.c b/group/gfs_controld/plock.c
index d96604f..51e0882 100644
--- a/group/gfs_controld/plock.c
+++ b/group/gfs_controld/plock.c
@@ -2012,6 +2012,7 @@ void store_plocks(struct mountgroup *mg, int nodeid)
 	struct lock_waiter *w;
 	int r_count, lock_count, total_size, section_size, max_section_size;
 	int len, owner;
+	int retry_count = 0;
 
 	if (!plocks_online)
 		return;
@@ -2087,13 +2088,15 @@ void store_plocks(struct mountgroup *mg, int nodeid)
 	if (rv == SA_AIS_ERR_EXIST) {
 		log_group(mg, "store_plocks: ckpt already exists");
 		log_error("store_plocks: ckpt already exists");
-		/* TODO: best to unlink and retry? */
-		/*
+		/* We should in general be unlinking the ckpt in the
+		   proper places to avoid hitting this, but there are
+		   probably some cases where we miss the unlink, so
+		   this is a backup method. */
+		if (retry_count++)
+			return;
 		_unlink_checkpoint(mg, &name);
 		sleep(1);
 		goto open_retry;
-		*/
-		return;
 	}
 	if (rv != SA_AIS_OK) {
 		log_error("store_plocks: ckpt open error %d %s", rv, mg->name);
diff --git a/group/gfs_controld/recover.c b/group/gfs_controld/recover.c
index f70f798..87eee63 100644
--- a/group/gfs_controld/recover.c
+++ b/group/gfs_controld/recover.c
@@ -1018,6 +1018,10 @@ void received_our_jid(struct mountgroup *mg)
 		log_group(mg, "other node doing first mounter recovery, "
 			  "set mount_client_delay");
 		mg->mount_client_delay = 1;
+		/* There should be no plocks to retrieve since the fs is being
+		   mounted initially, but retrieve is needed to unlink an
+		   existing checkpoint if we are the new master. */
+		retrieve_plocks(mg);
 		mg->save_plocks = 0;
 		return;
 	}
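A note on the retry semantics in the plock.c hunk above: retry_count++
evaluates to 0 (false) on the first SA_AIS_ERR_EXIST, so store_plocks
unlinks the stray checkpoint, sleeps one second, and jumps back to
open_retry; if the open still reports SA_AIS_ERR_EXIST, the
post-increment is now nonzero and the function gives up rather than
looping forever.  A standalone illustration of the same retry-once
idiom (the hypothetical try_open stands in for the checkpoint open):

    #include <stdio.h>

    static int try_open(void)
    {
            return -1;      /* always fails, for demonstration */
    }

    int main(void)
    {
            int retry_count = 0;
    retry:
            if (try_open() < 0) {
                    if (retry_count++)      /* nonzero on second failure */
                            return 1;       /* give up after one retry */
                    /* recover (e.g. unlink the stale name), then retry */
                    goto retry;
            }
            printf("opened\n");
            return 0;
    }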