Gitweb: http://git.fedorahosted.org/git/dlm.git?p=dlm.git;a=commitdiff;h=70a61dc4d5d...
Commit: 70a61dc4d5d91119da30a7165e70bb9646a6d6c1
Parent: 9ccd853b6d36e99af1100548d1e389fc921f10d4
Author: Jiaju Zhang jjzhang.linux@gmail.com
AuthorDate: Mon Nov 8 16:07:31 2010 -0600
Committer: David Teigland teigland@redhat.com
CommitterDate: Tue Feb 22 11:09:10 2011 -0600
dlm_controld: Reset fs_notified when check_fs_done
This situation only seems to arise with ocfs2_controld. Copying bug description from email https://www.redhat.com/archives/cluster-devel/2010-November/msg00004.html
About the issue where dlm_controld and fs_controld sit spinning, retrying and replying for the fs_notified check, I have a suspicion that another scenario may also hit that logic:
If node->fs_notified was set to 1 by a previous change and has not been reset to 0, then when a new change comes and needs to check node->fs_notified, check_fs_done will succeed even if dlm_controld has not received the notification from fs_controld this time.

For example, given the membership changes n, n+1, n+2, here is what happens on node X:

Step 1: cg n: node Y leaves with reason CPG_REASON_NODEDOWN. Eventually, in node X's ls->node_history, node Y's fs_notified = 1.

Step 2: cg n+1: node Y joins ...

Step 3: cg n+2: node Y leaves with reason CPG_REASON_NODEDOWN. One possible scenario: before fs_controld's notification arrives, dlm_controld has already learned from the CPG message that node Y is down and has done a lot of work. It sees node Y's fs_notified = 1 (set in Step 1) and wrongly passes the fs check, so node Y's check_fs is reset to 0.

Step 4: fs_controld's notification arrives. It sees node Y's check_fs = 0, assumes dlm_controld does not yet know that node Y is down, and retries sending the notification. But in fact dlm_controld already knows this and has finished all the work, which results in the spinning ...
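The scenario above can be replayed in a small standalone model. This is only a sketch under stated assumptions: `struct sim_node`, `sim_check_fs_done` and `sim_stale_pass` are hypothetical names invented for illustration, not dlm_controld code, and the model assumes that nothing else clears fs_notified when node Y rejoins. The `reset_notified` argument switches between the behaviour before the patch (false) and after it (true).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for one node_history entry. */
struct sim_node {
	int nodeid;
	int check_fs;     /* set when the node fails; cleared once the fs check passes */
	int fs_notified;  /* set when the fs daemon reports the failure */
};

/*
 * Models only the branch touched by the patch.
 * Returns 1 if the fs check passes for this node, 0 if it must wait.
 */
static int sim_check_fs_done(struct sim_node *node, bool reset_notified)
{
	assert(node != NULL);

	if (!node->check_fs)
		return 1;

	if (node->fs_notified) {
		node->check_fs = 0;
		if (reset_notified)
			node->fs_notified = 0;  /* the one-line fix */
		return 1;
	}
	return 0;  /* still needs the fs notification */
}

/*
 * Replays changes n, n+1, n+2 for node Y.
 * Returns 1 if the check in step 3 passes without a fresh notification.
 */
static int sim_stale_pass(bool reset_notified)
{
	struct sim_node y = { .nodeid = 2, .check_fs = 0, .fs_notified = 0 };

	/* Step 1: Y fails, the fs daemon notifies, the check passes. */
	y.check_fs = 1;
	y.fs_notified = 1;
	sim_check_fs_done(&y, reset_notified);

	/* Step 2: Y rejoins; this model assumes no flag is touched here. */

	/* Step 3: Y fails again, but no notification has arrived yet. */
	y.check_fs = 1;
	return sim_check_fs_done(&y, reset_notified);
}
```

In this model, `sim_stale_pass(false)` returns 1 (the stale flag from step 1 lets the check pass), while `sim_stale_pass(true)` returns 0, so the check waits for a fresh notification.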
Signed-off-by: Jiaju Zhang jjzhang.linux@gmail.com
Signed-off-by: David Teigland teigland@redhat.com
---
 group/dlm_controld/cpg.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/group/dlm_controld/cpg.c b/group/dlm_controld/cpg.c
index 9b0d223..12cb202 100644
--- a/group/dlm_controld/cpg.c
+++ b/group/dlm_controld/cpg.c
@@ -651,6 +651,7 @@ static int check_fs_done(struct lockspace *ls)
 		if (node->fs_notified) {
 			log_group(ls, "check_fs nodeid %d clear", node->nodeid);
 			node->check_fs = 0;
+			node->fs_notified = 0;
 		} else {
 			log_group(ls, "check_fs nodeid %d needs fs notify",
 				  node->nodeid);
cluster-commits@lists.fedorahosted.org