https://bugzilla.redhat.com/show_bug.cgi?id=1138213
Bug ID: 1138213
Summary: Rabbitmq cluster remains partitioned after short
network partition incident
Product: Fedora
Version: 20
Component: rabbitmq-server
Assignee: lemenkov(a)gmail.com
Reporter: jprovazn(a)redhat.com
QA Contact: extras-qa(a)fedoraproject.org
CC: erlang(a)lists.fedoraproject.org,
hubert.plociniczak(a)gmail.com, jeckersb(a)redhat.com,
lemenkov(a)gmail.com, rjones(a)redhat.com, s(a)shk.io
Description of problem:
In a 3-node RabbitMQ cluster, after a network partition occurs and all 3
nodes can communicate again, the cluster remains partitioned if the
"pause_minority" policy is used.
This happens if the network outage is short enough (~60 secs); for longer
outages (>3 minutes) the cluster was reconstructed properly. A very similar
issue was reported here:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2014-March/034639.html
This issue seems to be fixed upstream already (not sure in which version
exactly, but 3.3.5 works fine).
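For reference, the "pause_minority" policy mentioned above is selected via the
cluster_partition_handling key in rabbitmq.config; a minimal sketch of that
setting (file path and surrounding keys may differ per installation):

```erlang
%% /etc/rabbitmq/rabbitmq.config (classic Erlang-term format)
[
 {rabbit, [
   %% Nodes in the minority side of a partition pause themselves
   %% until connectivity to the majority is restored.
   {cluster_partition_handling, pause_minority}
 ]}
].
```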
Version-Release number of selected component (if applicable):
rabbitmq-server-3.1.5-9.fc20.noarch
Steps to Reproduce:
1. create a 3-node cluster, use the "pause_minority" no-quorum policy
2. stop networking on one of the nodes and wait until the node is seen as shut
down on the other nodes
3. start networking on that node again
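The steps above can be sketched roughly as follows (interface name eth0 and
the ~60 s outage window are assumptions; the report only says "stop
networking"):

```shell
# Run on one of the three cluster nodes, as root.

# Verify the cluster is healthy before the test.
rabbitmqctl cluster_status

# Simulate a short network outage (~60 s, the window where the bug appears).
ifdown eth0
sleep 60
ifup eth0

# After the node is reachable again, check whether the {partitions, ...}
# entry in the status output has cleared; with the affected version it
# still lists the previously isolated node.
rabbitmqctl cluster_status
```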
Actual results:
Cluster remains partitioned:
[root@overcloud-controller1-csugetg5pjql ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@overcloud-controller1-csugetg5pjql' ...
[{nodes,
[{disc,
['rabbit@overcloud-controller0-o6yt2gtaxk6g',
'rabbit@overcloud-controller1-csugetg5pjql',
'rabbit@overcloud-controller2-z3tswnamdzhq']}]},
{running_nodes,
['rabbit@overcloud-controller0-o6yt2gtaxk6g',
'rabbit@overcloud-controller1-csugetg5pjql']},
{partitions,
[{'rabbit@overcloud-controller0-o6yt2gtaxk6g',
['rabbit@overcloud-controller2-z3tswnamdzhq']},
{'rabbit@overcloud-controller1-csugetg5pjql',
['rabbit@overcloud-controller2-z3tswnamdzhq']}]}]
...done.
Expected results:
no partitions after the node is up again
Additional info:
the only related message in the log:
Mnesia('rabbit@overcloud-controller0-o6yt2gtaxk6g'): ** ERROR ** mnesia_event
got {inconsistent_database, running_partitioned_network,
'rabbit@overcloud-controller2-z3tswnamdzhq'}