https://bugzilla.redhat.com/show_bug.cgi?id=1138213
Bug ID: 1138213
Summary: Rabbitmq cluster remains partitioned after short network partition incident
Product: Fedora
Version: 20
Component: rabbitmq-server
Assignee: lemenkov@gmail.com
Reporter: jprovazn@redhat.com
QA Contact: extras-qa@fedoraproject.org
CC: erlang@lists.fedoraproject.org, hubert.plociniczak@gmail.com, jeckersb@redhat.com, lemenkov@gmail.com, rjones@redhat.com, s@shk.io
Description of problem: In a 3-node RabbitMQ cluster using the "pause_minority" partition-handling policy, the cluster remains partitioned after a network partition ends, even once all 3 nodes can communicate again.
This happens when the network outage is short enough (~60 seconds); for longer outages (>3 minutes) the cluster was reconstructed properly. A very similar issue was reported here: http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2014-March/034639.html
This issue seems to be fixed upstream already (not sure in which version exactly, but 3.3.5 works fine).
Version-Release number of selected component (if applicable): rabbitmq-server-3.1.5-9.fc20.noarch
Steps to Reproduce:
1. Create a 3-node cluster, using the "pause_minority" no-quorum policy.
2. Stop networking on one of the nodes and wait until that node is seen as shut down on the other nodes.
3. Start networking on the node again.
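For step 1, the "pause_minority" policy is selected via the cluster_partition_handling option. A minimal sketch of the relevant rabbitmq.config fragment (the /etc/rabbitmq/rabbitmq.config path is an assumption based on the Fedora package layout):

```erlang
%% /etc/rabbitmq/rabbitmq.config
%% pause_minority: a node that finds itself in the minority side of a
%% partition pauses itself until it can see a majority again.
[
  {rabbit, [
    {cluster_partition_handling, pause_minority}
  ]}
].
```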
Actual results: Cluster remains partitioned:
[root@overcloud-controller1-csugetg5pjql ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@overcloud-controller1-csugetg5pjql' ...
[{nodes,[{disc,['rabbit@overcloud-controller0-o6yt2gtaxk6g',
                'rabbit@overcloud-controller1-csugetg5pjql',
                'rabbit@overcloud-controller2-z3tswnamdzhq']}]},
 {running_nodes,['rabbit@overcloud-controller0-o6yt2gtaxk6g',
                 'rabbit@overcloud-controller1-csugetg5pjql']},
 {partitions,[{'rabbit@overcloud-controller0-o6yt2gtaxk6g',
               ['rabbit@overcloud-controller2-z3tswnamdzhq']},
              {'rabbit@overcloud-controller1-csugetg5pjql',
               ['rabbit@overcloud-controller2-z3tswnamdzhq']}]}]
...done.
Expected results: no partitions after the node is up again
Additional info: the only related message in the log:
Mnesia('rabbit@overcloud-controller0-o6yt2gtaxk6g'): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@overcloud-controller2-z3tswnamdzhq'}
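Until a fixed rabbitmq-server build is available, a partition like the one above can typically be cleared by hand by restarting the RabbitMQ application on the node left out of the majority (controller2, per the cluster_status output). This is a hedged operational sketch of standard rabbitmqctl usage, not a fix confirmed for this bug:

```shell
# Run on the partitioned node, rabbit@overcloud-controller2-z3tswnamdzhq:
rabbitmqctl stop_app        # stop the RabbitMQ application (Erlang VM stays up)
rabbitmqctl start_app       # application rejoins the cluster on startup
rabbitmqctl cluster_status  # verify the partitions list is now empty
```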