Re: fedora-messaging/rabbitmq in staging cluster rebuild tonight

Friday, 14 February 2020

Hey folks,

I thought I'd make a summary of where I'm at. Here are the issues I found
and what I did about them:

- We ran into an Ansible issue that the PR
https://github.com/ansible/ansible/pull/50381 fixes. I've asked pingou to
patch batcave since it's basically a one-liner that will keep working with
the older prod version.

- When starting a RabbitMQ cluster from scratch, there is a race condition
that is documented here:
https://www.rabbitmq.com/cluster-formation.html#initial-formation-race-co...
  On nodes 02 and 03, I just destroyed the database and let each node
auto-detect the cluster again:
  # systemctl stop rabbitmq-server && rm -rf /var/lib/rabbitmq/mnesia/ && systemctl start rabbitmq-server
  It worked fine. I checked with "rabbitmqctl list_users" that all nodes
had the same users declared.

- I've also fixed a couple of things in the playbooks that assumed the
cluster was already up and set up.

- I've rebuilt collectd-rabbitmq for EPEL8, but apparently we currently only
install it on production (not sure why, I think it could be useful in
staging).

- The nagios-plugins-rabbitmq RPM still fails to install because of a
dependency bug in perl-Monitoring-Plugin. I've opened a ticket about it:
https://bugzilla.redhat.com/show_bug.cgi?id=1803121
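
For the "rabbitmqctl list_users" check mentioned in the RabbitMQ item above,
here is a minimal sketch of how the comparison across nodes could be scripted.
The hostnames and /tmp paths in the commented usage are hypothetical, and it
assumes ssh access to each node; the helper itself only compares saved output
files, ignoring line order.

```shell
#!/bin/sh
# Succeeds if every given file holds the same set of lines as the first
# one (order-insensitive); otherwise names the first file that differs.
users_match() {
    ref=$(sort "$1") || return 2
    shift
    for f in "$@"; do
        [ "$(sort "$f")" = "$ref" ] || { echo "mismatch: $f"; return 1; }
    done
    echo "all nodes agree"
}

# Hypothetical usage (node names are assumptions, not from the thread):
# for n in rabbitmq01 rabbitmq02 rabbitmq03; do
#     ssh "$n" rabbitmqctl -q list_users > "/tmp/users.$n"
# done
# users_match /tmp/users.rabbitmq01 /tmp/users.rabbitmq02 /tmp/users.rabbitmq03
```

This only compares user names and tags as printed by list_users; it does not
look at permissions, queues or bindings.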

Now we need to recreate the queues, users and bindings, and I don't have
the permissions to run all the playbooks. If someone could run the master
playbook limited to staging and to the rabbitmq_cluster tag, I think it
should recreate all users and queues and we should be all set.
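
In case it helps whoever picks this up, the run described above could look
roughly like the sketch below. The playbook path and the "staging" group name
are assumptions on my part (adjust them to the actual inventory), and the
DRY_RUN switch is just a hypothetical convenience to print the command instead
of running it; --limit and --tags are the standard ansible-playbook options.

```shell
#!/bin/sh
# Hypothetical default path; point PLAYBOOK at the real master playbook.
PLAYBOOK="${PLAYBOOK:-master.yml}"

run_rabbitmq_staging() {
    # --limit restricts the hosts, --tags restricts the tasks.
    # Extra arguments (e.g. --check) are passed straight through.
    # With DRY_RUN set, the command is only printed.
    ${DRY_RUN:+echo} ansible-playbook "$PLAYBOOK" \
        --limit staging --tags rabbitmq_cluster "$@"
}

# Example: preview the command first, then run it for real.
# DRY_RUN=1 run_rabbitmq_staging --check
# run_rabbitmq_staging
```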

I'm around and on IRC if you need me.

Aurélien
