https://bugzilla.redhat.com/show_bug.cgi?id=1324922
Bug ID: 1324922
Summary: Log handler repeatedly crashes
Product: Fedora EPEL
Version: epel7
Component: erlang
Keywords: Regression, ZStream
Severity: urgent
Priority: urgent
Assignee: jeckersb@redhat.com
Reporter: jeckersb@redhat.com
QA Contact: extras-qa@fedoraproject.org
CC: apevec@redhat.com, binarin@binarin.ru, erlang@lists.fedoraproject.org, fdinitto@redhat.com, jeckersb@redhat.com, jschluet@redhat.com, lhh@redhat.com, oblaut@redhat.com, rjones@redhat.com, s@shk.io, ushkalim@redhat.com
Depends On: 1322609
Blocks: 1324185
+++ This bug was initially created as a clone of Bug #1322609 +++
Starting with erlang-erts-R16B-03.10min.6.el7ost.x86_64, the log handler repeatedly crashes and fills up the rabbitmq startup_log with entries like:
Event crashed log handler: {info_msg,<0.1719.0>, {<0.1832.0>,"Mirrored ~s: Adding mirror on node ~p: ~p~n", ["queue 'l3_agent_fanout_0f6bc20f4c54484f9de482cd6d83a15a' in vhost '/'", 'rabbit@overcloud-controller-1',<6192.10668.1>]}} function_clause
Meanwhile the rabbitmq log is empty.
Looks like a regression introduced in the "Enable error_logger depth fine tuning" patch.
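To check whether a given log shows this symptom, something like the following could be used. It is only a minimal sketch: the marker strings are copied from the entry quoted above, while the function name and the sample text are made up for illustration and are not part of RabbitMQ or Erlang.

```python
import re

# Marker text copied from the log entry quoted in this report.
CRASH_MARKER = "Event crashed log handler:"


def count_handler_crashes(log_text):
    """Return (crash_count, function_clause_count) for the given log text."""
    crashes = log_text.count(CRASH_MARKER)
    function_clauses = len(re.findall(r"\bfunction_clause\b", log_text))
    return crashes, function_clauses


# Hypothetical two-entry sample, shortened from the real entry above.
sample = (
    "Event crashed log handler: {info_msg,<0.1719.0>, ...} function_clause\n"
    "Event crashed log handler: {info_msg,<0.1720.0>, ...} function_clause\n"
)
print(count_handler_crashes(sample))
```

A non-zero crash count in startup_log combined with an empty rabbitmq log would match the behaviour described here.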
--- Additional comment from Alexey Lebedeff on 2016-04-07 09:17:10 EDT ---
R16B-03.16.el7 is also affected.
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1322609
[Bug 1322609] Log handler repeatedly crashes
https://bugzilla.redhat.com/show_bug.cgi?id=1324185
[Bug 1324185] Log handler repeatedly crashes
--- Comment #1 from Fedora Update System updates@fedoraproject.org ---
erlang-R16B-03.17.el7 has been submitted as an update to Fedora EPEL 7.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-e1035fad90
Fedora Update System updates@fedoraproject.org changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Status  |ASSIGNED                    |MODIFIED
Fedora Update System updates@fedoraproject.org changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Status  |MODIFIED                    |ON_QA

--- Comment #2 from Fedora Update System updates@fedoraproject.org ---
erlang-R16B-03.17.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-e1035fad90
Steven Dake steven.dake@gmail.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |steven.dake@gmail.com
--- Comment #3 from Steven Dake steven.dake@gmail.com ---
I think your speculation that the depth logging change, whatever that was, introduced this regression is incorrect. The problem was introduced in -16 (adding IPv6 support). This fundamentally changes how epmd operates: epmd binds to either IPv4 or IPv6, depending on config, but not both.
One workaround, mentioned here: https://github.com/openstack/kolla/blob/master/ansible/roles/rabbitmq/templa...
in comment #6, works on -16 but triggers epmd to bind to 0.0.0.0 (all interfaces), which could interfere with neutron, the tenant network, etc.
If you're going to enable IPv6, you might as well fix epmd binding so it's handled properly. By the way, the otp-23 patch is a disaster.
I have yet to try -17 with the EPMD binding removed, which would be a good short-term workaround but not a good long-term one. Long term this will cause heisenbugs in neutron and other parts of the system that you just haven't discovered yet ;)
Alan Pevec apevec@redhat.com changed:
What    |Removed                        |Added
----------------------------------------------------------------------------
Flags   |                               |needinfo?(jeckersb@redhat.com)
--- Comment #4 from Steven Dake steven.dake@gmail.com ---
I have tried -17 and it suffers from this same binding problem consistently.
--- Comment #5 from Steven Dake steven.dake@gmail.com ---
Removing the EPMD binding resolves the "epmd: could not bind to any interface" error and the Erlang crash that follows it. Unfortunately, in this mode of operation a wildcard bind is done to all interfaces on the control nodes in OpenStack.
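For readers unfamiliar with the distinction being drawn in these comments, the difference between a wildcard bind and a bind to one specific address can be illustrated with plain sockets. This is only a sketch of the general concept, not epmd's code: the helper name is made up, and it binds to an ephemeral port (port 0) rather than epmd's own port so that it cannot collide with a running epmd.

```python
import socket


def bound_address(host):
    """Bind an IPv4 TCP socket to (host, ephemeral port) and return the bound host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[0]


# Wildcard bind: the socket is reachable on every IPv4 interface of the host.
print(bound_address("0.0.0.0"))
# Specific bind: the socket is reachable on the loopback interface only.
print(bound_address("127.0.0.1"))
```

Note that an AF_INET socket like this one covers IPv4 only; listening on IPv6 as well requires a separate AF_INET6 socket or a dual-stack one.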
--- Comment #6 from Alexey Lebedeff binarin@binarin.ru ---
Steven, is some part of the conversation missing, or have you posted your comments to the wrong bug? :) This one is only about broken logging; everything else should function just fine.
--- Comment #7 from Alan Pevec apevec@redhat.com ---
Alexey, this is a follow-up to the Bodhi comment in the linked update https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-e1035fad90
John Eckersberg jeckersb@redhat.com changed:
What    |Removed                        |Added
----------------------------------------------------------------------------
Flags   |needinfo?(jeckersb@redhat.com) |

--- Comment #8 from John Eckersberg jeckersb@redhat.com ---
OK, let's regroup and clarify some things here before we get more confused. Part of this is my fault because I directed you on IRC to post on the bodhi update about your crash. I didn't realize at first that you were seeing an IPv6 crash and thought it was just the logging crash.
Anyway...
This particular bug is about broken logging. The current released version (R16B-03.16.el7) has broken logging. The only change[1] in the .17 release is to revert the patch that added the broken logging.
So I would ask two things.
(1) Ignore the IPv6 thing for this bug. It would be a huge help if you could just sanity check that .16 has broken logging and that .17 is correct (and update karma on the update accordingly). Then we can either ship that update ASAP or bundle it with an IPv6 fix (if we can get it quickly).
(2) We'll file another bug for the IPv6 issue. We already fixed one crash bug in https://bugzilla.redhat.com/show_bug.cgi?id=1310808 (incidentally this is the update you said introduced your crash). If you can get it to reproduce and capture a coredump of the crash that would be awesome. I will try to reproduce as well by toying with ERL_EPMD_ADDRESS on my end.
[1] http://pkgs.fedoraproject.org/cgit/rpms/erlang.git/commit/?h=epel7&id=65...
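The sanity check requested in (1) comes down to telling the two builds apart by name. A minimal sketch of that, assuming the EPEL 7 naming seen in this thread (erlang-R16B-03.NN.el7): the function name and status strings are made up for illustration, and only the two builds this comment describes are classified.

```python
import re

# Only these two EPEL 7 builds are characterised in this comment.
KNOWN_STATUS = {16: "broken logging", 17: "logging patch reverted"}


def logging_status(package):
    """Classify an EPEL 7 erlang build name such as 'erlang-R16B-03.17.el7'."""
    match = re.search(r"R16B-03\.(\d+)\.el7$", package)
    if match is None:
        return "unknown"
    return KNOWN_STATUS.get(int(match.group(1)), "unknown")


print(logging_status("erlang-R16B-03.16.el7"))
print(logging_status("erlang-R16B-03.17.el7"))
```

Anything else, including builds from other channels with a different naming scheme, is deliberately reported as "unknown" rather than guessed.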
Bug 1324922 depends on bug 1322609, which changed state.

Bug 1322609 Summary: Log handler repeatedly crashes
https://bugzilla.redhat.com/show_bug.cgi?id=1322609

What        |Removed                     |Added
----------------------------------------------------------------------------
Status      |RELEASE_PENDING             |CLOSED
Resolution  |---                         |ERRATA
Alan Pevec apevec@redhat.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Flags   |                            |needinfo?(steven.dake@gmail.com)

--- Comment #9 from Alan Pevec apevec@redhat.com ---
Any updates?
Alan Pevec apevec@redhat.com changed:
What    |Removed                        |Added
----------------------------------------------------------------------------
Flags   |                               |needinfo?(jeckersb@redhat.com)
Steven Dake steven.dake@gmail.com changed:
What    |Removed                          |Added
----------------------------------------------------------------------------
Flags   |needinfo?(steven.dake@gmail.com) |
        |needinfo?(jeckersb@redhat.com)   |

--- Comment #10 from Steven Dake steven.dake@gmail.com ---
I think what happened here is that I conflated the .15 and .16 changes into one, going by jeckersb's statement.
The issue (with .15, then) is that EPMD does a wildcard bind, which could result in some really weird behavior if anyone in a cloud environment uses that port while neutron is in use on the box. I'm not sure if this is a legitimate situation, but no service in OpenStack should wildcard bind.
That said, .16 is totally bust with logging; you're right on that point. I don't recall where the erlang repo to test -17 with is, but if you could provide it I'll test Kolla's current master with it. Testing takes about 2 hours once I have a repo to work with.
Thanks
-steve
Steven Dake steven.dake@gmail.com changed:
What    |Removed                        |Added
----------------------------------------------------------------------------
Flags   |                               |needinfo?(jeckersb@redhat.com)
--- Comment #11 from Alan Pevec apevec@redhat.com ---
Steve, in the RDO Mitaka testing repo http://buildlogs.centos.org/centos/7/cloud/x86_64/openstack-mitaka/ we have:
erlang-R16B-03.17.el7
rabbitmq-server-3.3.5-17.el7
Edu Alcaniz ealcaniz@redhat.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |ealcaniz@redhat.com
Peter Lemenkov lemenkov@gmail.com changed:
What      |Removed                     |Added
----------------------------------------------------------------------------
CC        |                            |lemenkov@gmail.com
Assignee  |jeckersb@redhat.com         |lemenkov@gmail.com
--- Comment #12 from Fedora Update System updates@fedoraproject.org ---
erlang-R16B-03.17.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.
Fedora Update System updates@fedoraproject.org changed:
What              |Removed                     |Added
----------------------------------------------------------------------------
Status            |ON_QA                       |CLOSED
Fixed In Version  |                            |erlang-R16B-03.17.el7
Resolution        |---                         |ERRATA
Last Closed       |                            |2016-07-29 02:50:15
John Eckersberg jeckersb@redhat.com changed:
What    |Removed                        |Added
----------------------------------------------------------------------------
Flags   |needinfo?(jeckersb@redhat.com) |