On Tue, 2020-05-19 at 17:19 -0700, Adam Williamson wrote:
On Tue, 2020-05-19 at 16:34 -0700, Adam Williamson wrote:
> On Tue, 2020-05-19 at 15:45 -0700, Adam Williamson wrote:
> > still, having trouble pinning down a culprit; it's not kernel-5.7.0-
> > 0.rc6.1.fc33 as the 20200518.n.0 compose failed, and that was run with
> > the previous kernel build, which *succeeded* in the 20200517.n.0
> > compose...
> >
> > I guess we get to poke through everything built around the 17th and try
> > to find a relevant change? :)
>
> This should be *more or less* the list of candidate culprits, which I
> just heath robinson'd out of Bodhi (yet again I wish we got compose
> reports for doomed composes, it'd make this easier...). grub2 and
> openssl are the ones that jump out at me right away...ceph is also in
> there, and ceph is one of the things that *is* linked to libibverbs ,
> though the change in ceph doesn't seem like a significant one...
>
> aha! This looks juicy. systemctl is linked against libpcap.so.1 , part
> of libpcap, which is in the list below, and this is the changelog for
> it:
>
> * Fri May 15 2020 Michal Ruprich <michalruprich(a)gmail.com> - 14:1.9.1-4
> - Enabling rdma support in libpcap
>
> the changelog date is May 15 but the pcap build actually ran on 2020-
> 05-18, and indeed the last successful Rawhide compose (20200517.n.1)
> had 1.9.1-3.fc33, and the first failed compose (20200518.n.0) has
> 1.9.1-4.fc33.
>
> I'm betting we have a circular dependency problem or something here...
Well, seems like I may be half right. We untagged libpcap-1.9.1-4.fc33
and I re-ran the nbdkit build. It seems that this has fixed the "error
while loading shared libraries: libibverbs.so.1: cannot open shared
object file: No such file or directory" errors...but not the segfaults
:( we now just get segfaults all the way:
DEBUG util.py:602: Running transaction
DEBUG util.py:602: /var/tmp/rpm-tmp.yHBGhZ: line 1: 1976807 Segmentation fault (core dumped) systemd-machine-id-setup &> /dev/null
DEBUG util.py:602: /var/tmp/rpm-tmp.yHBGhZ: line 21: 1976809 Segmentation fault (core dumped) systemctl daemon-reexec &> /dev/null
DEBUG util.py:602: /var/tmp/rpm-tmp.yHBGhZ: line 23: 1976811 Segmentation fault (core dumped) journalctl --update-catalog &> /dev/null
DEBUG util.py:602: /var/tmp/rpm-tmp.yHBGhZ: line 24: 1976813 Segmentation fault (core dumped) systemd-tmpfiles --create &> /dev/null
DEBUG util.py:602: /var/tmp/rpm-tmp.yHBGhZ: line 50: 1976821 Segmentation fault (core dumped) systemctl preset-all &> /dev/null
DEBUG util.py:602: /var/tmp/rpm-tmp.yHBGhZ: line 50: 1976823 Segmentation fault (core dumped) systemctl --global preset-all &> /dev/null
DEBUG util.py:602: /var/tmp/rpm-tmp.TL9iz0: line 6: 1976826 Segmentation fault (core dumped) /usr/bin/systemctl --no-reload preset dbus.socket
so there might be two different problems here, possibly?
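
To see at a glance which scriptlet commands are crashing, a log like the
one above can be boiled down with a few lines of Python. This is only a
rough sketch: the function name segfaulting_commands is made up for
illustration, and the pattern assumes the log looks exactly like the
excerpt here (it tolerates the command being wrapped onto the next line).

```python
import re

# Matches one scriptlet crash as it appears in the mock log excerpt.
# \s+ before "(core dumped)" tolerates a space or a mail line wrap.
SEGFAULT_RE = re.compile(
    r"(?P<script>/var/tmp/rpm-tmp\.\w+): line (?P<line>\d+): (?P<pid>\d+) "
    r"Segmentation fault\s+\(core dumped\) (?P<command>[^\n]+)"
)


def segfaulting_commands(log_text):
    """Return (scriptlet path, command) pairs for each segfault found."""
    results = []
    for match in SEGFAULT_RE.finditer(log_text):
        command = match.group("command").strip()
        # Drop the trailing output redirection used by the scriptlets.
        command = re.sub(r"\s*&>\s*/dev/null$", "", command)
        results.append((match.group("script"), command))
    return results


# Two entries copied from the log above, still in their wrapped form.
sample = """DEBUG util.py:602: /var/tmp/rpm-tmp.yHBGhZ: line 1: 1976807 Segmentation fault
(core dumped) systemd-machine-id-setup &> /dev/null
DEBUG util.py:602: /var/tmp/rpm-tmp.TL9iz0: line 6: 1976826 Segmentation fault
(core dumped) /usr/bin/systemctl --no-reload preset dbus.socket"""

for script, command in segfaulting_commands(sample):
    print(script, "->", command)
```

Every command in the excerpt is a systemd binary, which fits the crashes
being tied to systemd rather than to any one scriptlet.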
OK, so I dug into the history of the segfaults a bit. Turns out they've
been going on for a few days. They only seem to happen on builds where
the systemd package gets pulled into the buildroot. The closest delta
I've got is this:
GOOD:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1507622 (Fri, 15 May 2020 18:01:07 UTC)
BAD:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1507650 (Fri, 15 May 2020 19:49:31 UTC)
"GOOD" is the latest build I can find where systemd was pulled in as a
build dep (so the bug "should have" happened) and the bug didn't
happen. "BAD" is the earliest build I can find where the bug *did*
happen.
The most suspicious change between the two build envs that I can see is
openssl. GOOD has openssl-1.1.1g-1.fc33.x86_64 , and BAD has
openssl-1.1.1g-2.fc33.x86_64 . I'm gonna try doing an openssl build
with the patch from -2 reverted and see where that gets us.
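
For anyone wanting to repeat the GOOD/BAD comparison, here is a rough
sketch of diffing two buildroot package lists in Python. It assumes you
have already saved each build's installed-package list as plain
name-version-release.arch strings; the helper names split_nvra and
buildroot_delta are made up for illustration, and examplepkg is a
placeholder, not something from the real buildroots. Only the openssl
entries come from the builds above.

```python
def split_nvra(nvra):
    """Split 'name-version-release.arch' into (name, 'version-release')."""
    nvr, _, _arch = nvra.rpartition(".")
    # The name itself may contain dashes, so split from the right.
    name, version, release = nvr.rsplit("-", 2)
    return name, f"{version}-{release}"


def buildroot_delta(good, bad):
    """Map each package present in both lists to its (good, bad) versions
    when they differ."""
    good_map = dict(split_nvra(p) for p in good)
    bad_map = dict(split_nvra(p) for p in bad)
    changed = {}
    for name in sorted(good_map.keys() & bad_map.keys()):
        if good_map[name] != bad_map[name]:
            changed[name] = (good_map[name], bad_map[name])
    return changed


good = [
    "openssl-1.1.1g-1.fc33.x86_64",
    "examplepkg-1.0-1.fc33.x86_64",  # hypothetical, unchanged package
]
bad = [
    "openssl-1.1.1g-2.fc33.x86_64",
    "examplepkg-1.0-1.fc33.x86_64",  # hypothetical, unchanged package
]

for name, (old, new) in buildroot_delta(good, bad).items():
    print(f"{name}: {old} -> {new}")
```

This only reports packages whose version changed between the two lists;
packages added to or dropped from the buildroot would need a separate
check.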
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net