Just upgraded a development machine to:
binutils-2.34.0-10.fc33.x86_64 gcc-10.1.1-2.fc33.x86_64 glibc-2.31.9000-21.fc33.x86_64
and a very simple C compile (non-LTO) is now segfaulting:
make[3]: Entering directory '/home/rjones/d/nbdkit/common/protocol'
/bin/sh ../../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c -o libprotocol_la-protostrings.lo `test -f 'protostrings.c' || echo './'`protostrings.c
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -fPIC -DPIC -o .libs/libprotocol_la-protostrings.o
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -o libprotocol_la-protostrings.o >/dev/null 2>&1
mv -f .deps/libprotocol_la-protostrings.Tpo .deps/libprotocol_la-protostrings.Plo
/bin/sh ../../libtool --tag=CC --mode=link gcc -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -O0 -g -Wp,-U_FORTIFY_SOURCE -o libprotocol.la libprotocol_la-protostrings.lo
libtool: link: ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o
../../libtool: line 1734: 2572327 Segmentation fault (core dumped) ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o
Core was generated by `ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000000000 in ?? ()
Missing separate debuginfos, use: dnf debuginfo-install binutils-2.34.0-10.fc33.x86_64
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007f15bd3e03d0 in make_relative_prefix_1.part () from /lib64/libbfd-2.34.0.20200522.so
#2 0x00007f15bd3d22db in bfd_plugin_object_p.lto_priv () from /lib64/libbfd-2.34.0.20200522.so
#3 0x00007f15bd3401ce in bfd_check_format_matches () from /lib64/libbfd-2.34.0.20200522.so
#4 0x00007f15bd340e7a in _bfd_write_archive_contents () from /lib64/libbfd-2.34.0.20200522.so
#5 0x00007f15bd348b2a in bfd_close () from /lib64/libbfd-2.34.0.20200522.so
#6 0x0000559ee83994b6 in write_archive ()
#7 0x0000559ee8396ac3 in main ()
I can't find any BZ for this. Any ideas what it could be?
Rich.
On Fri, Jul 24, 2020 at 3:30 PM Richard W.M. Jones rjones@redhat.com wrote:
Just upgraded a development machine to:
binutils-2.34.0-10.fc33.x86_64 gcc-10.1.1-2.fc33.x86_64 glibc-2.31.9000-21.fc33.x86_64
and a very simple C compile (non-LTO) is now segfaulting:
[snip]
I can't find any BZ for this. Any ideas what it could be?
See the last few messages in the "Very strange compiler/linker related build failures in rawhide" thread. It looks like something has gone awry with the binutils update. The problem is not limited to ar, either. This is from a mock build of the abc package:
extracting debug info from /builddir/build/BUILDROOT/abc-1.01-27.git20200720.fc33.x86_64/usr/bin/abc
/usr/lib/rpm/find-debuginfo.sh: line 262: 6010 Segmentation fault (core dumped) nm -D "$binary" --format=posix --defined-only
6011 Done | awk '{ print $1 }'
6012 Done | sort > "$dynsyms"
/usr/lib/rpm/find-debuginfo.sh: line 262: 6013 Segmentation fault (core dumped) nm "$debuginfo" --format=sysv --defined-only
6014 Done | awk -F | '{ if ($4 ~ "FUNC") print $1 }'
6015 Done | sort > "$funcsyms"
xz: /tmp/tmp.1oP6gAlDVB: No such file or directory
objcopy: cannot open: /tmp/tmp.1oP6gAlDVB.xz: No such file or directory
/usr/lib/rpm/find-debuginfo.sh: line 262: 6044 Segmentation fault (core dumped) nm -D "$binary" --format=posix --defined-only
6045 Done | awk '{ print $1 }'
6046 Done | sort > "$dynsyms"
/usr/lib/rpm/find-debuginfo.sh: line 262: 6047 Segmentation fault (core dumped) nm "$debuginfo" --format=sysv --defined-only
6048 Done | awk -F | '{ if ($4 ~ "FUNC") print $1 }'
6049 Done | sort > "$funcsyms"
xz: /tmp/tmp.rddMhW6CLz: No such file or directory
objcopy: cannot open: /tmp/tmp.rddMhW6CLz.xz: No such file or directory
So ar and nm are affected, at least.
On Fri, 2020-07-24 at 22:29 +0100, Richard W.M. Jones wrote:
Just upgraded a development machine to:
binutils-2.34.0-10.fc33.x86_64 gcc-10.1.1-2.fc33.x86_64 glibc-2.31.9000-21.fc33.x86_64
and a very simple C compile (non-LTO) is now segfaulting:
[snip]
I can't find any BZ for this. Any ideas what it could be?
Hmm, what's interesting here is that it's binutils-2.34, so it's not the update that Nick was going to do today. I've seen a couple folks trip over this today and just saw it in a couple of my builds.
I'll take a look. I'm not much of a binutils hacker these days, but it's just code.
jeff
On Fri, Jul 24, 2020 at 03:37:05PM -0600, Jeff Law wrote:
Hmm, what's interesting here is that it's binutils-2.34, so it's not the update that Nick was going to do today. I've seen a couple folks trip over this today and just saw it in a couple of my builds.
I believe it's the version that nickc just built in Rawhide this afternoon.
I'll take a look. I'm not much of a binutils hacker these days, but it's just code.
Even simpler reproducer ...
$ ar cru test.a /dev/null Segmentation fault (core dumped)
Rich.
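For anyone wanting to script the reproducer above, here is a minimal Python sketch. The helper names (`classify_exit`, `run_reproducer`) are made up for illustration and are not part of any tool mentioned in this thread; it only assumes a POSIX system, where a process killed by a signal is reported with a negative return code.

```python
import signal
import subprocess


def classify_exit(returncode):
    """Map a subprocess return code to a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_reproducer(cmd):
    """Run cmd, discard its output, and classify how it ended."""
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return classify_exit(proc.returncode)
```

Calling `run_reproducer(["ar", "cru", "test.a", "/dev/null"])` on an affected machine should report a SIGSEGV if the bug triggers there; note that the command creates `test.a` in the current directory.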
On Fri, 2020-07-24 at 22:40 +0100, Richard W.M. Jones wrote:
On Fri, Jul 24, 2020 at 03:37:05PM -0600, Jeff Law wrote:
Hmm, what's interesting here is that it's binutils-2.34, so it's not the update that Nick was going to do today. I've seen a couple folks trip over this today and just saw it in a couple of my builds.
I believe it's the version that nickc just built in Rawhide this afternoon.
Yea, but I'm probably to blame :-)
I'll take a look. I'm not much of a binutils hacker these days, but it's just code.
Even simpler reproducer ...
$ ar cru test.a /dev/null Segmentation fault (core dumped)
Sweet. I'm on it.
jeff
On Fri, 2020-07-24 at 22:40 +0100, Richard W.M. Jones wrote:
Even simpler reproducer ...
$ ar cru test.a /dev/null Segmentation fault (core dumped)
But that's not triggering for me:( Let me try with the nbdkit bits
jeff
On Fri, 24 Jul 2020 at 17:51, Jeff Law law@redhat.com wrote:
But that's not triggering for me:( Let me try with the nbdkit bits
Would it help if the people having the problem gave a list of what RPMs they have installed and what kernel they are running? That could help cut down some of the guessing.
On Fri, 2020-07-24 at 17:55 -0400, Stephen John Smoogen wrote:
Would it help if the people having the problem gave a list of what RPMs they have installed and what kernel they are running? That could help cut down some of the guessing.
No need. I've reproduced it with the nbdkit package. Proceeding to debugging.
jeff
On Fri, 2020-07-24 at 17:55 -0400, Stephen John Smoogen wrote:
Would it help if the people having the problem gave a list of what RPMs they have installed and what kernel they are running? That could help cut down some of the guessing.
What would help would be if someone could untag that version of binutils so that it doesn't show up in the buildroots anymore. It's clearly fubar'd.
What's exceedingly weird is that it looks like we've got a call through the PLT to a routine that should be defined, but the PLT entry is zero. Naturally that causes bad things to happen.
Jeff
On Fri, Jul 24, 2020 at 04:55:31PM -0600, Jeff Law wrote:
What would help would be if someone could untag that version of binutils so that it doesn't show up in the buildroots anymore. It's clearly fubar'd.
Done.
kevin
On Fri, Jul 24, 2020 at 6:41 PM Kevin Fenzi kevin@scrye.com wrote:
On Fri, Jul 24, 2020 at 04:55:31PM -0600, Jeff Law wrote:
What would help would be if someone could untag that version of binutils so that it doesn't show up in the buildroots anymore. It's clearly fubar'd.
Done.
Hmmmm. Yet my most recent build attempt, just now, failed with a linker segfault on all arches:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1546752
This is with: annobin-9.24.2-fc33 binutils-2.35-1.fc33 gcc-10.2.1-1.fc33 glibc-2.31.9000-21.fc33
Regards,
On Sun, 2020-07-26 at 09:39 -0600, Jerry James wrote:
Hmmmm. Yet my most recent build attempt, just now, failed with a linker segfault on all arches:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1546752
This is with: annobin-9.24.2-fc33 binutils-2.35-1.fc33 gcc-10.2.1-1.fc33 glibc-2.31.9000-21.fc33
Urggh. Probably the same problem showing up in the 2.35 build which must have recently landed.
We need that untagged too.
jeff
On Sun, Jul 26, 2020 at 10:17:03AM -0600, Jeff Law wrote:
Urggh. Probably the same problem showing up in the 2.35 build which must have recently landed.
We need that untagged too.
I can do that. It did indeed land this morning: https://bodhi.fedoraproject.org/updates/FEDORA-2020-d345248228
CC: nickc
kevin
On Sun, 2020-07-26 at 09:39 -0600, Jerry James wrote:
Hmmmm. Yet my most recent build attempt, just now, failed with a linker segfault on all arches:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1546752
This is with: annobin-9.24.2-fc33 binutils-2.35-1.fc33 gcc-10.2.1-1.fc33 glibc-2.31.9000-21.fc33
As Kevin mentioned in a followup, he's untagged the 2.35 build so this should be working again.
I think I see the root cause in the linker now. It's probably an uncommon scenario, but I doubt binutils is the only affected package.
The even better news is I think we can go ahead and green light the mass rebuild for Monday. Two reasons. One, I expect the preconditions necessary to trip the bug to be uncommon. Two, I think we can reliably detect a broken binary by the existence of absolute symbols in the dynamic symbol table.
The latter in particular means we've got a method where we can find affected packages while Nick and I iterate on the linker fix. So even if the bug leaks into packages, we can find them and do targeted rebuilds.
I'll find the fesco issue and add some notes there along with my recommendation.
jeff
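The detection idea in the message above (look for absolute symbols in the dynamic symbol table) could be scripted roughly as follows. This is only a sketch under assumptions: it parses the default BSD-style output of `nm -D --defined-only`, the function names are hypothetical, and it is not the check that was actually used for the mass rebuild. Some libraries may legitimately carry absolute symbols (symbol version names, for instance), so hits should be treated as candidates for inspection rather than proof of breakage.

```python
import subprocess


def absolute_symbols(nm_output):
    """Return (name, value) pairs for absolute symbols in nm BSD-format output."""
    found = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols print as "value type name"; skip anything else.
        if len(fields) != 3:
            continue
        value, sym_type, name = fields
        if sym_type not in ("A", "a"):
            continue
        try:
            found.append((name, int(value, 16)))
        except ValueError:
            # Not a hexadecimal value column, so not a symbol line.
            continue
    return found


def check_binary(path):
    """Run nm on the dynamic symbol table of path and list absolute symbols."""
    proc = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True, text=True, check=True,
    )
    return absolute_symbols(proc.stdout)
```

A result such as `[("lrealpath", 0)]` would match the symptom seen in the broken ar binary: an absolute symbol whose value is 0.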
For reference I saw an issue yesterday with a build of rpm https://koji.fedoraproject.org/koji/taskinfo?taskID=47871132
On Sun, Jul 26, 2020 at 11:03:58PM -0600, Jeff Law wrote:
As Kevin mentioned in a followup, he's untagged the 2.35 build so this should be working again.
I think I see the root cause in the linker now. It's probably an uncommon scenario, but I doubt binutils is the only affected package.
The even better news is I think we can go ahead and green light the mass rebuild for Monday. Two reasons. One, I expect the preconditions necessary to trip the bug to be uncommon. Two, I think we can reliably detect a broken binary by the existence of absolute symbols in the dynamic symbol table.
The latter in particular means we've got a method where we can find affected packages while Nick and I iterate on the linker fix. So even if the bug leaks into packages, we can find them and do targeted rebuilds.
The problem with that is that if broken builds land in the buildroot of other packages, those dependent packages might either a) fail to build, b) be built incorrectly, for example because feature detection fails. Situation a) happens in mass rebuilds quite a lot anyway, so it's not a big issue, since the build would just be repeated. But b) is more serious. Even if you detect that a package was faulty and needs to be rebuilt, we might have to also rebuild all packages using that faulty package as a build dependency, recursively. This quickly becomes messy :(
Zbyszek
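To make the recursive-rebuild concern above concrete, here is a small sketch that computes which packages would need rebuilding if a given set turned out to be broken. The function name and the `build_requires` mapping (package name to its list of build dependencies) are assumptions for illustration; real dependency data would have to come from the build system.

```python
from collections import deque


def packages_to_rebuild(build_requires, broken):
    """Return every package that transitively build-depends on a broken one."""
    # Invert the map: dependency -> packages that build-require it.
    dependents = {}
    for pkg, deps in build_requires.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first walk outward from the broken packages.
    seen = set(broken)
    queue = deque(broken)
    while queue:
        current = queue.popleft()
        for pkg in dependents.get(current, ()):
            if pkg not in seen:
                seen.add(pkg)
                queue.append(pkg)
    return seen
```

This is a worst-case estimate: it treats every build dependency as able to taint the result, which is exactly why the set can grow quickly.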
On Mon, 2020-07-27 at 13:32 +0000, Zbigniew Jędrzejewski-Szmek wrote:
The problem with that is that if broken builds land in the buildroot of other packages, those dependent packages might either a) fail to build, b) be built incorrectly, for example because feature detection fails. Situation a) happens in mass rebuilds quite a lot anyway, so it's not a big issue, since the build would just be repeated. But b) is more serious. Even if you detect that a package was faulty and needs to be rebuilt, we might have to also rebuild all packages using that faulty package as a build dependency, recursively. This quickly becomes messy :(
I'm aware of that potential. I think the odds of stumbling into this are small.
jeff
On Fri, 2020-07-24 at 22:29 +0100, Richard W.M. Jones wrote:
Just upgraded a development machine to:
binutils-2.34.0-10.fc33.x86_64 gcc-10.1.1-2.fc33.x86_64 glibc-2.31.9000-21.fc33.x86_64
and a very simple C compile (non-LTO) is now segfaulting:
[snip]
I can't find any BZ for this. Any ideas what it could be?
After banging my head on the wall for a few hours, I think I see what's happening here.
So at a high level ar makes a call to lrealpath. That naturally goes through the PLT. The PLT stub loads the value out of the GOT and jumps to it. The problem is the entry in the GOT is *zero* when it should be pointing to the resolver.
Now lrealpath is provided by libiberty and a copy is in libbfd.so and the GOT entry in libbfd.so looked right. But by the time the program has hit main, the GOT entry has been reset to zero. Naturally that's happening inside the dynamic linker (as expected, confirmed with a watchpoint). If you've ever had to debug ld.so, you'll know it's an insanely painful experience, but it proved fruitful.
The key was finding out that we were not using the libbfd.so linker map to resolve lrealpath; instead we were using the linker map for the main program (ar in this case). So naturally it's time to look a bit more closely at the symbol table for ar.
The main symbol table for ar doesn't mention lrealpath. But that's just a confusing byproduct of having two symbol tables. What matters to ld.so is the *dynamic* symbol table. And ar has lrealpath in its dynamic symbol table. And here's the kicker: it's an absolute symbol with the value 0:
0000000000000000 A lrealpath
A symbol in the main program takes precedence over a symbol in a DSO. So the dynamic linker was actually doing the right thing given the input it was provided.
Now why (*&@#$ does ar have lrealpath as an absolute symbol? It's got to be related to the fact that when we link ar we pull in another copy of libiberty. In fact, ar links against libiberty twice: once via -liberty, then again against libiberty.a (and kind of a 3rd time indirectly via libbfd). But even so, that shouldn't be creating an absolute symbol. That's just weird.
This smells like a linker bug to me. Not surprisingly if I force the system to use ld.gold, then I don't see the bogus absolute symbol and the resultant ar works just fine.
It's late and I'll dig further over the weekend, but right now this looks like a linker bug to me. I may turn off LTO globally or in the various instances of binutils -- I need to sleep on that.
Jeff
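The lookup-order point in the analysis above can be illustrated with a toy model. This is not how ld.so is implemented; it is a deliberately simplified sketch in which each object is a dictionary of dynamic symbols, the search scope is an ordered list with the main program first, and the libbfd address is a made-up placeholder.

```python
def resolve(symbol, search_scope):
    """Return (object_name, address) from the first object defining symbol."""
    for object_name, dynsyms in search_scope:
        if symbol in dynsyms:
            return object_name, dynsyms[symbol]
    raise LookupError(symbol)


# Toy dynamic symbol tables (illustrative values only).
broken_ar = {"lrealpath": 0x0}          # the bogus absolute symbol, value 0
good_ar = {}                            # ar without the bogus entry
libbfd = {"lrealpath": 0x7F15BD3E1000}  # hypothetical address inside libbfd.so

# The main program is searched before any DSO.
print(resolve("lrealpath", [("ar", broken_ar), ("libbfd.so", libbfd)]))
print(resolve("lrealpath", [("ar", good_ar), ("libbfd.so", libbfd)]))
```

In the first call the definition in ar wins, so the resolved address is 0 and a call through it jumps to address zero, consistent with frame #0 of the backtrace. In the second call the lookup falls through to libbfd.so.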
On Sat, Jul 25, 2020 at 01:11:25AM -0600, Jeff Law wrote:
[snip]
Cool bit of investigation, thanks for looking at that :-)
Rich.
On Sat, 25 Jul 2020 at 09:11, Jeff Law law@redhat.com wrote:
On Fri, 2020-07-24 at 22:29 +0100, Richard W.M. Jones wrote:
Just upgraded a development machine to:
binutils-2.34.0-10.fc33.x86_64 gcc-10.1.1-2.fc33.x86_64 glibc-2.31.9000-21.fc33.x86_64
and a very simple C compile (non-LTO) is now segfaulting:
make[3]: Entering directory '/home/rjones/d/nbdkit/common/protocol' /bin/sh ../../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I.
-I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c -o libprotocol_la-protostrings.lo `test -f 'protostrings.c' || echo './'`protostrings.c
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla
-Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -fPIC -DPIC -o .libs/libprotocol_la-protostrings.o
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla
-Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -o libprotocol_la-protostrings.o >/dev/null 2>&1
mv -f .deps/libprotocol_la-protostrings.Tpo
.deps/libprotocol_la-protostrings.Plo
/bin/sh ../../libtool --tag=CC --mode=link gcc -Wall -Wshadow -Wvla
-Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -O0 -g -Wp,-U_FORTIFY_SOURCE -o libprotocol.la libprotocol_la-protostrings.lo
libtool: link: ar cru .libs/libprotocol.a
.libs/libprotocol_la-protostrings.o
../../libtool: line 1734: 2572327 Segmentation fault (core dumped)
ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o
Core was generated by `ar cru .libs/libprotocol.a
.libs/libprotocol_la-protostrings.o'.
Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000000000000000 in ?? () binutils-2.34.0-10.fc33.x86_64 (gdb) bt Missing separate debuginfos, use: dnf debuginfo-install#0
0x0000000000000000 in ?? ()
#1 0x00007f15bd3e03d0 in make_relative_prefix_1.part () from /lib64/libbfd-2.34.0.20200522.so #2 0x00007f15bd3d22db in bfd_plugin_object_p.lto_priv () from /lib64/libbfd-2.34.0.20200522.so #3 0x00007f15bd3401ce in bfd_check_format_matches () from /lib64/libbfd-2.34.0.20200522.so #4 0x00007f15bd340e7a in _bfd_write_archive_contents () from /lib64/libbfd-2.34.0.20200522.so #5 0x00007f15bd348b2a in bfd_close () from /lib64/
libbfd-2.34.0.20200522.so
#6 0x0000559ee83994b6 in write_archive () #7 0x0000559ee8396ac3 in main ()
I can't find any BZ for this. Any ideas what it could be?
After banging my head on the wall for a few hours, I think I see what's happening here.
So at a high level ar makes a call to lrealpath. That naturally goes through the PLT. The PLT stub loads the value out of the GOT and jumps to it. The problem is the entry in the GOT is *zero* when it should be pointing to the resolver.
Now lrealpath is provided by libiberty and a copy is in libbfd.so and the GOT entry in libbfd.so looked right. But by the time the program has hit main, the GOT entry has been reset to zero. Naturally that's happening inside the dynamic linker (as expected, confirmed with a watchpoint). If you've ever had to debug ld.so, you'll know it's an insanely painful experience, but it proved fruitful.
The key was finding out that we were not using the libbfd.so linker map to resolve lrealpath; instead we were using the linker map for the main program (ar in this case). So naturally it's time to look a bit more closely at the symbol table for ar.
The main symbol table for ar doesn't mention lrealpath. But that's just a confusing byproduct of having two symbol tables. What matters to ld.so is the *dynamic* symbol table, and ar has lrealpath in its dynamic symbol table. And here's the kicker, it's an absolute symbol with the value 0:
0000000000000000 A lrealpath
A symbol in the main program takes precedence over a symbol in a DSO. So the dynamic linker was actually doing the right thing given the input it was provided.
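To make the precedence rule concrete, here is a toy model of the lookup order described above. It is only a sketch, not glibc code: the `LinkMap` class, the `lookup` function, and the libbfd.so address are all invented for illustration. It assumes the default global scope, where the main program is searched before its DSOs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LinkMap:
    """Hypothetical stand-in for one loaded object and its dynamic symbol table."""
    name: str
    dynsym: Dict[str, int] = field(default_factory=dict)  # symbol name -> value


def lookup(symbol: str, scope: List[LinkMap]) -> Optional[Tuple[str, int]]:
    """Return (object name, value) for the first definition found in scope order."""
    for obj in scope:
        if symbol in obj.dynsym:
            return obj.name, obj.dynsym[symbol]
    return None


# The main program is searched first, then the DSOs it depends on.
ar = LinkMap("ar", {"lrealpath": 0x0})                         # bogus absolute symbol
libbfd = LinkMap("libbfd.so", {"lrealpath": 0x7F15BD3E0000})   # made-up address
scope = [ar, libbfd]

winner, value = lookup("lrealpath", scope)
print(f"lrealpath resolved from {winner} with value {value:#x}")
```

In this model the definition in ar wins even though its value is 0, so the value written into the GOT slot is 0 and the call jumps to address 0, which matches frame #0 in the backtrace.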
Now why (*&@#$ does ar have lrealpath as an absolute symbol? It's got to be related to the fact that when we link ar we pull in another copy of libiberty. In fact, ar links against libiberty twice: once via -liberty, then again against libiberty.a (and kind of a 3rd time indirectly via libbfd). But even so, that shouldn't be creating an absolute symbol. That's just weird.
This smells like a linker bug to me. Not surprisingly if I force the system to use ld.gold, then I don't see the bogus absolute symbol and the resultant ar works just fine.
It's late and I'll dig further over the weekend, but right now this looks like a linker bug to me. I may turn off LTO globally or in the various instances of binutils -- I need to sleep on that.
Jeff
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Super big thanks for investigating this, Jeff!
It suddenly tripped my rawhide build and I started panicking, because it's my first official package :D.
~Andy
On Sat, 2020-07-25 at 01:11 -0600, Jeff Law wrote:
This smells like a linker bug to me. Not surprisingly if I force the system to use ld.gold, then I don't see the bogus absolute symbol and the resultant ar works just fine.
It's late and I'll dig further over the weekend, but right now this looks like a linker bug to me. I may turn off LTO globally or in the various instances of binutils -- I need to sleep on that.
I'm seeing the same behavior with man-db, more specifically with accessdb linking to libmandb:
$ nm -D accessdb | grep xmalloc
0000000000000000 A xmalloc
Obviously it segfaults, unless I disable LTO.
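The same check can be run over several binaries at once. The following is a minimal sketch, assuming `nm` from binutils is on the PATH and prints defined symbols as "value type name"; the function name `zero_absolute_symbols` is made up for this example.

```python
import subprocess
import sys
from typing import List


def zero_absolute_symbols(nm_output: str) -> List[str]:
    """Return names of symbols that `nm -D` reports as type A with value 0."""
    suspects = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols print three fields; undefined ones have no value column.
        if len(fields) != 3 or fields[1] != "A":
            continue
        try:
            value = int(fields[0], 16)
        except ValueError:
            continue  # not a hex value, so not a symbol line we understand
        if value == 0:
            suspects.append(fields[2])
    return suspects


if __name__ == "__main__":
    # Usage: python3 check_abs.py /usr/bin/ar /usr/bin/accessdb ...
    for path in sys.argv[1:]:
        result = subprocess.run(
            ["nm", "-D", path], capture_output=True, text=True, check=True
        )
        for name in zero_absolute_symbols(result.stdout):
            print(f"{path}: {name} is absolute with value 0")
```

A hit is only a candidate to look at more closely. Some legitimate entries, such as symbol version names in shared libraries, can also be listed as absolute with value 0.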
Is there a bugzilla for that linker bug?
Thanks, Nikola
On Mon, 2020-07-27 at 18:20 +0200, Nikola Forró wrote:
Is there a bugzilla for that linker bug?
I don't think so. Nick was trying to pull together a simpler testcase and open a discussion with the other binutils developers on a path forward. He's aware of the impacts, so I'm sure he's working diligently on it.
In the immediate term, disabling LTO seems reasonable.
%define _lto_cflags %{nil}
We're going to go through all the opt-outs at some point after the mass rebuild, so we can re-enable once the ld bug is fixed.
jeff
On 27. 07. 20 21:24, Jeff Law wrote:
In the immediate term, disabling LTO seems reasonable.
%define _lto_cflags %{nil}
Can this please be documented at:
https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/master/f/buildflag...
?
I'd do it, but I don't know what would be useful to write about it.
On Mon, 2020-07-27 at 18:20 +0200, Nikola Forró wrote:
I'm seeing the same behavior with man-db, more specifically with accessdb linking to libmandb:
$ nm -D accessdb | grep xmalloc
0000000000000000 A xmalloc
Obviously it segfaults, unless I disable LTO.
Is there a bugzilla for that linker bug?
Note the linker bug should be fixed now. So you should be able to rebuild man-db with LTO now.
jeff