On Mon, 2020-07-27 at 18:20 +0200, Nikola Forró wrote:
On Sat, 2020-07-25 at 01:11 -0600, Jeff Law wrote:
So at a high level ar makes a call to lrealpath. That naturally goes through the PLT. The PLT stub loads the value out of the GOT and jumps to it. The problem is the entry in the GOT is *zero* when it should be pointing to the resolver.
Now lrealpath is provided by libiberty and a copy is in libbfd.so and the GOT entry in libbfd.so looked right. But by the time the program has hit main, the GOT entry has been reset to zero. Naturally that's happening inside the dynamic linker (as expected, confirmed with a watchpoint). If you've ever had to debug ld.so, you'll know it's an insanely painful experience, but it proved fruitful.
The key was finding out that we were not using the libbfd.so linker map to resolve lrealpath; instead we were using the linker map for the main program (ar in this case). So naturally it's time to look a bit more closely at the symbol table for ar.
The main symbol table for ar doesn't mention lrealpath. But that's just a confusing byproduct of having two symbol tables. What matters to ld.so is the *dynamic* symbol table. And ar has lrealpath in its dynamic symbol table. And here's the kicker, it's an absolute symbol with the value 0:
0000000000000000 A lrealpath
A symbol in the main program takes precedence over a symbol in a DSO. So the dynamic linker was actually doing the right thing given the input it was provided.
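To make the precedence point concrete, here is a toy model of the lookup order described above. It is only a sketch, not real ld.so code: each link map is reduced to a dict, the resolve helper is a hypothetical name, and the libbfd.so address is invented for illustration.

```python
# Toy model of dynamic symbol lookup order; NOT how ld.so is implemented.
# Each "link map" is reduced to a dict of dynamic symbol name -> address.

def resolve(symbol, search_order):
    """Return (object_name, address) from the first object defining symbol."""
    for name, dynsym in search_order:
        if symbol in dynsym:
            return name, dynsym[symbol]
    raise LookupError(symbol)

# Hypothetical contents, chosen only to mirror the situation in this thread.
ar = {"lrealpath": 0x0}                 # the bogus absolute symbol, value 0
libbfd = {"lrealpath": 0x7F0000012340}  # stand-in for the real definition

# The main program is searched before any DSO.
scope = [("ar", ar), ("libbfd.so", libbfd)]

winner, addr = resolve("lrealpath", scope)
print(winner, hex(addr))  # ar 0x0
```

Because the first definition found wins, the zero from ar shadows the good definition in libbfd.so, and that zero is what ends up in the GOT.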
Now why (*&@#$ does ar have lrealpath as an absolute symbol? It's got to be related to the fact that when we link ar we pull in another copy of libiberty. In fact, ar links against libiberty twice. Once via -liberty then again against libiberty.a (and kind of a 3rd time indirectly via libbfd). But even so that shouldn't be creating an absolute symbol. That's just weird.
This smells like a linker bug to me. Not surprisingly if I force the system to use ld.gold, then I don't see the bogus absolute symbol and the resultant ar works just fine.
It's late and I'll dig further over the weekend, but right now this looks like a linker bug to me. I may turn off LTO globally or in the various instances of binutils -- I need to sleep on that.
I'm seeing the same behavior with man-db, more specifically with accessdb linking to libmandb:
$ nm -D accessdb | grep xmalloc
0000000000000000 A xmalloc
Obviously it segfaults, unless I disable LTO.
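For anyone who wants to check other binaries for the same symptom, a rough sketch of a filter over `nm -D` output might look like the following. The function name and the sample text are made up for illustration; the only assumption about nm is its usual "value type name" layout for defined symbols.

```python
def zero_absolute_symbols(nm_output):
    """Return names that nm reports as type 'A' (absolute) with value 0."""
    hits = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols print as "<value> <type> <name>". Undefined ones
        # have no value column, so they split into two fields and are skipped.
        if len(fields) != 3:
            continue
        value, kind, name = fields
        if kind != "A":
            continue
        try:
            if int(value, 16) == 0:
                hits.append(name)
        except ValueError:
            continue  # first column was not a hex value; ignore the line
    return hits

# Invented sample in the shape of `nm -D` output.
sample = """\
0000000000000000 A xmalloc
                 U free
0000000000004010 B stdout
"""
print(zero_absolute_symbols(sample))  # ['xmalloc']
```

Real output could be fed in with something like subprocess.run(["nm", "-D", path], capture_output=True, text=True).stdout. Treat hits as candidates only: shared libraries legitimately carry zero-valued absolute symbols for version definitions, so a match on a function name such as xmalloc is the suspicious case.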
Is there a bugzilla for that linker bug?
I don't think so. Nick was trying to pull together a simpler testcase and open a discussion with the other binutils developers on a path forward. He's aware of the impacts, so I'm sure he's working diligently on it.
In the immediate term, disabling LTO seems reasonable.
%define _lto_cflags %{nil}
We're going to go through all the opt-outs at some point after the mass rebuild, so we can re-enable once the ld bug is fixed.
jeff