Change to DSO-linking semantics of the compiler

John Reiser jreiser at bitwagon.com
Wed Jan 13 17:05:20 UTC 2010


On 01/13/2010 08:24 AM, Nick Clifton wrote:
> Hi Guys,
>
>>> SystemTap is failing on pthread_cancel, which is odd since we have no
>>> mention of pthread in our own sources.  It seems to be pulled in by some
>>> headers in the STL.  Consider this minimal example:
>>>
>>> $ cat string.cxx
>>> #include<string>
>>> int main()
>>> {
>>>       return std::string("foo").length() != 3;
>>> }
>>> $ g++ -c string.cxx
>>> $ nm -C string.o
>>>                    w pthread_cancel
>>> $ g++ -o string string.o
>>>
>>> This is fine, because __gthread_active_p is just using the fact that
>>> the weak pthread_cancel symbol becomes 0 if libpthread isn't linked.
>>>
>>> But if one of your dependent libraries uses pthreads, suddenly the main
>>> executable gets the normal pthread_cancel symbol too, and the new linker
>>> serves up death:
>>>
>>> $ g++ -o string string.o -ldb
>>> /usr/bin/ld.bfd: string.11980.test: undefined reference to symbol
>>> 'pthread_cancel@@GLIBC_2.2.5'
>>> /usr/bin/ld.bfd: note: 'pthread_cancel@@GLIBC_2.2.5' is defined in DSO
>>> /lib64/libpthread.so.0 so try adding it to the linker command line
>
> But, you have added an explicit dependency upon libdb to your executable
> by mentioning -ldb on the gcc command line.  Therefore libdb will be
> loaded at execution start-up.  But libdb has a dependency upon
> libpthread, so that library will also be loaded at execution start-up.
> Hence when you run 'string' the pthread_cancel symbol will be resolved
> and so 'string' really does now have a resolved reference to
> pthread_cancel.  Hence the linker is correct in complaining that you
> have a reference to a symbol that is defined in a library which you have
> not included on the linker command line.


The original reference to 'pthread_cancel' in 'string.o' was a *weak* and
*undefined* reference with no symbol version specified:
-----
$ readelf --all string.o
Symbol table '.symtab' contains 20 entries:
     19: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND pthread_cancel
No version information found in this file.
-----


The statically-bound reference in 'string' (without "-ldb") remains a
*weak undefined* reference with no symbol version specified:
-----
$ g++ -o string string.o
$ readelf --all string
Symbol table '.dynsym' contains 12 entries:
      8: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND pthread_cancel
Symbol table '.symtab' contains 73 entries:
     65: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND pthread_cancel
Version symbols section '.gnu.version' contains 12 entries:
   008:   0 (*local*)       3 (GLIBCXX_3.4)   5 (GCC_3.0)       4 (CXXABI_1.3)
-----


With -ldb, the statically-bound reference has been associated with
GLIBC_2.2.5:
-----
$ g++ -o string.db string.o -ldb-4.7   # done on Fedora 12, not Fedora 13.
$ readelf --all string.db
Symbol table '.dynsym' contains 17 entries:
     10: 00000000004007a0     0 FUNC    WEAK   DEFAULT  UND pthread_cancel@GLIBC_2.2.5 (4)
Symbol table '.symtab' contains 73 entries:
     65: 00000000004007a0     0 FUNC    WEAK   DEFAULT  UND pthread_cancel@@GLIBC_2.2.5
Version symbols section '.gnu.version' contains 17 entries:
   008:   3 (GLIBCXX_3.4)   6 (GCC_3.0)       4 (GLIBC_2.2.5)   1 (*global*)
-----


By itself, the association between *weak undefined* pthread_cancel and
GLIBC_2.2.5 is innocuous.  That is what the static linker saw.  The problem
comes when code starts believing that GLIBC_2.2.5 is a requirement for
*weak undefined* pthread_cancel.  In today's rawhide for Fedora 13,
both the static linker /usr/bin/ld and the runtime linker ld-linux.so
make this error.

*weak undefined* means "I accept *any* definition, or even *no* definition."
Both binutils and glibc must fix their errors of insisting on any particular
symbol version for a *weak undefined* symbol.
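
For illustration, here is a minimal sketch of how code can rely on those
semantics. It is not taken from libstdc++; the symbol name
hypothetical_optional_hook and both helper functions are made up for this
example, and it assumes GCC or Clang on an ELF target, since the weak
attribute is a GNU extension.

```cpp
// Sketch only: assumes GCC or Clang on an ELF target.

// Hypothetical symbol, not part of any real library.  Declaring it weak
// lets the program link and load even when nothing defines it.
extern "C" int hypothetical_optional_hook(int) __attribute__((weak));

// True when some object or shared library supplied a definition;
// a weak undefined symbol that stays unresolved has address 0.
bool hook_available()
{
    return &hypothetical_optional_hook != nullptr;
}

// Calls the hook when it is present, otherwise returns the fallback.
int call_hook_or(int arg, int fallback)
{
    return hook_available() ? hypothetical_optional_hook(arg) : fallback;
}
```

When no object or library on the link line defines the hook, the address
test fails and the fallback path is taken.  The caller never names a
library or a symbol version; it only asks whether a definition exists.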
