Need some ocaml help: coq rebuild failing

Jerry James loganjerry at gmail.com
Fri Jun 15 04:01:37 UTC 2012


I'm having a problem with building the coq package for the new OCaml
4.00.0, and I'm at my wits' end.  There were some bad interactions
between the new OCaml, camlp5, and coq which I think I have
successfully worked around.  (It isn't pretty, but it seems to do the
job.)  But now, after the tools are built, the documentation building
step is failing due to seemingly random segfaults in the coqdoc tool.

The segfault can be triggered reliably on i386, but I've only seen it
once in awhile on x86_64.  In fact, I did a build on an x86_64 Rawhide
VM today that ran all the way to the end.  If I build in koji or mock,
coqdoc (which is invoked lots of times during the build) eventually
segfaults, but on different invocations each time.  I did a mock
build, waited for it to fail, then did "mock --shell" to see what I
could learn.  If I run coqdoc inside of gdb, it *never* segfaults.
I've run it and run it, and it never, ever has segfaulted.  I even did
"set disable-randomization off", but it still ran successfully every
time.  If I run it under valgrind, it *always* segfaults.  Here is
some sample output from valgrind (after manually rebuilding coqdoc to
get debugging symbols; the Makefile strips it once it is built):

==25130==
==25130== Conditional jump or move depends on uninitialised value(s)
==25130==    at 0x809E1C5: invert_pointer_at (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x809E499: do_compaction (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x80903DA: caml_major_collection_slice (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x8090AC3: caml_minor_collection (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x808EBD3: caml_garbage_collection (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x809FC0D: ??? (in /builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x8076DF6: camlPrintf__kfprintf_1356 (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x80559D1: camlOutput__printf_1012 (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==
==25130== Use of uninitialised value of size 4
==25130==    at 0x809E1EB: invert_pointer_at (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x809E499: do_compaction (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x80903DA: caml_major_collection_slice (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x8090AC3: caml_minor_collection (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x808EBD3: caml_garbage_collection (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x809FC0D: ??? (in /builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x8076DF6: camlPrintf__kfprintf_1356 (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x80559D1: camlOutput__printf_1012 (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==
==25130== Use of uninitialised value of size 4
==25130==    at 0x809E1F2: invert_pointer_at (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x809E499: do_compaction (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x80903DA: caml_major_collection_slice (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x8090AC3: caml_minor_collection (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x808EBD3: caml_garbage_collection (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x809FC0D: ??? (in /builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x8076DF6: camlPrintf__kfprintf_1356 (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==    by 0x80559D1: camlOutput__printf_1012 (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==
==25130== Invalid read of size 4
==25130==    at 0x8058AE3: camlOutput__fun_2066 (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==  Address 0x804 is not stack'd, malloc'd or (recently) free'd
==25130==
==25130==
==25130== Process terminating with default action of signal 11 (SIGSEGV)
==25130==  Access not within mapped region at address 0x804
==25130==    at 0x8058AE3: camlOutput__fun_2066 (in
/builddir/build/BUILD/coq-8.3pl4/bin/coqdoc)
==25130==  If you believe this happened as a result of a stack
==25130==  overflow in your program's main thread (unlikely but
==25130==  possible), you can try to increase the size of the
==25130==  main thread stack using the --main-stacksize= flag.
==25130==  The main thread stack size used in this run was 8388608.
==25130==
==25130== HEAP SUMMARY:
==25130==     in use at exit: 146,066,664 bytes in 313 blocks
==25130==   total heap usage: 4,692 allocs, 4,379 frees, 294,975,004
bytes allocated
==25130==
==25130== LEAK SUMMARY:
==25130==    definitely lost: 0 bytes in 0 blocks
==25130==    indirectly lost: 0 bytes in 0 blocks
==25130==      possibly lost: 140,299,840 bytes in 228 blocks
==25130==    still reachable: 5,766,824 bytes in 85 blocks
==25130==         suppressed: 0 bytes in 0 blocks
==25130== Rerun with --leak-check=full to see details of leaked memory
==25130==
==25130== For counts of detected and suppressed errors, rerun with: -v
==25130== Use --track-origins=yes to see where uninitialised values come from
==25130== ERROR SUMMARY: 8951807 errors from 4 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

The first three warnings may or may not be significant; I don't know.
The actual segfaulting address is the second instruction of its
function, which starts like this:

Dump of assembler code for function camlOutput__fun_2066:
   0x08058ae0 <+0>:	sub    $0x10,%esp
   0x08058ae3 <+3>:	mov    0x4(%ebx),%ecx

I don't know how to debug this any further.  This is a problem because
coq has broken dependencies due to the OCaml upgrade, and I have
several other packages with broken dependencies that cannot be rebuilt
until coq is.  Sorry for the large broken deps text at the top of the
Rawhide reports.
-- 
Jerry James
http://www.jamezone.org/


More information about the devel mailing list