Hi,
Cced to crash-catcher@lists.fedorahosted.org.
On Wed, 26 Aug 2009 04:31:12 +0200, Denys Vlasenko wrote:
When we process a crash, we have a core file. We just run gdb on it in batch mode by running
gdb -batch -x FILE
where FILE contains:
file BINARY
code COREFILE
^^^^->core-file
Fedora GDB does not need the first "file" command; it will find the binary by its build-id. But for binaries either without a build-id (=built with a non-Fedora GCC) or without their debuginfo rpm installed we would not find the filename of BINARY, so it is right to also say "file BINARY".
thread apply all backtrace full
q
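For illustration, the batch invocation discussed above could be driven roughly like this. This is only a sketch: the helper names gdb_batch_script() and run_gdb_backtrace() are made up here, and it assumes the paths contain no whitespace, since they are written unquoted into the gdb command file.

```python
import subprocess
import tempfile


def gdb_batch_script(core_file, binary=None):
    """Return the text of a gdb command file that dumps all backtraces."""
    lines = []
    if binary is not None:
        # Only needed when gdb cannot locate the binary by its build-id.
        lines.append("file " + binary)
    lines.append("core-file " + core_file)
    lines.append("thread apply all backtrace full")
    lines.append("q")
    return "\n".join(lines) + "\n"


def run_gdb_backtrace(core_file, binary=None):
    """Run 'gdb -batch -x FILE' on the core file and return its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".gdb") as script:
        script.write(gdb_batch_script(core_file, binary))
        script.flush()
        result = subprocess.run(
            ["gdb", "-batch", "-x", script.name],
            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
            universal_newlines=True)
    return result.stdout
```

Leaving out the binary argument corresponds to the case where GDB finds the binary by build-id on its own.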
It tries to locate debuginfo by finding executable's build id, [Jan, can you expand on this - does gdb look at executable or at the core file in order to find build id? If it looks at core file for this, does core file contain build ids of loaded libraries too?] then looks it up in /usr/lib/debug/.build-id/XX/XXXXXXXXXXXXXXX and uses it if it is found.
If you type "file BINARY" it will try to find the separate .debug file according to the build-id of BINARY. In such case COREFILE build-id would be ignored.
If you type just "core-file COREFILE" (without "file BINARY") it will find the binary according to its build-id.
Libraries are always looked up preferably by their build-id.
Separate debug info files for binaries and libraries are always looked up preferably by the build-id in the binary/library (not in the core file). But since the build-id of the binary, of the .debug file and of the core file note must always be the same by definition, this paragraph is just an implementation detail.
example:
# ls -l /usr/lib/debug/.build-id/00/5af5b5e7d6ab560825b0747fcbe41112431b8c.debug
lrwxrwxrwx 1 root root 28 2009-07-20 18:08 /usr/lib/debug/.build-id/00/5af5b5e7d6ab560825b0747fcbe41112431b8c.debug -> ../../usr/bin/makestrs.debug
However, we (abrt) do not know whether debuginfo is installed, so currently we just run "debuginfo-install -y -- PACKAGE".
ABRT should run
  eu-unstrip -n --core=/tmp/core.20546
and for each line it produces, like
  0x3979600000+0x36e000 ec8dd400904ddfcac8b1c343263a790f977159dc@0x3979600280 /lib64/libc-2.10.1.so /usr/lib/debug/lib64/libc-2.10.1.so.debug libc.so.6
use
  yum --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/ec/8dd400904ddfcac8b1c343263a790f977159dc.debug
(or some other yum/gpk command that their maintainers recommend)
This is a simple approach, but it has several drawbacks.
Yes, I was already suggesting the direct build-id way before; it avoids package version mistakes, and it may even work one day, after we convince releng that they should keep debuginfos even for some (all) previous versions of packages. Currently, for any long-running programs/daemons the crash backtrace always fails, as the debuginfo for the running version of the program is no longer available (the on-disk program binary has already been updated, and so has its debuginfo package, even if that was installed).
So, Jan, what is this pk-debuginfo-install thing, and how can it help us here?
Tried installing pk-debuginfo-install but it wanted to install some graphical mess. Crash bug reporting must not rely on graphical tools, so that it stays usable on the RHEL text-only server farms.
Thanks, Jan
Hi Jan,
On Wed, 2009-08-26 at 10:39 +0200, Jan Kratochvil wrote:
thread apply all backtrace full
q
It tries to locate debuginfo by finding executable's build id, [Jan, can you expand on this - does gdb look at executable or at the core file in order to find build id? If it looks at core file for this, does core file contain build ids of loaded libraries too?] then looks it up in /usr/lib/debug/.build-id/XX/XXXXXXXXXXXXXXX and uses it if it is found.
If you type "file BINARY" it will try to find the separate .debug file according to the build-id of BINARY. In such case COREFILE build-id would be ignored.
If you type just "core-file COREFILE" (without "file BINARY") it will find the binary according to its build-id.
This may be wrong in the rare case when binary name is somehow misdetected, or the binary was replaced. But such cases are not typical, so I do not want to worry about it just yet.
Libraries are always looked up preferably by their build-id.
This is the part I am interested in. How can we extract libraries' build-ids? By ldd'ing the binary and then extracting libraries' build-ids? What about dlopen'ed libs, how to find their debuginfos?
Basically, we need to answer the question "do we need to install debuginfo packages, and which ones?". For that, we need to know "what debuginfo FILES (not packages) gdb would need?".
One way to achieve it is to obtain the list of all build-ids of all binaries/libraries loaded in crashed process' memory. Then it is trivial to check existence of /usr/lib/debug/.build-id/XX/XXX files.
Can we do it somehow? I imagine the last resort way to do it is to read gdb source and extract the code which does that, but maybe there is a simpler way?
So, Jan, what is this pk-debuginfo-install thing, and how can it help us here?
Tried installing pk-debuginfo-install but it wanted to install some graphical mess. Crash bug reporting must not rely on graphical tools, so that it stays usable on the RHEL text-only server farms.
Yeah, I now know that you aren't the right person to talk about pk-debuginfo-install. I now have another contact email to try. -- vda
Hi Denys,
On Thu, 03 Sep 2009 17:21:40 +0200, Denys Vlasenko wrote:
On Wed, 2009-08-26 at 10:39 +0200, Jan Kratochvil wrote:
If you type just "core-file COREFILE" (without "file BINARY") it will find the binary according to its build-id.
This may be wrong in the rare case when binary name is somehow misdetected, or the binary was replaced. But such cases are not typical, so I do not want to worry about it just yet.
I find it very common, 30% of processes on my system already have their binary/library files deleted due to `yum update's:
# ls -l /proc/*/maps|wc -l
282
# for i in /proc/*/maps;do egrep '(/lib|/bin).*deleted' $i|grep -vq prelink && echo $i;done|wc -l
86
-> ~30%
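The same measurement can be sketched in Python, which may be easier to reuse than the shell one-liner. The function names here are made up for the sketch; unlike the `ls | wc -l' count, it only counts maps files that could actually be read.

```python
import glob
import re

# Same pattern as the egrep: a /lib or /bin path marked as deleted.
DELETED_RE = re.compile(r"(/lib|/bin).*deleted")


def has_deleted_mapping(maps_text):
    """True if a deleted /lib or /bin file is mapped (prelink temp files ignored)."""
    for line in maps_text.splitlines():
        if DELETED_RE.search(line) and "prelink" not in line:
            return True
    return False


def deleted_counts(pattern="/proc/*/maps"):
    """Return (processes with deleted mappings, processes examined)."""
    hits = total = 0
    for path in glob.glob(pattern):
        try:
            with open(path) as maps_file:
                text = maps_file.read()
        except OSError:
            continue  # process exited meanwhile, or maps is not readable
        total += 1
        if has_deleted_mapping(text):
            hits += 1
    return hits, total
```

With the numbers above, 86 hits out of 282 examined files gives the ~30%.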
There is another problem: one currently no longer has the debuginfo files for it installed. But that requires two unrelated action items:
 * Installation of multiple debuginfo rpms simultaneously.
   Currently not possible, planned by Roland McGrath, hacked it before:
   http://people.redhat.com/jkratoch/multidebug/
 * distribution of debuginfo rpms for releases:
   * full release (as is)
   * every released update (not just the last update as currently is)
   (distribution of debuginfo rpms for rawhide)
   * probably about last two weeks of built rpms or so.
Not following build-ids would be another item to solve. Currently, when a crash has no valid backtrace, one has to restart the daemon in its current on-disk version, hoping it will crash again before the next `yum update'.
Libraries are always looked up preferably by their build-id.
This is the part I am interested in. How can we extract libraries' build-ids?
By ldd'ing the binary and then extracting libraries' build-ids? What about dlopen'ed libs, how to find their debuginfos?
Right, DT_NEEDED (=ldd) way would not catch those.
Basically, we need to answer the question "do we need to install debuginfo packages, and which ones?". For that, we need to know "what debuginfo FILES (not packages) gdb would need?".
One way to achieve it is to obtain the list of all build-ids of all binaries/libraries loaded in crashed process' memory. Then it is trivial to check existence of /usr/lib/debug/.build-id/XX/XXX files.
Can we do it somehow?
Core file contains build-id of every ELF file loaded in memory (if CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS).
You can extract this build-id list as I wrote before:
On Wed, 26 Aug 2009 10:39:23 +0200, Jan Kratochvil wrote:
# ABRT should
# eu-unstrip -n --core=/tmp/core.20546
# and for its each produced line like
# 0x3979600000+0x36e000 ec8dd400904ddfcac8b1c343263a790f977159dc@0x3979600280 /lib64/libc-2.10.1.so /usr/lib/debug/lib64/libc-2.10.1.so.debug libc.so.6
# use
# yum --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/ec/8dd400904ddfcac8b1c343263a790f977159dc.debug
# (or some other yum/gpk command what recommend their maintainers)
For GDB it is enough to provide all such files, like the one listed above:
/usr/lib/debug/.build-id/ec/8dd400904ddfcac8b1c343263a790f977159dc.debug
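A rough sketch of how the eu-unstrip output could be turned into the list of .debug files to ask for. The function names are invented for this example, and it assumes the second field of each line has the BUILDID@ADDRESS form shown above, or is "-" when the module has no build-id.

```python
import os
import string


def build_id_from_unstrip_line(line):
    """Return the hex build-id from one 'eu-unstrip -n' line, or None."""
    fields = line.split()
    if len(fields) < 2:
        return None
    build_id = fields[1].split("@", 1)[0]
    # Lines without a usable build-id (assumed to show "-") are skipped.
    if len(build_id) < 3 or any(c not in string.hexdigits for c in build_id):
        return None
    return build_id


def debug_link_path(build_id, root="/usr/lib/debug"):
    """Map a build-id to its .build-id/XX/XXX....debug path."""
    return os.path.join(root, ".build-id", build_id[:2],
                        build_id[2:] + ".debug")


def missing_debug_files(unstrip_output, root="/usr/lib/debug"):
    """List the .debug paths that are needed but not present under root."""
    missing = []
    for line in unstrip_output.splitlines():
        build_id = build_id_from_unstrip_line(line)
        if build_id is None:
            continue
        path = debug_link_path(build_id, root)
        if not os.path.exists(path) and path not in missing:
            missing.append(path)
    return missing
```

Each returned path could then be handed to the yum command quoted above; an empty list would mean no debuginfo installation is needed at all.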
Point #2:
------------------------------------------------------------------------------
This solution is imperfect, as some program (for example a linker) may have mmap(2)ed some library which it does not execute and does not need for a backtrace. GDB will not even search for such a debug info file.
(One could improve such heuristics by checking the 'x' (executable) flag of page ranges of such mmap(2)ed data in /proc/PID/maps but Linux kernel currently does not save /proc/PID/maps into a core file - although there were some intentions (or even kernel patches?) to do so. Still it would be just heuristics.)
3979200000-397921f000 r-xp 00000000 fd:00 5236735 /lib64/ld-2.10.1.so
                    --> ^ <--
397941e000-397941f000 r--p 0001e000 fd:00 5236735 /lib64/ld-2.10.1.so
397941f000-3979420000 rw-p 0001f000 fd:00 5236735 /lib64/ld-2.10.1.so
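For a live process (not a core file, as the maps are not saved there), the 'x' flag heuristic could look roughly like this. The function name is made up for the sketch, and it assumes the usual "address perms offset dev inode pathname" layout of /proc/PID/maps lines.

```python
def executable_files(maps_text):
    """Return the set of mapped files having at least one executable range."""
    result = set()
    for line in maps_text.splitlines():
        # address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no file behind it
        perms, path = fields[1], fields[5]
        if path.endswith(" (deleted)"):
            path = path[:-len(" (deleted)")]
        # Skip pseudo entries such as [heap], [stack] or [vdso].
        if "x" in perms and path.startswith("/"):
            result.add(path)
    return result
```

Files that are only mapped as data never get into the set, so no .debug file would be requested for them. As said, it would still be just heuristics.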
The right solution to never download unneeded .debug files would
 * find the AUXV note in the core file.
   eu-readelf -n corefile
   [...]
     CORE 288 AUXV
 * Find the executable binary VMA (address-in-memory) in it:
   [...]
     PHDR: 0x400040
 * Find build-id of the executable in that page.
 * Load matching executable file from disk according to that build-id as the next looked up structures may be in readonly pages omitted in the core file.
   [ Here it is similar to GDB elf_locate_base()->scan_dyntag(DT_DEBUG).]
 * Find DYNAMIC segment address in that PHDR.
   eu-readelf -l executable-file
   Program Headers:
     Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
   [...]
     DYNAMIC 0x0cb5e8 0x00000000006cb5e8 0x00000000006cb5e8 0x0001b0 0x0001b0 RW 0x8
 * Find DT_DEBUG tag in that DYNAMIC segment:
   eu-readelf -d executable-file
     DEBUG
     [ tag value is 0x0 in the on-disk file, read it from the core file ]
 * The DT_DEBUG tag value is the address of:
   extern struct r_debug _r_debug;
 * _r_debug.r_map contains the linkmap of loaded shared libraries to traverse.
Code for this traversal from a core file would be probably best to write as a new program based on elfutils.
------------------------------------------------------------------------------
I imagine the last resort way to do it is to read gdb source and extract the code which does that, but maybe there is a simpler way?
I think eu-unstrip is currently good enough, as in real-world cases needless .debug files will never be downloaded in excess.
Thanks, Jan
Code for this traversal from a core file would be probably best to write as a new program based on elfutils.
The libdwfl logic that --core gets you actually does this already. (This is how e.g. -e exe --core core can work when the core lacks build IDs.)
It just doesn't punt the other modules discovered by "raw" ELF image detection, so those will remain in the list too but ordered last (I think) and with their names taken from embedded DT_SONAME in the core (not usually there unless it's an ELF-headers dump) or default "[dso]" instead of the "proper" name from link_map.
If you wanted to change this to give more control to the caller of libdwfl, that should not be hard.
Thanks, Roland
If you type just "core-file COREFILE" (without "file BINARY") it will find the binary according to its build-id.
This may be wrong in the rare case when binary name is somehow misdetected, or the binary was replaced. But such cases are not typical, so I do not want to worry about it just yet.
I'm not following you. Going purely by build ID is exactly what avoids being wrong when files have changed on disk since the process ran.
One way to achieve it is to obtain the list of all build-ids of all binaries/libraries loaded in crashed process' memory. Then it is trivial to check existence of /usr/lib/debug/.build-id/XX/XXX files.
Can we do it somehow? I imagine the last resort way to do it is to read gdb source and extract the code which does that, but maybe there is a simpler way?
eu-unstrip -n --core FILE
eu-unstrip -n -p PID
or the libdwfl calls that it uses. Didn't we discuss this before on this list?
Thanks, Roland