Status 2009-01-19
by Petr Machata
= Work done last week:
* Time spent on elfutils: ~80%
* dwarflint:
* Validation of .debug_pubnames, .debug_pubtypes
* DIE references now remember where they originated.
That's used when writing an error message about a reference not being
satisfied.
* Message machinery was overhauled. Many macros were macros just
to honor the underlying message macros, and as this transitively applied
to functions higher up the hierarchy, everything was slowly becoming a
mess of variadic macros. I turned everything into functions, so at
least now we were passing va_lists back and forth, and had well defined
boundaries between chunks of code.
Then I introduced "struct where". Instances pinpoint a message
to a concrete debug object (e.g. abbreviation section, abbreviation,
attribute), and may reference another where (e.g. where of a DIE would
reference where of a related abbreviation). Wheres can be chained to
form a "caused by" trace, like the way GCC does it for template
instantiations. This helped to clean up interfaces a lot: now the only
thing you need to pass between functions is a where pointer.
* Validation and coverage analysis of .debug_loc. For some
reason this consistently yields a lot of holes. I must be missing some
loclist pointer attribute. I picked a couple of holes at random, and
grepped through the output from elfutils -w. The addresses were nowhere
to be found. So either elfutils is missing the same attribute as I am, or
there really are holes of unreferenced garbage, or the holes I picked
were actually legitimate holes. I need to look more closely at that one. I'm
thinking I'll just scan through the DIE chain looking for attributes
whose value happens to be interpretable as the address of one of the
holes (à la a garbage collector).
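To illustrate the "struct where" chaining described above, here is a minimal sketch. It is not the dwarflint code: the names where_sketch, where_kind, where_kind_name and where_trace, and all the fields, are hypothetical and only show how a "caused by" trace could be walked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical kinds of debug object a message can point at.  */
enum where_kind { WK_ABBREV_SECTION, WK_ABBREV, WK_DIE };

/* Hypothetical stand-in for dwarflint's "struct where".  */
struct where_sketch
{
  enum where_kind kind;
  uint64_t offset;                  /* Offset of the object in its section.  */
  const struct where_sketch *next;  /* "Caused by" link, or NULL.  */
};

static const char *
where_kind_name (enum where_kind kind)
{
  switch (kind)
    {
    case WK_ABBREV_SECTION: return ".debug_abbrev";
    case WK_ABBREV: return "abbreviation";
    case WK_DIE: return "DIE";
    }
  return "???";
}

/* Format the whole chain into BUF, one "caused by" step per link.
   Returns the number of links that were formatted completely.  */
static size_t
where_trace (const struct where_sketch *wh, char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  size_t used = 0, links = 0;
  buf[0] = '\0';
  for (; wh != NULL; wh = wh->next, ++links)
    {
      int n = snprintf (buf + used, size - used, "%s%s 0x%llx",
                        links == 0 ? "" : ", caused by ",
                        where_kind_name (wh->kind),
                        (unsigned long long) wh->offset);
      if (n < 0 || (size_t) n >= size - used)
        break;                      /* Out of room: stop here.  */
      used += (size_t) n;
    }
  return links;
}
```

With this shape, a function that reports a problem only needs the innermost where pointer; the trace back to the related abbreviation falls out of following the links.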
= Work scheduled for this week:
* Expected time spent on elfutils: 50-80%.
* Finish what's left of .debug_aranges work (address/length
validation, .text coverage). Finish what's left of .debug_loc (i.e.
look at the coverage problem above).
* Write .debug_ranges validation.
* Fix 10-byte LEB128 that Roland posted about.
* Start the reloc work if I get the above in time.
PM
15 years, 4 months
elfutils status 2009-1-19
by Roland McGrath
Last week:
Actual time on elfutils: 80%
* very braindead week (ugh)
* posted brain dump wrt 0.138 libelf (elf_update) regression
-> hope Uli will figure it all out for me
* gzip/bzip2 support in libdwfl (RHBZ #472136)
* dwarflint nit fixes
* enabled 'transif' user, filed RHBZ #480713
-> keep an eye out for translator activity
* stared at attr value interfaces, no real progress
* mass tests on dwarfcmp -T, dwarflint
** dwarflint doing well
** dwarfcmp -T uses pathological amounts of memory for large files;
more or less expected, though surprising how huge;
not representative (hopefully) of how the output data structures
will really work
*** rejiggered debuginfo-test-scripts slightly so it's less likely
to simultaneously test on two huge files with similar names
** no dwarfcmp failures
* contemplating some imported_unit issues
* F11 feature page got wrangled and went to FESCo
** feedback: want more concrete details, delivery dates
-> yeah, me too, buddy
** jkeating says "maybe mid-Feb" for F11 mass rebuild
-> 4 weeks for the whole kit and kibootle to be done & perfect (uh...)
This week:
Expected time on elfutils: 80%
* post to dwarf-discuss list wrt imported_unit issues
* 0.139 release
** needs elf_update issue resolved
* attr value interfaces
dwarflint vs 10-byte leb128
by Roland McGrath
Your check for bogus LEB128 rejects a correct encoding of 64 bits. 9 bytes
encodes 63 bits, and 64 bits takes 10 bytes. A 10-byte encoding is correct
as long as the last byte is 0x1, since encoding more than 64 bits is bogus.
This comes up because of some bogons, but they are higher-level bogons.
It's hitting in location expressions like DW_OP_plus_uconst(0xfffffffffffffff8).
That's a bogon because the compiler meant -8 but used plus_uconst, which
only takes an unsigned constant. I think it would be reasonable to cite
any unsigned constant with lots of high 1 bits as "suspicious".
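A minimal sketch of the check described above, assuming a plain byte buffer as input. The function names read_uleb128_sketch and looks_like_negative are hypothetical, not dwarflint's, and the 16-bit threshold for "lots of high 1 bits" is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value from DATA (at most LEN bytes).
   Stores the value in *VALUE and the number of bytes consumed in *USED.
   Returns false for a truncated encoding or one carrying more than
   64 bits.  */
static bool
read_uleb128_sketch (const unsigned char *data, size_t len,
                     uint64_t *value, size_t *used)
{
  assert (data != NULL && value != NULL && used != NULL);
  uint64_t result = 0;
  for (size_t i = 0; i < len; ++i)
    {
      unsigned char byte = data[i];
      /* Bytes 0..8 carry bits 0..62.  The 10th byte may only supply
         bit 63, so it must be 0x00 or 0x01 and must end the value.  */
      if (i == 9 && (byte & 0xfe) != 0)
        return false;
      result |= (uint64_t) (byte & 0x7f) << (7 * i);
      if ((byte & 0x80) == 0)
        {
          *value = result;
          *used = i + 1;
          return true;
        }
    }
  return false;                 /* Ran out of input.  */
}

/* Arbitrary heuristic for the "suspicious" warning: the top 16 bits
   are all ones, so the producer probably meant a small negative number.  */
static bool
looks_like_negative (uint64_t value)
{
  return (value >> 48) == 0xffff;
}
```

Under these assumptions the 10-byte encoding of 0xfffffffffffffff8 decodes cleanly and is then flagged by the heuristic, rather than being rejected as a bogus LEB128.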
Thanks,
Roland
dwarflint 40d0945..1d07862
by Roland McGrath
With the small loc expr fixes in commit 1d07862, dwarflint passes
self-test now. I've started a mass test run of -q -i --gnu.
I'll let you know if there are any errors.
Thanks,
Roland
dwarflint crash
by Roland McGrath
I'm running this mass test:
./single-file-test.sh norel dwarflint-fdc9733 /test/build-elfutils-O3/run.sh src/dwarflint -q -i --gnu {}
http://roland.fedorapeople.org/tmp/gnu.gettext.DumpResource.debug.bz2
gettext-debuginfo-0.16.1-12.fc8.i386:usr:lib:debug:usr:lib:gettext:gnu.gettext.DumpResource.debug
is a file on which it crashed. The only cases so far are crashes on a few
files from this or similar packages, probably the same case.
Thanks,
Roland
dwarflint relocs
by Roland McGrath
I'm thinking about the attr value interfaces and not making much progress yet.
So I thought I'd give a brain dump about dwarflint doing reloc checks.
Petr might run out of low-level section checks to implement before I get
the attr value stuff in shape to use it as the basis for high-level checks.
For some test data, try:
mkdir k; eu-unstrip -k -a -m -d k
(with kernel-debuginfo installed, or s/-k/-K/ for many more files).
In an ET_REL file, you need to check for reloc sections applying to each of
the sections you check. That's any section with sh_type SHT_REL or SHT_RELA
and nonzero sh_size, and sh_info matching the section index of a section
you check. Complain if there are multiple reloc sections pointing to the
same debug section. Each reloc section's sh_link is the section index of
the symtab it uses; complain if they don't all point to the same section.
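The section lookup above can be sketched without libelf, over an array of already-decoded section headers. Everything named here (shdr_sketch, reloc_lookup, find_reloc_section, the SK_ constants) is hypothetical; only the SHT_RELA and SHT_REL numeric values come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Section type values from the ELF specification.  */
#define SK_SHT_RELA 4
#define SK_SHT_REL 9

/* Only the section header fields this check needs (hypothetical struct).  */
struct shdr_sketch
{
  uint32_t sh_type;
  uint64_t sh_size;
  uint32_t sh_link;   /* For reloc sections: index of the symtab.  */
  uint32_t sh_info;   /* For reloc sections: index of the target section.  */
};

enum reloc_lookup { RL_NONE, RL_FOUND, RL_MULTIPLE };

/* Find the non-empty reloc section that applies to section TARGET.
   On RL_FOUND or RL_MULTIPLE, *RELNDX is the index of the first match.  */
static enum reloc_lookup
find_reloc_section (const struct shdr_sketch *shdrs, size_t nshdrs,
                    uint32_t target, size_t *relndx)
{
  assert (shdrs != NULL && relndx != NULL);
  enum reloc_lookup result = RL_NONE;
  for (size_t i = 1; i < nshdrs; ++i)   /* Index 0 is the null section.  */
    {
      const struct shdr_sketch *sh = &shdrs[i];
      if ((sh->sh_type != SK_SHT_REL && sh->sh_type != SK_SHT_RELA)
          || sh->sh_size == 0 || sh->sh_info != target)
        continue;
      if (result != RL_NONE)
        return RL_MULTIPLE;             /* Second match: complain.  */
      *relndx = i;
      result = RL_FOUND;
    }
  return result;
}
```

The caller would then compare sh_link across all the reloc sections it found and complain if they name different symtabs.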
You must examine the reloc section before you can check the corresponding
section. Check all its relocs' type with ebl_reloc_simple_type (see
libdwfl/relocate.c example). Only WORD, SWORD, XWORD, SXWORD make sense.
If any relocs have types we don't grok, you could either blacklist the
section (as if you'd found a bad format error parsing the relocated section
itself), or just report the individual bad relocs in-line in the checks.
Make a table of the relocs sorted by r_offset. (Given plain READ or
READ_MMAP_PRIVATE, you could just sort the reloc section data in place.)
As you check each section linearly, you can keep a pointer/index that
follows along in the reloc table, which is now sorted by offset into the
section. If you cross the offset touched by the next reloc when you aren't
expecting it, complain. You expect a reloc (but don't require one) when
looking at a fixed-size 4-byte or 8-byte quantity (i.e. usually offset_size
or address_size). The reloc type has to match the size you are decoding.
For an SHT_REL table, the value decoded from the section is the "addend" to
yield the pair (symndx, addend) from the reloc. For an SHT_RELA table, the
pair is (symndx, r_addend); the value in the section is ignored, and you
should complain if it's not zero.
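A rough sketch of the sorted table, the cursor that follows along, and the REL/RELA addend rule. The types and functions (reloc_sketch, reloc_cursor, cursor_init, cursor_seek, effective_addend) are hypothetical, and checking that the reloc type matches the size being decoded is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct reloc_sketch
{
  uint64_t r_offset;
  uint32_t symndx;
  int64_t r_addend;             /* Meaningful for SHT_RELA only.  */
};

struct reloc_cursor
{
  struct reloc_sketch *relocs;
  size_t count;
  size_t next;                  /* Index of the next unconsumed reloc.  */
};

static int
compare_offsets (const void *a, const void *b)
{
  const struct reloc_sketch *ra = a, *rb = b;
  return (ra->r_offset > rb->r_offset) - (ra->r_offset < rb->r_offset);
}

/* Sort RELOCS in place by r_offset and start the cursor at the front.  */
static void
cursor_init (struct reloc_cursor *cur, struct reloc_sketch *relocs,
             size_t count)
{
  assert (cur != NULL && (relocs != NULL || count == 0));
  if (count > 0)
    qsort (relocs, count, sizeof relocs[0], compare_offsets);
  cur->relocs = relocs;
  cur->count = count;
  cur->next = 0;
}

/* Advance to OFFSET.  Relocs at smaller offsets were crossed without
   being expected; their count is stored in *CROSSED.  Returns the reloc
   at exactly OFFSET, or NULL if there is none.  */
static const struct reloc_sketch *
cursor_seek (struct reloc_cursor *cur, uint64_t offset, size_t *crossed)
{
  *crossed = 0;
  while (cur->next < cur->count && cur->relocs[cur->next].r_offset < offset)
    {
      ++cur->next;
      ++*crossed;
    }
  if (cur->next < cur->count && cur->relocs[cur->next].r_offset == offset)
    return &cur->relocs[cur->next++];
  return NULL;
}

/* For SHT_REL the addend is the value stored in the section.  For
   SHT_RELA it is r_addend, and a nonzero stored value sets *COMPLAIN.  */
static int64_t
effective_addend (const struct reloc_sketch *rel, bool is_rela,
                  uint64_t in_place, bool *complain)
{
  *complain = is_rela && in_place != 0;
  return is_rela ? rel->r_addend : (int64_t) in_place;
}
```

Calling cursor_seek at each fixed-size 4-byte or 8-byte field is enough: a nonzero crossed count means some reloc touched bytes where none was expected.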
When you hit the relocatable value, you were either looking for an offset
in some section or for a target value. i.e., the CU header's offset into
.debug_abbrev, a strp, ref_addr forms, etc. For the *ptr attribute classes
(offsets into .debug_{line,loc,ranges,macinfo}) you need to match the known
tag and attribute; in the long run we'll do that with a centralized map of
known stuff, but for the moment you can just hard-code DW_AT_ranges et al.
This yields what section makes sense for the value: a particular one of the
debug sections, or any allocated section.
If symtab[symndx].st_shndx does not match the expected debug section's
index, complain. If a target value is what's expected, then complain if
it's not either SHN_ABS, an SHF_ALLOC section, or SHN_UNDEF.
For the offsets into debug sections, you then use st_value+addend in place
of the value decoded from the section, in your connectivity maps et al.
If relocs occur inside a block form, that has more cases.
We'll get to that later. Just note it for now.
For DW_FORM_addr and for data forms of address_size (when they are class
constant, worry about that later), an SHN_UNDEF reloc is acceptable.
In all other places, reject SHN_UNDEF.
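The symbol checks above boil down to a small predicate, sketched here under some assumptions. The names expect_kind and reloc_symbol_ok are hypothetical; the caller is assumed to have looked up the sh_flags of the section that st_shndx names, and to know whether this spot is one of the places where an SHN_UNDEF reloc is acceptable. The SHN_ and SHF_ values are the ones from the ELF specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the ELF specification.  */
#define SK_SHN_UNDEF 0
#define SK_SHN_ABS 0xfff1
#define SK_SHF_ALLOC 0x2

/* What the field being decoded is supposed to refer to.  */
enum expect_kind { EXPECT_DEBUG_SECTION, EXPECT_TARGET_ADDRESS };

/* Return true if a reloc against a symbol in section ST_SHNDX makes
   sense here.  EXPECTED_SHNDX is used for debug-section offsets only;
   SYM_SECTION_FLAGS and UNDEF_ALLOWED are used for target values only.  */
static bool
reloc_symbol_ok (enum expect_kind kind, uint32_t st_shndx,
                 uint32_t expected_shndx, uint64_t sym_section_flags,
                 bool undef_allowed)
{
  assert (kind == EXPECT_DEBUG_SECTION || kind == EXPECT_TARGET_ADDRESS);
  if (kind == EXPECT_DEBUG_SECTION)
    return st_shndx == expected_shndx;

  if (st_shndx == SK_SHN_UNDEF)
    return undef_allowed;       /* Only for address-like forms.  */
  if (st_shndx == SK_SHN_ABS)
    return true;
  return (sym_section_flags & SK_SHF_ALLOC) != 0;
}
```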
In all begin/end address pairs, such as in .debug_ranges, .debug_loc, and
.debug_aranges entries, if either the begin or end is relocated, complain
unless both have relocs and both relocs' symbols have matching st_shndx and
begin's st_value+addend <= end's st_value+addend.
In DW_FORM_addr and all header/other section places where an address is
specifically required, do a "suspicious" warning if there is no reloc for
that spot and its in-place value is not zero. (This in ET_REL files only,
of course.)
Is that plan all clear?
Thanks,
Roland
Fedora translators access
by Roland McGrath
https://bugzilla.redhat.com/show_bug.cgi?id=479491 was filed, asking us
to enable access for the Fedora Localization Project's translators.
I've never dealt with the Fedora Localization procedures before, so
I'm just reading http://fedoraproject.org/wiki/L10N/FAQ to see.
Apparently the way it works is that we allow git commit access for their
robot account and they commit files in po/ whenever they want.
I would have thought they wanted us to have the .pot file committed,
but there is nothing in their instructions about that. So maybe not.
If there is no objection, then I'll follow their procedures and we'll see
what it all means.
Thanks,
Roland
elf_update bug
by Roland McGrath
I fought enough rounds with this libelf problem to get my head muddled
about it, and then it's been several days since I thought hard about it.
So now I just want to air everything I think I know about it before it rots
any further in my brain. Some fresh eyeballs on this and checking up my
logic would help a lot.
In <= 0.137, elf_update could crash if the application set ELF_F_LAYOUT and
then made some sh_offset+sh_size ranges that overlap. This happened in
eu-strip with a botched input file.
https://bugzilla.redhat.com/show_bug.cgi?id=476136 is for that crash (and
some others). The case was: eu-strip -o foo try.out with input file
https://bugzilla.redhat.com/attachment.cgi?id=322583
In commit 75b07c00/commit 534ad315, I fixed that case.
This code shipped in 0.138, and is still current on the trunk.
This caused a regression in a different case. It broke rpmbuild's
debugedit on some files (killing rawhide kernel builds), which started
hitting a new assert. debugedit uses ELF_F_LAYOUT, and then dirties only
one or two of several sections. Here the case was some .ko file (ET_REL).
I think the situation was that a dirty section with an odd size
(e.g. .debug_str, with sh_addralign=1) is followed by a non-dirty section
that requires some filling (sh_addralign>1), or some similar interleaving
of sections. It crashed in the filling code after the bookkeeping got
confused by skipping the non-dirty sections. I'm having trouble tracking
down the reproducer for that one right now. (It was in rawhide koji failed
builds that have been GC'd.) I'll keep looking.
On roland/pending I have commit ed9d3bc1. This fixed the crash. (This is
in rawhide only, 0.138-2.) However, it means that elf_update will touch
all the inter-section alignment fill bytes every time, even ones between
two non-dirty sections that didn't move at all. That seems suboptimal.
In retrospect it occurred to me that 0.137 might have been wrong in certain
cases. Say you have an odd-sized section followed by an aligned section,
so there is fill space between them. Say the odd-sized section had been 3,
and the next section is aligned to 4. Now you change that section's
sh_size to 1, and mark it dirty. elf_update won't write those fill bytes.
What should the general rule be here? It's not clear e.g. that one should
not elf_begin(,ELF_C_RDWR,) + elf_fill(,0xaa) + elf_update(,) to change
all the inter-section fill bytes.
Offhand, the rule that seems simplest to grok, while not being
ridiculously over-eager to write fill bytes, is this: when you dirty a
section, any fill bytes before that section required by its alignment get
written. Otherwise, they don't. (Here the section headers count as "a
section" that might have been dirtied.)
That might even have been what the rule was before. But I've fiddled with
it too much and become unclear now.
We should have test cases for these three scenarios in our suite.
Note that the affected code path has mmap and non-mmap variants that
are similar but don't share code. So proper regression tests will
try both ELF_C_RDWR and ELF_C_RDWR_MMAP.
https://bugzilla.redhat.com/attachment.cgi?id=322583 (try.out)
is large and of unknown provenance, so we can't just use it.
We need to construct a test file that tickles the same problem.
I'm trying to hit the debugedit regression now with:
pkgs/kernel/devel
cvs up -r kernel-2_6_29-0_9_rc0_git4_fc11
make x86_64
on F-10 with elfutils-0.138-1.fc10.x86_64 (updates-testing, buggy,
should behave the same as current trunk code, i.e. w/o commit ed9d3bc1).
This is after I failed to find, by easier means, a file that triggers the bug
with the test case below. The test case might fail to correctly mimic the debugedit behavior
that matters. Or there might be some necessary weirdness about the
particular test file required that I haven't understood correctly.
(I tried 'mkdir k; eu-unstrip -am -K -d k' on F-10 and tried all those
.ko files with no hits of the bug.)
The third case is purely theoretical. If it really existed, we should
construct a test case to tickle it and verify that it makes <= 0.137 misbehave.
Thanks,
Roland
=====
/* Copyright (C) 2009 Red Hat, Inc.
This file is part of Red Hat elfutils.
Red Hat elfutils is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; version 2 of the License.
Red Hat elfutils is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License along
with Red Hat elfutils; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston MA 02110-1301 USA.
Red Hat elfutils is an included package of the Open Invention Network.
An included package of the Open Invention Network is a package for which
Open Invention Network licensees cross-license their patents. No patent
license is granted, either expressly or impliedly, by designation as an
included package. Should you wish to participate in the Open Invention
Network licensing program, please visit www.openinventionnetwork.com
<http://www.openinventionnetwork.com>. */
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <gelf.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
static void
handle_file (const char *file, Elf_Cmd cmd)
{
  int fd = open (file, O_RDWR);
  if (fd < 0)
    error (EXIT_FAILURE, errno, "cannot open input file '%s'", file);

  Elf *elf = elf_begin (fd, cmd, NULL);
  size_t shstrndx;
  if (elf_getshstrndx (elf, &shstrndx))
    error (EXIT_FAILURE, 0, "problems opening '%s' as ELF file: %s",
	   file, elf_errmsg (-1));

  /* Mark every .debug* section dirty.  */
  Elf_Scn *scn = NULL;
  while ((scn = elf_nextscn (elf, scn)) != NULL)
    {
      GElf_Shdr shdr_mem;
      const char *name = elf_strptr (elf, shstrndx,
				     gelf_getshdr (scn, &shdr_mem)->sh_name);
      if (name != NULL && !strncmp (name, ".debug", 6))
	elf_flagscn (scn, ELF_C_SET, ELF_F_DIRTY);
    }

  /* Keep the existing layout, as debugedit does, and write it out.  */
  elf_flagelf (elf, ELF_C_SET, ELF_F_LAYOUT);
  if (elf_update (elf, ELF_C_WRITE) == -1)
    error (EXIT_FAILURE, 0, "elf_update failed: %s", elf_errmsg (-1));

  elf_end (elf);
  close (fd);
}

int
main (int argc, char *argv[])
{
  elf_version (EV_CURRENT);

  /* Exercise both the mmap and the non-mmap code paths.  */
  for (int i = 1; i < argc; ++i)
    {
      handle_file (argv[i], ELF_C_RDWR_MMAP);
      handle_file (argv[i], ELF_C_RDWR);
    }

  return 0;
}
Re: elf_update bug
by Roland McGrath
I can't figure out why my test program doesn't trigger the same bug.
Take http://roland.fedorapeople.org/tmp/deadline-iosched.ko.bz2 and run:
/usr/lib/rpm/debugedit -b /home/roland/redhat/pkgs/kernel/devel/kernel-2.6.28 -d /usr/src/debug deadline-iosched.ko
This crashes with 0.138-1 (or with LD_LIBRARY_PATH pointed at
yourbuild/libelf, using master). The same file does not crash the test
program below.
/* Copyright (C) 2009 Red Hat, Inc.
This file is part of Red Hat elfutils.
Red Hat elfutils is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; version 2 of the License.
Red Hat elfutils is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License along
with Red Hat elfutils; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston MA 02110-1301 USA.
Red Hat elfutils is an included package of the Open Invention Network.
An included package of the Open Invention Network is a package for which
Open Invention Network licensees cross-license their patents. No patent
license is granted, either expressly or impliedly, by designation as an
included package. Should you wish to participate in the Open Invention
Network licensing program, please visit www.openinventionnetwork.com
<http://www.openinventionnetwork.com>. */
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <gelf.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
static void
handle_file (const char *file, Elf_Cmd cmd)
{
  int fd = open (file, O_RDWR);
  if (fd < 0)
    error (EXIT_FAILURE, errno, "cannot open input file '%s'", file);

  Elf *elf = elf_begin (fd, cmd, NULL);
  size_t shstrndx;
  if (elf_getshstrndx (elf, &shstrndx))
    error (EXIT_FAILURE, 0, "problems opening '%s' as ELF file: %s",
	   file, elf_errmsg (-1));

  /* This time set ELF_F_LAYOUT before touching any section.  */
  elf_flagelf (elf, ELF_C_SET, ELF_F_LAYOUT);

  /* Dirty only the raw data of .debug_str.  */
  Elf_Scn *scn = NULL;
  while ((scn = elf_nextscn (elf, scn)) != NULL)
    {
      GElf_Shdr shdr_mem;
      const char *name = elf_strptr (elf, shstrndx,
				     gelf_getshdr (scn, &shdr_mem)->sh_name);
      if (name != NULL && !strcmp (name, ".debug_str"))
	elf_flagdata (elf_rawdata (scn, NULL), ELF_C_SET, ELF_F_DIRTY);
    }

  /* Recompute the layout first, then write.  */
  if (elf_update (elf, ELF_C_NULL) == -1)
    error (EXIT_FAILURE, 0, "elf_update failed: %s", elf_errmsg (-1));
  if (elf_update (elf, ELF_C_WRITE) == -1)
    error (EXIT_FAILURE, 0, "elf_update failed: %s", elf_errmsg (-1));

  elf_end (elf);
  close (fd);
}

int
main (int argc, char *argv[])
{
  elf_version (EV_CURRENT);

  /* Exercise both the mmap and the non-mmap code paths.  */
  for (int i = 1; i < argc; ++i)
    {
      handle_file (argv[i], ELF_C_RDWR_MMAP);
      handle_file (argv[i], ELF_C_RDWR);
    }

  return 0;
}
elfutils status 2009-1-12
by Roland McGrath
Last week:
Actual time on elfutils: 60%
* fixes-for-c++ merged to trunk
* fixes and comments in libdw/c++/dwarf
* pushed shared dwarf branch, merged dwarflint
* F11 feature page updated (no wrangler activity yet)
* DwarfValues wiki page
* debuginfo-test-scripts
** refined scripts (no more one directory of 30000 long names)
** spare machine disk went bad -> reinstalled, respun extraction from scratch
* added dwarflint-self, dwarfcmp-self tests
* dwarfcmp -T: testing pure-memory writer data (DwarfTasks 2.5)
-> successful self-test, mass-test waiting for extraction
This week:
Expected time on elfutils this week: 60%
* post brain dump wrt 0.138 libelf regression
** need test cases in tree
** need roland/pending fix reviewed
* 0.139 release
* figure out Fedora translator setup (RHBZ#479491)
* dwarflint review
* mass-test on dwarfcmp -T, dwarflint
* non-ref attr value interfaces