Abrt (was Re: Most buggy packages)

Wed Feb 20 01:09:22 UTC 2013

On Tue, Feb 19, 2013 at 10:10:38PM +0100, Jiri Moskovcak wrote:

 > >>So if you want to hack this into a tool for use on kernel bugs, go for
 > >>it.
 > >...and please integrate with abrt! Let's have it all working together :)
 > 
 > - I am all for it, the abrt server is exactly the place where these
 > kind of things should be

What I have in mind are the cases where some human interaction is still necessary.

Adding heuristics on the server side for certain cases would help us, but
there are still a bunch of common operations we do that require a human
to make a judgment call before we make a change.

But, pursuing the server-side solution, here are some things that we'd find useful
that *could* be automated.

- Unlike most packages, we have individual maintainers for subcomponents
  (this is where our bugzilla implementation sucks, because we can't file
   by subcomponent).  So when we get bugs against certain drivers,
   or filesystems etc, we reassign to those developers who signed up to work
   on those.
  This probably accounts for a significant percentage of our interactions with
  bugzilla.  I'm not sure what kind of heuristics you'd need to add to automate
  assigning to the right person.  Maybe you can pull the symbol from the IP,
  translate that to a filename, and have a database of wildcards so you can do
  things like..
   drivers/net/wireless/* -> linville@
   fs/btrfs/* -> zab@
   etc..

  Because it's not always easy from a report to tell what component is responsible,
  sometimes parsing the Summary is necessary, which is the sort of thing
  I meant by 'needs human to make a judgment call'.  But if we can automate
  the majority of the cases, it would still help a lot.
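
A minimal sketch of the wildcard database idea, assuming the faulting symbol
has already been translated to a source path. The table layout and the
assignee_for() name are hypothetical; the two routes are the examples given
above, with the addresses left truncated as written.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: source-path wildcard -> assignee.
# Order matters: the first matching pattern wins.
ROUTES = [
    ("drivers/net/wireless/*", "linville@"),
    ("fs/btrfs/*", "zab@"),
]

def assignee_for(path, routes=ROUTES, default=None):
    """Return the assignee for the first wildcard that matches path."""
    for pattern, owner in routes:
        # fnmatchcase's '*' also matches '/', so subdirectories are covered.
        if fnmatchcase(path, pattern):
            return owner
    return default  # no match: leave the bug for a human to triage
```

Returning a default instead of guessing keeps the unclear reports in the
'needs a human' pile.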

- Similar to the previous item, but all graphics bugs get reassigned by us
  immediately to xorg-x11-drv-* because those guys deal with both the X and
  kernel modesetting/dri code. So any trace with 'i915', 'radeon' etc
  can probably be auto-reassigned.
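
One way this check could look, as a sketch only. It assumes call-trace
entries carry the owning module in square brackets (e.g. "[i915]") and
matches just that form, since matching the bare name would also hit the
"Modules linked in:" list on any machine with the driver loaded. The
function name and module tuple are hypothetical.

```python
import re

# Module names from the text above; 'etc' means this list would need extending.
GRAPHICS_MODULES = ("i915", "radeon")

def is_graphics_trace(trace, modules=GRAPHICS_MODULES):
    """True if any call-trace entry is attributed to a graphics module."""
    names = "|".join(re.escape(m) for m in modules)
    # Match only the bracketed module suffix, e.g. 'foo+0x10/0x20 [i915]'.
    return re.search(r"\[(?:%s)\]" % names, trace) is not None
```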

- When we get 'general protection fault' bugs, it's useful to run the Code:
  line of the oops through scripts/decodecode (from a kernel tree).
  This disassembly will allow us to see what instruction caused the GPF.
  (Note: *just* general protection faults, not every trace.  Also, we
   only really need the faulting instruction, not the whole disassembly).
  Bonus points if it can suck the relevant data out of the debuginfo rpms
  to map the code line to C code.
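
A rough sketch of the first half of that step: pick out the Code: line,
but only for general protection faults, and locate the byte the oops marks
with <..> as the start of the faulting instruction. The function names are
made up for illustration. Running the disassembly itself is left out here;
scripts/decodecode reads the oops text on stdin.

```python
import re

def gpf_code_line(oops):
    """Return the 'Code:' line of an oops, or None if it is not a GPF."""
    if "general protection fault" not in oops:
        return None  # *just* GPFs, not every trace
    # Tolerate a timestamp or log-level prefix before 'Code:'.
    m = re.search(r"^.*?(Code: .*)$", oops, re.MULTILINE)
    return m.group(1) if m else None

def faulting_offset(code_line):
    """Index, within the hex dump, of the byte marked <xx>."""
    tokens = code_line.split()[1:]  # drop the leading 'Code:'
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return i
    return None
```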

- Extrapolating from the above, when we see certain register values in those
  bugs, they usually hint at the cause of a bug. For example 0x6b6b6b6b is
  the slab poison pattern (POISON_FREE, written when SLAB_POISON is enabled),
  and usually means we tried to use memory after it was freed.
  Adding a comment to point this out speeds up analysis.
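
A sketch of how such a scan might work, assuming an x86-style register dump
of the form "RAX: <hex> RBX: <hex> ...". The regex, the hint table and the
function name are illustrative assumptions; only the 0x6b pattern comes from
the text above. A run of four poison bytes is accepted anywhere in the value
so that poison plus a small offset still matches.

```python
import re

# Poison byte -> comment to add. Only 0x6b is taken from the text above.
POISON_BYTES = {
    0x6b: "slab poison: memory was probably used after it was freed",
}

# Assumed dump format: 'RAX: 6b6b6b6b6b6b6b6b' (64-bit) or 'EAX: 6b6b6b6b' (32-bit).
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16}|[0-9a-f]{8})\b")

def poison_hints(oops):
    """Return (register, hint) pairs for registers holding a poison pattern."""
    hints = []
    for reg, value in REG_RE.findall(oops):
        raw = bytes.fromhex(value)
        for byte, meaning in POISON_BYTES.items():
            # Byte-aligned run of four poison bytes anywhere in the value.
            if bytes([byte]) * 4 in raw:
                hints.append((reg, meaning))
    return hints
```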

- Getting trickier..  We see a *lot* of flaky hardware, where we tried to
  dereference an address which had a single bit flip in memory.
  If the server side had some smarts so it knew what 'good' addresses looked like,
  it could detect the single bit-flip case and guide the user to run
  memtest86, which would save us a round-trip.
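
A sketch of the bit-flip check under a loud assumption: that 'good' can be
approximated by a table of address ranges. The single range below (the
x86_64 direct mapping as laid out in kernels of this era) is only an
example, and all names here are hypothetical. The idea is to flip each bit
of the faulting address in turn and see whether the result lands in a good
range.

```python
# Assumed 'good' ranges, for illustration only (x86_64 direct mapping).
GOOD_RANGES = [(0xffff880000000000, 0xffffc7ffffffffff)]

def looks_good(addr, ranges=GOOD_RANGES):
    """True if addr falls inside any known-good range."""
    return any(lo <= addr <= hi for lo, hi in ranges)

def bit_flip_candidates(addr, bits=64):
    """Good addresses exactly one bit away from a bad faulting address."""
    if looks_good(addr):
        return []  # nothing to explain
    return [addr ^ (1 << b) for b in range(bits)
            if looks_good(addr ^ (1 << b))]
```

A non-empty result would be the cue to suggest memtest86 in a comment.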

That's all I have right now, but there are probably a bunch of other
common operations we do which could be automated.

	Dave