ANNOUNCE: rpmgrok - a web-based tool for tracking a full distribution of RPMs

David Malcolm dmalcolm at redhat.com
Wed Aug 6 19:35:45 UTC 2008


I've been working on a new web-based tool for analyzing Fedora: rpmgrok.

It digests built RPMs, analysing the metadata and payload, and stores
the results in a database.  There's a web UI for viewing the data, an
XML-RPC interface for querying it, and a command-line tool for using the
XML-RPC interface.
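
For readers unfamiliar with XML-RPC, here is a minimal sketch of what a query
to such an interface looks like on the wire, using only Python's standard
library.  The method name "get_soname_users" is purely hypothetical (rpmgrok's
real XML-RPC methods are not listed in this announcement); only the
marshalling mechanics are shown, and no network access is needed.

```python
import xmlrpc.client

# HYPOTHETICAL method name, for illustration only; it is an assumption,
# not part of rpmgrok's documented XML-RPC API.
METHOD = "get_soname_users"


def build_request(soname):
    """Marshal an XML-RPC call with one string argument into its XML form."""
    return xmlrpc.client.dumps((soname,), methodname=METHOD)


def parse_request(payload):
    """Unmarshal an XML-RPC request back into (params, methodname)."""
    params, methodname = xmlrpc.client.loads(payload)
    return params, methodname


if __name__ == "__main__":
    # Print the XML that a client would POST to the server.
    print(build_request("libpcre.so.0"))
```

A real client would normally use xmlrpc.client.ServerProxy against the
service's endpoint URL rather than building the payload by hand.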

I've got a prototype running on:
http://publictest7.fedoraproject.org/rpmgrok

More info (e.g. source code) can be seen at
https://fedorahosted.org/rpmgrok

The idea is to provide a new way for Fedora developers, testers, and
other enthusiasts to track various things across the entire
distribution, without having to have a full tree installed.  It's
probably usable by other Linux distributions.

rpmgrok is Free/Open Source software (licensed under the LGPLv2.1).

== What does it track? ==
  - all symbols in binaries/libraries, and the dependencies between
them, so that you can see e.g. exactly what calls a particular function.
This can also be used to locate instances of static linkage.  See e.g.
http://publictest7.fedoraproject.org/rpmgrok/elffile/258085 (details
of /lib/libexpat.so.1.5.2 from a built RPM)
  - manifests of all RPMs, so that you can browse the files in packages
via a web UI.  See http://publictest7.fedoraproject.org/rpmgrok/files
The file view is only interesting at the moment for ELF files (binaries)
and for .desktop files.
  - all shared object names (sonames), and the dependencies between them.  See
e.g.
    - http://publictest7.fedoraproject.org/rpmgrok/sonames (browsable
view of all sonames in the distro)
    - http://publictest7.fedoraproject.org/rpmgrok/elffile/739167
(details of /usr/lib/libxml2.so.2.6.32 from within a built libxml2 rpm)
    - Everything implementing or linking against libpcre.so.0 down to
the level of individual binaries:
http://publictest7.fedoraproject.org/rpmgrok/soname/libpcre.so.0
  - results of running rpmlint on all RPMs.  See
http://publictest7.fedoraproject.org/rpmgrok/rpmlint_messages for a UI
to browse by error message, and e.g.
http://publictest7.fedoraproject.org/rpmgrok/rpmlint/dangerous-command-in-%25post
for an example of all error messages of a particular kind.  Some of these
rpmlint errors may be worth fixing (though some look like false
positives, and others are probably not worth the effort).
  - all .desktop files and their fields so that you can e.g. find
applications that can handle PDF files.  See e.g.
http://publictest7.fedoraproject.org/rpmgrok/mimetype/application/pdf
for a view of all desktop files that can handle "application/pdf", and
e.g. http://publictest7.fedoraproject.org/rpmgrok/desktopfile/253580
showing a specific desktop file
  - SLOCcount stats (see http://www.dwheeler.com/sloccount/ ) for
prepped source trees (e.g. "what % of Fedora is in C/C++/Python?" etc).
I don't have this data ready yet.
  - any other kind of thing we want to add (provided there's a sane way
to gather it in a script and slurp it into the database, of course...)
    - sizes of packages; why is package foo so big?
    - report on all fonts in the distro, and which packages provide them,
    etc.

Note that due to my poor CSS, lots of links don't show up as such in
the various table views.  You may need to hover with the mouse to find
all of the cross-referencing that the web UI has.
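
The example links above follow a regular pattern, so they can be generated
from a script.  The following is a small sketch, assuming the URL layout
stays as shown in the examples in this mail; the helper function names are
made up for illustration and are not part of rpmgrok.

```python
from urllib.parse import quote

# Base URL of the prototype instance mentioned in this announcement.
BASE = "http://publictest7.fedoraproject.org/rpmgrok"


def soname_url(soname):
    """URL of the page listing everything implementing or using a soname."""
    return f"{BASE}/soname/{quote(soname, safe='')}"


def rpmlint_url(message_id):
    """URL listing all rpmlint messages of one kind ('%' becomes '%25')."""
    return f"{BASE}/rpmlint/{quote(message_id, safe='')}"


def mimetype_url(mimetype):
    """URL listing .desktop files that handle a MIME type ('/' is kept)."""
    return f"{BASE}/mimetype/{quote(mimetype, safe='/')}"


if __name__ == "__main__":
    print(soname_url("libpcre.so.0"))
    print(rpmlint_url("dangerous-command-in-%post"))
    print(mimetype_url("application/pdf"))
```

Note the percent-encoding: the rpmlint message name contains a literal "%",
which is why the example link above has "%25post" in it.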

== What's it currently showing? ==
I queued up an analysis of all of rawhide as of 2008-07-25 on i386; a
little over 10000 built packages.  It took about a week to process, and
about 200 of these jobs failed for one reason or another.  See
https://fedorahosted.org/rpmgrok/ticket/9 for more info.

So the database is currently just showing a snapshot in time of rawhide
from about two weeks ago, on one architecture (and missing about 2% of
the packages due to errors).

Ultimately I want to build things up so that we can show time-based
trend reports e.g. the size of a minimal install over time (or
whatever).

== Help Needed! ==
Hopefully this is of interest to people.

I need help with coding, with sysadmin work, with making the UI better,
and with things I probably haven't thought of yet.  I hope this can
be a useful tool for Fedora.

If you're interested in hacking on rpmgrok, get in touch.  The README
file is hopefully of interest; see
http://git.fedorahosted.org/git/rpmgrok.git?p=rpmgrok.git;a=blob_plain;f=README.txt;hb=HEAD

It's implemented using TurboGears and SQLAlchemy (specifically,
SQLAlchemy 0.4, since it uses polymorphic inheritance features from that
version).

It also has a somewhat general-purpose task scheduler, used to control a
pool of worker hosts that do the actual analysis.  It ought to be
pluggable to do other types of task.
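
To illustrate what "pluggable" could mean here, below is a minimal sketch of
a task queue with a registry of task types.  This is NOT rpmgrok's actual
scheduler: the task type names, the register() decorator and run_pool() are
all assumptions made up for this example, and local threads stand in for the
pool of worker hosts.

```python
import queue
import threading

# Registry mapping a task type name to the function that handles it.
# New kinds of task are "plugged in" by registering another handler.
HANDLERS = {}


def register(task_type):
    """Decorator that registers a handler for a (hypothetical) task type."""
    def decorator(func):
        HANDLERS[task_type] = func
        return func
    return decorator


@register("analyse_rpm")
def analyse_rpm(payload):
    # Placeholder: a real worker would unpack and inspect the RPM here.
    return f"analysed {payload}"


@register("sloccount")
def sloccount(payload):
    # Placeholder for a second, unrelated kind of task.
    return f"counted lines in {payload}"


def run_pool(tasks, num_workers=2):
    """Run (task_type, payload) pairs on worker threads.

    Returns (results, failures); a failing task is recorded, not fatal.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)
    results, failures = [], []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task_type, payload = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            try:
                outcome = HANDLERS[task_type](payload)
                with lock:
                    results.append(outcome)
            except Exception as exc:  # e.g. unknown task type
                with lock:
                    failures.append((task_type, payload, repr(exc)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, failures
```

Recording failures instead of aborting matters for a long run over a whole
distribution, where some jobs can be expected to fail.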

== Source Code ==
Git URLs are:
  git://git.fedorahosted.org/rpmgrok.git
  ssh://git.fedorahosted.org/git/rpmgrok.git
  http://git.fedorahosted.org/git/rpmgrok.git
(you need to be in the gitrpmgrok group of the Fedora Accounts System to
have git push privileges; talk to me if you want to get involved)

== Related work ==
Inspiration includes
  - the OpenGrok project (see
http://opensolaris.org/os/project/opengrok/ ), though that appears to focus
on source trees, whereas rpmgrok focuses on built packages
  - the Debian project's Lintian tool (see http://lintian.debian.org )

Enjoy!
Dave



