Static Analysis: results of FUDcon Lawrence hackfest
dmalcolm at redhat.com
Thu Jan 24 16:44:10 UTC 2013
Michael Hrivnak and I spent some time at FUDcon Lawrence looking at
static code analysis.
We hacked on the proposed common format for analysis tools (aka
"firehose").
We now have parsers (and test suites) for coercing the following into
that common format:
* gcc warnings
* "cppcheck" warnings (specifically, v2 of its XML output format)
* LLVM's clang-static-analyzer (specifically, its .plist output format)
and I'm working on adding support to my cpychecker tool (for Python
refcounting bugs etc).
The code for this can be seen at:
Note that the details of the file format and API aren't set in stone yet
(each time we add a new analyzer we find we have to tweak the format).
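To give a flavour of what "coercing into a common format" means, here is a minimal sketch of turning gcc's stderr into uniform records. The `Issue` record and `parse_gcc_warnings` function are hypothetical names made up for illustration; this is not the actual firehose API or data model, and it only handles the simple "file:line:col: warning: text" shape.

```python
import re
from collections import namedtuple

# Hypothetical record type for illustration; NOT the real firehose data model.
Issue = namedtuple("Issue", ["tool", "path", "line", "column", "message"])

# Matches diagnostics shaped like "foo.c:12:5: warning: some text".
# The column is optional, since not every gcc diagnostic carries one.
GCC_WARNING = re.compile(
    r"^(?P<path>[^:\n]+):(?P<line>\d+):(?:(?P<column>\d+):)?"
    r"\s*warning:\s*(?P<message>.*)$"
)


def parse_gcc_warnings(stderr_text):
    """Return a list of Issue records for the warnings in gcc's stderr."""
    issues = []
    for raw in stderr_text.splitlines():
        m = GCC_WARNING.match(raw)
        if m is None:
            continue  # context lines such as "foo.c: In function 'main':"
        column = int(m.group("column")) if m.group("column") else None
        issues.append(
            Issue("gcc", m.group("path"), int(m.group("line")),
                  column, m.group("message"))
        )
    return issues
```

A parser for another tool would return the same kind of record, which is what makes the results comparable across analyzers.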
Since FUDcon I've hacked on injecting static analysis into mock. This
is now working, so that you can run (say):
$ ./mock-with-analysis \
and it will do a mock rebuild, hacking up /usr/bin/gcc in the chroot so
that it runs the following on each .c file that gcc is invoked on:
parsing the results into the firehose XML format, dropping them all into
one directory, along with all relevant source files. It has some smarts
for handling paths so e.g. recursive make doesn't confuse it (I hope).
It also gathers all gcc warnings, in the same format.
It then postprocesses the results at the end of the build (adding the
srpm NVR as metadata, and gathering source files that were mentioned in
a warning).
The code for this is at:
(it's all a work-in-progress right now; expect things to change).
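The gcc-wrapping step described above can be sketched roughly as follows. This is an illustration under assumptions, not the real mock-with-analysis code: `REAL_GCC`, `find_c_sources` and `plan_commands` are hypothetical names, cppcheck stands in for whichever analyzers are hooked up, and the sketch only plans the commands rather than running them. Source paths are made absolute against the current directory, which is one simple way to cope with recursive make.

```python
import os

# Hypothetical place where the real compiler is moved to inside the chroot.
REAL_GCC = "/usr/bin/the-real-gcc"


def find_c_sources(argv, cwd):
    """Pick the .c input files out of a gcc command line, as absolute paths."""
    sources = []
    skip_next = False
    for arg in argv[1:]:
        if skip_next:
            skip_next = False
            continue
        if arg == "-o":
            skip_next = True  # the next argument is the output, not an input
            continue
        if not arg.startswith("-") and arg.endswith(".c"):
            # Resolve against cwd so results from sub-makes don't collide.
            sources.append(os.path.normpath(os.path.join(cwd, arg)))
    return sources


def plan_commands(argv, cwd):
    """Return the analyzer commands, followed by the real compile command."""
    commands = [
        ["cppcheck", "--xml", "--xml-version=2", src]
        for src in find_c_sources(argv, cwd)
    ]
    # The real compile always runs last, with the arguments untouched, so
    # that the build itself behaves as before.
    commands.append([REAL_GCC] + list(argv[1:]))
    return commands
```

Keeping the "what to run" logic separate from actually running it makes the wrapper easy to test without a chroot.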
You can see an example of the raw results here:
It's a regular mock build so it should look familiar: see e.g. the
build.log etc (I omitted the built rpms for the sake of disk space).
You'll see that it adds a new "static-analysis" directory to hold the
results:
*.rpm (the built rpms)
static-analysis/ <=== this and below are new
Each warning from a tool goes into an individual XML file, e.g. this
warning from cppcheck about an off-by-one in a buffer size:
and this one, which is a gcc warning:
All source files that are referenced in a warning are scraped from the
chroot and go in the sources dir, named by SHA-1 sum - in this case
there's just this one:
If you look in the build.log, you'll see extra lines from "FAKE-GCC"
where it invokes the analysis tools every time /usr/bin/gcc is invoked.
[I think there are some bugs where it failed to parse some of the gcc
warnings, but hey, this is the first iteration of this - one thing to
add will be to have explicit tracking for when an analysis fails]
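The "named by SHA-1 sum" scheme for the sources directory could look something like the sketch below. `store_source` is a hypothetical helper written for illustration, not the function the tool actually uses; the point is just that identical files collapse to one copy and a warning can refer to its source by hash.

```python
import hashlib
import os
import shutil


def store_source(src_path, sources_dir):
    """Copy src_path into sources_dir, named by the SHA-1 of its content.

    Returns the hex digest, which a warning can then use to refer to the file.
    """
    sha1 = hashlib.sha1()
    with open(src_path, "rb") as f:
        # Read in chunks so large source files aren't loaded in one go.
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
    digest = sha1.hexdigest()

    os.makedirs(sources_dir, exist_ok=True)
    dest = os.path.join(sources_dir, digest)
    if not os.path.exists(dest):
        # Same content seen before means same name, so copy only once.
        shutil.copyfile(src_path, dest)
    return digest
```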
The plan is that the interchange format can be uploaded into a web
UI/database, so that we can:
* scan the entire distro
* compare warnings: e.g. what new warnings appear in a package rebuild?
* have a consistent interface for marking warnings as false positives
* come up with some subset of the warnings that we care about
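As an illustration of the "what new warnings appear in a package rebuild?" idea, here is a minimal sketch. It assumes each warning has already been loaded into a plain dict with "tool", "path" and "message" keys; that shape, and the `warning_key` and `new_warnings` names, are assumptions for the example rather than anything the interchange format defines.

```python
def warning_key(warning):
    """Identity of a warning for comparison purposes.

    Line numbers are deliberately left out, so that unrelated edits which
    shift code up or down don't make every warning look new.
    """
    return (warning["tool"], warning["path"], warning["message"])


def new_warnings(old_build, new_build):
    """Return the warnings in new_build that were not present in old_build."""
    seen = {warning_key(w) for w in old_build}
    return [w for w in new_build if warning_key(w) not in seen]
```

Matching on (tool, path, message) is crude: it will miss a second instance of an identical message in the same file. A real implementation would probably want something smarter.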
My own plan is to repeat the experiment I attempted for Fedora 17 (running
all of the Python extension code in Fedora through cpychecker), but this
time capturing the results in an interchange format, rather than as blobs
of HTML on my fedorapeople.org space, so that automated analysis is possible.
Hopefully this looks valuable to Fedora.
Anyone interested in helping with this? There's plenty of scope for
contributions:
* building the web UI for dealing with the results (any Python web
developers out there?) 
* packaging more static analyzers in Fedora (e.g. has anyone looked at
* writing parsers for more static analyzers
* adding invocation hooks for more static analyzers to mock-with-analysis
* making it more robust (e.g. adding timeouts in case a tool goes into
an infinite loop; recording analysis failures; etc)
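For the robustness item, a minimal sketch of running a tool with a timeout and recording what happened, rather than letting a hung analyzer stall the build. `run_analyzer` and the shape of the dict it returns are hypothetical, chosen for the example only.

```python
import subprocess


def run_analyzer(cmd, timeout_secs):
    """Run an analysis tool; report success, failure or timeout as a dict."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_secs,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "returncode": None, "stdout": b""}
    except OSError:
        # e.g. the tool isn't installed in the chroot
        return {"status": "failed-to-start", "returncode": None, "stdout": b""}

    status = "ok" if proc.returncode == 0 else "failed"
    return {"status": status, "returncode": proc.returncode,
            "stdout": proc.stdout}
```

Note that some analyzers exit non-zero simply because they found problems, so "failed" here would need per-tool interpretation.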
(also I'm on IRC on #fedora-devel and elsewhere as "dmalcolm").
Michael started the web UI as:
https://github.com/fedora-static-analysis/firehoseui but there's almost
nothing there yet.