Static Analysis: results of FUDcon Lawrence hackfest

David Malcolm dmalcolm at redhat.com
Thu Jan 24 16:44:10 UTC 2013


Michael Hrivnak and I spent some time at FUDcon Lawrence looking at
static code analysis.

We hacked on the proposed common format for analysis tools (aka
"firehose").

We now have parsers (and test suites) for coercing the following into a
common format:
 * gcc warnings
 * "cppcheck" warnings (specifically, v2 of its XML output format)
 * LLVM's clang-static-analyzer (specifically, its .plist output format)
and I'm working on adding support to my cpychecker tool (for Python
refcounting bugs etc).
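
To give a flavour of what "coercing into a common format" means, here is a minimal sketch of turning one line of gcc output into a plain dictionary. It is not firehose's actual parser or data model: the regular expression, the function name parse_gcc_warning and the dictionary keys are all assumptions made up for illustration.

```python
import re

# Hypothetical pattern for one line of gcc output, for example:
#   foo.c:12:5: warning: unused variable 'x' [-Wunused-variable]
# The column and the trailing [-Wswitch] are both treated as optional.
GCC_WARNING = re.compile(
    r"^(?P<path>[^:\n]+):(?P<line>\d+):(?:(?P<column>\d+):)?\s*"
    r"warning:\s*(?P<message>.*?)"
    r"(?:\s+\[(?P<switch>-W[^\]]+)\])?$"
)


def parse_gcc_warning(text):
    """Return a dict describing one gcc warning line, or None if it is
    not a warning line (illustrative only, not the firehose API)."""
    m = GCC_WARNING.match(text.strip())
    if m is None:
        return None
    return {
        "tool": "gcc",
        "path": m.group("path"),
        "line": int(m.group("line")),
        "column": int(m.group("column")) if m.group("column") else None,
        "message": m.group("message"),
        "switch": m.group("switch"),
    }
```

Lines that are not warnings (such as gcc's "In function ..." context lines) simply return None here; a real parser would need to handle multi-line diagnostics too.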

The code for this can be seen at:

  https://github.com/fedora-static-analysis/firehose

Note that the details of the file format and API aren't set in stone yet
(and each time we add a new analyzer we find we have to tweak the format
a little...).

Since FUDcon I've hacked on injecting static analysis into mock.  This
is now working, so that you can run (say):

$ ./mock-with-analysis \
    fedora-17-x86_64 python-ethtool-0.7-4.fc19.src.rpm

and it will do a mock rebuild, hacking up /usr/bin/gcc in the chroot so
that it runs the following on each .c file that gcc is invoked on:
  * cppcheck
  * clang-static-analyzer
parsing the results into the firehose XML format, dropping them all into
one directory, along with all relevant source files.  It has some smarts
for handling paths so e.g. recursive make doesn't confuse it (I hope).
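
As a rough illustration of the wrapper idea (not the actual mock-with-analysis code), the sketch below picks the .c files out of a gcc command line and builds one command per analyzer for each of them. The function names are hypothetical, the argument handling is deliberately naive, and the exact analyzer options the real tool passes may well differ.

```python
def c_sources(gcc_argv):
    """Return the .c files named on a gcc command line.

    Simplified on purpose: any non-option argument ending in ".c" is
    treated as a source file.
    """
    return [
        arg for arg in gcc_argv[1:]
        if not arg.startswith("-") and arg.endswith(".c")
    ]


def analyzer_commands(source):
    """Build one command line per analyzer for a single source file.

    Assumption: these minimal invocations are only a guess at what a
    wrapper might run; the real tool also has to forward include paths,
    defines and so on.
    """
    return [
        ["cppcheck", "--xml", "--xml-version=2", source],
        ["clang", "--analyze", source],
    ]


if __name__ == "__main__":
    argv = ["gcc", "-O2", "-Wall", "-c", "ethtool.c", "-o", "ethtool.o"]
    for src in c_sources(argv):
        for cmd in analyzer_commands(src):
            print(" ".join(cmd))
```

A real wrapper would run these commands, then hand off to the real compiler with the original arguments so that the build itself is unaffected.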

It also gathers all gcc warnings, in the same format.

It then postprocesses the results at the end of the build (adding the
srpm NVR as metadata, and gathering source files that were mentioned in
warnings).
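
For the NVR metadata, one simple approach is to split it out of the srpm filename. This is only a sketch: nvr_from_srpm is a made-up name, and the real postprocessing step may well get the NVR some other way (e.g. by querying the rpm headers).

```python
def nvr_from_srpm(filename):
    """Split "NAME-VERSION-RELEASE.src.rpm" into (name, version, release).

    The name may itself contain hyphens, so split from the right.
    """
    suffix = ".src.rpm"
    if not filename.endswith(suffix):
        raise ValueError("not a source rpm filename: %r" % filename)
    name, version, release = filename[:-len(suffix)].rsplit("-", 2)
    return name, version, release


if __name__ == "__main__":
    print(nvr_from_srpm("python-ethtool-0.7-4.fc19.src.rpm"))
    # prints ('python-ethtool', '0.7', '4.fc19')
```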

The code for this is at:

  https://github.com/fedora-static-analysis/mock-with-analysis

(it's all a work-in-progress right now; expect things to change).

You can see an example of the raw results here:
http://fedorapeople.org/~dmalcolm/static-analysis/2013-01-24/python-ethtool-0.7-4.fc19.src.rpm/

It's a regular mock build so it should look familiar: see e.g. the
build.log (I omitted the built rpms for the sake of disk space).
You'll see that it adds a new "static-analysis" directory to hold the
results:

/var/lib/mock/CONFIG/result/state.log
                            root.log
                            build.log
                            *.rpm (the built rpms)
                            static-analysis/ <=== this and below are new
                                           /reports/*.xml
                                           /sources/

Each warning from a tool goes into an individual XML file, e.g. this
warning from cppcheck about an off-by-one in a buffer size:
http://fedorapeople.org/~dmalcolm/static-analysis/2013-01-24/python-ethtool-0.7-4.fc19.src.rpm/static-analysis/reports/094b472e857a39e96a199ae4b3c3aa1c41fbfccf.xml
and this one, which is a gcc warning:
http://fedorapeople.org/~dmalcolm/static-analysis/2013-01-24/python-ethtool-0.7-4.fc19.src.rpm/static-analysis/reports/34bb83db976ad68132cfcb94bd61e36550239eb8.xml

All source files that are referenced in a warning are scraped from the
chroot and go in the sources dir, named by SHA-1 sum - in this case
there's just this one:
http://fedorapeople.org/~dmalcolm/static-analysis/2013-01-24/python-ethtool-0.7-4.fc19.src.rpm/static-analysis/sources/97e2c2ff2b2d12528c295bb41c029a8658e6931e
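
Naming files by their SHA-1 sum can be sketched as below. The function name store_source and its signature are hypothetical; this is not the code mock-with-analysis actually uses.

```python
import hashlib
from pathlib import Path


def store_source(src_path, sources_dir):
    """Copy src_path into sources_dir under the SHA-1 hex digest of its
    content, and return that digest."""
    data = Path(src_path).read_bytes()
    digest = hashlib.sha1(data).hexdigest()
    dest = Path(sources_dir) / digest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Identical content always maps to the same name, so a second copy
    # of the same file is a no-op.
    if not dest.exists():
        dest.write_bytes(data)
    return digest
```

One nice property of content-based names is that a report can refer to the exact version of the file that was analyzed, and identical files are stored only once.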

If you look in the build.log, you'll see extra lines from "FAKE-GCC"
where it invokes the analysis tools every time /usr/bin/gcc is invoked.
[I think there are some bugs where it failed to parse some of the gcc
warnings, but hey, this is the first iteration of this - one thing to
add will be to have explicit tracking for when an analysis fails]

The plan is that the interchange format can be uploaded into a web
UI/database, so that we can:
* scan the entire distro
* compare warnings: e.g. what new warnings appear in a package rebuild?
* have a consistent interface for marking warnings as false positives
* come up with some subset of the warnings that we care about
* etc
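
The "what new warnings appear in a package rebuild?" comparison could look something like the sketch below. It assumes each warning has already been loaded as a dictionary with "tool", "path" and "message" keys; that representation is invented for the example, and the choice to ignore line numbers (since they shift whenever code is added above a warning) is just one possible matching rule.

```python
def warning_key(warning):
    """Identity used for comparing warnings across two builds.

    Assumption: the line number is deliberately left out so that
    unrelated edits elsewhere in the file don't make an old warning
    look new.
    """
    return (warning["tool"], warning["path"], warning["message"])


def new_warnings(old, new):
    """Return the warnings in `new` that have no counterpart in `old`."""
    seen = {warning_key(w) for w in old}
    return [w for w in new if warning_key(w) not in seen]
```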

My own plan is to repeat the "run all of the Python extension code in
Fedora through cpychecker" experiment that I attempted for Fedora 17 [1],
but this time capturing the results in an interchange format, rather than
as blobs of HTML on my fedorapeople.org space, so that automated analysis
is possible.

Hopefully this looks valuable to Fedora.

Is anyone interested in helping with this? There's plenty of scope for
getting involved:
* building the web UI for dealing with the results (any Python web
developers out there?) [2]
* packaging more static analyzers in Fedora (e.g. has anyone looked at
Frama-C ?)
* writing parsers for more static analyzers
* adding invocation hooks for more static analyzers to the
"mock-with-analysis" tool
* making it more robust (e.g. adding timeouts in case a tool goes into
an infinite loop; recording analysis failures; etc)
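
On the robustness point, here is a minimal sketch of running a tool with a timeout and recording what happened rather than silently losing the failure. The function name run_analyzer, the default timeout and the shape of the returned dictionary are all assumptions for illustration; nothing like this exists in the tool yet.

```python
import subprocess


def run_analyzer(cmd, timeout_seconds=60):
    """Run one analyzer command and return a dict recording the outcome.

    The "status" value is one of: "ok", "nonzero-exit", "timeout",
    "failed-to-start".
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this.
        return {"status": "timeout", "returncode": None,
                "stdout": "", "stderr": ""}
    except OSError as exc:
        # e.g. the analyzer is not installed in the chroot
        return {"status": "failed-to-start", "returncode": None,
                "stdout": "", "stderr": str(exc)}
    status = "ok" if proc.returncode == 0 else "nonzero-exit"
    return {"status": status, "returncode": proc.returncode,
            "stdout": proc.stdout, "stderr": proc.stderr}
```

Some tools use a non-zero exit status to signal "warnings found" rather than "the tool broke", so the meaning of "nonzero-exit" would need checking per analyzer.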

See also:
https://fedoraproject.org/wiki/StaticAnalysis
(also I'm on IRC on #fedora-devel and elsewhere as "dmalcolm").

Cheers
Dave
[1]
http://fedoraproject.org/wiki/Features/StaticAnalysisOfPythonRefcounts
[2] Michael started this as:
https://github.com/fedora-static-analysis/firehoseui but there's almost
nothing there yet.


