Dealing with static code analysis in Fedora

David Malcolm dmalcolm at redhat.com
Wed Dec 12 19:15:37 UTC 2012


On Wed, 2012-12-12 at 01:00 -0200, Paulo César Pereira de Andrade wrote:

(Thanks; various replies inline below)

> 2012/12/11 David Malcolm <dmalcolm at redhat.com>:
> > A while back I ran my static checker on all of the Python extension
> > modules in Fedora 17:
> >   http://fedoraproject.org/wiki/Features/StaticAnalysisOfPythonRefcounts
> >
> > I wrote various scripts to build the packages in a mock environment that
> > injects my checker into gcc, then wrote various scripts to triage the
> > results.  I then filed bugs by hand for the most important results,
> > writing some more scripts along the way to make the process easier.
> >
> > This led to some valuable bug fixes, but the mechanism for running the
> > analysis was very ad hoc and doesn't scale.
> 
>   I think it could be useful at least as a generic tool where one would
> just do something like:
> make CC=gcc-with-python-plugin
> the way one could once run
> make CC=cgcc
> to see what sparse would report.  Or maybe think of it as a tool like
> rpmlint.
That's what I first tried, but there are plenty of packages that don't
use "make".  I then tried setting the rpm build flags to add my plugin
to gcc, but I ran into enough packages that didn't respect them that it
turned out simplest to hack up /usr/bin/gcc within the build chroot so
that it adds the checker automatically.  (This means that e.g. all the
little programs that "configure" runs get the checker run on them too,
but that isn't a major issue).
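
Roughly, the hacked-up /usr/bin/gcc is just a wrapper that re-execs the
real compiler with the plugin options prepended.  A minimal sketch in
Python; the paths and plugin arguments here are placeholders rather than
the real cpychecker invocation:

  #!/usr/bin/python
  # Sketch of a /usr/bin/gcc wrapper inside the mock chroot: the real
  # compiler has been moved aside (location is assumed), and every
  # invocation gets the analysis plugin injected.  The "real" build is
  # undisturbed; the checker merely writes report files as a side-effect.
  import os
  import sys

  REAL_GCC = "/usr/bin/gcc.real"   # assumed location of the moved binary
  PLUGIN_ARGS = [
      "-fplugin=python.so",                                       # placeholder
      "-fplugin-arg-python-script=/usr/share/checker/inject.py",  # placeholder
  ]

  os.execv(REAL_GCC, [REAL_GCC] + PLUGIN_ARGS + sys.argv[1:])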

> > In particular, we don't yet have an automated way of rerunning the
> > tests, whilst using the old results as a baseline.  For example it would
> > be most useful if only new problems could be reported, and if the system
> > (whatever it is) remembered when a report has been marked as a true bug
> > or as a false positive.  Similarly, there's no automated way of saying
> > "this particular test is bogus; ignore it for now".
> 
>   Something like valgrind's .supp files?
Yes, though I was thinking of a web UI backed by a database, to make it
trivial to flag something.  I don't know if e.g. a collection of files
backed by git is going to be as flexible...  Not sure.
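
Either way, what gets stored per report is tiny: essentially a
fingerprint of the report plus a verdict.  A sketch of what one marking
might look like, whether it lives in a database row or a git-tracked
file; every name and value here is made up for illustration:

  # Hypothetical triage record; the fingerprint deliberately omits line
  # numbers so it survives unrelated edits to the file.
  marking = {
      "tool": "cpychecker",
      "fingerprint": {
          "file": "src/foomodule.c",
          "function": "foo_new",
          "message": "ob_refcnt of new object is 1 too high",
      },
      "verdict": "false-positive",   # or "true-bug", "needs-review"
      "marked-by": "dmalcolm",
      "comment": "the extra reference is owned by the caller here",
  }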

> > I'm wondering if there's a Free Software system for doing this kind of
> > thing, and if not, I'm thinking of building it.
> >
> > What I have in mind is a web app backed by a database (perhaps
> > "checker.fedoraproject.org" ?)
> 
>   Reminds me of http://upstream-tracker.org/
Thanks for the link; I hadn't seen it.  It looks nice and has some good
UI ideas, but it's not quite what I had in mind - it seems to have a
fixed collection of tests, each very different from the others, whereas
I want a dynamic collection of tests, all within the pattern of "paths
through source code", with tracking of the quality of those tests.  Any
static analysis tool will have false positives, and we need a mechanism
in place to cope with that, or we'll just drown in the noise.

> > We'd be able to run all of the code in Fedora through static analysis
> > tools, and slurp the results into the database: primarily my
> > "cpychecker" work, but we could also run the clang analyzer etc.  I've
> > also been working on another as-yet-unreleased static analysis tool for
> > which I'd want a db for the results.  What I have working is a way to
> > inject an analysis payload into gcc within a mock build, which dumps
> > JSON report files into the chroot without disturbing the "real" build.
> > The idea is then to gather up the JSON files and insert the report data
> > into the db, tagging it with version information.
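
(To make the above concrete: each report file is a small blob of JSON
describing one path through the code.  The schema is still in flux;
something along these lines, with every name and value below being
purely illustrative:)

  # Illustrative only: roughly the kind of JSON report the in-gcc payload
  # dumps into the chroot; the real schema is still being designed.
  import json

  report = {
      "tool": {"name": "cpychecker", "version": "0.11"},
      "build": {"nvr": "example-module-1.0-1.fc17", "arch": "x86_64"},
      "message": "ob_refcnt of new object is 1 too high",
      "severity": "warning",
      "trace": [
          {"file": "src/examplemodule.c", "function": "example_new",
           "line": 123, "note": "when PyList_New() fails"},
      ],
  }
  print(json.dumps(report, indent=2))
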
> >
> > There are two dimensions to the version information:
> >  (A) the version of the software under analysis
> >          (name-version-release.arch)
> >  (B) the version of the tool doing the analysis
> >
> > We could use (B) within the system to handle the release cycle of a
> > static analysis tool.  Initially, any such analysis tools would be
> > regarded as "experimental", and package maintainers could happily ignore
> > the results of such a tool.  The maintainer of an analysis tool could
> > work on bug fixes and heuristics to get the signal:noise ratio of the
> > tool up to an acceptable level, and then the status of the analysis tool
> > could be upgraded to an "alpha" level or beyond.
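
(Again, to make this concrete, here's a rough sketch of how the two
dimensions and the tool status might be modelled; sqlite3 and all of the
table/column names are used purely for illustration:)

  # Sketch of the version/quality tracking described above.
  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
  CREATE TABLE tool_version (     -- dimension (B)
      id INTEGER PRIMARY KEY,
      tool_name TEXT,             -- e.g. 'cpychecker'
      version TEXT,
      status TEXT                 -- 'experimental', 'alpha', ...
  );
  CREATE TABLE build (            -- dimension (A): name-version-release.arch
      id INTEGER PRIMARY KEY,
      nvr TEXT,
      arch TEXT
  );
  CREATE TABLE report (
      id INTEGER PRIMARY KEY,
      tool_version_id INTEGER REFERENCES tool_version(id),
      build_id INTEGER REFERENCES build(id),
      message TEXT,
      verdict TEXT                -- NULL, 'true-bug', 'false-positive', ...
  );
  """)
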
> >
> > Functional Requirements:
> >   * a collection of "reports" (not bugs):
> >     * interprocedural control flow, potentially across multiple source
> >       files (potentially with annotations, such as value of variables,
> >       call stack?)
> >       * syntax highlighting
> >       * capturing of all relevant source (potentially with headers as
> >         well?)
> >       * visualization of control flow so that you can see the path
> >         through the code that leads to the error
> >     * support for my cpychecker analysis
> >     * support for an as-yet-unreleased interprocedural static analysis
> >       tool I've been working on
> >     * support for reports from the clang static analyzer
> >     * ability to mark a report as:
> >       * a true bug (and a way to act on it, e.g. escalate to bugzilla or
> >         to the relevant upstream tracker)
> >       * a false positive (and a way for the analysis maintainer to act
> >         on it)
> >       * other bug associations with a report? (e.g. if the wording from
> >         the tool's message could be improved)
> >       * ability to have a "conversation" about a report within the UI as
> >         a series of comments (similar to bugzilla).
> >     * automated report matching between successive runs, so that the
> >       markings can be inherited
> >     * scriptable triage, so that we can write scripts that mark all
> >       reports matching a certain pattern e.g. as being bogus, as being
> >       security sensitive, etc
> >     * potentially: debug data (from the analysis tool) associated with a
> >       report, so that the maintainers of the tool can analyze a false
> >       positive
> >     * ability to store crash results where some code broke a static
> >       analysis tool, so that the tool can be fixed
> >   * association between reports and builds
> >   * association between builds and source packages
> >   * association between packages and people, so that you can see what
> >     reports are associated with you (perhaps via the pkgdb?)
> >   * prioritization of reports to be generated by the tool
> >   * association between reports and tools (and tool versions)
> >   * "quality marking" of tool versions, so that we can ignore "alpha"
> >     versions of tools and handle phasing in of a new static analysis
> >     tool without spamming everyone
> >   * ability to view the signal:noise ratio of a version of a tool
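
(The "automated report matching" and "scriptable triage" items above are
the ones I most want to get right.  The kind of matching I have in mind
is a fingerprint that ignores line numbers, so that markings survive
unrelated edits; a back-of-the-envelope sketch, reusing the illustrative
report structure from earlier:)

  # Sketch of inheriting markings between successive runs: match on a
  # fingerprint that deliberately ignores line numbers, so an old
  # "false positive" verdict carries over to the same report in a new
  # build.
  def fingerprint(report):
      last = report["trace"][-1]
      return (report["tool"]["name"], last["file"], last["function"],
              report["message"])

  def inherit_markings(old_reports, new_reports):
      old_verdicts = {fingerprint(r): r.get("verdict") for r in old_reports}
      for r in new_reports:
          r.setdefault("verdict", old_verdicts.get(fingerprint(r)))
      return new_reports
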
> >
> > Nonfunctional requirements:
> >   * Free Software
> >   * sanely deployable within Fedora infrastructure
> >   * sane code, since we're likely to want to extend it (fwiw I'd be most
> >     comfortable with a Python implementation).
> >   * able to scale to running all of Fedora through multiple tools
> >     repeatedly
> >   * many simultaneous users
> >   * will want an authentication system so that we can associate comments
> >     with users.  Eventually we may want a way of embargoing
> >     security-sensitive bugs found by the tool so that they're only
> >     visible by a trusted cabal.
> >   * authentication system to support FAS, but not require it, in case
> >     other people want to deploy such a tool.  Maybe OpenID?
> >
> > Implementation ideas:
> >   * as well as a relational database for the usual things, perhaps a
> > lookaside of source files stored gzipped, with content-addressed storage
> > e.g. "0fcb0d45a6353e150e26f1fa54d11d7be86726b6" stored gzipped as:
> >     objects/0f/cb0d45a6353e150e26f1fa54d11d7be86726b6
> > (yes, this looks a lot like git)
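
(That lookaside is only a handful of lines; roughly the following, with
the path layout as above and everything else illustrative:)

  # Sketch of the content-addressed lookaside: each source file is stored
  # gzipped under objects/<first two hex digits of SHA-1>/<remaining digits>.
  import gzip
  import hashlib
  import os

  def store_source(data, topdir="objects"):
      sha = hashlib.sha1(data).hexdigest()
      dirname = os.path.join(topdir, sha[:2])
      if not os.path.isdir(dirname):
          os.makedirs(dirname)
      with gzip.open(os.path.join(dirname, sha[2:]), "wb") as f:
          f.write(data)
      return sha
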
> >
> > Thoughts?  Does such a thing already exist?
> 
>   I am sure anything that can help in detecting runtime failures is
> welcome.
> 
> > It might be fun to hack on this at the next FUDcon.
> 
> For anybody interested, the most relevant results after searching a bit :-)
> 
> http://samate.nist.gov/index.php/Source_Code_Security_Analyzers.html
> http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
> http://developers.slashdot.org/story/08/05/19/1510245/do-static-source-code-analysis-tools-really-work

Thanks for the lists.

Dave


