Dealing with static code analysis in Fedora

Paulo César Pereira de Andrade paulo.cesar.pereira.de.andrade at gmail.com
Wed Dec 12 03:00:36 UTC 2012


2012/12/11 David Malcolm <dmalcolm at redhat.com>:
> A while back I ran my static checker on all of the Python extension
> modules in Fedora 17:
>   http://fedoraproject.org/wiki/Features/StaticAnalysisOfPythonRefcounts
>
> I wrote various scripts to build the packages in a mock environment that
> injects my checker into gcc, then wrote various scripts to triage the
> results.  I then filed bugs by hand for the most important results,
> writing some more scripts along the way to make the process easier.
>
> This led to some valuable bug fixes, but the mechanism for running the
> analysis was very ad hoc and doesn't scale.

  I think it could be useful at least as a generic tool, where one
would just run something like:

make CC=gcc-with-python-plugin

the way one could once run

make CC=cgcc

to see what sparse reports. Or maybe think of it as a tool like
rpmlint.
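
  Such a wrapper could be tiny. A minimal sketch in Python, assuming
hypothetical plugin and script install paths (check gcc-python-plugin's
documentation for the real ones):

#!/usr/bin/env python
# gcc-with-python-plugin: hypothetical wrapper so that an unmodified
# "make CC=gcc-with-python-plugin" gets the checker injected, much as
# cgcc wraps gcc for sparse.
import os
import sys

PLUGIN_FLAGS = [
    '-fplugin=/usr/lib/gcc-plugins/python.so',           # assumed path
    '-fplugin-arg-python-script=/usr/share/checker.py',  # assumed path
]

# Replace this process with the real gcc, adding the plugin flags
# ahead of the original command line.
os.execvp('gcc', ['gcc'] + PLUGIN_FLAGS + sys.argv[1:])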

> In particular, we don't yet have an automated way of rerunning the
> tests, whilst using the old results as a baseline.  For example it would
> be most useful if only new problems could be reported, and if the system
> (whatever it is) remembered when a report has been marked as a true bug
> or as a false positive.  Similarly, there's no automated way of saying
> "this particular test is bogus; ignore it for now".

  Something like valgrind's .supp files?
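
  A rough sketch of how applying such suppressions could look, assuming
the reports carry a check name and a source location (the field names
and rule format here are my invention, not an existing format):

import fnmatch

def is_suppressed(report, suppressions):
    # A report is suppressed when any rule matches both the check
    # name and the file, valgrind-.supp style; glob patterns allow
    # rules like {"check": "cpychecker.*", "file": "*/generated/*"}.
    for rule in suppressions:
        if (fnmatch.fnmatch(report['check'], rule['check']) and
                fnmatch.fnmatch(report['file'], rule['file'])):
            return True
    return False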

> I'm wondering if there's a Free Software system for doing this kind of
> thing, and if not, I'm thinking of building it.
>
> What I have in mind is a web app backed by a database (perhaps
> "checker.fedoraproject.org" ?)

  Reminds me of http://upstream-tracker.org/

> We'd be able to run all of the code in Fedora through static analysis
> tools, and slurp the results into the database: primarily my
> "cpychecker" work, but we could also run the clang analyzer etc.  I've
> also been working on another as-yet-unreleased static analysis tool for
> which I'd want a db for the results.  What I have working is a way to
> inject an analysis payload into gcc within a mock build, which dumps
> JSON report files into the chroot without disturbing the "real" build.
> The idea is then to gather up the JSON files and insert the report data
> into the db, tagging it with version information.
>
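
  The gathering step sounds mechanical enough to sketch. Something like
the following, assuming one JSON file per report is dropped somewhere
under the chroot (the paths, file naming, schema and sqlite choice are
all guesses on my part):

import json
import os
import sqlite3

def harvest(chroot, nvra, tool, tool_version, dbpath):
    # Walk the build chroot, load every report the plugin dumped,
    # and store each one tagged with the package build (nvra) and
    # the version of the tool that produced it.
    conn = sqlite3.connect(dbpath)
    conn.execute('CREATE TABLE IF NOT EXISTS report '
                 '(nvra TEXT, tool TEXT, tool_version TEXT, data TEXT)')
    for dirpath, _, filenames in os.walk(chroot):
        for name in filenames:
            if name.endswith('.report.json'):
                with open(os.path.join(dirpath, name)) as f:
                    report = json.load(f)
                conn.execute('INSERT INTO report VALUES (?, ?, ?, ?)',
                             (nvra, tool, tool_version,
                              json.dumps(report)))
    conn.commit()
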
> There are two dimensions to the version information:
>  (A) the version of the software under analysis
>          (name-version-release.arch)
>  (B) the version of the tool doing the analysis
>
> We could use (B) within the system to handle the release cycle of a
> static analysis tool.  Initially, any such analysis tools would be
> regarded as "experimental", and package maintainers could happily ignore
> the results of such a tool.  The maintainer of an analysis tool could
> work on bug fixes and heuristics to get the signal:noise ratio of the
> tool up to an acceptable level, and then the status of the analysis tool
> could be upgraded to an "alpha" level or beyond.
>
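
  That phasing-in could be as simple as ranking the statuses and
filtering. A sketch (the status names beyond "experimental" and "alpha"
and the report fields are invented here):

STATUS_RANK = {'experimental': 0, 'alpha': 1, 'beta': 2, 'stable': 3}

def visible_reports(reports, tool_status, minimum='alpha'):
    # Hide reports from tool versions that have not yet graduated
    # past the requested maturity level, so a brand-new checker
    # cannot spam every package maintainer.
    return [r for r in reports
            if STATUS_RANK[tool_status[r['tool'], r['tool_version']]]
               >= STATUS_RANK[minimum]]
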
> Functional Requirements:
>   * a collection of "reports" (not bugs):
>     * interprocedural control flow, potentially across multiple source
>       files (potentially with annotations, such as value of variables,
>       call stack?)
>       * syntax highlighting
>       * capturing of all relevant source (potentially with headers as
>         well?)
>       * visualization of control flow so that you can see the path
>         through the code that leads to the error
>     * support for my cpychecker analysis
>     * support for an as-yet-unreleased interprocedural static analysis
>       tool I've been working on
>     * support for reports from the clang static analyzer
>     * ability to mark a report as:
>       * a true bug (and a way to act on it, e.g. escalate to bugzilla or
>         to the relevant upstream tracker)
>       * a false positive (and a way for the analysis maintainer to act
>         on it)
>       * other bug associations with a report? (e.g. if the wording from
>         the tool's message could be improved)
>       * ability to have a "conversation" about a report within the UI as
>         a series of comments (similar to bugzilla).
>     * automated report matching between successive runs, so that the
>       markings can be inherited (a rough sketch follows this list)
>     * scriptable triage, so that we can write scripts that mark all
>       reports matching a certain pattern e.g. as being bogus, as being
>       security sensitive, etc
>     * potentially: debug data (from the analysis tool) associated with a
>       report, so that the maintainers of the tool can analyze a false
>       positive
>     * ability to store crash results where some code broke a static
>       analysis tool, so that the tool can be fixed
>   * association between reports and builds
>   * association between builds and source packages
>   * association between packages and people, so that you can see what
>     reports are associated with you (perhaps via the pkgdb?)
>   * prioritization of reports to be generated by the tool
>   * association between reports and tools (and tool versions)
>   * "quality marking" of tool versions, so that we can ignore "alpha"
>     versions of tools and handle phasing in of a new static analysis
>     tool without spamming everyone
>   * ability to view the signal:noise ratio of a version of a tool
>
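
  For the report-matching item above: leaving the line number out of
the match key would keep markings alive across unrelated edits. A rough
sketch (the field names are again my invention):

def match_key(report):
    # Deliberately exclude the line number: it shifts on every
    # unrelated edit, while check/file/function/message rarely do.
    return (report['check'], report['file'],
            report['function'], report['message'])

def inherit_markings(old_reports, new_reports):
    markings = dict((match_key(r), r.get('marking'))
                    for r in old_reports)
    for r in new_reports:
        r['marking'] = markings.get(match_key(r))
    return new_reports

Scriptable triage then falls out almost for free: once the reports sit
in a database, a triage script is just a query plus a bulk update of
the marking field.
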
> Nonfunctional requirements:
>   * Free Software
>   * sanely deployable within Fedora infrastructure
>   * sane code, since we're likely to want to extend it (fwiw I'd be most
>     comfortable with a Python implementation).
>   * able to scale to running all of Fedora through multiple tools
>     repeatedly
>   * many simultaneous users
>   * will want an authentication system so that we can associate comments
>     with users.  Eventually we may want a way of embargoing
>     security-sensitive bugs found by the tool so that they're only
>     visible by a trusted cabal.
>   * authentication system to support FAS, but not require it, in case
>     other people want to deploy such a tool.  Maybe OpenID?
>
> Implementation ideas:
>   * as well as a relational database for the usual things, perhaps a
> lookaside of source files stored gzipped, with content-addressed storage
> e.g. "0fcb0d45a6353e150e26f1fa54d11d7be86726b6" stored gzipped as:
>     objects/0f/cb0d45a6353e150e26f1fa54d11d7be86726b6
> (yes, this looks a lot like git)
>
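
  The git-style lookaside is straightforward to do directly. A minimal
sketch of the store side, following the layout you describe:

import gzip
import hashlib
import os

def store(objdir, data):
    # Content-addressed storage: the SHA-1 of the uncompressed bytes
    # names the object, split as objects/0f/cb0d... like git does.
    sha = hashlib.sha1(data).hexdigest()
    path = os.path.join(objdir, sha[:2], sha[2:])
    if not os.path.exists(path):
        dirpath = os.path.dirname(path)
        if not os.path.isdir(dirpath):
            os.makedirs(dirpath)
        with gzip.open(path, 'wb') as f:
            f.write(data)
    return sha
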
> Thoughts?  Does such a thing already exist?

  I am sure anything that can help detect runtime failures is
welcome.

> It might be fun to hack on this at the next FUDcon.

For anybody interested, here are the most relevant results after a bit of searching :-)

http://samate.nist.gov/index.php/Source_Code_Security_Analyzers.html
http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
http://developers.slashdot.org/story/08/05/19/1510245/do-static-source-code-analysis-tools-really-work

> Dave

Paulo

