Dealing with static code analysis in Fedora
Paulo César Pereira de Andrade
paulo.cesar.pereira.de.andrade at gmail.com
Wed Dec 12 03:00:36 UTC 2012
2012/12/11 David Malcolm <dmalcolm at redhat.com>:
> A while back I ran my static checker on all of the Python extension
> modules in Fedora 17:
> http://fedoraproject.org/wiki/Features/StaticAnalysisOfPythonRefcounts
>
> I wrote various scripts to build the packages in a mock environment that
> injects my checker into gcc, then wrote various scripts to triage the
> results. I then filed bugs by hand for the most important results,
> writing some more scripts along the way to make the process easier.
>
> This led to some valuable bug fixes, but the mechanism for running the
> analysis was very ad hoc and doesn't scale.
I think it could be useful at least as a generic tool, where one would
just do something like:
  make CC=gcc-with-python-plugin
much as some time ago one could run
  make CC=cgcc
to see what sparse would report. Or maybe think of it as a tool like
rpmlint.
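To make that concrete, the wrapper could be little more than a few
lines; in this rough sketch the plugin name and the analysis script
path are only guesses on my part, not documented defaults:

    #!/usr/bin/env python
    # Hypothetical "gcc-with-python-plugin" wrapper; PLUGIN and SCRIPT are
    # assumed names, adjust to whatever the gcc python plugin installs.
    import os
    import sys

    PLUGIN = "python.so"                       # assumed plugin .so name
    SCRIPT = "/usr/share/cpychecker/check.py"  # assumed analysis script path

    args = ["gcc",
            "-fplugin=%s" % PLUGIN,
            "-fplugin-arg-python-script=%s" % SCRIPT] + sys.argv[1:]
    os.execvp("gcc", args)

Packages whose build honours $CC would then get the analysis for free.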
> In particular, we don't yet have an automated way of rerunning the
> tests, whilst using the old results as a baseline. For example it would
> be most useful if only new problems could be reported, and if the system
> (whatever it is) remembered when a report has been marked as a true bug
> or as a false positive. Similarly, there's no automated way of saying
> "this particular test is bogus; ignore it for now".
Something like valgrind's .supp files?
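Just to illustrate what I have in mind, the suppressions file could be
as simple as a list of patterns that reports are matched against before
being shown; the format and field names below are entirely made up:

    import json

    def load_suppressions(path):
        # each entry is e.g. {"tool": "cpychecker", "test": null, "file": "foo.c"};
        # a missing or null field acts as a wildcard
        with open(path) as f:
            return json.load(f)

    def is_suppressed(report, suppressions):
        # drop a report if any suppression entry matches all of its fields
        return any(all(s.get(k) in (None, report.get(k))
                       for k in ("tool", "test", "file"))
                   for s in suppressions)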
> I'm wondering if there's a Free Software system for doing this kind of
> thing, and if not, I'm thinking of building it.
>
> What I have in mind is a web app backed by a database (perhaps
> "checker.fedoraproject.org" ?)
Reminds me of http://upstream-tracker.org/
> We'd be able to run all of the code in Fedora through static analysis
> tools, and slurp the results into the database: primarily my
> "cpychecker" work, but we could also run the clang analyzer etc. I've
> also been working on another as-yet-unreleased static analysis tool for
> which I'd want a db for the results. What I have working is a way to
> inject an analysis payload into gcc within a mock build, which dumps
> JSON report files into the chroot without disturbing the "real" build.
> The idea is then to gather up the JSON files and insert the report data
> into the db, tagging it with version information.
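That gathering step sounds easy to script; roughly something like the
sketch below, where the report fields and the insert_report() callback
are placeholders rather than anything your tool actually defines:

    import json
    import os

    def gather_reports(chroot_dir, nvra, tool, tool_version, insert_report):
        # walk the chroot and pick up every JSON report the plugin dumped
        for dirpath, dirnames, filenames in os.walk(chroot_dir):
            for name in filenames:
                if not name.endswith(".json"):
                    continue
                with open(os.path.join(dirpath, name)) as f:
                    report = json.load(f)
                report["nvra"] = nvra                  # package under analysis
                report["tool"] = tool                  # which analyzer ran
                report["tool_version"] = tool_version  # version of the analyzer
                insert_report(report)                  # e.g. an INSERT into the db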
>
> There are two dimensions to the version information:
> (A) the version of the software under analysis
> (name-version-release.arch)
> (B) the version of the tool doing the analysis
>
> We could use (B) within the system to handle the release cycle of a
> static analysis tool. Initially, any such analysis tools would be
> regarded as "experimental", and package maintainers could happily ignore
> the results of such a tool. The maintainer of an analysis tool could
> work on bug fixes and heuristics to get the signal:noise ratio of the
> tool up to an acceptable level, and then the status of the analysis tool
> could be upgraded to an "alpha" level or beyond.
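That maturity level seems straightforward to model; a rough sketch of
the two version dimensions and the gating, with all names invented:

    from collections import namedtuple

    # (A) the software under analysis
    Build = namedtuple("Build", ["name", "version", "release", "arch"])

    # (B) the tool doing the analysis, plus a maturity level that decides
    # whether package maintainers see its reports by default
    ToolVersion = namedtuple("ToolVersion", ["tool", "version", "maturity"])

    MATURITY_LEVELS = ("experimental", "alpha", "beta", "stable")

    def visible_by_default(tool_version):
        # reports from "experimental" tools can be happily ignored
        return tool_version.maturity != "experimental"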
>
> Functional Requirements:
> * a collection of "reports" (not bugs):
>   * interprocedural control flow, potentially across multiple source
>     files (potentially with annotations, such as value of variables,
>     call stack?)
>   * syntax highlighting
>   * capturing of all relevant source (potentially with headers as
>     well?)
>   * visualization of control flow so that you can see the path
>     through the code that leads to the error
> * support for my cpychecker analysis
> * support for an as-yet-unreleased interprocedural static analysis
> tool I've been working on
> * support for reports from the clang static analyzer
> * ability to mark a report as:
>   * a true bug (and a way to act on it, e.g. escalate to bugzilla or
>     to the relevant upstream tracker)
>   * a false positive (and a way for the analysis maintainer to act
>     on it)
>   * other bug associations with a report? (e.g. if the wording from
>     the tool's message could be improved)
> * ability to have a "conversation" about a report within the UI as
> a series of comments (similar to bugzilla).
> * automated report matching between successive runs, so that the
> markings can be inherited
> * scriptable triage, so that we can write scripts that mark all
> reports matching a certain pattern e.g. as being bogus, as being
> security sensitive, etc
> * potentially: debug data (from the analysis tool) associated with a
> report, so that the maintainers of the tool can analyze a false
> positive
> * ability to store crash results where some code broke a static
> analysis tool, so that the tool can be fixed
> * association between reports and builds
> * association between builds and source packages
> * association between packages and people, so that you can see what
> reports are associated with you (perhaps via the pkgdb?)
> * prioritization of reports to be generated by the tool
> * association between reports and tools (and tool versions)
> * "quality marking" of tool versions, so that we can ignore "alpha"
> versions of tools and handle phasing in of a new static analysis
> tool without spamming everyone
> * ability to view the signal:noise ratio of a version of a tool
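The "scriptable triage" item above is the one I would probably use the
most; from a user's point of view I would hope for something as simple
as the sketch below (report fields and marking values are invented for
illustration):

    import re

    def triage(reports, pattern, marking):
        # mark every report whose message matches the given pattern
        rx = re.compile(pattern)
        for report in reports:
            if rx.search(report["message"]):
                report["marking"] = marking  # e.g. "bogus", "security-sensitive"

    # e.g. silence a known-noisy diagnostic for now:
    # triage(all_reports, r"some noisy diagnostic text", "bogus")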
>
> Nonfunctional requirements:
> * Free Software
> * sanely deployable within Fedora infrastructure
> * sane code, since we're likely to want to extend it (fwiw I'd be most
> comfortable with a Python implementation).
> * able to scale to running all of Fedora through multiple tools
> repeatedly
> * many simultaneous users
> * will want an authentication system so that we can associate comments
> with users. Eventually we may want a way of embargoing
> security-sensitive bugs found by the tool so that they're only
> visible to a trusted cabal.
> * authentication system to support FAS, but not require it, in case
> other people want to deploy such a tool. Maybe OpenID?
>
> Implementation ideas:
> * as well as a relational database for the usual things, perhaps a
> lookaside of source files stored gzipped, with content-addressed storage
> e.g. "0fcb0d45a6353e150e26f1fa54d11d7be86726b6" stored gzipped as:
> objects/0f/cb0d45a6353e150e26f1fa54d11d7be86726b6
> (yes, this looks a lot like git)
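The git-like lookaside is simple enough to sketch; assuming SHA-1 of
the uncompressed file content as the address:

    import gzip
    import hashlib
    import os

    def store_source(objects_dir, data):
        # store the file gzipped under objects/xx/rest-of-hash, like git
        sha = hashlib.sha1(data).hexdigest()
        path = os.path.join(objects_dir, sha[:2], sha[2:])
        if not os.path.exists(path):
            subdir = os.path.dirname(path)
            if not os.path.isdir(subdir):
                os.makedirs(subdir)
            with gzip.open(path, "wb") as f:
                f.write(data)
        return sha  # the content address, e.g. "0fcb0d45a635..."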
>
> Thoughts? Does such a thing already exist?
I am sure anything that can help catch potential runtime failures is
welcome.
> It might be fun to hack on this at the next FUDcon.
For anybody interested, here are the most relevant results after searching a bit :-)
http://samate.nist.gov/index.php/Source_Code_Security_Analyzers.html
http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
http://developers.slashdot.org/story/08/05/19/1510245/do-static-source-code-analysis-tools-really-work
> Dave
Paulo