Static Analysis: some UI ideas

Mon Feb 4 21:37:45 UTC 2013

On Mon, 2013-02-04 at 22:13 +0100, Kamil Dudka wrote:
> On Monday, February 04, 2013 15:04:36 David Malcolm wrote:
> > I've been experimenting with some UI ideas for reporting static analysis
> > results: I've linked to two different UI reports below.
> > 
> > My hope is that we'll have a server in the Fedora infrastructure for
> > browsing results, marking things as false positives etc.
> > 
> > However, for the purposes of simplicity during experimentation I'm
> > simply building static HTML reports.
> > 
> > My #1 requirement when I'm viewing static analysis results is that I
> > want to *see the code* with the report, so I've attempted to simply show
> > the code with warnings shown inline.
> 
> Does it mean you need to keep the unpacked source files for all scanned 
> packages?  Then you will easily run out of disk space after scanning a few 
> versions of libreoffice.

I'm using content-addressed storage: the source files are named by the
SHA-1 sum of their contents, similar to how git does it, so if the bulk
of the files don't change between versions, they have the same SHA-1 sum
and are only stored once.  See e.g.:
http://fedorapeople.org/~dmalcolm/static-analysis/2013-01-30/python-ethtool-0.7-4.fc19.src.rpm/static-analysis/sources/
I probably should gzip them as well.
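
As a rough illustration of the storage scheme described above, here is a
minimal sketch.  The function name store_source and the flat directory
layout are assumptions made for this example, not the code that produced
the directory linked above.

```python
import hashlib
from pathlib import Path


def store_source(src_path, store_dir):
    """Store a file under the SHA-1 hex digest of its contents.

    Returns the digest.  If a file with identical contents has already
    been stored, nothing is written, so unchanged files cost no extra
    disk space across package versions.
    (Hypothetical helper: name and layout are illustrative only.)
    """
    data = Path(src_path).read_bytes()
    digest = hashlib.sha1(data).hexdigest()
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    dest = store / digest
    if not dest.exists():
        dest.write_bytes(data)
    return digest
```

A report generator would then refer to each source file by the returned
digest rather than by its path in the build tree.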

Currently it's capturing all C files that have GCC invoked on them, or
are mentioned in a warning (e.g. a .h file with an inline function with
a bug).  I could tweak things so it only captures files that are
mentioned in a warning.

> > Note also that when we have a server we can do all kinds of
> > auto-filtering behaviors so that e.g. package maintainers only see
> > warnings from tests that have decent signal:noise ratio (perhaps with
> > other warnings greyed out, or similar).
> 
> It would be cool if the auto-filtering techniques were implemented in 
> standalone utilities operating on text files so that we have separated 
> algorithms from presentation of the results.  It is easy to use a filter-like 
> utility on a server, but painful to use a server for processing local text 
> files.

I guess the issue is: where do you store the knowledge about good vs bad
warnings?  My plan was to store it server-side.  But we could generate
summaries and have them available client-side.  For example, say
cppcheck's "useClosedFile" test has generated 100 issues, of which 5 have
received human attention: 1 has been marked as a true positive, and 4
have been marked as false positives.  We could then say ("cppcheck",
"useClosedFile") has a signal:noise ratio of 1:4.  We could then
generate a summary of these (tool, testID) ratios for use by clients,
which could then apply a user-configurable signal:noise threshold, so
you can say: "only show me results from tests that achieve 1:2 or better".
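
To make the arithmetic concrete, here is a sketch of how such a summary
and threshold check might look.  The function names and the tuple-based
input format are assumptions for illustration; nothing like this exists
on the server yet.

```python
def summarize(triaged):
    """Count (true positives, false positives) per (tool, test_id).

    `triaged` is an iterable of (tool, test_id, is_true_positive) tuples,
    one per issue that a human has reviewed.  Issues nobody has looked at
    are simply not included.  (Hypothetical input format.)
    """
    counts = {}
    for tool, test_id, is_true in triaged:
        tp, fp = counts.get((tool, test_id), (0, 0))
        if is_true:
            counts[(tool, test_id)] = (tp + 1, fp)
        else:
            counts[(tool, test_id)] = (tp, fp + 1)
    return counts


def meets_threshold(tp, fp, min_signal, min_noise):
    """True if tp:fp is at least as good as min_signal:min_noise.

    Cross-multiplying avoids dividing by zero when fp == 0.
    """
    return tp * min_noise >= fp * min_signal


# The example from the text: 1 true positive, 4 false positives.
summary = summarize(
    [("cppcheck", "useClosedFile", True)]
    + [("cppcheck", "useClosedFile", False)] * 4
)
tp, fp = summary[("cppcheck", "useClosedFile")]
show = meets_threshold(tp, fp, 1, 2)  # 1:4 does not achieve 1:2
```

How to treat tests with no triaged issues at all is a policy question
that this sketch leaves to the caller.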

> > Results of an srpm build
> > ========================
> > The first experimental report can be seen here:
> > http://fedorapeople.org/~dmalcolm/static-analysis/2013-02-01/policycoreutils
> > -2.1.13-27.2.fc17.src.rpm-001.html
> > 
> > It shows warnings from 4 different static analyzers when rebuilding a
> > particular srpm (policycoreutils-2.1.13-27.2.fc17).  There's a summary
> > table at the top of the report showing for each source file in the
> > build which analyzers found reports (those that found any are
> > highlighted in red).  Each row has a <a> linking you to a report on each
> > source file.  Those source files that have issues have a table showing
> > the issues, with links to them.  The issues are shown inline within the
> > syntax-colored source files.
> > 
> > Ideally there'd be support for using "n" and "p" to move to
> > next/previous error (with a little javascript), but for now I've been
> > using "back" in the browser to navigate through the tables.
> > 
> > An example of an error shown inline:
> > http://fedorapeople.org/~dmalcolm/static-analysis/2013-02-01/policycoreutils
> > -2.1.13-27.2.fc17.src.rpm-001.html#file-868b5c03918269eaabebfedc41eaf32e3903
> > 57be-line-791 shows a true error in seunshare.c found by cppcheck ("foo =
> > realloc(foo, , )"  is always a mistake, since if realloc fails you get
> > NULL back, but still have responsibility for freeing the old foo).
> 
> The limitation of javascript-based UIs is that they are read-only.  Some 
> developers prefer to go through the defects using their own environment 
> (eclipse, vim, emacs, ...) rather than a web browser so that they can fix
> them immediately.  We should support both approaches I guess.

We should support both approaches.  What we could do is provide a tool
("fedpkg get-errors" ?) that captures the errors in the same output format
as gcc.  That way if you run it from, say, emacs, the *compilation* buffer
has everything in the right format, and emacs' goto-next-error stuff works.
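
For illustration, emitting gcc-style lines could be as simple as the
sketch below.  The function name and the sample message text are made up
for this example; only the "file:line:column: kind: message" layout is
taken from gcc's usual diagnostic output.

```python
def format_gcc_style(filename, line, message, column=None, kind="warning"):
    """Format one issue the way gcc prints diagnostics.

    Produces "file:line: kind: message", or
    "file:line:column: kind: message" when a column is known.
    (Hypothetical helper, not part of any existing tool.)
    """
    location = f"{filename}:{line}"
    if column is not None:
        location += f":{column}"
    return f"{location}: {kind}: {message}"


# Illustrative message text; the real cppcheck wording may differ.
print(format_gcc_style("seunshare.c", 791, "possible leak on realloc failure"))
```

Editors that already parse compiler output should then be able to jump
to each location without any tool-specific support.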

> 
> > Comparison report
> > =================
> > The second experimental report can be seen here:
> > http://fedorapeople.org/~dmalcolm/static-analysis/2013-02-04/comparison-of-p
> > ython-ethtool-builds.html
> > 
> > It shows a comparison of the results of two different builds of a
> > package (python-ethtool), again running multiple analyzers.
> > (specifically, a comparison between 0.7 and a snapshot of upstream
> > git).
> > 
> > It's similar to the first report, but instead of showing one file at a
> > time, it shows a side-by-side diff of old vs new file.
> 
> Does it assume that you have 1:1 file mapping between old and new versions of 
> the package?  What will happen if the source files are renamed, moved, merged, 
> split, etc.?

Currently it's matching on 4 things:
* by name of test tool (e.g. "clang-analyzer")
* by filename of C file within the tarball (so e.g.
'/builddir/build/BUILD/python-ethtool-0.7/python-ethtool/etherinfo.c'
becomes 'python-ethtool/etherinfo.c', allowing different versions to be
compared)
* function name (or None)
* text of message

See "make-comparative-report.py:ComparativeIssues" in 
https://github.com/fedora-static-analysis/mock-with-analysis/blob/master/reports/make-comparative-report.py
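
The matching described above can be sketched roughly as follows.  This is
a simplified illustration under my own naming, not the code in
make-comparative-report.py; in particular the default build root and the
rule of dropping the first path component after it are assumptions based
on the example path given above.

```python
def relative_source_path(path, build_root="/builddir/build/BUILD"):
    """Strip the build root and the top-level versioned directory.

    '/builddir/build/BUILD/python-ethtool-0.7/python-ethtool/etherinfo.c'
    becomes 'python-ethtool/etherinfo.c'.
    """
    prefix = build_root.rstrip("/") + "/"
    rel = path[len(prefix):] if path.startswith(prefix) else path.lstrip("/")
    parts = rel.split("/", 1)
    return parts[1] if len(parts) == 2 else rel


def issue_key(tool, path, function, message):
    """Build the 4-part key: tool, relative filename, function, message."""
    return (tool, relative_source_path(path), function, message)


def compare(old_issues, new_issues):
    """Classify issues from two builds by comparing their keys.

    Each issue is a (tool, path, function, message) tuple; function may
    be None.
    """
    old = {issue_key(*issue) for issue in old_issues}
    new = {issue_key(*issue) for issue in new_issues}
    return {
        "fixed": old - new,
        "introduced": new - old,
        "unchanged": old & new,
    }
```

With keys like these, a file that is renamed or moved would show up as
one set of fixed issues plus one set of newly introduced issues, since
the filename is part of the key.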

Thanks for the feedback
Dave