Static Analysis: some UI ideas

David Malcolm dmalcolm at redhat.com
Tue Feb 5 16:57:51 UTC 2013


On Tue, 2013-02-05 at 13:02 +0100, Kamil Dudka wrote:
> On Monday 04 February 2013 22:37:45 David Malcolm wrote:
> > Content-addressed storage: they're named by SHA-1 sum of their contents,
> > similar to how git does it, so if the bulk of the files don't change,
> > they have the same SHA-1 sum and are only stored once.  See e.g.:
> > http://fedorapeople.org/~dmalcolm/static-analysis/2013-01-30/python-ethtool-0.7-4.fc19.src.rpm/static-analysis/sources/
> > I probably should gzip them as well.
> 
> This can indeed save some space.  I really like the idea.  Maybe using a true
> git store would give you a further reduction in space requirements, thanks to
> its delta compression.

Interesting.  I had thought that git purely stored compressed files and
eschewed the idea of deltas, but it does indeed store deltas sometimes:
http://git-scm.com/book/ch9-4.html

My plan had been to store the files either in a big lookaside directory
structure, or directly as (compressed) blobs in a database,
named/indexed by SHA-1 sum.  But the delta approach might yield
significant additional savings, provided it can match up corresponding
files by name.

Then again, premature optimization is the root of all bugs :)
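
To make the lookaside scheme concrete, here's a minimal sketch of the
content-addressed store (the directory layout and function name are my
illustration, not the actual mock-with-analysis code):

import hashlib
import os
import shutil

def store_source(path, lookaside_dir):
    """Store PATH in a content-addressed lookaside directory.

    Files are named by the SHA-1 of their contents (as git does), so
    identical files shared between builds are stored only once."""
    with open(path, 'rb') as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    # Shard into subdirectories by the first two hex digits, git-style:
    dstdir = os.path.join(lookaside_dir, digest[:2])
    if not os.path.isdir(dstdir):
        os.makedirs(dstdir)
    dst = os.path.join(dstdir, digest[2:])
    if not os.path.exists(dst):
        shutil.copy(path, dst)
    return digest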


> > Currently it's capturing all C files that have GCC invoked on them, or
> > are mentioned in a warning (e.g. a .h file with an inline function with
> > a bug).  I could tweak things so it only captures files that are
> > mentioned in a warning.
> 
> But then you would no longer be able to provide the context.  If the error
> trace goes through a function foo() defined in another module of the same
> package, the user needs to look at its definition to confirm/waive the defect.

I was a little sloppy in my wording when I said "mentioned in a
warning", sorry.  This *does* include the traces, so for the case you
describe it *would* capture the source file containing the definition of
foo(), so that we would be able to render that information in a
presentation of the data.  (Though all of my current trace
visualizations only handle one function at a time; see e.g. this
screenshot of the HTML output:
http://gcc-python-plugin.readthedocs.org/en/latest/_images/sample-html-error-report.png
and these examples:
http://fedorapeople.org/~dmalcolm/gcc-python-plugin/2012-03-19/example/example.html
http://fedorapeople.org/~dmalcolm/gcc-python-plugin/2012-03-26/pylibmc/example.html
so we'd need some kind of new visualization for interprocedural paths.)

[specifically, the source extraction is implemented by
https://github.com/fedora-static-analysis/mock-with-analysis/blob/master/mock-with-analysis#L172
in extract_results_from_chroot() where I pass a SourceExtractor visitor
over the analysis file; it visits all files mentioned in the analysis
file]
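
As a schematic sketch of that visitor idea (the class and method names
here are my guesses at the shape of the code, not its actual API):

import os

class SourceExtractorSketch:
    """Collect every path mentioned anywhere in an analysis result."""
    def __init__(self, chroot_dir):
        self.chroot_dir = chroot_dir
        self.paths = set()

    def visit_file(self, givenpath):
        # Called once per file referenced by the analysis, including
        # files that only appear inside trace events, so anything on an
        # interprocedural path gets captured too.
        self.paths.add(os.path.join(self.chroot_dir,
                                    givenpath.lstrip('/')))

Each collected path would then be copied out of the chroot into the
content-addressed store sketched earlier.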

> > I guess the issue is: where do you store the knowledge about good vs. bad
> > warnings?  My plan was to store it server-side, but we could generate
> > summaries and have them available client-side.  For example, suppose
> > cppcheck's "useClosedFile" test has generated 100 issues, of which 5 have
> > received human attention: 1 has been marked as a true positive, and 4
> > have been marked as false positives.  We could then say ("cppcheck",
> > "useClosedFile") has a signal:noise ratio of 1:4.  We could then
> > generate a summary of these (tool, testID) ratios for use by clients,
> > which could then apply a user-configurable signal:noise threshold, so
> > you can say: "only show me results from tests that achieve 1:2 or better".
> 
> I did not realize you meant auto-filtering based on statistics from users'
> input.  In that case, maintaining the statistics at the server sounds like a
> good idea.  Being able to export a text file with scores per checker should
> be just fine for the command-line tools.  We will see whether the statistics
> from users' input can be used as a reliable criterion.  The problem is that
> some defects tend to be classified incorrectly without a deeper analysis of
> the report (and the code).

(nods)

For some depressing but useful thoughts on this, see:
  http://www.dwheeler.com/flawfinder/#fool-with-tool
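
Here's a minimal sketch of the per-(tool, testid) signal:noise summary
described above; the triage-record format is an assumption (in practice
it would come from the server-side database of human reviews):

from collections import defaultdict

def summarize(triaged):
    """triaged: iterable of (tool, testid, verdict), where verdict is
    'true-positive' or 'false-positive' (unreviewed issues omitted)."""
    counts = defaultdict(lambda: [0, 0])  # (tool, testid) -> [TP, FP]
    for tool, testid, verdict in triaged:
        counts[(tool, testid)][0 if verdict == 'true-positive' else 1] += 1
    return counts

def passes_threshold(counts, key, min_ratio):
    """min_ratio=0.5 corresponds to "1:2 or better" signal:noise."""
    tp, fp = counts[key]
    return fp == 0 or float(tp) / fp >= min_ratio

With the cppcheck "useClosedFile" numbers above (1 TP, 4 FP), the ratio
is 0.25, so a "1:2 or better" threshold would filter those results out.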


> > > The limitation of JavaScript-based UIs is that they are read-only.  Some
> > > developers prefer to go through the defects using their own environment
> > > (eclipse, vim, emacs, ...) rather than a web browser, so that they can fix
> > > them immediately.  We should support both approaches, I guess.
> > 
> > Both approaches.  What we could do is provide a tool ("fedpkg
> > get-errors"?) that emits the errors in the same output format as gcc.
> > That way, if you run it from within, say, emacs, the *compilation*
> > buffer has everything in the right format, and emacs's goto-next-error
> > stuff works.
> 
> 'fedpkg foo' is probably overkill at this point.  My concern was rather that
> we should not rely so much on the web server/browser approach in the first
> place.  I would like to have most of the tooling working from a plain
> terminal, without any server or browser.  Any server solution can then be
> easily built on top of it.
Right, and that's what I've been doing.
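
As a sketch of what such a "fedpkg get-errors"-style emitter might
produce (the issue fields are assumptions about the stored data, not
the actual schema):

import sys

def emit_gcc_style(issues, out=sys.stdout):
    # gcc's diagnostic format, which emacs's compilation-mode (and most
    # other editors) already know how to parse:
    #   FILENAME:LINE:COLUMN: warning: MESSAGE
    for issue in issues:
        out.write("%s:%d:%d: warning: %s [%s:%s]\n"
                  % (issue.path, issue.line, issue.column,
                     issue.message, issue.tool, issue.testid))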

> > Currently it's matching on 4 things:
> > * by name of test tool (e.g. "clang-analyzer")
> 
> + the class of defect?  e.g. "useClosedFile" in your example above...
Good point (the "testid" in firehose terminology).  Added in:
https://github.com/fedora-static-analysis/mock-with-analysis/commit/25b12996cbc66477241c94d6f6a7123007743f6c

though this doesn't appear to affect the comparisons with the
python-ethtool test data I'm currently working with.


> > * by filename of C file within the tarball (so e.g.
> > '/builddir/build/BUILD/python-ethtool-0.7/python-ethtool/etherinfo.c'
> > becomes 'python-ethtool/etherinfo.c', allowing different versions to be
> > compared)
> 
> With some part of the path or just base name?

the "full" path within the tarball, so that, say:
'/builddir/build/BUILD/python-ethtool-0.6/python-ethtool/etherinfo.c'
'/builddir/build/BUILD/python-ethtool-0.7/python-ethtool/etherinfo.c'

both are treated as 'python-ethtool/etherinfo.c'

(see get_internal_filename() within:
https://github.com/fedora-static-analysis/mock-with-analysis/blob/master/reports/reports.py#L10 )
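
The normalization amounts to stripping the mock/rpmbuild build prefix;
a minimal re-implementation (the real one is get_internal_filename(),
linked above) might look like:

import re

# Assumes the standard layout /builddir/build/BUILD/<name>-<version>/...
_BUILD_PREFIX = re.compile(r'^/builddir/build/BUILD/[^/]+/')

def get_internal_path(path):
    return _BUILD_PREFIX.sub('', path)

# Both of the example paths above become 'python-ethtool/etherinfo.c'.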


> > * function name (or None)
> 
> You want to work with full signatures if you are going to support overloaded 
> functions/methods in C++.

Agreed.  
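
Putting the matching attributes together, the deduplication key is
conceptually a tuple like this (a hypothetical sketch; the real logic
lives in make-comparative-report.py:ComparativeIssues):

from collections import namedtuple

IssueKey = namedtuple('IssueKey',
                      ['tool',      # e.g. "clang-analyzer"
                       'testid',    # e.g. "useClosedFile"
                       'path',      # e.g. "python-ethtool/etherinfo.c"
                       'function',  # name for now; a full signature
                                    # once C++ is supported
                       'message'])  # message text (on cleaning, see below)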

Currently mock-with-analysis only runs analyzers on C code, for
simplicity; my immediate deadline/goal is that I'm giving a talk at
PyCon next month on static analysis of Python extensions:
  https://us.pycon.org/2013/schedule/presentation/95/
and I ideally want to have my cpychecker data in some kind of database I
can sanely query.  (This is what's motivating my work on this right
now.)

So I'm currently largely ignoring C++, but I believe that the approach
I'm using can be extended to support it without excess pain.

> > * text of message
> 
> The messages cannot be checked for exact match in certain cases.  Have a look 
> at the rules we use in csdiff for the text messages:
> 
> http://git.fedorahosted.org/cgit/codescan-diff.git/plain/csfilter.cc

Thanks for the link.  My first thought looking at that code was
"Whoah! You appear to be doing some fairly complicated text parsing in
C++."  Have you considered using Python for that?  (In my experience,
the time taken to run the analysis tool dwarfs the time taken to parse
its results, so using an interpreted language shouldn't impose a
noticeable speed penalty.)

Those rules appear to be very much tool-specific, tied to the
particular analyzer whose results you're parsing.

It would seem to make sense to have two concepts of message for a
warning: the raw message (as emitted by the tool), and a cleaned version
of the message (for use in comparing for duplicates).  I wonder where
the cleaning should happen: on parsing, or at a later time.  The latter
seems better: the parser reads the raw message, which is treated as
canonical, and we can have a set of rules for cleaning the message
(which will need to grow and be tweaked as we run into difficult
messages).

I did some message cleaning of my own in:
https://github.com/fedora-static-analysis/mock-with-analysis/blob/master/reports/make-comparative-report.py#L28
since some of cpychecker's messages embed too much context.  (My bad,
as the author of cpychecker.)
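
A hypothetical sketch of that raw-vs-cleaned split: the parser stores
the raw message verbatim, and a growing table of rules produces the
cleaned form used only for duplicate matching.  (These two rules are
just illustrations, not csdiff's actual rules.)

import re

CLEANING_RULES = [
    (re.compile(r'0x[0-9a-f]+'), '0xADDR'),    # mask embedded addresses
    (re.compile(r'\b\d+ bytes\b'), 'N bytes'), # mask allocation sizes
]

def clean_message(raw):
    cleaned = raw
    for pattern, replacement in CLEANING_RULES:
        cleaned = pattern.sub(replacement, cleaned)
    return cleaned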


> > See "make-comparative-report.py:ComparativeIssues" in
> > https://github.com/fedora-static-analysis/mock-with-analysis/blob/master/re
> > ports/make-comparative-report.py
> 
> Actually my comment was not about the matching algorithm, but about the way 
> you present the comparative results.  The UI is based on comparing a pair of 
> source files.  In many cases you will fail to find a proper pairing of source 
> files between two versions of a package.
Yes, you're right.  I need to be careful to ensure that issues aren't
dropped when this happens.  I haven't yet checked to see what my
experiment does in that case.

Thanks for the valuable suggestions.
Dave
