Static Analysis: some UI ideas

Kamil Dudka kdudka at redhat.com
Tue Feb 5 12:02:36 UTC 2013


On Monday 04 February 2013 22:37:45 David Malcolm wrote:
> Content-addressed storage: they're named by SHA-1 sum of their contents,
> similar to how git does it, so if the bulk of the files don't change,
> they have the same SHA-1 sum and are only stored once.  See e.g.:
> http://fedorapeople.org/~dmalcolm/static-analysis/2013-01-30/python-ethtool-0.7-4.fc19.src.rpm/static-analysis/sources/
> I probably should gzip them as well.

This can indeed save some space.  I really like the idea.  Using an actual
git object store might reduce the space requirements even further, thanks
to delta compression.
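
For illustration, a minimal sketch of such a content-addressed store in
Python (the function name and layout are made up, not taken from the actual
tooling):

    import hashlib
    import os
    import shutil

    def store_source_file(path, store_dir):
        """Copy 'path' into 'store_dir', named by the SHA-1 of its
        contents.  Identical files across builds hash to the same name,
        so each distinct file is stored only once."""
        with open(path, 'rb') as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        dest = os.path.join(store_dir, digest)
        if not os.path.exists(dest):
            shutil.copyfile(path, dest)
        return digest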

> Currently it's capturing all C files that have GCC invoked on them, or
> are mentioned in a warning (e.g. a .h file with an inline function with
> a bug).  I could tweak things so it only captures files that are
> mentioned in a warning.

But then you would no longer be able to provide the context.  If the error
trace goes through a function foo() defined in another module of the same
package, the user needs to look at its definition to confirm or waive the
defect.

> I guess the issue is: where do you store the knowledge about good vs bad
> warnings?   My plan was to store it server-side.  But we could generate
> summaries and have them available client-side.  For example, if, say,
> cppcheck's "useClosedFile" test has generated 100 issues of which 5 have
> received human attention: 1 has been marked as a true positive, and 4
> have been marked as false positives.  We could then say ("cppcheck",
> "useClosedFile") has a signal:noise ratio of 1:4.  We could then
> generate a summary of these (tool, testID) ratios for use by clients,
> which could then apply a user-configurable signal:noise threshold, so you
> can say: "only show me results from tests that achieve 1:2 or better".

I did not realize you meant auto-filtering based on statistics from users'
input.  Then maintaining the statistics at the server sounds like a good
idea.  Being able to export a text file with scores per checker should be
just fine for the command-line tools.  We will see whether the statistics
from users' input can serve as a reliable criterion.  The problem is that
some defects tend to be classified incorrectly without a deeper analysis of
the report (and the code).
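
To make the filtering concrete, a sketch of the client-side scoring (all
names hypothetical; your "1:2 or better" example maps to a threshold of
0.5):

    def signal_to_noise(true_positives, false_positives):
        """Return the signal:noise ratio for one (tool, testID) pair,
        or None if no findings have been reviewed yet."""
        if not true_positives and not false_positives:
            return None
        # Cap zero noise at 1 to avoid division by zero.
        return float(true_positives) / max(false_positives, 1)

    def keep_issue(issue, scores, threshold=0.5):
        """Drop issues whose checker scores below the threshold; keep
        unreviewed checkers, since there is no evidence against them."""
        ratio = scores.get((issue.tool, issue.test_id))
        return ratio is None or ratio >= threshold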

> > The limitation of javascript-based UIs is that they are read-only.  Some
> > developers prefer to go through the defects using their own environment
> > (eclipse, vim, emacs, ...) rather than a web browser so that they can fix
> > them immediately.  We should support both approaches I guess.
> 
> Both approaches.  What we could do is provide a tool ("fedpkg
> get-errors" ?) that captures the errors in the same output format as
> gcc.  That way if you run it from, say, emacs, the *compilation* buffer
> has everything in the right format, and emacs' goto-next-error stuff
> works.

'fedpkg foo' is probably overkill at this point.  My concern was rather
that we should not rely so much on the web server/browser approach in the
first place.  I would like to have most of the tooling usable from a plain
terminal, without any server or browser.  Any server-based solution can
then easily be built on top of that.
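
Emitting the findings in the 'file:line:column:' shape that gcc uses should
be all that emacs/vim need; a sketch (field names hypothetical):

    def format_as_gcc(issue):
        """Render one issue the way gcc prints diagnostics, so that
        emacs' compilation mode and vim's quickfix can jump to it."""
        return '%s:%d:%d: warning: %s [%s]' % (
            issue.path, issue.line, issue.column,
            issue.message, issue.test_id)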

> Currently it's matching on 4 things:
> * by name of test tool (e.g. "clang-analyzer")

+ the class of defect?  e.g. "useClosedFile" in your example above...

> * by filename of C file within the tarball (so e.g.
> '/builddir/build/BUILD/python-ethtool-0.7/python-ethtool/etherinfo.c'
> becomes 'python-ethtool/etherinfo.c', allowing different versions to be
> compared)

With some part of the path kept, or just the base name?
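
For comparing across versions I would expect something like stripping the
build root together with the versioned top-level directory (just a guess at
the normalization, not your actual code):

    import re

    def normalize_path(path):
        """'/builddir/build/BUILD/python-ethtool-0.7/python-ethtool/etherinfo.c'
        -> 'python-ethtool/etherinfo.c'"""
        return re.sub(r'^/builddir/build/BUILD/[^/]+/', '', path)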

> * function name (or None)

You want to work with full signatures if you are going to support
overloaded functions/methods in C++; a bare name would conflate foo(int)
with foo(const char *).

> * text of message

In certain cases, the messages cannot be compared for an exact match.  Have
a look at the rules we use in csdiff for the text of the messages:

http://git.fedorahosted.org/cgit/codescan-diff.git/plain/csfilter.cc
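
The general idea is to mask the volatile parts of a message before
comparing.  A simplified illustration of that kind of rule (not the actual
rules from csfilter.cc):

    import re

    def normalize_msg(msg):
        """Mask message fragments that commonly change between runs or
        versions, so otherwise identical findings still match."""
        msg = re.sub(r'0x[0-9a-fA-F]+', 'ADDR', msg)  # pointer values
        msg = re.sub(r'\b\d+\b', 'NUM', msg)          # counts, offsets
        return msg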

> See "make-comparative-report.py:ComparativeIssues" in
> https://github.com/fedora-static-analysis/mock-with-analysis/blob/master/reports/make-comparative-report.py

Actually, my comment was not about the matching algorithm but about the way
you present the comparative results.  The UI is based on comparing pairs of
source files, and in many cases you will fail to find a proper pairing of
source files between two versions of a package.
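
When the paths do not match exactly (files renamed or moved between
versions), some fuzzy pairing step would be needed; e.g. (a hypothetical
sketch, not what your script does):

    import difflib

    def pair_files(old_paths, new_paths, cutoff=0.6):
        """Greedily pair each old path with its most similar new path;
        paths without a sufficiently close match stay unpaired."""
        pairs = {}
        remaining = list(new_paths)
        for old in old_paths:
            match = difflib.get_close_matches(old, remaining, n=1,
                                              cutoff=cutoff)
            if match:
                pairs[old] = match[0]
                remaining.remove(match[0])
        return pairs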

Kamil

