Static Analysis: some UI ideas
Kamil Dudka
kdudka at redhat.com
Tue Feb 5 12:02:36 UTC 2013
On Monday 04 February 2013 22:37:45 David Malcolm wrote:
> Content-addressed storage: they're named by SHA-1 sum of their contents,
> similar to how git does it, so if the bulk of the files don't change,
> they have the same SHA-1 sum and are only stored once. See e.g.:
> http://fedorapeople.org/~dmalcolm/static-analysis/2013-01-30/python-ethtool-0.7-4.fc19.src.rpm/static-analysis/sources/
> I probably should gzip them as well.
This can indeed save some space. I really like the idea. Maybe using a true
git store would give you an additional reduction in space requirements
thanks to delta compression.
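Just to sketch the idea in code (untested; the store path and the gzip
handling are my assumptions, not necessarily what your script does):

    import gzip
    import hashlib
    import os

    STORE = 'static-analysis/sources'   # hypothetical store root

    def store_file(path):
        # name the object by the SHA-1 of its contents, so identical
        # files across builds collapse into a single stored copy
        with open(path, 'rb') as f:
            data = f.read()
        digest = hashlib.sha1(data).hexdigest()
        os.makedirs(STORE, exist_ok=True)
        dest = os.path.join(STORE, digest + '.gz')
        if not os.path.exists(dest):    # skip the write if already stored
            with gzip.open(dest, 'wb') as f:
                f.write(data)
        return digest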
> Currently it's capturing all C files that have GCC invoked on them, or
> are mentioned in a warning (e.g. a .h file with an inline function with
> a bug). I could tweak things so it only captures files that are
> mentioned in a warning.
But then you would no longer be able to provide the context. If the error
trace goes through a function foo() defined in another module of the same
package, the user needs to look at its definition to confirm/waive the defect.
> I guess the issue is: where do you store the knowledge about good vs bad
> warnings? My plan was to store it server-side. But we could generate
> summaries and have them available client-side. For example, if, say,
> cppcheck's "useClosedFile" test has generated 100 issues of which 5 have
> received human attention: 1 has been marked as a true positive, and 4
> have been marked as false positives. We could then say ("cppcheck",
> "useClosedFile") has a signal:noise ratio of 1:4. We could then
> generate a summary of these (tool, testID) ratios for use by clients,
> which could then apply a user-configurable signal:noise threshold, so you can
> say: "only show me results from tests that achieve 1:2 or better".
I did not realize you meant auto-filtering based on statistics from users'
input. Then maintaining the statistics at the server sounds like a good idea.
Being able to export a text file with scores per checker should be just fine
for the command-line tools. We will see whether the statistics from users'
input can be used as a reliable criterion. The problem is that some defects
tend to be classified incorrectly without a deeper analysis of the report
(and code).
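To make the proposal concrete, the scoring could look roughly like this
(the layout of the triage data is my assumption):

    from collections import defaultdict

    def score_checkers(triaged):
        # triaged: iterable of (tool, test_id, is_real_bug) built from
        # human verdicts; returns {(tool, test_id): [tp, fp]}
        scores = defaultdict(lambda: [0, 0])
        for tool, test_id, is_real_bug in triaged:
            scores[(tool, test_id)][0 if is_real_bug else 1] += 1
        return scores

    def passes_threshold(scores, tool, test_id, ratio=0.5):
        # keep results whose signal:noise is e.g. 1:2 (ratio=0.5)
        # or better; checkers with no known noise pass by default
        tp, fp = scores.get((tool, test_id), (0, 0))
        return fp == 0 or tp / float(fp) >= ratio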
> > The limitation of JavaScript-based UIs is that they are read-only. Some
> > developers prefer to go through the defects using their own environment
> > (eclipse, vim, emacs, ...) rather than a web browser so that they can fix
> > them immediately. We should support both approaches I guess.
>
> Both approaches. What we could do is provide a tool ("fedpkg
> get-errors" ?) that captures the errors in the same output format as
> gcc. That way if you run it from, say, emacs, the *compilation* buffer has
> everything in the right format, and emacs' goto-next-error stuff works.
'fedpkg foo' is probably overkill at this point. My concern was rather that
we should not rely so much on the web server/browser approach in the first
place. I would like to have most of the tooling working just from the
terminal, without any server or browser. Any server solution can then be
easily built on top of it.
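The terminal side is cheap anyway; emitting the gcc format is trivial once
the results are parsed (the field names below are made up):

    def print_gcc_style(results):
        # 'file:line:column: warning: message [tool/testid]' is a
        # format that emacs' compilation-mode and vim's quickfix
        # already know how to parse
        for r in results:
            print('%s:%d:%d: warning: %s [%s/%s]'
                  % (r['file'], r['line'], r['column'],
                     r['message'], r['tool'], r['testid']))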
> Currently it's matching on 4 things:
> * by name of test tool (e.g. "clang-analyzer")
+ the class of defect? e.g. "useClosedFile" in your example above...
> * by filename of C file within the tarball (so e.g.
> '/builddir/build/BUILD/python-ethtool-0.7/python-ethtool/etherinfo.c'
> becomes 'python-ethtool/etherinfo.c', allowing different versions to be
> compared)
With some part of the path, or just the base name?
> * function name (or None)
You want to work with full signatures if you are going to support overloaded
functions/methods in C++.
> * text of message
The messages cannot be checked for an exact match in certain cases. Have a
at the rules we use in csdiff for the text messages:
http://git.fedorahosted.org/cgit/codescan-diff.git/plain/csfilter.cc
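Putting the four keys together, the matching might look roughly like this
(the normalization rules below are placeholders of mine, not the actual
csdiff rules):

    import re

    BUILDDIR = re.compile(r'^/builddir/build/BUILD/[^/]+/')

    def matching_key(tool, test_id, path, function, message):
        # strip the build root and the package NVR directory so that
        # e.g. 'python-ethtool/etherinfo.c' compares across versions
        path = BUILDDIR.sub('', path)
        # for C++, 'function' should be a full signature, otherwise
        # overloaded functions/methods collide on the same key
        # mask details that legitimately change between runs:
        message = re.sub(r'\d+', 'N', message)          # numbers
        message = re.sub(r"'[^']*'", "'...'", message)  # quoted names
        return (tool, test_id, path, function, message)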
> See "make-comparative-report.py:ComparativeIssues" in
> https://github.com/fedora-static-analysis/mock-with-analysis/blob/master/reports/make-comparative-report.py
Actually my comment was not about the matching algorithm, but about the way
you present the comparative results. The UI is based on comparing a pair of
source files. In many cases you will fail to find a proper pairing of source
files between two versions of a package.
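For illustration, a naive pairing by stripped path (my sketch, not your
code) shows where it breaks: anything renamed or moved between the two
versions stays unpaired, and the comparative view has nothing to show
for it:

    def pair_sources(old_files, new_files):
        # old_files/new_files: relative paths within each version
        old, new = set(old_files), set(new_files)
        pairs = sorted(old & new)       # same path in both versions
        return pairs, sorted(old - new), sorted(new - old)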
Kamil