correlating static analysis results with known crashes
by Martin Milata
Hello,
I've implemented a proof-of-concept of an analysis that tries to pair
static analysis results with known crashes based on the source code
locations, as outlined in [1].
The code extends David Malcolm's mock-with-analysis and is available at
[2]. The machinery for generating static analysis results is unchanged
apart from a few fixes needed for it to run on Fedora 19. The script
make-simple-report.py was extended to accept a second argument with crash
reports from the FAF server, and the matching crashes are referenced in the
generated reports. There is currently no way to obtain the file with
crashes automatically; I got it from the server administrator.
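The pairing step can be sketched roughly as follows. This is a minimal illustration, not the code in [2]. It assumes that each static analysis result has already been reduced to a (path, line) location and each crash to a list of (path, line) stack frames. The names StaticResult, Crash and match_results are hypothetical. Like the proof-of-concept, it considers all results and all frames:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Location = Tuple[str, int]  # (source file path, line number)


@dataclass
class StaticResult:
    testid: str          # hypothetical: identifier of the analyzer check
    location: Location


@dataclass
class Crash:
    crash_id: int
    frames: List[Location]  # stack frames, topmost first


def match_results(results: List[StaticResult],
                  crashes: List[Crash]) -> Dict[int, List[int]]:
    """Map the index of each static analysis result to the IDs of crashes
    that have *any* stack frame at the same source location."""
    # Index every frame of every crash by its source location.
    crashes_at: Dict[Location, Set[int]] = defaultdict(set)
    for crash in crashes:
        for frame in crash.frames:
            crashes_at[frame].add(crash.crash_id)

    matches = {}
    for i, result in enumerate(results):
        ids = crashes_at.get(result.location)  # .get() does not insert keys
        if ids:
            matches[i] = sorted(ids)
    return matches
```

For example, with one result at ("src/view.c", 42) and a crash whose second frame is at that location, the result is reported as matching that crash. In practice the paths recorded by the analyzers and by the crash reports probably won't agree verbatim, so some normalization (e.g. stripping the build directory prefix) is likely needed before comparing.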
I ran the analysis on the following packages:
* tracker-0.16.2-1.fc19
* evolution-3.6.4-3.fc18
* gnome-shell-3.6.3.1-1.fc18
* nautilus-3.6.3-4.fc18
* python-2.7.3-13.fc18
* rhythmbox-2.98-4.fc18
Tracker was chosen arbitrarily; the rest of the builds are those that
have the highest number of distinct crashes. The results can be viewed
at [3]. Given that they were obtained from the packages with the highest
number of collected crashes, they don't seem very encouraging: there are
only three matches [4,5,6] that are not obvious false positives. All the
data needed to reproduce this should be available at [7].
There are two main causes of false positives:
* The code considers all static analysis results, not only those from
tests for behaviour that would result in a crash at runtime.
* It considers all stack frames in a crash, not just the topmost one.
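Both causes suggest a straightforward refinement, sketched below under stated assumptions: keep only results whose check ID is in an allow-list of crash-relevant checks, and compare each result only against the topmost frame of each crash. The names are hypothetical, and the entries in CRASH_CHECKS are example clang static analyzer checker names chosen for illustration, not a vetted list; how check IDs actually appear in the results depends on the analyzer and the parser.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Location = Tuple[str, int]  # (source file path, line number)

# Assumed allow-list of checks whose findings would crash the program at
# run time (illustrative only).
CRASH_CHECKS = {
    "core.NullDereference",
    "core.DivideZero",
    "core.CallAndMessage",
    "unix.Malloc",
}


@dataclass
class StaticResult:
    testid: str
    location: Location


@dataclass
class Crash:
    crash_id: int
    frames: List[Location]  # stack frames, topmost first


def match_topmost(results: List[StaticResult],
                  crashes: List[Crash]) -> Dict[int, List[int]]:
    """Pair crash-relevant results with crashes whose *topmost* frame is
    at the same source location."""
    crashes_at: Dict[Location, List[int]] = {}
    for crash in crashes:
        if crash.frames:  # skip crashes with an empty stack
            crashes_at.setdefault(crash.frames[0], []).append(crash.crash_id)

    matches = {}
    for i, result in enumerate(results):
        if result.testid not in CRASH_CHECKS:
            continue  # e.g. a dead store cannot crash the program
        ids = crashes_at.get(result.location)
        if ids:
            matches[i] = ids
    return matches
```

A dead-store warning is now ignored even if its line appears in a crash, and a null dereference only matches crashes that actually stopped at that line. A more forgiving variant might use the topmost frame that lies in the package's own sources, since the very top frame of a native crash is often inside a library such as libc.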
As a side note, all three matches come from the clang static analyzer,
which for some reason fails on quite a lot of source files.
What do you think?
Thanks,
Martin
[1] https://lists.fedoraproject.org/pipermail/firehose-devel/2013-October/000...
[2] https://github.com/mmilata/mock-with-analysis/tree/crash-correlation
[3] http://mmilata.fedorapeople.org/firehose-crash-correlation/
[4] http://mmilata.fedorapeople.org/firehose-crash-correlation/nautilus/sourc...
[5] http://mmilata.fedorapeople.org/firehose-crash-correlation/nautilus/sourc...
[6] http://mmilata.fedorapeople.org/firehose-crash-correlation/python/sources...
[7] http://mmilata.fedorapeople.org/firehose-crash-correlation.tar.xz
10 years, 6 months
ANN: firehose-0.3 released
by David Malcolm
"firehose" is a Python package intended for managing the results from
code analysis tools (e.g. compiler warnings, static analysis, linters,
etc.).
It currently provides parsers for the output of gcc, clang-analyzer,
cppcheck, and findbugs. These parsers convert the results into a common
data model of Python objects, with methods for lossless roundtrips
through a provided XML format. There is also a JSON equivalent.
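To make the idea of a common data model with a lossless XML round-trip concrete, here is a deliberately tiny sketch. It is not firehose's API: the Issue class, its XML layout and the parse_gcc_warning helper are hypothetical, and the regular expression only handles the common single-line "file:line:column: warning: message [-Wflag]" form of gcc output.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Issue:
    """Hypothetical, minimal stand-in for a common result model."""
    path: str
    line: int
    column: int
    testid: Optional[str]  # e.g. the -W flag gcc reports, if any
    message: str

    def to_xml(self) -> str:
        elem = ET.Element("issue", path=self.path,
                          line=str(self.line), column=str(self.column))
        if self.testid is not None:
            elem.set("test-id", self.testid)
        ET.SubElement(elem, "message").text = self.message
        return ET.tostring(elem, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Issue":
        elem = ET.fromstring(text)
        return cls(path=elem.get("path"),
                   line=int(elem.get("line")),
                   column=int(elem.get("column")),
                   testid=elem.get("test-id"),  # None when absent
                   message=elem.findtext("message"))


# file:line:column: warning: message [-Wflag]   (the flag is optional)
GCC_WARNING = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<column>\d+): warning: "
    r"(?P<message>.*?)(?: \[(?P<testid>-W[^\]]+)\])?$")


def parse_gcc_warning(text: str) -> Optional[Issue]:
    """Convert one line of gcc output into an Issue, or None."""
    m = GCC_WARNING.match(text.rstrip("\n"))
    if m is None:
        return None
    return Issue(path=m["path"], line=int(m["line"]), column=int(m["column"]),
                 testid=m["testid"], message=m["message"])
```

Parsing "src/foo.c:12:5: warning: unused variable 'x' [-Wunused-variable]" gives an Issue with testid "-Wunused-variable", and Issue.from_xml(issue.to_xml()) compares equal to the original, which is the lossless property described above.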
It is available on pypi here:
https://pypi.python.org/pypi/firehose
and via git from:
https://github.com/fedora-static-analysis/firehose
The mailing list is:
https://admin.fedoraproject.org/mailman/listinfo/firehose-devel
Firehose is Free Software, licensed under the LGPLv2.1 or (at your
option) any later version.
It requires Python 2.7 or 3.2 onwards, and has been successfully tested
with PyPy.
Changes since 0.2:
This release adds a parser for the output of the findbugs tool, along
with various bugfixes and other internal cleanups.
Thanks to Léo Cavaillé, Matthieu Caneill, Nicolas Dandrimont, and
yeshuxiong for their help with this release.
Enjoy!
Dave
10 years, 6 months
related work & correlation of problem data
by Martin Milata
Hello firehose-devel,
I have recently stumbled upon the firehose format and its surrounding
ecosystem and realized that it somewhat overlaps with what we've been
working on within the Automatic Bug Reporting Tool project [1]. I'll
take the liberty of briefly introducing our format and software:
* uReport is a JSON-based format for describing run-time software
problems. It aims to contain only anonymous data and to be useful for
automatic processing. There's no formal specification, but some
information can be found at [2].
* FAF is the main consumer of uReports. It performs some analysis on
them (e.g. resolving addresses to source code locations, or grouping
reports likely caused by the same bug), and provides a web interface
with statistics. The instance for Fedora is available at [3].
* satyr is a library (in C with Python bindings) for creating and
manipulating uReports. It is used by ABRT to create uReports and by
FAF to perform clustering on them. In terms of problem types, it
currently supports crashes of native binaries that produce a core dump,
uncaught Python exceptions, and kernel oopses.
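To illustrate what "grouping reports likely caused by the same bug" means, here is a much cruder technique than the clustering FAF performs with satyr: exact grouping by the function names of the topmost few frames. The report layout (a mapping from a hypothetical report ID to a list of function names) is an assumption for this sketch and is not the uReport schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fingerprint(frames: List[str], depth: int = 3) -> Tuple[str, ...]:
    """Use the function names of the topmost `depth` frames as a crude
    bug signature (shorter stacks simply give a shorter signature)."""
    return tuple(frames[:depth])


def group_reports(reports: Dict[str, List[str]],
                  depth: int = 3) -> Dict[Tuple[str, ...], List[str]]:
    """Group report IDs whose stack traces share the same fingerprint.

    `reports` maps a report ID to its function names, topmost frame first.
    """
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for report_id, frames in reports.items():
        groups[fingerprint(frames, depth)].append(report_id)
    return dict(groups)
```

With made-up reports r1 = [parse_header, read_file, load, main] and r2 = [parse_header, read_file, load, idle_cb], both land in one group despite different callers further down the stack. Exact matching like this misses crashes whose top frames differ only slightly, which is the kind of case a distance-based clustering is meant to handle.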
Due to the slightly different goals of the two projects and the
substantial amount of code already written, it is unclear whether direct
collaboration would be beneficial. However, it might be interesting to
find out whether we can obtain anything useful by correlating the reports
from static analyzers with crashes on users' computers.
To be more concrete, we were thinking of doing something like this:
* Run static analysis on a particular package, keep only the issues
that would correspond to a crash if it occurred.
* Take the crash reports for that package.
* For every static analysis result, try to find a crash report with the
same source code location.
Finding such a pair would mean that the static analysis issue is not a
false positive, and that the additional information from the static
analysis (e.g. the trace) could be added to the crash report.
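Once a pair is found, both benefits can be recorded directly: the static result is marked as confirmed, and its trace is copied into the crash report. A minimal sketch over plain dictionaries. All keys ("location", "trace", "frames", "confirmed", "static_analysis") are hypothetical and are not fields of uReport or firehose.

```python
from typing import List


def correlate(results: List[dict], crashes: List[dict]) -> None:
    """Annotate results and crashes in place when their locations coincide.

    Each result is assumed to have a "location" (path, line) tuple and a
    "trace" list; each crash a "frames" list of (path, line) tuples.
    """
    for result in results:
        for crash in crashes:
            if result["location"] in crash["frames"]:
                # A crash observed in the field suggests a real bug.
                result["confirmed"] = True
                # Give the crash report the analyzer's explanation.
                crash.setdefault("static_analysis", []).append(
                    {"location": result["location"], "trace": result["trace"]})
```

The nested loop is quadratic, which is fine for a single package; indexing crash frames by location first would avoid that for larger inputs.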
Does it make any sense? Would the results of such analysis be useful?
Best regards,
Martin Milata
[1] https://github.com/abrt/abrt/wiki/ABRT-Project
[2] https://github.com/abrt/faf/wiki/uReport
[3] https://retrace.fedoraproject.org/faf/summary/
10 years, 7 months