Dealing with static code analysis in Fedora

Wed Dec 12 20:53:37 UTC 2012

On Wed, 2012-12-12 at 15:03 -0500, Steve Grubb wrote:
> On Wednesday, December 12, 2012 01:00:36 AM Paulo César Pereira de Andrade 
> wrote:
> > > A while back I ran my static checker on all of the Python extension
> > > modules in Fedora 17:
> > >   http://fedoraproject.org/wiki/Features/StaticAnalysisOfPythonRefcounts
> > >
> > > I wrote various scripts to build the packages in a mock environment that
> > > injects my checker into gcc, then wrote various scripts to triage the
> > > results.  I then filed bugs by hand for the most important results,
> > > writing some more scripts along the way to make the process easier.
> > > 
> > > This led to some valuable bug fixes, but the mechanism for running the
> > > analysis was very ad hoc and doesn't scale.
> > 
> >   I think it could be useful at least as a generic tool where one would
> > just do something like:
> > make CC=gcc-with-python-plugin
> > like some time ago one could run
> > make CC=cgcc
> > to see what sparse would tell. Or maybe think of it as a tool like
> > rpmlint.
> 
> I wrote a program, fake-make, which collects everything so that programs like 
> cppcheck can be run with correct defines, paths, and files. Instructions are 
> here:
> 
> http://people.redhat.com/sgrubb/swa/cwe/index.html
> 
> That said, what's really needed is every analyzer to output messages with 
> something in common so that results can be compared. That something in common 
> is CWE (Common Weakness Enumeration). I was working on a mapping for cppcheck 
> to CWE so that it could be correlated with other tools.
> 
> The advantage of CWE is that it's also married to CAPEC (Common Attack Pattern 
> Enumeration and Classification). This mapping shows some possible ways that the 
> bug being found could be exploited depending on other mitigating factors.
> 
> So, what would be nice is to figure out how to get all the static analyzers and 
> compilers outputting CWE information. Then define a common format so that 
> correlation tools can be built. If several tools report the same issue at the 
> same line, then it's probably not a false positive and someone should look at 
> it.
> 
> But at the same time, not all bugs are created equal. A buffer overflow is a 
> worse problem than an unchecked return code (unless it's setuid(2)). There is a 
> scoring framework, CWSS (Common Weakness Scoring System) that can be used to 
> rank bugs so they can be prioritized. It also takes into account the effect of 
> the bug within the program it's found in. For example, a buffer overflow in a 
> network app or daemon is more critical than the same issue in a program run by 
> authenticated users such as "ps". Don't get me wrong, there are corner 
> cases...but some heuristic has to be used and decisions have to be made.
> 
> So.. this would be my advice...try to follow these standards. It's all part of 
> a larger project to track weaknesses, combine with configuration information, 
> and network IDS systems for real time situational awareness.

Thanks for the feedback.

CWE appears to be a good fit for what I have in mind, so I intend to
allow analysis tools to (optionally) supply a CWE id in the reports we
capture, so we ought to be able to e.g. query on CWE id in the DB (and
I'm adding CWE codes right now to my own checker tool).
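
A minimal sketch of what "an optional CWE id per report, queryable in the
DB" could look like. Everything here is an assumption made for
illustration: the table layout, the function names, and the tool names
are hypothetical and do not describe any existing schema. It uses an
in-memory SQLite database, and it also shows the correlation idea from
the quoted mail (several tools reporting the same CWE at the same line).

```python
import sqlite3

# Hypothetical schema: one row per report; cwe is NULL when the tool
# did not supply a CWE id.
SCHEMA = """
CREATE TABLE report (
    tool    TEXT    NOT NULL,
    path    TEXT    NOT NULL,
    line    INTEGER NOT NULL,
    cwe     INTEGER,
    message TEXT    NOT NULL
)
"""


def make_db():
    """Create an in-memory database holding the hypothetical report table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_report(conn, tool, path, line, message, cwe=None):
    """Store one report; the CWE id is optional."""
    conn.execute(
        "INSERT INTO report (tool, path, line, cwe, message) "
        "VALUES (?, ?, ?, ?, ?)",
        (tool, path, line, cwe, message),
    )


def reports_for_cwe(conn, cwe):
    """Return (tool, path, line, message) for every report with this CWE id."""
    return conn.execute(
        "SELECT tool, path, line, message FROM report "
        "WHERE cwe = ? ORDER BY path, line, tool",
        (cwe,),
    ).fetchall()


def corroborated(conn, min_tools=2):
    """Return (path, line, cwe, n_tools) where at least min_tools distinct
    tools reported the same CWE id at the same file and line."""
    return conn.execute(
        "SELECT path, line, cwe, COUNT(DISTINCT tool) FROM report "
        "WHERE cwe IS NOT NULL "
        "GROUP BY path, line, cwe "
        "HAVING COUNT(DISTINCT tool) >= ? "
        "ORDER BY path, line",
        (min_tools,),
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    # CWE-120 is the classic buffer overflow, CWE-252 an unchecked return value.
    add_report(conn, "tool-a", "src/net.c", 42, "possible overflow", cwe=120)
    add_report(conn, "tool-b", "src/net.c", 42, "buffer overrun", cwe=120)
    add_report(conn, "tool-a", "src/util.c", 7, "return value ignored", cwe=252)
    add_report(conn, "tool-b", "src/util.c", 9, "style nit")  # no CWE id
    print(reports_for_cwe(conn, 252))
    print(corroborated(conn))
```

Reports without a CWE id are still stored; they simply never match a
CWE query and are left out of the correlation.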

CWSS doesn't seem to be such a natural fit though: how can we map from
source code through to the scenarios described in the CWSS document?
Setuid binaries are one thing that occurs to me, but I'm not sure how to
bridge from the world of, say, GCC's internal representation at build
time through to, say, "this code is being run by a regular user", let alone
"the entity has control over [...] a router"; perhaps with a list of
ELF files that get run in/linked to setuid binaries?  (but that requires
bridging the world of compile time vs installation path).  Ideas
welcome.
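
One small piece of this can be sketched: finding which installed files
are setuid. This is only an illustration under stated assumptions (the
function names are hypothetical, and it walks an already-installed
tree); it does not solve the harder problem of mapping those files back
to the sources and build-time reports that produced them.

```python
import os
import stat


def mode_is_setuid(mode):
    """Return True if mode describes a regular file with the setuid bit set."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISUID)


def find_setuid_files(root):
    """Walk root and return a sorted list of regular files that are setuid.

    Symlinks are not followed (os.lstat), and files that cannot be
    inspected are skipped.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if mode_is_setuid(os.lstat(full).st_mode):
                    hits.append(full)
            except OSError:
                continue
    return sorted(hits)


if __name__ == "__main__":
    # Example: list setuid programs under /usr/bin on the current system.
    for path in find_setuid_files("/usr/bin"):
        print(path)
```

The resulting list would still need to be joined against packaging
metadata to learn which source files ended up in each binary.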

Dave