What to move to?

Tue Apr 16 11:46:25 UTC 2013

On 04/15/2013 09:04 PM, Björn Persson wrote:
> Miloslav Trmač wrote:
>> The logical conclusion from this is to move to a language with automatic
>> memory management.  The "top vulnerability" reports for programs written in
>> C/C++ and most other languages are so different that starting a new project
>> that processes untrusted data in C/C++ is becoming indefensible.
>
> If by "automatic memory management" you mean garbage collection, then
> that's not really what we need. Garbage collection has advantages, but
> what is needed to stop the buffer overflows is bounds checking. The
> compiler needs to keep track of how big each object is and insert code
> to check that writes to an array stay within the bounds of the array.

There's also the issue of dangling pointers (pointers which point to a 
memory location which now holds an object of a different type).  They 
can result from misapplied memory management, or from type safety 
loopholes in the language definition.  An example for Ada is here:

   <http://www.enyo.de/fw/notes/ada-type-safety.html>

(See the postscript—this was already known in the Ada 83 days.  I still 
find it remarkable.  It's possible to work around this in a GC-based 
implementation.)

>> Now, what to move to?  I currently don't see any language/runtime I
>> could recommend, which is in itself rather frightening.
>
> I recommend Ada. Ada does bounds checking, and is compiled to machine
> code with performance comparable to C.

Yes, Ada has some nice features.  At least there are real arrays, but 
they are somewhat cumbersome to work with, compared to Java, Python or, 
well, C pointers.  There are two aspects: preservation of array bounds 
in slices (so that you have to write Table (Table'First + Offset) to 
access the element Offset of Table, Offset ranging from 0 to 
Table'Length - 1), and the fact that it is impossible to put an 
unconstrained array (of arbitrary length) into a constrained object 
(i.e., you need an indirection).
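
A minimal sketch of both aspects follows.  The names Int_Array, 
String_Access, Message and Element_At are made up for illustration and 
do not come from any real code base.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Array_Demo is
   --  Illustrative types only; none of these names come from a real library.
   type Int_Array is array (Positive range <>) of Integer;
   type String_Access is access String;

   type Message is record
      Id   : Natural;
      --  A component "Text : String;" is rejected by the compiler because
      --  String is unconstrained, so the record holds an access value.
      Text : String_Access;
   end record;

   --  Zero-based access that works for any bounds, including slices.
   function Element_At (Table : Int_Array; Offset : Natural) return Integer is
   begin
      return Table (Table'First + Offset);
   end Element_At;

   Data : constant Int_Array (1 .. 8) := (10, 20, 30, 40, 50, 60, 70, 80);
   M    : constant Message := (Id => 1, Text => new String'("hello"));
begin
   --  The slice Data (5 .. 8) keeps its bounds, so Table'First is 5 inside
   --  Element_At and Table (1) would raise Constraint_Error.
   Put_Line (Integer'Image (Element_At (Data (5 .. 8), 0)));
   Put_Line (Integer'Image (Element_At (Data (5 .. 8), 3)));
   Put_Line (M.Text.all);
end Array_Demo;
```

This prints 50, 80 and hello.  Without a garbage collector, the String 
allocated for Text has to be freed explicitly (for example with an 
instance of Ada.Unchecked_Deallocation) or be wrapped in a controlled 
type.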

For many programming tasks, arrays might be at the wrong level of 
abstraction, but we have a lot of plumbing code which uses them heavily.

Garbage collection support would make it easier to introduce the 
indirection, but it would require a conservative collector at present, 
and those we have right now (Boehm-Demers-Weiser and the Go collectors) 
require a process-global view, touch signal handlers etc., so they do 
away with one significant Ada advantage (see below).

> Only compiler bugs can cause
> buffer overflows in Ada, unless you're so foolhardy that you disable the
> bounds checking.

The GNAT run-time is compiled without language-defined checks, and it 
used to have at least one buffer overflow in the Ada part.  Many Ada 
libraries used to follow GNAT's example and disabled the checks as well, 
but this has changed during the last few years, it appears.  Manual 
overflow checks are hampered by the fact that -gnato, the GNAT switch 
that enables integer overflow checking, still isn't the default.
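
One way to read the -gnato point, as a sketch: a hand-written length 
check usually involves arithmetic that can itself overflow.  Fits_Naive 
and Fits are hypothetical helpers, and the result of an unchecked 
overflow is not something to rely on.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Overflow_Demo is
   --  Hypothetical helper: does a block of Length elements starting at
   --  Offset fit into a buffer with Capacity elements?
   --  Offset + Length can exceed Integer'Last.  With overflow checking
   --  enabled this raises Constraint_Error; without it, the sum may wrap
   --  around and the comparison may succeed by accident.
   function Fits_Naive (Offset, Length, Capacity : Natural) return Boolean is
   begin
      return Offset + Length <= Capacity;
   end Fits_Naive;

   --  Rearranged so that no intermediate result can overflow.
   function Fits (Offset, Length, Capacity : Natural) return Boolean is
   begin
      return Offset <= Capacity and then Length <= Capacity - Offset;
   end Fits;
begin
   Put_Line (Boolean'Image (Fits (Natural'Last, 1, 1024)));  --  FALSE
   Put_Line (Boolean'Image (Fits (16, 1008, 1024)));         --  TRUE
end Overflow_Demo;
```

Fits_Naive is shown only for contrast and is never called.  The 
rearranged version stays correct whether or not the compiler inserts 
overflow checks.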

> Ada doesn't do garbage collection across the whole program, but features
> such as controlled types, generic data structures and out parameters
> greatly reduce the need for garbage collection. The double-free problem
> is also eliminated. (Garbage collection was made optional in Ada so
> that the language would be suitable for embedded real-time systems, and
> in practice most compilers don't provide it.)

Controlled types have a fixed overhead which is quite visible with small 
objects.  By default, code for abort deferral is emitted, the vtable 
pointer takes space, and avoiding unnecessary indirect calls takes some 
care by the programmer.  There's also no well-defined ABI for shared 
libraries (and adding a subprogram can change the name of existing 
subprograms).

On the other hand, lack of garbage collection means that it's feasible 
to have some GNAT-compiled part in a larger program, without the larger 
program noticing that there's a component not written in C.  I sometimes 
call this "deep embedding support", and only very few language 
implementations have this property at present.  (Even with GNAT, you 
have to restrict yourself to a language subset.)  The list of feasible 
systems programming languages is much, much longer, but most need global 
run-time state, threads, or signal handler manipulation, or impose address 
space layout requirements etc.  But that is primarily an implementation issue, 
not an aspect which is inherent to most languages.
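
As a rough sketch of what such an embedded component could look like, 
here is a hypothetical package (Embedded_Part and embedded_clamp are 
invented names) that exports one subprogram with the C calling 
convention.  It assumes an Ada 2012 compiler for the aspect syntax.  It 
does no allocation, keeps no global state and cannot raise an 
exception, which is the kind of subset meant above.  Whether a given 
unit can really be linked without calling the binder-generated 
adainit/adafinal depends on the GNAT configuration, so treat that part 
as an assumption.

```ada
with Interfaces.C;

package Embedded_Part is
   pragma Pure;  --  no state, no elaboration code of its own

   --  Seen from C as:
   --     int embedded_clamp (int value, int low, int high);
   function Clamp
     (Value, Low, High : Interfaces.C.int) return Interfaces.C.int
     with Export, Convention => C, External_Name => "embedded_clamp";
end Embedded_Part;

package body Embedded_Part is
   use type Interfaces.C.int;

   function Clamp
     (Value, Low, High : Interfaces.C.int) return Interfaces.C.int is
   begin
      --  Comparisons only: no arithmetic, so nothing here can overflow
      --  or fail a language-defined check.
      if Value < Low then
         return Low;
      elsif Value > High then
         return High;
      else
         return Value;
      end if;
   end Clamp;
end Embedded_Part;
```

The spec and body are two compilation units, so with GNAT they would 
normally go into embedded_part.ads and embedded_part.adb.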

The other aspect is low baseline overhead from the run-time system.  We 
don't want programmers to rewrite working system components in C only to 
reduce memory usage.  This is what happened (or is expected to happen) 
to some daemons written in Python.

-- 
Florian Weimer / Red Hat Product Security Team