Plan for tomorrow's FESCo meeting (2011-06-21)

Fri Jun 24 21:16:12 UTC 2011

On 6/24/11 3:31 AM, Richard W.M. Jones wrote:

> I don't think GHC generates C (it used to, a very long time ago).  GHC
> and OCaml contain code generators that generate machine code directly.
>
> So this could require changes to the code generator, but at least for
> RELRO it seems this is just a link-time change (is it?)

It's a link-time change in that you have to ask the linker to create the 
GNU_RELRO segment and put stuff in it, yes.  But it's also a code 
generation change if you want that segment to have anything in it 
besides the C runtime details.

Maybe an example will make this clear:

% cat test.c
#include <stdlib.h>
typedef void (*exit_type)(int);
maybeconst exit_type exit_type_array[] = { exit };
% gcc -Dmaybeconst= -c -o mutable.o test.c
% gcc -Dmaybeconst=const -c -o const.o test.c
% readelf -a mutable.o | grep -B2 -m1 exit
Relocation section '.rela.data' at offset 0x438 contains 1 entries:
   Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000000  000800000001 R_X86_64_64       0000000000000000 exit + 0
% nm -a --defined mutable.o | grep exit
0000000000000000 D exit_type_array
% readelf -a const.o | grep -B2 -m1 exit
Relocation section '.rela.rodata' at offset 0x498 contains 1 entries:
   Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000000  000900000001 R_X86_64_64       0000000000000000 exit + 0
% nm -a --defined const.o | grep exit
0000000000000000 R exit_type_array

We're not getting anywhere near the linker yet, but codegen has already 
done different things.  When the array is const, the symbol lands in 
read-only data (nm type R, relocation in .rela.rodata) instead of 
writable data (nm type D, relocation in .rela.data).

Now, imagine tacking on to our test program something like:

#include <stdio.h>
int main(void) { printf("%p\n", exit_type_array[0]); return 0; }

and compiling it.  In this case, -z relro on its own will not help: the 
address of the 'exit' function isn't known until it's first called, 
because function resolution is normally done lazily, and because the 
'exit' symbol is not provided in the executable itself.  So the 
exit_type_array will end up in the final executable in a writeable 
section.  However, -z relro _will_ constify relocations that are 
resolved within the same linked object, e.g., a function defined in one 
translation unit whose address is taken in another.

If instead you say both -z relro and -z now, then you are explicitly 
asking the runtime linker to resolve all symbols up front.  In this case 
the address of 'exit' _will_ be known before ctors are run, which means 
the array can be emitted in a .data.rel.ro section, which is initially 
writeable but made read-only after relocations.

- ajax