Do we care about the exec-shield=2 configuration? Does anybody use that?
In the execshield patch we have in Fedora at this point, the (exec_shield & 2) special cases are the only arch-independent changes that are not fairly clean and isolated.
The patch puts a comment in sysctl.c about several bit flags in exec_shield, but actually only &2 and !=0 are really meaningful in our code. If we could get rid of exec_shield&2 then it would be down to just exec_shield!=0 and as of now that already only affects NX-emulation in fact.
If someone does want a behavior akin to exec_shield&2 that could be done cleanly (and upstreamed) with a saner sysctl or two. What it does now is a little incoherent.
Thanks, Roland
On Fri, Jun 18, 2010 at 04:00:12AM -0700, Roland McGrath wrote:
Do we care about the exec-shield=2 configuration? Does anybody use that?
I'd be surprised to hear that anyone changes that sysctl these days.
In the execshield patch we have in Fedora at this point, the (exec_shield & 2) special cases are the only arch-independent changes that are not fairly clean and isolated.
The patch puts a comment in sysctl.c about several bit flags in exec_shield, but actually only &2 and !=0 are really meaningful in our code. If we could get rid of exec_shield&2 then it would be down to just exec_shield!=0 and as of now that already only affects NX-emulation in fact.
If someone does want a behavior akin to exec_shield&2 that could be done cleanly (and upstreamed) with a saner sysctl or two. What it does now is a little incoherent.
Sounds like a good idea to me.
Dave
After the thundering disinterest when I asked, I've nixed the exec-shield=2 code from rawhide's execshield patch. That leaves no meaning for the exec-shield boot/sysctl parameter except that, on x86-32 kernels only, exec-shield=0 disables segmentation hacks when NX is not available (i.e. non-PAE or ancient hardware). So I've removed it from sysctl except for x86-32.
I also removed some random lingering cruft that AFAICT really didn't do anything useful any more.
We've whittled execshield.patch down to where all the really crufty parts relate only to the x86-32 segmentation hack for old hardware/non-PAE. There is only one piece of code actually active on other machines/configs, and even that is used only for 32-bit x86 processes.
I hope to manage to cajole Ingo into either upstreaming or punting that one thing, the different arch_get_unmapped_area algorithm used for PROT_EXEC mappings. I can't tell if it's actually of any use when we're not using the segmentation hack or not. If it is, some version of it belongs upstream.
Thanks, Roland
* Roland McGrath roland@redhat.com wrote:
I hope to manage to cajole Ingo into either upstreaming or punting that one thing, the different arch_get_unmapped_area algorithm used for PROT_EXEC mappings. I can't tell if it's actually of any use when we're not using the segmentation hack or not. If it is, some version of it belongs upstream.
Even not considering the segmentation based protection, it's useful (on 32-bit) because it compresses executable mappings into an address space region where all 32-bit addresses have a zero byte in them.
This adds one more complication to exploits - for example ASCII string overflow based exploits (which cannot have a end-of-string zero byte in them) will have to work harder to generate an address into that address range. (Some may even be prevented altogether - although it's usually rather hard to disprove the exploitability of overflow bugs.)
But upstream mm/ maintainers expressed a thundering disinterest in these kinds of changes, and the segmentation based trick was explicitly nak-ed IIRC.
Thanks,
Ingo
Even not considering the segmentation based protection, it's useful (on 32-bit) because it compresses executable mappings into an address space region where all 32-bit addresses have a zero byte in them.
Ok. So these are really two independent things. Let's address each one separately.
I've split them into two separate patches for rawhide. The rawhide kernel builds with either, both, or neither of these, so they are really separable. You can find them at:
http://cvs.fedoraproject.org/viewvc/devel/kernel/ linux-2.6-32bit-mmap-exec-randomization.patch linux-2.6-i386-nx-emulation.patch
32bit-mmap-exec-randomization.patch is now pretty small:
arch/x86/mm/mmap.c | 3 + arch/x86/vdso/vdso32-setup.c | 2 b/include/linux/sched.h | 4 + b/mm/mmap.c | 91 ++++++++++++++++++++++++++++++++++++++++--- include/linux/mm.h | 8 +++ include/linux/mm_types.h | 3 + mm/mremap.c | 4 - 7 files changed, 106 insertions(+), 9 deletions(-)
We'd sure be happy if you wanted to clean that up and maintain it on some tip branch, whether or not it ever gets merged.
wrt clean-up, IMHO it would be cleanest to just add the prot argument to get_unmapped_area and all its cousins (fs hooks, etc). That alone doesn't seem like it should be especially controversial upstream. It now has almost every mmap argument, and this would add the missing one. It seems quite reasonable that arch or driver hooks might want to use PROT_* bits as part of the decision.
From what you've said it sounds like this PROT_EXEC behavior would be
desireable for all 32-bit processes, not just on x86. But the patch we have now only touches x86 as it is, and it seems unlikely anyone really cares about maximal exploit mitigation for other machines to bother with. I would have thought you could have just put it in the x86 arch function upstream. Last I knew, you had some pull with the x86 arch maintainers. ;-)
Now, as to i386-nx-emulation.patch, a separate subject.
Again, we would not mind one bit if you wanted to clean it up and maintain it on some tip tree. With or without that, it could use a little more cleanup (it still has the exec-shield boot/sysctl parameter that should be replaced with something more sensibly-named and cleanly implemented for just suppressing the segmentation hack). Off hand, I can't say I'm likely to bother with it. (I could be more easily convinced to put any time into it if it were ever going to be merged upstream.) Perhaps someone like Kees will want to bother, to make it a cleaner version we'd share across Ubuntu and Fedora. I figure that eventually some Fedora release cycle will stop supporting non-PAE hardware anyway and/or officially just not care about maximal exploit mitigation for non-PAE or ancient hardware. So one day we'll just drop that patch.
Thanks, Roland
Hi,
On Fri, Jul 09, 2010 at 02:03:26PM -0700, Roland McGrath wrote:
and Fedora. I figure that eventually some Fedora release cycle will stop supporting non-PAE hardware anyway and/or officially just not care about maximal exploit mitigation for non-PAE or ancient hardware. So one day we'll just drop that patch.
Yeah, Ubuntu is in a similar situation, but I fear it's still several years out.
It seems like the patches you've got still don't have the brk collision fix I sent[1] a while back? (Looks like the va_randomize fix was done, though.)
Also, it looks like the ASLR is seriously flawed. In actual testing, the ASLR in this patch set is extremely predictable due to how it does the reordering, which actually reduces its entropy. :( I haven't worked out a good way to fix it yet, though, but I suspect doing a base offset like is done in mainline is the way to go, though the range is so tiny, I'm not sure how to best deal with it. Maybe wrap around in the SHLIB_BASE through 0x08000000 range? Anyway, running "ldd $(which mysql)" 1000 times sometimes shows libc in the same place almost 500 of those times.
Regardless, having a branch rebased on upstream linux would be nice. I've got one here at the moment: http://kernel.ubuntu.com/git?p=kees/linux-2.6.git;a=shortlog;h=refs/heads/nx...
It looks like you've added a few more CONFIG_X86_32 checks, but not as many as I've got still. Have you got any feedback on the patches I'm carrying here?
Thanks,
-Kees
[1] first hunk of http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-maverick.git;a=commitdiff;h=07c...
Yeah, Ubuntu is in a similar situation, but I fear it's still several years out.
Yeah, well, I said "some Fedora cycle" and it may be 2*several of those too. We aren't really the people who decide when to narrow the target set of CPUs.
It seems like the patches you've got still don't have the brk collision fix I sent[1] a while back? (Looks like the va_randomize fix was done, though.)
I missed that one. I don't know if it's been reported to us as a bug. TBH, I had really not wanted to look into any of the details of this before we at least nominally cleaned it up and split it into its appropriate two topic patches. I've only just done that in the last week.
Also, it looks like the ASLR is seriously flawed. In actual testing, the ASLR in this patch set is extremely predictable due to how it does the reordering, which actually reduces its entropy. :( I haven't worked out a good way to fix it yet, though, but I suspect doing a base offset like is done in mainline is the way to go, though the range is so tiny, I'm not sure how to best deal with it. Maybe wrap around in the SHLIB_BASE through 0x08000000 range? Anyway, running "ldd $(which mysql)" 1000 times sometimes shows libc in the same place almost 500 of those times.
I've never thought much about the details of the randomization algorithms and I don't much intend to. Ingo hatched all that and I'd be happy to see the two of you hash out something that makes more sense.
In our new split, 32bit-mmap-exec-randomization.patch contains all that logic. A better/cleaner rewrite of that would be great, whatever the details you and Ingo think work. (It seems like it should be doable in ways clean enough that upstream should take it, too. I said some details about that here earlier.) That's where the brk change belongs to, though I leave it to you and Ingo to decide how those two should each be and fit together. (I'm happy to help with making it clean and all, but I just don't have any input to offer on the address selection logic itself.)
Ingo tells us an algorithm akin to this one is desireable for any process with 32-bit pointers. So it really is entirely separate from NX emulation per se, and should not be conditional on CONFIG_X86_32 at all. (When it gets cleaned up, it might not even want to be x86-specific.)
Regardless, having a branch rebased on upstream linux would be nice. I've got one here at the moment: http://kernel.ubuntu.com/git?p=kees/linux-2.6.git;a=shortlog;h=refs/heads/nx...
Not all that rebased...for some reason git log --no-merges v2.6.35-rc5...kees/nx-emu shows a lot. But $ git describe `git merge-base v2.6.35-rc5 kees/nx-emu` v2.6.35-rc3-262-g984bc96 and git log v2.6.35-rc3-262-g984bc96..kees/nx-emu shows just the three patches.
It looks like you've added a few more CONFIG_X86_32 checks, but not as many as I've got still. Have you got any feedback on the patches I'm carrying here?
Um, they look nearly identical to how our execshield.patch was before the clean up and split I just did recently. Aside from the cruft that I just took out that you still have, you have the brk change we don't have, but you also have #ifdef CONFIG_X86_32'd the randomization change, which we don't think is the desireable change. Not to be too transparently lazy, but I'd like it better if you started from our current two patches instead. ;-)
Thanks, Roland
On Tue, Jul 13, 2010 at 05:07:53PM -0700, Roland McGrath wrote:
It seems like the patches you've got still don't have the brk collision fix I sent[1] a while back? (Looks like the va_randomize fix was done, though.)
I missed that one. I don't know if it's been reported to us as a bug. TBH, I had really not wanted to look into any of the details of this before we at least nominally cleaned it up and split it into its appropriate two topic patches. I've only just done that in the last week.
I don't know if it's been reported in RH, but when I first triaged the problem in Ubuntu, I did check Fedora and I saw it there too. I probably should have opened a bug (sorry about that) -- I ended up forwarding the patch instead to Dave Jones and Kyle McMartin.
In our new split, 32bit-mmap-exec-randomization.patch contains all that logic. A better/cleaner rewrite of that would be great, whatever the details you and Ingo think work. (It seems like it should be doable in ways clean enough that upstream should take it, too. I said some details about that here earlier.) That's where the brk change belongs to, though I leave it to you and Ingo to decide how those two should each be and fit together. (I'm happy to help with making it clean and all, but I just don't have any input to offer on the address selection logic itself.)
Cool; it'll probably be a few months before I have time to really start trying new ways to deal with it.
Ingo tells us an algorithm akin to this one is desireable for any process with 32-bit pointers. So it really is entirely separate from NX emulation per se, and should not be conditional on CONFIG_X86_32 at all. (When it gets cleaned up, it might not even want to be x86-specific.)
Well, the existing ASLR infrastructure is actually pretty good -- if we could eke out even more entropy that'd be nice. One of our ARM kernel guys just recently finished up getting ASLR working there, and that'll be getting upstreamed soon.
Regardless, having a branch rebased on upstream linux would be nice. I've got one here at the moment: http://kernel.ubuntu.com/git?p=kees/linux-2.6.git;a=shortlog;h=refs/heads/nx...
Not all that rebased...for some reason git log --no-merges v2.6.35-rc5...kees/nx-emu shows a lot. But $ git describe `git merge-base v2.6.35-rc5 kees/nx-emu` v2.6.35-rc3-262-g984bc96 and git log v2.6.35-rc3-262-g984bc96..kees/nx-emu shows just the three patches.
Git confuses me to no end. Is my branch not on Linus's HEAD?
It looks like you've added a few more CONFIG_X86_32 checks, but not as many as I've got still. Have you got any feedback on the patches I'm carrying here?
Um, they look nearly identical to how our execshield.patch was before the clean up and split I just did recently. Aside from the cruft that I just took out that you still have, you have the brk change we don't have, but you also have #ifdef CONFIG_X86_32'd the randomization change, which we don't think is the desireable change. Not to be too transparently lazy,
Ah, yeah, that's where I was looking -- I prefer to use the upstream ASLR when we don't have to pin the libraries in the "below 0x08000000" range. I.e. we only use the weaker ASLR when we're forced to use the nx-emu too.
but I'd like it better if you started from our current two patches instead. ;-)
Sure, I can do that -- do you want to push a git tree with those patches, and I can work off that and send you stuff to merge?
Thanks,
-Kees
Well, the existing ASLR infrastructure is actually pretty good -- if we could eke out even more entropy that'd be nice. One of our ARM kernel guys just recently finished up getting ASLR working there, and that'll be getting upstreamed soon.
Ingo says the important point of the PROT_EXEC handling is to choose a 32-bit address that has some whole byte of 0.
Git confuses me to no end. Is my branch not on Linus's HEAD?
At the moment Linus's HEAD is v2.6.35-rc5, so no, it's not. It's on v2.6.35-rc3-262-g984bc96, which was Linus's HEAD two weeks ago. It's some git mystery I also don't follow why the ... log syntax doesn't look clean. But upstream..kees/nx-emu does look clean. So go figure. Ra ra git. Anyway, if you're trying to keep it a simple history, a fresh 'git rebase' should do it.
Ah, yeah, that's where I was looking -- I prefer to use the upstream ASLR when we don't have to pin the libraries in the "below 0x08000000" range. I.e. we only use the weaker ASLR when we're forced to use the nx-emu too.
Ingo's plan is intended to be "better" because having a 0 byte in the address is inherently helpful for defanging exploits in a qualitative sense different from simple measurement of how much randomness there is in the address selection. It is not related to NX emulation.
When NX emulation is actually taking place, it's furthermore desireable to put all PROT_EXEC mappings at addresses lower than all non-PROT_EXEC mappings. So that (i.e. !(__supported_pte_mask & _PAGE_NX) tests) could perhaps influence what randomization details you want to use. But that is really the secondary concern for us.
but I'd like it better if you started from our current two patches instead. ;-)
Sure, I can do that -- do you want to push a git tree with those patches, and I can work off that and send you stuff to merge?
Frankly, I'm more interested in you and Ingo agreeing on new randomization behavior for executable mappings and seeing if we can't get that (or at least configurability that makes it an option) in upstream.
But since you asked and it was easy:
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland.git
Two branches:
fedora/x86-nx-emulation
Ingo thinks this will never go upstream. That being the case, it's more or less fine as it stands. The only cleanup it really needs is to replace the exec-shield sysctl/boot parameter with something more sensically named. The only function of it is to suppress the NX emulation on a machine that would otherwise do it (i.e. has no NX).
Changing the sysctl only affects new processes, since it just causes "arch_add_exec_range(current->mm, -1);". But disabling this at boot time means it can leave sysenter enabled, which exec-shield=0 now does. Once you've done that, you can't really usefully enable the sysctl later, since you'll still be using sysenter, which breaks the segmentation every time. So perhaps you should not be able to reenable it by sysctl at all.
I don't really know why anyone might want to disable it, except for bugs in it. So it probably doesn't matter much what knobs we leave here. But the current one is stupid.
fedora/32bit-mmap-exec-randomization
This can probably do with being rewritten from scratch, in fact. I said before that I think the clean change, and the approach that I'd prefer if I were upstream, is to just adjust the get_unmapped_area function and hooks throughout to take the prot argument, rather than adding the second hook. Whatever the cleaned-up version of executable mapping logic is, it really belongs in a generic version of the function, it's desireable for any 32-bit process. We should do a little jiggery so that sys_x86_64.c's function can call that generic code instead of duplicating it wholesale by cut&paste as it is now. If you and Ingo can agree on what the actual address selection algorithm should be for 32-bit PROT_EXEC mappings, I can be talked into doing all the glue and cleanup juggling around it.
Thanks, Roland
kernel@lists.fedoraproject.org