Hi folks,
The long discussions upstream regarding the default behaviour of Python's os.urandom() API in Python 3.6 have come to an end, with Guido opting to make os.urandom() block implicitly waiting for the system RNG to be ready, even on Linux (where /dev/urandom doesn't do that).
While I can see his point from a cross-platform language runtime design perspective, I'm still not sure it's the right answer for the Fedora system Python, as it doesn't play nice with ABRT, and goes against Ted Ts'o's decision to keep /dev/urandom non-blocking at the kernel level (risking confusion for folks that are Linux developers & operators first, and Pythonistas second).
If "os.urandom() was called before system RNG is ready" throws an uncaught BlockingIOError, then we'll get a nice ABRT-friendly Python stack trace for people to debug (and perhaps figure out their VM entropy pool isn't being seeded properly, or their ARM hardware design needs a better source of entropy). By contrast, if it just blocks, then folks will be faced with a system hang, which they'll need to trace back to CPython being blocked on a kernel getrandom() call, and then infer from that that the system RNG isn't ready (for whatever reason), and they should probably do something about that.
Patching this behaviour should be relatively straightforward - Python 3.5 currently makes this call in non-blocking mode and falls back to reading /dev/urandom directly in that case, so we'd just be forward porting the same logic to 3.6 and raising an exception instead of falling back to the file descriptor.
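As a rough illustration of the behaviour being proposed (not the actual patch, which would live in CPython's C code), the logic can be sketched on top of the os.getrandom() API that Python 3.6 adds on Linux. The helper name strict_urandom and its getrandom parameter are made up for this example; the parameter exists only so the "RNG not ready" case can be simulated.

```python
import os


def strict_urandom(size, getrandom=None):
    """Return random bytes, failing loudly if the system RNG is not ready.

    Hypothetical helper: `getrandom` defaults to os.getrandom (Linux,
    Python 3.6+) and can be replaced to simulate an unseeded entropy pool.
    """
    if getrandom is None:
        getrandom = getattr(os, "getrandom", None)
    if getrandom is None:
        # No getrandom() wrapper on this platform: keep the stock behaviour.
        return os.urandom(size)
    # 0x0001 is the Linux value of GRND_NONBLOCK; the default is only used
    # when the call is simulated on a platform without os.GRND_NONBLOCK.
    flags = getattr(os, "GRND_NONBLOCK", 0x0001)
    try:
        # Note: for large sizes getrandom() may return fewer bytes than
        # requested; a real implementation would loop.
        return getrandom(size, flags)
    except BlockingIOError:
        # Raise instead of silently falling back to reading /dev/urandom.
        raise BlockingIOError(
            "os.urandom() was called before system RNG is ready"
        ) from None
```

Left uncaught, that exception would produce the kind of Python traceback ABRT can collect, rather than a silent hang.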
Accordingly, what I propose we do is as follows:
1. Raise the concern in the F26 system-wide change proposal for migrating to Python 3.6
2. Apply the patch when the 3.6 beta releases are added to Fedora Rawhide
3. Decide whether or not to keep the patch based on ABRT results and other feedback on the Rawhide builds.
Note that if the feedback on Rawhide shows that the change is helping people to find and diagnose VMs and hardware with improperly seeded entropy pools, that's a *good* thing: this proposed change is replacing Python 3.5's "/dev/urandom and os.urandom() may silently return statistically less-than-fully-random random numbers if the kernel entropy pool isn't seeded properly" with "os.urandom() will fail noisily in those cases, so you can either switch to the random module, or fix your entropy pool seeding".
Cheers, Nick.
On 8 August 2016 at 13:10, Nick Coghlan <ncoghlan@gmail.com> wrote:
> Accordingly, what I propose we do is as follows:
> 1. Raise the concern in the F26 system-wide change proposal for migrating to Python 3.6
> 2. Apply the patch when the 3.6 beta releases are added to Fedora Rawhide
> 3. Decide whether or not to keep the patch based on ABRT results and other feedback on the Rawhide builds.
Elaborating a bit further on the nature of the proposed Rawhide experiment, the cases we're trying to distinguish are:
- it doesn't really matter, because it doesn't happen much (few or no ABRT hits)
- it happens, but blocking briefly resolves it (a preceding "python -c 'import os; os.getrandom(1)'" eliminates the exception)
- it happens, and blocking causes an indefinite hang (a preceding "python -c 'import os; os.getrandom(1)'" never completes)
If the feedback from Rawhide builds with the patch applied all falls into the first two categories, then we should drop the patch before F26 Beta and stick with the upstream behaviour of a cross-platform blocking os.urandom() implementation, with folks that want to opt-in to non-blocking behaviour pointed to the new os.getrandom() API.
It's only if we get a significant number of bug reports that fall into the third category that we'd consider keeping the patch, as those are the ones where blocking implicitly won't help, and in fact may make a system level configuration problem harder to diagnose.
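To make the second and third categories easier to tell apart when triaging a report, the blocking probe could be wrapped in a timeout. This is only a sketch: the function name, the 30 second default and the three result labels are arbitrary choices for illustration, and "hang" just means "no answer within the timeout".

```python
import subprocess
import sys

# The same one-liner as in the list above, run in a child process.
PROBE = [sys.executable, "-c", "import os; os.getrandom(1)"]


def classify_rng_probe(timeout=30.0, command=PROBE):
    """Classify the blocking probe as 'ready', 'hang' or 'error'.

    'ready': the blocking call returned, so waiting briefly is enough.
    'hang' : no result within `timeout` seconds.
    'error': the probe itself failed, e.g. os.getrandom() is unavailable.
    """
    try:
        result = subprocess.run(
            command,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return "hang"
    return "ready" if result.returncode == 0 else "error"
```

A "ready" result after an earlier BlockingIOError would point at the second category, while a "hang" would point at the third.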
Cheers, Nick.
On 08/08/2016 06:09 AM, Nick Coghlan wrote:
> It's only if we get a significant number of bug reports that fall into the third category that we'd consider keeping the patch, as those are the ones where blocking implicitly won't help, and in fact may make a system level configuration problem harder to diagnose.
Hi, I agree with doing this experiment.
Two notes that should appear in the Change page:
This modification should only affect code that runs early in system boot. It is quite a special case, and Fedora is in a special position with some fairly low-level parts of the system written in Python.
The raised BlockingIOError should advertise that it is Fedora-specific behavior, e.g. with a link to the Change page.
On 8 August 2016 at 20:04, Petr Viktorin <pviktori@redhat.com> wrote:
> The raised BlockingIOError should advertise that it is Fedora-specific behavior, e.g. with a link to the Change page.
Writing my posts here reminded me of an idea that came up earlier in the upstream discussion, which was to emit a runtime warning when os.urandom() is forced to block waiting for the system RNG, so I proposed that as a tweak to the way PEP 524 will be implemented: https://mail.python.org/pipermail/security-sig/2016-August/000105.html
That way it can be blocking by default (as specified by PEP 524), but made to raise an error just by setting PYTHONWARNINGS appropriately, *or* patched at build time to add that to the default set of filters in Python/_warnings.c (init_filters). (We could also add it to Lib/warnings.py, but I'm not sure I see any point to doing that, since we never build Python without the warnings accelerator module)
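A small model of how that would work from the Python side, using only the standard warnings machinery. The category name SystemRNGBlockingWarning and the helper urandom_with_warning are hypothetical (the thread doesn't name the real category), and the "RNG ready" state is passed in as a parameter so the sketch runs anywhere.

```python
import os
import warnings


class SystemRNGBlockingWarning(RuntimeWarning):
    """Hypothetical category; the real name is not given in this thread."""


def urandom_with_warning(size, rng_ready, read_bytes=os.urandom):
    """Model of the proposed tweak: warn, then block, if the RNG is not ready.

    `rng_ready` stands in for the kernel state and `read_bytes` for the
    blocking read; both are parameters purely for illustration.
    """
    if not rng_ready:
        warnings.warn(
            "os.urandom() is blocking until the system RNG is ready",
            SystemRNGBlockingWarning,
            stacklevel=2,
        )
    return read_bytes(size)


# With an "error" filter for the category, the warning becomes an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", SystemRNGBlockingWarning)
    try:
        urandom_with_warning(16, rng_ready=False)
    except SystemRNGBlockingWarning as exc:
        print("failed noisily:", exc)
```

For a builtin category, the same filter can be set from the environment with something like PYTHONWARNINGS=error::RuntimeWarning; a build-time patch would amount to making such a filter part of the defaults.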
Cheers, Nick.
Here's a potentially related issue – apparently Python 3.6 doesn't run on a CentOS 7 kernel (which would be an issue when running Fedora in Docker on an EL7 host, or when we try to get py3.6 in EPEL).
https://github.com/rpm-software-management/mock/issues/28
Harris, could you try to reproduce this?
On 08/08/2016 02:59 PM, Nick Coghlan wrote:
> Writing my posts here reminded me of an idea that came up earlier in the upstream discussion, which was to emit a runtime warning when os.urandom() is forced to block waiting for the system RNG, so I proposed that as a tweak to the way PEP 524 will be implemented: https://mail.python.org/pipermail/security-sig/2016-August/000105.html
On 3 January 2017 at 20:01, Petr Viktorin <pviktori@redhat.com> wrote:
> Here's a potentially related issue – apparently Python 3.6 doesn't run on a CentOS 7 kernel (which would be an issue when running Fedora in Docker on an EL7 host, or when we try to get py3.6 in EPEL).
Even in 3.6+, CPython falls back to reading /dev/urandom if the syscall fails with ENOSYS or EPERM at runtime.
What *will* fail is attempting to run in a chroot or container without access to either the getrandom syscall or the /dev/urandom device path.
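In Python terms, that fallback looks roughly like the sketch below. This is a simplified model rather than CPython's actual C code: the function name is invented, it uses the standard errno names ENOSYS and EPERM, and the getrandom and device parameters exist only so the failure modes can be simulated.

```python
import errno
import os


def urandom_with_fallback(size, getrandom=None, device="/dev/urandom"):
    """Try getrandom() first; fall back to the device on ENOSYS or EPERM.

    Hypothetical model of the fallback, not CPython's implementation.
    """
    if getrandom is None:
        getrandom = getattr(os, "getrandom", None)
    if getrandom is not None:
        try:
            return getrandom(size)
        except OSError as exc:
            # ENOSYS: the running kernel lacks the syscall.
            # EPERM: the syscall is blocked, e.g. by a seccomp policy.
            if exc.errno not in (errno.ENOSYS, errno.EPERM):
                raise
    # If the device path is missing too (bare chroot or container), open()
    # raises here and there is nothing left to fall back to.
    with open(device, "rb", buffering=0) as dev:
        return dev.read(size)
```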
Cheers, Nick.
On 4.1.2017 09:20, Nick Coghlan wrote:
> What *will* fail is attempting to run in a chroot or container without access to either the getrandom syscall or the /dev/urandom device path.
Reported today: https://bugzilla.redhat.com/show_bug.cgi?id=1410175
It is not just about python3.6; I can also see something similar with an old kernel and python35 in the latest rawhide userspace: https://bugzilla.redhat.com/show_bug.cgi?id=1410187
BTW the explanation is that the latest update of glibc in rawhide provides the functions getentropy and getrandom:
objdump -T /lib64/libc.so.6 | grep GLIBC_2.25
000000000003c6f0 g DF .text 000000000000008c GLIBC_2.25 getentropy
0000000000000000 g DO *ABS* 0000000000000000 GLIBC_2.25 GLIBC_2.25
000000000003c9a0 g DF .text 000000000000020f GLIBC_2.25 strfromd
000000000003c780 g DF .text 000000000000021f GLIBC_2.25 strfromf
000000000003cbb0 g DF .text 000000000000021f GLIBC_2.25 strfroml
00000000000a3a60 g DF .text 0000000000000013 GLIBC_2.25 explicit_bzero
000000000011f1a0 g DF .text 0000000000000025 GLIBC_2.25 __explicit_bzero_chk
000000000003c650 g DF .text 0000000000000099 GLIBC_2.25 getrandom
and the latest python3 and python35 packages use it:
sh# rpm -q python3 python35
python3-3.6.0-1.fc26.x86_64
python35-3.5.2-5.fc26.x86_64
sh# objdump -T /usr/lib64/libpython3.5m.so | grep GLIBC_2.25
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.25 getentropy
sh# objdump -T /usr/lib64/libpython3.6m.so | grep GLIBC_2.25
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.25 getentropy
An older version of python3 (3.5 at that time) didn't use it because it was compiled against an older version of glibc:
sh# objdump -T /usr/lib64/libpython3.so | grep GLIBC_2.25
sh# objdump -T /usr/lib64/libpython3.5m.so.1.0 | grep GLIBC_2.25
sh# rpm -q python3
python3-3.5.2-7.fc26.x86_64
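For a quick check from Python itself, something like the following should show whether the C library loaded into the current process exports these functions. It is only a sketch (the helper name is made up), and it says nothing about whether the running kernel implements the underlying syscall, which is the other half of the problem here.

```python
import ctypes


def libc_random_symbols(names=("getrandom", "getentropy")):
    """Report which of `names` are exported by the process's C library."""
    try:
        # CDLL(None) exposes the symbols already loaded into this process,
        # which includes libc on Linux.
        libc = ctypes.CDLL(None)
    except (OSError, TypeError):
        # Platforms where this form of lookup is not supported.
        return {name: False for name in names}
    # Attribute lookup on a CDLL raises AttributeError for unknown symbols,
    # so hasattr() is enough to test for their presence.
    return {name: hasattr(libc, name) for name in names}


print(libc_random_symbols())
```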
On 5 January 2017 at 03:42, Lukas Slebodnik <lslebodn@fedoraproject.org> wrote:
> BTW the explanation is that the latest update of glibc in rawhide provides functions getentropy and getrandom
And Victor Stinner further diagnosed that as a combined bug in CPython's conditional compilation logic where:
- getentropy was preferred over getrandom when both were available
- only the getrandom code had the ENOSYS handling needed to cope with newer binaries running on older kernels
http://bugs.python.org/issue29157 has a patch to change the logic so that getrandom is preferred over getentropy when both are available, and also to add the ENOSYS handling that getrandom already has to getentropy.
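The effect of the fix can be modelled as an ordered list of sources where ENOSYS means "try the next one". This is a toy model with invented names, not the CPython code (which does this through conditional compilation in C); the "old kernel" is simulated by stand-in functions that raise ENOSYS.

```python
import errno


def pick_random_bytes(size, sources):
    """Try each (name, func) pair in order, skipping calls the kernel lacks.

    Each func returns bytes or raises OSError. ENOSYS means the binary knows
    about the call but the running kernel does not, so the next source is
    tried; any other error is propagated.
    """
    for name, func in sources:
        try:
            return name, func(size)
        except OSError as exc:
            if exc.errno != errno.ENOSYS:
                raise
    raise OSError(errno.ENOSYS, "no usable random source")


def missing_syscall(size):
    # Stand-in for getrandom()/getentropy() on a kernel without the syscall.
    raise OSError(errno.ENOSYS, "Function not implemented")


def fake_device(size):
    # Stand-in for reading /dev/urandom; returns placeholder bytes.
    return b"\x01" * size


# New binary on an old kernel: getrandom is tried first, then getentropy,
# and both fall through to the device instead of failing outright.
order = [
    ("getrandom", missing_syscall),
    ("getentropy", missing_syscall),
    ("/dev/urandom", fake_device),
]
print(pick_random_bytes(4, order)[0])
```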
Cheers, Nick.
python-devel@lists.fedoraproject.org