On 15 December 2016 at 21:17, Toshio Kuratomi <a.badger(a)gmail.com> wrote:
On Mon, Dec 12, 2016 at 1:39 AM, Nick Coghlan
<ncoghlan(a)gmail.com> wrote:
> I don't anticipate any major concerns with downstream redistributors
> adding this behaviour, as the main thing that makes us nervous about
> globally changing the default upstream is the sheer variety of Linux
> distros out there, and the fact that folks are inclined to take their
> Linux integration bugs straight to bugs.python.org rather than first
> trying the issue tracker for their particular distro.
>
My one concern is precisely this variety. For instance, if I get a
report that my application is raising a UnicodeError on RHEL7 when run
under cron (which uses the C locale) I might then try to replicate the
error on Fedora using the same LC_ALL=C locale. With this change I
would fail to reproduce the error.
But with the current patch you *would* get a visible warning on stderr saying:
Python detected LC_CTYPE=C. Setting LC_ALL & LANG to C.UTF-8.
This is a variation on arguments about why individual sites should not
change the default encoding via sitecustomize.py. The changes tend to
make Python applications non-portable. I don't think it is as severe
because we're still able to broadly classify things as "Fedora Python"
vs "Upstream Python" (instead of "Python running at My Business" vs
"Python running on the rest of the world"), but it is still problematic.
Agreed, and my original idea upstream included an environment variable
override to account for that case:
http://bugs.python.org/issue28180#msg282964
I just forgot about that bit while writing the initial patch :(
As documented at
https://docs.python.org/3/using/cmdline.html#environment-variables the
normal convention for Python environment variable toggles is "A
non-empty string setting enables it", so the name I'd suggest here is
PYTHONALLOWCLOCALE.
The error message would then change to:
Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 (set
PYTHONALLOWCLOCALE to disable this behaviour)
and if the environment variable is already set:
Python detected LC_CTYPE=C, but PYTHONALLOWCLOCALE is set. Some
applications may not work correctly.
Does that approach seem more reasonable than unilateral locale
coercion with no off switch?
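
As a rough illustration of that decision logic (a Python sketch only: the real check would live in C in the interpreter's startup code, and the helper name `coerce_c_locale` and its return shape are hypothetical), the three cases could look like this:

```python
def coerce_c_locale(ctype_locale, environ):
    """Model the proposed check; return (env_updates, warning).

    Hypothetical helper for illustration. `ctype_locale` is the effective
    LC_CTYPE locale name, `environ` is a mapping like os.environ.
    """
    if ctype_locale != "C":
        # Nothing to do: the locale was not the plain C locale.
        return {}, None
    if environ.get("PYTHONALLOWCLOCALE"):
        # Usual convention: any non-empty string enables the toggle.
        return {}, (
            "Python detected LC_CTYPE=C, but PYTHONALLOWCLOCALE is set. "
            "Some applications may not work correctly."
        )
    return {"LC_ALL": "C.UTF-8", "LANG": "C.UTF-8"}, (
        "Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 "
        "(set PYTHONALLOWCLOCALE to disable this behaviour)"
    )


# Example: C locale, no override set, so coercion is requested.
updates, warning = coerce_c_locale("C", {})
print(updates)
print(warning)
```

An empty PYTHONALLOWCLOCALE value is treated the same as an unset one here, matching the "non-empty string enables it" convention.
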
OTOH, if this is a stepping stone and proving ground for getting it
into upstream Python then we just get this change a little early...
that's, IMHO, a good thing.
Yeah, my goal is to standardise this upstream for 3.7, but I expect
folks to be more willing to make it the default behaviour on *nix
systems if at least some distros are willing to try it out in their
releases of 3.6 first.
Perhaps what's needed is a locale on Fedora that allows people to
select an ASCII encoding for Python which does not coincide with the C
locale. This should satisfy both the case you mention, where *most* of
the time the C locale is not a conscious choice of the ASCII encoding,
and, as I'm pointing out, the need to select an ASCII-only encoding for
debugging cross-platform scripts and applications.
As in an explicit "LANG=C.ASCII"? While I agree that would work, it's
probably more complexity than is needed vs a dedicated off switch for
the locale coercion.
On the other hand, if *glibc* were to some day start natively
interpreting "no locale set" or an unqualified "C" locale as
"C.UTF-8", then I agree a "C.ASCII" locale to explicitly opt in to the
old behaviour would make sense.
[..]
> As far as where we might add that check, I'd suggest the entry point
> for the `python3` binary itself, rather than in the shared library:
>
> https://hg.python.org/cpython/file/3.6/Programs/python.c#l46
>
I think the library is the appropriate place. Otherwise you end up
with a python application failing when run under mod_wsgi[*]_ which
you can't debug using the command line interpreter.
There's one pragmatic problem with that, and one that's a question of
appropriate division of responsibilities in terms of understanding the
runtime's context of use.
The pragmatic problem is that the main CPython binary calls
https://docs.python.org/3/c-api/sys.html#c.Py_DecodeLocale to convert
the command line arguments from char* to wchar_t* before it calls
Py_Main, which means we have to override the locale *before* we hand
over control to the dynamically linked library. Otherwise we end up in
exactly the same situation that click complains about: by the time we
find out there's a problem with the locale, some work has already been
done using the wrong setting.
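
To make the ordering issue concrete, here is a small Python model of it (not the C code itself: `decode_argv` is a hypothetical stand-in for what Py_DecodeLocale does, using the surrogateescape error handler that the C API docs describe). The same argv bytes decode differently depending on which locale encoding is active at the time:

```python
def decode_argv(raw, locale_encoding):
    """Hypothetical stand-in for Py_DecodeLocale on one argv entry."""
    return raw.decode(locale_encoding, errors="surrogateescape")


raw_arg = "café".encode("utf-8")  # bytes as they would arrive in char* argv

under_c_locale = decode_argv(raw_arg, "ascii")  # locale not yet overridden
under_c_utf8 = decode_argv(raw_arg, "utf-8")    # locale overridden first

print(ascii(under_c_locale))  # 'caf\udcc3\udca9'
print(ascii(under_c_utf8))    # 'caf\xe9'
```

The first result still round-trips back to the original bytes, but it is not the text the user typed, which is why the override has to happen before the arguments are decoded rather than after.
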
The architectural problem is that when you embed CPython, it really is
one of the embedding application's responsibilities to configure the
locale such that the interpreter plays nice with the rest of the
application. It's one thing to second guess the shell from directly
inside a C-level main() function when we know POSIX makes some really
old ASCII-centric assumptions and that developers are prone to writing
"LANG=C" rather than "LANG=C.UTF-8" to turn off their locale settings,
but something else entirely to second guess a GUI application like
Blender (where arbitrary amounts of code may have already run before
the CPython runtime gets initialised) or an application platform with
its own environment management system like Apache httpd.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia