Hi,
While troubleshooting this bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=122304
I found something that could be a problem for Python on x86_64.
Try this on x86_64:
foo='www.redhat.com'
foo.encode("idna")
Depending on which encoding is set, you'll get either the correct result:
www.redhat.com
or
www.redhat..om
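For a quick pass/fail check on a given machine, something like the following works. It's a minimal sketch written for Python 3 (an assumption: the report itself concerns Python 2.3.3, where encode("idna") returns a str rather than bytes and print is a statement).

```python
# Written for Python 3; on Python 2.3 encode("idna") returns a str, not bytes.
host = "www.redhat.com"
encoded = host.encode("idna")

# An all-ASCII hostname should pass through unchanged, dots included.
if encoded == b"www.redhat.com":
    print("ok:", encoded)
else:
    print("BROKEN:", encoded)
```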
Looking at line 6 of /usr/lib64/python2.3/encodings/idna.py, we find:
dots = re.compile(u"[\u002E\u3002\uFF0E\uFF61]")
That works fine on x86. A little further down, on line 153, you see:
labels = dots.split(input)
Here the input is a hostname like the one above.
So try this bit of code on your own x86_64 Python 2.3.3 system:
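To make the split step concrete, here's a rough Python 3 sketch of what lines 6 and 153 do together. `split_labels` is a made-up name for illustration, not a function in idna.py.

```python
import re

# Same character class as line 6 of encodings/idna.py: the four label
# separators (full stop, ideographic full stop, fullwidth full stop,
# halfwidth ideographic full stop).
dots = re.compile("[\u002E\u3002\uFF0E\uFF61]")

def split_labels(hostname):
    """Hypothetical helper mirroring `labels = dots.split(input)`."""
    return dots.split(hostname)

print(split_labels("www.redhat.com"))
```

On a working interpreter this should print the three labels www, redhat and com.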
import re
dots = re.compile(u"[\u002E\u3002\uFF0E\uFF61]")
foo = 'www.redhat.com'
labels = dots.split(foo)
print labels
You'll find the result is:
['www.redhat.', 'om']
while on x86 it is:
['www', 'redhat', 'com']
which is correct: three labels, as RFC 3490 expects.
So I went looking for the problem a bit and found this in _sre.c:
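RFC 3490 (section 3.1) requires all four characters in that character class to be recognized as label separators, so a fuller check on a suspect machine might try each of them. Again a Python 3 sketch; the sample hostnames are made up for illustration.

```python
import re

dots = re.compile("[\u002E\u3002\uFF0E\uFF61]")

# Each sample uses a different separator; all should give the same labels.
samples = [
    "www.redhat.com",            # U+002E FULL STOP
    "www\u3002redhat\u3002com",  # U+3002 IDEOGRAPHIC FULL STOP
    "www\uFF0Eredhat\uFF0Ecom",  # U+FF0E FULLWIDTH FULL STOP
    "www\uFF61redhat\uFF61com",  # U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP
]
results = [dots.split(s) for s in samples]

for s, labels in zip(samples, results):
    print(repr(s), "->", labels)
```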
#if defined(MS_WIN64) || defined(__LP64__) || defined(_LP64)
/* require smaller recursion limit for a number of 64-bit platforms:
* Win64 (MS_WIN64), Linux64 (__LP64__), Monterey (64-bit AIX) (_LP64)
*/
/* FIXME: maybe the limit should be 40000 / sizeof(void*) ? */
#define USE_RECURSION_LIMIT 7500
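For what the FIXME would mean in numbers, here's the arithmetic as a small Python 3 sketch. `proposed_limit` is a hypothetical helper for illustration, not anything in _sre.c.

```python
import ctypes

def proposed_limit(pointer_size):
    # The FIXME's formula, 40000 / sizeof(void*), using integer division
    # as C would for these operands.
    return 40000 // pointer_size

current_64bit_limit = 7500  # USE_RECURSION_LIMIT in the 64-bit branch above

for size in (4, 8):
    print("sizeof(void*) =", size, "->", proposed_limit(size))

# Pointer size of the interpreter running this script.
print("this build:", ctypes.sizeof(ctypes.c_void_p))
```

With 8-byte pointers that works out to 5000, so taken literally the FIXME would lower the 64-bit limit below the current 7500 rather than raise it.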
I'm wondering whether that FIXME is accurate. I haven't tested the change
yet, but it looks like a potential problem for regexes like this one, or,
more to the point, for anything using the HTTPHandler in Python.
Can someone more experienced with Python's _sre internals take a look at
this?
This will most likely affect up2date, yum, and many other network-facing
Python applications that use HTTP.
Thanks
-sv