https://bugzilla.redhat.com/show_bug.cgi?id=668282
--- Comment #43 from Nils Philippsen <nphilipp(a)redhat.com> 2011-10-25 10:58:14 EDT ---
(In reply to comment #42)
> I still don't understand why we need to convert strings which are already UTF-8
> byte strings to Unicode first, only to convert them back to UTF-8 right
> afterwards. (For strings which are in some non-UTF-8 byte encoding,
> txt.decode('utf-8', errors='replace') won't work anyway.)
At the beginning of the function, we don't know whether a str object is
encoded in UTF-8 or in something else. Attempting to decode it as UTF-8 and
replacing byte sequences which aren't valid UTF-8 with the replacement
character (U+FFFD, those funny question marks) is really the best we can do at
this point with the information we have.
I'd rather not "optimize" that function as you seem to suggest: if we simply
returned the same str object, it might still be encoded in something other
than UTF-8. Decoding it to unicode (with errors='replace') and re-encoding it
ensures that whatever we give back is UTF-8.
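A minimal sketch of the decode-then-re-encode approach described above. It is
written for Python 3 (bytes/str) rather than the Python 2 str/unicode types
under discussion, and the helper name to_utf8 is hypothetical, not the
function from this bug.

```python
def to_utf8(value):
    """Return `value` as UTF-8 encoded bytes (hypothetical helper).

    Byte strings are assumed to be UTF-8 but are not trusted: any invalid
    byte sequences are replaced with U+FFFD before re-encoding.
    """
    if isinstance(value, bytes):
        # Decode as UTF-8; undecodable bytes become U+FFFD instead of raising.
        value = value.decode("utf-8", errors="replace")
    # Encode the text as UTF-8. errors="replace" turns lone surrogates
    # (possible in a Python 3 str) into b"?" so this never raises.
    return value.encode("utf-8", errors="replace")


# Already-valid UTF-8 passes through unchanged.
print(to_utf8("café".encode("utf-8")))    # b'caf\xc3\xa9'
# Latin-1 input is not valid UTF-8; the stray byte becomes U+FFFD.
print(to_utf8("café".encode("latin-1")))  # b'caf\xef\xbf\xbd'
```

For input that is already valid UTF-8 the round trip yields identical bytes,
so skipping it would only save the extra decode/encode; for anything else the
result is still guaranteed to be valid UTF-8.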