[Fedora-i18n-bugs] [Bug 537753] glibc: wcscoll/strcoll needs to check if the char is out of the locale collation.

bugzilla at redhat.com bugzilla at redhat.com
Fri Nov 20 09:21:32 UTC 2009




https://bugzilla.redhat.com/show_bug.cgi?id=537753





--- Comment #12 from fujiwara <tfujiwar at redhat.com>  2009-11-20 04:21:31 EDT ---
Created an attachment (id=372439)
 --> (https://bugzilla.redhat.com/attachment.cgi?id=372439)
Patch for glibc/string/strcoll_l.c, glibc/string/strxfrm_l.c,
glibc/wcsmbs/wcscoll_l.c, glibc/wcsmbs/wcsxfrm_l.c

This patch is an alternative solution.

If a char is not defined in glibc/localedata/locales/ja_JP,
__collidx_table_lookup() returns 0.
If we could use __collseq_table_lookup() instead, it would return the maximum
value for an undefined char.
But I think we need to use __collidx_table_lookup() for wcscoll(), since the
size of the locale collation is unclear.
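To make the two return conventions concrete, here is a toy model in C. It is not glibc code: toy_collidx_lookup() and toy_collseq_lookup() are hypothetical stand-ins that only mimic the miss behaviour described above (0 versus the maximum value), using an invented three-entry table in which U+0000 happens to sit at index 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ja_JP-like table (NOT glibc's format): the position of
   each code point is its index, and U+0000 is defined at index 0. */
static const uint32_t toy_defined[] = { 0x0000, 0x0061, 0x3042 };
#define TOY_N (sizeof toy_defined / sizeof toy_defined[0])

/* collidx-style miss behaviour: an undefined char yields 0. */
static uint32_t toy_collidx_lookup(uint32_t wc)
{
  for (size_t i = 0; i < TOY_N; i++)
    if (toy_defined[i] == wc)
      return (uint32_t) i;
  return 0;
}

/* collseq-style miss behaviour: an undefined char yields the maximum. */
static uint32_t toy_collseq_lookup(uint32_t wc)
{
  for (size_t i = 0; i < TOY_N; i++)
    if (toy_defined[i] == wc)
      return (uint32_t) i;
  return UINT32_MAX;
}

/* An undefined Korean char (U+AC00) is indistinguishable from U+0000
   under the first convention, but not under the second. */
static void toy_check_lookups(void)
{
  assert(toy_collidx_lookup(0xAC00) == toy_collidx_lookup(0x0000));
  assert(toy_collseq_lookup(0xAC00) == UINT32_MAX);
  assert(toy_collseq_lookup(0x0000) == 0);
}
```

With the first convention the caller cannot tell "undefined" from "U+0000"; with the second the miss is unambiguous.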

The problem is what happens when we receive 0: U+0 is actually defined in
LC_COLLATE of glibc/localedata/locales/ja_JP, so undefined chars end up being
treated like U+0 and are always collated before defined chars in wcscoll().

For example, if a is an ASCII char, b is a Japanese char and c is a Korean
char, the collation on ja_JP.UTF-8 would be c < a < b, since U+0 is defined in
the ja_JP file.
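The ordering problem can be reproduced with a small toy model. The names and the table layout below are hypothetical, not glibc's data structures: the lookup returns index 0 for any undefined char, so in this model the undefined Korean char picks up the weight that belongs to U+0000, which is the smallest one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ja_JP-like table: toy_chars[i] has weight toy_weight_tab[i];
   U+0000 is defined at index 0 with the lowest weight. */
static const uint32_t toy_chars[]      = { 0x0000, 0x0061, 0x3042 };
static const uint32_t toy_weight_tab[] = { 0,      10,     20     };
#define TOY_N (sizeof toy_chars / sizeof toy_chars[0])

/* Returns the index of wc, or 0 when wc is undefined in the locale. */
static size_t toy_findidx(uint32_t wc)
{
  for (size_t i = 0; i < TOY_N; i++)
    if (toy_chars[i] == wc)
      return i;
  return 0;
}

/* Weight lookup as it behaves before the fix: index 0 is used as-is. */
static uint32_t toy_weight(uint32_t wc)
{
  return toy_weight_tab[toy_findidx(wc)];
}

/* a = 'a' (ASCII), b = U+3042 (Japanese), c = U+AC00 (Korean, undefined):
   the undefined char borrows U+0000's weight, giving c < a < b. */
static void toy_check_order(void)
{
  assert(toy_weight(0xAC00) < toy_weight(0x0061));
  assert(toy_weight(0x0061) < toy_weight(0x3042));
}
```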

However, the ja_JP file also defines "UNDEFINED" in LC_COLLATE, and UNDEFINED
chars should be collated last.
But the keyword "UNDEFINED" seems to be used only by the localedef program.
When wcscoll() runs, we don't know which index of weight[] holds the UNDEFINED
value.
So my solution is: if wcscoll() receives 0 from findidx(), it uses
USTRING_MAX instead of weight[].
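A sketch of the proposed fix in the same kind of toy model. TOY_USTRING_MAX is a hypothetical stand-in for USTRING_MAX, and the table is invented for illustration; the only point is that an index of 0 from the lookup is mapped to the largest weight instead of being used to index the weight table. This also moves U+0000 itself to the end, which should not matter for NUL-terminated strings because U+0000 never occurs inside them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_USTRING_MAX UINT32_MAX   /* hypothetical stand-in for USTRING_MAX */

/* Hypothetical ja_JP-like table, as in a plain index/weight layout. */
static const uint32_t toy_chars[]      = { 0x0000, 0x0061, 0x3042 };
static const uint32_t toy_weight_tab[] = { 0,      10,     20     };
#define TOY_N (sizeof toy_chars / sizeof toy_chars[0])

/* Returns the index of wc, or 0 when wc is undefined in the locale. */
static size_t toy_findidx(uint32_t wc)
{
  for (size_t i = 0; i < TOY_N; i++)
    if (toy_chars[i] == wc)
      return i;
  return 0;
}

/* Proposed behaviour: index 0 is read as "undefined", which sorts last. */
static uint32_t toy_weight_fixed(uint32_t wc)
{
  size_t idx = toy_findidx(wc);
  return idx == 0 ? TOY_USTRING_MAX : toy_weight_tab[idx];
}

static void toy_check_fixed_order(void)
{
  /* a < b < c: the undefined Korean char now collates after both. */
  assert(toy_weight_fixed(0x0061) < toy_weight_fixed(0x3042));
  assert(toy_weight_fixed(0x3042) < toy_weight_fixed(0xAC00));
}
```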

In the zh_CN file, U+0 is not defined. Even so, undefined chars are always
collated before defined chars in wcscoll(), because the following line affects
the result:

   result = seq1len == 0 ? -1 : 1;

Here seq1len is 0, but the string is not shorter than the other one; it
actually contains a char that is not defined in the locale collation.
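The effect of that line can be shown with a toy comparison loop. Everything below is hypothetical (a zh_CN-like toy where each defined char has a one-element weight sequence and an undefined char gets a sequence of length 0); only the quoted expression seq1len == 0 ? -1 : 1 comes from the code discussed above. toy_coll_orig() reads a zero-length sequence as "this string has ended", while toy_coll_fixed() checks the terminator explicitly and gives undefined chars the maximum weight, in the spirit of the proposed change.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical toy: defined chars get a one-element weight sequence;
   an undefined char (and the terminator) gets length 0. */
static size_t toy_seqlen(wchar_t c, unsigned int *w)
{
  switch (c)
    {
    case L'a': *w = 10; return 1;
    case L'b': *w = 20; return 1;
    default:   *w = 0;  return 0;
    }
}

/* Mirrors the quoted line: a zero-length sequence is taken to mean
   "this string has ended", so that string is reported as smaller. */
static int toy_coll_orig(const wchar_t *s1, const wchar_t *s2)
{
  for (;; s1++, s2++)
    {
      unsigned int w1, w2;
      size_t seq1len = toy_seqlen(*s1, &w1);
      size_t seq2len = toy_seqlen(*s2, &w2);
      if (seq1len == 0 || seq2len == 0)
        {
          if (seq1len == seq2len)
            return 0;
          return seq1len == 0 ? -1 : 1;
        }
      if (w1 != w2)
        return w1 < w2 ? -1 : 1;
    }
}

/* Variant: test the terminator explicitly and give an undefined char the
   maximum weight, so it no longer looks like a shorter string. */
static int toy_coll_fixed(const wchar_t *s1, const wchar_t *s2)
{
  for (;; s1++, s2++)
    {
      if (*s1 == L'\0' || *s2 == L'\0')
        {
          if (*s1 == *s2)
            return 0;
          return *s1 == L'\0' ? -1 : 1;
        }
      unsigned int w1, w2;
      if (toy_seqlen(*s1, &w1) == 0)
        w1 = UINT_MAX;
      if (toy_seqlen(*s2, &w2) == 0)
        w2 = UINT_MAX;
      if (w1 != w2)
        return w1 < w2 ? -1 : 1;
    }
}

/* An undefined CJK char (U+4E00) compared against 'a'. */
static void toy_check_coll(void)
{
  assert(toy_coll_orig(L"\x4E00", L"a") == -1);  /* looks "shorter" */
  assert(toy_coll_fixed(L"\x4E00", L"a") == 1);  /* now sorts last */
}
```

A genuinely shorter string ("a" versus "ab") still compares as smaller in both versions.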

I have modified this part.

It would probably be good for wcscoll() to follow the 'UNDEFINED' keyword in
the locale collation file, and I think 'UNDEFINED' should be placed at the end
of LC_COLLATE.
