[A-Z]* oddities

Mark Mielke mark at mark.mielke.cc
Mon Feb 2 20:23:05 UTC 2004


On Mon, Feb 02, 2004 at 07:04:22PM +0000, Luciano Miguel Ferreira Rocha wrote:
> On Mon, Feb 02, 2004 at 01:15:01PM -0500, Behdad Esfahbod wrote:
> > This is totally expected :).
> > For what you want try LANG=C echo [A-Z]* ...
> > In LANG=en_US.UTF-8 which is the default, both 'A' and 'a' sort
> > before 'B' and 'b'.
> But 'A' is *after* 'a'?
> And I thought utf-8 was supposed to be compatible with ascii...
> Well, if that's the standard, then it isn't broken...

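UTF-8 is an encoding scheme that stores each UNICODE character as a
sequence of one to four bytes. It is in fact compatible with ASCII: every
ASCII character (0x00-0x7F, including 0x00) is encoded as the same single
byte, so plain ASCII text is already valid UTF-8. (Only variants such as
Java's "modified UTF-8" spend two bytes on 0x00.) The encoding is not what
changed the sort order; the locale is.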
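A quick way to check this, assuming a POSIX shell with printf and od, is
to dump the bytes of a few characters. The octal escapes \303\251 are just
the UTF-8 bytes of e with an acute accent written out by hand, so the
result does not depend on your terminal or locale:

```shell
#!/bin/sh
# Dump the raw bytes (in hex) that a few characters are stored as.
# ASCII letters: one byte each, the same values in ASCII and in UTF-8.
printf 'Aa' | od -An -tx1          # 41 61
# e with an acute accent (U+00E9): two bytes in UTF-8.
printf '\303\251' | od -An -tx1    # c3 a9
# NUL (0x00): a single zero byte in standard UTF-8 as well.
printf '\000' | od -An -tx1        # 00
```

The first line is the whole point: ASCII text needs no conversion at all
to be read as UTF-8.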

LANG really sets the locale information, which is a different beast,
although a related one. The locale defines character classes and the
order of the characters relative to each other (the collation). For
example, should e with an accent be outside the range a-z, just because
its UNICODE value (233) is outside the range 97-122? With LANG=C, e with
an accent is outside the range a-z, just as people have been forced to
live with for years. With LANG=en_US.UTF-8, e with an accent may be
defined within the range a-z. The same rule explains the oddity in the
subject line: a range like [A-Z] follows the locale's order, which is
roughly a A b B ... z Z, so it also matches the lowercase letters b
through z (but not a). As an added benefit, the sort order is more
pleasing to an English user: case only breaks ties between otherwise
equal words, which is also why 'a' lands just before 'A'. This is far
better than what we lived with in previous decades, when one had to look
in two places in a listing if one didn't know which case the filename
started with...
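A minimal sketch of the difference, assuming bash is installed and using
a scratch directory with four hypothetical files named a, A, b and B. One
caveat about the suggestion quoted above: in LANG=C echo [A-Z]* the
current shell expands the glob before LANG=C takes effect for echo, so
the locale has to be set for the shell that does the expansion. Also, if
I remember right, bash 5.0 and later enable the globasciiranges option by
default, which makes [A-Z] an ASCII range even under en_US.UTF-8, so
results vary with the bash version.

```shell
#!/bin/sh
# Scratch directory with hypothetical mixed-case file names.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a A b B

# Does not do what it looks like: the current shell expands [A-Z]*
# with its own locale before LANG=C is applied to echo.
LANG=C echo [A-Z]*

# Start a new shell whose locale is C, so the glob itself uses C rules.
LC_ALL=C bash -c 'echo [A-Z]*'          # prints: A B

# Sorting shows the same thing: the C locale compares raw byte values.
printf '%s\n' b B a A | LC_ALL=C sort   # prints A, B, a, b, one per line

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Under en_US.UTF-8 (if that locale is installed) the same sort would
typically give a A b B instead, which is the order described above.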

I've never played with this stuff in depth, so my terminology may be
inaccurate.

Cheers,
mark

-- 
mark at mielke.cc/markm at ncf.ca/markm at nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/

More information about the devel mailing list