Re: [A-Z]* oddities

2 Feb 2004


On Mon, Feb 02, 2004 at 07:04:22PM +0000, Luciano Miguel Ferreira Rocha wrote:
...
On Mon, Feb 02, 2004 at 01:15:01PM -0500, Behdad Esfahbod wrote:
...
This is totally expected :).
For what you want try LANG=C echo [A-Z]* ...
In LANG=en_US.UTF-8 which is the default, both 'A' and 'a' sort
before 'B' and 'b'.
But 'A' is *after* 'a'?
And I thought utf-8 was supposed to be compatible with ascii...
Well, if that's the standard, then it isn't broken...
UTF-8 is an encoding scheme that stores each Unicode character in a variable
number of bytes (one to four). ASCII is a subset of Unicode, and UTF-8 encodes
every ASCII character, 0x00 included, as the same single byte, so UTF-8 is
compatible with ASCII. What you are seeing comes from the locale, not from
the encoding.
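A quick way to check how UTF-8 treats the ASCII range, sketched in Python.
The helper name utf8_len and the sample characters are only illustrative;
nothing here comes from the thread itself.

```python
# Hypothetical helper: number of bytes UTF-8 needs for one character.
def utf8_len(ch: str) -> int:
    return len(ch.encode("utf-8"))

# Every ASCII character (0x00 through 0x7F) is one byte in UTF-8,
# and that byte is identical to its ASCII encoding.
ascii_chars = [chr(i) for i in range(128)]
ascii_is_single_byte = all(utf8_len(c) == 1 for c in ascii_chars)
ascii_bytes_match = all(
    c.encode("utf-8") == c.encode("ascii") for c in ascii_chars
)

# Characters outside ASCII need more than one byte.
e_acute_len = utf8_len("\u00e9")  # e with an acute accent
euro_len = utf8_len("\u20ac")     # euro sign

print(ascii_is_single_byte, ascii_bytes_match, e_acute_len, euro_len)
```

So the encoding by itself does not change where 'A' and 'a' sit relative to
each other; the ordering question belongs to the locale.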
LANG really sets the locale information, which is a different beast,
although a related one. The locale information defines character
classes and the relationships between characters, including their
sort order. For example,
should e with an accent be outside the range a-z, just because its
Unicode value is outside the range 97-122? With LANG=C, e with an
accent is outside the range a-z, just as people have been forced to
live with for years. With LANG=en_US.UTF-8, e with an accent may be
defined within the range a-z, because ranges follow the locale's sort
order rather than the raw character values. As an added benefit, the sort order
is more pleasing to an English user - case is at most a tie-breaker. This is
far better than what we lived with in the previous decades, where
one had to look in both places if one didn't know what case the
filename was specified with...
I've never played with this stuff in-depth, so my terminology may be
inaccurate.
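
To make the difference concrete, here is a rough Python sketch. It does not
call the system locale at all: sorted() without a key stands in for LANG=C
(plain character values), and dictionary_key is a made-up approximation of a
dictionary-style collation such as en_US (letters compared without case or
accents first, lowercase before uppercase as a tie-breaker). The real rules
live in the locale definition and are more involved than this.

```python
import unicodedata

names = ["B", "a", "b", "A", "\u00e9"]  # last entry is e with an acute accent

# Stand-in for LANG=C: order by raw character value.
c_order = sorted(names)

def in_c_range(ch: str, lo: str = "a", hi: str = "z") -> bool:
    """Range test by character value, as [a-z] behaves under LANG=C."""
    return ord(lo) <= ord(ch) <= ord(hi)

def base_letter(ch: str) -> str:
    """Strip accents and case (illustrative helper, not a locale API)."""
    return unicodedata.normalize("NFD", ch)[0].lower()

def dictionary_key(name: str):
    """Approximate dictionary order: base letters first, case as tie-breaker."""
    return ([base_letter(c) for c in name], [c.isupper() for c in name])

def in_dictionary_range(ch: str, lo: str = "a", hi: str = "z") -> bool:
    """Range test on the base letter, roughly how a collating range behaves."""
    return lo <= base_letter(ch) <= hi

dict_order = sorted(names, key=dictionary_key)

print(c_order)     # uppercase block first, accented letter last
print(dict_order)  # a, A, b, B, then the accented e
print(in_c_range("\u00e9"), in_dictionary_range("\u00e9"))
```

Under the character-value order, 'B' falls inside A-Z and 'a' does not; under
the dictionary-style order, 'a' sorts just before 'A' and both sort before
'B', which matches what was described earlier in the thread.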
Cheers,
mark
-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|/| |_| |_| |/    |_     |/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/
