Summary: glibc or perl incorrect locale LC_CTYPE data
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166478
------- Additional Comments From jvdias(a)redhat.com 2005-11-01 20:29 EST -------
One has to carefully analyse the perl man-pages to find out what is going on here.
I don't particularly agree with the way the upstream perl maintainers have done
this, but this is not a bug - it is the way perl is meant to behave.
The point is that /\w/ matches any ASCII word character, and /\W/ matches any
ASCII non-word character.
To match a UTF-8 word character, you have to use \p{IsWord}.
The \w escape is a synonym for the POSIX-style bracket class [:word:], which is
a Perl extension rather than part of POSIX itself.
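To make the distinction concrete, here is a minimal sketch, not taken from the bug report. It assumes Perl 5.14 or later, because it uses the /a regex modifier to pin \w to ASCII explicitly; without /a, what \w matches can depend on the Perl version, the locale pragma and how the string is stored internally. The helper name classify_word is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: test a decoded string against two patterns and
# return a hash ref of 1/0 flags.
sub classify_word {
    my ($str) = @_;
    return {
        # /a (assumes Perl 5.14+) restricts \w to the ASCII word characters.
        ascii_word   => ($str =~ /^\w+$/a        ? 1 : 0),
        # \p{IsWord} matches by the Unicode word property.
        unicode_word => ($str =~ /^\p{IsWord}+$/ ? 1 : 0),
    };
}

my $accented = decode('UTF-8', "\xc3\x81\xc4\x8c");  # U+00C1, U+010C
my $plain    = 'AC';

for my $s ($accented, $plain) {
    my $r = classify_word($s);
    printf "ascii_word=%d unicode_word=%d\n",
        $r->{ascii_word}, $r->{unicode_word};
}
```

With these inputs, the accented string should fail the ASCII-only match and pass the \p{IsWord} match, while the plain ASCII string passes both.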
So this version of your program:
---
#!/usr/bin/perl -w -C
use strict;
use utf8;
use locale;
use Encode qw(decode);
my $str = decode('utf-8', "\xc3\x81\xc4\x8c");
# U+00C1 "A with acute", U+010C "C with caron" (encoded in UTF-8)
print 'Is UTF-8:', utf8::is_utf8($str),
      ' is word:', $str =~ /^\w+$/,
      ' is UTF-8 word:', $str =~ /^\p{IsWord}+$/,
      ' str:', $str, "\n";
--