Lost in translation, part II: Lost in orthographies

Sun Feb 1 13:34:12 UTC 2009

On Sun, Feb 1, 2009 at 2:40 AM, Nicolas Mailhot
<nicolas.mailhot at gmail.com> wrote:
> I'd be very careful to redefine tagalog for example. The tagalog script
> is definitely not latin and has a specific unicode block, iso 15924
> script tag, and specific supporting fonts
>
> http://www.unicode.org/iso15924/iso15924-codes.html
> http://fedoraproject.org/wiki/Pmorrow_Tagalog_Doctrina_1593_fonts

No worries. I know all that. According to the Unicode Standard, that
script has not been much used for Tagalog since the mid-1700s. The Tagalog
script in Unicode is an archaic script only, mostly for scholarly use:

http://bugs.freedesktop.org/show_bug.cgi?id=19846

> IMHO we've been conflating too many different notions (country, region,
> language, script) in a few short locale tags and what you see is just
> the system breaking apart for non-mainstream languages/scripts.

That's very true. I've been talking with Behdad about doing a bit of
redesigning, and we had a few ideas, like this bug about getting BCP
47 into fontconfig and somehow solving the glibc locale naming
problem:

http://bugs.freedesktop.org/show_bug.cgi?id=19869

I would appreciate your feedback. (Of course, we can't fix this in
this release.)
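
To make the conflation point concrete, here is a minimal Python sketch
of how a BCP 47-style tag keeps language, script and region apart. It
is only an illustration under simplifying assumptions: split_tag is a
hypothetical helper, not fontconfig or glibc code, and it handles only
the language[-Script][-REGION] shape, ignoring variants, extensions and
private-use subtags.

```python
def split_tag(tag):
    """Split a simple BCP 47-style tag into language, script and region.

    Hypothetical helper for illustration. Only handles
    language[-Script][-REGION]; anything else is ignored.
    """
    # Accept glibc-style underscores as separators too.
    parts = tag.replace("_", "-").split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for sub in parts[1:]:
        is_script = len(sub) == 4 and sub.isalpha()
        is_region = (len(sub) == 2 and sub.isalpha()) or (
            len(sub) == 3 and sub.isdigit()
        )
        if is_script and result["script"] is None and result["region"] is None:
            # ISO 15924 script codes are written in title case, e.g. Tglg.
            result["script"] = sub.title()
        elif is_region and result["region"] is None:
            # Region subtags are written in upper case, e.g. PH.
            result["region"] = sub.upper()
    return result


if __name__ == "__main__":
    for tag in ("tl", "tl-Tglg", "tl-Latn-PH"):
        print(tag, split_tag(tag))
```

With a tag like tl-Tglg the archaic script is named explicitly, so a
plain tl can keep meaning modern Tagalog without the two being confused.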

> This is something for Behdad. I hear he's cutting a new fontconfig
> version right now, you may want to catch him before he's done and the
> project goes back to its usual 6-8 months sleepiness ;).

I'm working with him. Actually he insisted that I rush it :)

> all the different parts of language support
> are done by different groups with different agendas and different time
> scales so of course initial support is going to be incomplete. Expecting
> one contributor to provide all the parts in one go is illusory.
>
> What you want is to help the different contributors to a language group
> to:
> - identify which other bits are missing
> - ping in the right place so they get added
>
> Otherwise everyone will just wait for everyone else.

I'm sorry Nicolas, but I don't understand. Would you consider
rewording? Are you saying that I should not have been trying to create
the missing orth files myself? Or fix the buggy ones?

> Some people will claim that « full » language support means a system
> dictionary and thesaurus BTW, completeness is a slippery slope.

I agree.

But we are talking about very basic language support here. If we
cannot bring up and show a language in a proper font, we cannot claim
to support it. Our layout system would not know which font to
use, so it will be just DejaVu first, instead of a font that actually
supports the full glyph set for a language. Since we're also doing the
automatic language detection in RPM thing, wouldn't that be based on
orth files too? How can we claim we support a language in Fedora if
none of our font rpms would report that they support it?

Also, the orth files are very simple, much easier to create than glibc
locale files for example.
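
To give an idea of how simple, here is a minimal Python sketch that
reads a simplified orth-like listing and reports which required
characters a font is missing. The assumptions are loud ones: the sample
data is hypothetical and not a real fontconfig orth file, the parser
only understands comments, single hex code points and START-END ranges
(it skips include lines rather than following them), and parse_orth and
missing are names made up for this example.

```python
# Hypothetical sample data for illustration, not a real orth file.
SAMPLE_ORTH = """\
# basic Latin letters plus N with tilde
0041-005A
0061-007A
00D1
00F1
"""


def parse_orth(text):
    """Return the set of code points listed in a simplified orth-like text."""
    code_points = set()
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blanks; include lines are ignored in this sketch.
        if not line or line.startswith("include"):
            continue
        first, _, last = line.partition("-")
        start = int(first, 16)
        end = int(last, 16) if last else start
        code_points.update(range(start, end + 1))
    return code_points


def missing(required, font_code_points):
    """Return the required code points the font does not cover, sorted."""
    return sorted(required - font_code_points)


if __name__ == "__main__":
    required = parse_orth(SAMPLE_ORTH)
    ascii_only_font = set(range(0x20, 0x7F))  # assumed coverage of a toy font
    print([hex(cp) for cp in missing(required, ascii_only_font)])
```

A font would only be reported as supporting the language when the
missing list comes back empty, which is the kind of check the font
selection and the RPM language detection both depend on.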

Thanks a lot for all your time, :)
Roozbeh