On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])

Behdad Esfahbod behdad at behdad.org
Thu Dec 20 18:55:10 UTC 2007


On Thu, 2007-12-20 at 04:22 +0800, Abel Cheung wrote:
> Hi,

Hi,

> My reply is followed below, inline...

So is mine.

> On Dec 17, 2007 7:22 AM, Behdad Esfahbod <behdad at behdad.org> wrote:
> [..........tons of quasi-maths ...........]
> >
> > > Secondly, you said that "contextual font selection" is a "cool"
> > > feature. I am wondering which languages benefit from this feature?
> > > (I believe there are some, but I just want to know.)
> >
> > Pretty much every non-Latin script.  In some situations even the Latin
> > script.
> >
> > Take the Unicode character U+002E FULL STOP, aka ASCII period.  It is
> > used in more than just Latin, in Arabic for example, in Hebrew, possibly
> > in Indic and many other scripts.  If it were not grouped with neighboring
> > characters for font selection purposes, all those people would get
> > their Arabic/Hebrew/... text assigned an Arabic/Hebrew/... font while
> > the periods at the end of sentences were assigned a different font (the
> > default Latin one, for example).
> >
> > The same happens for Latin under a document tagged as non-Latin.  It's
> > not a luxury thing.  It's just how things are supposed to work.
> 
> That means a font change depending on context is actually preferred in
> some fonts or some languages, is it? If that's true, then this would be
> a per-language preference: some want it, some don't.
> 
> So does pango support toggling this behavior yet? (I guess not?)

What exactly do you mean by "this behavior"?  Which behavior?  Show me
the source code line.  I'm getting tired of all the hand waving.



> > > > The main font issue though, is that Chinese (Simplified, Traditional),
> > > > Korean, and Japanese share some Unicode code points, but they require
> > > > slightly different renderings.  Now if you don't tell Pango which
> > > > version is preferred, how can it know which font to choose?  It
> > > > explicitly doesn't prefer any one over the others to avoid cultural
> > > > problems.
> > > >
> > > > The symptoms of this problem are "multiple fonts used in the same line".
> > > > Solution is: Either run under a CJK locale, or give hints to Pango about
> > > > your preferred CJK locale using the env var PANGO_LANGUAGE.
> > > >
> > > > Note that theoretically Pango can do text analysis to come up with a
> > > > best guess, but doing that would then introduce another bug with
> > > > symptoms "changes font when typing a few characters on the same line".
> 
> Let me set the record straight here. Most people seeing this problem are not
> exactly complaining about the font changing, but about the font changing TO
> SOME BAD LATIN GLYPH THEY DON'T LIKE. It is understood that font changing is
> almost unavoidable, since typing just a few characters may not provide enough
> information on what kind of font should be picked, and typing more
> gives more info.
> So far, is it determined per sentence, or per what?

Believe me, I know that.  And I understand it even when you don't WRITE IN
CAPS.  Does it help if I say THEN GO REMOVE THE CRAPPY FONT?


[...]
> Sadly this way absolutely won't satisfy everybody -- one party only. And in
> particular, the font picked is determined per glyph, causing a sentence to be
> intermixed with multiple CJK fonts as described.

This is totally wrong.  Pango first tags each piece of text with a
language, then asks fontconfig to sort fonts for that language, then
uses the sorted list to assign a font to each character.  That is, if you
mark your text zh_CN (by running under that locale, setting
PANGO_LANGUAGE to that, or otherwise marking it), have a suitable
font for that language and, if you also have crappy fonts for it, have
fontconfig configured to prefer the good one, then Pango chooses the
right font.  All the "bugs" you show me come from those other steps
(language tagging, installed fonts, fontconfig configuration), not from
what Pango is doing.
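
To make the three steps concrete, here is a minimal Python sketch of the
flow described above.  It is only a toy model: Font, sort_fonts and
assign_fonts are hypothetical names made up for illustration, not Pango
or fontconfig APIs, and the real sorting in fontconfig takes many more
properties into account than a single language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Font:
    """Hypothetical stand-in for an installed font."""
    name: str
    languages: frozenset  # languages the font claims to support
    coverage: frozenset   # characters the font has glyphs for


def sort_fonts(fonts, language):
    """Step 2: fonts supporting the language come first.

    sorted() is stable, so the configured order is kept otherwise.
    """
    return sorted(fonts, key=lambda f: language not in f.languages)


def assign_fonts(text, language, fonts):
    """Step 3: each character gets the first sorted font that covers it."""
    ordered = sort_fonts(fonts, language)
    result = []
    for ch in text:
        name = next((f.name for f in ordered if ch in f.coverage), None)
        result.append((ch, name))
    return result


if __name__ == "__main__":
    # Made-up fonts: the Japanese one also carries a Latin "a".
    fonts = [
        Font("JaFont", frozenset({"ja"}), frozenset("中a")),
        Font("ZhFont", frozenset({"zh-cn"}), frozenset("中")),
        Font("LatinFont", frozenset({"en"}), frozenset("a.")),
    ]
    # Step 1 is the language tag passed in here.
    print(assign_fonts("中a.", "zh-cn", fonts))
    print(assign_fonts("中a.", "en", fonts))
```

In this model the same text gets different fonts depending only on the
language tag and on the order and coverage of the installed fonts, which
is where the steps outside Pango come in.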


> What if the font determination is not chopped glyph by glyph, but also
> determined heuristically with context?

Pango already does that.  That's exactly what you call "contextual"
something above and condemn.
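
As a rough illustration of what grouping with context means, the Python
sketch below lets characters that belong to no particular script
(periods, digits, spaces) take the script of their neighbors before any
font is chosen.  It is a simplified model under stated assumptions, not
Pango's code: script_of and resolve_scripts are hypothetical helpers, and
the code point ranges are a crude substitute for real Unicode script
data.

```python
from itertools import groupby


def script_of(ch):
    """Crude script guess from code point ranges (assumption, not Unicode data)."""
    cp = ord(ch)
    if 0x0590 <= cp <= 0x05FF:
        return "Hebrew"
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    if 0x4E00 <= cp <= 0x9FFF:
        return "Han"
    if cp < 0x0250 and ch.isalpha():
        return "Latin"
    return "Common"  # punctuation, digits, spaces, ...


def resolve_scripts(text):
    """Give each Common character the script of the preceding real script.

    Leading Common characters take the script of the first real one.
    """
    resolved = []
    current = None
    for ch in text:
        script = script_of(ch)
        if script != "Common":
            current = script
        resolved.append(current)
    first = next((s for s in resolved if s is not None), "Common")
    return [s if s is not None else first for s in resolved]


def script_runs(text):
    """Split text into (script, substring) runs after resolution."""
    pairs = zip(resolve_scripts(text), text)
    return [(script, "".join(ch for _, ch in group))
            for script, group in groupby(pairs, key=lambda p: p[0])]
```

With this model a Hebrew sentence ending in U+002E stays a single Hebrew
run, and digits typed before a Han character end up in the Han run once
that character arrives, which is the "digits change font" effect
discussed in this thread.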


> If my guess is correct this would work in most
> cases, even among language variants (think zh_CN and zh_TW).

No.   You need to go back and read and understand my "tons of
quasi-maths".


> > > > Another symptom, "digits change font after typing character" is in fact
> > > > a very cool Pango feature, just badmouthed by the above problem.  Fix
> > > > the problem.
> 
> When a solution is not universal enough to be accepted by everybody,
> and causes more trouble than it's worth for specific people, it will be
> badmouthed no matter what. Or not? I don't know the rule here.

You officially don't know what you are talking about.


behdad


> Abel
> 
> 
> > > >
> > > >
> > > >
> > > >> As you see from the bug lists, this problem has existed for many
> > > >> years, and I am pretty sure that it will come back again and again, as
> > > >> long as the expected rendering is not achieved. If the current pango
> > > >> formatting logic is not sufficient to handle the CJK preferences as
> > > > >> said above, I think refining the logic to take them into consideration
> > > > >> is better than sticking with a fixed but incomplete logic.
> > > >>
> > > >
> > > > I consider patches improving Pango's font selection algorithm, but none
> > > > > that I've seen so far has been an improvement (from my point of view).
> > > > If it has words like CJK or "special case", I'm most probably not
> > > > interested.  Of the bugs you listed, only the one I opened myself is
> > > > valid IMO.  The rest is just left open because no matter how many times
> > > > I close them, they will be reopened... Oh well.
> > > >
> > > >
> > > >
> > > > >> Please let me know your thoughts and reasoning on whether this is
> > > > >> feasible or not and, if yes, where to get started.
> > > >>
> > > >
> > > > Does the above make sense?  I understand that it's easier to apply a two
> > > > > line patch to Pango instead of doing one of the things I listed above,
> > > > but that just doesn't fit in the design, and it introduces other
> > > > problems you don't see right now.
> > > >
> > > >
> > > >
> > > >> thank you for paying attention to this issue.
> > > >>
> > > >> Qianqian
> > > >>
> > > >
> > > > Regards,
> > > >
> > > > behdad
> > > >
> > > >
> > > >
> > > >> ===============================================================
> > > >> Bug 321113 - Wrong glyph subsituation algorithm for digital characters
> > > >> and punctuations
> > > >> http://bugzilla.gnome.org/show_bug.cgi?id=321113
> > > >>
> > > >>
> > > >> Bug 345072 - changes font when typing different scripts on the same
> > > >> line
> > > >> http://bugzilla.gnome.org/show_bug.cgi?id=345072
> > > >>
> > > >>
> > > >> Bug 345386 - Language and direction propagation in and between
> > > >> PangoLayouts
> > > >> http://bugzilla.gnome.org/show_bug.cgi?id=345386  (opened by yourself)
> > > >> https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=103679
> > > >>
> > > >>
> > > >> Bug 481210 - [All lang] [firefox] - Face of the number is changing
> > > >> when enter number + Char, in any Locale
> > > >> http://bugzilla.gnome.org/show_bug.cgi?id=481210
> > > >>
> > > >>
> > > >> Bug 481188 - ascii text space too narrow for Chinese encodings
> > > >> http://bugzilla.gnome.org/show_bug.cgi?id=481188
> > > >>
> > > >>
> > > >> Bugzilla Bug 129541: changes font when typing different scripts on the
> > > >> same line
> > > >> https://bugzilla.redhat.com/show_bug.cgi?id=129541
> > > >>
> > > >>
> > > >> Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango
> > > >> https://bugzilla.redhat.com/show_bug.cgi?id=131218
> > > >>
> > > >>
> > > >> Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox
> > > >> give bad eol rendering and cursor placement
> > > >> https://bugzilla.redhat.com/show_bug.cgi?id=149991 (filed by Jens
> > > >> Petersen)
> > > >>
> > > >>
> > > >> https://bugzilla.redhat.com/show_bug.cgi?id=220885 (broken link)
> > > >>
> > > >>
> > > >> Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is
> > > >> changing when enter number + Char, in any Locale
> > > >> https://bugzilla.redhat.com/show_bug.cgi?id=228804
> > > >>
> > > >>
> > > >> Bugzilla Bug 221361: [pango] ascii text space and punctuation is
> > > >> narrow for CJK
> > > >> https://bugzilla.redhat.com/show_bug.cgi?id=221361
> > > >>
> > > >>
> > > >> Bug 379125 - chinese punctuations after english letters are wrongly
> > > >> displayed
> > > >> https://bugzilla.mozilla.org/show_bug.cgi?id=379125
> > > >> https://bugzilla.mozilla.org/attachment.cgi?id=263185
> > > >> ===============================================================
> > > >>
> > > >
> > > >
> > >
> > --
> > behdad
> > http://behdad.org/
> >
> > ...very few phenomena can pull someone out of Deep Hack Mode, with two
> > noted exceptions: being struck by lightning, or worse, your *computer*
> > being struck by lightning.  -- Matt Welsh
> >
> >
> 
> 
> 
-- 
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning.  -- Matt Welsh



