hi
I respect your philosophy of structuring style propagation based on the context and the nature of each script. I think it is indeed an elegant solution to use a COMMON charset to represent the language-independent symbols and render them based on the context.
IMHO, the confusion comes from the fact that "language-neutral" and "local-language-dependent" characters are not distinguished within the COMMON script. In other words, the COMMON charset is a mixture of characters that are essentially not tied to any specific language (such as digits) and characters that are redefined by local languages (such as some punctuation marks and the geometric shapes in U+2500-U+25FF). For the former, I think they should not be influenced by local language preferences; rather, using the system fall-back setup (likely Latin-preferred) should be the best solution. For the latter, using the local font preference is best, as in your current COMMON charset handling.
In short, I think the current COMMON set should be further refined into NEUTRAL and LOCAL_DEPENDENT char sets, using the system fall-back configuration for the NEUTRAL set and local-language preferences for the LOCAL_DEPENDENT set. Specifically, digits are language-neutral and should be rendered by the system fall-back settings rather than the local language settings.
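To make the proposal concrete, here is a minimal sketch of the split. It is not Pango code: the names CommonClass, classify_common and pick_font are hypothetical, the only LOCAL_DEPENDENT range listed is the one cited above, and the input is assumed to be a character that Unicode already marks as script Common.

```python
from enum import Enum


class CommonClass(Enum):
    NEUTRAL = "neutral"                  # e.g. ASCII digits
    LOCAL_DEPENDENT = "local_dependent"  # symbols redefined by local languages


# Illustrative only: the single range mentioned in this message
# (box drawing, block elements and geometric shapes).
LOCAL_DEPENDENT_RANGES = [(0x2500, 0x25FF)]


def classify_common(ch: str) -> CommonClass:
    """Classify a character that is already known to be script Common."""
    cp = ord(ch)
    for lo, hi in LOCAL_DEPENDENT_RANGES:
        if lo <= cp <= hi:
            return CommonClass.LOCAL_DEPENDENT
    return CommonClass.NEUTRAL


def pick_font(ch: str, system_fallback: str, local_font: str) -> str:
    """NEUTRAL characters use the system fall-back font, the rest the local font."""
    if classify_common(ch) is CommonClass.NEUTRAL:
        return system_fallback
    return local_font
```

With this split, pick_font("5", ...) returns the system fall-back font, while pick_font("\u25a0", ...) returns the local-language font.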
Qianqian
Behdad Esfahbod wrote:
On Mon, 2007-12-03 at 21:58 -0500, Qianqian Fang wrote:
hi Behdad
Hi,
you may well be right and the behavior of pango is not logically flawed. Perhaps this problem should be filed as a feature-request rather than a bug.
I'm not stuck at semantic issues like feature-request vs bug. When I say it's technically infeasible, I mean it.
From a Chinese user's perspective, Latin characters and Common characters are both non-Hanzi, non-CJK characters, so users expect a similar look and feel when these characters are rendered. For other languages, I guess users more or less share the same view: numbers and basic Latin characters (or Basic ASCII, or keyboard characters) are the most frequently used symbols that do not depend on the local language. As long as their local language does not redefine these symbols, they are expected to be rendered with similar styles.
Let me repeat what's happening again: You are setting a Chinese locale, so when Pango sees digits, it assumes that you want to use those digits with Chinese text, and you have provided a Chinese font that has glyphs for those digits, so it believes it's found the perfect font for them (your preferred font indeed) and uses it. If those digits are not desired, remove them from the font.
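The behavior described above can be modeled roughly as follows. This is a simplified illustration, not Pango's actual implementation: choose_font, the script names, and the font names are all hypothetical, and real font selection involves much more than a coverage check.

```python
def choose_font(ch, char_script, locale_script, fonts_by_script, coverage):
    """Return the first preferred font that covers ch, or None.

    In this model a Common-script character has no script of its own,
    so it inherits the script implied by the locale.
    """
    script = locale_script if char_script == "Common" else char_script
    for font in fonts_by_script.get(script, []):
        if ch in coverage.get(font, set()):
            return font
    return None


# Hypothetical setup: a Chinese font that also ships ASCII digits.
fonts_by_script = {"Han": ["ChineseFont", "LatinFont"], "Latin": ["LatinFont"]}
coverage = {
    "ChineseFont": set("中文0123456789"),
    "LatinFont": set("abc0123456789"),
}

# Under a Chinese locale the digit goes to the Chinese font, because
# that font is preferred and covers the character.
with_digits = choose_font("7", "Common", "Han", fonts_by_script, coverage)

# Removing the digits from the Chinese font lets the next font win.
coverage_without_digits = dict(coverage, ChineseFont=set("中文"))
without_digits = choose_font(
    "7", "Common", "Han", fonts_by_script, coverage_without_digits
)
```

In this model with_digits is "ChineseFont" and without_digits is "LatinFont", which is why removing the glyphs from the font changes the result.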
I don't know the exact definition of PANGO_SCRIPT_COMMON and PANGO_SCRIPT_LATIN, but I think it is more natural to render numbers using a Latin font rather than a Chinese font, as numbers and Latin letters are much closer.
Then fix your font.
Huang Peng provided a patch to get the commonly expected behavior for this situation. If it could be implemented, even if only under Chinese locales, that would be a great help. I've seen this report many times on Mandriva's, Debian's and Red Hat's bugzillas and on almost all Chinese Linux forums.
That's not going to happen. Pango's core has nothing language or script specific hardcoded in it except for the data that is computer-generated from the Unicode Character Database. In Unicode, ASCII digits are marked script Common. There is a very small part of the issue you are seeing that can be improved in Pango:
http://bugzilla.gnome.org/show_bug.cgi?id=345386
but other than that, the behavior looks very reasonable to me. If you can think of an explanation of the behavior you want, without using "change character class of digits" and "special-case Chinese", I'm interested to hear that.
There are a few ways to fix your problem:
- Remove Latin and ASCII digits from your font. Why is it there if it's not desired? Nicolas suggested that fontconfig adds support for conditional blacklisting of individual blocks/glyphs in a font. That would help too, but it's not in fontconfig yet.
- If you were doing your font in an OpenType container, you could split Latin and Chinese parts into two different fonts stuffed into a single container and having the same name. Then Pango will not see your Chinese font having ASCII digits and not use them.
But at the end, it all comes down to real or hacky ways of removing those glyphs from the font.
Back to the original topic of this thread, what do you think of the fontconfig file in my last email? I have heard complaints on some Chinese forums about font changes caused by removing the original fontconfig file. I hope I can get something to commit that will put an end to their complaints.
No idea.
Qianqian