https://bugzilla.redhat.com/show_bug.cgi?id=1436077
Bug ID: 1436077
Summary: Some emoji which should render as one character with the “Noto Color Emoji” render as several characters
Product: Fedora
Version: 25
Component: pango
Assignee: tagoh@redhat.com
Reporter: mfabian@redhat.com
QA Contact: extras-qa@fedoraproject.org
CC: fonts-bugs@lists.fedoraproject.org, i18n-bugs@lists.fedoraproject.org, tagoh@redhat.com
Created attachment 1266551 --> https://bugzilla.redhat.com/attachment.cgi?id=1266551&action=edit test-text.txt
Test text attached. Showing the test text like this
pango-view --font='Noto Color Emoji 48' ~/test-text.txt
on Fedora 25 shows the emoji not as a single character but as two.
On Fedora 24 (and openSUSE Leap 42.2 and Ubuntu 16.04) this works.
The version of “Noto Color Emoji” used in all these tests is the latest one from https://www.google.com/get/noto/
which has this file size:
-rw-r-----. 1 mfabian mfabian 5987004 10月 20 11:46 NotoColorEmoji.ttf
The problem is the same when using the “Emoji One” font from: https://github.com/Ranks/emojione/blob/master/assets/fonts/emojione-android....
Mike FABIAN mfabian@redhat.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |mfabian@redhat.com
--- Comment #1 from Mike FABIAN mfabian@redhat.com ---
Created attachment 1266552 --> https://bugzilla.redhat.com/attachment.cgi?id=1266552&action=edit pango-view-fedora-25.png
Broken display of the test-text.txt by
pango-view --font='Noto Color Emoji 48' ~/test-text.txt
on Fedora 25.
--- Comment #2 from Mike FABIAN mfabian@redhat.com ---
Created attachment 1266553 --> https://bugzilla.redhat.com/attachment.cgi?id=1266553&action=edit hb-view-fedora-25-correct.png
Correct display of the same test text using:
hb-view --font-size=24 --text-file=/home/mfabian/test-text.txt --output-file=/tmp/hb.png --output-format=png /usr/share/fonts/google-noto-emoji/NotoColorEmoji.ttf
--- Comment #3 from Mike FABIAN mfabian@redhat.com ---
Created attachment 1266555 --> https://bugzilla.redhat.com/attachment.cgi?id=1266555&action=edit pango-view-fedora-24.png
Correct display of the same test-text.txt on Fedora 24 using
pango-view --font='Noto Color Emoji 48' ~/test-text.txt
--- Comment #4 from Mike FABIAN mfabian@redhat.com ---
hb-view --font-size=24 --text-file=/home/mfabian/test-text.txt --output-file=/tmp/hb.png --output-format=png /usr/share/fonts/google-noto-emoji/NotoColorEmoji.ttf
also works correctly on Fedora 24.
So it seems the problem is not in harfbuzz.
--- Comment #5 from Mike FABIAN mfabian@redhat.com ---
OK, the problem is caused by the width changes in glibc:
Fedora 24:
(gdb) p g_unichar_iswide(0x1f469)
$4 = 0
(gdb) p g_unichar_iswide(0x200d)
$5 = 0
(gdb) p g_unichar_iswide(0x2708)
$6 = 0
(gdb)
Fedora 25:
(gdb) p g_unichar_iswide(0x1f469)
$39 = 1
(gdb) p g_unichar_iswide(0x200d)
$40 = 0
(gdb) p g_unichar_iswide(0x2708)
$41 = 0
(gdb)
👩 U+1F469 WOMAN
U+200D ZERO WIDTH JOINER
✈ U+2708 AIRPLANE
In Fedora 24, all these 3 characters are “narrow”. In Fedora 25, 👩 is “wide” (because of the Unicode 9.0.0 update of glibc). Pango breaks the run when the character width changes.
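The effect can be sketched in a few lines of C. This is only an illustration and not Pango's code: `is_wide()` is a hypothetical stand-in for g_unichar_iswide() whose answer for U+1F469 is switched by a flag, and `count_width_runs()` is a made-up helper that starts a new run at every width change, which is the behaviour described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for g_unichar_iswide(): only U+1F469 can be wide,
 * and the caller chooses whether it is (Fedora 25) or not (Fedora 24). */
static bool is_wide(uint32_t ch, bool woman_is_wide)
{
    return ch == 0x1F469 && woman_is_wide;
}

/* Count the runs produced when a new run starts at every width change. */
static size_t count_width_runs(const uint32_t *text, size_t len,
                               bool woman_is_wide)
{
    if (len == 0)
        return 0;

    size_t runs = 1;
    bool wide = is_wide(text[0], woman_is_wide);

    for (size_t i = 1; i < len; i++) {
        bool w = is_wide(text[i], woman_is_wide);
        if (w != wide) {
            /* Width changed: the previous run ends here. */
            runs++;
            wide = w;
        }
    }
    return runs;
}
```

For the sequence U+1F469 U+200D U+2708, `count_width_runs()` gives 1 with the Fedora 24 widths (flag false) and 2 with the Fedora 25 widths (flag true), which matches the two separate glyphs seen in pango-view.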
--- Comment #6 from Mike FABIAN mfabian@redhat.com ---
Here is the function in pango-context.c which causes the break of the run:
/* g_unichar_iswide() uses EastAsianWidth, which is broken.
 * We should switch to using VerticalTextLayout:
 * http://www.unicode.org/reports/tr50/#Data50
 *
 * In the mean time, fixup Hangul jamo to be all wide so we
 * don't break run in the middle.  The EastAsianWidth has
 * 'W' for L-jamo, and 'N' for T and V jamo!
 *
 * https://bugzilla.gnome.org/show_bug.cgi?id=705727
 */
static gboolean
width_iter_iswide (gunichar ch)
{
  if ((0x1100u <= ch && ch <= 0x11FFu) ||
      (0xA960u <= ch && ch <= 0xA97Cu) ||
      (0xD7B0u <= ch && ch <= 0xD7FBu))
    return TRUE;

  return g_unichar_iswide (ch);
}
--- Comment #7 from Mike FABIAN mfabian@redhat.com ---
Maybe I should rewrite the function width_iter_next(PangoWidthIter* iter) so that it does not break emoji-zwj sequences:
https://git.gnome.org/browse/pango/tree/pango/pango-context.c#n866
--- Comment #8 from Akira TAGOH tagoh@redhat.com ---
It may be good to file a bug in the upstream Bugzilla too.
Mike FABIAN mfabian@redhat.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Summary |Some emoji which should     |Some emoji which should
        |render as one character     |render as one character
        |with the “Noto Color Emoji” |with the “Noto Color Emoji”
        |render as several           |font render as several
        |characters                  |characters
Mike FABIAN mfabian@redhat.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
See Also|                            |https://bugzilla.gnome.org/show_bug.cgi?id=780669
--- Comment #9 from Mike FABIAN mfabian@redhat.com ---
Created attachment 1267272 --> https://bugzilla.redhat.com/attachment.cgi?id=1267272&action=edit 0001-Bug-780669-Do-not-start-a-new-run-at-a-zero-width-jo.patch
Patch to fix the problem
--- Comment #10 from Mike FABIAN mfabian@redhat.com ---
Created attachment 1267444 --> https://bugzilla.redhat.com/attachment.cgi?id=1267444&action=edit Updated patch
I updated my patch a bit: Pango should not break a run at skin tone modifiers either. Otherwise Pango would break between U+270C VICTORY HAND (which is single width) and U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6 (which is double width).
Akira TAGOH tagoh@redhat.com changed:
What           |Removed                     |Added
----------------------------------------------------------------------------
External Bug ID|                            |GNOME Desktop 780669
--- Comment #11 from fujiwara tfujiwar@redhat.com ---
Looks good. E.g. for A + 0x200d + B, I think it works only when the width of A and the width of B are the same.
--- Comment #12 from fujiwara tfujiwar@redhat.com ---
(In reply to fujiwara from comment #11)
> Looks good. E.g. for A + 0x200d + B, I think it works only when the width of A and the width of B are the same.
Sorry, correction: width of A >= width of B.
--- Comment #13 from Mike FABIAN mfabian@redhat.com ---
I think it also works when A and B have the same width, and when width of A < width of B.
This is because the patch resets the width remembered by the iterator to the width of the next character after the zwj or skin tone modifier:
iter->wide = width_iter_iswide (ch)
So Pango would only break if yet another width change is encountered and that new width change is *not* at a zwj or skin tone modifier.
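The behaviour described in this comment can be sketched on an array of code points. This is a simplified model and not the attached patch: `run_end()`, `is_joiner_or_modifier()` and `sketch_iswide()` are hypothetical names, and the width table (U+1F300..U+1F6FF wide, everything else narrow) is an assumption made only so the example is self-contained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed width table for this sketch only: treat U+1F300..U+1F6FF as wide
 * and everything else as narrow. The real answer comes from
 * g_unichar_iswide(). */
static bool sketch_iswide(uint32_t ch)
{
    return ch >= 0x1F300 && ch <= 0x1F6FF;
}

/* Zero width joiner or one of the skin tone modifiers U+1F3FB..U+1F3FF. */
static bool is_joiner_or_modifier(uint32_t ch)
{
    return ch == 0x200D || (ch >= 0x1F3FB && ch <= 0x1F3FF);
}

/* Return the index one past the end of the run that starts at `start`. */
static size_t run_end(const uint32_t *text, size_t len, size_t start)
{
    if (start >= len)
        return len;

    size_t end = start;
    bool wide = sketch_iswide(text[start]);

    while (end < len) {
        uint32_t ch = text[end];

        if (is_joiner_or_modifier(ch)) {
            /* Never break here; instead adopt the width of the character
             * that follows, so no width change is seen at that character. */
            end++;
            if (end < len)
                wide = sketch_iswide(text[end]);
            continue;
        }
        if (sketch_iswide(ch) != wide)
            break; /* ordinary width change: the run ends before ch */
        end++;
    }
    return end;
}
```

Under these assumptions, `run_end()` returns 3 for both U+1F469 U+200D U+2708 (wide A, narrow B) and U+2708 U+200D U+1F469 (narrow A, wide B), so the whole sequence stays in one run either way. For U+0041 U+1F469, where no joiner is involved, it returns 1 and the run still breaks at the width change.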
--- Comment #14 from Mike FABIAN mfabian@redhat.com ---
Maybe it is even necessary to add 0x1f3f4 to the exceptions where a break at a width change is prevented:
diff --git a/pango/pango-context.c b/pango/pango-context.c
index f0cea73..2739b2d 100644
--- a/pango/pango-context.c
+++ b/pango/pango-context.c
@@ -876,7 +876,18 @@ width_iter_next(PangoWidthIter* iter)
   while (iter->end < iter->text_end)
     {
       gunichar ch = g_utf8_get_char (iter->end);
-      if (width_iter_iswide (ch) != iter->wide)
+      if (ch == 0x200d || ch == 0x1f3f4 || (ch >= 0x1f3fb && ch <= 0x1f3ff))
+        {
+          /* do not break at a zero-width-joiner or skin tone modifiers*/
+          iter->end = g_utf8_next_char (iter->end);
+          if (iter-> end < iter->text_end)
+            {
+              ch = g_utf8_get_char (iter->end);
+              iter->wide = width_iter_iswide (ch);
+            }
+          continue;
+        }
+      else if (width_iter_iswide (ch) != iter->wide)
         break;
       iter->end = g_utf8_next_char (iter->end);
     }
Because flag sequences like this one:
1F3F4 E0067 E0062 E0073 E0063 E0074 E007F; Emoji_Tag_Sequence; Scotland
start with U+1F3F4 WAVING BLACK FLAG, which is a wide character and the rest of the sequence is narrow characters.
--- Comment #15 from fujiwara tfujiwar@redhat.com ---
(In reply to Mike FABIAN from comment #13)
> I think it also works when A and B have the same width, and when width of A < width of B.
>
> This is because the patch resets the width remembered by the iterator to the width of the next character after the zwj or skin tone modifier:
>
> iter->wide = width_iter_iswide (ch)
>
> So Pango would only break if yet another width change is encountered and that new width change is *not* at a zwj or skin tone modifier.
OK. I thought iter->wide served as a return value, so the value finally assigned to it would matter, because wide is a member of PangoWidthIter rather than a local variable. But if you believe it is only used locally, it's OK.
I think width_iter_next() probably also needs an exception for 0xfe00 - 0xfe0f.
And I'm thinking of another patch:
--- pango-1.40.4/pango/pango-context.c.orig 2017-03-30 19:03:28.081488378 +0900
+++ pango-1.40.4/pango/pango-context.c 2017-03-31 18:58:21.526336181 +0900
@@ -1395,11 +1407,13 @@ itemize_state_process_run (ItemizeState
        * characters if they don't, HarfBuzz will compatibility-decompose them
        * to ASCII space...
        * See bugs #355987 and #701652.
+       * U+0023 FE0F 20E3 has G_UNICODE_NON_SPACING_MARK
        */
       type = g_unichar_type (wc);
       if (G_UNLIKELY (type == G_UNICODE_CONTROL ||
                       type == G_UNICODE_FORMAT ||
                       type == G_UNICODE_SURROGATE ||
+                      type == G_UNICODE_NON_SPACING_MARK ||
                       (type == G_UNICODE_SPACE_SEPARATOR && wc != 0x1680u /* OGHAM SPACE MARK */)))
         {
           shape_engine = NULL;
--- Comment #16 from Mike FABIAN mfabian@redhat.com ---
(In reply to fujiwara from comment #15)
> I think width_iter_next() probably also needs an exception for 0xfe00 - 0xfe0f.
Yes.
Also, my patch currently disallows a break before *and* after the special characters. But for 0xfe00 - 0xfe0f and 0x1f3fb - 0x1f3ff a break after the special character might be OK. And for the flag sequence starter 0x1f3f4, a break before that character might be OK.
I’ll improve that patch a bit.
--- Comment #17 from fujiwara tfujiwar@redhat.com ---
Thinking about this again, maybe 0xfe0e and 0xfe0f are enough instead of 0xfe00 - 0xfe0f.
I'd like to ask the Pango people about the following suggestion.
--- pango-1.40.4/pango/pango-context.c.orig 2017-03-30 19:03:28.081488378 +0900
+++ pango-1.40.4/pango/pango-context.c 2017-04-04 13:35:18.446935865 +0900
@@ -1404,6 +1404,15 @@ itemize_state_process_run (ItemizeState
         {
           shape_engine = NULL;
           font = NULL;
+        }
+      /* If an emoji font does not include emoji presentation, let
+       * harfbuzz handle the characters.
+       * http://www.unicode.org/emoji/charts/emoji-variants.html
+       */
+      else if (G_UNLIKELY (wc == 0xfe0fu || wc == 0xfe0eu))
+        {
+          shape_engine = NULL;
+          font = NULL;
         }
       else
         {
--- Comment #18 from Mike FABIAN mfabian@redhat.com ---
(In reply to fujiwara from comment #17)
> Thinking about this again, maybe 0xfe0e and 0xfe0f are enough instead of 0xfe00 - 0xfe0f.
>
> I'd like to ask the Pango people about the following suggestion.
> --- pango-1.40.4/pango/pango-context.c.orig 2017-03-30 19:03:28.081488378 +0900
> +++ pango-1.40.4/pango/pango-context.c 2017-04-04 13:35:18.446935865 +0900
> @@ -1404,6 +1404,15 @@ itemize_state_process_run (ItemizeState
>          {
>            shape_engine = NULL;
>            font = NULL;
> +        }
> +      /* If an emoji font does not include emoji presentation, let
> +       * harfbuzz handle the characters.
> +       * http://www.unicode.org/emoji/charts/emoji-variants.html
> +       */
> +      else if (G_UNLIKELY (wc == 0xfe0fu || wc == 0xfe0eu))
> +        {
> +          shape_engine = NULL;
> +          font = NULL;
>          }
>        else
>          {
Yes, this works; it makes the fully-qualified emoji sequences work for me!
For example, this fully-qualified sequence for the male golfer
🏌️♂️ U+1F3CC U+FE0F U+200D U+2642 U+FE0F
works for me (= renders as a single glyph) only when using the above patch; without that patch it is rendered as several glyphs.
Without that patch, only the non-fully-qualified sequence
🏌♂ U+1F3CC U+200D U+2642
works.
As using the fully-qualified sequences is recommended I think that patch is needed.
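To compare the two forms, a test file containing both sequences is handy. The following is a minimal sketch of a UTF-8 encoder that could produce such text from the code points listed above; `utf8_encode()` and `utf8_encode_sequence()` are hypothetical helper names and are not part of Pango or GLib.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8.
 * Returns the number of bytes written (1 to 4), or 0 for an invalid value. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode a whole sequence into buf. Returns the total number of bytes,
 * or 0 if a code point is invalid or the buffer is too small. */
static size_t utf8_encode_sequence(const uint32_t *cps, size_t n,
                                   unsigned char *buf, size_t cap)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t len = utf8_encode(cps[i], tmp);

        if (len == 0 || used + len > cap)
            return 0;
        for (size_t j = 0; j < len; j++)
            buf[used++] = tmp[j];
    }
    return used;
}
```

Encoding U+1F3CC U+FE0F U+200D U+2642 U+FE0F yields 16 bytes and U+1F3CC U+200D U+2642 yields 10 bytes. Writing each result on its own line of a text file gives input that can be shown with the pango-view command from the original report.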
--- Comment #19 from Mike FABIAN mfabian@redhat.com ---
(In reply to Mike FABIAN from comment #16)
> (In reply to fujiwara from comment #15)
> > I think width_iter_next() probably also needs an exception for 0xfe00 - 0xfe0f.
>
> Yes.
>
> Also, my patch currently disallows a break before *and* after the special characters. But for 0xfe00 - 0xfe0f and 0x1f3fb - 0x1f3ff a break after the special character might be OK. And for the flag sequence starter 0x1f3f4, a break before that character might be OK.
>
> I’ll improve that patch a bit.
Peng Wu already did that; his patch is here:
https://bugzilla.gnome.org/show_bug.cgi?id=780669#c14
fujiwara tfujiwar@redhat.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
See Also|                            |https://bugzilla.gnome.org/show_bug.cgi?id=781123
--- Comment #20 from Fedora Update System updates@fedoraproject.org ---
pango-1.40.7-1.fc26 has been submitted as an update to Fedora 26.
https://bodhi.fedoraproject.org/updates/FEDORA-2017-79637b77e0
--- Comment #21 from Fedora Update System updates@fedoraproject.org ---
pango-1.40.7-1.fc25 has been submitted as an update to Fedora 25.
https://bodhi.fedoraproject.org/updates/FEDORA-2017-f55b21a811
Fedora Update System updates@fedoraproject.org changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Status  |MODIFIED                    |ON_QA
--- Comment #22 from Fedora Update System updates@fedoraproject.org ---
pango-1.40.7-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-f55b21a811
--- Comment #23 from fujiwara tfujiwar@redhat.com ---
I think the upstream bug 780669 is not fixed yet?
--- Comment #24 from Fedora Update System updates@fedoraproject.org ---
pango-1.40.7-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-79637b77e0
--- Comment #25 from Peng Wu pwu@redhat.com ---
I tried pango-1.40.7-1.fc26; it seems the upstream bug 780669 is not fixed yet.
Fedora Update System updates@fedoraproject.org changed:
What            |Removed                     |Added
----------------------------------------------------------------------------
Status          |ON_QA                       |CLOSED
Fixed In Version|                            |pango-1.40.7-1.fc26
Resolution      |---                         |ERRATA
Last Closed     |                            |2017-07-22 23:57:29
--- Comment #26 from Fedora Update System updates@fedoraproject.org ---
pango-1.40.7-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
Fedora Update System updates@fedoraproject.org changed:
What            |Removed                     |Added
----------------------------------------------------------------------------
Fixed In Version|pango-1.40.7-1.fc26         |pango-1.40.7-1.fc26
                |                            |pango-1.40.7-1.fc25
--- Comment #27 from Fedora Update System updates@fedoraproject.org ---
pango-1.40.7-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.