Hi,
I'm writing to the Fedora community to give you a heads-up about how we are going to handle a change introduced in groff 1.23.0 [1]. Upstream groff stopped mapping special characters (hyphens, tildes, ...) to their Basic Latin code points as it previously did.
This change was quite controversial in the Debian community, as the upstreams/maintainers of man pages that use these characters in a "not correct" way wanted the mapping back. It led to an email thread that takes over an hour to read [2], with further discussion in other threads.
The final conclusion in Debian was to revert this change and keep the old mapping, as the groff maintainer received a ton of emails [3] and didn't want to spend all of his capacity on this issue.
After reading through all of the emails, we've decided to align with the Debian decision and revert this change, thereby retaining the current mapping. The main reason is to eliminate a bunch of bugs reported against the groff/man-pages packages about broken manual pages. Our capacity also has its limits, and we need to spend our resources wisely; this decision was based on that. The reproducer and the issue description are in Bugzilla [4], so please read through it if you are interested.
[1] https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00001.html
[2] https://lwn.net/Articles/947941/
[3] https://lwn.net/ml/debian-devel/ZS0aV4XyJH+O1o%2Fc@riva.ucam.org/
[4] https://bugzilla.redhat.com/show_bug.cgi?id=2224123
On Mon, Nov 06, 2023 at 12:36:19PM +0100, Lukas Javorsky wrote:
we've decided to align with the Debian decision and revert this change
I read the article on LWN and parts of the Debian discussion, and I think this is the right decision.
Zbyszek
On Mon, Nov 06, 2023 at 12:36:19PM +0100, Lukas Javorsky wrote:
Hi,
I'm writing to the Fedora community to give you a heads-up about how we are going to handle a change introduced in groff 1.23.0 [1]. Upstream groff stopped mapping special characters (hyphens, tildes, ...) to their Basic Latin code points as it previously did.
Yes, '~' is being replaced by the U+02DC SMALL TILDE character, and the replacement looks terrible. If you have nbdkit-protect-filter(1) installed, you can see that this important meta-character becomes almost invisible, making the documentation and examples very confusing.
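To make the difference concrete, here is a minimal Python sketch that prints the code points involved. The tilde pair is the one named above; the hyphen pair is an assumption based on the "new hyphen" quoted later in this thread. The `PAIRS` table and the `describe` helper are names made up for illustration and are not part of groff.

```python
import unicodedata

# (character typed in the man page source, character seen on a UTF-8
# terminal without the compatibility mapping).
# The tilde pair is named in this thread; the hyphen pair is assumed from
# the "new hyphen" quoted in a later message.
PAIRS = [
    ("~", "\u02dc"),
    ("-", "\u2010"),
]


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for typed, rendered in PAIRS:
    print(f"{describe(typed)} -> {describe(rendered)}")
```

The two characters in each pair look similar but are distinct code points, so text copied from the rendered page is not the text the author typed.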
I read the LWN discussion.
Did we try to persuade upstream to revert the problem? But if they're not receptive then a downstream fix aligned with Debian looks right.
Thanks,
Rich.
-- Best regards
Lukáš Javorský
Software Engineer, Core service - Databases
Red Hat
Purkyňova 115 (TPB-C)
612 00 Brno - Královo Pole
ljavorsk@redhat.com
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
Did we try to persuade upstream to revert the problem? But if they're not receptive then a downstream fix aligned with Debian looks right.
I haven't yet. However, since they decided to stop mapping these characters, I don't think they would be willing to revert it. They mentioned the option to map it locally, as I did in the PR. I assume they want to stop mapping it on their end and let distros decide whether to do it themselves.
On Tue, Nov 7, 2023 at 9:25 PM Richard W.M. Jones rjones@redhat.com wrote:
-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
On Thu, Nov 09, 2023 at 12:11:55PM +0100, Lukas Javorsky wrote:
Did we try to persuade upstream to revert the problem? But if they're not receptive then a downstream fix aligned with Debian looks right.
I haven't yet. However, since they decided to stop mapping these characters, I don't think they would be willing to revert it. They mentioned the option to map it locally, as I did in the PR. I assume they want to stop mapping it on their end and let distros decide whether to do it themselves.
I think upstream only wants to adhere to the language specification (groff_char(7)). These small differences became prominent with the advent of UTF-8 capable terminals. They have always been visible in PostScript output.
Imagine you are the upstream and one user sends you a bug report that groff does not behave according to the specification, while another user complains that his nonconforming input behaves weirdly. There is no solution that would satisfy both.
-- Petr
I think upstream only wants to adhere to the language specification (groff_char(7)). These small differences became prominent with the advent of UTF-8 capable terminals. They have always been visible in PostScript output.
Imagine you are the upstream and one user sends you a bug report that groff does not behave according to the specification, while another user complains that his nonconforming input behaves weirdly. There is no solution that would satisfy both.
I totally understand why they made the change; I just don't see a reason to ask them to put the mapping back.
I think the right approach would be for the upstreams of the packages whose man pages are written with the wrong character to fix their man pages. However, I'm not confident they would do that willingly, and I'm not sure there are enough volunteers to do it for them.
That being said, the new groff version with the mapping for the old characters is now in stable for both Fedora 39 and Fedora Rawhide.
On Thu, Nov 9, 2023 at 12:41 PM Petr Pisar ppisar@redhat.com wrote:
-- Petr
On Thu, 2023-11-16 at 23:41 +0100, Lukas Javorsky wrote:
I think upstream only wants to adhere to the language specification (groff_char(7)). These small differences became prominent with the advent of UTF-8 capable terminals. They have always been visible in PostScript output.
Imagine you are the upstream and one user sends you a bug report that groff does not behave according to the specification, while another user complains that his nonconforming input behaves weirdly. There is no solution that would satisfy both.
I totally understand why they made the change; I just don't see a reason to ask them to put the mapping back.
I think the right approach would be for the upstreams of the packages whose man pages are written with the wrong character to fix their man pages.
But they *don't* have their man pages written with the "wrong character". That's one reason why this is awkward and controversial.
The man pages are written with the *right* character - an ASCII dash, the character that is actually used for arguments to console commands. groff is converting this to a Unicode dash based on the expectation that it's being used for something more "text-y", and the writer just used an ASCII dash because it's much more convenient to type, but they really want a more text-ish dash. This may be the case for a lot of text, but it's definitely not the case for CLI tool manpages. When they write an ASCII dash, they want it to stay as an ASCII dash.
With the 'new' behaviour, you have to mark up or escape your ASCII dashes somehow to prevent groff from turning them into Unicode dashes.
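As a rough illustration of what "escaping your ASCII dashes" could look like for a tool that generates man page source, here is a minimal Python sketch. It assumes the generator handles runs of plain text and wants every dash written as the roff escape `\-`; the function name `escape_dashes` is made up for this example. It is deliberately naive: it does not understand roff requests or other escape sequences in which a dash is part of the syntax, so it is not a drop-in fix for existing pages.

```python
import re

# Match a hyphen-minus that is not already preceded by a backslash.
_UNESCAPED_DASH = re.compile(r"(?<!\\)-")


def escape_dashes(text: str) -> str:
    """Rewrite each unescaped '-' in a plain-text run as the roff escape '\\-'.

    Hypothetical helper for illustration only; it must not be applied to
    roff markup where '-' is part of a request or another escape sequence.
    """
    return _UNESCAPED_DASH.sub(r"\\-", text)


print(escape_dashes("Run ls --all -l to list every entry."))
```

The lookbehind keeps dashes that are already written as `\-` unchanged, so running the helper twice over the same text gives the same result.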
On Thu, Nov 16, 2023 at 02:52:23PM -0800, Adam Williamson wrote:
But they *don't* have their man pages written with the "wrong character". That's one reason why this is awkward and controversial.
The man pages are written with the *right* character - an ASCII dash, the character that is actually used for arguments to console commands. groff is converting this to a Unicode dash based on the expectation that it's being used for something more "text-y", and the writer just used an ASCII dash because it's much more convenient to type, but they really want a more text-ish dash. This may be the case for a lot of text, but it's definitely not the case for CLI tool manpages. When they write an ASCII dash, they want it to stay as an ASCII dash.
With the 'new' behaviour, you have to mark up or escape your ASCII dashes somehow to prevent groff from turning them into Unicode dashes.
In the case of the systemd man pages, it's even more confusing [1]: we write the pages in DocBook XML and use that to generate man and HTML output (and I think some people have even experimented with EPUB). We don't want to add the escaping in the XML text, because at that point we don't know the output format and the escaping is needed for only one of the output formats. To fix the problem, DocBook itself would have to escape any dashes it writes as troff. Maybe that would be more "correct", but it's not happening right now, and it hasn't been happening for the last 30 years.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2249869
"systemd‐userdbd.service(8), systemd‐homed.service(8), nss‐systemd(8) are missing"
(the hyphen in each of these names is the new hyphen)
Zbyszek
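A small Python sketch of why names rendered with the new hyphen are awkward to work with: a search for the ASCII spelling does not match the rendered text until the hyphen is mapped back. It assumes the new hyphen is U+2010; the `rendered` string and the `to_ascii_hyphens` helper are made up for illustration and are not taken from the bug report.

```python
NEW_HYPHEN = "\u2010"  # assumed code point of the "new hyphen"

# A page name as it might appear in rendered output (illustrative only).
rendered = f"systemd{NEW_HYPHEN}userdbd.service(8)"
wanted = "systemd-userdbd.service"


def to_ascii_hyphens(text: str) -> str:
    """Map the assumed new hyphen back to the ASCII hyphen-minus."""
    return text.replace(NEW_HYPHEN, "-")


print(wanted in rendered)                    # search for the ASCII spelling
print(wanted in to_ascii_hyphens(rendered))  # search after mapping back
```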