RFC: Description text in packages
Alan Cox
alan at redhat.com
Wed Dec 17 00:12:14 UTC 2008
On Tue, Dec 16, 2008 at 05:40:36PM -0500, Matthias Clasen wrote:
> > Unicode is character encoding
> > HTML tags or similar are semantic markup
>
> Thanks Alan, I know that quite well.
>
> > Trying to extrapolate semantic markup from random ascii symbols is not
> > a reliable or robust path, particularly when you come to internationalise
> > things.
>
> One hopes the ascii symbols in most package descriptions are not
> entirely random... and extrapolating something from them can be quite
There is no reason to assume that *, for example, is a bullet point; it could
be a footnote indicator, maths or ASCII art. The Unicode bullet on the other
hand is unequivocally a bullet point.
So extracting from UTF-8 is safer, but extracting at all is dangerous.
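
To make the ambiguity concrete, here is a minimal sketch in Python (not anything RPM or a package tool actually does; the function name and the sample description are made up for illustration). It flags lines that start with either "*" or U+2022 and reports the Unicode name of the marker, assuming a heuristic that only looks at the first non-blank character of each line:

```python
import unicodedata

def bullet_candidates(description):
    """Return (line_number, marker, unicode_name) for lines that start
    with '*' or U+2022. Hypothetical helper, for illustration only."""
    found = []
    for number, line in enumerate(description.splitlines(), start=1):
        stripped = line.lstrip()
        if not stripped:
            continue
        first = stripped[0]
        if first in "*\u2022":
            found.append((number, first, unicodedata.name(first)))
    return found

# Made-up description text: line 2 is meant as a list item, line 4 is a
# footnote marker, and line 5 uses '*' as multiplication.
sample = "Features:\n* fast\n\u2022 small\n* see footnote below\n2 * 3 = 6"

for entry in bullet_candidates(sample):
    print(entry)
```

The character database calls U+2022 "BULLET" and "*" merely "ASTERISK". Lines 2 and 4 of the sample come back looking identical to the heuristic even though only one of them is meant as a list item, so the guess has to be made from context the tool does not have.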
> The specification for RPM doesn't imply anything about the description
> field. And this thread is about how to possibly improve the situation by
> agreeing on some form of interpretation.
Right - the field is plain UTF-8 textual data and has been for years. You
want to add a semantic version of it. That is fine, but use a new header for
the field, the way RPM intends things to be added.