[Fedora-packaging] file-not-utf8 complaints

Hans de Goede j.w.r.degoede at hhs.nl
Sat May 31 06:06:53 UTC 2008


Toshio Kuratomi wrote:
> Jason L Tibbitts III wrote:
>> Normally we fix up non-utf8 documentation and such with a quick call
>> to iconv.  It seems that this is problematic for some; see
>> https://bugzilla.redhat.com/show_bug.cgi?id=226079
>>
>> Any comments on how much we actually care about this, especially in
>> the case that it might not actually be as easy as a call to iconv
>> (such as a changelog file with a pile of random encodings in it).
>>
> Well... The reason that all files must be UTF-8 is exactly the problem 
> that the ChangeLog exhibits so I don't have a lot of sympathy there.

+1,

Although I fully agree with Daniel that blindly converting text-ish 
files which specify an encoding in their headers is both wrong and 
dangerous, since that actually breaks stuff, normal text files, esp. 
ones in %doc, should be in UTF-8 so that they display correctly when 
opened.
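
To make the distinction concrete, here is a rough Python sketch of the 
rule I have in mind: leave alone anything that declares its own encoding 
(only the XML prolog is checked here), leave alone what is already valid 
UTF-8, and convert the rest from one assumed legacy encoding. The names 
is_utf8 and to_utf8 and the ISO-8859-1 default are my own assumptions for 
illustration, not something from the package or the guidelines.

```python
import re

# Matches an encoding declaration in an XML prolog, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>
XML_ENCODING_RE = re.compile(
    rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, assumed_encoding: str = "iso8859-1") -> bytes:
    """Convert plain text to UTF-8 (hypothetical helper).

    assumed_encoding is a guess by the packager; nothing in the file
    itself confirms it.
    """
    if XML_ENCODING_RE.match(data):
        # The file declares its own encoding: converting the bytes
        # without updating the declaration would break it.
        return data
    if is_utf8(data):
        # Already UTF-8 (plain ASCII included): nothing to do.
        return data
    return data.decode(assumed_encoding).encode("utf-8")
```

This only works when a single assumed encoding is right for the whole 
file, which is the same limitation a plain iconv call has.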

Indeed, the changelog is a perfect example of why all plain text files 
must be UTF-8: had it always been UTF-8, the problem of parts being in 
a West European encoding and parts in an East European encoding would 
not exist.
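
A small Python illustration of why such a file cannot be rescued with a 
single conversion. The two names are made up for the example, and I am 
assuming ISO-8859-1 and ISO-8859-2 as the west and east european 
encodings; the same byte means a different character in each.

```python
# Two hypothetical ChangeLog entries, each written in a different
# legacy encoding. Byte 0xF8 is "ø" in ISO-8859-1 but "ř" in ISO-8859-2.
line_west = "Søren".encode("iso8859-1")
line_east = "Jiří".encode("iso8859-2")
changelog = line_west + b"\n" + line_east + b"\n"

# Whichever single encoding we assume, one of the names comes out wrong.
as_latin1 = changelog.decode("iso8859-1")
as_latin2 = changelog.decode("iso8859-2")

# Had both entries been written as UTF-8, the file would be unambiguous.
as_utf8 = "Søren\nJiří\n".encode("utf-8").decode("utf-8")

print(as_latin1)
print(as_latin2)
print(as_utf8)
```

Since nothing in the file records which entry used which encoding, the 
original characters can only be recovered by guessing line by line.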

Also I think it's worth noting that Fedora is not the only distro doing 
this; Debian, for example, also tries to have all text files in the 
distro in UTF-8.

I'll also put a comment to this effect in the review.

Regards,

Hans



> The names and special characters in that file are already corrupted since 
> there's no common encoding and none is recorded with the names.
> Dropping it from the package, as Daniel expressed is certainly an option 
> as there's no requirement that ChangeLogs need to be in a package and it 
> is not something that must be changed.
> 
> Reencoding the xml files that specify an encoding isn't strictly 
> necessary.  We should probably ask upstream whether they are amenable to 
> changing to utf-8.  Since libxml2 deals with utf-8 internally and the 
> upstream author made a nice writeup about why he made that choice, 
> upstream might be amenable to that.  If upstream is not amenable, we 
> should consider changing the Packaging Guidelines to reflect that xml 
> files which specify their encoding do not have to be re-encoded utf-8. 
> (Although we then have to ask ourselves if we should be checking that 
> the xml files actually use the encoding that they specify :-(
> 
> NEWS and other files that are neither specifying an encoding nor mixed 
> up in such a way that they are hopelessly corrupted WRT the original 
> characters should definitely be converted to utf-8.  If Daniel wants to 
> hold open the Merge Review until that has gone in upstream, that is his 
> prerogative.
> 
> The most chilling aspect of that review is that the maintainer does not 
> seem to think that it's his responsibility to take issues with the 
> upstream source to upstream.  Since Daniel is upstream, I'm not certain 
> I can see why he feels that someone else should be reporting it upstream 
> before he deals with it.
> 
> -Toshio