RFC: i18n proposal

Jeff Johnson jbj at redhat.com
Thu Jul 24 15:42:26 UTC 2003


On Thu, Jul 24, 2003 at 04:52:54PM +0200, Enrico Scholz wrote:
> [ f'up2 to rhl-devel; the starting point of this thread is available at
>   http://www.fedora.us/pipermail/fedora-devel/2003-July/001756.html ]
> 
> hp at redhat.com (Havoc Pennington) writes:
> 
> > On Thu, Jul 24, 2003 at 01:19:11AM +0200, Enrico Scholz wrote: 
> >> 
> >> Other thoughts or comments?
> >> 
> >
> > Probably works for now (we've been doing it forever), but in the end
> > the only right thing has to be that translations are part of the
> > package being translated.
> 
> I see a problem in the maintenance of these translations. With the
> specspo approach, there is one big .po-file which is updated by the
> translators. When making the translations a part of the package, there
> will be either thousands of small .po-files on the distribution side
> (Fedora, Red Hat) or the package author has to maintain his .po files
> himself.
> 

Yup, one big file is much easier to hand to a per-locale translator.

And, even though I don't do i18n myself because I'm deprived of other
languages, I suspect that one big file is easier on everyone involved
than hundreds of 20 line files.

Is hundreds of teensy .po files doable? Sure, but there are many
technical reasons not to do so.

The really important issue is that i18n is lots and lots of work from
a team of people. IMHO, the work is best left to for-pay translators
(and editors; hackers are not necessarily the best redactors) by commercial
entities like Red Hat.

> IMHO, the latter option will not work because this will result in a
> maintenance nightmare: since there are not enough translators, every
> translator has to look after several packages, has to communicate
> with the author and the author has to release packages with updated
> translations very often. And this for every language...
> 
> 
> So that leaves only the case of thousands of small .po-files, where the
> translator has to choose one of them. Clicking hundreds of times in the
> browser to download the files sounds very annoying and error-prone to
> me.
> 
> Sure, you can hide this complexity with tools, but such tools must be
> written first and it will take a lot of work to make them as powerful
> as the current solutions (e.g. the Emacs po-mode).
> 
> Another advantage of the big .po file is that common strings
> (e.g. group-names) need to be translated only once.
> 

Thousands of teensy *.po files doesn't work? If so, we agree completely.
Otherwise I'm confused by your words, as I see thousands of *.po files
as a logistical nightmare that will just impede the process of l10n.

I'd suggest that someone who actually does (or has done) i18n translations
for package Summary/Description/Group design the process, because the
goal is to make it as easy as possible to complete and maintain the corpus.

Yes, no process design means that only ad hoc tools can be built.
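
To put a rough number on the point about common strings quoted above, here
is a minimal Python sketch. The package names and tag values are made up
for illustration and are not taken from any real spec file; the sketch only
compares how many entries a translator faces with per-package .po files
versus one merged file.

```python
from collections import Counter

# Hypothetical Summary/Group values; illustrative only, not real package data.
packages = {
    "docbook-style-dsssl": {"Group": "Applications/Text",
                            "Summary": "DSSSL stylesheets for DocBook"},
    "docbook-style-xsl": {"Group": "Applications/Text",
                          "Summary": "XSL stylesheets for DocBook"},
    "docbook-utils": {"Group": "Applications/Text",
                      "Summary": "Utilities for DocBook documents"},
}


def merge_msgids(pkgs):
    """Count how often each translatable string occurs across all packages."""
    counts = Counter()
    for tags in pkgs.values():
        counts.update(tags.values())
    return counts


counts = merge_msgids(packages)
per_package_entries = sum(counts.values())  # one entry per string per package
merged_entries = len(counts)                # each distinct string appears once
print(per_package_entries, merged_entries)
```

The shared Group string is translated once in the merged catalog instead of
once per package, and the saving grows with the number of packages.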

> 
> 
> > Say you are using redhat-config-packages or another package tool. You
> > see a list of packages. You should see nice user friendly names of
> > those packages in your own language
> 
> Ok; but I do not think that this is a matter of translations. To make
> this possible, a new rpm-tag (e.g. 'LongName:' or 'ShortSummary:') which
> defaults to the value of 'Name:' must be created. This new tag can be
> added to the fields going to be translated.
> 
> But I think, only a few packages will profit from this; e.g. consider
> the 'docbook-style-*' packages. What will you use for 'LongName:'? When
> using e.g. _("docbook tools") the package-tool will present dozens of
> _("docbook tools") and the user is forced to enable either the raw-view or
> to look at 'Summary:' to see which package it is.
> 

This is a much nastier problem because:

	a) there's a change of data type (i.e.
	    RPM_STRING_TYPE -> RPM_I18NSTRING_TYPE)
	involved, it's not just adding new text for RPMTAG_NAME or adding
	a new LongName:/ShortName: tag. In fact, the creation of the data
	type was exactly the reason for a major incompatibility. Having
	personally survived (but barely) v2 -> v3 -> v4 -> v3 changes
	in rpm packaging, I wish not to go there ever again. I wasted
	1 year of my life already discussing the ramifications of the
	value contained in byte 5 of an rpm package. Have at it, enjoy,
	I have no desire to go there again.

	b) LongName: appears to have exactly the properties currently
	in %description; ditto ShortSummary: wrt Summary:; otherwise
	see c)

	c) There are significant issues involved in attempting to
	do i18n on a database key like Name:. Yes, the value of
	Name: is displayed as output, and as output, is a candidate
	for translation. Writing a database layer that properly
	handles i18n'ified tags is way outside the scope of what
	I wish to attempt with rpmdb.

	d) There's also the peripheral issue of the i18n name of the
	Name: tag, see "rpm -qi rpm" output. Should "Name:" or "Nom:"
	be displayed in LANG=fr_FR? My own opinion -- after having
	a long discussion with a member of the i18n team -- is that
	RPMTAG_NAME should be identified as a string in the C locale
	because the value is heavily used in, say, python constructs
	like
		N = h['name']
	OTOH, the mechanism to do otherwise already exists if you disagree,
	see popt(3) aliases which are mostly gettext'ified and ready for
	translation. (Note: only a single person has even attempted to
	look at the i18n problem wrt /usr/lib/rpm/rpmpopt-x.y.z afaik.)

	e) rpm was written with (and still uses) a ctype(3) parser because
	i18n was not even a consideration when rpm was written. The parser
	has broken several times because of i18n deployment, and can/will
	break further if the tokens parsed (e.g. "Name:" used in error
	messages) are localized.

	f) rpm is totally clueless about encodings, although reasonable
	deployments (because rpm is 8bit clean) might be attempted.
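
Point d) can be sketched in a few lines of Python. This is only an
illustration under assumptions: the header is a plain dict standing in for
an rpm header object, and LABELS and format_tag are hypothetical names, not
part of rpm or its python bindings. The lookup key stays in the C locale;
only the label shown to the user is translated.

```python
# Hypothetical header; keys are the C-locale tag names that scripts rely on.
header = {"name": "rpm", "version": "4.2"}

# Hypothetical per-locale display labels; only presentation is translated.
LABELS = {
    "C": {"name": "Name", "version": "Version"},
    "fr_FR": {"name": "Nom", "version": "Version"},
}


def format_tag(h, key, lang="C"):
    """Return 'Label: value', falling back to the C label if none is known."""
    labels = LABELS.get(lang, LABELS["C"])
    label = labels.get(key, LABELS["C"][key])
    return f"{label}: {h[key]}"


# Scripts keep using the untranslated key regardless of the user's locale.
N = header["name"]
print(format_tag(header, "name", "fr_FR"))
```

Had the key itself been localized, h['name'] would stop working as soon as
LANG changed, which is the breakage the C-locale rule avoids.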

> 
> > Also you want translated descriptions of course.
> >
> > I see no way to reliably do this using specspo. You have to somehow
> > bundle the translations into the RPM package, and also install them
> > when the RPM is installed.
> 
> You are right; specspo has disadvantages. E.g. you will have to download
> the whole package, several mebibytes in size, for every small change in
> the translation. There is also a time-gap between package- and
> specspo-release.
> 

My (and Red Hat's at the time) position is
	specspo is great for distributions but bad for packages.

The major advantages for distros are
	a) single file encapsulation of large and changing set of text.
	b) uncoupling build from doc processes, they happen on different
	time scales, for different reasons, and mashing the two processes
	into one is a formidable task indeed.
	c) the potential for updating package translations after release.

The major disadvantage for 3rd party packages is that specspo has not
been generalized into thousands of teensy *.po files. Certainly doable,
will even attempt in rpm if/when I see a significant number of packagers,
not distros, even attempting to translate package Summary/Description/Group.

> 
> > Otherwise mixing packages from different sources (including different
> > OS versions) *will* break.
> 
> IMHO, this is not a big problem. The average user (who wants/needs
> translations), will use the big repositories (the upcoming Red Hat
> Linux Project, Fedora, Freshrpms, ...) which can provide their
> specspo-packages.
> 
> There is an issue with rpm in that it is not easy to "register" such packages;
> but for now, this can be solved with some %triggers and Jeff is open to
> ideas on how it can be improved in "better" ways.
> 

Hmmm, hopefully this can be solved without %triggers, but yes I'm willing
to try to accommodate.

> 
> > One approach would add a "PoFilesDir: foo" field to spec files, ...
> 
> Very interesting idea. Because I do not like decentralized maintenance
> of po-files, I would apply it in the following way:
> 
> * The build-hosts of the repositories which are maintaining the translations
>   are "authoritative" regarding translations. "authoritative" means that
>   there are global macros like
> 
>   | %_i18n_update_pofiles       1
>   | %_i18n_translation_domain   fedora-i18n
> 
>   This i18n domain is maintained in the current way: there is a big .po
>   file for every language, translators update it through cvs and release
>   manager compiles it to a .mo-file.
> 
> * while doing 'rpmbuild -bs ...' on these hosts, the resulting srpm-package
>   gets
> 
>   - a tarball with *.po-files containing the translated strings of the
>     package
>   - a tag like 'PoSources: <tarballname>'
> 
>   non-authoritative hosts will not touch this tarball or tag.
> 
> * when doing 'rpmbuild --rebuild ...' for such a prepared srpm, the
>   operations mentioned by you already (make update-po) will be executed
>   and rpm creates translated headers by executing
> 
>   | gettext(header[RPMTAG_SUMMARY/DESCRIPTION/...])
> 
>   for every supported language.
> 
>   'rpmbuild --rebuild' will not differ on authoritative and non-authoritative
>   hosts.
> 
> 
> Problems:
> 
> * it is not implemented yet
> 

I can certainly see ways to simplify and automate building distros that
have packages with po sub-directories.

However, automation does not start with
	Let's add "PoFilesDir: foo" tag because ...
there's far more to automation than parsing syntax (XML != automation).

Yeah there's the "not implemented yet" problem, isn't there ;-)
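
For what it's worth, the "gettext(header[...]) for every supported
language" step in the quoted proposal might look roughly like the Python
sketch below. Everything here is an assumption for illustration:
DictTranslations is a hypothetical in-memory stand-in for a compiled .mo
catalog, translated_headers is not an rpm function, and the sample strings
are invented. Only gettext.NullTranslations comes from the standard library.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = catalog

    def gettext(self, message):
        # Untranslated strings fall back to the original text.
        return self._catalog.get(message, message)


def translated_headers(header, catalogs,
                       tags=("Summary", "Description", "Group")):
    """Build one translated copy of the selected tags per language."""
    result = {}
    for lang, trans in catalogs.items():
        result[lang] = {tag: trans.gettext(header[tag])
                        for tag in tags if tag in header}
    return result


# Invented sample data, not taken from any real package.
header = {"Name": "example", "Summary": "A text editor",
          "Group": "Applications/Editors"}
catalogs = {"de": DictTranslations({"A text editor": "Ein Texteditor"})}

localized = translated_headers(header, catalogs)
print(localized["de"])
```

Note that Name is never passed through gettext here; only the tags listed in
the proposal are candidates for translation.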

> 
> 
> > It might be nice if there were some way to add translation resource
> > bundles to an RPM *after* building the RPM,
> 
> Yes; you will need to modify the headers (change localized language tags
> and increase release) and you will have to update the po-tarball. At
> first glance, this does not look very complicated.
> 

Been there, done that -- twice -- the process was called "drilling" and
was nearly as painful as a root canal.

IMHO: i18n does not belong in rpm metadata any more than i18n belongs in
tar/cpio headers. Keep i18n out of packages, please.

73 de Jeff

-- 
Jeff Johnson	ARS N3NPQ
jbj at redhat.com (jbj at jbj.org)
Chapel Hill, NC




