Re: Planning a future L10N infrastructure (including Fedora)

Sunday, 21 September 2008

Hi Dimitris,

Thanks for your comments.

----- "Dimitris Glezos" <dimitris(a)glezos.com> wrote:
...
 2008/9/17 Asgeir Frimannsson <asgeirf(a)redhat.com>:
 > On Tuesday 16 September 2008 23:29:32 Mike McGrath wrote:
 >> > >
 >> > > Please correct me if I'm reading this wrong but I see "transifex is
 >> > > great or close to it" and "here's how we're going to build our own
 >> > > solution anyway" ?
 >> >
 >> > Yes, "Transifex is great and will continue to serve us".
 >> >
 >> > BUT:
 >> >
 >> > If you look at the state of the art in L10N outside the typical Linux
 >> > projects where PO and Gettext rule, you'll notice we are very short on
 >> > areas like:
 >> > - Translation Reuse
 >> > - Terminology Management
 >> > - Translation Workflow and Project Management
 >> > - Integration with CMSs.
 >> > - Richer Translation Tools
 >> >
 >> > This is an effort in narrowing that gap, and I can't see that effort work
 >> > by evolving an existing tool from this 'cultural background'. Yes, we can
 >> > get some of the way by developing custom solutions for e.g. linking wikis
 >> > to Transifex for CMS integration, or using e.g. Pootle for web-based
 >> > translation. But we would still be limited to the core architecture of
 >> > the intent of the original developers, which is something that would
 >> > radically slow the project down.

 For the record, I believe these are some fine ideas, which I would
 like to see added to Transifex as features (eg. through plugins). I
 have been discussing most of them with people around conferences for
 the past year. An example: Tx already downloaded all the translation
 files from upstream projects, so if someone requests a translation
 file, why not be able to pre-populate it using existing translations
 from all the other projects (translation reuse)?

 Also, I should mention that Transifex isn't (and will never be)
 specific to a particular translation file format (eg. PO) or any
 translation repository. I'd like to support translation of both PO
 and XLIFF files. And also support not only VCSs, but CMSs, wiki pages
 and even arbitrary chunks of text. Transifex's goal is to be a
 platform to help you manage your translations.

For the record (since XLIFF is mentioned and since I'm part of the OASIS XLIFF
Technical Committee), I am not aiming to design anything around XLIFF in this project,
other than perhaps supporting XLIFF as an import/export format for resources in the same
way as we support PO (we do have the odd XLIFF file coming through for translation). I
don't think XLIFF (1.2) is mature enough yet as an L10N resource format.

I know there are some big ideas in Transifex. In fact, when Transifex is mentioned, people
often refer to the *goal/idea* of Transifex rather than the actual current implementation.
Take plugins, for example: Transifex doesn't currently have a plugin system, nor
does it have workflow, project management, or any concept of translation resources
internally. Transifex today is a simple 'file submission system' with a growing
community aiming to build it into something more. With this in mind, 'building on top
of Transifex' really means redefining what Transifex is. For example, 'file
submission' should really be a plugin, not a core feature. That means all of Transifex
today (excluding maybe the login UI) should really be plugins to a core model of
projects, people, etc. that currently doesn't exist.

Defining this 'model' of a repository doesn't really depend much on the
implementation, and in fact having many implementations might help push this faster and
ensure a better solution (if it was on the Tx roadmap in the first place). And it's not
as if it is impossible for e.g. a Java-based repository to communicate with Transifex for
file submissions; isn't that exactly what the remote interface of Tx (on the roadmap) is
supposed to provide? What I'm hearing is "Don't build something new, continue
building on the python/tg/transifex architecture", which is fully understandable.
However, considering the cost of developing this on top of Tx (re-architecture, convincing
everyone that it is the right path to take, immaturity/instability of libraries for e.g.
ajax, limited workflow support), I honestly think it's better to have two projects that
'complement' each other. There are more than enough tasks for everyone in the
existing Tx roadmap, and the idea is bigger than what a combined development team could
accomplish. Diversifying and pulling in good people from e.g. the Java side of things
might even help speed things up.

...
 >> Correct me if I'm wrong though, instead of forking or adapting or working
 >> with upstream, you are talking about doing your own thing right?
 >
 > We have a goal of where we want to see L10N infrastructure go, to enable us in
 > the future to provide internal (translators paid by Red Hat) and community
 > translators with tools to increase their productivity as well as better tools
 > to manage the overall L10N process. If there is an 'upstream' that provides
 > this, or a platform on to which we could develop this, then yes, we would
 > consider 'working with upstream' or (in a worst-case-scenario) forking
 > upstream.

 The Translate Toolkit folks are a very friendly bunch, actively
 maintaining and extending the rich library, and always open to
 suggestions. Maybe some (if not all) of the features could be done in
 TT, and the rest that might not fit there, as Python libraries to
 maximize interoperability and community involvement.

Yes, I know TT very well, and have discussed the library with Dwayne Bailey (the main
visionary behind the project) in the past, even before Tx was born. In fact, a
Django migration of Pootle (built on top of TT) has been on the agenda for a while,
and combining forces with TT is one of the other options I have been strongly considering
for a repository (TT e.g. has a file submission library, and there is a lot of duplication
between TT and Tx). Looking at the SVN activity of TT (in my RSS reader), it is definitely
a project with a 'dangerous' future.

...
 I also think that Transifex could serve as the "UI" for a lot of
 translation-specific tasks. If there's a library that does X, that
 would help people manage their translations or leverage Transifex's
 strong points of "I read a lot of repositories" and "I write to some
 repositories", then we could provide a web wrapper around it. (eg.
 search for string "X" in all translation files of language "Y", or
 "mark <this> file as a downstream of <that> and send me an msgmerged
 file whenever <that> changes").

 > So to answer your question bluntly, YES - after 4 years involvement in
 > industry and community L10N processes - I believe we can do better. But
 > holding that thought, remember that this is in many ways 'middleware', and
 > making use of e.g. the vast amount of knowledge invested in Translate Toolkit
 > (file format conversions, build tools, QA) makes sense, and I'm not saying
 > 'forget about all that we have invested in tools so far'.

 It might be my poor English or the fact that I usually read long
 mails at night, but despite the lengthy descriptions I still don't
 have a clear picture of exactly what problem you'd like to solve, and
 the reasoning behind the decisions being made.

I do understand there is a 'semantic gap' here, and that we do need to provide a
better description and demonstration of why a new project is necessary. I do believe
it is theoretically possible to build everything on top of python/tg and through reuse of
concepts in e.g. Tx and TT, but I honestly believe that if we are going to manage and drive
the development effort in this, it is more worthwhile to expand beyond the Fedora/Python
community, and use tools that the core developers would be more comfortable and productive
with. This is not a 'we think you guys should develop this' request; we are taking
ownership of the project, as well as inviting anyone in the community who is interested
to participate and take ownership.

...
 Don't take me wrong -- I think there are some good ideas. But I feel
 it would be too bad if you guys didn't invest on top of existing
 tools (TT for file formats, Transifex for file operations and UI,
 OmegaT for translation memory) or just isolate specific solutions
 that don't fit into other projects in well-defined libraries (do one
 thing, do it right). Sure, it takes a lot more effort to work *with*
 other people, but it is usually worth it. :-)

This is *not* about an effort to avoid working with people. It is an effort to get more
people working on this. I know more people in the Java community who are or might be
interested in an open source solution for these problems than in the Python/Fedora/TG
community. Add to this, of course, a portion of my natural bias towards Java, and the
fact that the people who would be working on this would initially be much more productive
in Java than in Python (TG2 or Django).

Please take the fact that we are throwing this idea out to the Fedora/Tx community early
as a sign that we are trying to work with the community, rather than simply
developing something on our own. And I for one will continue being involved with Tx to
some degree, and help out where I can. L10N is an area with a lot of room for
improvement, and an area that has sadly been to some extent 'neglected' except for
Dimitris' recent work. We still have a long way to go before we have what I would call
an L10N infrastructure that serves translators well.

cheers,
asgeir
