Hi Dimitris,
Thanks for your comments.
----- "Dimitris Glezos" dimitris@glezos.com wrote:
2008/9/17 Asgeir Frimannsson asgeirf@redhat.com:
On Tuesday 16 September 2008 23:29:32 Mike McGrath wrote:
Please correct me if I'm reading this wrong but I see "transifex is great or close to it" and "here's how we're going to build our own solution anyway"?
Yes, "Transifex is great and will continue to serve us".
BUT:
If you look at the state of the art in L10N outside the typical Linux projects where PO and Gettext rule, you'll notice we are very short on areas like:
- Translation Reuse
- Terminology Management
- Translation Workflow and Project Management
- Integration with CMSs
- Richer Translation Tools
This is an effort in narrowing that gap, and I can't see that effort working by evolving an existing tool from this 'cultural background'. Yes, we can get some of the way by developing custom solutions for e.g. linking wikis to Transifex for CMS integration, or using e.g. Pootle for web-based translation. But we would still be limited to the core architecture and the intent of the original developers, which is something that would radically slow the project down.
For the record, I believe these are some fine ideas, which I would like to see added to Transifex as features (eg. through plugins). I have been discussing most of them with people around conferences for the past year. An example: Tx already downloaded all the translation files from upstream projects, so if someone requests a translation file, why not be able to pre-populate it using existing translations from all the other projects (translation reuse)?
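To make the translation-reuse example concrete, here is a minimal sketch, assuming each translation file can be reduced to a plain mapping from source string to translated string. The names `build_memory` and `prepopulate` are hypothetical and not part of Transifex or any other tool; only exact matches are reused.

```python
from typing import Dict, Iterable, Optional


def build_memory(catalogs: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Collect source -> translation pairs from already-translated catalogs."""
    memory: Dict[str, str] = {}
    for catalog in catalogs:
        for source, translation in catalog.items():
            # Skip untranslated entries; keep the first translation seen.
            if translation and source not in memory:
                memory[source] = translation
    return memory


def prepopulate(template: Iterable[str],
                memory: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Fill a new file's strings with exact matches; leave the rest as None."""
    return {source: memory.get(source) for source in template}


# Hypothetical catalogs from two existing projects (same target language).
project_a = {"Open": "Abrir", "Save": "Guardar"}
project_b = {"Save": "Salvar", "Quit": ""}

memory = build_memory([project_a, project_b])
new_file = prepopulate(["Save", "Quit", "Help"], memory)
print(new_file)
```

A real implementation would also have to handle conflicting translations, fuzzy matches and per-project context, which this sketch ignores.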
Also, I should mention that Transifex isn't (and will never be) specific to a particular translation file format (eg. PO) or any translation repository. I'd like to support translation of both PO and XLIFF files. And also support not only VCSs, but CMSs, wiki pages and even arbitrary chunks of text. Transifex's goal is to be a platform to help you manage your translations.
For the record (since XLIFF is mentioned and since I'm part of the Oasis XLIFF Technical Committee), I am not aiming to design anything around XLIFF in this project, other than perhaps supporting XLIFF as an import/export format for resources in the same way as we support PO (we do have the odd XLIFF file coming through for translation). I don't think XLIFF (1.2) is mature enough yet as an L10N resource format.
I know there are some big ideas in transifex. In fact, when transifex is mentioned, people often refer to the *goal/idea* of transifex, rather than the actual current implementation. Take plugins, for example: transifex doesn't currently have a plugin system, nor does it have workflow, project management, or any concept of translation resources internally. Transifex today is a simple 'file submission system' with a growing community aiming to build it into something more. With this in mind, 'building on top of transifex' really means redefining what transifex is. For example, 'file submission' should really be a plugin, not a core feature. That means all of transifex today (excluding maybe the login UI) should really be plugins to a core model of projects, people, etc., that currently doesn't exist.
Defining this 'model' of a repository doesn't really depend much on the implementation, and in fact many implementations might help push this faster and ensure a better solution (if it was on the tx roadmap in the first place). And it's not like it is impossible for e.g. a java based repository to communicate with Transifex for file submissions. Isn't that exactly what the remote-interface of TX (on the roadmap) is supposed to provide? What I'm hearing is "Don't build something new, continue building on the python/tg/transifex architecture", which is fully understandable. However, considering the cost of developing this on top of tx (re-architecture, convincing all that it is the right path to go, immaturity/stability of libraries for e.g. ajax, limited workflow support), I honestly think it's better to have two projects that 'complement' each other. There are more than enough tasks for everyone in the existing Tx roadmap, and the idea is bigger than what a combined development team could accomplish. Diversifying and pulling in good people from e.g. the java-side of things might even help speed things up.
Correct me if I'm wrong though, instead of forking or adapting or working with upstream, you are talking about doing your own thing, right?
We have a goal of where we want to see L10N infrastructure go, to enable us in the future to provide internal (translators paid by Red Hat) and community translators with tools to increase their productivity as well as better tools to manage the overall L10N process. If there is an 'upstream' that provides this, or a platform on to which we could develop this, then yes, we would consider 'working with upstream' or (in a worst-case-scenario) forking upstream.
The Translate Toolkit folks are a very friendly bunch, actively maintaining and extending the rich library, and always open to suggestions. Maybe some (if not all) of the features could be done in TT, and the rest that might not fit there, as Python libraries to maximize interoperability and community involvement.
Yes, I know TT very well, and have discussed the library with Dwayne Bailey (the main visionary behind the project) in the past, even before tx was born. In fact, a django-migration of Pootle (built on top of the TT) has been on the agenda for a while, and combining forces with TT is one of the other options I have been strongly considering for a repository (TT e.g. has a file submission library, and there is a lot of duplication between tt and tx). Looking at the svn activity of TT (in my rss reader), it is definitely a project with a 'dangerous' future.
I also think that Transifex could serve as the "UI" for a lot of translation-specific tasks. If there's a library that does X, that would help people manage their translations or leverage Transifex's strong points of "I read a lot of repositories" and "I write to some repositories", then we could provide a web wrapper around it (eg. search for string "X" in all translation files of language "Y", or "mark <this> file as a downstream of <that> and send me an msgmerged file whenever <that> changes").
So to answer your question bluntly, YES - after 4 years involvement in industry and community L10N processes - I believe we can do better. But holding that thought, remember that this is in many ways 'middleware', and making use of e.g. the vast amount of knowledge invested in Translate Toolkit (file format conversions, build tools, QA) makes sense, and I'm not saying 'forget about all that we have invested in tools so far'.
It might be my poor English or the fact that I usually read long mails at night, but despite the lengthy descriptions I still don't have a clear picture of exactly what problem you'd like to solve, and the reasoning behind the decisions being made.
I do understand there is a 'semantic gap' here, and that we do need to provide a better description and demonstration of why a new project is necessary. I do believe everything is theoretically possible to build on top of python/tg and through reuse of concepts in e.g. tx and TT, but I honestly believe that if we are going to manage and drive the development effort in this, it is more worthwhile to expand beyond the fedora/python community, and use tools that the core developers would be more comfortable and productive with. This is not a 'we think you guys should develop this' request: we are taking ownership of the project, as well as inviting anyone in the community who is interested to participate and take ownership.
Don't take me wrong -- I think there are some good ideas. But I feel it would be too bad if you guys didn't invest on top of existing tools (TT for file formats, Transifex for file operations and UI, OmegaT for translation memory) or just isolate specific solutions that don't fit into other projects in well-defined libraries (do one thing, do it right). Sure, it takes a lot more effort to work *with* other people, but it is usually worth it. :-)
This is *not* about an effort to avoid working with people. It is an effort to get more people working on this. I know more people in the Java community who are or might be interested in an open source solution for these problems than in the Python/Fedora/TG community. Add to this, of course, a portion of my natural bias towards Java, and the fact that the people who would be working on this would initially be much more productive in Java than in Python (TG2 or django).
Please take the fact that we are throwing this idea out to the fedora/tx community early as a sign that we are trying to work with the community, rather than simply developing something on our own. And I for one will continue being involved with Tx to some degree, and help out where I can. L10N is an area with a lot of space for improvement, and an area that has sadly been to some extent 'neglected' except for Dimitris' recent work. We still have a long way to go before we have what I would call an L10N infrastructure that serves translators well.
cheers, asgeir
infrastructure@lists.fedoraproject.org