On Wed, Jan 19, 2011 at 10:28 AM, Jesús Franco <tezcatl@fedoraproject.org> wrote:

An event that could interest all of us, related to translation *and*
free/open-source software, is coming up, and it will be hosted by the
prestigious Universitat Oberta de Catalunya.


The introduction from the site:

       This workshop aims to bring together the experience of researchers
       and developers in the field of rule-based machine translation who
       have decided to get on board the free/open-source train and are
       *effectively contributing to creating a commons of explicit
       knowledge*: machine translation rules and dictionaries, and
       machine translation systems whose behaviour is transparent and
       clearly traceable through their explicit logic.

(I've added the *bold*.) This is an area of real interest to me; there are
pretty good translation memories we can access today *because people are
sharing*. You can read all about this here:


My own thoughts on why this is worthy of your attention:

There is no doubt that automated machine translation is helping the work
of every translator in Fedora and FOSS at large. But we shouldn't take for
granted that "Google is getting better and better and I don't need to do
anything after copy-pasting my strings into it for translation", as someone
has said in the past.

By the way, I'm about to start a project for sharing translation memories
among Fedora/FOSS translators in a common repository (by language), and I'd
like to hear about your experience with this:

What is the best way to share your translation memories in a repository we
can fit to our needs?

I've thought about putting everything together in a Git repository (by
language pair), where everybody can push their updates to the memories and
pull those contributed by their peers, loading them into their preferred
translation tool.
But I think that's easier said than done. Or is it?
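To make the idea concrete, here is a minimal sketch of that workflow. Everything here is hypothetical: the repository name (fedora-tm), the per-language-pair layout (en-es/), and the file names are made up for illustration, and a local bare repository stands in for a shared server.

```shell
# Sketch only: "fedora-tm" and the en-es/ layout are invented examples.
set -e
WORK=$(mktemp -d)

# The shared repository, organised by language pair.
git init -q --bare "$WORK/fedora-tm.git"

# One translator clones it, adds an updated memory, and pushes.
git clone -q "$WORK/fedora-tm.git" "$WORK/alice"
cd "$WORK/alice"
git config user.email "alice@example.org"   # placeholder identity
git config user.name  "Alice"
mkdir -p en-es
echo '<tmx version="1.4"/>' > en-es/fedora.tmx   # stub TMX file
git add en-es/fedora.tmx
git commit -q -m "en-es: update Fedora translation memory"
git push -q origin HEAD

# A peer pulls the update and can load the TMX file into their tool.
git clone -q "$WORK/fedora-tm.git" "$WORK/bob"
ls "$WORK/bob/en-es"
```

Since TMX is plain XML, Git's usual merging and history tools apply reasonably well, though concurrent edits to the same memory file would still need an agreed merge policy.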

How do you reuse your own translation memories? How do you share them with
your colleagues on your Fedora translation team?

Thanks in advance for every comment (even the tiniest) you want to share.

I think most of the effort should be aimed at building Apertium corpora.

English is a Germanic language; Apertium allows languages with the same basic rules to be more transparent to one another (although corpora work in one direction only).

The problem is that we can't build corpora so easily: many hours of work are needed to create a working corpus. If only there were a simpler way of creating and maintaining them using AI (or even a graphical tool to aid in this mission), it would be great.

Building "stupid" TMs with statistical data can be sometimes false and they need lots of AI to get better, building good corpuses with the right rules for every language will help computers understand human language in source and destination languages.

Why is it so important?
Let me present a problem I had translating text from Arabic to Hebrew. Google's mechanism uses the following procedure: the Arabic text is translated into English, and the English is then translated into Hebrew. You can't even imagine how strong the phrase "lost in translation" is in this case.
By the way, Microsoft's translator does a much better job than Google's when translating from English to Hebrew.

Hebrew and Arabic share basic rules; instead of using English in the middle, we can use a mechanism that takes advantage of their similarities to translate between them without going through a third language.
The same goes for Czech and Slovak: apparently many Czech translators use the Slovak translation instead of translating from English, and vice versa.
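A toy example of why the English pivot hurts here. Both Arabic and Hebrew mark gender on the second-person pronoun "you", while English does not, so any route through English has to guess. The mini-dictionaries below are invented for illustration (real MT is far more complex), but the information loss they show is real.

```python
# Toy dictionaries (invented, not real MT data) illustrating pivot loss.
ANTA, ANTI = "أنتَ", "أنتِ"   # Arabic "you": masculine, feminine

ar_to_en = {ANTA: "you", ANTI: "you"}   # English collapses the gender
ar_to_he = {ANTA: "אתה", ANTI: "את"}    # a direct pair keeps it
en_to_he = {"you": "אתה"}               # the pivot must pick one form

def direct(word):
    """Translate Arabic -> Hebrew with a direct language pair."""
    return ar_to_he[word]

def via_pivot(word):
    """Translate Arabic -> English -> Hebrew, as pivot systems do."""
    return en_to_he[ar_to_en[word]]

print(direct(ANTI))     # את  (feminine preserved)
print(via_pivot(ANTI))  # אתה (wrong: masculine guessed by the pivot)
```

The direct-pair approach, which is exactly what Apertium's explicit transfer rules enable for closely related languages, never has to reconstruct distinctions the pivot language threw away.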

Apertium website: http://www.apertium.org/

Kind regards,
Yaron Shahrabani
<Hebrew translator>


"We cannot solve our problems with the same thinking we used when we created
them." Albert Einstein
Jesús Franco - Fedora Ambassador and Translator

trans mailing list