2nd intl.workshop on FOSS rule-based machine translation
sh.yaron at gmail.com
Wed Jan 19 10:07:09 UTC 2011
On Wed, Jan 19, 2011 at 10:28 AM, Jesús Franco <tezcatl at fedoraproject.org> wrote:
> An event that could interest all of us, related to translation *and*
> free/open-source, is coming up, and it will be hosted by the prestigious
> Universitat Oberta de Catalunya.
> The introduction of the site:
> This workshop aims to bring together the experience of researchers
> and developers in the field of rule-based machine translation who
> have decided to get on board the free/open-source train and are
> *effectively contributing to creating a commons of explicit
> knowledge*: machine translation rules and dictionaries, and
> machine translation systems whose behaviour is transparent and
> clearly traceable through their explicit logic.
> (I've added the *bold*.) This is an area of real interest to me: there are
> pretty good translation memories we can access today *because people are
> sharing*. You can read all about this here:
> My own discussion of why this is worth your attention:
> There is no doubt that machine translation is helping the work of every
> translator in Fedora/FOSS at large. But we shouldn't take it for granted
> that "Google is getting better and better and I don't need to do anything
> after copy-pasting my strings into it for translation", as someone has
> said in the past.
> By the way, I'm about to start a project for sharing translation memories
> among Fedora/FOSS translators in a common repository (by language), and
> I'd like to hear about your experience:
> What is the best way to share translation memories in a repo we can adapt
> to our needs?
> I've thought about putting them all together in a git repo (by language
> pair), where everybody can push their updates to the memories and pull
> those contributed by their peers, loading them into their preferred
> translation tool.
> But I think it is easier said than done. Or not?
> How do you reuse your own translation memories? How do you share them
> with your fellows in your Fedora translation team?
> Thanks in advance for every comment (even the tiniest) you want to share.
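One reason the shared git repo is harder to do than to say: two translators
can translate the same string differently, so a plain pull is not enough and
somebody has to review the clashes. A minimal sketch in Python, assuming each
memory has been reduced to a simple source-to-target mapping (real memories
are usually TMX or PO files; the function name merge_memories and the sample
strings are made up for illustration):

```python
def merge_memories(mine, theirs):
    """Merge two translation memories (source string -> translation).

    Returns (merged, conflicts). Entries found on one side only are merged.
    Entries whose translations differ keep the local value and are listed
    in conflicts so a human can review them.
    """
    merged = dict(mine)
    conflicts = {}
    for source, target in theirs.items():
        if source not in merged:
            merged[source] = target
        elif merged[source] != target:
            conflicts[source] = (merged[source], target)
    return merged, conflicts


# Hypothetical English -> Spanish entries from two translators.
mine = {"File": "Archivo", "Save": "Guardar"}
theirs = {"Save": "Salvar", "Open": "Abrir"}

merged, conflicts = merge_memories(mine, theirs)
print(merged)     # {'File': 'Archivo', 'Save': 'Guardar', 'Open': 'Abrir'}
print(conflicts)  # {'Save': ('Guardar', 'Salvar')}
```

Keeping the local value on a conflict is only one possible policy; a team
could just as well prefer the newest entry or the reviewer's entry.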
I think most of the effort should be aimed at building Apertium corpuses.
English is a Germanic language, and Apertium allows languages with the same
basic rules to be more transparent to one another (although corpuses work in
one direction only).
The problem is that we can't build corpuses so easily: creating a working
corpus takes many hours of work. If only there were a simpler way of
creating and maintaining them using AI, it would be great (or even a
graphical tool to aid in this mission).
"Stupid" TMs built from statistical data can sometimes be wrong, and they
need lots of AI to get better. Building good corpuses with the right rules
for every language will help computers understand human language in both the
source and destination languages.
Why is it so important?
I want to present a problem I had when translating text from Arabic to
Hebrew. Google's mechanism follows this procedure: the Arabic text is
translated to English, and the English is then translated to Hebrew. You
can't possibly imagine how strongly the phrase "Lost in translation" applies
in this case.
BTW, Microsoft's translator does a much better job than Google's translator
when translating from English to Hebrew.
Hebrew and Arabic share basic rules. Instead of using English in the middle,
we can use a mechanism that takes advantage of their similarities to
translate between them without going through a third language.
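To make the pivot problem concrete, here is a toy sketch in Python. The
three word tables are tiny hand-made placeholders using transliterated
words, not Google's or Apertium's data, and the function names are
hypothetical. It only shows how a distinction that both Arabic and Hebrew
make (masculine vs. feminine "you") disappears once English sits in the
middle:

```python
# Toy lookup tables (transliterated). Real systems use full dictionaries
# and transfer rules; these three tables are illustrative assumptions.
AR_TO_EN = {"anta": "you", "anti": "you"}   # masc./fem. "you" collapse in English
EN_TO_HE = {"you": "ata"}                   # English "you" must pick one Hebrew form
AR_TO_HE = {"anta": "ata", "anti": "at"}    # direct table keeps the distinction


def translate(words, table):
    """Word-by-word lookup; unknown words pass through unchanged."""
    return [table.get(word, word) for word in words]


def pivot_translate(words):
    """Arabic -> English -> Hebrew, as in the pivot procedure described above."""
    return translate(translate(words, AR_TO_EN), EN_TO_HE)


def direct_translate(words):
    """Arabic -> Hebrew without an intermediate language."""
    return translate(words, AR_TO_HE)


source = ["anta", "anti"]
print(pivot_translate(source))   # ['ata', 'ata']  (gender lost in the pivot)
print(direct_translate(source))  # ['ata', 'at']   (gender preserved)
```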
The same goes for Czech and Slovak: apparently many Czech translators work
from the Slovak translation instead of translating from English, and vice
versa.
Apertium website: http://www.apertium.org/
> "We cannot solve our problems with the same thinking we used when we
> created them." Albert Einstein
> Jesús Franco - Fedora Ambassador and Translator
> trans mailing list
> trans at lists.fedoraproject.org