2nd intl. workshop on FOSS rule-based machine translation

Robert Antoni robert.buj at gmail.com
Sun Jan 23 06:42:52 UTC 2011


All workshop papers are now available at



On Wed, Jan 19, 2011 at 1:02 PM, Yaron Shahrabani <sh.yaron at gmail.com> wrote:

> On Wed, Jan 19, 2011 at 1:11 PM, Jesús Franco <tezcatl at fedoraproject.org> wrote:
>> Yaron, thank you very much for your quick reply. I'll comment
>> inline.
>> Yaron Shahrabani wrote:
>> > On Wed, Jan 19, 2011 at 10:28 AM, Jesús Franco
>> > <tezcatl at fedoraproject.org> wrote:
>> >> There is no doubt that automated machine translation is helping the
>> >> work of every translator at Fedora and in FOSS at large. But we shouldn't
>> >> take for granted that "Google is getting better and better and I don't
>> >> need to do anything after copy-pasting my strings into it for
>> >> translation", as someone has said in the past.
>> I use Google only marginally, when I can't find a sensible translation
>> (done by another human) among the shared TMs in TAUS, or perhaps in forums.
>> I prefer quality of translation over the number of strings translated
>> (unfortunately, I can't speak for the other members of the Spanish
>> translation team :S).
> LOL, I share the same thought ☺.
> I never ran into TAUS, is there a way to take advantage of it in Linux?
>> > I think most of the effort should be aimed at building Apertium
>> > corpuses.
>> Unfortunately, I think Apertium is not packaged in Fedora, and in my own
>> experience translating from English to Spanish it returns something like
>> what you call a "stupid TM". Maybe I'm missing the point, but my feeling
>> is that machines are far from replacing people at this moment, whatever
>> app you use.
> That comes from a lack of training and rules; with a TM you can train, but
> you can't create rules.
>> > The problem is that we can't build corpuses so easily; creating a
>> > working corpus takes many hours of work. If only there were a
>> > simpler way of creating and maintaining them using AI, it would be
>> > great (or even a graphical tool to aid in this mission).
>> My thoughts about sharing come from my own experience using Virtaal,
>> which gives me access to my own TM. Domingo Becker has come up with the
>> idea of connecting his translation app to Google or something much the
>> same.
> I truly wish I could contribute back to Google or Microsoft, but I
> don't have control over them.
> How about improving open-tran.eu so we can train it? Isn't that a much
> better solution?
> Open-Tran.eu is really low on resources, so we can't achieve something like
> this with the current instance of Open-Tran.eu. I also think users should
> be able to vote on a given translation; that way its statistical
> significance would be factored in by the AI as well...
> Virtaal 0.7.0 b2 supports open-tran.eu (a bug in previous versions
> prevented us from using it).
>> > Building "stupid" TMs from statistical data can sometimes give false
>> > results, and they need lots of AI to get better. Building good corpuses
>> > with the right rules for every language will help computers understand
>> > human language in both the source and destination languages.
>> I think we are talking in different directions. I don't believe computers
>> can understand the obscure corners of a whole human language any time
>> soon. But I'm pretty sure people are not as stupid as machines if they try
>> to do their best. This is why I'm talking about sharing TMs among people,
>> not CPUs.
> I'm all for sharing TMs, but I think that without defining rules you are
> just generalizing between languages instead of sharing rules and corpuses.
> Training an app to understand your language grammatically, so the app can
> decompose your words and sentences and then reconstruct them in another
> language, is a much cleaner job. Although we cannot add sarcasm or context
> to the source sentence, we can make the computer understand the grammatical
> structure of words and sentences instead of statistically guessing.
>> > I want to present you with a problem I had translating text from
>> > Arabic to Hebrew. Google's mechanism does the following: the Arabic
>> > text is translated to English, and the English is then translated to
>> > Hebrew. You can't possibly imagine how strongly the phrase "lost in
>> > translation" applies in this case.
>> Imagine what could happen if a "translator" took that result just
>> "because it's Google". That's exactly my point.
> Due to a lack of translation resources (mostly for computing and technical
> documentation), most kiddies find themselves using Google as a
> translator without fixing the results; the outcome is an incomprehensible
> app or doc.
>> > BTW, Microsoft Translator does a much better job than Google's translator
>> > when translating from English to Hebrew.
>> I can't judge that; I use and promote only free (as in freedom)
>> software. It's not about "morality"; it's just that I can't suggest to a
>> fellow translator "buy this software" (whatever provider he/she goes
>> through).
> It's free and supported by Virtaal: http://www.microsofttranslator.com/
> Same deal with Google.
>> > Hebrew and Arabic share basic rules; instead of using English in the
>> > middle we can use a mechanism that takes advantage of their
>> > similarities to translate between them without going through a 3rd
>> > language.
>> > The same goes for Czech and Slovak: apparently many Czech translators
>> > use the Slovak translation instead of translating from English, and
>> > vice versa.
>> I think the same about Português (and even more Català), "twin" languages
>> of Castilian (Spanish): it would be easier to work among similar languages
>> than to put English in the middle. Actually, that wicked idea is the
>> "official" approach in Fedora for guides written in a language other than
>> English: translating first to en_US and from there to other languages.
> That's generalizing, but using the same idea for translation just leads to
> a bad translation.
> Languages share certain rules that are lost when translating through a
> third language.
> Portuguese, Spanish, Catalan and Italian can all share corpuses and rules
> in a way you cannot achieve by translating with English in the middle, even
> in a cultural aspect.
>> Maybe we should keep using English as a common language, but I'm sure that
>> for Brazilian and Catalan people it's easier to understand me in my mother
>> tongue than through a machine smashing my words in statistics-based
>> software.
>> > Apertium website: http://www.apertium.org/
>> I'll go to the Apertium site to see if there are new ideas I can get
>> there.
>> > Kind regards,
>> > Yaron Shahrabani
>> >
>> > <Hebrew translator>
>> Thanks for sharing your vision about this.
>> Best regards.
>> --
>> Jesús Franco - Fedora Ambassador and Translator
>> http://identi.ca/tzk
>> --
>> trans mailing list
>> trans at lists.fedoraproject.org
>> https://admin.fedoraproject.org/mailman/listinfo/trans

FQN: Robert Antoni Buj Gelonch
AKA: Robert Buj

Advanced-level training cycle (ca: CFGS) in computer software engineering
(ca: DAI) - les Heures, technical academy (SPA)
Computer Science Engineer - University of Lleida (UDL), Lleida (SPA)
Training course in Handel-C & DK design suite - Celoxica Ltd., Abington (UK)
Training course in Xilinx EDK & Microblaze - Autonomous University of Madrid
(UAM), Madrid (SPA)
Postgraduate Master's degree in .NET Solutions - Open University of
Catalonia (UOC), Barcelona (SPA)
Postgraduate Master's degree in FOSS (Free and open source software)
(upcoming) - Open University of Catalonia (UOC), Barcelona (SPA)

ca: M'agradaria millorar el món, però Déu no em dóna el codi font!
de: Ich würde gern die Welt verbessern, doch Gott gibt mir den Quellcode nicht!
en: I would like to improve the world, but God doesn't give me the source code!
es: Me gustaría mejorar el mundo, pero Dios no me da el código fuente!

gpg fingerprint = 0800 D37B C187 CC6E 9D0C 0AF4 265D 0096 AC78 6412