2nd intl.workshop on FOSS rule-based machine translation


Jesús Franco

Wednesday, 19 January 2011

2:28 a.m.

Hi!

An event that could interest all of us, related to translation *and* free/open-source software, is coming up. It will be hosted by the prestigious Universitat Oberta de Catalunya.

http://www.uoc.edu/freerbmt11/

From the introduction on the site:

This workshop aims to bring together the experience of researchers and developers in the field of rule-based machine translation who have decided to get on board the free/open-source train and are *effectively contributing to creating a commons of explicit knowledge*: machine translation rules and dictionaries, and machine translation systems whose behaviour is transparent and clearly traceable through their explicit logic.

(I've added the *bold*.)

This is an area of real interest to me. There are pretty good translation memories we can access today *because people are sharing*. You can read all about this here:

http://www.tausdata.org/blog/2010/08/everyones-sharing-translations/

My own take on why this is worth your attention: there is no doubt that machine translation is helping the work of every translator in Fedora and FOSS at large. But we shouldn't take for granted that "Google is getting better and better and I don't need to do anything after copy-pasting my strings into it", as someone has said in the past.

By the way, I'm about to start a project for sharing translation memories among Fedora/FOSS translators in a common repository (by language), and I'd like to hear about your experience: what is the best way to share translation memories in a repo we can fit to our needs?

I've thought about putting everything together in a git repo (by language pairs), where everybody can push their updates to the memories and pull those contributed by their peers, loading them into their preferred translation software. But I think it is easier said than done. Or not?

How do you reuse your own translation memories? How do you share them with your fellows on your Fedora translation team?

Thanks in advance for every comment (even the tiniest) you want to share.

--
"We cannot solve our problems with the same thinking we used when we created them." Albert Einstein
Jesús Franco - Fedora Ambassador and Translator
http://fedoraproject.org/wiki/User:Tezcatl
http://identi.ca/tzk
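As a rough illustration of the git-repo idea above, here is a minimal sketch of merging a peer's translation memory into your own before pushing. Everything in it is an assumption made for the example: the tab-separated "source, TAB, target" file format, the function names, and the idea of one such file per language pair. Real tools such as Virtaal use their own formats, so this only shows the merge logic, not an existing workflow.

```python
from typing import Dict, List, Tuple

# A translation memory here is just: source segment -> target segment.
TM = Dict[str, str]


def parse_tsv(text: str) -> TM:
    """Parse 'source<TAB>target' lines (hypothetical format) into a memory."""
    memory: TM = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        source, sep, target = line.partition("\t")
        if not sep:
            continue  # skip malformed lines that have no tab
        memory[source.strip()] = target.strip()
    return memory


def merge(mine: TM, theirs: TM) -> Tuple[TM, List[Tuple[str, str, str]]]:
    """Add entries from a peer; report differing translations as conflicts."""
    merged = dict(mine)
    conflicts: List[Tuple[str, str, str]] = []
    for source, target in theirs.items():
        if source not in merged:
            merged[source] = target
        elif merged[source] != target:
            # Keep our own entry and let a human decide later.
            conflicts.append((source, merged[source], target))
    return merged, conflicts


def to_tsv(memory: TM) -> str:
    """Serialize sorted by source so that git diffs stay small."""
    return "".join(f"{s}\t{t}\n" for s, t in sorted(memory.items()))


mine = parse_tsv("File\tArchivo\nEdit\tEditar\n")
theirs = parse_tsv("Edit\tEdición\nHelp\tAyuda\n")
merged, conflicts = merge(mine, theirs)
print(to_tsv(merged), end="")
print(conflicts)
```

Keeping conflicts separate instead of overwriting is a deliberate choice in this sketch: two translators disagreeing on a string is exactly the case where a person, not the merge, should decide.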


Yaron Shahrabani

Wednesday, 19 January

4:07 a.m.

On Wed, Jan 19, 2011 at 10:28 AM, Jesús Franco <tezcatl(a)fedoraproject.org> wrote:

...

I think most of the efforts should be aimed at building Apertium corpuses. English is a Germanic language; Apertium allows languages with the same basic rules to be more transparent to one another (although corpuses work in one direction only).

The problem is that we can't build corpuses so easily: many hours of work are needed to create a working corpus. If only there were a simpler way of creating and maintaining them using AI, it would be great (or even a graphical tool to aid in this mission).

"Stupid" TMs built from statistical data can sometimes be wrong, and they need lots of AI to get better. Building good corpuses with the right rules for every language will help computers understand human language in both source and destination languages.

Why is it so important? I want to present you with a problem I had translating text from Arabic to Hebrew. Google's mechanism does the following: the Arabic text is translated to English, and the English is then translated to Hebrew. You can't possibly imagine how strongly the phrase "lost in translation" applies in this case.

BTW, Microsoft Translator does a much better job than Google's translator when translating from English to Hebrew.

Hebrew and Arabic share basic rules. Instead of using English in the middle, we can use a mechanism that takes advantage of their similarities to translate between them without going through a third language. The same goes for Czech and Slovak: apparently many Czech translators are using the Slovak translation instead of translating from English, and vice versa.

Apertium website: http://www.apertium.org/

Kind regards,
Yaron Shahrabani
<Hebrew translator>

...

--
trans mailing list
trans(a)lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/trans

Jesús Franco

5:11 a.m.

Yaron, thank you very much for your quick reply, I'll comment inline.

Yaron Shahrabani wrote:

...

On Wed, Jan 19, 2011 at 10:28 AM, Jesús Franco <tezcatl(a)fedoraproject.org> wrote:
> There is no doubt that machine translation is helping the work of every
> translator in Fedora and FOSS at large. But we shouldn't take for granted
> that "Google is getting better and better and I don't need to do anything
> after copy-pasting my strings into it", as someone has said in the past.

I use Google only marginally, when I'm not able to find a logical translation (done by another human) among the shared TMs in TAUS, or perhaps in forums. I prefer quality of translation over quantity of strings translated (unfortunately, I can't speak for the other members of the Spanish translation team :S).

...

I think most of the efforts should be aimed at building Apertium corpuses.

Unfortunately I think Apertium is not packaged in Fedora, and in my own experience translating from English to Spanish it returns something like what you call a "stupid TM". Maybe I'm missing the point, but my feeling is that machines are far from replacing people at this moment, whatever app you use.

...

The problem is that we can't build corpuses so easily, in order to create a working corpus many hours of work are needed, if there was only a simpler way of creating them and maintaining them using AI it would be great (or even creating a graphical tool to aid in this mission).

My thoughts about sharing come from my own experience using Virtaal, which gives me access to my own TM. Domingo Becker has come up with an idea about connecting his translation app to Google, or something much the same.

...

Building "stupid" TMs with statistical data can be sometimes false and they need lots of AI to get better, building good corpuses with the right rules for every language will help computers understand human language in source and destination languages.

I think we are talking about different directions. I don't believe computers will understand the obscure corners of a whole human language any time soon. But I'm pretty sure people are not as stupid as machines if they try to do their best. This is why I'm talking about sharing TMs among people, not CPUs.

...

I want to present you with a problem I had translating text from Arabic to Hebrew. Google's mechanism does the following: the Arabic text is translated to English, and the English is then translated to Hebrew. You can't possibly imagine how strongly the phrase "lost in translation" applies in this case.

Imagine what could happen if a "translator" took that result just "because it's Google". That's exactly my point.

...

BTW, Microsoft Translator does a much better job than Google's translator when translating from English to Hebrew.

I can't have an opinion about that; I use and promote only free (as in freedom) software. It's not about "morality", it's just that I can't suggest to a fellow translator "buy this software" (whichever provider it comes from).

...

Hebrew and Arabic share basic rules. Instead of using English in the middle, we can use a mechanism that takes advantage of their similarities to translate between them without going through a third language. The same goes for Czech and Slovak: apparently many Czech translators are using the Slovak translation instead of translating from English, and vice versa.

I think the same about Portuguese (and even more so Catalan), "twin" languages of Castilian (Spanish): it would be easier to work among similar languages than to put English in the middle. Actually, that wicked idea is the "official" approach in Fedora for guides written in a language other than English: translate first to en_US and from there to other languages. Maybe we should keep using English as a common language, but I'm sure that for Brazilian and Catalan people it's easier to understand me in my mother tongue than through a machine smashing my words in statistics-based software.
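To make the "English in the middle" problem concrete, here is a toy sketch with word-for-word lookup tables. The tables and function names are invented for the illustration and are not how Google or Apertium actually work. The only linguistic fact relied on is that Spanish and Portuguese both distinguish "ser" and "estar", while English has a single verb "be".

```python
from typing import Dict, List

# Toy lookup tables, invented for this example.
ES_TO_EN: Dict[str, str] = {"ser": "be", "estar": "be"}
EN_TO_PT: Dict[str, str] = {"be": "ser"}  # the pivot has to pick one verb
ES_TO_PT: Dict[str, str] = {"ser": "ser", "estar": "estar"}


def translate(words: List[str], table: Dict[str, str]) -> List[str]:
    """Word-for-word lookup; unknown words pass through unchanged."""
    return [table.get(word, word) for word in words]


def via_pivot(words: List[str], first: Dict[str, str],
              second: Dict[str, str]) -> List[str]:
    """Translate through an intermediate language by chaining two tables."""
    return translate(translate(words, first), second)


source = ["ser", "estar"]
pivoted = via_pivot(source, ES_TO_EN, EN_TO_PT)
direct = translate(source, ES_TO_PT)
print(pivoted)
print(direct)
```

Once both Spanish verbs collapse into "be", nothing in the second step can recover which one was meant, so the pivot route returns the same Portuguese verb twice while the direct table keeps the distinction.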

...

Apertium website: http://www.apertium.org/

I'm going to visit the Apertium site to see if there are new ideas I can pick up there.

...

Kind regards, Yaron Shahrabani <Hebrew translator>

Thanks for sharing your vision about this.

Best regards.

--
Jesús Franco - Fedora Ambassador and Translator
http://identi.ca/tzk

Yaron Shahrabani

6:02 a.m.

On Wed, Jan 19, 2011 at 1:11 PM, Jesús Franco <tezcatl(a)fedoraproject.org> wrote:

...

Yaron, thank you very much for your quick reply, I'll comment inline.

Yaron Shahrabani wrote:
> On Wed, Jan 19, 2011 at 10:28 AM, Jesús Franco <tezcatl(a)fedoraproject.org> wrote:
>> There is no doubt that machine translation is helping the work of every
>> translator in Fedora and FOSS at large. But we shouldn't take for granted
>> that "Google is getting better and better and I don't need to do anything
>> after copy-pasting my strings into it", as someone has said in the past.

I use Google only marginally, when I'm not able to find a logical translation (done by another human) among the shared TMs in TAUS, or perhaps in forums. I prefer quality of translation over quantity of strings translated (unfortunately, I can't speak for the other members of the Spanish translation team :S).

LOL, I share the same thought ☺. I have never run into TAUS; is there a way to take advantage of it in Linux?

...

> I think most of the efforts should be aimed at building Apertium corpuses.

Unfortunately I think Apertium is not packaged in Fedora, and in my own experience translating from English to Spanish it returns something like what you call a "stupid TM". Maybe I'm missing the point, but my feeling is that machines are far from replacing people at this moment, whatever app you use.

That's due to a lack of training and rules. With a TM you can train, but you can't create rules.

...

> The problem is that we can't build corpuses so easily: many hours of work
> are needed to create a working corpus. If only there were a simpler way of
> creating and maintaining them using AI, it would be great (or even a
> graphical tool to aid in this mission).

My thoughts about sharing come from my own experience using Virtaal, which gives me access to my own TM. Domingo Becker has come up with an idea about connecting his translation app to Google, or something much the same.

I truly hope I could contribute back to Google or Microsoft, although I don't have control over it. How about improving open-tran.eu so we can train it? Isn't that a much better solution? Open-Tran.eu is really low on resources, so we can't achieve something like this with its current instance. I also think users should have the ability to vote on particular translations; this way their statistical significance would be refined as well... Virtaal 0.7.0 b2 supports open-tran.eu (a bug in previous versions prevented us from using it).
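The voting idea could look roughly like the sketch below. It is purely hypothetical: the class and method names are made up for the example, and nothing here reflects how Open-Tran.eu or Virtaal actually store or rank suggestions.

```python
from collections import Counter
from typing import Dict, List, Tuple


class SuggestionBoard:
    """Hypothetical store of user votes on candidate translations."""

    def __init__(self) -> None:
        # source string -> Counter of candidate translations
        self._votes: Dict[str, Counter] = {}

    def vote(self, source: str, target: str, weight: int = 1) -> None:
        """Record one user's vote for a candidate translation."""
        self._votes.setdefault(source, Counter())[target] += weight

    def ranked(self, source: str) -> List[Tuple[str, int]]:
        """Return candidates for a source string, most voted first."""
        return self._votes.get(source, Counter()).most_common()


board = SuggestionBoard()
board.vote("Save", "Guardar")
board.vote("Save", "Salvar")
board.vote("Save", "Guardar")
print(board.ranked("Save"))
```

A translation tool could then offer the top-ranked candidate first while still showing the alternatives, so the human translator keeps the final say.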

...

> "Stupid" TMs built from statistical data can sometimes be wrong, and they
> need lots of AI to get better. Building good corpuses with the right rules
> for every language will help computers understand human language in both
> source and destination languages.

I think we are talking about different directions. I don't believe computers will understand the obscure corners of a whole human language any time soon. But I'm pretty sure people are not as stupid as machines if they try to do their best. This is why I'm talking about sharing TMs among people, not CPUs.

I'm all for sharing TMs, but I think that without defining rules you are just generalizing between languages instead of sharing rules and corpuses. Training an app to understand your language grammatically, so that it can decompose your words and sentences and then reconstruct them in another language, is a much cleaner job. Although we cannot add sarcasm or context to the source sentence, we can make the computer understand the grammatical structure of words and sentences instead of guessing statistically.

...

> I want to present you with a problem I had translating text from Arabic to
> Hebrew. Google's mechanism does the following: the Arabic text is
> translated to English, and the English is then translated to Hebrew. You
> can't possibly imagine how strongly the phrase "lost in translation"
> applies in this case.

Imagine what could happen if a "translator" took that result just "because it's Google". That's exactly my point.

Due to a lack of translation resources (mostly for computing and technical documentation), most kiddies find themselves using Google as a translator without fixing the results. The outcome is an incomprehensible app or doc.

...

> BTW, Microsoft Translator does a much better job than Google's translator
> when translating from English to Hebrew.

I can't have an opinion about that; I use and promote only free (as in freedom) software. It's not about "morality", it's just that I can't suggest to a fellow translator "buy this software" (whichever provider it comes from).

It's free and supported by Virtaal: http://www.microsofttranslator.com/ Same deal with Google.

...

> Hebrew and Arabic share basic rules. Instead of using English in the
> middle, we can use a mechanism that takes advantage of their similarities
> to translate between them without going through a third language. The same
> goes for Czech and Slovak: apparently many Czech translators are using the
> Slovak translation instead of translating from English, and vice versa.

I think the same about Portuguese (and even more so Catalan), "twin" languages of Castilian (Spanish): it would be easier to work among similar languages than to put English in the middle. Actually, that wicked idea is the "official" approach in Fedora for guides written in a language other than English: translate first to en_US and from there to other languages.

That's generalizing, but using the same idea for translation just leads to a bad translation. Languages share certain rules that are lost when they are translated through English. Portuguese, Spanish, Catalan and Italian can all share corpuses and rules, which you cannot achieve by translating with English in the middle, even in a cultural aspect.

...

Maybe we should keep using English as a common language, but I'm sure that for Brazilian and Catalan people it's easier to understand me in my mother tongue than through a machine smashing my words in statistics-based software.

> Apertium website: http://www.apertium.org/

I'm going to visit the Apertium site to see if there are new ideas I can pick up there.

...

> Kind regards,
> Yaron Shahrabani
> <Hebrew translator>

Thanks for sharing your vision about this.

Best regards.

--
Jesús Franco - Fedora Ambassador and Translator
http://identi.ca/tzk

Robert Antoni Buj i Gelonch

Sunday, 23 January

12:42 a.m.

Hi,

All workshop papers are now available at http://openaccess.uoc.edu/webapps/o2/browse?type=author&value=Interna...

Regards,
Robert

On Wed, Jan 19, 2011 at 1:02 PM, Yaron Shahrabani <sh.yaron(a)gmail.com> wrote:

...


--
FQN: Robert Antoni Buj Gelonch
AKA: Robert Buj
Advanced-level training cycle (ca: CFGS) in computer software engineering (ca: DAI) - les Heures, technical academy (SPA)
Computer Science Engineer - University of Lleida (UDL), Lleida (SPA)
Training course in Handel-C & DK design suite - Celoxica Ltd., Abington (UK)
Training course in Xilinx EDK & Microblaze - Autonomous University of Madrid (UAM), Madrid (SPA)
Postgraduate Master's degree in .NET Solutions - Open University of Catalonia (UOC), Barcelona (SPA)
Postgraduate Master's degree in FOSS (Free and open source software) (upcoming) - Open University of Catalonia (UOC), Barcelona (SPA)
ca: M'agradaria millorar el món, però Déu no em dóna el codi font!
de: Ich würde gern die Welt verbessern, doch Gott gibt mir den Quellcode nicht!
en: I would like to improve the world, but God didn't give me the source code!
es: Me gustaría mejorar el mundo, pero Dios no me da el código fuente!
gpg fingerprint = 0800 D37B C187 CC6E 9D0C 0AF4 265D 0096 AC78 6412
