I'm working on the Jargon Buster. The old XML contains tags that are no longer used. It appears to create a glossary. So what replaces the tags that are there?
<glossentry id="gl-jb-a11y">
  <glossterm>a11y</glossterm>
  <glossdef>
    <para>
      An abbreviation for "accessibility," frequently used in programming to avoid unnecessary typing and misspelling. Accessibility is the provision of services for impaired users, such as text-to-speech translation for the visually impaired. The <literal>11</literal> derives from the eleven letters between the beginning <literal>a</literal> and the ending <literal>y</literal>.
    </para>
  </glossdef>
</glossentry>
Thanks, Eric
On 06/20/2010 01:58 AM, Eric "Sparks" Christensen wrote:
> I'm working on the Jargon Buster. The old XML contains tags that are no longer used. It appears to create a glossary. So what replaces the tags that are there?
The validity problem was that the <section> that contained the <glossary> lacked a <title>. I've fixed the validity problem (and cleaned up the rest of the XML a little) and it builds fine.
However, glossaries cannot be translated properly in a number of languages -- notably Chinese and Japanese -- because we cannot yet sort entries in those languages. The output from Publican ends up with entries in a completely jumbled order. (In fact, the characters are being sorted according to their Unicode code points, but this appears completely random to a human reader.)
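To illustrate the code point ordering described above, here is a minimal Python sketch. The three sample terms and the hand-written PINYIN table are assumptions made for illustration only; they do not come from the Jargon Buster, and the table merely stands in for whatever real collation data a proper fix would use.

```python
# Hypothetical sample terms; none come from the Jargon Buster itself.
terms = ["中", "人", "一"]

# Default string sorting compares Unicode code points:
# 一 is U+4E00, 中 is U+4E2D, 人 is U+4EBA.
by_codepoint = sorted(terms)

# Assumption: a tiny hand-written pinyin table standing in for real
# collation data (yi, zhong, ren are the usual Mandarin readings).
PINYIN = {"一": "yi", "中": "zhong", "人": "ren"}
by_reading = sorted(terms, key=lambda term: PINYIN[term])

print(by_codepoint)  # ['一', '中', '人']
print(by_reading)    # ['人', '一', '中']
```

The two orders differ even for three entries, which is why a glossary sorted purely by code point looks jumbled to a reader who expects the entries ordered by reading.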
We have a long-standing bug against Publican to find a solution [0], and we're slowly chipping away at it as time allows, but it's non-trivial. Note also its appearance on the Publican wishlist, which links to some newly-found work that might help [1].
In the meantime, the only safe solution is not to use DocBook for content like this, where entries are simply sorted in alphabetical order. Until the translation issues can be solved, the Wiki is a much better place for this type of material.
Cheers Rudi
[0] https://bugzilla.redhat.com/show_bug.cgi?id=475684 [1] https://fedorahosted.org/publican/wiki/WishList
On 06/21/2010 10:31 AM, Ruediger Landmann wrote:
> <snip>
> However, glossaries cannot be translated properly in a number of languages -- notably Chinese and Japanese -- because we cannot yet sort entries in those languages.
Are we able to get Publican to call an external sorter or external string comparison utility?
I'm just thinking that it sounds like Publican's present sorting method is hard-coded internally, and doesn't behave the way we need it to.
If we can get Publican to call a script or utility with the strings for sorting, plus accept the return values, would that let us leverage any other package that gets the sorting algorithm right? (even if it's an ugly hack for the short term)
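As a rough sketch of the "hand the strings to something else and accept the return values" idea, the Python below sorts with a pluggable comparison function. It is not how Publican or the DocBook stylesheets work; `sort_terms` and `caseless_compare` are hypothetical names, and the comparator is only a stand-in for an external package that collates correctly.

```python
from functools import cmp_to_key


def sort_terms(terms, compare):
    """Sort terms with any callable that returns <0, 0 or >0 for a pair."""
    return sorted(terms, key=cmp_to_key(compare))


# Hypothetical stand-in for an external comparison utility: compares
# case-insensitively, then falls back to code point order to break ties.
def caseless_compare(a, b):
    key_a, key_b = (a.casefold(), a), (b.casefold(), b)
    return (key_a > key_b) - (key_a < key_b)


# Illustrative terms only.
print(sort_terms(["rpm", "Yum", "a11y", "GNOME"], caseless_compare))
# ['a11y', 'GNOME', 'rpm', 'Yum']
```

Swapping in a different comparator changes the order without touching the sorting code, which is the property the question is asking for.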
Regards and best wishes,
Justin Clift
On 06/21/2010 04:00 PM, Justin Clift wrote:
> On 06/21/2010 10:31 AM, Ruediger Landmann wrote:
>> <snip>
>> However, glossaries cannot be translated properly in a number of languages -- notably Chinese and Japanese -- because we cannot yet sort entries in those languages.
> Are we able to get publican to call an external sorter or external string comparison utility?
> Am just thinking that it sounds like publican's present sorting method is internally coded, and not how we need it to be.
> If we can get publican to call a script or utility with the strings for sorting, plus accept the return values, would that let us leverage from any other package which has the sorting algorithm right? (even if it's an ugly hack for the short term)
Publican presently implements the sorting that's built into DocBook itself. And yes, the solution to this will no doubt be to use external modules to provide sorting for at least Chinese and Japanese. The links on the Publican Wishlist point to some efforts in this direction that look promising.[0]
Any ideas or suggestions are welcome on the Publican mailing list.[1]
Cheers
Rudi
[0] https://fedorahosted.org/publican/wiki/WishList [1] https://www.redhat.com/mailman/listinfo/publican-list
On Mon, 21 Jun 2010, Ruediger Landmann wrote:
> On 06/20/2010 01:58 AM, Eric "Sparks" Christensen wrote:
>> I'm working on the Jargon Buster. The old XML contains tags that are no longer used. It appears to create a glossary. So what replaces the tags that are there?
> The validity problem was that the <section> that contained the <glossary> lacked a <title>. I've fixed the validity problem (and cleaned up the rest of the XML a little) and it builds fine.
> However, glossaries cannot be translated properly in a number of languages -- notably Chinese and Japanese -- because we cannot yet sort entries in those languages. The output from Publican ends up with entries in a completely jumbled order. (In fact, the characters are being sorted according to their Unicode code points, but this appears completely random to a human reader.)
> We have a long-standing bug against Publican to find a solution [0], and we're slowly chipping away at it as time allows, but it's non-trivial. Note also its appearance on the Publican wishlist, which links to some newly-found work that might help [1].
> In the meantime, the only safe solution is not to use DocBook for content like this, where entries are simply sorted in alphabetical order. Until the translation issues can be solved, the Wiki is a much better place for this type of material.
> Cheers
> Rudi
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=475684
> [1] https://fedorahosted.org/publican/wiki/WishList
So how do you propose we put together the Jargon Buster? I understand the problem but I don't really know what to do with it.
--Eric
On Mon, Jun 21, 2010 at 12:34:51PM -0400, Eric Christensen wrote:
> So how do you propose we put together the Jargon Buster? I understand the problem but I don't really know what to do with it.
If Rudi's suggesting moving it to the wiki, I'm a bit sad, since I spent a lot of personal time XML'ifying it from the wiki some time back.
Does this not help at all? http://www.sagehill.net/docbookxsl/GlossarySort.html
If we're not using any glossdiv to divide up entries, will glossary.sort work?
On 06/22/2010 06:25 AM, Paul W. Frields wrote:
> If Rudi's suggesting moving it to the wiki, I'm a bit sad, since I spent a lot of personal time XML'ifying it from the wiki some time back.
My suggestion is moving it back to the wiki until we get the sorting problems fixed. Long term, XML is definitely the right solution.
> Does this not help at all? http://www.sagehill.net/docbookxsl/GlossarySort.html
No; that's what we do now. It works fine for English, doesn't work at all for Chinese and Japanese, and we haven't rigorously tested what happens at the edges of other languages that use Latin script (keeping in mind that not every language that uses the Latin script collates or orders letters the same way that English does).
We can fix most weirdness in alphabetic or syllabic writing systems upstream in the DocBook locale files. Writing systems that use ideograms are the present stumbling block.
> If we're not using any glossdiv to divide up entries, will glossary.sort work?
glossdiv isn't the problem here (and we're not actually using it in this case at all). You're right, however, that glossdiv makes things much, much worse, even in languages that use the same writing system.[0]
Cheers Rudi
[0] an illustration from the Publican User Guide -- http://tinyurl.com/2gx4yq3
On Tue, Jun 22, 2010 at 09:15:09AM +1000, Ruediger Landmann wrote:
> On 06/22/2010 06:25 AM, Paul W. Frields wrote:
>> If Rudi's suggesting moving it to the wiki, I'm a bit sad, since I spent a lot of personal time XML'ifying it from the wiki some time back.
> My suggestion is moving it back to the wiki until we get the sorting problems fixed. Long term, XML is definitely the right solution.
Fair enough -- my personal disappointment is probably partly due to my not having enough foresight about handling this problem, or for that matter being aware of it. :-)
>> Does this not help at all? http://www.sagehill.net/docbookxsl/GlossarySort.html
> No; that's what we do now. It works fine for English, doesn't work at all for Chinese and Japanese, and we haven't rigorously tested what happens at the edges of other languages that use Latin script (keeping in mind that not every language that uses the Latin script collates or orders letters the same way that English does).
> We can fix most weirdness in alphabetic or syllabic writing systems upstream in the DocBook locale files. Writing systems that use ideograms are the present stumbling block.
Yeah, good points all. This is a tricky problem.
>> If we're not using any glossdiv to divide up entries, will glossary.sort work?
> glossdiv isn't the problem here (and we're not actually using it in this case at all). You're right, however, that glossdiv makes things much, much worse, even in languages that use the same writing system.[0]
> [0] an illustration from the Publican User Guide -- http://tinyurl.com/2gx4yq3
OK -- perhaps to make the transition easier, I could write an XSL snippet to get the glossary.xml file transferred quickly to wikitext. That would save someone else a bunch of extra labor, right?
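For anyone who wants to try the conversion before an XSL snippet exists, here is a minimal sketch of the same idea in Python rather than XSL. It assumes glossary.xml uses un-namespaced glossentry, glossterm, glossdef and para elements like the a11y entry earlier in this thread, and that a MediaWiki definition list ("; term" followed by ": definition") is an acceptable target; the function name and the shortened sample entry are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample modelled on the a11y entry quoted in this thread.
SAMPLE = """<glossary>
  <glossentry id="gl-jb-a11y">
    <glossterm>a11y</glossterm>
    <glossdef>
      <para>An abbreviation for "accessibility". The <literal>11</literal>
      derives from the eleven letters between <literal>a</literal> and
      <literal>y</literal>.</para>
    </glossdef>
  </glossentry>
</glossary>"""


def flatten(element):
    """Return the element's text with inline tags dropped and whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def glossary_to_wikitext(xml_text):
    """Convert glossentry elements to a MediaWiki definition list."""
    root = ET.fromstring(xml_text)
    lines = []
    for entry in root.iter("glossentry"):
        lines.append("; " + flatten(entry.find("glossterm")))
        for para in entry.iter("para"):
            lines.append(": " + flatten(para))
    return "\n".join(lines)


print(glossary_to_wikitext(SAMPLE))
```

Inline markup such as literal is flattened to plain text here; a real conversion would probably want to map it to wiki markup instead.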
On 06/23/2010 12:36 AM, Paul W. Frields wrote:
> Fair enough -- my personal disappointment is probably partly due to my not having enough foresight about handling this problem, or for that matter being aware of it. :-)
Another possible solution occurred to me on the train on the way in here this morning: if (at least for the time being) we positioned this as a glossary of English-language terminology, the collation problem goes away, since the primary sort would be on the English term, which we know sorts correctly.
For example, at the moment, we have a Portuguese translation for the "package" entry, as follows:
pacote
Os utilizadores normalmente referem-se a um ficheiro RPM como um pacote.
Which of course will be sorted as "pacote". If we add the English term, we would get an entry like this:
package: pacote
Os utilizadores normalmente referem-se a um ficheiro RPM como um pacote.
sorted under "package". The purpose of the translated glossary now shifts a little from being purely a "dictionary" (as it is in English) into a "translation dictionary".
This makes sense, because the Portuguese "Glossário de Termos" ("Jargon Buster") only contains explanations and translations of terms that originate in English and are in common use in English; it does not contain definitions for any local computing or FOSS jargon or slang of Portuguese (language) origin.
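The proposal above can be sketched in a few lines of Python. Only the "package: pacote" pair comes from this message; the other two pairs are illustrative assumptions, and how the English term would actually be stored alongside the translation in the XML or PO files is left open.

```python
# (English term, Portuguese translation) pairs.
# "package"/"pacote" is from the thread; the others are illustrative.
entries = [
    ("package", "pacote"),
    ("mirror", "espelho"),
    ("file", "ficheiro"),
]

# Sorting on the translated term, as the glossary effectively does today.
by_translation = [pt for en, pt in sorted(entries, key=lambda e: e[1].casefold())]

# Sorting on the English term and showing both, as proposed.
headings = [f"{en}: {pt}" for en, pt in sorted(entries, key=lambda e: e[0].casefold())]

print(by_translation)  # ['espelho', 'ficheiro', 'pacote']
print(headings)        # ['file: ficheiro', 'mirror: espelho', 'package: pacote']
```

Because the sort key is always the English term, the order is the same in every translation, whatever script the translated term is written in.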
Just another possible take on the problem.
Cheers Rudi