Hi,
Some time ago, I introduced a dependency on the python-markdown2
module in Bodhi. [1][2]
Today, two bugs were opened against Bodhi, both related to the way the
Markdown module parses the text we feed it:
- https://fedorahosted.org/bodhi/ticket/395
- https://fedorahosted.org/fedora-infrastructure/ticket/2033
The first one is caused by the fact that « _ » is interpreted by
Markdown: _foo_ is translated to <em>foo</em> and __foo__ to
<strong>foo</strong>.
This doesn't seem like a problem, except in cases like $SOME_LONG_VAR
or http://foo-bar.com/foo_3_2.tar.gz, which are rather likely to
appear in update descriptions.
I fixed it by disabling the interpretation of « _ » and « __ ». It is
not a great solution though, as it means we no longer fully follow the
Markdown syntax.
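
To make the first bug concrete, here is a minimal sketch of the
behaviour. The two patterns below are hypothetical and deliberately
naive; they are not the actual python-markdown2 regexps, only an
illustration of what happens when underscores are matched without
looking at the surrounding characters.

```python
import re

# Hypothetical, deliberately naive rules (NOT python-markdown2's real patterns):
# any pair of underscores delimits emphasis, wherever it appears.
STRONG = re.compile(r"__(.+?)__")
EM = re.compile(r"_(.+?)_")


def naive_emphasis(text):
    """Translate __x__ to <strong> first, then _x_ to <em>."""
    text = STRONG.sub(r"<strong>\1</strong>", text)
    return EM.sub(r"<em>\1</em>", text)


if __name__ == "__main__":
    for sample in (
        "_foo_",
        "__foo__",
        "$SOME_LONG_VAR",
        "http://foo-bar.com/foo_3_2.tar.gz",
    ):
        print(sample, "->", naive_emphasis(sample))
```

The first two samples come out as intended, but the variable name and
the URL both get an <em> element in the middle, which is what the
first ticket describes.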
The second one, however, is much trickier, as it comes from the regexp
used to parse the string to translate. Basically, a string like «
***** Important ***** » can be translated in several ways, depending
on the regexp or state machine used:
1. <strong><em>* Important *</em></strong>
2. <strong><em></strong> Important
<strong><em></strong>
3. <em></em><strong><em> Important **</em></strong>
The parser could even try to be smart and close open tags properly
(Trac does something like that, in its own syntax which is close to
Markdown):
4. <strong><em></em></strong> Important
<strong><em></em></strong>
(</em> were added to close the tags)
The possibilities are basically endless.
Unfortunately, the python-markdown2 module does the second. The
resulting HTML is thus not valid, so Kid gives an error 500.
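
A quick way to see which of the four candidates above are properly
nested is a small stack-based check. This is only a sketch using the
standard library's html.parser; the class and function names are
made up for the example, and it assumes the problem is mis-nested
tags. It is not what Kid does internally.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Track open tags on a stack and flag any out-of-order close."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # A closing tag must match the most recently opened tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.ok = False


def is_well_nested(fragment):
    """Return True if every tag is closed in reverse order of opening."""
    checker = NestingChecker()
    checker.feed(fragment)
    checker.close()
    return checker.ok and not checker.stack


CANDIDATES = {
    1: "<strong><em>* Important *</em></strong>",
    2: "<strong><em></strong> Important <strong><em></strong>",
    3: "<em></em><strong><em> Important **</em></strong>",
    4: "<strong><em></em></strong> Important <strong><em></em></strong>",
}

if __name__ == "__main__":
    for number, fragment in CANDIDATES.items():
        print(number, is_well_nested(fragment))
```

Only candidate 2 fails the check: it closes <strong> while <em> is
still open.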
I tried looking at the python-markdown2 module, but the fix won't be
easy (at least not for me :). Also, I doubt it would be useful to
report the bug and possibly submit a patch, as the project seems
dead upstream since December (no recent commits or mailing list
discussions, and bugs go unanswered even with patches attached).
One thing I really regret is that I had seen there was another
Python Markdown module, python-markdown [3], but I chose the other
one.
It seems to be older than python-markdown2, but it is still actively
maintained (the last release was in April 2009, but mails and bugs are
still answered by the developer, and development is active on
Gitorious [4]).
Also, it has two nice side-effects:
1. it fixes our first issue in a much more elegant way, as it is able
to translate _foo_ to <em>foo</em>, while recognizing that it
shouldn't translate $SOME_LONG_VAR to $SOME<em>LONG</em>VAR (ignore
underscores inside a word)
2. it interprets the string « ***** Important ***** » as «
<strong><em>** Important </em></strong>** », which, even if it is
not the prettiest translation, is valid HTML.
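
The "ignore underscores inside a word" rule from point 1 can be
sketched with a single regexp. The pattern below is my own
illustration of the idea, not python-markdown's actual
implementation: an underscore only counts as an emphasis delimiter
when its outer side does not touch a word character.

```python
import re

# Hypothetical pattern, for illustration only (not taken from python-markdown).
# (?<!\w) : the opening "_" must not follow a letter, digit or underscore
# (?!\w)  : the closing "_" must not be followed by one
# The emphasised text must start and end with a non-space character.
WORD_AWARE_EM = re.compile(r"(?<!\w)_(\S(?:.*?\S)?)_(?!\w)")


def word_aware_emphasis(text):
    """Translate _x_ to <em>x</em>, leaving intra-word underscores alone."""
    return WORD_AWARE_EM.sub(r"<em>\1</em>", text)


if __name__ == "__main__":
    for sample in (
        "_foo_",
        "see _foo_ here",
        "$SOME_LONG_VAR",
        "http://foo-bar.com/foo_3_2.tar.gz",
    ):
        print(sample, "->", word_aware_emphasis(sample))
```

With this rule _foo_ still becomes <em>foo</em>, while the variable
name and the URL are left untouched.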
All in all, I would be more confident with this module than with the
current one (and I'm not even counting the fact that it has an
extensive test suite).
Should we rebase Bodhi on this module? The port is trivial, and
python-markdown is available in both Fedora and EPEL, but I don't
really like moving to another dependency just because we encountered
bugs, especially since I should have chosen the module more carefully
in the first place. :(
What do you think?
[1] https://fedorahosted.org/bodhi/ticket/286
[2] http://code.google.com/p/python-markdown2/
[3] http://www.freewisdom.org/projects/python-markdown/
[4] http://gitorious.org/python-markdown
----------
Mathieu Bridon