In the past we have used mw-render to convert wiki pages to XML. The
conversion isn't perfect, but it can be a huge time saver during the
initial conversion from the wiki. mw-render is part of the python-mwlib
package.
I put together some scripts to read a list of beats and convert them,
and also a sed script to clean up some of the more obvious hiccups from
using mw-render.
mw-render has not always worked reliably. As of F18 it was working;
I haven't tried it since.
I used the following script to read a list of beat names and convert
them to XML. Although the conversion isn't perfect, it is much faster
than cut and paste when you have a number of beats to do. For later
cleanup, cut and paste is better. The script is:
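#!/bin/sh
# Convert each beat listed in ./WikiList from the wiki to DocBook XML.
MWR="mw-render_out"    # raw mw-render output
XML="XML_Files"        # output after the sed cleanup
# Start from empty output directories on every run.
rm -Rf "${XML}" "${MWR}"
mkdir "${XML}" "${MWR}"
for i in $(cat ./WikiList)
do
    BEAT="Documentation_${i}_Beat"
    echo "====== $BEAT ======="
    # Fetch the wiki page and write it out as DocBook.
    /usr/bin/mw-render -c http://fedoraproject.org/w/ \
        -w docbook "$BEAT" -o "${MWR}/${i}.inter"
    # Clean up the raw output with the sed script.
    sed -f sedscr "${MWR}/${i}.inter" > "${XML}/${i}.xml"
done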
My clumsy sed script (saved as sedscr) is:
s/<sectioninfo>//
s?</sectioninfo>??
s/<para>/\n<para>\n/
s?</para>?\n</para>?
s/></>\n</
s?<emphasis>?<package>?
s?</emphasis>?</package>?
s? </title?</title?
s?<itemizedlist?\n<itemizedlist?
s/<book>//
s?</book>??
s?<articleinfo>??
s?</articleinfo>??
s?<article lang="en">??
s?</article>??
Mostly it strips wrapper tags (book, article, articleinfo, sectioninfo),
and it also puts some key tags at the beginning of a line for easier
editing and formatting. I generally use emacs to reformat the result so
it is readable.
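As a rough illustration of what a few of these rules do, the sketch
below runs three of them on a single line. The sample line is made up
for the example and is not real mw-render output; the rules themselves
are copied from the sed script above.

```shell
#!/bin/sh
# Hypothetical input line, invented for illustration only.
sample='<title>Printing </title><para>Install <emphasis>cups</emphasis>.</para>'

# Three rules taken from the sed script: retag emphasis as package,
# and drop the stray space before a closing title tag.
cleaned=$(printf '%s\n' "$sample" | sed \
    -e 's?<emphasis>?<package>?' \
    -e 's?</emphasis>?</package>?' \
    -e 's? </title?</title?')

printf '%s\n' "$cleaned"
# prints: <title>Printing</title><para>Install <package>cups</package>.</para>
```

Note that none of the rules end in g, so each one changes only the
first match on a line.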
The WikiList file is simply a list of beat names, one per line, like:
Printing
Desktop
Productivity
Networking
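The main script turns each of these names into a wiki page name of the
form Documentation_NAME_Beat. A minimal sketch of that mapping, using a
hypothetical helper called beat_page and an inline list standing in for
the WikiList file:

```shell
#!/bin/sh
# beat_page is a hypothetical helper, not part of the original script.
# It builds the wiki page name the same way the BEAT variable does.
beat_page() {
    printf 'Documentation_%s_Beat\n' "$1"
}

# Inline stand-in for ./WikiList, read one name per line.
pages=$(printf '%s\n' Printing Desktop | while IFS= read -r name; do
    beat_page "$name"
done)

printf '%s\n' "$pages"
```

Reading with while and read handles one name per line, which is how
the list above is laid out.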
Not perfect, but a good time saver.
--McD