Converting PyDoc to Docbook XML?

Thu Jul 26 15:56:16 UTC 2012

On Thu, Jul 26, 2012 at 09:26:30AM -0400, Steve Gordon wrote:
> Hi all,
> 
> Has anyone attempted converting PyDoc to Docbook XML recently with success? I've tried a project called HappyDoc but it seems to be unmaintained and pretty poorly documented, though it claims on the project site to be able to generate DocBook XML.
> 
> I have also tried generating the PyDoc HTML with `pydoc -w <module>`, converting it to XHTML with tidy:
> 
>     tidy -indent -m -asxhtml -clean -bare -omit ovirtsdk.infrastructure.brokers.html
> 
> ...and then turning it into Docbook with an XSLT [1]:
> 
>     java -cp "/usr/share/java/xalan-j2.jar" org.apache.xalan.xslt.Process -XSL html2db.xsl -IN <module>.html > <module>.xml
> 
> The resulting XML still has a *lot* of errors according to XMLLINT, even before applying the DTD rules. Anyone had any success with a variation of this or maybe a different method entirely?
> 
Do you need to be able to turn anything that pydoc displays into docbook, or
just certain things that you have written?  Is this for an ongoing
conversion or a one-off?  pydoc itself is a somewhat naive displayer of the
text that's in docstrings.  Docstrings are just text, formatted so that
humans can read them and infer meaning from past experience; there is no
markup for a tool to rely on.  So simply converting from pydoc's output to
docbook isn't likely to work well.
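
To illustrate the point, here is a minimal sketch of what a tool actually
gets when it asks for a docstring.  The function `connect` and its docstring
are made up for this example; `inspect.getdoc` is the stdlib call for
retrieving a cleaned-up docstring.

```python
import inspect


def connect(url, timeout=30):
    """Connect to the server.

    url -- address of the API endpoint
    timeout -- seconds to wait before giving up
    """


# getdoc() returns the docstring with its indentation normalized.
doc = inspect.getdoc(connect)

# The result is a plain string.  Nothing marks "url" as a parameter name;
# a human infers that from the layout, a converter can only guess.
print(type(doc).__name__)
print(doc)
```

Any converter that starts from this has to guess which lines describe
parameters, return values, and so on.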

There are two different approaches I can think of.  I haven't used either
for this, but both have a higher likelihood of working than raw pydoc:

* Happydoc is the only tool I've found that extracts python API
  documentation and spits out docbook.  happydoc parses python source files
  to do this.  This allows doing things like extracting both comments and
  docstrings but it also means it does not work with C extensions, only pure
  python code.  I haven't played around with happydoc in years so I don't
  know how good it is at guessing the semantic meaning of docstrings in
  order to output nice docbook.
  http://happydoc.sourceforge.net/

* Sphinx is a tool that is widely used by python modules.  The python stdlib
  uses docs handcoded in sphinx/restructuredtext; other python modules use
  sphinx to extract documentation from docstrings.  You write your
  docstrings in restructuredtext, with many extensions to mark the semantic
  content of your document.  Then the sphinx toolchain is used to convert
  that into an output format.  There is not currently a builder for docbook,
  so if you went this route someone would have to write one.  The offsetting
  advantage is that sphinx is a semantic markup system, so you have
  well-defined markup entities that you can convert from rather than
  guessing.
  http://sphinx.pocoo.org/contents.html
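
As a rough sketch of the source-parsing idea from the first bullet: the
snippet below reads python source without importing it, pulls out the
docstrings, and wraps them in a few DocBook elements.  This is not how
happydoc is implemented; it uses the stdlib `ast` and `xml.etree` modules,
and the sample source, the helper name `docstrings_to_docbook`, and the
choice of `section`/`title`/`para`/`literallayout` elements are all
assumptions made for illustration.

```python
import ast
import xml.etree.ElementTree as ET

# Hypothetical module source, used only for this example.
SOURCE = '''
"""Example module."""

def add(a, b):
    """Return the sum of a and b."""
    return a + b

class Broker:
    """Talks to the server."""
'''


def docstrings_to_docbook(source, module_name):
    """Return a DocBook <section> holding the top-level docstrings."""
    tree = ast.parse(source)  # parses the text; nothing is imported or run
    root = ET.Element("section")
    ET.SubElement(root, "title").text = module_name
    ET.SubElement(root, "para").text = ast.get_docstring(tree) or ""
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            sect = ET.SubElement(root, "section")
            ET.SubElement(sect, "title").text = node.name
            # literallayout keeps the docstring's line breaks as written,
            # since we cannot tell what the text means semantically.
            layout = ET.SubElement(sect, "literallayout")
            layout.text = ast.get_docstring(node) or ""
    return ET.tostring(root, encoding="unicode")


xml_text = docstrings_to_docbook(SOURCE, "example")
print(xml_text)
```

Because it only parses source text, this has the same limit mentioned above:
it cannot see C extensions.  It also dumps each docstring verbatim, which is
exactly the gap that semantic markup in the docstrings would close.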

-Toshio