IRC & Gobby session: DocBook XML

Paul W. Frields stickster at gmail.com
Tue Aug 22 22:30:07 UTC 2006


A few people came by IRC and we started up a gobby[1] session to talk
about how DocBook XML works.  Below you'll find the IRC log and the
subsequent gobby log.  I've edited them slightly for readability.  I'm
also attaching copies of the documents we used, inline where they make
sense.  Hopefully people can read this from beginning to end and the
flow generally makes sense, even when we hopped from IRC onto gobby.
There's a bit of Emacs-specific stuff in there[2], but the bulk is
devoted to DocBook XML and how to start writing it using Your Favorite
Editor.

[1] gobby: Interactive, collaborative document editor (available in
Fedora Extras)
[2] And yes, I was very fair to users of vi (several of whom were in
attendance).  Arguments and flames to /dev/null please. ;-)



###
### IRC LOG FOLLOWS
###
###
<stickster>	OK, this might be weird with no one asking questions, but
here goes...
<stickster>	Hands down, the best possible way to write documentation for
Linux or Fedora is DocBook XML.
<cdehaan>	Hm, haven't learned that.
<stickster>	XML is a markup language that looks on its face a lot like
HTML. And in fact, it really isn't that different at all.
<cdehaan>	I never understood how the Wiki was entangled with XML and the
final output of documentation.
<stickster>	Aha!
<stickster>	So I've found someone who wants to ask questions.
<stickster>	Well, skip my stupid intro then, and let's get down to brass
tacks.
<stickster>	The "entanglement" is that we wanted a place where people
could write documents easily without having to learn anything *at
first.*  So quaid has been working with a Summer of Code intern to have
a way to take documents written on the wiki and convert them (more or
less) to DocBook XML.
<cdehaan>	Hm
<stickster>	The sticky, unavoidable problem is that DocBook allows you
to mark a document -- if you want -- with fairly detailed and exact
information.  The Wiki simply can't do it -- not without so many hacks
that no one would want to run the resulting code
<stickster>	But what we can do is have the Wiki -> DocBook converter
make some educated guesses based on how the writer formats the document.
That's why we require specific tagging like `this` for a file or
program, '''this''' for the name of a GUI application, and so on
<stickster>	The converter reads that and turns it into a "best-guess"
DocBook document.  An editor can then go through the document and tag it
correctly.
<cdehaan>	Aah.
<stickster>	Like HTML, XML uses tags to mark elements.  So a filename in
DocBook XML looks like this:  <filename>/etc/services</filename>
<stickster>	However, on the Wiki, a filename is marked like
`/etc/services`, a program is marked like `yum`, and a package name is
marked like `yum-utils`.  So the converter will often tag wrong, and
someone has to correct it in the XML version.
<stickster>	The converter can't tell whether `yum` is a file name,
program name, or package name.  But in DocBook you can!
<stickster>	In DocBook these would be <filename>yum</filename>,
<command>yum</command>, or <systemitem>yum</systemitem>.
<stickster>	(The last one is actually an approximation since the DocBook
version we use doesn't yet have an exact markup for "package.")
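###
###     [Editor's sketch] The "best-guess" step can be pictured with a
###     few lines of Python.  This is NOT the real Wiki -> DocBook
###     converter: the function name and the "a slash means a path"
###     heuristic are assumptions made up for illustration.  It shows
###     why something like `yum-utils` comes out tagged as a command
###     and needs a human editor to retag it.
###
```python
import re
from xml.sax.saxutils import escape


def guess_inline(wiki_text):
    """Turn wiki inline markup into a best-guess DocBook tagging.

    Hypothetical heuristic, not the project's converter:
      `text` containing a slash -> <filename>
      any other `text`          -> <command>
      '''text'''                -> <application>
    """
    # Escape &, < and > first so the result stays well-formed XML.
    escaped = escape(wiki_text)

    def backtick(match):
        text = match.group(1)
        tag = "filename" if "/" in text else "command"
        return "<{0}>{1}</{0}>".format(tag, text)

    escaped = re.sub(r"'''(.+?)'''", r"<application>\1</application>", escaped)
    return re.sub(r"`([^`]+)`", backtick, escaped)


print(guess_inline("Edit `/etc/services`, then run `yum`."))
print(guess_inline("Install `yum-utils`."))  # a package, but guessed as a command
```
###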
<cdehaan>	Hm
<stickster>	Once we have it marked up correctly, though -- how do you
get it back onto the wiki?  According to some people, the answer is "You
don't."  Because every time you had corrections, the version coming
*back* from the Wiki would be wrong. Again.
<cdehaan>	So then, does it cause problems to edit in the Wiki?
<cdehaan>	Is the eventual goal to have everyone using XML?
<stickster>	Problems? No. At least not at first.  In other words, the
drafting can and should be done on the Wiki.  It's a great place for
people to collaborate when they don't know how to use DocBook or CVS
yet.
<stickster>	As for the second question -- it depends on whom you ask.
<stickster>	My personal feeling is that yes, we do want people to work
using XML and CVS, because those can be just as collaborative as a
Wiki. 
<stickster>	What we *don't* want is to scare people off before they get
their feet wet.
<cdehaan>	Ok. So what if someone wanted to work on something, say DUG,
but wanted to use XML? that would be a perfect mess, since someone using
the Wiki and someone using CVS with XML would be using different
versions.
<stickster>	In other words, if you like writing and working on docs,
after working on the Wiki for a little while, you will probably get
eager to try your hand at DocBook and CVS.  
<stickster>	cdehaan: *Exactly.*
<cdehaan>	I like the idea of having one system better, but that's tough,
since the stuff I'm working on now (albeit very, very minute) is on
the Wiki.
<stickster>	The solution to this problem is that drafting should be done
in the wiki -- as long as the participants are all happy to do so -- and
once a draft is done, it can be converted to DocBook.
<stickster>	By that point the contributors should be ready to try the
Better Way of doing business. ;-)
<stickster>	Now, keep in mind this isn't coming from someone who is a
hard-core programmer, or a kernel hacker. 
<cdehaan>	I prefer HTML-style markup, though I'm not familiar with XML.
<cdehaan>	HTML is what I do.
<stickster>	I knew *NOTHING* about DocBook, or XML, or CVS, before I
started with this project.
<stickster>	DocBook XML is as easy as HTML.
<stickster>	The idea is exactly the same. 
<stickster>	You're probably already familiar with XHTML then.
<stickster>	cdehaan: What may trouble you at first is the number of
elements you can use in DocBook.
<Eitch>	I see
<stickster>	After all, most people can handle the easy <html>, <head>,
<body>, <h1>, <p> and so forth in simple HTML
<stickster>	cdehaan: So the idea with DocBook is that you typically are
writing an <article>
<stickster>	Inside the <article> you can have other elements.  You don't
have to use all of them; most are optional.
<cdehaan>	i see.
<stickster>	cdehaan: I was thinking, maybe a gobby instance would
help... are you on a FC5 box?
<cdehaan>	Now that I know this, I really think it's detrimental to the
progress of things to have some things Wiki, some CVS.
<cdehaan>	I'm on OS X at the moment.
<stickster>	Oh, OK.
<stickster>	Well, no then.
<cdehaan>	My MacBook (With FC5) is in for repair (surprise)
<stickster>	Heh.
<cdehaan>	gobby runs on OS X
<cdehaan>	Apparently
<stickster>	Oh, well if you want to set it up real quick, I am going to
try and get a server working here.
<stickster>	I have no idea how to do it, but I bet it's a simple
process.
<cdehaan>	I will do that.
<cdehaan>	It looks to be easy on Windows. I'll go to my Windows box.
<cdehaan>	BRB
###
###
###
###
###     a few minutes later, and after a couple false starts...
###     GOBBY LOG STARTS
###
###
###
###
<cdehaan> hey hey.
<stickster> I'm going to clear out this document for a fresh start.
<cdehaan> OK
<cdehaan> Aah.
<stickster> So here's how we start a document.
###
###
###
###
###     <?xml version="1.0" encoding="UTF-8"?>
###
###
###
###
<cdehaan> Indeed.
<stickster> That's pretty much verbatim.  Never changes.
<Eitch> o>
<stickster> It's just a signifier that the rest of the document will be
an XML 1.0 document of some type, yet unknown.
<stickster> Next, we put in a "document type declaration" (the DOCTYPE
line) that explains EXACTLY what type of XML document this is by
pointing at its DTD.
###
###
###
###
###     <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
###     "http://docbook.org/xml/4.4/docbookx.dtd">
###
###
###
###
<stickster> Note the declaration has five pieces inside the <! >:
<stickster> 1.  DOCTYPE -- in other words, this is a document type
declaration
<stickster> 2.  article -- the top level element of this document is a
DocBook "article".  You can actually make smaller pieces than that, and
they're perfectly legal.  That in fact is how we combine "chapter"
pieces into a "book".
<stickster> 3.  PUBLIC -- that means that the next piece of the
declaration is a public identifier (well-known) that describes this
document's allowed elements, attributes, etc.
<stickster> If it wasn't PUBLIC it would be SYSTEM, which would mean
some URI on your local box, so you can write XML without a well-known
public DTD, like some private DTD you've made up for your own use.
<stickster> anyhoo...
<stickster> 4. "-//OASIS//DTD DocBook XML V4.4//EN" -- that's called the
FPI string, which is the public identifier for DocBook XML, version 4.4
<stickster> It always looks just like that.  If you don't type it right,
some tools will act strangely.  So copy it from somewhere that works, or
just learn from your mistakes like I did. :-D
<stickster> 5.  "http://docbook.org/xml/4.4/docbookx.dtd" -- that is
also a well-known URL for the actual DTD that describes DocBook XML
V4.4.  There's another one at a different URL located at OASIS web site:
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd  is the right
one, I think.
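###
###     [Editor's sketch] The five pieces can also be read back out of
###     a document by a parser.  The Python below is only an
###     illustration, assuming the standard library's xml.dom.minidom;
###     it parses the declaration but does not download or apply the
###     DTD.  The sample article text is a stand-in.
###
```python
from xml.dom.minidom import parseString

DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://docbook.org/xml/4.4/docbookx.dtd">
<article>
  <title>My Article</title>
  <para>This is my article.  Isn't it great?</para>
</article>
"""

doc = parseString(DOCUMENT)
doctype = doc.doctype

print("top-level element:", doctype.name)      # piece 2
print("public identifier:", doctype.publicId)  # piece 4 (the FPI string)
print("system identifier:", doctype.systemId)  # piece 5 (the DTD URL)

# The declared name and the actual root element are supposed to agree.
print("root matches:", doctype.name == doc.documentElement.tagName)
```
###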
<stickster> OK...
<cdehaan> So basically what we've done is declared what type of
language/document is to follow?
<stickster> Exactly!
<stickster> If you still don't know what a DTD is, here's the scoop:
For things like DocBook, it's a huge bunch of stuff that's not fun to
read, but which describes how the DocBook schema works.
<stickster> What we've done with this declaration is to tell any tool
that understands how to parse or validate XML that it will find all
the rules it needs to understand this particular schema at the noted
URL. Then the tool can do all the work for you, like checking that your
document is valid.
<stickster> Generally, if you know you're writing an <article>, you can
just copy this from some other article with absolutely zero work on your
part.
<stickster> Since #2 element was an article, that means that our
top-level element is (must be) an <article>.
<cdehaan> What if "#2 element" is NOT article...
<stickster> Well, your computer won't start smoking or anything, but
validation tools will tell you that your document is invalid.
<cdehaan> Ok
<stickster> And all the fun little helpful stuff in Emacs or vi will
start giving you warnings
<cdehaan> But our opening tag should always correspond to the element?
<stickster> Yes
<cdehaan> Alrighty
<stickster> That is always true in DocBook XML and may be true for other
DTDs as well
<stickster> OK, so now that we have an article, what can we put in it?
<cdehaan> More tags
<stickster> Right.
<stickster> In fact, in many cases you can't just start typing text,
because it would be "invalid" according to the DTD.
<stickster> This, for example, is invalid:
###
###
###
###
###     <?xml version="1.0" encoding="UTF-8"?>
###     <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
###     "http://docbook.org/xml/4.4/docbookx.dtd">
###
###     <article>
###         This is my article.  Isn't it great?
###     </article>
###
###
###
###
<stickster> Now, that *IS* what we call "well-formed" XML, because all
the tags open and close without overlapping each other.
<stickster> Just like HTML, right?
<stickster> It may be "well-formed," but it isn't "valid."  Because the
DocBook DTD tells us that you're not allowed to just type text in your
article like that.
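###
###     [Editor's sketch] "Well-formed" is the part any XML parser can
###     check without knowing DocBook.  The helper below (a hypothetical
###     name) uses Python's xml.etree.ElementTree, which only checks
###     well-formedness and never validates against a DTD.  So it
###     happily accepts the invalid article above and only rejects
###     tags that overlap.
###
```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML; says nothing about validity."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Well-formed, although the DocBook DTD does not allow bare text here.
BARE_TEXT = "<article>This is my article.  Isn't it great?</article>"
# Not well-formed: <para> is still open when </article> arrives.
OVERLAPPING = "<article><para>Oops</article></para>"

print(is_well_formed(BARE_TEXT))
print(is_well_formed(OVERLAPPING))
```
###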
<stickster> (short footnote: From time to time, I might get particular
DTD details wrong.  It's not like I've memorized the whole thing.  I'm
used to doing things a certain way, and there's a chance -- a pretty
good one in fact -- that I could be wrong about where you can and can't
do things.)
<stickster> Aha, I can demonstrate this!
<stickster> I'm going to start another document which you guys should
join.  I'll post my command lines there along with the output.
<cdehaan> ok
<stickster> I'm using an FC5 box, so I'll describe that environment.
<stickster> If you do a "yum groupinstall 'Authoring and Publishing'"
you will get all the tools I am using.
<stickster> Including xmllint, the validator.  If I save the XML
document we just wrote and run xmllint on it, here's what the output
looks like:
###
###
###
###
###	$ xmllint --postvalid docbook-test.xml
###	<?xml version="1.0" encoding="UTF-8"?>
###	<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
###	"http://docbook.org/xml/4.4/docbookx.dtd">
###	<article>
###        This is my article.  Isn't it great?
###	</article>
###	docbook-test.xml:5: element article: validity error : Element article content does not follow the DTD, expecting ((title , subtitle? , titleabbrev?)? , articleinfo? , tocchap? , lot* , (((calloutlist | glosslist | bibliolist | itemizedlist | orderedlist | segmentedlist | simplelist | variablelist | caution | important | note | tip | warning | literallayout | programlisting | programlistingco | screen | screenco | screenshot | synopsis | cmdsynopsis | funcsynopsis | classsynopsis | fieldsynopsis | constructorsynopsis | destructorsynopsis | methodsynopsis | formalpara | para | simpara | address | blockquote | graphic | graphicco | mediaobject | mediaobjectco | informalequation | informalexample | informalfigure | informaltable | equation | example | figure | table | msgset | procedure | sidebar | qandaset | task | anchor | bridgehead | remark | highlights | abstract | authorblurb | epigraph | indexterm | beginpage)+ , (sect1* | refentry* | simplesect* | section*)) | sect1+ | refentry+ | simplesect+ | section+) , ((toc | lot | index | glossary | bibliography) | appendix | ackno)*), got (CDATA)
###	Document docbook-test.xml does not validate
###
###
###
###
<stickster> Aha!  Check it out.  The validator tells me that I goofed
up.
<stickster> It doesn't follow the DTD -- xmllint just told me so. (Read
error message closely for details.)
<stickster> Starting on line 5, where I typed some text inside the
<article> element, I goofed.  The validator was expecting -- well...
something else.
<stickster> All those names and markings reflect the real DTD
expectations.  (a | b | c) means any one of <a>, <b>, or <c>, for
example, while (a , b , c) means all three in that order.
<stickster> Question marks mean "optional," + means "one or more," and *
means "zero or more"
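###
###     [Editor's sketch] The notation in that error message behaves
###     much like a regular expression over child element names: a
###     comma means "followed by", a bar means "or", and ? + * carry
###     their usual meanings.  The Python below is an analogy only.
###     The content model is made up and far smaller than the real
###     DocBook one, and the function name is hypothetical.
###
```python
import re

# Made-up content model, written the way xmllint prints them:
#   (title? , (para | note)+ , section*)
# Each child name is followed by one space so names cannot run together.
MODEL = re.compile(r"(title )?((para|note) )+(section )*")


def follows_model(children):
    """Return True when the sequence of child element names fits MODEL."""
    sequence = "".join(name + " " for name in children)
    return MODEL.fullmatch(sequence) is not None


for children in (["title", "para"], ["note", "para", "section"],
                 ["title"], ["para", "title"]):
    verdict = "fits" if follows_model(children) else "does not fit"
    print(children, "->", verdict)
```
###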
<stickster> Now -- DON'T PANIC. You don't have to remember ANY of this
stuff.
<cdehaan> Heh
<cdehaan> as eyes glaze over
<stickster> This is only a nice little reminder that you did something
wrong.  If you want to find out how to write a document, you don't read
xmllint output. :-)
<stickster> The best idea is to look at a document that's already
written, and follow its lead.  But an idea that is second only to that
in goodness is to download or bookmark "DocBook: The Definitive Guide,"
which is available for download free on the web.
<stickster> http://www.docbook.org/
<stickster> You don't have to do it now, just hold on to that for later.
<stickster> I will try and find a link to directions for getting the
exact V4.4 version out of their CVS, since the one on the web is for
beta-5.0
<stickster> OK
<stickster> .
<stickster> .
<stickster> .
<stickster> Now, let's see what happens when I put my document in the
correct form.
<stickster> Heh
<cdehaan> Oops.
###
###
###
###
###     <?xml version="1.0" encoding="UTF-8"?>
###     <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
###     "http://docbook.org/xml/4.4/docbookx.dtd">
###
###     <article>
###       <title>My Article</title>
###       <para>
###         This is my article.  Isn't it great?
###       </para>
###     </article>
###
###
###
###
<stickster> OK, I just corrected the article if you'd like to look at it
<cdehaan> Ok, simple tags.
<stickster> Yup, I kept it REALLY simple for this demonstration.  Our
articles for FDP have quite a bit more information, but even what we use
is not that complicated.
<stickster> Now, I'm going to show you the new validation in the
shell-output.txt doc.
<stickster> There's the command.
###
###
###
###
###	$ xmllint --noout --postvalid docbook-test.xml
###	$
###
###
###
###
<stickster> And the output is:
<stickster> NOTHING.
<cdehaan> Sweet.
<cdehaan> WB Bob!
<stickster> The "--noout" means I don't want xmllint to output the
parsed document (which usually looks very much like the input), but
rather just tell me if there's a problem.
<stickster> In this case -- like most Linux commands -- if you don't see
output, everything worked fine.
<stickster> Valid document!
<cdehaan> Cool. You type the command, if you see no output whatsoever,
it's good?
<stickster> Yup.
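###
###     [Editor's sketch] "No output means success" also shows up as
###     the command's exit status, which is what a script would test.
###     The wrapper below is a sketch with hypothetical function names.
###     It assumes xmllint is on the PATH and uses only the two flags
###     shown in this session.
###
```python
import shutil
import subprocess


def xmllint_command(path):
    """Build the same command line that was typed in the shell above."""
    return ["xmllint", "--noout", "--postvalid", path]


def is_valid(path):
    """Return True when xmllint exits with status 0, i.e. has no complaints."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint is not installed")
    result = subprocess.run(xmllint_command(path),
                            capture_output=True, text=True)
    if result.returncode != 0:
        # Validity errors are reported on standard error.
        print(result.stderr, end="")
    return result.returncode == 0


print(" ".join(xmllint_command("docbook-test.xml")))
```
###     Calling is_valid("docbook-test.xml") would then run the check.
###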
<stickster> Now, I did that at a command line, but normally I don't even
do that.
<stickster> I write all my DocBook in Emacs with the "psgml" package
installed.
<stickster> Together with a short configuration file, it means I can do
all of this work right in the editor.
<stickster> MOREOVER --
<stickster> Emacs WILL NOT LET ME MAKE A MISTAKE unless I force it to do
so.
<cdehaan> Oh?
<cdehaan> At what point does it intervene? saving?
<stickster> There's about -- hmm -- maybe TEN special keystrokes I
learned to edit DocBook documents.
<stickster> To open an element, I use either Ctrl+<  or Ctrl+C, Ctrl+E
<stickster> If I then use the [Tab] key, Emacs will do autocompletion --
and it ONLY provides the valid tags.
<stickster> If I really want to, it will let me type something wrong,
but that would be kind of silly.
<cdehaan> I see.
<stickster> If I hit [Tab][Tab], like bash, it will open up a frame with
the available elements (tags) in it
<cdehaan> Cool
<stickster> To close an open element, I use the Ctrl+/  key combo.  
<stickster> If I want to validate a document I'm done drafting, I use
Ctrl+C, Ctrl+V key sequence.
<stickster> Sorry, the one before that should have been Ctrl+C, Ctrl+/
to close an element
<stickster> Most key sequences like this in Emacs are Ctrl+C,
<something>
<stickster> There's lots of other functions, too -- I can block a whole
region and then put it in an element
<stickster> I can sit somewhere in a tagged region and change the
element it's in
<stickster> Many others.
<stickster> But there's only around 7 - 10 functions that I needed to
learn to get started.
<cdehaan> Cool
<stickster> It's not GUI, and it's not WYSIWYG, but what you give up in
mouse-ability you more than make up for in PHENOMENAL COSMIC POWER
<stickster> Many UNIX gurus I know do everything in Emacs.
<cdehaan> I've never used it, so I'll have to check it out.
<stickster> I'm not at that level... but it's the dog's bollocks for
DocBook.
<cdehaan> I presently use vi for text editing
<stickster> vi also has many adherents.  Tommy Reynolds, one of our
resident gurus, is a big vi fan.  I don't bother arguing with him about
it, and I won't bother arguing with you about it either.
<stickster> USE WHAT WORKS.
<stickster> But there's no denying that DocBook XML is the absolute best
way to do markup for technical documentation.
<Eitch> /me uses vim
<cdehaan> Eh, if emacs makes it that easy, no reason not to use it.
<stickster> Once you have a document in DocBook, you can turn it into
(X)HTML, TeX, PDF...
<stickster> I think vim probably has some kind of similar support.  I
have no idea how it works.  I used to be a big-time vi lover but after I
tried Emacs for DocBook, I never went back.  Now I only use Emacs.
<stickster> You can probably find people who are the exact opposite.
The key is, use an editor that (1) supports XML, and (2) supports DTD
parsing and validation.
<stickster> If it does that, DocBook writing becomes SUPER-easy.
<stickster> The editor keeps you from making DTD-related mistakes like
incorrect element names or placement, and you can just concentrate on
your document instead.
<stickster> It is worth reading DocBook: The Definitive Guide to get a
handle on what DocBook's all about, and here's what I use that book most
for --
<stickster> I downloaded it to my local disk, and it has a big list of
every element in alphabetical order.  If I'm looking for a tag that does
X, I look up "X" in the Table of Contents and it's usually there, or
something like it.
<stickster> If I am talking about a keyboard key, I look up "key" and
voila! There's "keycap".
<stickster> After you use a tag a few times, it sticks with you.  So
now, just like I know that <table><tr><td>something</td></tr></table>
does stuff in HTML, I know that
<keycombo><keycap>Ctrl</keycap><keycap>C</keycap></keycombo> does stuff
in DocBook
<stickster> It keeps you from having to decide, "Hmm, should I write
Applications -> Internet, or Applications => Internet?"  Nope, instead
you just do
<menuchoice><guimenu>Applications</guimenu><guisubmenu>Internet</guisubmenu></menuchoice>
<stickster> When you render it to HTML it comes out with the neat little
arrows inserted because they're part of the stylesheets for turning
DocBook into HTML.
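###
###     [Editor's sketch] The point is that the markup records WHAT
###     something is and the stylesheet decides how it looks.  The toy
###     renderer below is not the DocBook stylesheets.  It is a
###     hypothetical few lines of Python with assumed separators, using
###     DocBook's <keycombo> and <menuchoice> wrapper elements, to show
###     the idea of inserting the "+" or the arrow at render time.
###
```python
import xml.etree.ElementTree as ET

# Assumed separators for this illustration; real stylesheets choose their own.
SEPARATORS = {"keycombo": "+", "menuchoice": " -> "}


def render_inline(fragment):
    """Join the children of a keycombo or menuchoice with its separator."""
    element = ET.fromstring(fragment)
    separator = SEPARATORS[element.tag]
    return separator.join(child.text or "" for child in element)


print(render_inline(
    "<keycombo><keycap>Ctrl</keycap><keycap>C</keycap></keycombo>"))
print(render_inline(
    "<menuchoice><guimenu>Applications</guimenu>"
    "<guisubmenu>Internet</guisubmenu></menuchoice>"))
```
###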
<stickster> So..
<stickster> .
<stickster> .
<stickster> Does this make it a little less terrifying?
<stickster> This was all *completely* new to me just a few years ago.
quaid, Tammy Fox, and a couple other nice folks taught me what I needed
to know, and the rest you just kind of pick up as you go along.
<cdehaan> It makes perfect sense
<cdehaan> Basically just learning tags and using a good editor.
<stickster> If only I'd had Gobby when I started!!! :-)
<cdehaan> Heh
<cdehaan> :)
<stickster> Now, next week we'll tackle CVS.
<stickster> :-)
<stickster> Is this a good time for doing this?
<stickster> I know some of you are in different TZ
<cdehaan> it works. *I* probably won't be on next week, as next Tuesday
I'll be moving into my dorm.
<stickster> No problem
<cdehaan> But in the fairly near future, yes.
<stickster> Maybe we'll shoot for a different time.  I'll start a thread
in fedora-docs-list and people can get back to me with a time
<cdehaan> that works.
<stickster> OK then, thanks for coming by, everybody!
<cdehaan> Thanks for the intro!
<stickster> You're welcome, thanks for helping with FDP
<stickster> OK Eitch, I'm going to close up shop here
###
###
###
###	END TRANSCRIPT
###

-- 
Paul W. Frields, RHCE                          http://paul.frields.org/
  gpg fingerprint: 3DA6 A0AC 6D58 FEC4 0233  5906 ACDB C937 BD11 3717
       Fedora Project Board: http://fedoraproject.org/wiki/Board
    Fedora Docs Project:  http://fedoraproject.org/wiki/DocsProject