Paul W. Frields
stickster at gmail.com
Mon Jun 13 14:10:56 UTC 2005
(Excuse the <pedantic> mode, the initial summary is for historical
The Fedora Documentation Project's (FDP) goal is to "create easy-to-
follow, task-based documentation for Fedora Core users and
developers." The FDP is part of the Fedora Project. The Fedora
Project's overall goal is to "build a complete, general purpose
operating system exclusively from open source software."
For a few days, some of the Fedora Documentation Project folks have been
contemplating using an XML normalization utility to clean up XML into a
standardized presentation before a CVS commit happens. The cleanup
would include, but not be limited to, the following:
* set standard fill-column (the columnar position after which
unprotected text is wrapped)
* set indentation size
* set block/inline tag vertical spacing
While performing the cleanup procedure, any utility used must also
protect the DTD block, any CDATA containers, and any similar containers
such as <screen>, <programlisting>, and <literal>. It must be
configurable in such a way that changes to the configuration can be
provided cleanly via CVS. Clients would perform this procedure as part
of a "make" target before committing changes. The details on this part
have yet to be worked out, but we would certainly try to make it as
painless as possible -- possibly even simply making it a prerequisite to
any other constructive "make" target.
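To make the cleanup rules above concrete, here is a minimal sketch in
Python of re-wrapping one block of text to a fill column while passing
protected containers through untouched. It is not how xmlformat or tidy
work internally. The function name rewrap, the fill column of 72, and
the two-space indent are all assumptions for illustration; DTD handling
and block/inline vertical spacing are left out entirely.

```python
import re
import textwrap

FILL_COLUMN = 72  # assumed value; no fill column has been agreed on

# Containers whose contents must pass through byte-for-byte.
# The element list is taken from the message; extend as needed.
PROTECTED = re.compile(
    r"<!\[CDATA\[.*?\]\]>|<(screen|programlisting|literal)\b.*?</\1>",
    re.DOTALL,
)


def rewrap(block, width=FILL_COLUMN, indent="  "):
    """Re-wrap one block of XML text, leaving protected spans intact."""
    saved = []

    def stash(match):
        # Swap each protected span for a whitespace-free placeholder
        # so the wrapper treats it as a single unbreakable word.
        saved.append(match.group(0))
        return "\x00%d\x00" % (len(saved) - 1)

    masked = PROTECTED.sub(stash, block).strip()
    filled = textwrap.fill(
        masked,
        width=width,
        initial_indent=indent,
        subsequent_indent=indent,
        break_long_words=False,
        break_on_hyphens=False,
    )
    # Put the protected spans back exactly as they were written.
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], filled)


if __name__ == "__main__":
    sample = (
        "<para>Run   <literal>ls   -l</literal>\n"
        "to list files,\n   then read the output.</para>"
    )
    print(rewrap(sample, width=40))
```

Line width is measured with the placeholders in place, so a line that
contains a long protected span can end up past the fill column. A real
normalizer would account for that. The useful property is that running
the function on its own output changes nothing, which is what a "make"
prerequisite needs.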
Without this step, we are running a risk of generating a lot more white
noise in CVS and on the fedora-docs-commits list. In 2003, with a
smaller, less visible project, the use of Emacs/PSGML was simply
*required*, more or less, so normalization was enforced on the client
side without any additional fuss. With more participants, however, we
have to confront the fact that people want to use their own favorite
tools. XML normalization makes cooperation on the same document
possible for writers and editors who enjoy different tools, by ensuring
that CVS diffs are sensible.
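As a rough illustration of the white noise problem, the sketch below
compares two copies of the same paragraph that differ by one word but
were wrapped at different widths by different editors. The helper names
normalize and changed_lines are hypothetical, plain text stands in for
XML, and textwrap stands in for a real XML normalizer.

```python
import difflib
import textwrap


def normalize(text, width=72):
    """Collapse whitespace and re-wrap at one agreed fill column."""
    return textwrap.fill(" ".join(text.split()), width=width)


def changed_lines(old, new):
    """Return the added and removed lines of a unified diff."""
    diff = list(
        difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0)
    )
    # Skip the two file-header lines, then keep only +/- content lines.
    return [line for line in diff[2:] if line[0] in "+-"]


if __name__ == "__main__":
    original = (
        "The quick brown fox jumps over the lazy dog and then keeps "
        "running across the field until dusk."
    )
    # Writer A's editor wraps at 30 columns.
    a = textwrap.fill(original, width=30)
    # Writer B changes one word, and their editor re-wraps at 50 columns.
    b = textwrap.fill(original.replace("dusk", "dawn"), width=50)

    print("raw diff lines:       ", len(changed_lines(a, b)))
    print("normalized diff lines:", len(changed_lines(normalize(a), normalize(b))))
```

Without normalization every line of the paragraph shows up in the
diff. With both copies normalized first, only the line holding the
changed word does.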
The "tidy" utility is GPL and in Fedora Extras, and it will do some XML
cleanup, but it is not designed for this purpose. It was designed as an
HTML normalization engine, and simply has some XML functionality. The
"xmlformat" utility is designed from the ground up as an XML normalizer,
but it is *not* GPL. Thankfully, Tommy Reynolds brought the xmlformat
licensing specification to my attention last week, so I've had a little
time to think about it.
The xmlformat utility is still open source software; although IANAL, I
did a pretty thorough review of the licensing of xmlformat and other
open source software requirements, and this seems pretty clear-cut.
Note that the "open source" requirement *does not* mean the software has
to be GPL or BSD. It merely needs to meet the requirements and
definition of "open source." The terms are clear-cut enough that we
may not need an official legal opinion from Mark Webbink, but I am
willing to put a link to this message at the appropriate wiki location
for him to look at if anyone thinks it's necessary.
Here are the facts of licensing pertaining to xmlformat:
(A) The original portions of xmlformat by Paul DuBois, paul at kitebird
com, are licensed under a BSD-style license. The BSD-style license
is an open source license. The only portion of xmlformat not covered by
this license is the implementation of the REX shallow parser.
(B) The REX shallow parser, which is copyrighted by Robert D. Cameron,
cameron at sfu ca, is licensed under terms shown below in their entirety:
"The following code may be freely used and distributed provided that
this copyright and citation notice remains intact and that modifications
or additions are clearly identified."
The REX shallow parser clearly meets the requirements of the Open Source
Definition, to wit:
1. No royalties or fees are imposed upon redistribution of REX. The
license specifically and categorically permits free distribution without
2. The source code for REX is publicly available.
3. The copyright holder for REX allows modifications or derivative
works. The licensing terms require that these be clearly identified, but
they put no restrictions on their creation. In addition, the terms
explicitly permit free use of the material.
4. The license for REX does not set out any requirements for the
licensing of modified versions, other than to require the modifications
be identified as such. This requirement would allow modified versions
to be distributed as the original version plus patches, as doing so
would clearly identify modifications and thus meet the requirement.
5. The license for REX does not discriminate against persons or groups,
nor against fields of endeavor.
6. The only requirements of the licensing terms flow through to and
with any redistributed versions of REX, that is, the requirements for
the copyright and citation notices to remain intact, and for
modifications to be identified.
7. The license applies to the entire REX implementation. There are no
subordinate parts or components as such.
8. The license does not restrict other software, and in fact REX can be
distributed with other software. The xmlformat utility is itself an
example of this use.
Because the REX software, and thus xmlformat, clearly meet all the
requirements of the open source definition, we should be able to use it
in our toolchain without incurring any difficulty. (REX probably also
meets the definition of "free software," although it is not copylefted
and thus does not share the same distinction as GPL software.) I'll
prepare an RPM of this package and see about getting it into Fedora
Extras. In the meantime, we can keep testing and evaluating other
methods of XML normalization. So far, xmlformat does the best job that
I've seen, but I'm sure there must be other tools out there.
Does anyone know whether Expat could easily do what we are trying to
accomplish, or am I talking apples to oranges?
= = = = =
Paul W. Frields, RHCE http://paul.frields.org/
gpg fingerprint: 3DA6 A0AC 6D58 FEC4 0233 5906 ACDB C937 BD11 3717
Fedora Documentation Project: http://fedora.redhat.com/projects/docs/