Greetings FDPers,
Tommy and I have been collaborating on a method to mitigate our problems with multiple editors. When FDP started, we envisioned a beautiful, pristine world in which everyone used Emacs+PSGML -- pipe down, Tommy ;-D -- and collaborating writers and editors would not end up reformatting each other's work, since they were using the same tools. However, that expectation has proved somewhat unreasonable, especially in light of (1) our goal to lower barriers to entry as far as possible, and (2) the proliferation of XML editors out there. As a result, the FDP canon will, in all likelihood, no longer require people to use specific tools, although a preference for Emacs+PSGML is likely to linger, since the majority of the editors use those tools.
An XML normalization engine (xmlformat) will reformat docs when they are imported or committed to CVS. It uses a configuration file that should not complicate life for writers or editors using most of the tools we're aware of. Because normalization does not change the copy of the file on your disk, you don't have to worry that the committed version may differ from your copy in whitespace, margins, blocking, etc. You simply keep editing as usual. Anyone checking out a file from CVS will get the normalized version, which they can feel free to *reformat* locally; any such formatting changes will be normalized away again when they commit. Only the content matters.
CAVEAT: If your local copy of a .xml document is NOT normalized and you change it, "cvs diff" will produce a very long diff, because it compares your unnormalized file against the normalized copy in the repository. Sorry, folks, that part's out of our hands; it was the price we paid to allow more tools out there, and it seems worth it. The diff that actually ends up going to the fedora-docs-commits list from the CVS server *will* be sane.
If you must have a sane local diff, you can run "cvs up -C <file>" to replace your non-normalized file with the normalized version from the repository. If you have uncommitted work in the file, commit it before you run that command; otherwise your changes will be overwritten (CVS does save the modified file as .#<file>.<revision>, but it's better not to rely on that). The only drawback is that your editor may want to reformat the file again when you next load it. This will not affect CVS.
So now, THE INVITATION:
For now, the only module that is normalized is xml-normalize/. Please feel free to add files there and observe the results, i.e., beat on the script. If something breaks, please report it to the list, Tommy, or me. Please check the log for a file before you diddle with it, so as not to inconvenience another tester. Thanks and have fun!
Uttered "Paul W. Frields" stickster@gmail.com, spake thus:
Only the content matters.
CAVEAT: Once you make a change to a .xml document where your local copy is NOT normalized, when you use the "cvs diff" command, the diff will be very long because of the normalization differences.
If you would like to see the magic, just update your "docs-common" directory and you will see three new files in the "docs-common/bin" directory:
1) xmlformat -- the reformatting script
2) xmlformat-fdp.conf -- configuration file for the script
3) tidy-bowl -- driver script
You can try it out by doing this, in your document working directory:
$ ../docs-common/bin/tidy-bowl my-file.xml
Caution: this rewrites your input file in place, so it may be wise to keep a copy ;-)
This can also help reduce a lengthy local diff to what the CVS server would actually see.
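Since tidy-bowl rewrites its input in place, one cautious approach is to run it on a scratch copy and diff the result against your working file, which also previews the whitespace noise normalization would remove. This is only a sketch: it assumes tidy-bowl takes a single file argument, as in the example above, and that it does not care what the file is named. `tidy_preview` and the `TIDY` override are hypothetical names, not part of docs-common.

```shell
#!/bin/sh
# tidy_preview FILE: show what the formatter would change in FILE
# without rewriting FILE itself.
# Exit status: 0 if FILE is already normalized, 1 if it would change,
# 2 on error.
# TIDY is a hypothetical override for the formatter command; by default
# it uses the tidy-bowl path relative to a document working directory.
tidy_preview() {
    tidy=${TIDY:-../docs-common/bin/tidy-bowl}
    # The scratch copy sits next to the original as a hidden .xml file.
    copy="$(dirname "$1")/.preview-$(basename "$1")"
    cp "$1" "$copy" || return 2
    if ! "$tidy" "$copy"; then
        rm -f "$copy"
        return 2
    fi
    # diff exits 0 if identical, 1 if different, 2 on trouble.
    diff -u "$1" "$copy"
    status=$?
    rm -f "$copy"
    return "$status"
}
```

For example, `tidy_preview my-file.xml | less` from your document directory leaves my-file.xml untouched; an empty diff means the file is already in normalized form.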
HTH
I've been using xml-normalize and also tidy-bowl locally, and they both seem to work perfectly :)
The first test document I used with tidy-bowl was edited with emacs/nxml and it realigned the text correctly. I also messed around with spacing and text alignment within xml-normalize.
On Fri, 2005-07-08 at 22:20 +0100, Stuart Ellis wrote:
I've been using xml-normalize and also tidy-bowl locally, and they both seem to work perfectly :)
The first test document I used with tidy-bowl was edited with emacs/nxml and it realigned the text correctly. I also messed around with spacing and text alignment within xml-normalize.
Thanks for testing this stuff. The documents I've done seem to work great. Has anyone else done any local comparisons?
Before this gets moved up to include all modules, Tommy and/or I will probably implement some content-testing code to ensure that the "after" document retains all the content of the original. That way, even if the program did something truly weird, the check would keep it from munging someone's nice document on the way to the repository.
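One way such a content test could work, sketched under our own assumptions rather than as the code Tommy and Paul have planned: strip the markup from both versions, split the remaining text into words, and require the two word sequences to match. Whitespace and line-wrapping changes then pass, while a lost or altered word fails. The tag stripping is deliberately crude (it assumes no `>` inside attribute values or comments, and it ignores whitespace changes inside <screen>), and `content_words`/`same_content` are hypothetical helper names.

```shell
#!/bin/sh
# content_words FILE: print the text content of FILE one word per line,
# ignoring markup and all whitespace/line-wrapping differences.
content_words() {
    # 1. Join all lines so tags that span lines can be matched.
    # 2. Replace every <...> tag (comments included) with a space.
    # 3. Turn runs of spaces into newlines and drop empty lines.
    tr '\n\t' '  ' < "$1" |
        sed -e 's/<[^>]*>/ /g' |
        tr -s ' ' '\n' |
        sed -e '/^$/d'
}

# same_content OLD NEW: exit 0 if both files contain the same words in
# the same order, 1 if they differ, 2 if either file is unreadable.
same_content() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    [ "$(content_words "$1")" = "$(content_words "$2")" ]
}
```

For instance, after `cp my-file.xml before.xml` and a tidy-bowl run, `same_content before.xml my-file.xml` should succeed if only the formatting changed.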
On Fri, 2005-07-08 at 22:20 +0100, Stuart Ellis wrote:
I've been using xml-normalize and also tidy-bowl locally, and they both seem to work perfectly :)
The first test document I used with tidy-bowl was edited with emacs/nxml and it realigned the text correctly. I also messed around with spacing and text alignment within xml-normalize.
I did some runs using ../docs-common/bin/tidy-bowl against the SELinux FAQ, which I *believed* to be a shining example of correct indentation, etc.
Apparently I have some extra whitespace in there. :) I also had to reindent to a 72-character width before running a meaningful diff.
http://people.redhat.com/kwade/fedora-docs/selinux-faq-post_tidy-bowl.diff
I had run sgml-fill-element (C-c C-q) on every part of that file except the flush-left <screen> elements and the like, so I expected zero or only a few differences.
Let's look at the kinds of differences generated:
* Whitespace decisions -- that seems to come from "para normalize = yes". I can accept that I have extraneous whitespace, and it was picked up and fixed. That's good.
* Some tags have breaks before and after, perhaps meaning they are not being recognized as inline. These include <ulink> and <abbrev>. I suppose I should put a line in xmlformat-fdp.conf, except there is already one for <ulink> to make it inline. Hmm. I tried setting "entry-break" and "exit-break" to 0 for ulink, but it still makes line breaks.
* A number of paragraphs appear to have been indented by a single space. These might have been spaces-not-tabs indents that I didn't clean up after I fixed my .emacs to stop doing that. I.e., I went back to using tabs-as-tabs to match with the default Emacs install on Fedora Core, but I may not have fixed up this document and it is using spaces-as-tabs.
* All of my double-spaces after periods have been replaced with single-spaces. I thought that DocBook liked the double-space after a period? Or am I just left in the typewriter age on that one?
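To see why the double-spaces disappear, here is a rough shell approximation of whitespace normalization followed by rewrapping: every run of blanks, tabs, and newlines collapses to one space, and the text is then refilled at a fixed width. This is not xmlformat's actual code, it handles plain paragraph text only (no markup), and the 72-column default is an assumption taken from the width mentioned above rather than a value read from xmlformat-fdp.conf. `normalize_para` is a hypothetical helper.

```shell
#!/bin/sh
# normalize_para [WIDTH]: read paragraph text on stdin, collapse all
# whitespace runs to single spaces, and wrap at WIDTH columns
# (default 72, an assumption), breaking only at spaces.
normalize_para() {
    # awk splits each line on runs of blanks/tabs; every word from every
    # line is re-joined with exactly one space, so "end.  Next" becomes
    # "end. Next". fold -s then rewraps at the requested width.
    awk '{ for (i = 1; i <= NF; i++) { printf "%s%s", sep, $i; sep = " " } }
         END { print "" }' |
        fold -s -w "${1:-72}"
}
```

For example, `printf 'First sentence.  Second\tsentence.\n' | normalize_para` prints the two sentences separated by a single space.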
On the whole, the document looks better, and certainly more normalized, for the treatment. Hopefully my mysteries aren't that mysterious.
- Karsten
On Fri, 2005-07-08 at 15:41 -0700, Karsten Wade wrote:
- Some tags have breaks before and after, perhaps meaning they are not being recognized as inline. These include <ulink> and <abbrev>. I suppose I should put a line in xmlformat-fdp.conf, except there is one for <ulink> already to make it inline. Hmm. I tried adding "entry-break" and "exit-break" set to 0 for ulink, but it still makes line breaks.
That's odd. I tend to put links outside the main text, like this, so I didn't see it:
<para> A link: </para>
<para> <ulink url="http://fedora.redhat.com/docs/selinux-apache-fc3/">http://fedora.redhat.com/docs/selinux-apache-fc3/</ulink> </para>
- A number of paragraphs appear to have been indented by a single space. These might have been spaces-not-tabs indents that I didn't clean up after I fixed my .emacs to stop doing that. I.e., I went back to using tabs-as-tabs to match with the default Emacs install on Fedora Core, but I may not have fixed up this document and it is using spaces-as-tabs.
My psgml-processed documents have their columns indented with a single tab, which tidy-bowl converts to 7 spaces.
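If you suspect a document mixes tab and space indentation, as described above, a quick count before running tidy-bowl can tell you what you're dealing with. `indent_report` is a hypothetical helper, not part of docs-common; it only looks at the first character of each line.

```shell
#!/bin/sh
# indent_report FILE: count lines whose indentation starts with a tab
# versus a space, to spot documents that mix the two styles.
indent_report() {
    awk '
        /^\t/ { tabs++ }      # first character is a tab
        /^ /  { spaces++ }    # first character is a space
        END {
            printf "tab-indented:   %d\n", tabs
            printf "space-indented: %d\n", spaces
        }
    ' "$1"
}
```

Non-zero counts on both lines suggest the file was edited with more than one indentation setting.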
- All of my double-spaces after periods have been replaced with single-spaces. I thought that DocBook liked the double-space after a period? Or am I just left in the typewriter age on that one?
I automatically double-space after a period when typing, but I've noticed that psgml would reset my spacing inconsistently.
Those documents were written with Emacs 21 (the standard FC3 package) and psgml, and the setup may have been mildly broken, so YMMV. I've been editing them with FC4 Emacs and Tim Waugh's nxml package before passing them to tidy-bowl.
On Sat, 2005-07-09 at 00:59 +0100, Stuart Ellis wrote:
On Fri, 2005-07-08 at 15:41 -0700, Karsten Wade wrote:
- All of my double-spaces after periods have been replaced with single-spaces. I thought that DocBook liked the double-space after a period? Or am I just left in the typewriter age on that one?
Further to this, I checked, and a single space after a period is now the generally accepted convention for electronic and printed documents. Double spacing belongs to monospaced fonts, so it's right when using a typewriter. So tidy-bowl/xmlformat is doing the right thing, and we need to change our typing habits :)