hi antoine...
I had already tried tidy with no success... it still seems to generate warnings...
initFile -> tidy -> cleanFile -> perl app (using xpath/libxml)
the xpath/libxml functions in the perl app complain about the cleaned file. my thought is that either tidy isn't cleaning enough, or the perl xpath/libxml functions are too strict!
but the weird thing is that I can use Firefox with the DOM/XPath plugin and create an XPath query that, run inside Firefox, returns the correct list of items/elements. However, when I then use the same XPath query on the same web page in my test app, I get warnings/errors from the perl xpath/libxml functions....
I'm wondering if there's a way that I can call the Firefox engine (using the plugins), have it do all the processing/parsing, and let it return the list of items/elements to me.....
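To illustrate the strict-versus-lenient difference described above: a browser recovers from broken markup, while a strict XML parser stops at the first error. Below is a minimal sketch of that idea in Python, using only the standard library. It is an illustration of the technique, not a fix for the perl app, and it does not reproduce the recovery rules Firefox actually applies. The names `LenientTreeBuilder` and `parse_tag_soup`, the synthetic `document` root element, and the sample markup are all made up for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LenientTreeBuilder(HTMLParser):
    """Build an ElementTree from tag soup, recovering instead of failing."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Synthetic root (hypothetical name) so input with no single
        # top-level element still yields one tree.
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        clean = {name: (value if value is not None else "")
                 for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, clean)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly closing
        # anything opened inside it; silently ignore stray end tags.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                return

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):
            # Text that follows a child element belongs in that child's tail.
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def parse_tag_soup(html):
    """Return the synthetic root element for possibly invalid HTML."""
    builder = LenientTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


if __name__ == "__main__":
    # Unquoted attribute value and unclosed <li>/<a> elements.
    broken = "<ul><li><a href=a.html>First<li><a href='b.html'>Second</ul>"

    try:
        ET.fromstring(broken)          # strict XML parsing
    except ET.ParseError as error:
        print("strict parser gave up:", error)

    tree = parse_tag_soup(broken)      # lenient parsing
    print([a.get("href") for a in tree.iterfind(".//a")])
```

The lenient tree is not the same tree Firefox would build (here the second `li` ends up nested inside the first `a`), so an XPath query that depends on exact nesting can still return different results. In perl, the HTML parsing mode of XML::LibXML with its recover option is probably the closest equivalent, assuming the installed version provides it.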
-bruce
-----Original Message-----
From: Antoine [mailto:melser.anton@gmail.com]
Sent: Saturday, July 01, 2006 12:30 AM
To: bedouglas@earthlink.net; For users of Fedora Core releases
Subject: Re: developing using the firefox engine
On 01/07/06, bruce bedouglas@earthlink.net wrote:
hi...
i've been trying (unsuccessfully) to parse/process html files. i'm almost certain that the issue is that the html is not valid html.. running it through various apps (tidy, html validator, etc.) produces warnings.
I have been having a similar problem with html, though this time the guilty party is mshtml. That piece of dog vomit *cannot be made* to produce xhtml!!! I get the pseudo-html from mshtml, run it through the sgmlreader + converter class and get (x)html out. I can then parse and process the file with standard xml/xsl tools. It took me an age to find good tools for .net though; you shouldn't have nearly as many problems on linux/fedora.

I suggest you pass it through tidy and get xhtml out. It may give you some junk but you don't really have many other options...

Cheers
Antoine
-- This is where I should put some witty comment.