On Tue, Apr 05, 2005 at 11:35:35AM -0400, seth vidal wrote:
> Is that worth adding yet another XML Parser package to the distribution
> used by a single tool ? Is there a compatibility layer to still use
> libxml2 ?
> If I remember correctly, the performance problem wasn't libxml2 itself
> but the specific usage within yum, i.e. collecting the data, libxml2 by
> itself is parsing the megabyte sized file in less than a tenth of a second.
> I'm surprised the solution ends up going to use a python specific library
> instead of trying to find why the interface between libxml2 and yum generated
> that problem. I don't remember you saying you would switch library as a result.
well what happened was this:
Icon was working on repoview and decided to try out cElementTree b/c he
was using kid anyway and it used it. After some preliminary tests it
showed up as significantly faster parsing the metadata. For
primary.xml.gz the times went from 21s for 1800ish pkgs to 7s. Then when
he switched it to use iterparse() the memory footprint dropped below 10M
for the whole parse.
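
For readers unfamiliar with the iterparse() approach mentioned above, here is a minimal sketch of the idea. It is not the repoview or yum code. It assumes the standard-library xml.etree.ElementTree module (the descendant of cElementTree), and the sample document is a simplified, hypothetical stand-in for primary.xml.gz with no namespaces. The function name iter_package_names is likewise made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_package_names(fileobj):
    """Yield package names one at a time instead of building the full tree."""
    context = ET.iterparse(fileobj, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "package":
            yield elem.findtext("name")
            # Drop the finished <package> children from the root so memory
            # use does not grow with the number of packages.
            root.clear()


# Simplified, hypothetical sample; real repository metadata is richer.
SAMPLE = b"""<metadata packages="2">
  <package type="rpm"><name>yum</name></package>
  <package type="rpm"><name>libxml2</name></package>
</metadata>"""


def demo():
    compressed = gzip.compress(SAMPLE)
    with gzip.open(io.BytesIO(compressed)) as fh:
        return list(iter_package_names(fh))


print(demo())
```

The point of the sketch is only that each package element is handled and discarded as soon as its end tag is seen, which is what keeps the footprint roughly constant regardless of file size.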
libxml2 should be able to parse in constant memory if you use the
reader, and that applies to primary.xml.gz too. If you use the tree
instead, then freeing the trees after import is the best way.
Check out the numbers on the cElementTree webpage. They're fairly
compelling.
There has been a lot of rambling even within the Python community about
those numbers. One thing is sure, it never took 21 seconds to parse any of
the primary.xml.gz on any of my boxes at any point in time, with any of
the yum versions I ever used !
The biggest reason I've not talked to you about it much is that for the
last few weeks I've been in kinda deep-hack mode and not communicating
as much as I have in the past.
Is a lack of communication a reason to push a new duplicate package on
Fedora Core ?
Daniel
--
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard(a)redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/