While trying to reproduce the mm2_crawler crash without the MirrorManager
database as a backend, I discovered that the crawler uses Python's httplib
for all of its HEAD requests. For the repomd.xml files, which are actually
downloaded, the crawler switches to urlgrabber, which seems to be
problematic in threaded applications, or in combination with httplib. I am
not yet sure which of the two it is.
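For reference, a single HEAD request of the kind described above might look
roughly like this with http.client, the Python 3 name of httplib. This is only
a sketch: head_status() is a hypothetical helper, not the crawler's actual code.

```python
# Sketch only: one HEAD request using http.client (Python 3's httplib).
# head_status() is a hypothetical helper, not part of mm2_crawler.
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=30):
    """Send a HEAD request and return (status, headers); no body is read."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    # parts.port is None when the URL has no explicit port, so the
    # connection class falls back to the scheme's default port.
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders())
    finally:
        # One connection per call, closed afterwards.
        conn.close()
```

A crawler would typically look at the status and headers such as
Content-Length or Last-Modified to decide whether a mirror is up to date.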
The easiest solution seems to be to rewrite the single
urlgrabber.urlread() call to use one of the other available methods.
So, a question for the Python experts: which implementation is the
"best" way to download a single repomd.xml via either HTTP or FTP?
I would replace it with urllib2. Is that the correct replacement?
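As a rough sketch of that replacement, assuming Python 3, where urllib2
became urllib.request, the single urlgrabber.urlread() call could become
something like the following. The helper name fetch_repomd() and the
30-second timeout are hypothetical choices, not anything in mm2_crawler. On
Python 2 the equivalent call is urllib2.urlopen(url, timeout=timeout) (the
timeout argument exists from 2.6 on); there the response is not a context
manager, so wrap it in contextlib.closing() or close it explicitly.

```python
# Sketch only: a possible stand-in for the single urlgrabber.urlread() call,
# using urllib.request (Python 3's successor of urllib2).
# fetch_repomd() is a hypothetical helper name.
import urllib.request


def fetch_repomd(url, timeout=30):
    """Download a URL (http://, https://, ftp:// or file://) as bytes.

    Returns None on failure so a crawler thread can log it and move on.
    """
    try:
        # urlopen() picks the HTTP, FTP or file handler from the URL scheme
        # and opens a fresh connection for each call.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except OSError:
        # URLError and HTTPError are OSError subclasses; this also covers
        # socket timeouts raised while reading the body.
        return None
```

Returning None instead of raising is just one option; the crawler might
prefer to let the exception propagate and record the error per mirror.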
Adrian