https://fedorahosted.org/fedora-infrastructure/ticket/3268
notes that a mirror might not be removed from the list even though
it's stale.
In particular, add_parents() is responsible for recording an up-to-date
state for every parent directory of a target directory when that parent
has not already been evaluated on its own. This happens when a
directory contains no files, only child directories. That code path
had an incorrect key lookup, specifically:
-        parent = '/'.join(splitpath[:-1])
-        try:
-            hcd = host_category_dirs[(hc, parent)]
which looks up the parent directory in the host_category_dirs cache
(which is operated on later). However, the key there is built from a
Directory object, not from the string form of the parent directory
name. So the code looks up the wrong thing, the lookup fails with
KeyError, and the except branch then marks the parent up-to-date and
recurses through the remaining parents. As a result, when a child
subdirectory (pub/epel/5/i386/repoview/layout) is marked up-to-date,
its parent directories (e.g. pub/epel/5/i386) are marked up-to-date
too, even if they are not in fact up-to-date.
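To illustrate the failure mode, here is a minimal standalone sketch. It is
not MirrorManager code: FakeDirectory, the string used for hc, and the
lookup() helper are hypothetical stand-ins I made up for the example. It
only shows that a dict keyed on (hc, Directory object) never matches a key
built from the path string:

```python
class FakeDirectory(object):
    """Hypothetical stand-in for the Directory class (assumption)."""
    def __init__(self, name):
        self.name = name


hc = "host-category-1"  # stand-in for a host category object
parent_dir = FakeDirectory("pub/epel/5/i386")

# The cache is keyed by (hc, Directory object); False means "not up-to-date".
host_category_dirs = {(hc, parent_dir): False}


def lookup(key):
    """Return the cached state, or 'missing' if the key is absent."""
    try:
        return host_category_dirs[key]
    except KeyError:
        return "missing"


# Key built from the path string, as the old code did: never found.
string_key_result = lookup((hc, parent_dir.name))
# Key built from the Directory object, as the cache expects: found.
object_key_result = lookup((hc, parent_dir))
```

Because the string-keyed lookup is always "missing", the old code always
took the KeyError branch, regardless of what the cache already held.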
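The patch below fixes this by moving the parent directory lookup into
its own function, parent(), for readability, and by keying the cache
lookup on the Directory object. A parent that is not yet in the cache
is now recorded with an unknown (None) state instead of being marked
up-to-date.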
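For anyone who wants to follow the fixed logic without a database, here is
a rough standalone sketch of it. Assumptions on my part: FakeDirectory
replaces the real Directory class, KeyError stands in for
SQLObjectNotFound, and topdir is passed in explicitly instead of being
read from hc.category.topdir. It is not the crawler code itself:

```python
class FakeDirectory(object):
    """Hypothetical in-memory stand-in for the Directory class (assumption)."""
    _by_name = {}

    def __init__(self, name):
        self.name = name
        FakeDirectory._by_name[name] = self

    @classmethod
    def byName(cls, name):
        # KeyError stands in for SQLObjectNotFound in this sketch.
        return cls._by_name[name]


def parent(directory):
    """Return the parent directory object, or None if there is none."""
    splitpath = directory.name.split('/')
    if len(splitpath) > 1:
        try:
            return FakeDirectory.byName('/'.join(splitpath[:-1]))
        except KeyError:
            pass
    return None


def add_parents(host_category_dirs, hc, d, topdir):
    """Give parents missing from the cache an unknown (None) state."""
    parent_dir = parent(d)
    if parent_dir is not None:
        if (hc, parent_dir) not in host_category_dirs:
            host_category_dirs[(hc, parent_dir)] = None
        if parent_dir != topdir:  # stop at top of the category
            return add_parents(host_category_dirs, hc, parent_dir, topdir)
    return host_category_dirs


# Example: i386 is already known to be stale, repoview is up-to-date.
topdir = FakeDirectory("pub/epel")
five = FakeDirectory("pub/epel/5")
i386 = FakeDirectory("pub/epel/5/i386")
repoview = FakeDirectory("pub/epel/5/i386/repoview")
hc = "hc"
cache = {(hc, i386): False, (hc, repoview): True}
add_parents(cache, hc, repoview, topdir)
```

In this example the stale entry for pub/epel/5/i386 is left alone, and the
two parents that were not in the cache get None rather than True.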
I've tested this on bapp02 against a stale mirror that was previously
incorrectly marked up-to-date, and the patch corrects that.
I'd like to hotfix bapp02 to address this.
Thanks,
Matt
--
Matt Domsch
Technology Strategist
Dell | Office of the CTO
--- crawler_perhost 2010-09-06 14:46:21.000000000 +0000
+++ crawler_perhost 2012-05-12 01:20:54.604906708 +0000
@@ -348,21 +348,24 @@
break
return pref
-
-def add_parents(host_category_dirs, hc, d):
-    splitpath = d.name.split('/')
+def parent(directory):
+    parentDir = None
+    splitpath = directory.name.split(u'/')
     if len(splitpath[:-1]) > 0:
-        parent = '/'.join(splitpath[:-1])
+        parentPath = u'/'.join(splitpath[:-1])
         try:
-            hcd = host_category_dirs[(hc, parent)]
-        except KeyError:
-            try:
-                parentDir = Directory.byName(parent)
-                host_category_dirs[(hc, parentDir)] = True
-            except SQLObjectNotFound: # recursed out of the directory structure
-                parentDir = None
-
-        if parentDir and parentDir != hc.category.topdir: # stop at top of the category
+            parentDir = Directory.byName(parentPath)
+        except SQLObjectNotFound:
+            pass
+    return parentDir
+
+def add_parents(host_category_dirs, hc, d):
+    parentDir = parent(d)
+    if parentDir is not None:
+        if (hc, parentDir) not in host_category_dirs:
+            print "directory %s adding parent %s, unknown up2date state" % (d.name, (hc, parentDir))
+            host_category_dirs[(hc, parentDir)] = None
+        if parentDir != hc.category.topdir: # stop at top of the category
             return add_parents(host_category_dirs, hc, parentDir)
     return host_category_dirs