Greetings Dan. I'm the Mirror Wrangler for the Fedora Project, and
author of the tool we use to know which mirrors are up-to-date:
MirrorManager.
For most mirrors, rsync directory listings of each of the two Category
trees you're carrying (Fedora Linux, Fedora EPEL) are the fastest way
to get a full list of what content your mirror has. The crawler falls
back to doing individual HTTP HEAD requests on a subset of the files
if rsync isn't available, and to FTP DIR calls if HTTP isn't available.
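In rough pseudocode, the per-category fallback order looks like this
(a minimal sketch only; the helper names are hypothetical, not
MirrorManager's actual API):

    # Sketch of the crawler's per-category protocol fallback.
    # Helper names below are hypothetical stand-ins.
    def scan_rsync(host, category): pass      # one listing covers the whole tree
    def scan_http_head(host, category): pass  # one HEAD per sampled file
    def scan_ftp(host, category): pass        # FTP DIR listings, slowest

    def scan_category(host, category):
        if host.rsync_url:
            # A single rsync listing enumerates the entire tree in one call.
            return scan_rsync(host, category)
        if host.http_url:
            # No rsync: fall back to individual HTTP HEAD requests.
            return scan_http_head(host, category)
        # Last resort: FTP directory listings.
        return scan_ftp(host, category)

The practical upshot is that rsync costs one network round trip per
category, while the HTTP fallback costs one round trip per checked file.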
In your case, as you don't have an rsync target, the crawler takes a
relatively long time to make all the HTTP HEAD calls to your mirror:
over 2 hours. Nearly all other mirrors that likewise don't have rsync
take well under 2 hours.
Now, for other mirrors that do serve rsync, such as
mirrors.us.kernel.org
shown here, we see crawl times like these:
07/27/2013 12:29:05 AM Starting crawl
07/27/2013 12:29:05 AM scanning Category Fedora Linux
07/27/2013 12:31:26 AM rsync time: 0:01:14.965211
07/27/2013 12:34:21 AM scanning Category Fedora EPEL
07/27/2013 12:34:41 AM rsync time: 0:00:12.548085
07/27/2013 12:34:59 AM scanning Category Fedora Secondary Arches
07/27/2013 12:40:12 AM rsync time: 0:02:49.361520
07/27/2013 12:47:02 AM scanning Category Fedora Other
07/27/2013 12:47:13 AM rsync time: 0:00:06.161032
07/27/2013 12:57:27 AM Total directories: 5805
07/27/2013 12:57:27 AM Changed to up2date: 0
07/27/2013 12:57:27 AM Changed to not up2date: 0
07/27/2013 12:57:27 AM Unchanged: 5805
07/27/2013 12:57:27 AM Unknown disposition: 0
07/27/2013 12:57:27 AM New HostCategoryDirs created: 87
07/27/2013 12:57:27 AM HostCategoryDirs now deleted on the master, marked not up2date: 0
07/27/2013 12:57:27 AM Ending crawl
The whole process takes under 30 minutes.
I raise this because I know you generally do a good job running your
mirror, so this seems anomalous. I also know you have HTTP KeepAlives
turned on; I can see that in the crawler debug logs. It seems each
HTTP HEAD request takes a second or more, which, multiplied across the
hundreds of such requests covering all the directories in your
complete mirror, adds up.
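If you'd like to reproduce the measurement from your end, something
like the following times repeated HEAD requests over a single
keep-alive connection (a rough sketch; the hostname and paths are
placeholders, so substitute real ones from your mirror):

    import http.client
    import time

    # Placeholders: substitute your mirror's hostname and real paths.
    conn = http.client.HTTPConnection("mirror.example.com")
    paths = ["/fedora/linux/releases/", "/fedora/epel/"]

    for path in paths:
        start = time.time()
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # drain the (empty) body so keep-alive reuses the socket
        print("%s -> %d in %.3fs" % (path, resp.status, time.time() - start))
    conn.close()

If the per-request times are slow even when run from your local
network, that points at the server side rather than the path between
us.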
In the past, mirror admins have suggested reducing the value of
/proc/sys/vm/vfs_cache_pressure from its default of 100 to a lower
number, which causes the kernel to prefer to keep dentries cached when
under memory pressure:
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
vfs_cache_pressure
------------------
Controls the tendency of the kernel to reclaim the memory which is
used for caching of directory and inode objects.
At the default value of vfs_cache_pressure=100 the kernel will attempt
to reclaim dentries and inodes at a "fair" rate with respect to
pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes
the kernel to prefer to retain dentry and inode caches. When
vfs_cache_pressure=0, the kernel will never reclaim dentries and
inodes due to memory pressure and this can easily lead to
out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.
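For example, to check the current value and try a lower one on the
running kernel (requires root, and the change does not persist across
a reboot; the equivalent command is "sysctl -w vm.vfs_cache_pressure=50",
with a matching line in /etc/sysctl.conf to make it stick; 50 here is
purely an illustrative value, not a tested recommendation):

    # Illustrative only: lower vfs_cache_pressure on the running kernel.
    # Requires root; the change does not survive a reboot.
    path = "/proc/sys/vm/vfs_cache_pressure"
    with open(path) as f:
        print("current:", f.read().strip())
    with open(path, "w") as f:
        f.write("50\n")  # 50 is an example value, not a recommendation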
Please take a look and see if a change is warranted on your side, or
if you see different behaviour for HTTP HEAD calls than I do.
Thanks,
Matt