Greetings. I'm the Mirror Wrangler for the Fedora Project, and author
of the tool we use to know which mirrors are up-to-date:
MirrorManager. We have recently begun using rsync to retrieve the
list of files on a particular mirror, if you've registered rsync URLs
in MirrorManager (which you have).
For most mirrors, rsync directory listings of each of the four
Category trees you're carrying (Fedora Linux, Fedora EPEL, Fedora
Archive, and Fedora Secondary Arches) are the fastest way to get a full
list of what content your mirror has. If rsync isn't available, the
crawler falls back to FTP DIR listings, and then to individual HTTP
HEAD requests on a subset of the files.
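To make the fallback order concrete, here is a minimal Python sketch. It is not MirrorManager's actual code: the function name list_mirror and the idea of passing each method in as a callable are my own assumptions for illustration, and the real crawler may structure this quite differently.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

# Hypothetical type: a lister returns the set of paths it found, or None
# (or raises OSError) when that method is not available on the mirror.
Lister = Callable[[], Optional[Set[str]]]


def list_mirror(methods: Sequence[Tuple[str, Lister]]) -> Tuple[Optional[str], Set[str]]:
    """Try each listing method in order and return the first one that works.

    Returns (method_name, listing), or (None, empty set) if every method fails.
    """
    for name, lister in methods:
        try:
            listing = lister()
        except OSError:
            # Connection refused, timeout, etc.: treat as "method unavailable".
            listing = None
        if listing is not None:
            return name, listing
    return None, set()
```

The caller would pass the methods in preference order, for example rsync first, then FTP DIR, then HTTP HEAD, so that the cheaper full listing is always tried before the per-file checks.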
In your case, the crawler takes a relatively long time to retrieve the
directory listing from your mirror. By category, I see something like this:
07/26/2013 06:00:13 PM Starting crawl
07/26/2013 06:00:13 PM scanning Category Fedora EPEL
07/26/2013 06:15:20 PM rsync time: 0:14:54.263938
07/26/2013 06:15:35 PM scanning Category Fedora Linux
07/26/2013 07:32:58 PM rsync time: 1:16:25.136438
07/26/2013 07:38:11 PM scanning Category Fedora Archive
(During the Fedora Archive scan, the 2-hour cumulative timeout expires
and the crawler is killed. It never completes Fedora Archive, and never
starts Fedora Secondary Arches.)
The commands invoked to get the listings are:
rsync --temp-dir=/tmp -r --exclude=.snapshot --exclude='*.~tmp~' --no-motd \
    rsync://ftp.heanet.ie/pub/fedora/epel/
rsync --temp-dir=/tmp -r --exclude=.snapshot --exclude='*.~tmp~' --no-motd \
    rsync://ftp.heanet.ie/pub/fedora/linux/
rsync --temp-dir=/tmp -r --exclude=.snapshot --exclude='*.~tmp~' --no-motd \
    rsync://ftp.heanet.ie/pub/fedora-archive/
Now, for other mirrors serving rsync, such as mirrors.us.kernel.org
shown here, we see times such as:
07/27/2013 12:29:05 AM Starting crawl
07/27/2013 12:29:05 AM scanning Category Fedora Linux
07/27/2013 12:31:26 AM rsync time: 0:01:14.965211
07/27/2013 12:34:21 AM scanning Category Fedora EPEL
07/27/2013 12:34:41 AM rsync time: 0:00:12.548085
07/27/2013 12:34:59 AM scanning Category Fedora Secondary Arches
07/27/2013 12:40:12 AM rsync time: 0:02:49.361520
07/27/2013 12:47:02 AM scanning Category Fedora Other
07/27/2013 12:47:13 AM rsync time: 0:00:06.161032
07/27/2013 12:57:27 AM Total directories: 5805
07/27/2013 12:57:27 AM Changed to up2date: 0
07/27/2013 12:57:27 AM Changed to not up2date: 0
07/27/2013 12:57:27 AM Unchanged: 5805
07/27/2013 12:57:27 AM Unknown disposition: 0
07/27/2013 12:57:27 AM New HostCategoryDirs created: 87
07/27/2013 12:57:27 AM HostCategoryDirs now deleted on the master, marked not up2date: 0
07/27/2013 12:57:27 AM Ending crawl
The whole process takes under 30 minutes. By comparison, your mirror
is taking roughly 60x longer to serve the same directory listings than
other mirrors do. I would have to increase the timeout for crawling
your mirror from 2 hours to 30 hours, by which point the content would
have changed yet again, several times...
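For reference, the ratio can be worked out from the "rsync time" values in the two logs above. This is just a small Python sketch of the arithmetic; the function name parse_rsync_time is my own, and only the two categories that completed on both mirrors are compared.

```python
from datetime import timedelta


def parse_rsync_time(text: str) -> timedelta:
    """Parse an 'H:MM:SS.ffffff' duration as printed after 'rsync time:'."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


# Durations copied from the two crawl logs above.
slow = {"Fedora EPEL": "0:14:54.263938", "Fedora Linux": "1:16:25.136438"}
fast = {"Fedora EPEL": "0:00:12.548085", "Fedora Linux": "0:01:14.965211"}

ratios = {}
for category in slow:
    # Dividing one timedelta by another yields a plain float.
    ratios[category] = parse_rsync_time(slow[category]) / parse_rsync_time(fast[category])
    print(f"{category}: {ratios[category]:.0f}x slower")
```

Both categories come out at somewhat over 60x, which is where the 30-hour estimate comes from (30 minutes times 60).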
I raise this because I know you do a good job running your mirror
generally, so this seems anomalous.
In the past, mirror admins have suggested reducing the value of
/proc/sys/vm/vfs_cache_pressure from its default of 100 to a lower
number, which causes the kernel to prefer to keep dentries when under
memory pressure:
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
vfs_cache_pressure
------------------
Controls the tendency of the kernel to reclaim the memory which is
used for caching of directory and inode objects.
At the default value of vfs_cache_pressure=100 the kernel will attempt
to reclaim dentries and inodes at a "fair" rate with respect to
pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes
the kernel to prefer to retain dentry and inode caches. When
vfs_cache_pressure=0, the kernel will never reclaim dentries and
inodes due to memory pressure and this can easily lead to
out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.
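If it helps, here is a minimal Python sketch for inspecting and lowering the setting. The helper names are my own, and any particular target value (such as 50) would only be an illustration, not a tested recommendation for your hardware. Writing to the real /proc file requires root and does not persist across a reboot.

```python
from pathlib import Path

# Path taken from the text above.
VFS_CACHE_PRESSURE = Path("/proc/sys/vm/vfs_cache_pressure")


def read_pressure(path: Path = VFS_CACHE_PRESSURE) -> int:
    """Return the current vfs_cache_pressure value."""
    return int(path.read_text().strip())


def write_pressure(value: int, path: Path = VFS_CACHE_PRESSURE) -> None:
    """Set vfs_cache_pressure (needs root when used on the real /proc file).

    Values below 1 are rejected here because, per the kernel documentation
    quoted above, 0 can easily lead to out-of-memory conditions.
    """
    if value < 1:
        raise ValueError("refusing to set vfs_cache_pressure below 1")
    path.write_text(f"{value}\n")


# Only report the current value when running on a system that exposes it.
if VFS_CACHE_PRESSURE.exists():
    print("current vfs_cache_pressure:", read_pressure())
```

A call such as write_pressure(50), run as root, would then halve the default; whether that is the right number for your box is something only measurement can tell.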
Please take a look and check whether you see different behaviour for
rsync directory listings than I do. If the behaviour is indeed not what
you would expect, please consider what could be adjusted.
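One way to compare notes is to time the same listing from your side. The sketch below is a hypothetical Python helper, not part of MirrorManager; the rsync flags are copied from the commands above, and the function names are my own.

```python
import subprocess
import time
from typing import List, Sequence


def listing_command(url: str) -> List[str]:
    """Build the recursive rsync listing command with the flags shown above."""
    return [
        "rsync", "--temp-dir=/tmp", "-r",
        "--exclude=.snapshot", "--exclude=*.~tmp~",
        "--no-motd", url,
    ]


def time_command(argv: Sequence[str]) -> float:
    """Run a command, discard its output, and return elapsed seconds."""
    start = time.monotonic()
    subprocess.run(list(argv), stdout=subprocess.DEVNULL, check=True)
    return time.monotonic() - start


# Example (not run here, since it needs network access and rsync installed):
# elapsed = time_command(listing_command("rsync://ftp.heanet.ie/pub/fedora/epel/"))
# print(f"EPEL listing took {elapsed:.1f} s")
```

Running it once from a remote host and once on the mirror itself (against localhost) might help separate network effects from filesystem or cache effects.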
I note that you are running report_mirror after each rsync content
pull; thank you. If you look, I suspect that it too takes a long time,
for the same reason.
Thanks,
Matt