ftp.tsukuba.wide.ad.jp Fedora content slow rsync listing

Saturday, 27 July 2013

Greetings.  I'm the Mirror Wrangler for the Fedora Project, and author
of the tool we use to know which mirrors are up-to-date:
MirrorManager.  Its crawler has recently begun using rsync to retrieve
the list of files on a mirror, provided the mirror has rsync URLs
registered in MirrorManager (which yours does).

For most mirrors, an rsync directory listing of each of the Category
trees you're carrying (Fedora Linux, Fedora EPEL, Fedora Archive) is
the fastest way to get a full list of what content your mirror has.
If rsync isn't available, the crawler falls back to individual HTTP
HEAD requests on a subset of the files, and then to FTP DIR if HTTP
isn't available.
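
The fallback order described above can be sketched as follows.  This is
a minimal illustration, not MirrorManager's actual crawler code: the
function name, the probe callables, and the convention that a probe
returns None when its method is unavailable are all assumptions made
for the example.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

# Hypothetical probe type: returns the set of file paths found on the
# mirror, or None when that access method is not available.
Probe = Callable[[], Optional[Set[str]]]


def list_mirror_content(
    probes: Iterable[Tuple[str, Probe]],
) -> Tuple[Optional[str], Set[str]]:
    """Try each listing method in order and return the first that works."""
    for method, probe in probes:
        try:
            files = probe()
        except OSError:
            # Treat connection-level failures as "method unavailable".
            files = None
        if files is not None:
            return method, files
    # No method produced a listing.
    return None, set()


if __name__ == "__main__":
    # Stub probes standing in for real rsync / HTTP HEAD / FTP DIR checks.
    order = [
        ("rsync", lambda: None),                      # pretend rsync is not offered
        ("http-head", lambda: {"pub/archive/README"}),
        ("ftp-dir", lambda: set()),
    ]
    print(list_mirror_content(order))
```

Passing the methods as an ordered list keeps the preference (rsync
first, then HTTP HEAD, then FTP DIR) in one visible place.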

In your case, the crawler takes a relatively long time to retrieve the
directory listing from your mirror, using rsync.  I see something like this:

07/27/2013 07:00:21 AM Starting crawl
07/27/2013 07:00:21 AM scanning Category Fedora Archive

The 2-hour cumulative timeout expires during that scan, and the crawler
is killed.  It never completes Fedora Archive, and never starts either
of the others.

https://admin.fedoraproject.org/mirrormanager/host/1192 is a link to
your mirror information in MirrorManager, including a crawler log.

For other mirrors serving rsync, such as archive.kernel.org shown
here, we see times such as:

07/27/2013 05:03:11 AM Starting crawl
07/27/2013 05:03:11 AM scanning Category Fedora Archive
07/27/2013 05:09:40 AM rsync time: 0:04:22.125007
07/27/2013 05:20:21 AM Total directories: 1973
07/27/2013 05:20:21 AM Changed to up2date: 0
07/27/2013 05:20:21 AM Changed to not up2date: 0
07/27/2013 05:20:21 AM Unchanged: 1973
07/27/2013 05:20:21 AM Unknown disposition: 0
07/27/2013 05:20:21 AM New HostCategoryDirs created: 1972
07/27/2013 05:20:21 AM HostCategoryDirs now deleted on the master,
07/27/2013 05:20:21 AM Ending crawl

The whole process takes under 20 minutes, and the rsync time alone is
less than 4 1/2 minutes.  Your mirror takes more than 2 hours without
finishing even the first Category.
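
The total for the faster crawl can be checked from the first and last
log lines.  A minimal sketch, assuming every crawler log line starts
with a timestamp in the MM/DD/YYYY HH:MM:SS AM/PM form seen above; the
helper names are made up for the example.

```python
from datetime import datetime, timedelta

# Timestamp layout used at the start of each crawler log line.
LOG_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_log_time(line: str) -> datetime:
    """Parse the leading 'date time AM/PM' fields of a log line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def crawl_duration(start_line: str, end_line: str) -> timedelta:
    """Elapsed time between two log lines."""
    return parse_log_time(end_line) - parse_log_time(start_line)


elapsed = crawl_duration(
    "07/27/2013 05:03:11 AM Starting crawl",
    "07/27/2013 05:20:21 AM Ending crawl",
)
print(elapsed)  # 0:17:10
```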

I raise this because I know you do a good job running your mirror
generally, so this seems anomalous.

In the past, mirror admins have suggested reducing the value of
/proc/sys/vm/vfs_cache_pressure, from the default value 100, to a
lower number, causing the kernel to prefer to keep dentries when under
memory pressure:

https://www.kernel.org/doc/Documentation/sysctl/vm.txt

vfs_cache_pressure
------------------

Controls the tendency of the kernel to reclaim the memory which is
used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt
to reclaim dentries and inodes at a "fair" rate with respect to
pagecache and swapcache reclaim.  Decreasing vfs_cache_pressure causes
the kernel to prefer to retain dentry and inode caches. When
vfs_cache_pressure=0, the kernel will never reclaim dentries and
inodes due to memory pressure and this can easily lead to
out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.
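
To inspect or change the setting from a script, something like the
following would do.  This is a sketch under assumptions: the helper
names are invented, writing to /proc requires root and lasts only until
reboot, and refusing 0 is a choice made here because of the
out-of-memory warning quoted above, not a kernel restriction.  The
value in the usage comment is purely illustrative, not a recommendation.

```python
from pathlib import Path

# Path named in the text above.
VFS_CACHE_PRESSURE = Path("/proc/sys/vm/vfs_cache_pressure")


def read_cache_pressure(path=VFS_CACHE_PRESSURE) -> int:
    """Return the current vfs_cache_pressure value."""
    return int(Path(path).read_text().strip())


def write_cache_pressure(value: int, path=VFS_CACHE_PRESSURE) -> None:
    """Set vfs_cache_pressure (needs root on a real system)."""
    if value <= 0:
        # Sketch-level guard: 0 means dentries and inodes are never
        # reclaimed, which the kernel docs warn can lead to OOM.
        raise ValueError("refusing to set vfs_cache_pressure to 0 or below")
    Path(path).write_text(f"{value}\n")


# Example use on a Linux host, as root (illustrative value only):
#   print(read_cache_pressure())
#   write_cache_pressure(50)
```

The path parameter exists so the helpers can be pointed at an ordinary
file when trying them out without touching the live setting.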

Please take a look and see if a change is warranted on your side, or
if you see different behaviour for rsync directory listings than I do.
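
One way to compare is to time a recursive listing yourself.  A rough
sketch, assuming the rsync client is installed; the exact options the
crawler passes are not shown in this post, so the flags below
(--no-motd, --recursive, --list-only) are my assumption of a
comparable listing, and the URL in the usage comment is a placeholder.

```python
import subprocess
import time


def build_listing_command(rsync_url: str) -> list:
    """Command for a recursive, list-only rsync run against a module."""
    return ["rsync", "--no-motd", "--recursive", "--list-only", rsync_url]


def time_listing(rsync_url: str, timeout_seconds: float = 7200.0):
    """Run the listing and return (elapsed seconds, number of entries)."""
    started = time.monotonic()
    completed = subprocess.run(
        build_listing_command(rsync_url),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout_seconds,  # mirrors the 2-hour limit mentioned above
        check=True,
    )
    elapsed = time.monotonic() - started
    entries = len(completed.stdout.splitlines())
    return elapsed, entries


# Example (placeholder URL, substitute your own rsync module):
#   seconds, entries = time_listing("rsync://mirror.example.org/fedora-archive/")
#   print(f"{entries} entries in {seconds:.0f} s")
```

Running it twice in a row is informative: a much faster second run
suggests the first was limited by cold dentry and inode caches.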

Thanks,
Matt
