Thoughts and question about MM2's UMDL script
Adrian Reber
adrian at lisas.de
Fri Jun 26 18:45:18 UTC 2015
On Fri, Jun 26, 2015 at 06:11:44PM +0200, Pierre-Yves Chibon wrote:
> Yesterday and today I spent a little time going over the UMDL script of
> MirrorManager2.
> Going through it, I ended up with a few questions about it.
>
> * Repository name
> UMDL's code clearly says:
> # historically, Repository.name was a longer string with
> # product and category deliniations. But we were getting
> # unique constraint conflicts once we started introducing
> # repositories under repositories. And .name isn't used for
> # anything meaningful. So simply have it match dir.name,
> # which can't conflict.
> And quickly grepping through MM2's sources, I could not find a reference to
> this; we always rely on the repository's prefix, not its name.
>
> Question: Should we drop this?
> It makes things confusing and is basically noise since we do not use it anywhere.
It was a helpful column for fixing errors with the repos. But as the
database is so huge, everything we can drop should be dropped.
[...]
> * The directory table
> So looking at the database, and more precisely at the directory table in that
> database, it seems we store all the directories of the tree, i.e.:
> /pub/alt/
> /pub/alt/anaconda/
> /pub/alt/bfo/
> /pub/alt/bfo/gpxe-20120514
> ...
> This leaves me wondering: what is the benefit of keeping the whole
> list of directories in the DB?
> After all, as far as I understand, the UMDL finds the repos in the tree (a repo
> being defined by the presence of a 'repodata' folder containing repomd.xml,
> or by the presence of a 'summary' file and an 'objects' folder).
> For these repos, we look for the most recent files, store this info in the DB
> and later use it to check whether the mirrors are up to date.
>
> But do we need to check that ``pub/fedora/linux`` exists when we later check
> that ``pub/fedora/linux/updates/testing/21/x86_64/`` exists and is up to date?
>
> I am currently under the impression that dropping unnecessary directories would
> save DB space (the directories are also linked in the host_category_dir table,
> which lists, for each host and each category, which dirs are present) as well as
> crawling time (both in the UMDL and in the crawler).
Again, dropping unnecessary information from the database sounds good.
Although this one sounds a bit more complex, as you always have to delete
directories when subdirectories appear and add directories when
subdirectories disappear.
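To make that trade-off concrete, here is a minimal Python sketch (not MM2's actual code; `is_repo_dir`, `find_repo_dirs` and `plan_directory_sync` are hypothetical names) of a scan that records only repository directories, identified as described above by `repodata/repomd.xml` or by a `summary` file next to an `objects` directory, and then diffs that set against what is already stored. The diff is where the extra bookkeeping shows up: every run has to work out which rows to add and which to delete.

```python
import os
from typing import Iterable, Set, Tuple


def is_repo_dir(path: str) -> bool:
    """Hypothetical helper: True if `path` looks like a yum or ostree repo."""
    # yum/dnf repository: repodata/repomd.xml below the directory
    if os.path.isfile(os.path.join(path, "repodata", "repomd.xml")):
        return True
    # ostree repository: a 'summary' file next to an 'objects' directory
    return (os.path.isfile(os.path.join(path, "summary"))
            and os.path.isdir(os.path.join(path, "objects")))


def find_repo_dirs(top: str) -> Set[str]:
    """Walk `top` and return repository directories relative to it."""
    repos = set()
    for dirpath, dirnames, _filenames in os.walk(top):
        if is_repo_dir(dirpath):
            repos.add(os.path.relpath(dirpath, top))
            # Skip the repo's own metadata/object dirs, but keep descending
            # elsewhere so repositories nested under repositories are found.
            dirnames[:] = [d for d in dirnames
                           if d not in ("repodata", "objects")]
    return repos


def plan_directory_sync(stored: Iterable[str],
                        found: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_delete) so the stored set matches the latest scan."""
    stored_set, found_set = set(stored), set(found)
    return found_set - stored_set, stored_set - found_set
```

Usage would be something like `to_add, to_delete = plan_directory_sync(dirs_in_db, find_repo_dirs(category_top))`, where `dirs_in_db` and `category_top` are placeholders for whatever the real UMDL has at hand.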
> * Non-directory based support in UMDL.
>
> So the UMDL script currently supports three ways of crawling the tree:
> * file
> * rsync
> * directory
>
> We, in Fedora, are only using the last one. I believe the `rsync` mode was added
> to support Ubuntu, and the file mode is basically a simplified version of the
> directory mode, which we do not use at the moment.
>
> I would like to propose that we drop support for rsync. I feel that it may be
> simpler and easier to create a UMDL and a crawler for each distro that would
> like to use MirrorManager than to maintain a one-script-fits-all UMDL that is
> in fact tested for only one of the scenarios.
> That being said, if we ever have interest from Ubuntu, CentOS or any other
> communities, we should definitely look into making the UMDL and crawler as
> re-usable as possible for them, but keeping the distro-specific bits separated.
As already mentioned, RPM Fusion uses the rsync mode as the master
mirror is 'far' away from the MirrorManager installation. It is still
using MM1 on CentOS 5 and I am not currently planning to upgrade to
MM2. So it could be removed, and I should be able to write the
necessary UMDL rsync crawler once I need it.
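For reference, a remote crawler of that kind does not need a local copy of the tree: `rsync --list-only --recursive` prints one line per entry with permissions, size, date, time and name. Below is a minimal Python sketch, assuming that output format (sizes may contain thousands separators depending on the rsync version, so the size field is ignored), which extracts the `repomd.xml` timestamps the up-to-date checks rely on. `parse_list_only` and `repomd_timestamps` are hypothetical names, not part of MM1 or MM2.

```python
import posixpath
from datetime import datetime
from typing import Dict, Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    path: str
    is_dir: bool
    mtime: datetime


def parse_list_only(lines: Iterable[str]) -> Iterator[Entry]:
    """Parse lines as printed by `rsync --list-only --recursive` (assumed format)."""
    for line in lines:
        fields = line.rstrip("\n").split(None, 4)
        if len(fields) != 5:
            continue  # blank or unexpected line
        perms, _size, date, time, name = fields
        if perms.startswith("l"):
            name = name.split(" -> ", 1)[0]  # drop the symlink target
        try:
            mtime = datetime.strptime(f"{date} {time}", "%Y/%m/%d %H:%M:%S")
        except ValueError:
            continue  # not a listing line after all
        yield Entry(name, perms.startswith("d"), mtime)


def repomd_timestamps(entries: Iterable[Entry]) -> Dict[str, datetime]:
    """Map each repo directory to the mtime of its repodata/repomd.xml."""
    result = {}
    for e in entries:
        head, tail = posixpath.split(e.path)
        if not e.is_dir and tail == "repomd.xml" \
                and posixpath.basename(head) == "repodata":
            result[posixpath.dirname(head) or "."] = e.mtime
    return result
```

The listing itself could come from something like `subprocess.run(["rsync", "--list-only", "--recursive", url], capture_output=True, text=True).stdout.splitlines()`, with `url` pointing at the master mirror's rsync module (a placeholder here).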
Another thought I had about the UMDL concerns the file mode. For the
categories 'Fedora EPEL' and 'Fedora Linux' we have files called
'fullfilelist'. Maybe the UMDL could use them to reduce I/O on the NFS
mounts, only actually reading files and metadata from NFS when it is
necessary. Just one of those ideas.
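A minimal Python sketch of that idea, assuming a fullfilelist is plain text with one path per line relative to the category's top directory, possibly prefixed with `./` (check the real files before relying on this). It derives the repository directories from the list alone, so only their `repomd.xml` or ostree `summary` would then need to be read from NFS. `repos_from_filelist` is a hypothetical name.

```python
import posixpath
from typing import Iterable, Set


def repos_from_filelist(lines: Iterable[str]) -> Set[str]:
    """Return repo directories named in a fullfilelist-style listing (assumed format)."""
    paths = set()
    for line in lines:
        path = line.strip()
        if path.startswith("./"):
            path = path[2:]
        if path and path != ".":
            paths.add(path)

    # Every directory implied by the listing, even without a line of its own
    dirs = set()
    for path in paths:
        parent = posixpath.dirname(path)
        while parent:
            dirs.add(parent)
            parent = posixpath.dirname(parent)

    repos = set()
    for path in paths:
        head, tail = posixpath.split(path)
        # yum/dnf repository: <repo>/repodata/repomd.xml
        if tail == "repomd.xml" and posixpath.basename(head) == "repodata":
            repos.add(posixpath.dirname(head) or ".")
        # ostree repository: <repo>/summary next to <repo>/objects
        elif tail == "summary":
            objects = posixpath.join(head, "objects")
            if objects in dirs or objects in paths:
                repos.add(head or ".")
    return repos
```

No NFS access happens here; the caller would then stat or read only the metadata files of the returned directories.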
Adrian
More information about the infrastructure mailing list