--
Matt Domsch
Senior Distinguished Engineer & Executive Director
Dell | Software Group, Office of the CTO
-----Original Message-----
From: Pierre-Yves Chibon [mailto:pingou@pingoured.fr]
Sent: Monday, June 29, 2015 4:08 AM
To: Domsch, Matt
Cc: infrastructure(a)lists.fedoraproject.org
Subject: Re: Thoughts and question about MM2's UMDL script
On Fri, Jun 26, 2015 at 06:00:18PM +0000, Matt_Domsch(a)Dell.com wrote:
* Readable status of directories
The Directory table has a 'readable' property, but none of our directories is
marked as not readable.
The question is: what is the use case for this boolean?
== MD == Pre-bitflip content, which the UMDL can see but the normal public can't yet. Are
you no longer bitflipping? If so, it doesn't matter.
OK, I see the use case in the crawler, but how did it work in the UMDL?
Was the UMDL not allowed to read a given folder?
== MD == UMDL can read it, but the crawler can't. UMDL sets readable=False; crawler
then doesn't delete the directory (or care if it can't read it) because it
doesn't expect it to be readable. Otherwise, when readable=True but a given mirror
doesn't have that content, crawler marks that host_category_directory for deletion.
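The crawler-side rule described above can be sketched as follows. This is a minimal illustration, not MirrorManager's actual code: the `Directory` dataclass, the `directories_to_delete()` helper and the example paths are hypothetical, and the real crawler works against the database rather than in-memory sets.

```python
from dataclasses import dataclass


@dataclass
class Directory:
    # Hypothetical stand-in for a row of the Directory table.
    name: str
    readable: bool


def directories_to_delete(master_dirs, mirror_has):
    """Return the directory names the crawler would mark for deletion on one mirror.

    master_dirs: Directory objects known to the UMDL (the master tree)
    mirror_has:  set of directory names the crawler actually found on the mirror
    """
    to_delete = []
    for d in master_dirs:
        if d.name in mirror_has:
            continue  # content is present on the mirror: nothing to do
        if not d.readable:
            # Pre-bitflip content: the public (and the crawler) can't see it yet,
            # so its absence is expected and is not grounds for deletion.
            continue
        # Readable content missing from this mirror: mark it for deletion.
        to_delete.append(d.name)
    return to_delete


# Hypothetical example: release 22 is public, release 23 is still pre-bitflip.
master = [
    Directory("pub/fedora/linux/releases/22", readable=True),
    Directory("pub/fedora/linux/releases/23", readable=False),
]
print(directories_to_delete(master, mirror_has=set()))
```

With an empty mirror, only the readable directory is flagged; the unreadable one is skipped because nobody expects the crawler to see it yet.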
My current impression is that dropping unnecessary directories would save DB
space (the directories are linked in the host_category_dir table, which lists,
for each host and each category, which directories are present) as well as
crawling time (in both the UMDL and the crawler).
== MD == You need non-repo directories for ISOs at least. There was also a time when we were
able to mirror the entire Fedora static web content, and we could do so only because MM
tracked all directories, not just repository directories. MM1 also tried to be a "generic"
mirror manager, not just a Fedora-specific one, so I intentionally tracked
everything, not just Yum repos.
Idea: what if we tracked only the folders that have files in them? For example,
http://dl.fedoraproject.org/pub/epel/5/ would then not end up in the database.
In addition, we could add a sort of blacklist to avoid storing
http://dl.fedoraproject.org/pub/ just because it contains the DIRECTORY_SIZES.txt file.
This would reduce the number of directories we store for the Atomic tree.
== MD == I didn't optimize away the few directories that contain no files. You're
welcome to if you see a need, but it saves only a few entries out of
hundreds or thousands.
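A minimal sketch of the proposed filter, assuming the tree is available as a mapping from each directory path to the file names directly inside it. The `directories_to_track()` helper and the `IGNORED_FILES` blacklist are hypothetical names, not part of MirrorManager.

```python
# Hypothetical blacklist: files whose presence alone should not make a directory tracked.
IGNORED_FILES = frozenset({"DIRECTORY_SIZES.txt"})


def directories_to_track(tree, ignored=IGNORED_FILES):
    """Keep only directories that hold at least one non-blacklisted file.

    tree: mapping of directory path -> list of file names directly in that directory.
    """
    return sorted(
        path
        for path, files in tree.items()
        if any(name not in ignored for name in files)
    )


# Hypothetical example tree mirroring the URLs discussed above.
example_tree = {
    "pub": ["DIRECTORY_SIZES.txt"],  # only a blacklisted file: dropped
    "pub/epel": [],                  # only subdirectories: dropped
    "pub/epel/5": [],                # only subdirectories: dropped
    "pub/epel/5/x86_64/repodata": ["repomd.xml"],
}
print(directories_to_track(example_tree))  # → ['pub/epel/5/x86_64/repodata']
```

Directories that hold ISOs still contain files, so they stay tracked under this rule; only pure "container" directories and blacklisted-file-only directories disappear.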
* Support for non-directory modes in the UMDL.
So the UMDL script currently supports three ways of crawling the tree:
* file
* rsync
* directory
In Fedora, we only use the last one. I believe the `rsync` mode
was added to support Ubuntu, and the `file` mode is basically a
simplified version of the directory mode that we do not use at the moment.
I would like to propose that we drop support for rsync. I feel that it
may be simpler and easier to create a UMDL and a crawler for each
distro that would like to use MirrorManager than to maintain a
one-size-fits-all UMDL that is in fact tested for only one of the scenarios.
That being said, if we ever have interest from Ubuntu, CentOS or any
other community, we should definitely look into making the UMDL
and crawler as reusable as possible for them, while keeping the distro-specific bits
separate.
== MD == [file] was used early on for development and testing; it's not interesting. [rsync] would
be used when you don't have access to a master mirror (or a very close replica). Perhaps
the RPM Fusion setup still needs this. I would have needed it for testing Ubuntu, certainly. It
shouldn't be needed in production when the content being mirrored out is managed by
the same people operating MirrorManager, as is the case for Fedora.
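Without a master mirror to walk, the UMDL presumably has to rebuild the directory tree from an rsync listing instead of a local filesystem. The sketch below assumes lines in the usual `rsync --list-only` shape (permissions, size, date, time, path); the exact column widths and digit grouping vary between rsync versions, so treat it as an illustration rather than the UMDL's actual parser. `parse_rsync_listing()` is a hypothetical name.

```python
import posixpath


def parse_rsync_listing(lines):
    """Build {directory: [file names]} from `rsync --list-only` style lines.

    Assumed line shape (layout and digit grouping differ between rsync versions):
        drwxr-xr-x          4,096 2015/06/26 18:00:18 epel/5
    """
    tree = {}
    for line in lines:
        parts = line.split(None, 4)  # the path may contain spaces: split at most 4 times
        if len(parts) != 5 or not parts[1].replace(",", "").isdigit():
            continue  # skip MOTD text, blank lines and anything else unexpected
        perms, _size, _date, _time, path = parts
        if perms.startswith("d"):
            tree.setdefault(path.rstrip("/"), [])
        elif perms.startswith("-"):
            parent, name = posixpath.split(path)
            tree.setdefault(parent or ".", []).append(name)
        # Symlinks ("l...") and special files are ignored in this sketch.
    return tree


# Hypothetical listing of a module root.
example_lines = [
    "drwxr-xr-x          4,096 2015/06/26 18:00:18 .",
    "-rw-r--r--            512 2015/06/26 18:00:18 DIRECTORY_SIZES.txt",
    "drwxr-xr-x          4,096 2015/06/26 18:00:18 epel",
    "drwxr-xr-x          4,096 2015/06/26 18:00:18 epel/5",
    "-rw-r--r--          1,234 2015/06/26 18:00:18 epel/5/x86_64/repodata/repomd.xml",
]
print(parse_rsync_listing(example_lines))
```

Commas are stripped from the size column only to recognise listing lines; the size itself is not used here.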
Apparently RPM Fusion does need this, so it needs to stay. The question then becomes:
should we split the different UMDL modes into separate scripts?
The idea is that this would make optimization easier.
(Note: this idea just occurred to me, and since I have not looked at what we could
optimize or how, it may end up remaining in the same file.)
== MD == The parsing routine is pretty short; it's not worth a separate executable.
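Keeping every mode in one executable does not rule out per-mode optimization, as long as each mode has its own entry point. A minimal sketch, assuming a hypothetical `--mode` option and placeholder handlers; neither is taken from MirrorManager, and real handlers would walk the tree and update the database instead of returning a string.

```python
import argparse


# Hypothetical per-mode entry points; each can be optimized independently.
def sync_from_directory(source):
    return f"walking local tree at {source}"


def sync_from_rsync(source):
    return f"parsing rsync listing from {source}"


def sync_from_file(source):
    return f"reading file list from {source}"


MODES = {
    "directory": sync_from_directory,
    "rsync": sync_from_rsync,
    "file": sync_from_file,
}


def main(argv=None):
    parser = argparse.ArgumentParser(description="UMDL sketch: one executable, several modes")
    parser.add_argument("--mode", choices=sorted(MODES), default="directory")
    parser.add_argument("source", help="local path or rsync URL to scan")
    args = parser.parse_args(argv)
    return MODES[args.mode](args.source)


print(main(["--mode", "rsync", "rsync://example.org/pub"]))
```

The dispatch table keeps the shared command-line parsing in one place while each mode's logic stays isolated in its own function.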