I have two students interested in a diploma thesis called "Yum plugin for
suggesting packages based on usage":


TL;DR - from an anonymized access log, create a database of suggested
packages using data mining techniques and provide a Yum plugin that
would suggest "Users of vim also installed: ctags, git, ..."
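
To make the idea concrete, here is a minimal sketch of the simplest possible "data mining" step: counting which packages show up together for the same client. It assumes the access log has already been reduced to a mapping from an anonymized client ID to the package names that client fetched; that mapping, the function name build_suggestions and the sample data are all made up for illustration, and a real thesis would likely use something stronger (association rules, for example) than raw pair counts.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_suggestions(installs_by_client, top_n=3):
    """Map each package to the packages most often seen alongside it.

    installs_by_client is a hypothetical, already-parsed structure:
    {anonymized_client_id: [package_name, ...]}.
    """
    pair_counts = Counter()
    for packages in installs_by_client.values():
        # Count each unordered pair at most once per client.
        for a, b in combinations(sorted(set(packages)), 2):
            pair_counts[(a, b)] += 1

    # Turn pair counts into a per-package table of co-installed packages.
    related = defaultdict(Counter)
    for (a, b), count in pair_counts.items():
        related[a][b] = count
        related[b][a] = count

    # Rank by count (descending), then by name so ties are deterministic.
    return {
        pkg: [
            other
            for other, _ in sorted(
                counts.items(), key=lambda kv: (-kv[1], kv[0])
            )[:top_n]
        ]
        for pkg, counts in related.items()
    }


# Made-up sample data, not taken from any real log.
sample = {
    "client-1": ["vim", "ctags", "git"],
    "client-2": ["vim", "git"],
    "client-3": ["vim", "ctags", "git"],
    "client-4": ["emacs", "git"],
}

suggestions = build_suggestions(sample)
print("Users of vim also installed:", ", ".join(suggestions["vim"]))
```

The resulting dictionary is the kind of small lookup table a plugin could ship or download and consult at install time.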

I would suggest finding a lower-level mirror rather than any of the top-level ones. The top-level mirrors carry a lot of noise from other mirrors and from various developers using mock/reposync to get packages. [I tried using the dl.fedoraproject.org mirrors for this a while ago, and a large percentage of the yum traffic was from mock builds.] If you go with smaller mirrors, you may need more than one.
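
If only noisy logs are available, one crude way to cut down the mirror and mock/reposync traffic is to drop clients that fetch an implausibly large number of distinct packages. The sketch below assumes the same hypothetical client-to-packages mapping as a pre-parsed input; the function name drop_bulk_clients and the threshold are guesses for illustration and would need tuning against real data. It will not catch small mock builds.

```python
def drop_bulk_clients(installs_by_client, max_packages=200):
    """Keep only clients that fetched at most max_packages distinct packages.

    Assumption: mirrors and reposync runs pull far more distinct packages
    than an ordinary user does. The default threshold is an arbitrary guess.
    """
    return {
        client: packages
        for client, packages in installs_by_client.items()
        if len(set(packages)) <= max_packages
    }


# Made-up sample: one ordinary client and one that looks like a mirror sync.
raw_clients = {
    "client-1": ["vim", "ctags", "git"],
    "client-2": ["pkg-%d" % i for i in range(500)],
}

filtered_clients = drop_bulk_clients(raw_clients)
print(sorted(filtered_clients))
```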

