I have two students interested in a diploma thesis titled "Yum plugin
for suggesting packages based on usage":
TL;DR: from anonymized access logs, create a database of suggested
packages using data mining techniques and provide a Yum plugin that
would suggest "Users of vim also installed: ctags, git, ..."
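
To make the idea concrete, here is a minimal sketch of how such suggestions could be derived. The data mining technique is not decided yet, so plain co-occurrence counting is used here purely as a stand-in. The function name `build_suggestions`, the `top_n` parameter and the input layout (anonymized client id mapped to a set of package names) are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_suggestions(installs, top_n=3):
    """Return {package: [most frequently co-installed packages]}.

    installs: mapping of anonymized client id -> set of package names
    (an assumed layout, not a format defined by the project).
    """
    pair_counts = defaultdict(Counter)
    for packages in installs.values():
        # Count every unordered pair of packages seen for the same client.
        for a, b in combinations(sorted(packages), 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1

    suggestions = {}
    for pkg, counts in pair_counts.items():
        # Highest count first; ties are broken alphabetically.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        suggestions[pkg] = [name for name, _ in ranked[:top_n]]
    return suggestions


if __name__ == "__main__":
    sample = {
        "client-a": {"vim", "ctags", "git"},
        "client-b": {"vim", "git"},
        "client-c": {"vim", "ctags", "git"},
        "client-d": {"emacs", "git"},
    }
    print("Users of vim also installed:", ", ".join(build_suggestions(sample)["vim"]))
```

A real database would likely need a minimum-support threshold and some normalization so that very popular packages do not dominate every list.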
I am going to create a Fedora Feature wiki page shortly, describing this
in more detail. Our goal is to offer this project for integration into
Fedora later on, or at least to provide Fedora packages for it.
To do that, we need a good source of data. It would be best to collect
access logs from one or two main Fedora mirrors. We would provide a
short Python script that parses the access logs, anonymizes the data
(IP addresses replaced by salted hashes) and keeps only the relevant
entries (RPM files from the latest Fedora release or updates
repositories). That would be phase one, which should give us some
sample data.
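
As a rough illustration of phase one, the sketch below parses one access log line, replaces the client IP with a salted hash and keeps only RPM downloads for a given release. Several things here are assumptions and not facts about the mirrors: the Apache-style log format, the `/releases/<N>/` and `/updates/<N>/` path layout, the `SALT` value, and the helper names `anonymize_ip` and `parse_line`.

```python
import hashlib
import re

# Hypothetical salt; a real deployment would keep this value secret.
SALT = b"replace-with-a-secret-salt"

# Assumes Apache common/combined log format; actual mirror logs may differ.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"GET (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)

# Assumed repository layout: .../releases/19/... or .../updates/19/...
RELEVANT = re.compile(
    r'/(?:releases|updates)/(?P<release>\d+)/.*/(?P<rpm>[^/]+\.rpm)$'
)


def anonymize_ip(ip, salt=SALT):
    """Return a salted SHA-256 digest of the IP address, truncated for brevity."""
    return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()[:16]


def parse_line(line, release="19"):
    """Return (anonymized_client, rpm_file_name), or None if the line is irrelevant."""
    m = LOG_LINE.match(line)
    if m is None or m.group("status") != "200":
        return None
    r = RELEVANT.search(m.group("path"))
    if r is None or r.group("release") != release:
        return None
    return anonymize_ip(m.group("ip")), r.group("rpm")


if __name__ == "__main__":
    example = (
        '192.0.2.1 - - [04/Dec/2013:10:00:00 +0000] '
        '"GET /pub/fedora/linux/updates/19/x86_64/vim-enhanced-7.4.027-2.fc19.x86_64.rpm '
        'HTTP/1.1" 200 1048576 "-" "urlgrabber"'
    )
    print(parse_line(example))
```

Since the IPv4 address space is small, a salted hash only protects clients as long as the salt itself stays private, which is worth keeping in mind for the legal question.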
Phase two would be to integrate this script with logrotate so that, for
one Fedora release cycle (Fedora 19), it collects the relevant
anonymized data into a file. The final suggested-package database would
be created from this file (or maybe files, to allow us to move them out
of the stat directory on the fly).
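
One possible shape for the phase two collection step is sketched below: a function that a logrotate hook could call on each rotated log, appending the anonymized records to an output file. The names `open_log` and `append_records`, the tab-separated output format and the `parse` callable (anything that returns a `(client, rpm)` pair or `None` for a log line) are all hypothetical.

```python
import gzip


def open_log(path):
    """Open a rotated log as text, transparently handling gzip compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def append_records(log_path, out_path, parse):
    """Append one 'client<TAB>rpm' line per relevant request.

    parse: callable taking a log line and returning (client, rpm) or None.
    Returns the number of records written.
    """
    written = 0
    with open_log(log_path) as src, open(out_path, "a", encoding="utf-8") as dst:
        for line in src:
            record = parse(line)
            if record is None:
                continue
            client, rpm = record
            dst.write(f"{client}\t{rpm}\n")
            written += 1
    return written
```

Because the output is opened in append mode, the collected file can be moved away between runs and the next run simply starts a new one.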
The big (legal) question is whether we can provide this anonymized data
to the public, or whether we want everyone involved to sign an NDA. I am
CCing Tom for this question.
I need your help connecting with the relevant people. Any comments are
welcome.
Many thanks and I hope this effort will lead to improving user
experience with Fedora packaging.
Lukas "lzap" Zapletal
irc: lzap #theforeman
The infrastructure team will be having its weekly meeting tomorrow,
2013-12-05 at 19:00 UTC in #fedora-meeting on the freenode network.
#topic New folks introductions and Apprentice tasks.
If any new folks want to give a quick one line bio or any apprentices
would like to ask general questions, they can do so in this part of the
meeting. Don't be shy!
#topic Applications status / discussion
Check in on status of our applications: pkgdb, fas, bodhi, koji,
community, voting, tagger, packager, dpsearch, etc.
Mention any new releases, bugs we need to work around, or things to note.
#topic Sysadmin status / discussion
Here we talk about sysadmin related happenings from the previous week,
or things that are upcoming.
#topic Upcoming Tasks/Items
#topic Open Floor
Submit your agenda items as tickets in the trac instance and send a
note replying to this thread.
More info here: