Anonymized access log from a Fedora mirror

Steven Acres steven at swatteksystems.com
Fri May 3 12:42:02 UTC 2013


This sounds akin to 'recommends' in apt (Debian) or emerge (Gentoo):
added functionality, but not a functional dependency. Am I on the mark
with the intention?
On May 3, 2013 8:35 AM, "Lukas Zapletal" <lzap at redhat.com> wrote:

> Hello,
>
> I have two students interested in a diploma thesis called "Yum plugin
> for suggesting packages based on usage":
>
> http://bit.ly/18hrHbL
>
> TL;DR - from an anonymized access log, create a database of suggested
> packages using data mining techniques and provide a Yum plugin that
> would suggest "Users of vim also installed: ctags, git, ..."
>
> I am gonna create a Fedora Feature wiki page shortly describing this in
> more detail. Our goal is to offer this project for integration into
> Fedora later on, or at least to provide Fedora packages for it.
>
> To do that, we need a good source of data. It would be best to collect
> access logs from one or two main Fedora mirrors. We would provide a
> short Python script that parses the access logs, anonymizes the data
> (salted hash of the IP address), and keeps only the relevant entries
> (RPM files from the latest Fedora release or updates repositories).
> That would be phase one, which should give us sample data.
>
> Phase two would be to integrate this script with logrotate so that, for
> one Fedora release cycle (Fedora 19), it collects the relevant
> anonymized data into a file. The final suggested-package database would
> be created from this file (or maybe files, to allow us to move them out
> of the stat directory on the fly).
>
> The big (legal) question is whether we are able to provide this
> anonymized data to the public, or whether we want to sign an NDA with
> all people involved. I am
> CCing Tom for this question.
>
> I need your help with connecting to relevant people. Any comments are
> appreciated.
>
> Many thanks and I hope this effort will lead to improving user
> experience with Fedora packaging.
>
> --
> Later,
>
>  Lukas "lzap" Zapletal
>  irc: lzap #theforeman
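
For illustration, the phase-one script described in the quoted message
(parse, anonymize, filter) might look roughly like the sketch below. It
is not the students' script. It assumes the mirror writes Apache
"combined" style log lines, that relevant downloads live under a
/releases/ or /updates/ directory and end in .rpm, and that the salt is
a secret kept on the mirror. The names LOG_LINE, RELEVANT_PATH,
anonymize_ip and anonymize_line are all made up for this example.

```python
import hashlib
import re

# Assumption: Apache "combined" log format, e.g.
# 192.0.2.10 - - [03/May/2013:12:42:02 +0000] "GET /path HTTP/1.1" 200 123 ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)

# Assumption: relevant packages sit under a releases/ or updates/ tree.
RELEVANT_PATH = re.compile(r'/(releases|updates)/.*\.rpm$')


def anonymize_ip(ip, salt):
    """Return a salted SHA-256 digest so the raw address is never stored."""
    return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()


def anonymize_line(line, salt):
    """Return (client_id, timestamp, rpm_filename), or None if not relevant."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # not in the assumed log format
    if match.group("method") != "GET" or match.group("status") != "200":
        return None  # keep only successful downloads
    path = match.group("path").split("?", 1)[0]  # drop any query string
    if RELEVANT_PATH.search(path) is None:
        return None  # metadata, ISO images, other trees, ...
    rpm_name = path.rsplit("/", 1)[-1]
    return (anonymize_ip(match.group("ip"), salt), match.group("time"), rpm_name)
```

Feeding each log line through anonymize_line(line, salt) and writing out
the non-None results would give one record per package download, with
the same client mapping to the same opaque id as long as the salt stays
the same. Whether a salted hash is anonymous enough for public release
is exactly the legal question raised above; the sketch does not answer
it.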