EPEL Django deprecated

Stephen John Smoogen smooge at gmail.com
Sat Jun 8 16:23:45 UTC 2013


On 7 June 2013 19:21, Karsten 'quaid' Wade <kwade at redhat.com> wrote:

>
> A tracking-mirror could go something like this:
>
> * Logs are rotated out to the trash regularly, e.g. 24 hours.[1]
> * Data is gathered from logs in real time in an anonymous fashion, so
> nothing non-anonymous is inserted in to the database. No connection is
> retained between the data in the database and the logs not yet thrown away.
>

I have been trying to come up with a better way of saying the following but
haven't been able to.

Please do not use the term "anonymous data". Making data truly anonymous
takes a LOT of work for nebulous gain. You have to do more than just swap
IP addresses for something else. You have to remove timestamps, shuffle
data around, drop some records and duplicate others, and do all kinds of
other things which, done wrong, either fail to anonymize the data or make
it worthless for working out what is going on in it. PhDs come up with new
methods all the time that fall apart in reality because of some assumption
that was forgotten.
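
To illustrate why swapping out IP addresses alone is not enough, here is a
rough sketch. Everything in it is made up for illustration: the log format,
the "u1"/"u2" tokens, the paths, and the link_client helper are assumptions,
not anything an actual mirror produces. It shows that anyone who knows the
time and path of a single request can pull out everything else that client
did, because the timestamps were left in place.

```python
# Hypothetical "anonymized" log: IP addresses were replaced with opaque
# tokens, but timestamps and paths were left untouched.
pseudonymized_log = [
    {"client": "u1", "time": "2013-06-07T19:21:03", "path": "/epel/6/repodata/repomd.xml"},
    {"client": "u2", "time": "2013-06-07T19:21:04", "path": "/epel/5/repodata/repomd.xml"},
    {"client": "u1", "time": "2013-06-07T19:21:09", "path": "/epel/6/example-1.0.rpm"},
    {"client": "u2", "time": "2013-06-07T19:22:30", "path": "/epel/5/other-2.0.rpm"},
]


def link_client(log, known_time, known_path):
    """Return every record made by the client(s) behind one known request."""
    # Step 1: find which token(s) made the request the observer knows about.
    tokens = {
        rec["client"]
        for rec in log
        if rec["time"] == known_time and rec["path"] == known_path
    }
    # Step 2: the token now acts as an identifier for all their other requests.
    return [rec for rec in log if rec["client"] in tokens]


# An observer who saw one request on the wire recovers the full history.
history = link_client(
    pseudonymized_log, "2013-06-07T19:21:03", "/epel/6/repodata/repomd.xml"
)
```

Removing or coarsening the timestamps closes this particular hole, but it
also removes a lot of what made the data useful in the first place.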

We cannot promise anonymity, and trying to do so is not something that I
could see happening in a volunteer organization.

Second, throwing away logs gets you into trouble, because the first thing
you find is that you have a new question but you can't answer it with your
old data because you weren't logging it. At that point you need 6 months of
new data before you can answer that question. Plus, logs are useful when you
run into other issues like "Hey look, someone broke into the system, how did
they do that?" Cross-referencing http/ftp/rsync logs against the break-in
usually shows where the attacker was really starting from, which can help
others. I would say that any logs we keep are kept for X time, where X is
longer than 6 months and less than 2 years.
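
As a minimal sketch of that retention rule, assuming "6 months" means about
183 days and "2 years" means 730 days (the names make_policy and expired are
hypothetical, and the 365-day value is only an example of an X in range):

```python
from datetime import datetime, timedelta

# Assumed bounds for X: longer than 6 months, less than 2 years.
MIN_RETENTION = timedelta(days=183)
MAX_RETENTION = timedelta(days=730)


def make_policy(retention_days):
    """Return the retention period, rejecting values outside the bounds."""
    retention = timedelta(days=retention_days)
    if not (MIN_RETENTION < retention < MAX_RETENTION):
        raise ValueError("retention must be between 6 months and 2 years")
    return retention


def expired(log_date, now, retention):
    """True if a log from log_date is older than the retention period."""
    return now - log_date > retention


retention = make_policy(365)  # example choice of X, not a recommendation
now = datetime(2013, 6, 8)
old_log_expired = expired(datetime(2012, 1, 1), now, retention)
recent_log_expired = expired(datetime(2013, 1, 1), now, retention)
```

A 24-hour rotation like the one proposed above would be rejected by
make_policy, which is the point: the floor matters as much as the ceiling.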

If a mirror is set up, it is set up. Data is collected and stored and
analyzed following the laws and rules of conduct that are set up for the
people who can view and analyze that data. What is published from that
follows those laws and rules of conduct also. Going beyond that without a
staff of trained and knowledgeable statisticians who have done this sort of
thing before is a recipe for disaster.

-- 
Stephen J Smoogen.