EPEL Django deprecated

Karsten 'quaid' Wade kwade at redhat.com
Sat Jun 8 17:27:20 UTC 2013


On 06/08/2013 09:23 AM, Stephen John Smoogen wrote:
> On 7 June 2013 19:21, Karsten 'quaid' Wade <kwade at redhat.com> wrote:
> 
>>
>> A tracking-mirror could go something like this:
>>
>> * Logs are rotated out to the trash regularly, e.g. every 24 hours.[1]
>> * Data is gathered from logs in real time in an anonymous fashion, so
>> nothing non-anonymous is inserted into the database. No connection is
>> retained between the data in the database and the logs not yet thrown away.
>>
> 
> I have been trying to come up with a better way of saying the following but
> haven't been able to.

OK, I get what you are saying, you make good sense.

Let me go back a few steps to see if I'm trying to solve a problem that
doesn't need solving.

We as sysadmins know that the Internet is not designed to be an
anonymous place. People may not think about it much, but their daily
journeys across the Internet are easily tracked back to them. We can
call that a not-well-known fact.

So in thinking about that fact, what I said (and you tore down) doesn't
really make sense - it's trying to anonymize information that people
aren't intending to be anonymous by the simple fact they are connected
to the public Internet. Even if they aren't aware of how easy it is to
backtrack on IP connections made, that ease is the nature of the network.

Privacy policies then are just ways of saying what one is or is not
going to do with collected non-anonymous data. Perhaps we just have a
robust, clear, and well-known privacy policy?

One aspect of anonymity we can't easily ignore is the spectre of a court
order coming to open up data protected by that privacy policy. Once we
have collected and retained data, our responsibilities around that data
seem to go up greatly. Therefore there is a temptation from a certain
mindset to retain nothing. What's the best compromise?

In terms of the goal of collecting data to help EPEL, I presume anything
we did with analysis would want to include making available the analyzed
dataset. Is that possible to do while protecting privacy?

Maybe privacy is the goal more than anonymity? And can we make datasets
available by obfuscating certain details to protect privacy? Maybe there
is an "anonymous enough" position we can take?
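
As a rough illustration of what "obfuscating certain details" might look
like, here is a minimal Python sketch. It assumes a log record boils down to
a client address, a timestamp, and a requested path. The function names, the
/24 and /48 prefix lengths, and the record layout are all hypothetical
choices for illustration, not anything EPEL or Fedora Infrastructure uses.
Coarsening like this reduces detail but does not make the data truly
anonymous.

```python
import ipaddress
from datetime import datetime


def coarsen_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Replace a client address with the network it belongs to.

    The prefix lengths are illustrative assumptions, not a recommendation.
    """
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network)


def coarsen_timestamp(ts: datetime) -> str:
    """Keep only the calendar day, dropping the time of the request."""
    return ts.strftime("%Y-%m-%d")


def obfuscate_record(addr: str, ts: datetime, path: str) -> dict:
    """Build a reduced-detail record (hypothetical layout) for publication."""
    return {
        "network": coarsen_ip(addr),
        "day": coarsen_timestamp(ts),
        "path": path,
    }


if __name__ == "__main__":
    # 192.0.2.0/24 is a documentation-only address range.
    print(obfuscate_record("192.0.2.57",
                           datetime(2013, 6, 8, 17, 27, 20),
                           "/pub/epel/6/x86_64/"))
```

A published dataset built this way could still be re-identified in some
cases, for example when a network has a single user, so it would sit at best
in "anonymous enough" territory rather than being a promise of anonymity.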

- Karsten

> Please do not use the term "anonymous data". Trying to make data truly
> anonymous takes a LOT of work with nebulous gain. You have to do more than
> just swap out IP addresses for something else. You have to remove
> timestamps, shuffle data around, drop some data and duplicate other data,
> and do all kinds of other things which, done wrong, can either fail to
> anonymize the data or make it worthless for determining what is going on
> in it. PhDs come up with new methods all the time that fall apart in
> reality because of some assumption that was forgotten.
> 
> We cannot promise anonymity, and trying to is not something that I could
> see happening in a volunteer organization.
> 
> Two, throwing away logs gets you into trouble because the first thing you
> find is that you have a new question but you can't answer it with your old
> data because you weren't logging it. At which point you need 6 months of
> new data before you can answer that question. Plus, logs are useful when you
> run into other issues like "Hey look, someone broke into the system, how did
> they do that?" Cross-referencing http/ftp/rsync logs against the break-in
> usually shows where the attacker was really starting from, which can help
> others. I would say that any logs we keep are kept for X time, where X is
> longer than 6 months and less than 2 years.
> 
> If a mirror is set up, it is set up. Data is collected and stored and
> analyzed following the laws and rules of conduct that are set up for the
> people who can view and analyze that data. What is published from that
> follows those laws and rules of conduct also. Going beyond that without a
> staff of trained and knowledgeable statisticians who have done this sort of
> thing before is a recipe for disaster.
> 
> 
> 
> _______________________________________________
> epel-devel mailing list
> epel-devel at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/epel-devel
> 


-- 
Karsten 'quaid' Wade
http://TheOpenSourceWay.org  .^\  http://community.redhat.com
@quaid (identi.ca/twitter/IRC)  \v'  gpg: AD0E0C41
