On Sun, Jan 13, 2019 at 11:54 AM Stephen John Smoogen <smooge(a)gmail.com> wrote:
On Sat, 12 Jan 2019 at 22:25, Nico Kadel-Garcia <nkadel(a)gmail.com> wrote:
> On Fri, Jan 11, 2019 at 4:37 PM Roberto Ragusa <mail(a)robertoragusa.it> wrote:
> > On 1/8/19 4:22 PM, Lennart Poettering wrote:
> > > If all you want to do is count, then it should be entirely sufficient
> > > to do it like this:
> > >
> > > GET
> > >
> > > the first time within each one-week window and a simple
> > >
> > > GET /metalink?repo=fedora-28&arch=x86_64&edition=<blah>
> > >
> > > all other times.
> > As an additional improvement, is it really necessary to count every machine?
> > We can subsample a lot, and only let some specific machines show
> > up for counting.
> The difficulty is not the counting. Requiring safe counting and
> aggregation by the server is a requirement that no server or
> intermediate server or proxy needs to follow, and would require
> configuration or filtering control of a server that is outside of
> client hands. It's not legally or technologically mandated. The great
> use for the data is tracking hosts, metadata that is saleable and
> likely to help provide a new form of tracking information.
> Writing this into the dnf behavior is typical, but it's not
> beneficial to the clients. It's beneficial to the mirrors, who are
> likely to sell the data. While it may be that infamous problem, a
> "Simple Matter Of Programming(tm)" to sanitize the data, there are
> strong motivations to collect it and sell it, and I'd expect various
> mirrors to start doing so within moments of the activation of the
1. The mirrors do not see this.
If it's not available to the mirrors, then anyone who hardcodes a
mirror's URL into the local "baseurl" settings is not going to be
counted this way, and we're back at the "we don't know how many
clients there are" problem. If only the "mirrorlist" hosts see the
UUID, "countme" or any other identical client ID.
2. We aren't talking about UUIDs anymore, just a countme
variable being sent periodically. If a countme is going to be too much
data to send, then clients are probably sending way too much data already.
Then can we change the title of the thread?
If the "countme" variable is unique and sent only to the host
providing the mirrorlist, it's tracking data. That host becomes
responsible for anonymization, and it is *too late* unless the data is
encrypted at the client, say with the GPG key of the relevant
repository, and that starts requiring GPG private keys on the host
providing the mirrorlist. If it's going across the wire, even with
SSL, man-in-the-middle is an old, old problem.
Whether or not the mirrorlist back end software is promised to be sanitized,
it's tracking data. Sadly, I've been through this in other venues. The
data was considered "safe" because it was "anonymized". Except that the
original web traffic was tappable, along with IP addresses and unique
client information. A subpoena, a Patriot Act request, or even a
foreign worker with an H1-B visa reporting back to foreign
intelligence or a technology competitor could obtain a great deal of
Am I paranoid? Yes. Am I paranoid *enough*? I'm not so sure. We've
seen assembly of pseudonymous data and metadata throughout the history
of intelligence work. Demanding it, and handling it safely, is often
an exercise in people claiming "no one would do that!", "no one would
bother to investigate that", and people misusing it as a matter of
course. I'd suggest it's not even worth the effort to demand or to
collect with such concerns.
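For reference, the weekly-window scheme quoted at the top of this message can be sketched roughly as below. This is only an illustration, not dnf's actual code: the first request line in the quoted proposal is truncated in this archive, so the `countme=1` parameter, the function names, and the fixed Unix-epoch week boundaries are all assumptions, and the `edition` parameter from the quote is left out for brevity.

```python
from urllib.parse import urlencode

WEEK_SECONDS = 7 * 24 * 60 * 60


def week_window(now):
    """Index of the fixed one-week window containing `now` (Unix seconds)."""
    return int(now // WEEK_SECONDS)


def build_metalink_query(repo, arch, last_counted_window, now):
    """Return (request_path, updated_last_counted_window).

    The counting flag is attached only on the first request made inside
    each one-week window; every other request omits it.
    """
    params = {"repo": repo, "arch": arch}
    window = week_window(now)
    if last_counted_window is None or window != last_counted_window:
        # Hypothetical parameter name; the flag itself carries no client ID.
        params["countme"] = "1"
        last_counted_window = window
    return "/metalink?" + urlencode(params), last_counted_window
```

Even in this sketch the flag is a bare constant, but the request carrying it still arrives with the client's IP address and whatever else the HTTP layer exposes, which is the part the concerns above are about.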
Nico Kadel-Garcia <nkadel(a)gmail.com>