On Sun, 13 Jan 2019 at 12:47, Nico Kadel-Garcia <nkadel@gmail.com> wrote:
On Sun, Jan 13, 2019 at 11:54 AM Stephen John Smoogen <smooge@gmail.com> wrote:
> On Sat, 12 Jan 2019 at 22:25, Nico Kadel-Garcia <nkadel@gmail.com> wrote:
>> On Fri, Jan 11, 2019 at 4:37 PM Roberto Ragusa <mail@robertoragusa.it> wrote:
>> >
>> > On 1/8/19 4:22 PM, Lennart Poettering wrote:
>> >
>> > > If all you want to do is count, then it should be entirely sufficient
>> > > to do it like this:
>> > >
>> > >    GET /metalink?repo=fedora-28&arch=x86_64&edition=<blah>&countme=1 HTTP/1.1
>> > >
>> > > the first time within each one-week window and a simple
>> > >
>> > >    GET /metalink?repo=fedora-28&arch=x86_64&edition=<blah> HTTP/1.1
>> > >
>> > > all other times.
>> >
>> > As an additional improvement, is it really needed to count every machine?
>> > We can subsample a lot, and only let some specific machines
>> > show up for counting.
>> The difficulty is not the counting. Requiring safe counting and
>> aggregation by the server is a requirement that no server or
>> intermediate server or proxy needs to follow, and would require
>> configuration or filtering control of a server that is outside of
>> client hands. It's not legally or technologically mandated. The great
>> use for the data is tracking hosts, metadata that is saleable and
>> likely to help provide a new form of tracking information.
>> Writing this into the dnf behavior is typical, but it's not
>> beneficial to the clients. It's beneficial to the mirrors, who are
>> likely to sell the data. While it may be that infamous problem, a
>> "Simple Matter Of Programming(tm)" to sanitize the data, there are
>> strong motivations to collect it and sell it, and I'd expect various
>> mirrors to start doing so within moments of the activation of the
>> feature.
> 1. The mirrors do not see this.

If it's not available to the mirrors, then anyone who hardcodes a
mirror's URL into the local "baseurl" settings is not going to be
counted this way, and we're back at the "we don't know how many
clients there are" problem. The same holds if only the "mirrorlist"
hosts see the UUID, "countme", or any other identical client ID.

Since you seem to have avoided reading the emails where this was detailed, here is the simplest version of the countme proposal. [Please see Lennart's email and the replies for the full version.]

Once per time period (day, week, or month), an update would just add countme=1 to its metalink request.

There is no more client ID. There is no data other than that. We would just count all the countme=1 requests and get an idea of what was going on. It isn't an exact number, but it puts some amount of solidity into the fuzzy cloud. The more complicated version that mattdm wants has the countme value increase with each week after install. Nothing else. No data from /etc/machine-id, no data from /var/yum/uuid, etc.
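For anyone who wants to see how little the client has to do, here is a rough Python sketch of the simple version Lennart described: add countme=1 to the first metalink request in each one-week window and send the plain request otherwise. The function names, the choice of weekly windows aligned to the Unix epoch in UTC, and the idea that the client keeps a "last_counted" timestamp locally are assumptions for illustration, not dnf's actual code. The edition parameter is left out because its value was elided in the example above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple
from urllib.parse import urlencode

# Assumption (hypothetical): one-week windows aligned to the Unix epoch, UTC.
WINDOW = timedelta(weeks=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_index(t: datetime) -> int:
    """Return the number of the one-week window that timestamp t falls into."""
    return (t - EPOCH) // WINDOW


def build_metalink_path(repo: str, arch: str,
                        last_counted: Optional[datetime],
                        now: datetime) -> Tuple[str, bool]:
    """Build a metalink request path (hypothetical helper).

    countme=1 is added only if no counted request has been sent in the
    current window. Returns (path, counted); the caller would store `now`
    as last_counted whenever counted is True. No client ID is sent.
    """
    params = {"repo": repo, "arch": arch}
    counted = last_counted is None or window_index(last_counted) != window_index(now)
    if counted:
        params["countme"] = "1"
    return "/metalink?" + urlencode(params), counted
```

A first request yields /metalink?repo=fedora-28&arch=x86_64&countme=1; any further request in the same window drops countme, so the server only ever sees a tally of countme=1 hits per window.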

> 2. We aren't talking about UUIDs anymore, just a countme variable being sent periodically. If a countme is going to be too much data to send, then clients are probably sending way too much data already.

Then can we change the title of the thread?

Nico, you know this better than me. This is email, not a forum. People can rename threads, but depending on the email software it will just look like a completely different thread. I think a rename has been done, but people keep responding on this thread.

Stephen J Smoogen.