On 1/8/19 4:22 PM, Lennart Poettering wrote:
If all you want to do is count, then it should be entirely
to do it like this:
the first time within each one-week window and a simple
GET /metalink?repo=fedora-28&arch=x86_64&edition=<blah> HTTP/1.1
all other times.
As an additional improvement, is it really necessary to count every machine?
We can subsample a lot, and let only some specific machines show
up for counting.
That is, apply the logic above only if (hash(machine_id) % 1000 == 0)
(this becomes a poll instead of a referendum; the results must then be multiplied by 1000).
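
A minimal Python sketch of this sampling rule, under stated assumptions: the email does not say which hash to use, so SHA-256 is picked here only because it is deterministic across runs (Python's built-in hash() is salted per process for strings). The names stable_hash, is_sampled and estimate_total are hypothetical.

```python
import hashlib

SAMPLE_RATE = 1000  # the tunable constant from the rule above


def stable_hash(value: str) -> int:
    """Deterministic integer hash of a string (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def is_sampled(machine_id: str) -> bool:
    """True for roughly one machine in SAMPLE_RATE: hash(machine_id) % 1000 == 0."""
    return stable_hash(machine_id) % SAMPLE_RATE == 0


def estimate_total(counted: int) -> int:
    """Scale the number of sampled reports back up to an estimate of the total."""
    return counted * SAMPLE_RATE
```

With this scaling, 1,000 sampled reports would correspond to an estimate of about 1,000,000 machines.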
Or, to avoid having some machines constantly counted and others constantly ignored,
the rule could be if (hash(machine_id) % 1000 == hash(weekofthecentury) % 1000)
With this setup I know that in about 99.9% of the weeks I'm not reporting anything at all.
Of course 1000 is a constant that may be tuned, but it looks like a good choice
to me if the expected total number is on the order of 1 million.
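
The rotating rule could be sketched in Python as follows. This is an illustration only: the email does not define weekofthecentury, so it is taken here to mean whole weeks elapsed since 2000-01-01, and SHA-256 again stands in for the unspecified hash. The names stable_hash, week_of_century and is_counted_this_week are hypothetical.

```python
import datetime
import hashlib

SAMPLE_RATE = 1000  # tunable, as noted above
EPOCH = datetime.date(2000, 1, 1)  # assumed start of "the century"


def stable_hash(value: str) -> int:
    """Deterministic integer hash of a string (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def week_of_century(day: datetime.date) -> int:
    """Whole weeks elapsed since EPOCH."""
    return (day - EPOCH).days // 7


def is_counted_this_week(machine_id: str, day: datetime.date) -> bool:
    """hash(machine_id) % 1000 == hash(weekofthecentury) % 1000"""
    machine_bucket = stable_hash(machine_id) % SAMPLE_RATE
    week_bucket = stable_hash(str(week_of_century(day))) % SAMPLE_RATE
    return machine_bucket == week_bucket
```

Because the week bucket changes from week to week, the set of reporting machines rotates, while every day inside the same week gives the same answer for a given machine.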
Roberto Ragusa mail at robertoragusa.it