Hello everyone.
I recently learned that Fedora's DNF creates a UUID to help keep track of the number of unique Fedora users. As I understand it, before implementing this UUID mechanism, they estimated the size of their user base from IP addresses. I would appreciate it if someone could clarify these concerns of mine:
- Is the generated UUID based on the hardware configuration of the Fedora user's machine, or is it a random UUID? (If the user reinstalls Fedora, will the regenerated UUID resemble the first one in any way?)
- Will the user's UUID be sent to package mirrors each time they update or install packages? (If so, could a malicious mirror potentially map a user's UUID to all of their associated package requests?)
- Is there any way to opt out of providing data for this user-base statistical analysis?
Could someone also point to the file in the source code (https://github.com/rpm-software-management/dnf) where this UUID feature has been implemented?
On 02/05/2021 09:35, None via users wrote:
- Is there any way to opt out of providing data for this user-base statistical analysis?
Yes.
Go to /etc/yum.repos.d and remove "countme=1" from any of the *.repo files.
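As a read-only illustration of that advice, here is a small Python sketch that lists which repo sections under /etc/yum.repos.d currently enable countme, so you know which lines to remove. The helper name `repos_with_countme` is hypothetical, and treating "1", "yes", "true" and "on" as enabled is an assumption about how DNF reads boolean options; the script only parses the files and never rewrites them.

```python
import configparser
from pathlib import Path

# Values assumed to count as "enabled" for a boolean DNF option.
TRUTHY = {"1", "yes", "true", "on"}


def repos_with_countme(repo_dir="/etc/yum.repos.d"):
    """Return (file name, section) pairs whose section enables countme.

    Read-only: the files are parsed, never modified.
    """
    hits = []
    for path in sorted(Path(repo_dir).glob("*.repo")):
        # interpolation=None: repo files use $releasever-style variables, not %(...)s.
        # strict=False: tolerate duplicate sections/options instead of raising.
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        parser.read(path)
        for section in parser.sections():
            value = parser.get(section, "countme", fallback="0").strip().lower()
            if value in TRUTHY:
                hits.append((path.name, section))
    return hits


if __name__ == "__main__":
    for name, section in repos_with_countme():
        print(f"{name}: [{section}] has countme enabled")
```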
On 02/05/2021 09:35, None via users wrote:
I recently learned that Fedora's DNF creates a UUID to help keep track of the number of unique Fedora users. As I understand it, before implementing this UUID mechanism, they estimated the size of their user base from IP addresses.
And, have you read https://fedoraproject.org/wiki/Changes/DNF_Better_Counting
- And, have you read https://fedoraproject.org/wiki/Changes/DNF_Better_Counting
I went through it; it seems like there's no UUID created, just a "countme" variable. Is that right?
On 02/05/2021 10:28, ml-devel@keemail.me wrote:
- And, have you read https://fedoraproject.org/wiki/Changes/DNF_Better_Counting
I went through it; it seems like there's no UUID created, just a "countme" variable. Is that right?
Correct.
"For this reason, we don’t want to use any identifier like /etc/machine-id which may be used for other purposes — or in fact any UUID at all."
On 02/05/2021 10:28, ml-devel@keemail.me wrote:
- And, have you read https://fedoraproject.org/wiki/Changes/DNF_Better_Counting
I went through it; it seems like there's no UUID created, just a "countme" variable. Is that right?
This would also be of interest to you...
https://bugzilla.redhat.com/show_bug.cgi?id=1672504
ed.greshko@greshko.com wrote:
This would also be of interest to you...
The link was helpful. I have one more question:
- Where is this countme variable sent? Is it sent to Fedora itself (the host getfedora.org) or to the package mirrors, which are mostly hosted by third parties? (In other words, when I install a package, will a mirror receive the value of my countme variable?)
On Sun, 2 May 2021 05:12:55 +0200 (CEST) None via users users@lists.fedoraproject.org wrote:
ed.greshko@greshko.com wrote:
This would also be of interest to you...
The link was helpful. I have one more question:
- Where is this countme variable sent? Is it sent to Fedora itself (the host getfedora.org) or to the package mirrors, which are mostly hosted by third parties? (In other words, when I install a package, will a mirror receive the value of my countme variable?)
I think the answer to your question is that the variable is sent to the mirror, so yes, a mirror will receive the flag. However, they appear to have gone to great lengths to avoid leaking any identifiable information. See this link:
https://github.com/rpm-software-management/dnf/pull/1450/commits/24e6fadf032...
On Sun, May 2, 2021 at 3:15 PM stan via users users@lists.fedoraproject.org wrote:
I think the answer to your question is that the variable is sent to the mirror, so yes, a mirror will receive the flag. However, they appear to have gone to great lengths to avoid leaking any identifiable information. See this link:
https://github.com/rpm-software-management/dnf/pull/1450/commits/24e6fadf032...
That's correct. The flag is only sent with a metalink or mirrorlist request, which, in the case of Fedora, goes centrally to the mirrors.fedoraproject.org (so-called MirrorManager) server. The subsequent requests for the repodata itself, sent to a specific mirror, won't include that flag.
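To make the distinction concrete, here is a Python sketch of the rule just described: the flag is attached only to the metalink/mirrorlist request that goes to MirrorManager, never to repodata downloads from an individual mirror. The function name `request_url`, the request kinds, and the assumption that the flag travels as a `countme=<bucket>` query parameter are illustrative, not taken from the libdnf source.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Only these request kinds (sent to MirrorManager) may carry the flag.
FLAGGABLE = {"metalink", "mirrorlist"}


def request_url(url, kind, bucket=None):
    """Return the URL to fetch for a request of the given kind.

    bucket is the system-age bucket (1-4) when this request was chosen to
    carry the countme flag, otherwise None. The parameter name is assumed.
    """
    if kind not in FLAGGABLE or bucket is None:
        return url  # repodata fetched from a specific mirror never gets the flag
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("countme", str(bucket)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```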
There are two pieces of information the countme flag carries:
1) the age of the installation (one of 4 values, see the link above)
2) the fact it's one of the first N metalink requests made that week from that particular system (i.e. we randomly pick a number from 1 to N during the first request that week, and then decrement it on every subsequent request; when it hits 1, we add the countme flag). Currently, we've hardcoded N to be 4, based on some very rough estimation of how many requests there usually are on a typical (workstation) installation throughout a week, but it's quite arbitrary at the moment.
The goal of 2) was to avoid signaling "hey, look everybody, this is my first DNF metadata refresh this week", as that alone could indicate some usage patterns of that system (e.g. when it was booted up) and thus is, in a way, user-specific. So adding this little randomization component helps mitigate this. The idea was to minimize any kind of information leakage with this flag, as Stan puts it.
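The randomized countdown in 2) can be sketched in Python as follows. This is a simplified, in-memory model under stated assumptions: the class name `CountmeWindow` is hypothetical, `n` defaults to the value of 4 mentioned above, and real DNF would have to persist this state across runs and detect the start of each weekly window itself, which is not shown.

```python
import random


class CountmeWindow:
    """Decide which metalink request in one weekly window carries the flag."""

    def __init__(self, rng=None, n=4):
        self.rng = rng or random.Random()
        self.n = n  # the flag lands on one of the first n requests
        self.start_window()

    def start_window(self):
        """Reset at the beginning of a new weekly window."""
        self.counter = None  # no request seen yet this window
        self.flagged = False

    def should_flag(self):
        """Call once per metalink request; True means add the countme flag."""
        if self.flagged:
            return False  # at most one flagged request per window
        if self.counter is None:
            # First request this window: pick a number from 1 to n.
            self.counter = self.rng.randint(1, self.n)
        else:
            # Every subsequent request decrements it.
            self.counter -= 1
        if self.counter == 1:
            self.flagged = True
            return True
        return False
```

In this sketch the flagged request is equally likely to be the 1st, 2nd, 3rd or 4th of the week, so an observer cannot conclude that a flagged request was the system's first refresh. A side effect, at least here, is that a system making fewer requests than the number drawn goes uncounted that week.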
On Sun, May 2, 2021 at 3:47 PM Michal Domonkos mdomonko@redhat.com wrote:
- the age of the installation (one of 4 values, see the link above)
Oh, just a little correction - there were some later changes made to that man page (the age "buckets" in particular) that are not reflected in the linked PR, so please check out the current dnf.conf(5) man page instead.
On Sun, May 2, 2021 at 3:47 PM Michal Domonkos mdomonko@redhat.com wrote:
On Sun, May 2, 2021 at 3:15 PM stan via users users@lists.fedoraproject.org wrote:
I think the answer to your question is that the variable is sent to the mirror, so yes, a mirror will receive the flag. However, they appear to have gone to great lengths to avoid leaking any identifiable information. See this link:
https://github.com/rpm-software-management/dnf/pull/1450/commits/24e6fadf032...
That's correct. The flag is only sent with a metalink or mirrorlist request, which, in the case of Fedora, goes centrally to the mirrors.fedoraproject.org (so-called MirrorManager) server. The subsequent requests for the repodata itself, sent to a specific mirror, won't include that flag.
Another correction - turns out I didn't read Stan's reply carefully; he says it's sent to a mirror, however that's not the case :) It's just the MirrorManager instance that receives it.
On Sun, May 02, 2021 at 03:35:48AM +0200, None via users wrote:
I recently learned that Fedora's DNF creates a UUID to help keep track of the number of unique Fedora users.
It does not. I initially proposed this, similar to what openSUSE does, but the actual implementation does not use a UUID at all. You can read about the actual implementation here:
https://dnf.readthedocs.io/en/latest/conf_ref.html#options-for-both-main-and...
As I understand it, before implementing this UUID mechanism, they estimated the size of their user base from IP addresses.
That's correct.
I would appreciate it if someone could clarify these concerns of mine:
- Is the generated UUID based on the hardware configuration of the Fedora user's machine, or is it a random UUID? (If the user reinstalls Fedora, will the regenerated UUID resemble the first one in any way?)
There is no UUID; all systems of the same general age (1 week, 2-4 weeks, 5-24 weeks, > 24 weeks) with the same release, os_variant, and architecture are aggregated together.
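A small Python sketch of that aggregation, assuming the system's age is counted in whole weeks starting at 1 for the first week: `age_bucket` maps an age to the four ranges listed above (numbered 1-4 here by assumption), and `aggregate` tallies hits per (release, os_variant, arch, bucket). The `Hit` record and both function names are hypothetical; the point is that the tally keys contain nothing that distinguishes one system from another.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical shape of one flagged request as seen server-side."""
    release: str
    os_variant: str
    arch: str
    age_weeks: int  # 1 = first week of the installation


def age_bucket(age_weeks):
    """Map age in weeks to the ranges 1 week, 2-4, 5-24, >24 (returned as 1-4)."""
    if age_weeks < 1:
        raise ValueError("age_weeks starts at 1")
    if age_weeks == 1:
        return 1
    if age_weeks <= 4:
        return 2
    if age_weeks <= 24:
        return 3
    return 4


def aggregate(hits):
    """Count hits per (release, os_variant, arch, age bucket); no per-system key."""
    return Counter((h.release, h.os_variant, h.arch, age_bucket(h.age_weeks)) for h in hits)
```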
- Will the user's UUID be sent to package mirrors each time they update or install packages? (If so, could a malicious mirror potentially map a user's UUID to all of their associated package requests?)
No; there is no UUID. Additionally, the countme value is sent once per week and not with every request.
- Is there any way to opt out of providing data for this user-base statistical analysis?
Yes; disable "countme" in the DNF repo configs as documented above. I hope you won't, though, because this information is really helpful to us in planning, and as you can see is designed to be minimally invasive. The goal is to count, not track.
Could someone also point to the file in the source code (https://github.com/rpm-software-management/dnf) where this UUID feature has been implemented?
https://github.com/rpm-software-management/libdnf/pull/807
Michal Domonkos mdomonko@redhat.com wrote:
Another correction - turns out I didn't read Stan's reply carefully; he says it's sent to a mirror, however that's not the case :) It's just the MirrorManager instance that receives it.
Ah, I see. Many thanks for the clarification.
Matthew Miller mattdm@fedoraproject.org wrote:
I hope you won't, though, because this information is really helpful to us in planning, and as you can see is designed to be minimally invasive. The goal is to count, not track.
I don't plan to, since disabling it would itself open the door to fingerprinting, haha. Anyhow, your answer was very detailed. Thanks!
Another question:
Is there a way to disable non-HTTPS mirrors? I don't see any reason why your ISP should be able to see what distribution you use, and most importantly, the exact package versions you're installing/updating. (I know DNF will only install signed packages, so I don't have any concerns regarding security.)
Thanks
On 2021-05-02 8:52 a.m., None via users wrote:
Is there a way to disable non-HTTPS mirrors? I don't see any reason why your ISP should be able to see what distribution you use, and most importantly, the exact package versions you're installing/updating. (I know DNF will only install signed packages, so I don't have any concerns regarding security.)
I actually try to do the opposite, because I made a caching proxy and it can't cache HTTPS requests. However, things like COPR only offer an HTTPS option.