On Thu, 17 Jun 2021 at 12:27, Justin Forbes <jmforbes(a)linuxtx.org> wrote:
On Wed, Jun 16, 2021 at 3:23 PM Matthew Miller <mattdm(a)fedoraproject.org> wrote:
>
> On Wed, Jun 16, 2021 at 02:57:17PM +0200, Vitaly Zaitsev via devel wrote:
> > >We'll at least gather information about capabilities of Fedora
> > >users hardware.
> > Telemetry is evil. It must not be allowed.
>
> Well, that's certainly A Position. I don't think it's anything nearly so
> absolute, though, and depends on what, who, how, why, and a host of other
> things. And "it can help us answer questions like this for our community" is
> a pretty non-evil "why".
I think there can be a lot of benefit in anonymized hardware data (not
mandatory). It does help answer questions like this, but more
importantly, it would make a lot of the kernel work a bit easier, or
at least more focused. It answers questions like, "should we enable
these drivers as they are likely to be used?" or "can we disable these
drivers because no one is using them?". It is also very helpful in
working out bug priority in drivers. A lot of people never bother
filing bugs, and are happy to keep booting a known good kernel since
we allow parallel installs. If we get a few users chiming in, and
realize the hardware in question is used by a significant chunk of
users, it would tell me that perhaps that should take priority over a
bug which impacts hardware with considerably fewer users. Yes, you
have to be extremely careful about what data you collect, and how that
data is handled, but if done correctly, there are a lot of benefits.
The major problem is that 'anonymized' data does not exist. Pretty
much every method which says it 'anonymizes' stuff does not and can
lead to a strong 'fingerprint' back to an individual or group. The
only methods which truly do seem to stop this basically add so much
random noise to the data that it is 'useless' for whatever analysis.
[AKA you might as well just tell /dev/urandom to give you a couple of
gigabytes of answers.]
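To make the fingerprinting point concrete, here is a minimal sketch. The
sample reports, the field names, and the unique_fraction helper are all made
up for illustration; they are not taken from smolt or any real collector. It
measures how many "anonymous" reports become one of a kind once a few
harmless-looking hardware fields are combined.

```python
from collections import Counter

# Hypothetical "anonymized" hardware reports: no names, no IP addresses.
reports = [
    {"cpu": "x86_64", "gpu": "intel", "wifi": "iwlwifi", "ram_gb": 8},
    {"cpu": "x86_64", "gpu": "intel", "wifi": "iwlwifi", "ram_gb": 8},
    {"cpu": "x86_64", "gpu": "amdgpu", "wifi": "ath10k", "ram_gb": 16},
    {"cpu": "x86_64", "gpu": "nvidia", "wifi": "iwlwifi", "ram_gb": 32},
    {"cpu": "aarch64", "gpu": "panfrost", "wifi": "brcmfmac", "ram_gb": 4},
]


def unique_fraction(rows, fields):
    """Fraction of rows whose combination of `fields` appears exactly once."""
    combos = Counter(tuple(row[f] for f in fields) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[f] for f in fields)] == 1)
    return unique / len(rows)


# One coarse field identifies few reports; four fields together identify most.
print(unique_fraction(reports, ["cpu"]))
print(unique_fraction(reports, ["cpu", "gpu", "wifi", "ram_gb"]))
```

On this toy data, one field alone singles out a fifth of the reports, while
the four fields together single out three fifths of them. Real reports carry
far more fields than this.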
This means that any program you use to collect information needs to
assume that it will have to be regularly purged/cleaned/etc. You will
have to only take snapshots of very high level data to compare to
other timeframes. It also has to assume that there are enough people
who do not want to be watched (even if you ask for them to volunteer
info) that they will feed you bad data. This is why we had more
PDP-11s in smolt than we had of some valid architectures we shipped.
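A rough sketch of the "high-level snapshot, then purge" approach, assuming a
made-up report format with a single "arch" field and a hypothetical snapshot
helper with an arbitrary min_count threshold. It is not how smolt or any
Fedora service works. Note that folding rare values into "other" only hides
stray bogus entries; it does nothing once enough people send the same bad
value.

```python
from collections import Counter


def snapshot(raw_reports, min_count=3):
    """Reduce raw reports to coarse per-architecture counts.

    Architectures reported fewer than `min_count` times are folded into
    "other", which hides rare (and possibly bogus) entries.
    """
    counts = Counter(r["arch"] for r in raw_reports)
    summary = Counter()
    for arch, n in counts.items():
        summary[arch if n >= min_count else "other"] += n
    return dict(summary)


# Hypothetical raw reports for one collection period.
raw = [{"arch": "x86_64"}] * 5 + [{"arch": "aarch64"}] * 3 + [{"arch": "pdp11"}]

june = snapshot(raw)
raw.clear()  # purge the raw reports; only the coarse summary is kept
print(june)
```

Only the small summary dictionary is kept for comparison against other
timeframes; the per-report data is gone after each period.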
Once you start finding these limits, you realize you don't want to mix
it with data you need for the continued operation of a service. If you do,
you will also lose that data regularly, may be told to turn it off, or
find yourself spending too many resources to keep it. This is the
reason I object to adding too much to mirrorlist data. That is a
service which we need to keep up and we need to keep some history for
general operations. [We need to know how many resources we are
servicing and how long it takes to respond. Having multiple years of
history helped show when the mirror program began to fall over from too
many customers, and that changes in certain yum cron jobs were
increasingly causing issues.]
--
Stephen J Smoogen.
I've seen things you people wouldn't believe. Flame wars in
sci.astro.orion. I have seen SPAM filters overload because of Godwin's
Law. All those moments will be lost in time... like posts on BBS...
time to reboot.