Improving metrics gathering

Matt Domsch matt at domsch.com
Thu Feb 4 16:53:38 UTC 2010


On Thu, Feb 04, 2010 at 10:37:14AM -0600, Bruno Wolff III wrote:
> On Thu, Feb 04, 2010 at 10:16:09 -0600,
>   Matt Domsch <matt at domsch.com> wrote:
> > 
> > The biggest concern people have with using any UUID in any form is the
> > "trackability" that comes inherent with it.  Given enough log data
> > that includes UUIDs, one could potentially use it to understand
> > something about a user that they otherwise wouldn't want you to know.
> 
> A possible concern is that if you are primarily trying to determine if
> there is more than one machine using a single IP address to get yum
> updates, do you really need to track those updates across multiple IP
> addresses? This information could be more revealing, as it can show travel
> patterns.

Yes, I suppose it could.

> If that isn't a requirement, perhaps yum could create a UUID
> per external IP address. Doing that behind NAT is a bit problematic though.

ugh.  Then yum has to keep track of the public IPs it's been behind
(meaning it needs to use something akin to whatismyip.com), and a
UUID for each?  No thanks.

> 
> > For implementation details, I suggest yum create and persist a single
> > UUID for each installed system.  This UUID would be separate from any
> > smolt UUID.  Yum would include this UUID in HTTP requests.  Yum would
> > only provide this UUID when making mirrorlist requests, not when
> > downloading content (from mirrors or other).  All yumlib-using
> > applications such as PackageKit would then inherit this capability.
> > On the back end, Fedora Infrastructure would add capability to log
> > this UUID for each request, just as it logs mirrorlist requests
> > today.  FI scripts would then use this UUID to accurately count the
> > number of installed instances over time, recognizing that systems can
> > get re-installed (and thus get new yum UUIDs), but over time can
> > provide more accurate trending than we can get today.
> 
> Are you planning on logging UUID/IP pairs, or logging IP addresses independently
> of UUIDs?

Don't know - depends on how the actual UUID is passed.  If it shows up
in the normal apache logs, they'd be in the same file, in fact on the
same line.  If it shows up some other way, it might be in a different
file.  Either way, both bits are available on the same server in the
same process.
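To make the "count installed instances over time" step concrete, here is a minimal Python sketch of what an FI counting script might look like. It assumes, hypothetically, that the UUID arrives as a `uuid=` query parameter on the mirrorlist request and therefore lands on the same Apache log line as the client IP; the parameter name and the log-line shape are illustrations, not a decided interface. Note that the counter only needs the UUID and the date, never the IP field.

```python
import re
from collections import defaultdict

# Hypothetical log shape: an Apache-style line whose request URL carries
# the yum UUID as a query parameter named "uuid" (an assumption).
LINE_RE = re.compile(
    r'\[(?P<day>\d{2})/(?P<mon>\w{3})/(?P<year>\d{4}):[^\]]*\] '
    r'"GET [^"]*[?&]uuid=(?P<uuid>[0-9a-fA-F-]{36})'
)


def count_installs_by_month(lines):
    """Return {(year, month_abbrev): number of distinct UUIDs seen}."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # request without a UUID (opted out, or an older yum)
        # Normalize case so the same UUID is not counted twice.
        seen[(m.group("year"), m.group("mon"))].add(m.group("uuid").lower())
    return {period: len(uuids) for period, uuids in seen.items()}
```

A reinstalled system gets a new UUID and is counted again, which is the trending caveat mentioned above; the counts are distinct UUIDs, not distinct machines.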

I did forget to mention that there would of course be a way to opt
out, but I'd want the default to be 'enabled'.
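For the client side of the proposal (create and persist one UUID per installed system, send it only on mirrorlist requests, and honor an opt-out that defaults to enabled), a minimal Python 3 sketch might look like the following. The file location `/var/lib/yum/uuid`, the `uuid` query-parameter name, and the `enabled` flag are all hypothetical placeholders, not existing yum configuration.

```python
import os
import uuid
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical location; not an existing yum file.
UUID_PATH = "/var/lib/yum/uuid"


def get_or_create_uuid(path=UUID_PATH):
    """Return this system's yum UUID, creating and persisting it on first use."""
    try:
        with open(path) as f:
            return str(uuid.UUID(f.read().strip()))  # validate stored value
    except (OSError, ValueError):
        value = str(uuid.uuid4())  # random, not derived from hardware
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        with open(path, "w") as f:
            f.write(value + "\n")
        return value


def mirrorlist_url(base_url, enabled=True, path=UUID_PATH):
    """Append the UUID to a mirrorlist URL unless the user opted out.

    Only mirrorlist requests go through this; content downloads from
    mirrors keep their URLs unchanged.
    """
    if not enabled:
        return base_url
    parts = urlsplit(base_url)
    extra = urlencode({"uuid": get_or_create_uuid(path)})
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit(parts._replace(query=query))
```

Using a random version-4 UUID rather than anything hardware-derived keeps the identifier separate from smolt's and means a reinstall naturally produces a fresh one.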

Thanks,
Matt


More information about the advisory-board mailing list