<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On 7 June 2013 13:48, Matthew Miller <span dir="ltr">&lt;<a href="mailto:mattdm@fedoraproject.org" target="_blank">mattdm@fedoraproject.org</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Fri, Jun 07, 2013 at 01:31:36PM -0600, Stephen John Smoogen wrote:<br>

&gt; The easiest way I could see is just get a better sampling method which<br>

&gt; would be to have funding for a mirror which we then put into mirror-manager<br>

&gt; and we know that this is a sampling versus a request info. (basically we<br>

&gt; would see what packages are downloaded directly and then extend that sample<br>

&gt; from the amount of downloads to the 500,000 systems that check in via<br>

&gt; mirrormanager). The problems involved are paying for systems, storage, and<br>

&gt; bandwidth for such items.<br>

<br>

</div>Maybe one of the mirrors would be able to provide logs?<br><span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div><div style>Possibly. In the past, mirror admins have not wanted to do so for many reasons: they can&#39;t keep logs longer than 24 hours for policy reasons; they can&#39;t hand over logs without a formal agreement, and then only with as much redacted as possible; or if they do it for X then they have to do it for everyone, so no thank you. (When I was at my university gig, the request had to go up 4 levels of management before I gave up at the sub-CIO level.) </div>

<div style><br></div><div style>I have tried looking at the top-level mirrors, but most of the data is swamped by other sites mirroring and by lots of people doing development work and pointing at the repos directly. This led to some strange statistics: even after I pulled out most of the noise, various packages &quot;stood out&quot; until I realized they were being pulled in for cross-compiles and such (or by the site that likes to do partial mirrors every couple of hours but always pulls in the same 4 packages each time, even when it pulls in others). I expect other mirrors are going to run into the same thing. That means the data a lot of sites could give out (just the URLs requested per day, rather than IP address plus URL) would have a lot of weird noise, which makes, say, zvbi show up high because it is both getting mirrored as the last package on the server and because 8 packages depend on it (not true, but I can&#39;t remember the package that showed up a ton).</div>
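<div style><br></div><div style>As a rough illustration of the kind of cleaning described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the log is taken to be a list of (client IP, URL) pairs for one day, the function names are made up, and the cutoff of 4 distinct packages for treating a client as a mirror is arbitrary. It counts each client at most once per package and skips clients that look like mirrors.</div>

<pre>
```python
from collections import defaultdict

# Hypothetical cutoff: a client that fetches at least this many distinct
# packages in one day is assumed to be a mirror, not an end-user system.
MIRROR_CUTOFF = 4


def package_name(url):
    """Return the file name at the end of a download URL."""
    return url.rsplit("/", 1)[-1]


def clean_counts(requests, mirror_cutoff=MIRROR_CUTOFF):
    """Count distinct non-mirror clients per package.

    requests is an iterable of (client_ip, url) pairs for one day.
    """
    # Group the distinct packages requested by each client.
    packages_by_client = defaultdict(set)
    for client_ip, url in requests:
        packages_by_client[client_ip].add(package_name(url))

    counts = defaultdict(int)
    for packages in packages_by_client.values():
        if len(packages) >= mirror_cutoff:
            continue  # looks like a mirror sync, so leave it out
        for package in packages:
            counts[package] += 1  # one count per client per package
    return dict(counts)
```
</pre>

<div style>Note that this only works when the IP address is kept alongside the URL. With per-day URL lists alone there is no way to group requests by client, so the mirror traffic cannot be separated out after the fact.</div>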

<div style><br></div><div style>In either case, this is what made me realize that a mirror of our own is needed to get better statistics of this sort, because the raw data can then be cleaned as needed instead of arriving pre-cleaned and having to be reanimated. </div>

<div style><br></div></div><div><br></div>-- <br><div dir="ltr">Stephen J Smoogen.<br><br></div>

</div></div>