[announce] yum: parallel downloading

Roberto Ragusa mail at robertoragusa.it
Sun May 20 16:20:17 UTC 2012


On 05/20/2012 06:10 AM, Glen Turner wrote:
> On 19/05/12 01:04, José Matos wrote:
> 
>> The total number of connections should be the same, as far as I
>> understand only the number of connections from a single host will be three.
> 
> The risk is the rise in the maximum number of concurrent connections. A
> server happily supplying 50,000 concurrent connections should not be
> assumed to remain happy at 150,000 concurrent connections.

Why do you think that there will be 150,000 concurrent connections?
The difference could be that instead of
- 50,000 concurrent users, each downloading one file
you have
- 16,667 concurrent users, each downloading 3 files
The number of concurrent users is now lower because, well, each of them
now completes a "yum update" in one third of the time.
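
The arithmetic above can be sketched as a small Little's-law style estimate. This is only an illustration: the function name, the arrival rate of 50 users per second and the 1000-second update time are made-up values chosen to reproduce the 50,000 figure, and the model assumes the idealised case where three parallel downloads cut the update time to exactly one third.

```python
def concurrent_load(arrivals_per_sec, update_seconds, parallel):
    """Return (concurrent users, concurrent connections).

    Hypothetical model: users arrive at a steady rate, and running
    `parallel` downloads at once divides the update time by `parallel`.
    """
    users = arrivals_per_sec * update_seconds / parallel
    return users, users * parallel

# Illustrative numbers only: 50 users/s, 1000 s per serial "yum update".
users_1, conns_1 = concurrent_load(50.0, 1000.0, 1)
users_3, conns_3 = concurrent_load(50.0, 1000.0, 3)

print(round(users_1), round(conns_1))
print(round(users_3), round(conns_3))
```

Under these assumptions the number of concurrent users drops to about a third while the number of concurrent connections stays the same. If users or the server are bandwidth limited, the speed-up is smaller and the result shifts accordingly.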

Reality could be different for several reasons (are users bandwidth limited?
is the server bandwidth limited?), but the concept is fine and it has
been perfectly expressed by José.

>> Since it should be safe to assume that the downloads are independent
>> events then there should not be any significant difference for busy
>> servers. :-)
> 
> I am afraid that I have missed your point here. I am somewhat blinded by
> the use of the word "independent". I have a statistical background and
> that word carries a meaning similar to "unrelated".

50,000 connections from different users are independent.
50,000 connections from 16,667 users, each opening 3 connections, are
almost as independent as before.
Statistically, consider a random variable which is 0 (not downloading)
or 1 (downloading).
Compare:
  sum of N independent variables
to:
  three times the sum of N/3 independent variables
If N>>3, not only is the average the same, but the spread grows only by a
factor of sqrt(3), which stays small relative to the average. It is reasonable
to say that the probability distribution is practically the same.
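
The comparison above can be written out as a short derivation. It is a sketch under the assumption that each variable X_i is an independent Bernoulli variable with the same probability p of being 1; the names S_1 and S_3 are introduced here only for illustration.

```latex
\begin{align*}
S_1 &= \sum_{i=1}^{N} X_i,
  & E[S_1] &= Np,
  & \mathrm{Var}(S_1) &= Np(1-p) \\
S_3 &= 3\sum_{i=1}^{N/3} X_i,
  & E[S_3] &= 3\cdot\tfrac{N}{3}\,p = Np,
  & \mathrm{Var}(S_3) &= 9\cdot\tfrac{N}{3}\,p(1-p) = 3Np(1-p)
\end{align*}

\[
\frac{\sigma_{S_3}}{\sigma_{S_1}} = \sqrt{3},
\qquad
\frac{\sigma_{S_3}}{E[S_3]} = \sqrt{\frac{3(1-p)}{Np}}
\]
```

The averages match exactly, and the relative spread of S_3 shrinks as N grows, so for large N both distributions are concentrated tightly around the same average.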

An example:

- is_downloading_one_file probability p=0.01
- number of users N=1,000,000

--> concurrent downloads: average=10,000 (sigma=~100)

vs

- is_downloading_three_files probability p=(0.01/3)
- number of users N=1,000,000

--> concurrent downloads: average=10,000 (sigma=~170)
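
The two sigma values can be checked with a few lines of Python. This is a minimal sketch, assuming each user is independently active with the stated probability and holds a fixed number of connections while active; the function name and parameters are made up for this illustration.

```python
import math

def concurrent_stats(n_users, p_active, files_per_user):
    """Mean and standard deviation of the number of concurrent downloads.

    Assumes each user is independently active with probability p_active
    and, while active, downloads files_per_user files at once.
    """
    mean = files_per_user * n_users * p_active
    variance = files_per_user ** 2 * n_users * p_active * (1 - p_active)
    return mean, math.sqrt(variance)

N = 1_000_000
mean_1, sigma_1 = concurrent_stats(N, 0.01, 1)       # one file at a time
mean_3, sigma_3 = concurrent_stats(N, 0.01 / 3, 3)   # three files at a time

print(f"one file:    average={mean_1:.0f} sigma={sigma_1:.1f}")
print(f"three files: average={mean_3:.0f} sigma={sigma_3:.1f}")
```

Both cases give an average of 10,000; the sigma comes out at roughly 99.5 and 172.9, in line with the rounded figures of ~100 and ~170 above.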

-- 
   Roberto Ragusa    mail at robertoragusa.it

