Reliability of Fedora infrastructure to download cloud images

Attila Fazekas afazekas at redhat.com
Mon Jun 23 08:56:58 UTC 2014





----- Original Message -----
> From: "Kashyap Chamarthy" <kchamart at redhat.com>
> To: "Kevin Fenzi" <kevin at scrye.com>
> Cc: infrastructure at lists.fedoraproject.org, mattdm at fedoraproject.org, afazekas at redhat.com
> Sent: Friday, June 20, 2014 5:55:17 AM
> Subject: Re: Reliability of Fedora infrastructure to download cloud images
> 
> On Thu, Jun 19, 2014 at 09:20:14AM -0600, Kevin Fenzi wrote:
> > On Thu, 19 Jun 2014 00:24:55 +0530
> > Kashyap Chamarthy <kchamart at redhat.com> wrote:
> > 
> > > [I'm not subscribed to this list, please keep me in CC.]
> > > 
> > > Heya,
> > > 
> > > > A little while ago, we (Matthew Miller, myself, Attila Fazekas
> > > > (upstream OpenStack developer)) had an IRC discussion (on
> > > #openstack-qa, Freenode) with OpenStack upstream CI infrastructure
> > > folks about their concerns for continuing to have Fedora as a default
> > > to run as CI voting guest (Nova instance). They (mostly Sean Dague -
> > > a major upstream OpenStack contributor who voiced these) outlined a
> > > few issues:
> > 
> > I'm not familiar with the terminology, what does a 'voting guest' mean?
> 
> Sorry for being unclear. It means any proposed OpenStack change/patch
> has to be executed on a Fedora virtual machine too; only once it passes
> the tests on Fedora will the patch be merged to upstream git. I cc'd
> Attila, he can correct me if I said something wrong.
> 
If the job is voting in the gate pipeline,
it can prevent incompatible changes from being merged.
> > 
> > >   1. It's not possible to download from the fedora infrastructure
> > >      reliably - 10% failure rate from their cloud providers (HP and
> > >      RAX).
> > >       - About this point, when mattdm inquired - "is the failure in
> > >         hitting the fedora mirrors or fedora core infrastructure?",
> > >         their response - "I don't fully know, I think going through
> > >         the url we are using we get bounced to mirrors".
> > 
> > Yeah, more data would be very nice here... what url(s) they are using,
> > what error codes if any they get back?

I saw the image download failure at least once,
but I could not find a pattern in the failures :(.
IMHO it was less than a 10% failure rate,
but OpenStack infra/QA notices issues above a 0.1% failure rate.

If I or anyone else sees the failure pattern again, they can add a query
to http://status.openstack.org/elastic-recheck/.
Then we would know exactly how often the issue happens.

Anyone who signs the OpenStack contributor agreement
can propose queries to the repo:
https://github.com/openstack-infra/elastic-recheck/tree/master/queries

Here are the image download URLs:
https://github.com/openstack-dev/devstack/blob/master/stackrc#L357

> 
> Looking at the script[1] that creates the CI VM, it uses this URL --
> https://dl.fedoraproject.org/pub/fedora/linux/releases/20/Images/x86_64/Fedora-x86_64-20-20131211.1-sda.qcow2
> 
> 
>   [1] https://github.com/openstack-dev/devstack/blob/master/stackrc#L353
> 
>  
> > Are these the released cloud images? f19/20? Or nightlies or ?
> 
> Released, official images.
> 
> > How often do they download? Once an image is loaded, I am not sure why
> > they would re-download it unless it's changed?
> 
> I just confirmed, they (CI infra) download and cache it. But, once every
> 24 hours, they rebuild the caches. It's the humans that download it
> manually (without any caching environment) that face the bottlenecks
> they say.
>
AFAIK every worker node downloads the L2 images once in its lifetime;
I do not know the average lifetime of these VMs.
An L2 image version switch can lead to ~500 image downloads in 1 hour.
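
To put those numbers side by side, here is a rough back-of-the-envelope
sketch. It assumes every download fails independently with the same
probability, which may well not hold in practice. The function names are
made up for illustration; the ~500 downloads and the 10% / 0.1% rates are
just the figures mentioned in this thread, not measurements.

```python
def expected_failures(downloads: int, failure_rate: float) -> float:
    """Expected number of failed downloads (assumes independent failures)."""
    return downloads * failure_rate


def prob_at_least_one_failure(downloads: int, failure_rate: float) -> float:
    """Probability that at least one of the downloads fails."""
    return 1.0 - (1.0 - failure_rate) ** downloads


if __name__ == "__main__":
    downloads = 500  # rough figure for one L2 image version switch
    for rate in (0.10, 0.001):  # the 10% and 0.1% rates discussed above
        print(
            f"rate={rate:.3f}: "
            f"expected failures={expected_failures(downloads, rate):.1f}, "
            f"P(at least one failure)="
            f"{prob_at_least_one_failure(downloads, rate):.3f}"
        )
```

Under these assumptions, even a failure rate near the 0.1% threshold would
show up fairly often across a burst of that size.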
 
> > Or unless they are
> > grabbing nightly rawhide images?
> 
> They would prefer not to do this, as only distribution-tested images
> will be used in the OpenStack CI environment.
> 
> > >   2. There are possibly issues with the normal upstream fedora image
> > >      that could be fixed with custom respin.
> > >       - NOTE: I'm doubtful of this idea, as the existing Fedora cloud
> > >         images themselves are not really extensively tested. I'd
> > >         rather focus on _official_ cloud images and on having a solid
> > >         set of tests, so that they can be consumed by cloud projects
> > >         (OpenStack, etc).
> > > 
> > >       - Having a custom respin means that we're off the main path for
> > >         testing of the image -- which again needs _some_ level of
> > >         assurance that it can be used in a higher-level cloud
> > >         project's CI infra.
> > 
> > Yeah, I would think we would like to avoid that... and try and merge in
> > the changes they need for images instead of them going and making their
> > own that only they use.
> 
> Oh, it's my poor wording, they didn't mean to say _they'd_ create these
> custom images. OpenStack infra is clear - they'd only use reasonably
> well-tested images from distributions.
> 
> > >   3. Another important point OpenStack infra folks emphasized is -
> > >      these images will get 4000 test runs a week on them
> > 
> > Cool.
> >  
> > > Any suggestions to allay these are welcome.
> > 
> > Happy to try and solve any bottlenecks they are having...
> 
> Yeah, folks are testing more than ever with Fedora lately.
> 
> OpenStack infra/QA folks have an upcoming meetup to discuss several
> topics; Fedora is also on their list. Will let you know if they provide
> more specific, technical feedback from OpenStack infra.
>  
> 
> Thanks.
> 
> 
> --
> /kashyap
> 

