Reliability of Fedora infrastructure to download cloud images
Attila Fazekas
afazekas at redhat.com
Mon Jun 23 08:56:58 UTC 2014
----- Original Message -----
> From: "Kashyap Chamarthy" <kchamart at redhat.com>
> To: "Kevin Fenzi" <kevin at scrye.com>
> Cc: infrastructure at lists.fedoraproject.org, mattdm at fedoraproject.org, afazekas at redhat.com
> Sent: Friday, June 20, 2014 5:55:17 AM
> Subject: Re: Reliability of Fedora infrastructure to download cloud images
>
> On Thu, Jun 19, 2014 at 09:20:14AM -0600, Kevin Fenzi wrote:
> > On Thu, 19 Jun 2014 00:24:55 +0530
> > Kashyap Chamarthy <kchamart at redhat.com> wrote:
> >
> > > [I'm not subscribed to this list, please keep me in CC.]
> > >
> > > Heya,
> > >
> > > A little while ago, we (Matthew Miller, myself, and Attila Fazekas,
> > > an upstream OpenStack developer) had an IRC discussion (on
> > > #openstack-qa, Freenode) with OpenStack upstream CI infrastructure
> > > folks about their concerns with continuing to have Fedora as a default
> > > CI voting guest (Nova instance). They (mostly Sean Dague, a major
> > > upstream OpenStack contributor, who voiced these) outlined a
> > > few issues:
> >
> > I'm not familiar with the terminology, what does a 'voting guest' mean?
>
> Sorry for being unclear. It means that any proposed OpenStack change/patch
> has to be tested on a Fedora virtual machine too; only once it passes
> the tests on Fedora will the patch be merged to upstream git. I cc'd
> Attila, he can correct me if I said something wrong.
>
If the job is voting in the gate pipeline,
it can prevent incompatible changes from merging.
> >
> > > 1. It's not possible to download from the fedora infrastructure
> > > reliably - 10% failure rate from their cloud providers (HP and
> > > RAX).
> > > - About this point, when mattdm inquired - "is the failure in
> > > hitting the fedora mirrors or fedora core infrastructure?",
> > > their response - "I don't fully know, I think going through
> > > the url we are using we get bounced to mirrors".
> >
> > Yeah, more data would be very nice here... what url(s) they are using,
> > what error codes if any they get back?
I saw the image download fail at least once,
but I could not find a pattern in the failures :(.
IMHO the failure rate was below 10%,
but OpenStack infra/QA notices issues with a failure rate above 0.1%.
If I or anyone else sees the failure pattern again, they can add a query
to http://status.openstack.org/elastic-recheck/.
In that case we would know exactly how often the issue happens.
Anyone who has signed the OpenStack contributor agreement
can propose queries to the repo:
https://github.com/openstack-infra/elastic-recheck/tree/master/queries
Here are the image download URLs:
https://github.com/openstack-dev/devstack/blob/master/stackrc#L357
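
To get the error details Kevin asked for, a download could be wrapped so that every failed attempt is recorded before retrying. The following is only a minimal sketch, not what devstack or the CI nodes actually do: the helper names (fetch_once, download_with_retries), the retry count, and the delay are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_once(url, timeout=30):
    """Fetch url once and return the number of bytes read (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        total = 0
        while True:
            chunk = resp.read(1 << 20)  # read in 1 MiB pieces
            if not chunk:
                return total
            total += len(chunk)


def download_with_retries(url, attempts=3, delay=5.0, fetch=fetch_once):
    """Call fetch(url) up to `attempts` times.

    Returns (result, errors): result is None if every attempt failed, and
    errors holds one message per failed attempt so the cause can be reported.
    """
    errors = []
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url), errors
        except (urllib.error.URLError, OSError) as exc:
            # HTTPError and timeouts are both OSError subclasses.
            errors.append(f"attempt {attempt}: {exc!r}")
            if attempt < attempts:
                time.sleep(delay)
    return None, errors
```

The recorded messages (HTTP status, timeout, connection reset, and so on) are the kind of data that would help tell a mirror problem from a core infrastructure problem.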
>
> Looking at the script[1] that creates the CI VM, it uses this URL --
> https://dl.fedoraproject.org/pub/fedora/linux/releases/20/Images/x86_64/Fedora-x86_64-20-20131211.1-sda.qcow2
>
>
> [1] https://github.com/openstack-dev/devstack/blob/master/stackrc#L353
>
>
> > Are these the released cloud images? f19/20? Or nightlies or ?
>
> Released, official images.
>
> > How often do they download? Once an image is loaded, I am not sure why
> > they would re-download it unless it's changed?
>
> I just confirmed, they (CI infra) download and cache it. But, once every
> 24 hours, they rebuild the caches. They say it's the humans who download
> it manually (without any caching environment) that face the
> bottlenecks.
>
AFAIK every worker node downloads the L2 images once in its lifetime;
I do not know the average lifetime of these VMs.
An L2 image version switch can lead to ~500 image downloads in 1 hour.
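
As a rough illustration of what the rates mentioned in this thread would mean for a burst of ~500 downloads, here is a small sketch. It assumes each download fails independently with the same probability, which nobody here has confirmed; the function names are made up for the example.

```python
def expected_failures(downloads, failure_rate):
    """Expected number of failed downloads at a given per-download failure rate."""
    return downloads * failure_rate


def prob_at_least_one_failure(downloads, failure_rate):
    """Probability that at least one download fails, assuming independence."""
    return 1.0 - (1.0 - failure_rate) ** downloads


if __name__ == "__main__":
    burst = 500  # the ~500 downloads per image switch mentioned above
    for rate in (0.10, 0.001):  # the 10% and 0.1% rates discussed in the thread
        print(
            f"rate={rate:.1%}: "
            f"expected failures={expected_failures(burst, rate):.1f}, "
            f"P(at least one)={prob_at_least_one_failure(burst, rate):.3f}"
        )
```

Under these assumptions, a 10% rate would mean about 50 failed downloads per switch, while even a 0.1% rate would still leave roughly a 39% chance of at least one failure in the burst.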
> > Or unless they are
> > grabbing nightly rawhide images?
>
> They would prefer not to do this, as only distribution-tested images
> will be used in the OpenStack CI environment.
>
> > > 2. There are possibly issues with the normal upstream fedora image
> > > that could be fixed with custom respin.
> > > - NOTE: I'm doubtful of this idea, as the existing Fedora cloud
> > > images themselves are not really extensively tested. I'd rather focus on
> > > _official_ cloud images and have a solid set of tests so
> > > that they can be consumed by cloud projects (OpenStack, etc).
> > >
> > > - Having a custom respin means that we're off the main path for
> > > testing of the image -- which again needs _some_ level of
> > > assurance that it can be used in a higher-level cloud
> > > project's CI infra.
> >
> > Yeah, I would think we would like to avoid that... and try and merge in
> > the changes they need for images instead of them going and making their
> > own that only they use.
>
> Oh, it's my poor wording, they didn't mean to say _they'd_ create these
> custom images. OpenStack infra is clear - they'd only use reasonably
> well-tested images from distributions.
>
> > > 3. Another important point OpenStack infra folks emphasized is -
> > > these images will get 4000 test runs a week on them
> >
> > Cool.
> >
> > > Any suggestions to allay these are welcome.
> >
> > Happy to try and solve any bottlenecks they are having...
>
> Yeah, folks are testing more than ever with Fedora lately.
>
> OpenStack infra/QA folks have an upcoming meetup to discuss several
> topics; Fedora is also on their agenda. Will let you know if they provide
> more specific, technical feedback.
>
>
> Thanks.
>
>
> --
> /kashyap
>