On Wednesday, December 02, 2015 06:42:09 AM Kamil Paral wrote:
> > Taking all of this into account, would this be a reasonable idea?
> > 1. At Go/No-Go voting time, all updates which block F-N release but
> > belong to F-M (M<N) release, must be already pushed stable. If this
> > is not the case and it's the last blocking issue, selected tasks
> > (like copying compose trees into appropriate places) can be
> > performed, and Go/No-Go will be rescheduled to the day and time when
> > it is expected that those updates will have been pushed.
>
> I think that's not a great idea. It gets back to why we only slip in one
> week increments. If say we have a go/no-go on a Thursday and the only
> thing blocking it is some update that's not pushed stable all the way
> yet, we reschedule for Friday and if it's not done then we schedule for
> Saturday? This means everyone has to work extra hours without even
> being sure when the release will be.
If the update is pending stable and just not pushed, it might make sense to
move it one day, yes (most probably skipping weekends, though). If it needs
more testing, we might decide to postpone it several days. If it's not
available at all yet, waiting an extra week might be the right choice. So
it would depend on the situation and best guess of folks at Go/No-Go.
I am with Kevin here: we have things tightly coupled with mirrors and
mirroring, and making changes by a day or two throws timings way off, purely
because we have a built-in sync buffer of the weekend. To slip the Go/No-Go
decision to Monday we would need to push out the ship date from Tuesday to
Friday to give mirrors syncing time, and that is making things somewhat tight.
We really need to slip a week for any slip.
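
To make the whole-week rule concrete, here is a minimal sketch, assuming a Thursday Go/No-Go and a Tuesday release as discussed in this thread. The function name and the example dates are hypothetical, not part of any existing RelEng tooling.

```python
from datetime import date, timedelta

# Assumption from the thread: Go/No-Go on Thursday, release the following
# Tuesday, with the weekend in between acting as the mirror sync buffer.
DECISION_TO_RELEASE = timedelta(days=5)

def slipped_schedule(go_no_go: date, weeks: int):
    """Slip the decision by whole weeks and derive the release date from it."""
    new_decision = go_no_go + timedelta(weeks=weeks)
    return new_decision, new_decision + DECISION_TO_RELEASE

# Hypothetical example: a decision planned for Thursday 2015-12-03 slips once.
decision, release = slipped_schedule(date(2015, 12, 3), weeks=1)
print(decision.strftime("%A %Y-%m-%d"), "->", release.strftime("%A %Y-%m-%d"))
```

Slipping by whole weeks keeps the decision on a Thursday and the release on a Tuesday, so the weekend buffer is never shortened.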
> Leaves less time to sync mirrors,
> update common bugs, etc etc.
I would say the opposite - all of that can start happening right away, it's
not blocked on waiting for the FN-x push. So in case the announcement gets
out on Tuesday as usual, it's the same time, but if it gets pushed back to
Wednesday or a later day, it's more time for these tasks to happen. The only
exception is the FN-x updates repo, which will get shorter sync time
because we want to make sure people download the fixed packages, not old
ones. Currently that behavior is undefined.
We cannot put the bits onto the mirrors until we are sure they are the final
bits; otherwise we cause the mirrors lots of churn and wasted IOPS and
bandwidth, and we lose mirrors.
> So, the alternative there would be to slip a week to get it pushed, but
> some people may find that excessive.
That's why I wanted to propose something more flexible, but hey, it's just
an idea.
In order to be more flexible here we would really need to fundamentally change
how we push bits to the mirrors. If we had a CDN of our own under our control
we would have more options available, but the cost of that would be massive.
> > 2. We will create a new mirrormanager script which will go through the
> > specified metalink(s) and remove all metadata hashes which are older
> > than provided timestamp/hash.
>
> Something like that should be pretty easy to do I would think.
> (Although I am not a mm developer)
Looking into existing MM scripts, I have the same opinion, but I can contact
Adrian to confirm. If we want to make it even simpler, we can drop all
alternative metadata and leave just the current hash (that script would be
run once the push containing that critical update is performed).
I am okay with having a way to say ship only the latest metadata.
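
For illustration only, a rough sketch of what such a pruning step could look like. It is not an existing mirrormanager script. It assumes the metalink lists older repomd.xml versions as `mm0:alternate` elements, each carrying an `mm0:timestamp` in epoch seconds; the namespace URIs, the function name, and the sample data are assumptions to verify against a real metalink.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; check them against a real metalink before use.
NS = {
    "ml": "http://www.metalinker.org/",
    "mm0": "http://fedorahosted.org/mirrormanager",
}

def drop_old_alternates(metalink_xml: str, cutoff: int) -> str:
    """Return the metalink with alternates older than `cutoff` removed."""
    root = ET.fromstring(metalink_xml)
    # Materialize the list first so removal does not disturb the iteration.
    for alternates in list(root.iter("{%s}alternates" % NS["mm0"])):
        for alt in list(alternates):
            ts = alt.find("mm0:timestamp", NS)
            if ts is not None and int(ts.text) < cutoff:
                alternates.remove(alt)
    return ET.tostring(root, encoding="unicode")

# Hypothetical, heavily trimmed sample: current metadata plus two alternates.
SAMPLE = """<metalink xmlns="http://www.metalinker.org/"
          xmlns:mm0="http://fedorahosted.org/mirrormanager">
 <files>
  <file name="repomd.xml">
   <mm0:timestamp>300</mm0:timestamp>
   <mm0:alternates>
    <mm0:alternate><mm0:timestamp>100</mm0:timestamp></mm0:alternate>
    <mm0:alternate><mm0:timestamp>200</mm0:timestamp></mm0:alternate>
   </mm0:alternates>
  </file>
 </files>
</metalink>"""

pruned = drop_old_alternates(SAMPLE, cutoff=150)
```

Passing a cutoff newer than every alternate leaves only the current metadata, which is the "ship only the latest metadata" case.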
> > 3. If there are such updates as mentioned in point 1., RelEng will
> > use this script to remove old metadata alternatives from the
> > metalink, which means only metadata from the day this update was
> > pushed or newer will be kept. In order to not increase mirror strain
> > too much, this doesn't need to be used immediately, but just shortly
> > before the release announcement (so that mirrors have time to sync
> > latest packages, and the user load is distributed among more mirrors
> > including those with current-1 or current-2 trees as long as
> > possible).
> > 4. Once the script is run in point 3., we can post the release
> > announcement in 6 hours.
> >
> > I know there is still one manual step involved (figuring out in which
> > push the blocker update was included), but I don't know how to better
> > solve it, especially if we don't want to wait for too long.
> >
> > I would be interested in Infra/RelEng feedback for the technical part
> > of this (CCing Kevin and Dennis). Do you think this is a reasonable
> > solution, or am I completely off the track here? Do you see any
> > better options?
>
> So, looking back, we had the case of that dnf-system-upgrade. Are there
> any others in the past, or are we making a bigger than life deal out of
> one case?
I don't want to exaggerate the topic, but I'd also like to find and describe
a process for avoiding it next time. It will be needed twice a year at
maximum.
I believe there were a few similar issues in the past, but I can't really
point to any other examples. In the majority of cases, this is likely to be
related to system upgrade (system-upgrade, dnf, plymouth, systemd, gpg
keys).
There is always the potential of hitting issues with upgrades: an older
release gets a higher NVR and things get messy. It is not an issue just at
release time.
> Also, that case could have been solved by dropping the alternates in
> metalink as you suggest above at 2, right?
Yes.
> One thing that perhaps we could improve is to somehow note these sorts
> of things to releng. I just checked IRC logs and I didn't see any
> mention of that dnf-system-upgrade plugin update being important until
> Nov 3rd. Would a tracker ticket help this?
In the future, these issues should be tracked by blocker bugs app using
bugzilla tracker and a specific keyword, so we should not lose track of
this. But as mentioned, pushing to stable is not enough, we also need to
make sure old content is not served to users. That's why the "dropping
alternative metadata from metalink" idea. We can file a releng ticket for
this, and either include a description of what needs to be done, or link to
some wiki SOP. QA can take care of all of that. The only thing that we need
to ensure is that it really is handled before the announcement goes live,
so it needs to be listed somewhere in RelEng/Infra "new release" SOP.
We have no way of always ensuring that people are getting the latest data, or
that they have the latest bits installed, and people can always shoot
themselves in the foot. People can and will do a distro update without
updating the running OS first. I would suggest not filing a rel-eng ticket and
telling us what to do, as that will not go over well. We should now sit down
and work out a process. Then likely a ticket needs to be filed asking that the
process be followed.
Dennis