Up top I'd like to note there are really kind of two buckets of
'special blockers' for any given release. If the release being
validated is N, they are:
1) Bugs for which the fix must be in the 0-day update set for N
2) Bugs for which the fix must be stable in N-1 and N-2 by N release day
I'd like to return to the high-level overview of this topic and discuss the changes
we plan to make to our SOPs.
So far, we have decided to call bugs from bucket 1) Accepted0Day and bugs from bucket 2)
AcceptedPreviousRelease. I have also worked out some technical details for ensuring that
AcceptedPreviousRelease updates get pushed on time. Now we need to discuss what *happens*
when we have a bug of either type.
== Question #1: Do we always slip? ==
With media blockers, we need to create new media, which ensures a slip (there were a few
exceptional situations in the past where we managed to build and test fixed media in a
day, and therefore postponed the Go/NoGo decision for a day). With non-media blockers, the
affected artifact is either the repository tree (we need to push a new build for
Accepted0Day), or a previous release repository tree (we need to push an update for
AcceptedPreviousRelease). For Accepted0Day, this will most likely involve critical bugs in
components which are not on the default installation media (but which, for example,
negatively affect them or break some other important functionality). For
AcceptedPreviousRelease, this will most likely involve bugs in upgrading the system, or a
few other specific cases like creating bootable media of the new release or booting the
new release in a VM.
Now the question is whether exactly the same rules apply (i.e. if this is not fixed by
Go/NoGo, we slip), or whether in certain cases we would decide not to slip.
Since pushing an update can be done relatively quickly, I can imagine people proposing
not to slip if an update is prepared and tested, but not yet pushed stable.
Earlier in this thread, I tried to point out that this is not enough, because things can
go wrong on multiple levels and we really need to insist that the update is already stable
(and metalink metadata adjusted to not allow usage of older pushes). Of course this can be
handled with the same trick we sometimes use for media blockers, i.e. postponing the Go/NoGo
decision for a day, provided RelEng approves. But in general I think we should not avoid
slipping and just "hope for the best". These bugs were accepted as blockers and
we need to make sure people don't hit them, even if they have the bad luck of being
assigned to an older mirror.
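To make the rule I'm arguing for concrete, here is a minimal Python sketch. It is only an illustration: the class names, the `metalink_adjusted` flag, and the `must_slip` function are hypothetical and do not correspond to any existing tool. The one assumption it encodes is that a blocker counts as resolved only when its fix is pushed stable and the metalink metadata has been adjusted.

```python
from dataclasses import dataclass
from enum import Enum


class BlockerKind(Enum):
    ACCEPTED_0DAY = "Accepted0Day"
    ACCEPTED_PREVIOUS_RELEASE = "AcceptedPreviousRelease"


class UpdateState(Enum):
    NOT_FIXED = 1
    PREPARED_AND_TESTED = 2  # update exists and is tested, but not pushed stable
    PUSHED_STABLE = 3


@dataclass
class Blocker:
    bug_id: int                # hypothetical bug number, for illustration only
    kind: BlockerKind
    state: UpdateState
    metalink_adjusted: bool    # older pushes can no longer be served to users


def is_resolved(blocker: Blocker) -> bool:
    """A fix counts only once it is stable AND mirrors cannot serve older pushes."""
    return blocker.state is UpdateState.PUSHED_STABLE and blocker.metalink_adjusted


def must_slip(blockers: list[Blocker]) -> bool:
    """Slip if any accepted blocker, of either kind, is unresolved at Go/NoGo."""
    return any(not is_resolved(b) for b in blockers)
```

Under this sketch, an update that is merely prepared and tested still forces a slip, which is exactly the case I expect people to argue about.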
Do you see any other cases where we should not slip, or where it would be tempting not to
slip and we should therefore discuss and define the case explicitly?
== Question #2: For how long do we slip? ==
Earlier in this thread, I suggested some kind of dynamic slip that would reflect how
fast we can resolve things (for example, perform a push). But neither Kevin from Infra nor
Dennis from RelEng thought it was a good idea, and both said we should slip as
usual, i.e. a week (if I misunderstood something, please correct me). Of course this is
their field, not QA's, so I definitely trust their judgment.
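For comparison with the dynamic slip I originally suggested, the "slip as usual" rule is trivial to express. This is just a sketch; the `next_target` helper and the example date are made up for illustration.

```python
from datetime import date, timedelta

# Assumption taken from the discussion above: every slip is a fixed one week,
# regardless of how quickly the fix could actually be pushed.
SLIP_LENGTH = timedelta(weeks=1)


def next_target(current_target: date, slips: int = 1) -> date:
    """Return the new release target after the given number of slips."""
    return current_target + slips * SLIP_LENGTH


# Hypothetical example: a target of 2015-10-27 slips once to 2015-11-03.
new_target = next_target(date(2015, 10, 27))
```

The fixed length keeps the schedule predictable for Infra and RelEng, at the cost of sometimes waiting longer than the fix strictly requires.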
Do you have any other ideas or proposals regarding the slip length, either in general or
for specific situations?
Thanks a lot for feedback,