The slip down memory lane

Thu Aug 12 20:08:52 UTC 2010

On Thu, Aug 12, 2010 at 2:19 PM, Mike McGrath <mmcgrath at redhat.com> wrote:
> Since 2006 I counted 18 slips (I think one or two of those may just be a
> single slip listed twice).  Lets not yell, lets not flame war, lets not
> point fingers.  How can we fix this?

[snip]

> This is a collective failure.

While I agree that we've had a lot of schedule slips, and it's not
ideal to have a slip in the schedule, I don't agree that a schedule
slip is necessarily a failure *per se*.  In the case of the slip in
the Alpha, I'd even go so far as to say that we're doing something
right -- the RC did not meet its release criteria, and so we did what
we said we were going to do.  Testing found problems, and we blocked
on those problems.  A lot of different individuals have put a lot of
time and effort into getting the Alpha ready, and to say that the
result of all their work is a failure because we slipped the schedule
a week is a bit short-sighted in my opinion.

To me, the important thing here is that we learn from the experience,
and try to make things better.  The fact that we've got an (admittedly
basic and somewhat manual) test process in place shows that we really
do care about the quality of the distribution we ship.  So, the
question comes down to this -- how do we learn from the process, and
make the next release smoother than the last?

I have three suggestions.  First of all, take a look at John
Poelstra's F14 retrospective page at
https://fedoraproject.org/wiki/Fedora_14_Schedule_Retrospective.  John
has actively been trying to document the lessons we're (hopefully?)
learning from our mistakes, so that we can improve next time.  If
we're not learning from our past mistakes, we're not moving forward.
I'm sure John would appreciate our help in documenting the reasons we
slip.

My second suggestion is for FESCo to take a more active role in
tracking the major changes that land in the distribution and judging
the impact that they might have before freezes. While the major Python
and systemd changes didn't end up blocking our release, I'm sure they
had an impact on our ability to build test composes, and also our
ability to thoroughly test the RCs before the go/no-go meeting.

Third, let's all pitch in and help the QA team with some of their
automated testing, so that we can more easily test RCs and know what
shape they're in.  We simply don't have the resources to do everything
manually.

Also, let me be clear.  I'm not treating the six-month release cycle
as sacred or immutable.  I'm willing to work with the Board and FESCo
to determine whether or not the length of the cycle should be changed.
I'm not convinced, however, that simply lengthening the cycle will
solve the problem, unless we find better ways of making things happen
before the last minute.  For better or worse, deadlines make work
happen, and from time to time deadlines get broken.  Obviously, these
things can and need to be discussed in a measured and appropriate way.
Let me also point out that there has to be a healthy balance between
the objective measures ("X number of blocker bugs still open, Y number
of tests failing") and the subjective measures ("It *feels* like it's
ready for an Alpha release"), and I think the current go/no-go meeting
does a fairly good job of finding that balance.

In short, the process isn't perfect and has room for improvement...
but following our process shouldn't be viewed as a failure either.

--
Jared Smith
Fedora Project Leader