More on storage validation strategy (was: Re: Criterion revision proposal: KDE default applications)

Adam Williamson awilliam at redhat.com
Tue Dec 17 20:49:45 UTC 2013


On Tue, 2013-12-17 at 09:45 -0700, Chris Murphy wrote:
> On Dec 14, 2013, at 12:25 PM, Adam Williamson <awilliam at redhat.com> wrote:
> > 
> > I really would like to see other people's proposals in this area. I'm
> > not at all convinced I'm going to be the person who comes up with the
> > best idea. I'd love to know what cmurf would suggest as an overall
> > approach to designing a set of Final release criteria or a storage
> > validation test plan, for instance.
> 
> What do you think of moving any blocking storage-related criteria
> and tests from final to beta or even alpha? Why not move as much
> potential for blockers to alpha and beta releases as possible?
> 
> An example of this is moving the resize test and criterion to beta (or
> split between alpha and beta if that's sensible and helpful). If
> resize were busted, do we really only want to find out and start
> dealing with it, and maybe slipping on it, during final? Seems risky,
> especially if a fix depends on upstream developers. Or the public
> beta eats OS X or Windows for lunch.

Personally, no, I don't think that's correct. In fact, we just made
the opposite change (after F18, we weakened the Alpha storage criteria
quite a lot).

My take is this: the criteria define what needs to be working for us to
release. They do not define what we ought to be testing at each
milestone.

We ought to be testing everything at each milestone, ideally. If that's
not possible, we need to test as much as we can. I'm not a huge fan of
the Release Level column in the matrices, because it kinda gives the
wrong impression. Even if a given test would only block Final release if
it failed, not Alpha or Beta, *that doesn't mean we should only run it
at Final*. We should run it at Alpha and Beta too, so we know if it's
broken.

I'm as guilty of taking the shortcut and punting on doing early tests as
anyone. We're all human. But conceptually, 'what needs testing when?'
and 'what do we block the release on?' are separate questions, and it
is wrong to modify the release criteria simply to 'force' us to test
stuff earlier. The correct question to answer when deciding "what
should be in the Alpha release criteria?" is "what needs to be working
for us to ship a product labelled Alpha?" It's that simple.

> Since alpha and beta blocking criteria are still in effect post-beta,
> there will still be storage-related blocking bugs after beta release.
> But there wouldn't be new blockers based on additional criteria.

Just to reiterate the above, *that shouldn't be happening now anyway*
(except when the code actually breaks between Beta and Final, which does
happen). We should be doing the testing by Beta, and identifying all
the Final blockers that exist by then. This is the goal I'm trying to
work towards with all the revisions I'm proposing and thinking about,
insofar as that's plausible with fedora.next hanging over us. We should
be able to run our entire set of validation tests at Alpha and Beta; we
should not be staging the test workload.

>  Rather than increasing the quality of beta, the main idea is to
> increase the predictability of final and reduce the risk of
> regressions and of a final release slip.

I think this is a great goal indeed.

> I think guided partitioning should be fairly rock solid, and even
> though it's the "simple" path, it's still a beast of a matrix.

Yup. That's definitely one of the problems.

>  I mentioned this in a different thread, but I think either LVM or LVM
> Thin Provisioning needs to be demoted. We don't need two LVM options
> in Guided. And if we can't get buy-in on that, then we'll have to
> just eat that extra testing, because I think Guided shouldn't get
> people into trouble.

I broadly agree with this. If we were designing to the test cases, I'd
say we should just throw that damn dropdown out entirely. You want
something other than our default filesystem (whatever it is), you go to
custom partitioning. But the best design for testers is not necessarily
the best design for users :) It's possibly worth considering, though.

As I mentioned, oldUI only let you pick LVM or ext4. newUI added btrfs,
with the 'everything's going btrfs!' plan in mind, I think. And F20
added LVM thinp. So now we have 2x what oldUI had. (And actually, I
think the 'don't use LVM' checkbox was only added to oldUI in like F15
or something).

We did have a fairly clear policy with oldUI: we tested its version of
'guided' partitioning quite extensively, and custom partitioning got a
lot less in the way of testing and criteria guarantees.
We've been pushing that boat out since F18; maybe we need to pull it
back in again. newUI guided does, I think, provide just about all the
same possibilities oldUI non-custom did, with a slightly different
approach. I think, whatever the details we come up with, the broad
direction "emphasize testing of guided, de-emphasize testing of custom"
- along with the improvements to the guided UI that Mo is currently
working on - looks productive for the future.

So, I'm thinking...just looking at my rough draft matrix...perhaps we
need something like it, or even bigger, but with a policy attached in
some way (I don't necessarily mean a literal policy document, I'm
talking in conceptual terms). The policy would be something like testing
priority goes to guided, always, and we should aim to do complete
testing of guided at each milestone, including Alpha. For custom
partitioning, I think we should look to define some kind of reasonable -
*small* - subset of its functionality that we can 'stand behind'. We
would write criteria for that, and a test plan for it, and like the
guided partitioning, aim to cover it at every milestone, with #2
priority after the guided testing. The kind of stuff that would fall in
this bucket would be the custom partitioning 'greatest hits' - things
that are reasonably common that you can't do via non-custom. 'Ghetto
upgrade of a previous default Fedora install' is an obvious one.
'Install to this fairly sane layout I pre-created' is another. We'd have
to try and think through others to go in the set, maybe together with
the anaconda folks.

Then we could have 'bonus point' matrices for much more custom
partitioning functionality, and try to do as much of that testing as we
can, and ask anaconda to treat major bugs we discover as a high
priority. But we don't block on them, and this testing always comes #3
after the guided and 'blocking custom' paths. As long as the overall
plan is sufficiently clear that the matrices don't scare people off, we
can make them as huge as we like.
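
To make that concrete, here is a purely illustrative sketch of the
shape I have in mind - the tier names and contents are just off the
top of my head, not a worked-out set of test cases:

    Tier 1 - guided (blocking; run in full at Alpha, Beta and Final)
      e.g. guided install to an empty disk with each guided option,
      guided install alongside an existing OS (resize), guided install
      reclaiming space from existing partitions

    Tier 2 - 'blocking custom' subset (blocking; run at each milestone)
      e.g. the 'ghetto upgrade' of a previous default Fedora install,
      install to a sane pre-created layout

    Tier 3 - 'bonus point' matrices (non-blocking; best effort)
      everything else custom partitioning can do: RAID levels, unusual
      layouts, and so on, with major bugs flagged to anaconda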

What do you think of that as an approach?

> Custom partitioning needs to be triaged for certain use cases we
> really want to work, making those blocking if they fail. It may not
> be the same list for i386, x86_64/EFI, and ARM. e.g. we supposedly
> block on raid5 for x86_64, but does that make sense for ARM? Other
> combinations, even if there's a crash, would be non-blocking bugs, and
> only the subjective FE determination applies.

Right.
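
As a purely illustrative example of what that per-arch triage might
produce (the statuses here are hypothetical, apart from raid5 on
x86_64, which as you say we supposedly block on today):

    raid5 via custom partitioning: blocking on x86_64, non-blocking on ARM
    EFI-specific layout cases:     blocking on x86_64/EFI, n/a elsewhere

i.e. the same test case can carry a different blocking status on each
arch, with everything else left to FE consideration.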

> Obviously the data corruption proscription is still in place, so
> crashes that mangle partition tables or previously working file
> systems would presumably block. However, I wonder if that criterion
> should be split in two: the clearly-not-OK, block-worthy cases
> probably ought to be alpha or beta blockers at the latest, and those
> that are suitable for FE or for merely being documented could be
> permitted post-beta, since they're unlikely to block.

The wording that they must be 'fixed or documented' already has this
effect - it gives us the flexibility to make that call on an
issue-by-issue basis.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net


