Criterion revision proposal: KDE default applications

Adam Williamson awilliam at redhat.com
Sat Dec 14 19:25:40 UTC 2013


On Sat, 2013-12-14 at 13:47 -0500, Richard Ryniker wrote:
> The defining characteristic of the Live images is that they run
> without installation on a user's disks.  Beyond that, they have the
> capability to install a minimal Fedora system to a local disk.  Once
> the size limit for a live image is increased beyond the capacity of a
> CD (or other common media), there seems no reason to limit the live
> image size to 1 GB, or any arbitrary value.

There's no reason to limit it to any arbitrary value, but none of the
values we've used or discussed is arbitrary. 1GB is the size of a 1GB
USB stick, 2GB is the size of a 2GB USB stick. 3GB, for example, would
make no sense, as no one makes media of that size. Obvious 'sensible'
target sizes are 650MB CD, 700MB CD, 1GB USB, 2GB USB, 4GB USB, 4GiB
(because of FAT32 filesystem issues...), single-layer DVD, 8GB USB,
dual-layer DVD, 16GB USB.
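
For illustration, here's a rough Python sketch of the "which sensible
target does an image fit on" question. The capacities are the usual
nominal figures and my own simplification, not anything official:

    # Rough illustration only: nominal capacities in bytes for the targets
    # listed above. Real media vary slightly, and FAT32 additionally caps
    # any single file at 4 GiB minus one byte.
    TARGETS = {
        'CD (650MB)':       650 * 10**6,
        'CD (700MB)':       700 * 10**6,
        'USB 1GB':          1 * 10**9,
        'USB 2GB':          2 * 10**9,
        'USB 4GB':          4 * 10**9,
        'FAT32 file limit': 4 * 2**30 - 1,
        'DVD single-layer': 4700 * 10**6,
        'USB 8GB':          8 * 10**9,
        'DVD dual-layer':   8500 * 10**6,
        'USB 16GB':         16 * 10**9,
    }

    def smallest_fit(image_bytes):
        """Return the smallest 'sensible' target the image still fits on."""
        fits = [(size, name) for name, size in TARGETS.items()
                if image_bytes <= size]
        return min(fits)[1] if fits else None

    print(smallest_fit(1300 * 10**6))  # -> 'USB 2GB'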

> Rather than struggle to find what can be discarded from a live
> image in order to achieve a particular size, why not build the DVD
> product as a live system, or expand the existing live recipe to
> include more of the frequently used packages and not build the DVD?
> The DVD installer program has much more flexibility, but this is due
> to that arbitrary size limit, I expect: there just is not space for
> the full Fedora installation program (plus local repository) in
> today's live images.

No, it isn't, really. The live installer can only install the package
set you actually get when booting live, because all it does is dump
itself to the hard disk; it doesn't actually contain or install any
packages per se. So I don't see how we can do the DVD-as-live-image
thing in any practical way; the point of the DVD is that you can install
_different_ environments from it, but there's no way you can do that
from a live image.
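
To make that concrete, here's a deliberately over-simplified sketch of
the difference between the two install models - this is not anaconda's
actual code, and the function names are mine:

    # Conceptual contrast only, not how anaconda is actually implemented.
    import shutil
    import subprocess

    def live_install(live_rootfs_image, target_device):
        # A live install just copies the pre-built root filesystem to disk,
        # so the installed package set was fixed when the image was composed.
        shutil.copyfile(live_rootfs_image, target_device)

    def repo_install(package_group, target_root):
        # A traditional (DVD / network) install resolves a chosen group
        # against a package repository at install time, so different
        # environments can be picked from the same medium.
        subprocess.check_call(['yum', '--installroot', target_root,
                               '-y', 'groupinstall', package_group])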

It's also worth noting that the installer team doesn't like installing
from a live environment. It involves rather a lot of messy hacks and can
cause breakage; the major problem is that they cannot possibly
trust that the system is in a 'clean' state when the installer
initializes, something they can control in the 'traditional' installer
mode.

One thing they've floated as an idea is to have a separate 'installation
environment' you could boot into from the live images - so you could
either boot into 'try it out' live mode, or 'install it' installer mode,
but not run the installer from within the live mode.

> A sizable group of users may have very limited hardware resources -
> no network, only a CD drive.  This group would be a reasonable target
> for a "Limited Resources" spin that seeks to tailor Fedora to such an
> environment.  For example, these systems may have too little memory to
> support anaconda (and many of the applications in Fedora's default
> configuration).  Maybe a new SIG for this target.  (Perhaps it already
> exists and I, with richer hardware resources, never looked for it.)

I don't believe it does, no. It seems like an interesting idea, so if
you're interested in making it happen, you could certainly go ahead and
make a proposal to the relevant authorities...

>   Modifications to limit and
> make more specific the actual test coverage can help Fedora users
> better understand what they have, and reflect the reality of QA
> resource limits.

I think we've always had this as a goal, but we've just wound up
drifting slightly over the last few releases to the point where we're
really struggling to keep up with the workload.

> To this end, it would be good to have better distinction between what
> QA tests and what other groups - SIG, upstream, packager, spin
> creator, architecture group, etc. - are responsible to test.  QA test
> criteria is one place to document this, at least the QA view of it.

This is something viking-ice is talking about as well, and I certainly
agree with you guys that there's some value to it. The only problem I
have is that I don't really see a lot of evidence that many other groups
test their stuff at all in any particularly organized way, though I may
well be missing something. I'd have a lot of time for a vision where we
place some responsibility on the desktop SIG for desktop testing, on the
KDE SIG for KDE testing and so on, but like I keep saying for the
specific proposal here, we have to make sure it actually *happens*. I'm
not a fan of QA just throwing our hands up and unilaterally saying
"okay, we're just going to test X and trust that someone else cares
about making sure Y and Z work", at least not until we've made some
good-faith effort to make things better in a collaborative way.

At present it feels a lot like people only test things when we (QA) draw
up the test plans and then go out and make a fairly strenuous effort to
plead with others to test them. I mean, if we just fire a TC/RC and
stick the desktop matrices up, it's only intermittently that I've seen
desktop SIG folks show up and run through the validation testing. If I
or someone else posts to their mailing list with a specific plea for
testing or goes into IRC and bugs them about it, then we get a higher
success rate, but at that point QA is essentially still taking on the
responsibility for the testing anyway...

> Adam, your recent work on storage tests reads like a Unit Test Plan
> for anaconda: something that tries to exercise all the code paths of
> the program. 

Ha, oh god, not even close. That matrix would probably cause mediawiki
to explode, and would also I think have to exist in at least four and
possibly five dimensions. I mean, for a trivial example you can take the
matrix I posted and just double the entire thing, because we should run
every single test combination for both msdos and GPT disk labels...that
kind of thing. There are whole 'variables' like that which I just punted
on because including them made things completely impractical.
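
To put a rough number on how fast that multiplies out, here's an
illustrative Python sketch; the variables and values are simplified
stand-ins I picked for the example, not the actual matrix dimensions:

    # Illustrative only: a handful of simplified storage 'variables'.
    from itertools import product

    disk_labels   = ['msdos', 'gpt']
    filesystems   = ['ext4', 'xfs', 'btrfs']
    containers    = ['plain', 'lvm', 'raid', 'luks']
    install_paths = ['guided', 'custom']
    reclaim       = ['use free space', 'delete all', 'shrink']

    combos = list(product(disk_labels, filesystems, containers,
                          install_paths, reclaim))
    print(len(combos))  # 2 * 3 * 4 * 2 * 3 = 144 combinations already

Every extra variable multiplies the whole thing again, which is why
whole dimensions had to be punted on.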

The matrix I posted - aside from the 'sanity check' tests, which are a
kind of arbitrary grab bag - is actually one of the *smaller* versions
of the several I thought about and roughly drafted. If you want me to
pull out a Wild-Ass Guess I expect those tests would only cover maybe
5%, 10% of all the possible paths through storage. I was very much
thinking 'smallest practical set of tests to cover the things we
currently claim to be blocking on', not 'test everything'. 'Test
everything', when it comes to storage, is entirely impossible.

For the theoretically-minded, I'll note two design choices in storage
which make this problem exponentially harder: the 'refresh' button,
which re-runs the disk scan from inside the custom partitioning screen,
and the fact that you can run the Installation Destination spoke
multiple times and when you do so it does not start over from scratch,
but tries to take the choices you made the previous time through into
account. Each of these things on its own effectively makes the 'problem
space' for storage testing infinite; both together meant I pretty much
gave up on even the possibility of testing those things and just threw
in two token tests. It's a bit like upgrade testing: we only test
upgrading a very clean installation of the previous release, because
it's just utterly impractical to try and cover anything more.
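
If you want a back-of-the-envelope feel for why re-entry makes it
unbounded, here's a toy sketch (the '20 choices per pass' figure is just
an assumption for illustration):

    # Toy model only: treat each pass through the spoke as one of N choices,
    # with the outcome depending on everything chosen on earlier passes.
    def interaction_sequences(choices_per_pass, max_passes):
        """Count distinct sequences of up to max_passes re-entries."""
        return sum(choices_per_pass ** k for k in range(1, max_passes + 1))

    for passes in (1, 2, 3, 5):
        print(passes, interaction_sequences(20, passes))
    # 1 20
    # 2 420
    # 3 8420
    # 5 3368420
    # ...and since there is no upper limit on how many times you can
    # re-enter the spoke or hit 'refresh', the real number has no bound.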

I should also make explicit that along with thinking about revising the
storage test case set, I'm thinking about the storage release criteria,
because obviously those things go hand in hand. I think it's very clear
at this point that we need to tighten down the Beta criteria in a few
places, and completely toss out the utterly unworkable Final criterion
we have - "The installer must be able to create and install to any
workable partition layout using any file system and/or container format
combination offered in a default installer configuration." - and replace
it with a specific and limited list of things we decide we actually care
about and can plausibly manage to test within the 'time and resources'
calculation I posted about before, in the same style as we have put in
place for Alpha and Beta. If Fedora 18, 19 and 20 have taught us anything, it's
that we cannot possibly stand behind any interpretation of that Final
criterion, from a QA or development standpoint.

So, perhaps if my proposal is just too big for us to sensibly digest, an
alternative approach would be to look at the criteria revision first -
trying to keep in mind, while we're drafting the criteria, what the
validation matrix for that criteria set would look like, and what kind
of issues we'd be 'leaving out' as non-blockers - and try to nail down
some new criteria that allow us to draw up a new test plan that is
practical for us in resource terms.

I'll throw an item for this on the agenda for Monday's meeting - it'd be
great if folks who are interested could come along to discuss it.

>  Is this the proper function of Fedora QA?  If it is,
> what is the proper fraction of the anaconda development budget
> to be allocated to Fedora QA for this purpose?

Practically speaking, 0%. One of the other major problems we have in
this area is that anaconda is, I think, unarguably under-resourced. All
my observations about the sheer hugeness of the storage 'problem space'
obviously apply to developing it as well as testing it, yet for the last
year or two we have had precisely one developer working on it, dlehman.
This is utterly unsupportable. The anaconda team now has a second full-time
storage developer, Anne Mulhern, but even that's not exactly an
abundance of riches.

As I mentioned in another mail, anaconda is actually working on
implementing CI and a unit test suite at present, which will be a great
thing when it's done.

> I use this as an example to support your observation that QA clearly
> does not have resources to test all it might want to test, and clearer
> definition is required for what QA will test and what others have to
> test.  Your storage test cases look like something anaconda developers
> should run before they let a new version loose.

Again, I'd say in theory no, but in practice the sets are probably quite
similar: I think the anaconda CI stuff, once it's actually done, will
wind up exercising much the same set of things I drew up, but the
starting point was quite different. I really did start out by trying to
think about what's the smallest set of things we really ought to try and
test per release. The problem is that every time I sit down and try and
think about that, even the _smallest_ set I can come up with is huge.

I really would like to see other people's proposals in this area. I'm
not at all convinced I'm going to be the person who comes up with the
best idea. I'd love to know what cmurf would suggest as an overall
approach to designing a set of Final release criteria or a storage
validation test plan, for instance.

> Throw away a package because it was not tested, failed a test, or
> missed a deadline is not a solution, however vexed one might be about
> what has not been done.

I don't think anyone's proposed 'throwing away' any packages - all of
the various proposals have instead been about the question of what
packages are installed *by default*, which is a very different thing.

>   It is natural that many changes will occur in
> Fedora's package roster.  I counted 39,500 rpm files in the F20
> repository.  Lots of changes, for many reasons, are normal, but it
> is not sensible to expect all these parts to work properly.  It is
> not even reasonable to expect the ten percent of these files on the
> DVD to all work properly.

We certainly don't. Our current expectation is much lower: we expect
that every package installed by default when you install KDE or GNOME
from the DVD or a live image, without changing the 'default' package set, should
run and pass a 'basic functionality test' (which is rather different
from 'working properly'). I'm suggesting that even that expectation is
unrealistic.

> Fedora QA, as a group, should probably take a pragmatic approach and
> focus on "critical path" and installation issues that affect large
> groups of users, and refer more detailed tests to others.  Do more,
> certainly, if resources permit.  And try to influence others to be
> more aware and effective in their test activities.

That already is our approach, honestly. We are nowhere even remotely
close to covering or attempting to cover 'everything'. I actually don't
think anyone would disagree with your suggestion above, but the problem
is that it's already what we're trying to do, and the problems we're
having are with defining and interpreting what is the minimal core of
'really essential' stuff that we should test...
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net


