<p>+1</p>
<p>I really like the proposal, don't really have anything to add. Makes sense to me. Kudos. </p>
<p>-AdamM (From Android)</p>
<p><blockquote type="cite">On Mar 26, 2010 5:51 PM, "Adam Williamson" &lt;<a href="mailto:awilliam@redhat.com">awilliam@redhat.com</a>&gt; wrote:<br><br>Hi, folks. At the last QA meeting, I volunteered (dumb of me!) to draft<br>
a policy for testing updates - basically, a policy for what kind of<br>
feedback should be posted in Bodhi for candidate updates.<br>
<br>
This turns out to be pretty hard. =) Thinking about it from a<br>
high-level perspective like this, I think it becomes pretty clear that<br>
the current system is just broken.<br>
<br>
The major problem is it attempts to balance things that don't really<br>
balance. It lets you say 'works for me' or 'doesn't work', then counts<br>
each kind and subtracts the second count from the first to give you a<br>
'rating' for the update.<br>
<br>
This doesn't really mean anything. As has been rehashed many times,<br>
there are situations where an update with a positive rating shouldn't go<br>
out, and situations where an update with a negative rating should. So<br>
the current system isn't really that great.<br>
<br>
I can't think of a way to draft a policy to guide the use of the current<br>
system in such a way that it will be really reliable. I think it'd be<br>
much more productive to revise the Bodhi feedback system alongside<br>
producing a policy.<br>
<br>
So, here's a summary of what the new system should aim for.<br>
<br>
At the high level, what is this system for? It's there for three<br>
purposes:<br>
<br>
1) to provide maintainers with information they can use in deciding<br>
whether to push updates.<br>
<br>
2) to provide a mechanism for mandating a certain minimum level of<br>
manual testing for 'important' packages, under Bill Nottingham's current<br>
update acceptance criteria proposal.<br>
<br>
3) to provide an 'audit trail' we can use to look back on how the<br>
release of a particular update was handled, in the case where there are<br>
problems.<br>
<br>
Given the above, we need to capture the following types of feedback, as<br>
far as I can tell. I don't think there is any sensible way to assign<br>
numeric values to any of this feedback. I think we have to trust people<br>
to make sensible decisions once the feedback is provided, in accordance<br>
with any policy we decide to implement on what character updates should have.<br>
<br>
1. I have tried this update in my regular day-to-day use and seen no<br>
regressions.<br>
<br>
2. I have tried this update in my regular day-to-day use and seen a<br>
regression: bug #XXXXXX.<br>
<br>
3. (Where the update claims to fix bug #XXXXXX) I have tried this update<br>
and found that it does fix bug #XXXXXX.<br>
<br>
4. (Where the update claims to fix bug #XXXXXX) I have tried this update<br>
and found that it does not fix bug #XXXXXX.<br>
<br>
5. I have performed the following planned testing on the update: (link<br>
to test case / test plan) and it passes.<br>
<br>
6. I have performed the following planned testing on the update: (link<br>
to test case / test plan) and it fails: bug #XXXXXX.<br>
<br>
Testers should be able to file multiple types of feedback in one<br>
operation - for instance, 4+1 (the update didn't fix the bug it claimed<br>
to, but doesn't seem to cause any regressions either). Ideally, the<br>
input of feedback should be 'guided' with a freeform element, so there's<br>
a space to enter bug numbers, for instance.<br>
<br>
There is one type of feedback we don't really want or need to capture:<br>
"I have tried this update and it doesn't fix bug #XXXXXX", where the<br>
update doesn't claim to fix that bug. This is quite a common '-1' in the<br>
current system, and one we should eliminate.<br>
<br>
I think Bill's proposed policy can be modified quite easily to fit this.<br>
All it would need to say is that for 'important' updates to be accepted,<br>
they would need to have one 'type 1' feedback from a proven tester, and<br>
no 'type 2' feedback from anyone (or something along those lines; this<br>
isn't the main thrust of my post, please don't sidetrack it too<br>
much :>).<br>
<br>
The system could do a count of how many of each type of feedback any<br>
given update has received, but I don't think there's any way we can<br>
sensibly do some kind of mathematical operation on those numbers and<br>
have a 'rating' for the update. Such a system would always give odd /<br>
undesirable results in some cases, I think (just as the current one<br>
does). I believe the above system would be sufficiently clear that there<br>
would be no need for such a number, and we would be able to evaluate<br>
updates properly based just on the information listed.<br>
<br>
What are everyone's thoughts on this? Thanks!<br>
--<br>
Adam Williamson<br>
Fedora QA Community Monkey<br>
IRC: adamw | Fedora Talk: adamwill AT fedoraproject DOT org<br>
<a href="http://www.happyassassin.net" target="_blank">http://www.happyassassin.net</a><br>
</blockquote></p>