Update testing policy: how to use Bodhi

Christopher Beland beland at alum.mit.edu
Mon Mar 29 17:51:29 UTC 2010


I think Adam's plan is sensible, and would be a significant improvement
over the current system.  If we know we like the general concept of
disaggregation, the next question is what this would look like in
practice.

On Fri, 2010-03-26 at 17:13 -0700, Jesse Keating wrote:
> So that's just separating aggregate positive karma from aggregate
> negative karma.  Basically saying that your update must have a net +3
> positive, without /any/ negative.
> 
> Unfortunately many people fail at understanding what "regression"
> actually means, and thus can "DOS" an update inappropriately, which is
> kind of why we've aggregated both negative and positive together.

I don't think Denial of Service is really a concern here.  It would be
far better for a maintainer to have to manually review feedback before
pushing an update, than for an update with a regression to go out.
Automation is most useful in boring cases where nothing unexpected has
happened, and I don't think it's a substitute for active attention from
package maintainers in more complex cases.

I think disaggregating the different types of feedback into more than
just positive and negative will make that feedback more useful,
especially if the HTML form is well designed.  I'm imagining something
like this:

>>
* What type of testing did you do?
  * Installed only
  * Everyday use
  * Attempted to reproduce bug #______
  * Ran test plan __[link]__

If you experienced any problems, please file a report in bugzilla and
use the resulting bug number(s) below.

* I encountered bug ______ in this release.
  * Did you have the same problem(s) in version # ______ [currently 
    released]? (yes/no/don't know)
* I could not reproduce bug ______ in this release.

Comments: _____

[Submit feedback button]
<<
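
To make the mapping from those form fields to stored feedback concrete,
here's a rough sketch in Python (Bodhi is written in Python, but every
name below is my own invention, not anything the real Bodhi uses) of
what a single submission might carry:

>>
# Hypothetical shape of one submitted feedback record; none of these
# names come from the actual Bodhi code base.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TesterFeedback:
    tester: str         # FAS username of the tester
    testing_type: str   # "installed", "everyday", "reproduce", "test_plan"
    # Bugzilla numbers the tester encountered in this release:
    bugs_encountered: List[int] = field(default_factory=list)
    # Bugs the tester could not reproduce in this release:
    bugs_not_reproduced: List[int] = field(default_factory=list)
    # "Did you have the same problem(s) in the currently released version?"
    also_in_stable: Optional[bool] = None
    comment: str = ""
<<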

Because the system knows which bugs the update is supposed to fix, it
can take the answers given into account and automatically determine
whether the tester is reporting a failure to fix as intended or a
regression (possibly involving a new bug).  The system should also be
able to determine if people file reports that say "this update doesn't
fix this bug that it doesn't claim to fix".  The sentiment that others
have expressed so far is that this type of feedback should be
suppressed, but I think if the user interface attempts to prevent people
from doing that, they'll just try to shoehorn it in anyway.  I think
it's better to accept that feedback and segregate it so it can be
ignored.  (Or used for other purposes, like highlighting common bugs in
Bugzilla.)  The form I suggested above also strongly encourages people
to file newly discovered bugs; even if they aren't new in this
particular release, that's still very useful.
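
As a rough illustration of the triage I have in mind (assuming the
hypothetical TesterFeedback record sketched above; this is not how
Bodhi actually stores or sorts anything), the sorting could boil down
to something like:

>>
def classify(feedback, claimed_bugs):
    """Sort one TesterFeedback record into the buckets described above.

    claimed_bugs: set of Bugzilla numbers this update says it fixes.
    Returns a list of (bug, category) pairs.
    """
    results = []
    for bug in feedback.bugs_not_reproduced:
        if bug in claimed_bugs:
            results.append((bug, "fix confirmed"))       # intended fix works
        else:
            results.append((bug, "unrelated feedback"))  # segregate, don't block
    for bug in feedback.bugs_encountered:
        if bug in claimed_bugs:
            results.append((bug, "failure to fix"))      # claims to fix it, doesn't
        elif feedback.also_in_stable is False:
            results.append((bug, "regression"))          # definitely new in this update
        elif feedback.also_in_stable:
            results.append((bug, "pre-existing bug"))    # already in stable; pass it on
        else:
            results.append((bug, "possible regression")) # tester doesn't know; needs a human
    return results
<<

Only the "failure to fix" and "regression" buckets would need to feed
any blocking logic; the rest can be recorded, ignored, or passed along
to Bugzilla as appropriate.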

I agree it's not useful to distill all of this feedback into a single
number.  Maintainers can set their own preferences, but good default
logic seems to me to be (a rough sketch in code follows the list):
* Don't auto-push unless at least one person says the bug we're trying
to fix is actually fixed.  (And only certain people count for critical
path packages.)
* Don't auto-push if anyone says the bug we're trying to fix isn't
fixed.
* Don't auto-push if anyone says there are new bugs in this release that
definitely weren't in the previous one.
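
Continuing the same purely hypothetical sketch (again, this is not
Bodhi's real auto-push or critical path logic), those three defaults
might look roughly like:

>>
def should_autopush(classified, is_critical_path, proven_testers):
    """Apply the three default rules above.

    classified: list of (tester, bug, category) triples, e.g. built by
    running classify() over every piece of feedback on the update.
    proven_testers: usernames whose confirmations count for critical
    path packages.
    """
    def counts(tester):
        # For critical path packages, only certain people count.
        return (not is_critical_path) or (tester in proven_testers)

    confirmed = any(cat == "fix confirmed" and counts(tester)
                    for tester, bug, cat in classified)
    fix_failed = any(cat == "failure to fix" for _, _, cat in classified)
    regressed = any(cat == "regression" for _, _, cat in classified)

    # Rule 1: at least one qualified confirmation of the intended fix.
    # Rule 2: nobody says the bug we're trying to fix isn't fixed.
    # Rule 3: nobody reports a bug that definitely wasn't in the
    #         previous release.
    return confirmed and not fix_failed and not regressed
<<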

Adam, your goal of being able to take multiple types of feedback in a
single "Submit" is nice from a usability perspective.  It would require
either rearranging or repeating the form suggested above, or making the
form dynamic.  An alternative that might simplify things along different
lines would be to simply require testers to "Submit" more than once,
making different choices on each run.

On Sat, 2010-03-27 at 03:50 +0100, Kevin Kofler wrote:
> > 2. I have tried this update in my regular day-to-day use and seen a
> > regression: bug #XXXXXX.
>
> The problem is that type 2 feedback can also be invalid, e.g.:
> * the regression may actually be caused by another update which is not part 
> of the update group being commented on,
> * the regression may be caused by an earlier update which is already stable, 
> the user just didn't notice it right away (so this is actually a "doesn't 
> fix a bug which the update doesn't claim it fixes" situation, not a 
> regression; delaying or blocking the update does nothing to fix the 
> regression which already slipped through to stable),
> * the "regression" may actually be an issue with:
> - yum or PackageKit (usually already fixed, but the user needs to update yum 
> or PackageKit first),
> - the user's setup (invalid repository configuration, attempts to mix 
> different versions of a multilib package etc.),
> - mirroring or some other networking issue completely unrelated to the 
> update,
> etc.

Some of these situations are addressed by asking, "Does this also
occur in the current release, number ___?"

Your point is accurate; soliciting feedback from a community of
testers is going to produce some number of false alarms.  Ideally, when
negative feedback is received that appears to indicate a regression,
maintainers will stop and attempt to figure out what's going on!  If
the symptom simply indicates a problem in another area, that's a
useful discovery, and the corresponding bug report can get passed
on.

We could reduce the number of false alarms by restricting feedback to
people who are likely to be more thorough and skillful
(e.g. ProvenTesters), but that would also reduce the amount of testing
that gets done, and increase the number of problems that sneak through
unnoticed.

My suggestion would be to design a new form and try it out in practice
for a little while.  If there are still certain categories of feedback
that annoy maintainers (e.g. problems in yum), we can add a checklist
to the top of the form, educating testers on how to do some minimal
troubleshooting on their own in order to provide more useful
feedback.  But the more demands the feedback form makes on testers,
the fewer people will use it.

-B.



