Kevin Fenzi wrote:
> For a few years I was keeping track of updates that caused big
> problems:
> https://fedoraproject.org/wiki/Updates_Lessons
That is also very anecdotal evidence, though. And with only one exception
(the broken dependency in celt), the updates in the list above are
all:
* either updates that have made it through to stable despite being broken,
in all cases since March 2010 (i.e., after the dnssec-conf one that was the
trigger for the main Bodhi crackdown) despite (or even because of) the new
update policies. (In other words, the strict rules have NOT prevented the
breakage in stable. In at least one case (the 2014-03 one), you have
explicitly documented autokarma as the cause of the breakage:
"User:Adamwill noticed several cases where updates had been submitted to
stable despite a valid AutoQA test failure, usually not via a manual push
but via karma automatism.")
* or critical updates that have not made it through to stable in a timely
manner, usually because of the karma requirements. (In other words, those
issues were directly CAUSED by the strict policies!)
And this list is by no means complete. I remember many more bad updates and
delayed updates (sometimes even together: a bad update gets through instantly
via autokarma, while the regression fix is stuck in testing for days because
the testers just did not care about, or were unable to test, the regressed
use case), too many to write down in a list like this (and indeed, I have no
such list). Your list has only the worst failures of the bureaucratic update
policy.
> The original "very bad" update was a dbus update that broke
> everything.
That was one of the issues that had triggered a perceived demand for
stricter rules, but the Bodhi crackdown was actually implemented only
starting from March 2010, after the dnssec-conf update (which was pushed
directly to stable, as was still allowed in February 2010).
The D-Bus one actually affected a large range of users, but it had a trivial
workaround (update from the CLI, i.e. su -c "yum update", using yum because
DNF did not exist at the time). The dnssec-conf one affected only a very small
percentage of Fedora users, because most users do not run their own DNS
server software. (Not even most domain owners do, because name servers are
typically provided by the registrar.) Neither was as catastrophic a failure as
the mob calling for stricter rules had made it out to be.
> In any case, I have seen our current updates system working and blocking
> tons of harmful updates over the years.
But you do not have a list proving that. (The list you linked to is not it,
because I see only exactly one entry in it where a bad update did not make
it beyond testing, the 2010-07-02 celt one.) And even if you had one, it
would not prove that the maintainers would not have used updates-testing the
intended way without being forced to by software.
What I have seen (and no, I do not have a list either) is updates making it
through to stable (due to karma, or because the 7 or 14 days just ran out
without anybody bothering to test them) and causing a bad regression, and
then the regression fix needlessly sitting in testing for up to two weeks
instead of being pushed directly to stable to undo the breakage. Not only
does that delay the fix for those users unlucky enough to get the bad
update, but it also greatly increases the number of affected users: many
users do not update daily, and they might not have noticed the regression if
the fix had been pushed the next day rather than 1-2 weeks later (window of
exposure). In some cases, I was the maintainer who was unable to push the
regression fix out any sooner due to the rules, so I definitely know that
the rules are to blame. (And before the rules were enforced, I had always
used direct stable pushes to fix regressions, leading to happy users.)
Due to the window of exposure effect, 7 regressions slipping through, each
fixed the next day with a direct stable push, are not necessarily worse than
1 regression slipping through (the other 6 having been prevented by the
rules), fixed only 7 days later due to the update rules. And in practice,
the testing is far from catching 6 bad updates out of 7. A lot slips
through, because testers are unable to test, e.g., library packages
exhaustively.
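One rough way to make the window-of-exposure comparison concrete is to count regression-days, meaning how many days each regression sits in stable before its fix lands. The sketch below is only an illustration under deliberately simplistic assumptions: every regression is taken to hurt the same number of users per day, the fix delay is the only thing that varies, and `regression_days` is a hypothetical helper, not part of any Fedora tool.

```python
def regression_days(regressions: int, days_until_fix: int) -> int:
    """Total days of exposure, assuming (hypothetically) that each
    regression stays live in stable for `days_until_fix` days and that
    all regressions affect the same number of users per day."""
    return regressions * days_until_fix


# Scenario A: 7 regressions slip through, and each one is fixed by a
# direct stable push the next day.
direct_push = regression_days(regressions=7, days_until_fix=1)

# Scenario B: the rules stop 6 of the 7 bad updates, but the one that
# slips through waits 7 days in updates-testing for its fix.
gated_fix = regression_days(regressions=1, days_until_fix=7)

print(direct_push, gated_fix)  # prints: 7 7
```

Under these assumptions the two scenarios come out even, which is what "not necessarily worse" means here. If testing in practice catches fewer than 6 of the 7 bad updates, scenario B only gets worse. For example, 3 regressions each waiting 7 days give 21 regression-days.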
> I think direct to stable is a bad idea.
I think it is a good idea under some conditions (critical regression or
security fix, entirely new package, or package that was previously
uninstallable or unusable), and when stable is actually open for pushes.
Of course it does not work during a freeze, which (together with the fact
that karma thresholds can be reached even before the updates reach testing,
leading to a special case in which direct stable pushes are actually still
possible after all) is the issue that started this thread. But that should
be solved the way I already mentioned (automatically divert the push to
testing instead, but keep the update queued for stable so it goes out as a
0-day update as soon as the freeze lifts).
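The proposed handling of stable pushes during a freeze can be sketched as a tiny state machine. This is a minimal, hypothetical model that assumes a single release with one freeze flag. `Update`, `Release`, `push` and `lift_freeze` are illustrative names and do not reflect Bodhi's actual API or data model.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    name: str
    requested: str = "stable"  # what the maintainer asked for
    repo: str = "pending"      # where the update currently lives


@dataclass
class Release:
    frozen: bool = False
    # Updates that requested stable but were diverted during the freeze.
    queued_for_stable: list[Update] = field(default_factory=list)

    def push(self, update: Update) -> None:
        if update.requested == "stable" and self.frozen:
            # Divert to testing, but keep the stable request queued.
            update.repo = "testing"
            self.queued_for_stable.append(update)
        else:
            update.repo = update.requested

    def lift_freeze(self) -> list[Update]:
        """End the freeze and release queued updates as 0-day updates."""
        self.frozen = False
        released, self.queued_for_stable = self.queued_for_stable, []
        for update in released:
            update.repo = "stable"
        return released


release = Release(frozen=True)
fix = Update("foo-1.2-2")  # hypothetical package build
release.push(fix)
print(fix.repo)  # prints: testing
release.lift_freeze()
print(fix.repo)  # prints: stable
```

The design choice here is that the maintainer's stable request is never dropped. It is only deferred, so the update lands in stable as soon as the freeze lifts without a second manual action.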
Kevin Kofler