Re: Direct to stable updates

Monday, 14 November 2022

Kevin Fenzi wrote:
...
> For a few years I was keeping track of updates that caused big
> problems:
>
> https://fedoraproject.org/wiki/Updates_Lessons

That is also very anecdotal evidence, though. And with only one exception 
(the broken dependency in celt), the updates in the above list are 
all:
* either updates that made it through to stable despite being broken, in 
all cases since March 2010 (i.e., after the dnssec-conf one that was the 
trigger for the main Bodhi crackdown) despite (or even because of) the new 
update policies. (In other words, the strict rules have NOT prevented the 
breakage in stable. In at least one case (the 2014-03 one), you have 
explicitly documented autokarma as the cause of the breakage: 
"User:Adamwill noticed several cases where updates had been submitted to 
stable despite a valid AutoQA test failure, usually not via a manual push 
but via karma automatism.")
* or critical updates that did not make it through to stable in a timely 
manner, usually because of the karma requirements. (In other words, those 
issues were directly CAUSED by the strict policies!)

And this list is by no means complete. I remember many more bad updates and 
delayed updates (sometimes even together: a bad update gets through instantly 
via autokarma, then the regression fix is stuck in testing for days because 
the testers just did not care about and/or were unable to test the regressed 
use case), too many to write down in a list like this (and indeed, I have no 
such list). Your list has only the worst failures of the bureaucratic update 
policy.

...
> The orig "very bad" update was a dbus update that broke
> everything.

That was one of the issues that had triggered a perceived demand for 
stricter rules, but the Bodhi crackdown was actually implemented only 
starting from March 2010, after the dnssec-conf update (which was pushed 
directly to stable, as was still allowed in February 2010).

The D-Bus one actually affected a large range of users, but it had a trivial 
workaround (update from the CLI, i.e.: su -c "yum update" – yum because DNF 
was not a thing at the time). The dnssec-conf one affected only a very small 
percentage of Fedora users, because most users do not run their own DNS 
server software. (Not even most domain owners, because name servers are 
typically provided by the registrar.) Neither was as catastrophic a failure 
as the mob calling for stricter rules had painted it.

...
> In any case I have seen our current updates system working and
> blocking tons of harmful updates over the years.

But you do not have a list proving that. (The list you linked to is not it, 
because I see exactly one entry in it where a bad update did not make 
it beyond testing, the 2010-07-02 celt one.) And even if you had one, it 
would not prove that the maintainers would not have used updates-testing the 
intended way without being forced to by software.

What I have seen (and no, I do not have a list either) is updates making it 
through to stable (due to karma, or because the 7 or 14 days just ran out 
without anybody bothering to test them) and causing a bad regression, and 
then the regression fix needlessly sitting in testing for up to two weeks 
instead of being pushed directly to stable to undo the breakage. Not only 
does that delay the fix for those users unlucky enough to get the bad 
update, but it also greatly increases the number of affected users, because 
many users do not update daily and might not have noticed the regression if 
the fix had been pushed the next day rather than 1-2 weeks later (window of 
exposure). In some cases, I was the maintainer who was unable to push the 
regression fix out any sooner due to the rules, so I definitely know that 
the rules are to blame. (And before the rules were enforced, I had always 
used direct stable pushes to fix regressions, leading to happy users.)

Due to the window of exposure effect, 7 regressions slipping through, each 
fixed the next day with a direct stable push, are not necessarily worse than 
1 regression slipping through (the other 6 having been prevented by the 
rules), fixed only 7 days later due to the update rules. And in practice, 
the testing is far from catching 6 bad updates out of 7. A lot slips 
through, because testers are unable to test, e.g., library packages 
exhaustively.
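
To put rough numbers on the window-of-exposure comparison above, here is a 
toy model, not a measurement. The assumptions are mine: every user installs 
updates on any given day with the same independent probability (the 
hypothetical P_UPDATE), a user who installs the bad update stays broken 
until the first day they update after the fix is published, and all 
regressions are equally severe.

```python
def expected_broken_days(window_days: int, p_update: float) -> float:
    """Expected broken days per user for ONE regression (toy model).

    The bad update is in stable on days 1..window_days; the fix is
    available from day window_days + 1. Each day a user updates with
    probability p_update, independently of other days (an assumption).
    """
    q = 1.0 - p_update
    # Mean day of the user's first update after the window has closed.
    expected_fix_day = window_days + 1.0 / p_update
    total = 0.0
    for k in range(1, window_days + 1):
        # Probability that the user's first update in the window is on day k,
        # i.e. that this is the day they pick up the bad update.
        p_hit_on_day_k = q ** (k - 1) * p_update
        total += p_hit_on_day_k * (expected_fix_day - k)
    return total


def fraction_ever_hit(window_days: int, p_update: float) -> float:
    """Fraction of users who install the bad update at all (toy model)."""
    return 1.0 - (1.0 - p_update) ** window_days


P_UPDATE = 0.2  # hypothetical: a user updates on one day in five

# 7 regressions, each fixed the next day by a direct stable push.
fast = 7 * expected_broken_days(1, P_UPDATE)
# 1 regression, fixed only after 7 days in testing.
slow = 1 * expected_broken_days(7, P_UPDATE)

print(f"7 regressions, 1-day window: {fast:.2f} expected broken days per user")
print(f"1 regression,  7-day window: {slow:.2f} expected broken days per user")
print(f"users hit by a 1-day window: {fraction_ever_hit(1, P_UPDATE):.0%}")
print(f"users hit by a 7-day window: {fraction_ever_hit(7, P_UPDATE):.0%}")
```

Under these assumptions the two totals come out equal, so the slow policy 
only comes out ahead in this model if testing really does prevent 6 out of 7 
regressions. A different update pattern or unequal severities would shift 
the numbers.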

...
> I think direct to stable is a bad idea.

I think it is a good idea under some conditions (critical regression or 
security fix, entirely new package, or package that was previously 
uninstallable or unusable), provided that stable is actually open for pushes.

Of course it does not work during a freeze, which (together with the fact 
that karma thresholds can be reached even before the updates reach testing, 
leading to a special case in which direct stable pushes are actually still 
possible after all) is the issue that started this thread. But that should 
be solved the way I already mentioned (automatically divert the push to 
testing instead, but keep the update queued for stable so it goes out as a 
0-day update as soon as the freeze lifts).
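
The diversion described above could be sketched roughly as follows. This is 
only an illustration of the proposed behavior, not Bodhi's actual code or 
API: the Update class, its fields, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """Hypothetical stand-in for an update; not a real Bodhi object."""
    name: str
    location: str = "pending"        # "pending", "testing", or "stable"
    queued_for_stable: bool = False  # remembered stable request during a freeze


def push_to_stable(update: Update, frozen: bool) -> None:
    """Handle a stable push request, diverting it while stable is frozen."""
    if frozen:
        # Divert to testing, but remember that stable was requested.
        update.location = "testing"
        update.queued_for_stable = True
    else:
        update.location = "stable"
        update.queued_for_stable = False


def lift_freeze(updates: list[Update]) -> list[Update]:
    """Release every queued update to stable once the freeze ends."""
    released = []
    for update in updates:
        if update.queued_for_stable:
            update.location = "stable"
            update.queued_for_stable = False
            released.append(update)
    return released


fix = Update("example-regression-fix")
push_to_stable(fix, frozen=True)
print(fix.location, fix.queued_for_stable)
lift_freeze([fix])
print(fix.location, fix.queued_for_stable)
```

The point of keeping the flag on the update itself is that nobody has to 
remember to resubmit the push when the freeze ends.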

        Kevin Kofler
