Hey folks! Just wanted to give a heads-up about this ticket:
https://pagure.io/fedora-comps/pull-request/777
that would define a critical path for Server. That would mean updates containing any of those packages or their dependencies would be 'critical path' updates, meaning they have higher requirements to be pushed stable (+2 karma or 14 days waiting, as opposed to +1 karma or 7 days waiting for regular updates), and openQA would test them automatically and they would be blocked from stable if the tests fail.
See this devel@ discussion on expanding the critpath definition for background: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
Please chime in if you have any thoughts/concerns/suggestions. Thanks!
Hello Adam,
Adam Williamson [2022-10-18 10:37 -0700]:
that would define a critical path for Server. That would mean updates containing any of those packages or their dependencies would be 'critical path' updates, meaning they have higher requirements to be pushed stable (+2 karma or 14 days waiting, as opposed to +1 karma or 7 days waiting for regular updates)
What is the reason for increasing the waiting period, what would it achieve?
- It sets the wrong motivation, as it indiscriminately penalizes all of these updates. You can not land faster (or at all) by adding proper CI, and instead can blame "the community" for not having spotted a regression once the update gets in eventually.
- It will easily lead to packages not being updated at all any more. We release every 14 days. Updates in the current stable are usually getting karma feedback rather easily, but we also support the previous stable, or the pending fork (like Fedora 37 a few weeks ago), and they usually don't get *any* karma. We can work around that by giving karma ourself from our team after gating tests get green [1], but that is just fighting the process and gives zero new information.
- It is a step backwards. CI helps to increase velocity of the distribution, this step decreases it. This is walking away from CI and towards more human interaction.
- It tries to address the problem at the wrong place. What we are sorely missing in Fedora is reverse dependency testing, i.e. making sure that a package update does not break *other* packages. New selinux-policy, systemd, or kernel versions break the distro all the time, but they wouldn't be covered by the "critical path". Human manual testing of e.g. freeipa or cockpit *could* spot this, but (1) it takes much more work than just running the automated tests that we already have, and (2) it barks at the wrong tree.
As a workaround, we run Cockpit's integration tests (which are kinda sorta "Fedora/RHEL server QA") against fedora-37 + updates-testing twice a week, and occasionally we are able to -1 a package update in bodhi that breaks things. But 7 days are more than enough for that. Our problem has been that at least in the past, e.g. selinux-policy updates got like +5 "WFM" karma and went into -updates in *hours*, way too fast to catch it with that method. A human "WFM" is simply not good enough for these low-level package updates, and it also doesn't address the "test consumers of that package" problem.
So in summary, I don't believe this is helpful at all, sorry.
and openQA would test them automatically and they would be blocked from stable if the tests fail.
That is a good move, of course. Any failing gating test should block updates (for every package), otherwise we'll never fix them. (Developers can waive, of course, but without that you wouldn't even know)
Martin
[1] And I'd automate that, hello https://github.com/osbuild/fedora-bot
On Wed, 2022-10-19 at 07:13 +0200, Martin Pitt wrote:
Hello Adam,
Adam Williamson [2022-10-18 10:37 -0700]:
that would define a critical path for Server. That would mean updates containing any of those packages or their dependencies would be 'critical path' updates, meaning they have higher requirements to be pushed stable (+2 karma or 14 days waiting, as opposed to +1 karma or 7 days waiting for regular updates)
What is the reason for increasing the waiting period, what would it achieve?
It's essentially a byproduct. Please see the devel@ discussion I linked. We do not currently have any other good way to gate updates on the openQA tests. We can invent one but it's substantially more work than just adding things to the critpath. I'm 75% done with the "do it by adding things to critpath" approach because nobody objected to that idea on devel@ , so it would've been great to have this feedback in *that* discussion :(
The issues you raise are all reasonable, but kind of not something new: we've never had an official way to override the critical path requirements if we think a package has good enough CI, so if you want to call that a problem, it's been a problem forever.
We can leave cockpit out of the definition if you like, but that would mean it won't be gated by the distribution-wide gating policy. As I mentioned in the devel@ discussion, an alternative is to configure gating in the package repos - see https://gating-greenwave.readthedocs.io/en/latest/policies.html#remoterule-c... - but this requires the package maintainer(s) to maintain the gating policy. It also has the disadvantage that we don't resolve dependencies like we do for the critpath. If we put cockpit in the critpath definition, then cockpit *and all its dependencies* are added to the critpath, and thus will have the tests run and be gated on them. So if some dependency of cockpit breaks cockpit, we'll find out, and the update will be gated. If we just have cockpit in the openQA scheduler allowlist and add a gating.yaml to the cockpit repo, we do not find out if some dependency of cockpit breaks cockpit before it happens (unless that dep happens to be in the critical path some other way).
CCing devel@ since this brings up points wider than the Server context. The rest of Martin's mail is below for reference.
It sets the wrong motivation, as it indiscriminately penalizes all of these updates. You can not land faster (or at all) by adding proper CI, and instead can blame "the community" for not having spotted a regression once the update gets in eventually.
It will easily lead to packages not being updated at all any more. We release every 14 days. Updates in the current stable are usually getting karma feedback rather easily, but we also support the previous stable, or the pending fork (like Fedora 37 a few weeks ago), and they usually don't get *any* karma. We can work around that by giving karma ourself from our team after gating tests get green [1], but that is just fighting the process and gives zero new information.
It is a step backwards. CI helps to increase velocity of the distribution, this step decreases it. This is walking away from CI and towards more human interaction.
It tries to address the problem at the wrong place. What we are sorely missing in Fedora is reverse dependency testing, i.e. making sure that a package update does not break *other* packages. New selinux-policy, systemd, or kernel versions break the distro all the time, but they wouldn't be covered by the "critical path". Human manual testing of e.g. freeipa or cockpit *could* spot this, but (1) it takes much more work than just running the automated tests that we already have, and (2) it barks at the wrong tree.
As a workaround, we run Cockpit's integration tests (which are kinda sorta "Fedora/RHEL server QA") against fedora-37 + updates-testing twice a week, and occasionally we are able to -1 a package update in bodhi that breaks things. But 7 days are more than enough for that. Our problem has been that at least in the past, e.g. selinux-policy updates got like +5 "WFM" karma and went into -updates in *hours*, way too fast to catch it with that method. A human "WFM" is simply not good enough for these low-level package updates, and it also doesn't address the "test consumers of that package" problem.
So in summary, I don't believe this is helpful at all, sorry.
and openQA would test them automatically and they would be blocked from stable if the tests fail.
That is a good move, of course. Any failing gating test should block updates (for every package), otherwise we'll never fix them. (Developers can waive, of course, but without that you wouldn't even know)
Martin
[1] And I'd automate that, hello https://github.com/osbuild/fedora-bot _______________________________________________ server mailing list -- server@lists.fedoraproject.org To unsubscribe send an email to server-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/server@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
Hello Adam,
Adam Williamson [2022-10-19 8:31 -0700]:
On Wed, 2022-10-19 at 07:13 +0200, Martin Pitt wrote:
Hello Adam,
Adam Williamson [2022-10-18 10:37 -0700]:
that would define a critical path for Server. That would mean updates containing any of those packages or their dependencies would be 'critical path' updates, meaning they have higher requirements to be pushed stable (+2 karma or 14 days waiting, as opposed to +1 karma or 7 days waiting for regular updates)
What is the reason for increasing the waiting period, what would it achieve?
It's essentially a byproduct. Please see the devel@ discussion I linked. We do not currently have any other good way to gate updates on the openQA tests. We can invent one but it's substantially more work than just adding things to the critpath.
The devel@ thread does not mention/justify the inceased waiting period at all. So it's just intrinsic to the "critical path" definition? Because all of its other properties totally make sense with any other explicit wait time.
I'm 75% done with the "do it by adding things to critpath" approach because nobody objected to that idea on devel@ , so it would've been great to have this feedback in *that* discussion :(
Sorry, I'm not on devel@ (email overflow, plus I was on PTO when that happened).
The issues you raise are all reasonable, but kind of not something new: we've never had an official way to override the critical path requirements if we think a package has good enough CI, so if you want to call that a problem, it's been a problem forever.
Yes, it is. It seems utterly hard to get Fedora to do a "tests gate by default" policy, as pretty much everything is opt-in. Opt-in gating is essentially a complete waste of time IMO and just makes everyone involved grumpy.
We can leave cockpit out of the definition if you like, but that would mean it won't be gated by the distribution-wide gating policy. As I mentioned in the devel@ discussion, an alternative is to configure gating in the package repos - see https://gating-greenwave.readthedocs.io/en/latest/policies.html#remoterule-c...
- but this requires the package maintainer(s) to maintain the gating
policy.
Fully agreed, let's not do this. The current form of gating.yml is ... not useful and just a pain.
It also has the disadvantage that we don't resolve dependencies like we do for the critpath. If we put cockpit in the critpath definition, then cockpit *and all its dependencies* are added to the critpath, and thus will have the tests run and be gated on them. So if some dependency of cockpit breaks cockpit, we'll find out, and the update will be gated.
Ah, that is a good thing! The thread only loosely mentioned this:
Ideally, we should have reverse dependency updates to trigger FreeIPA tests and give their results a visibility but don't block on failures.
(FWIW, that is not my definition of "ideally"). If this is the case, I'm in. We can still do the "increase Karma" workaround, if the waiting period is not the primary reason why you want to do this, but just a byproduct of using critical-path. Then that would feel much less like an abuse.
CCing devel@ since this brings up points wider than the Server context. The rest of Martin's mail is below for reference.
(FTR, respecting Reply-To/avoiding cross-posting, so I dropped -devel@)
Thanks!
Martin
On Thu, 2022-10-20 at 07:35 +0200, Martin Pitt wrote:
Hello Adam,
Adam Williamson [2022-10-19 8:31 -0700]:
On Wed, 2022-10-19 at 07:13 +0200, Martin Pitt wrote:
Hello Adam,
Adam Williamson [2022-10-18 10:37 -0700]:
that would define a critical path for Server. That would mean updates containing any of those packages or their dependencies would be 'critical path' updates, meaning they have higher requirements to be pushed stable (+2 karma or 14 days waiting, as opposed to +1 karma or 7 days waiting for regular updates)
What is the reason for increasing the waiting period, what would it achieve?
It's essentially a byproduct. Please see the devel@ discussion I linked. We do not currently have any other good way to gate updates on the openQA tests. We can invent one but it's substantially more work than just adding things to the critpath.
The devel@ thread does not mention/justify the inceased waiting period at all. So it's just intrinsic to the "critical path" definition? Because all of its other properties totally make sense with any other explicit wait time.
Ah, yes, sorry if that wasn't clear. That is essentially the entire original practical purpose of critpath: packages in the critpath are subject to increased requirements. Exactly what the requirements for critpath and non-critpath packages *are* has changed several times over the years, but as of now, AIUI, non-critpath updates can be pushed stable if they have +1 karma or have been in updates-testing for a week. critpath updates can be pushed stable if they have +2 karma or have been in updates-testing for two weeks.
Note, if you set e.g. +3 autopush on your update, that means it will be *automatically* pushed when it reaches +3 with no -1s, but you can still *manually* push it as long as it reaches the requirements above. The minimum requirements act as floors for autopush; you can't set an autopush of 1 for a critpath update. Well, you shouldn't be able to. I think we've had bugs with that before...
I came along later and kinda piggybacked on the critical path as the set of packages to run openQA updates tests on, simply because I needed *some* way to reduce the set from "all updates" and it was handy and seemed kind appropriate.
I'm 75% done with the "do it by adding things to critpath" approach because nobody objected to that idea on devel@ , so it would've been great to have this feedback in *that* discussion :(
Sorry, I'm not on devel@ (email overflow, plus I was on PTO when that happened).
The issues you raise are all reasonable, but kind of not something new: we've never had an official way to override the critical path requirements if we think a package has good enough CI, so if you want to call that a problem, it's been a problem forever.
Yes, it is. It seems utterly hard to get Fedora to do a "tests gate by default" policy, as pretty much everything is opt-in. Opt-in gating is essentially a complete waste of time IMO and just makes everyone involved grumpy.
Note, gating acts in a kind of "required, but not sufficient" way regarding stable push. An update passing gating doesn't mean it can be pushed stable: it has to pass gating *and* meet the karma/wait time requirement.
Aside from that: so, as for having a "gate on all tests" kind of thing...until fairly recently we couldn't really do that. The tricky bit is knowing what "all tests" are for any given Thing. But since: https://pagure.io/fedora-infrastructure/issue/8989#comment-802213 happened, I guess we could - theoretically, at least - do that. We would need to make sure both Fedora CI and openQA are submitting queued "results" for every test they schedule (openQA does not do this currently, partly because I'd forgotten about this whole thing until reading this mail). Then I guess it would be possible to somehow teach greenwave about this; we'd have to invent a new kind of "all tests" requirement type for greenwave and then teach greenwave that when it sees that requirement type, it needs to query resultsdb for "queued" results to figure out what tests are actually scheduled for the thing in question. There's still a kinda small race condition there - gating status would I guess be "go" for a small window after the Thing was created but before the first test system managed to create a "queued" result. I guess this requirement type could assume there must be at least *one* test run on the Thing to avoid that issue? Anyway, yeah, it's an engineering challenge that I guess just nobody really got around to yet.
It does seem like a tricky thing in a way too, because the whole gating system is designed to be fairly open: we have two main test systems ATM (openQA and Fedora CI), but in theory anyone else can set one up and submit test results that would be considered by greenwave; all it needs is credentials to submit results to resultsdb. And we could add new tests that run on many things ("generic" tests) to Fedora CI or openQA at any time, really. So to use this you'd have to have a high degree of trust that all the tests added by anyone running a 'production' test system for Fedora would be sensible to gate your update on. Or be prepared to issue a lot of waivers, I guess.
We can leave cockpit out of the definition if you like, but that would mean it won't be gated by the distribution-wide gating policy. As I mentioned in the devel@ discussion, an alternative is to configure gating in the package repos - see https://gating-greenwave.readthedocs.io/en/latest/policies.html#remoterule-c...
- but this requires the package maintainer(s) to maintain the gating
policy.
Fully agreed, let's not do this. The current form of gating.yml is ... not useful and just a pain.
It also has the disadvantage that we don't resolve dependencies like we do for the critpath. If we put cockpit in the critpath definition, then cockpit *and all its dependencies* are added to the critpath, and thus will have the tests run and be gated on them. So if some dependency of cockpit breaks cockpit, we'll find out, and the update will be gated.
Ah, that is a good thing! The thread only loosely mentioned this:
Ideally, we should have reverse dependency updates to trigger FreeIPA tests and give their results a visibility but don't block on failures.
(FWIW, that is not my definition of "ideally"). If this is the case, I'm in. We can still do the "increase Karma" workaround, if the waiting period is not the primary reason why you want to do this, but just a byproduct of using critical-path. Then that would feel much less like an abuse.
Yeah, my point here is definitely not to apply the increased requirements, that's just a byproduct.
On the idea of having test systems submit feedback: I thought about this, and the more I think about it, the less it seems like a hack and the more it seems, well, appropriate. Why isn't it just...the right thing to do? openQA exercises cockpit pretty extensively. Your tests also do. Assuming your CI system does actually test the Fedora update in a correct Fedora context, like openQA does...it seems totally valid to me that both our test systems should submit a +1 to Bodhi if the tests pass. Why not? They're testing the software like a human would (probably better). If I was doing manual testing I might just open up Cockpit, click around a bit, and say yeah, that looks good and give it a +1; if that's what we're expecting of humans, what's wrong with having an automated test system that does things rather more thoroughly give +1s?
So, if we do that...your system is 1, my system is 1, that makes 2, which is...all the feedback the updates need to be pushed stable. ;)
server@lists.fedoraproject.org