Hello CI folks!
Several times somebody has asked me: "Why are there two CI jobs in Fedora dist-git PRs, when Zuul's rpm-test and the *Fedora CI - dist-git test* run the exact same tests?"
My answer so far has been: "I don't know why it happened, but now it works as a nice backup plan. It is very common that one of them breaks [1]. When you see an unreadable infrastructure failure [2] in one of them, you have a pretty good chance that the other one will still work."
With the idea to switch Zuul to the Testing Farm (rpm-tmt-test), this benefit would be lost. We would just have two CI test runs where both have infrastructure failures at the same time (see for example [3]), both run on the same system, and both run the same tests.
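To put rough numbers on the backup argument, here is a minimal sketch. It assumes each CI system hits an infrastructure failure on a given run with some probability; the 5% value is only an illustrative assumption, not a measured rate, and the function names are hypothetical. If the two systems fail independently, both fail together with the product of their failure probabilities; if every failure comes from one shared backend, an outage takes out both runs at once.

```python
def both_fail_independent(p_a: float, p_b: float) -> float:
    """Probability that two independent CI systems both fail on the same run."""
    return p_a * p_b


def both_fail_shared_backend(p_backend: float) -> float:
    """Probability that both fail when every failure comes from one shared backend."""
    return p_backend


# Illustrative assumption: a 5% infrastructure failure rate per run.
p = 0.05
print(f"independent systems: {both_fail_independent(p, p):.4f}")
print(f"shared backend:      {both_fail_shared_backend(p):.4f}")
```

Under these assumptions, sharing a backend removes the benefit of the second run entirely, which is the concern raised above.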
Could you please consider not doing that, at least until the overall stability of the CI improves?
Thanks,
[1] https://pagure.io/fedora-ci/general/issue/44
[2] https://pagure.io/fedora-ci/general/issue/43
[3] https://src.fedoraproject.org/rpms/python-virtualenv/pull-request/59
Hi Miro,
On Mon, Feb 7, 2022 at 9:20 AM Miro Hrončok mhroncok@redhat.com wrote:
Hello CI folks!
Several times somebody has asked me: "Why are there two CI jobs in Fedora dist-git PRs, when Zuul's rpm-test and the *Fedora CI - dist-git test* run the exact same tests?"
Actually, only Zuul should stay ...
My answer so far has been: "I don't know why it happened, but now it works as a nice backup plan. It is very common that one of them breaks [1]. When you see an unreadable infrastructure failure [2] in one of them, you have a pretty good chance that the other one will still work."
This is actually not true; it holds only for STI, which we are trying to deprecate.
With the idea to switch Zuul to the Testing Farm (rpm-tmt-test), this benefit would be lost. We would just have two CI test runs where both have infrastructure failures at the same time (see for example [3]), both run on the same system, and both run the same tests.
It is not a switch; Zuul was never able to run tmt tests.
Could you please consider not doing that, at least until the overall stability of the CI improves?
The current SLA for CI is a 5% error rate. I will check how it looks for Fedora CI only and let you know soon.
The outage you were hit by was:
https://pagure.io/fedora-ci/general/issue/315
Which is:
https://gitlab.com/testing-farm/artemis/-/issues/185
I will let my team know to prioritize it.
As for the unreadable errors, that is something we can improve (and already did, but not for this error).
I will file a downstream issue ...
Best regards, /M
Thanks,
Miro Hrončok
Phone: +420777974800 IRC: mhroncok _______________________________________________ CI mailing list -- ci@lists.fedoraproject.org To unsubscribe send an email to ci-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/ci@lists.fedoraproject.org Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
Also, there is this issue, which is blocking us from doing a good job:
https://pagure.io/fedora-infrastructure/issue/10532
It is blocking all Fedora Rawhide testing.
/M
-- Miroslav Vadkerti :: Senior Principal QE :: Testing Farm / Linux QE IRC mvadkert #tft #tmt #osci :: Mobile +420 773 944 252 Remote Czech Republic :: Red Hat Czech s.r.o
Correction: the issue actually affects all Fedora releases :(
Somebody removed all Fedora nightlies from AWS during the weekend.
So we are now blocked on any testing ...
/M
Hi everybody,
On Mon, Feb 7, 2022 at 9:20 AM Miro Hrončok mhroncok@redhat.com wrote:
Could you please consider not doing that, at least until the overall stability of the CI improves?
I have a suggestion that I believe could solve more than this problem.
Every CI system can connect to a machine, run some commands and then retrieve results at the end.
The nice thing about Testing Farm is that it should be able to scale really well as it can provision resources (not only) in the cloud. Miro is right that if Zuul always delegates tmt testing to Testing Farm, then TF outage will affect both Fedora CI and Zuul.
What if, instead of delegating to Testing Farm, we simply request resources (a VM) from TF and use that VM as a standard node in CI? The CI system (Zuul, Jenkins, GitLab CI, ...) would simply connect to a TF-provided node and run the "tmt/you-name-it" command there. That way, if Zuul cannot get a VM from Testing Farm for whatever reason, it could still fall back and run the tests on one of its own nodes (Zuul would know what tmt command to run; currently it doesn't).
This would also help with generic tests because CI would have better control over the input/output.
To be clear: I am not suggesting to drop the current "delegate" API, I am just suggesting to make it optional and allow CI systems to work with TF-provided resources directly, if it makes sense for the particular use case or integration.
WDYT?
Thanks, Michal
Hi,
On Mon, Feb 7, 2022 at 5:19 PM Michal Srb msrb@redhat.com wrote:
What if, instead of delegating to Testing Farm, we simply request resources (a VM) from TF and use that VM as a standard node in CI? CI (Zuul/Jenkins/GitLab CI,...) would simply connect to a TF-provided node and run the "tmt/you-name-it" command there. That way, if Zuul cannot get a VM from Testing Farm for whatever reason, it could still fallback and run the tests on one of its own nodes (Zuul would know what tmt command to run, currently it doesn't).
We do not want to maintain two codebases that run the same tests; we want the same experience everywhere. Once we support two ways of running tmt tests, we have to maintain both.
For Fedora CI we currently have a 4.42% error rate over the last 90 days (sorry, the Grafana dashboard is internal only):
https://i.imgur.com/W5UKqhW.png
That meets our SLA (<5%). We plan to get it down to <1% this year.
This would also help with generic tests because CI would have better control over the input/output.
Let's rather make tmt output digestible for the user, and let's make generic tests work better, without the workarounds we added earlier to quickly bootstrap the testing. Let's give users the same experience in all the environments we have.
To be clear: I am not suggesting to drop the current "delegate" API, I am just suggesting to make it optional and allow CI systems to work with TF-provided resources directly, if it makes sense for the particular use case or integration.
We do not want to provide resources as a service. I believe the only things we are missing are visibility (logs are visible from Zuul) and fixes for some bugs (like the timeouts Python devs are hitting). If we make it less of a black box, the situation will improve.
My team definitely does not have time to maintain another tmt job to run tests ...
On Mon, Feb 7, 2022 at 8:22 PM Miroslav Vadkerti mvadkert@redhat.com wrote:
We do not want to maintain two codebases that run the same tests; we want the same experience everywhere. Once we support two ways of running tmt tests, we have to maintain both.
node("tf-provided-provisioned-vm") {
    sh "tmt run ..."
}
Well, executing "tmt run" on a provisioned node is not a problem for any CI system. A small plus would be that the CI would see the progress of the run. I am not sure what you mean by "same experience", but if it is the final static html page with results, then this feature should be provided by tmt itself (IMO).
Just to be clear, I am not arguing that functional (rpm-tmt-test) tests should be bypassing the "delegate" API. Those tests don't need any additional context and it's always as simple as "run my tmt tests, give me tmt log, stdout and workdir and I will be happy (and that reproducer is also cool)". This is what Testing Farm seems to be aiming for.
However, should we tmt-ize "rpm linter" and "compose test" and send it to Testing Farm to get the same experience everywhere? Well, I am yet to meet a single person who would ask about running rpminspect via tmt (or annocheck via rpminspect and all that via tmt). This is where I still see many question marks.
Let's rather make tmt output digestible for the user, and let's make generic tests work better, without the workarounds we added earlier to quickly bootstrap the testing. Let's give users the same experience in all the environments we have.
When we want to run the "reverse dependency" test, we cannot just run all those individual functional tests and dump the results on unsuspecting users. Additional context is needed, like who owns those tests (could be in tmt metadata), or whether the test was broken before, so the result shouldn't be taken too seriously. This extra context is not known to Testing Farm (it just runs tests), so it cannot provide a good experience on its own. If we point people to a landing page that would hold all this extra information and simply link to those individual results in Testing Farm, does it mean that we won't/cannot have the same experience for all environments?
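To make the "extra context" concrete, here is a minimal sketch of what such a landing page could hold per test. The record layout, field names, and sample entries are all hypothetical and made up for illustration; nothing here is an existing Testing Farm or tmt data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RevdepResult:
    test_name: str
    owner: str           # who maintains the test (could come from tmt metadata)
    passed: bool         # outcome of this run
    failed_before: bool  # True if the test was already failing before the change
    result_url: str      # link to the individual result


def needs_attention(results: List[RevdepResult]) -> List[RevdepResult]:
    """Keep only failures that were not already failing before the change."""
    return [r for r in results if not r.passed and not r.failed_before]


# Made-up entries purely for illustration.
results = [
    RevdepResult("pkg-a/smoke", "alice", True, False, "https://example.invalid/a"),
    RevdepResult("pkg-b/smoke", "bob", False, True, "https://example.invalid/b"),
    RevdepResult("pkg-c/smoke", "carol", False, False, "https://example.invalid/c"),
]

for r in needs_attention(results):
    print(f"{r.test_name} (owner: {r.owner}): {r.result_url}")
```

A landing page built on records like these could highlight new failures and still link out to the individual results.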
I think tests can own their presentation layer and still provide the same experience across all environments. Of course, there is no reason for each individual functional test to reinvent the wheel. Those tests can greatly benefit from the unified "raw stdout + workdir" UI. If maintainers want something better, they can always generate a custom report and then look it up in the workdir (people know their own tests).
We do not want to provide resources as a service. I believe the only things we are missing are visibility (logs are visible from Zuul) and fixes for some bugs (like the timeouts Python devs are hitting). If we make it less of a black box, the situation will improve.
My team definitely does not have time to maintain another tmt job to run tests ...
Don't worry, nobody is suggesting that it should be the Testing Farm team maintaining any (extra) tmt jobs.
Let me ask you this:
What is the Testing Farm?
I've probably never seen any pitch deck for the project, or a written down problem statement that the project is trying to tackle. However, I do remember people describing Testing Farm in the early days as "if OpenStack is down, we will get a VM from Beaker, and if Beaker is down, we will get a VM from cloud". Which sounds like a resource provider to me. And yes, reliably getting those resources was a big problem back then.
Having clarity on this fundamental level would help with reasoning about many things.
Thanks, Michal
Hi,
On Tue, Feb 8, 2022 at 8:51 AM Michal Srb msrb@redhat.com wrote:
Well, executing "tmt run" on a provisioned node is not a problem for any CI system.
Right, so CI systems can provision machines themselves; there is no need for Testing Farm to be involved.
A small plus would be that the CI would see the progress of the run.
Progress reporting is coming (I know it is taking a long time); I guess that will make this small plus go away.
I am not sure what you mean by "same experience", but if it is the final static html page with results, then this feature should be provided by tmt itself (IMO).
A proper UI is coming; these are just interim steps, as implementing it will take time :(
Just to be clear, I am not arguing that functional (rpm-tmt-test) tests should be bypassing the "delegate" API. Those tests don't need any additional context and it's always as simple as "run my tmt tests, give me tmt log, stdout and workdir and I will be happy (and that reproducer is also cool)". This is what Testing Farm seems to be aiming for.
Not really, we would like to provide a stable base for any kind of tests, not just functional tests.
However, should we tmt-ize "rpm linter" and "compose test" and send it to Testing Farm to get the same experience everywhere? Well, I am yet to meet a single person who would ask about running rpminspect via tmt (or annocheck via rpminspect and all that via tmt). This is where I still see many question marks.
I still see this as a problem of the test. What would be the problem if that test exposed the command line that should be used to run rpminspect on localhost, the way rpminspect is designed to be run?
I thought wrapping rpminspect in tmt made it easy for you to maintain the rpminspect releases as you need. If features are missing that would make the logs more digestible with this wrapping, let's add them.
If we decide to drop rpminspect from Testing Farm, I am fine with that, I just like that now it is all the same, all wrapped in a unified way in both Fedora CI and RHEL CI.
When we want to run the "reverse dependency" test, we cannot just run all those individual functional tests and dump the results on unsuspecting users. Additional context is needed, like who owns those tests (could be in tmt metadata), or whether the test was broken before, so the result shouldn't be taken too seriously. This extra context is not known to Testing Farm (it just runs tests), so it cannot provide a good experience on its own. If we point people to a landing page that would hold all this extra information and simply link to those individual results in Testing Farm, does it mean that we won't/cannot have the same experience for all environments?
It does, I consider this a valid approach for revdeps tests.
I think tests can own their presentation layer and still provide the same experience across all environments. Of course, there is no reason for each individual functional test to reinvent the wheel. Those tests can greatly benefit from the unified "raw stdout + workdir" UI. If maintainers want something better, they can always generate a custom report and then look it up in the workdir (people know their own tests).
Agreed, I do not see issues here.
Don't worry, nobody is suggesting that it should be the Testing Farm team maintaining any (extra) tmt jobs.
Great \o/, worries gone
Let me ask you this:
What is the Testing Farm?
I've probably never seen any pitch deck for the project, or a written down problem statement that the project is trying to tackle. However, I do remember people describing Testing Farm in the early days as "if OpenStack is down, we will get a VM from Beaker, and if Beaker is down, we will get a VM from cloud". Which sounds like a resource provider to me. And yes, reliably getting those resources was a big problem back then.
No, we will not be providing provisioning as a service. If somebody wants just the provisioner, they can spin up their own instance of our Artemis provisioner.
Having clarity on this fundamental level would help with reasoning about many things.
I guess we should extend our docs with an FAQ covering what Testing Farm is and what it is not planned to be. Would that help?
Best regards, /M
Thanks, Michal
WDYT?
Thanks, Michal
Thanks,
[1] https://pagure.io/fedora-ci/general/issue/44 [2] https://pagure.io/fedora-ci/general/issue/43 [3] https://src.fedoraproject.org/rpms/python-virtualenv/pull-request/59 -- Miro Hrončok -- Phone: +420777974800 IRC: mhroncok _______________________________________________ CI mailing list -- ci@lists.fedoraproject.org To unsubscribe send an email to ci-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/ci@lists.fedoraproject.org Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
-- Miroslav Vadkerti :: Senior Principal QE :: Testing Farm / Linux QE IRC mvadkert #tft #tmt #osci :: Mobile +420 773 944 252 Remote Czech Republic :: Red Hat Czech s.r.o
Hello folks,
I'd like to take the opportunity of this thread to ask about:
- Who are the current actors involved in the various pieces of the Fedora Pull Request CI?
- What is the plan, if any, for the mid-term and long-term?
I know that there is a SIG as stated here https://fedoraproject.org/wiki/SIGs/CI and bi-weekly meetings. There is also a Taiga project https://teams.fedoraproject.org/project/ci/timeline but it seems we are not using it anymore.
For the Zuul-based CI, only Tristan and myself are listed but we were expecting to hand over the responsibility to the community.
Fabien
Hi,
On Tue, Feb 8, 2022 at 4:37 PM Fabien Boucher fboucher@redhat.com wrote:
Hello folks,
I'd like to take the opportunity of this thread to ask about:
- Who are the current actors involved in the various pieces of the Fedora Pull Request CI?
OSCI and Testing Farm Team
- What is the plan, if any, for mid-term, long-term ?
I would say we should move to Zuul only.
I know that there is a SIG as stated here https://fedoraproject.org/wiki/SIGs/CI and bi-weekly meetings.
Yeah, those meetings are quite dead, right Jim?
There is also a Taiga project https://teams.fedoraproject.org/project/ci/timeline but it seems we are not using it anymore.
Yep, we mostly use just the fedora-ci/general now to track issues.
For the Zuul-based CI, only Tristan and myself are listed but we were expecting to hand over the responsibility to the community.
Basically, that would mean it will land on us :) I am not sure we would be able to operate it fully, at least not yet. Also, we are quite understaffed, so we are happy we can keep our puzzle running.
I'll let the OSCI team or somebody else chime in. Maybe this would be a good time to talk about Fedora CI in general, its direction, and the main pain points that block further adoption.
Best regards, /M
Hi,
Thanks Miroslav for the input.
On Thu, Feb 10, 2022 at 11:14 PM Miroslav Vadkerti mvadkert@redhat.com wrote:
Hi,
On Tue, Feb 8, 2022 at 4:37 PM Fabien Boucher fboucher@redhat.com wrote:
Hello folks,
I'd like to take the opportunity of this thread to ask about:
- Who are the current actors involved in the various pieces of the Fedora Pull Request CI?
OSCI and Testing Farm Team
- What is the plan, if any, for mid-term, long-term ?
I would say we should move to Zuul only.
I know that there is a SIG as stated here https://fedoraproject.org/wiki/SIGs/CI and bi-weekly meetings.
Yeah, those meetings are quite dead, right Jim?
There is also a Taiga project https://teams.fedoraproject.org/project/ci/timeline but it seems we are not using it anymore.
Yep, we mostly use just the fedora-ci/general now to track issues.
For the Zuul-based CI, only Tristan and myself are listed but we were expecting to hand over the responsibility to the community.
Basically, that would mean it will land on us :) I am not sure we would be able to operate it fully, at least not yet. Also, we are quite understaffed, so we are happy we can keep our puzzle running.
https://fedora.softwarefactory-project.io is a scoped view of Zuul for the Fedora Zuul tenant. It means the Fedora Zuul tenant relies on the same Zuul/Nodepool instance as the other tenants listed here: https://softwarefactory-project.io/zuul/tenants
We are committed to keeping the Zuul and Nodepool services running. I think that there are two layers:
- the infra (provided through softwarefactory-project.io)
- the Zuul CI config, which is hosted in git repos
The idea is to get more involvement from the community in the CI config part.
I'll let the OSCI team or somebody else chime in. Maybe this would be a good time to talk about Fedora CI in general, its direction, and the main pain points that block further adoption.
I agree.
Fabien
On Tue, Feb 8, 2022 at 1:46 PM Miroslav Vadkerti mvadkert@redhat.com wrote:
Hi,
On Tue, Feb 8, 2022 at 8:51 AM Michal Srb msrb@redhat.com wrote:
On Mon, Feb 7, 2022 at 8:22 PM Miroslav Vadkerti mvadkert@redhat.com wrote:
Hi,
On Mon, Feb 7, 2022 at 5:19 PM Michal Srb msrb@redhat.com wrote:
Hi everybody,
On Mon, Feb 7, 2022 at 9:20 AM Miro Hrončok mhroncok@redhat.com wrote:
Hello CI folks!
Several times somebody asked me "Why are there two CI jobs in Fedora dist-git PRs, when the Zuul's rpm-test and the *Fedora CI - dist-git test* run the exact same tests?"
My answer so far was: "I don't know why it happened, but now it works as a nice backup plan. It is very common that one of them breaks [1]. When you see an unreadable infrastructure failure [2] at one of them, you have a pretty good chance that the other one would still work."
With the idea to switch Zuul to the Testing Farm (rpm-tmt-test), this benefit would be lost. We would just have two CI test runs where both of them have infrastructure failures at the same time (see for example [3]), both of them run on the same system, and both of them run the same tests.
Could you please consider not doing that, at least until the overall stability of the CI improves?
I have a suggestion that I believe could solve more than this problem.
Every CI system can connect to a machine, run some commands and then retrieve results at the end.
The nice thing about Testing Farm is that it should be able to scale really well as it can provision resources (not only) in the cloud. Miro is right that if Zuul always delegates tmt testing to Testing Farm, then a TF outage will affect both Fedora CI and Zuul.
What if, instead of delegating to Testing Farm, we simply request resources (a VM) from TF and use that VM as a standard node in CI? CI (Zuul/Jenkins/GitLab CI, ...) would simply connect to a TF-provided node and run the "tmt/you-name-it" command there. That way, if Zuul cannot get a VM from Testing Farm for whatever reason, it could still fall back and run the tests on one of its own nodes (Zuul would know what tmt command to run; currently it doesn't).
We do not want to maintain two identical codebases to run the same tests; we want the same experience everywhere. Once we support two ways of running tmt tests, we have to maintain both.
node ("tf-provided-provisioned-vm") { tmt run ... }
Well, executing "tmt run" on a provisioned node is not a problem for any CI system.
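The fallback idea above can be sketched in a few lines of Python. This is only an illustration of the control flow, not an existing integration: `request_tf_node` is a hypothetical stand-in for whatever mechanism would hand out a TF-provided VM (no such API is confirmed in this thread), `OWN_NODE` is a made-up name for one of the CI system's own nodes, and the command is built but not executed.

```python
from typing import Callable, List, Tuple

OWN_NODE = "zuul-own-node"  # hypothetical name of a node the CI system owns


class ProvisioningError(Exception):
    """Raised by a provider that cannot hand out a node right now."""


def pick_node(request_tf_node: Callable[[], str],
              fallback: str = OWN_NODE) -> Tuple[str, bool]:
    """Return (node, used_fallback): prefer the TF node, else the CI's own."""
    try:
        return request_tf_node(), False
    except ProvisioningError:
        return fallback, True


def remote_command(node: str, tmt_args: List[str]) -> List[str]:
    """Build (not run) the command the CI would execute on the node over ssh."""
    return ["ssh", node, "tmt", *tmt_args]


def tf_down() -> str:
    # Simulates a Testing Farm outage (hypothetical).
    raise ProvisioningError("Testing Farm unavailable")


node, used_fallback = pick_node(tf_down)
print(remote_command(node, ["run"]), used_fallback)
```

Either way the CI ends up with the same `tmt run` invocation; only the node it targets differs, which is exactly the second code path the reply above does not want to maintain.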
Right, so CI systems can provision themselves, no need for Testing Farm to be involved.
A small plus would be that the CI would see the progress of the run.
Progress reporting is coming (I know it is taking long), so I guess this small plus will go away.
I am not sure what you mean by "same experience", but if it is the final static html page with results, then this feature should be provided by tmt itself (IMO).
A proper UI is coming; these are just interim steps, as implementing that will take time :(
Just to be clear, I am not arguing that functional (rpm-tmt-test) tests should be bypassing the "delegate" API. Those tests don't need any additional context and it's always as simple as "run my tmt tests, give me tmt log, stdout and workdir and I will be happy (and that reproducer is also cool)". This is what Testing Farm seems to be aiming for.
Not really, we would like to provide a stable base for any kind of tests, not just functional tests.
However, should we tmt-ize "rpm linter" and "compose test" and send it to Testing Farm to get the same experience everywhere? Well, I have yet to meet a single person who would ask about running rpminspect via tmt (or annocheck via rpminspect and all that via tmt). This is where I still see many question marks.
I still see this as a problem of the test. What would be the problem if that test exposed the command line that should be used to run rpminspect on localhost, as rpminspect is designed?
I thought wrapping rpminspect into tmt made it easy for you to maintain the releases of rpminspect as you need. If features are missing to make the logs more digestible because of this wrapping, let's add them.
If we decide to drop rpminspect from Testing Farm, I am fine with that, I just like that now it is all the same, all wrapped in a unified way in both Fedora CI and RHEL CI.
For Fedora CI we currently have a 4.42% error rate over the last 90 days (sorry, the Grafana is internal only):
https://i.imgur.com/W5UKqhW.png
This meets our SLA (<5%). We plan to get it down to <1% this year.
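The comparison against the SLA can be written down as a small check. A sketch in Python: the 4.42% figure and the 5% and 1% thresholds come from the message above, while the run counts in the last line are made up for illustration, since the underlying Grafana data is not public.

```python
SLA_MAX_ERROR_RATE = 5.0  # percent, SLA quoted above (<5%)
TARGET_ERROR_RATE = 1.0   # percent, goal quoted above (<1%)


def error_rate(errored_runs: int, total_runs: int) -> float:
    """Percentage of runs that ended in an infrastructure error."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return 100.0 * errored_runs / total_runs


def status(rate_percent: float) -> str:
    """Classify an error rate against the target and the SLA."""
    if rate_percent < TARGET_ERROR_RATE:
        return "meets target"
    if rate_percent < SLA_MAX_ERROR_RATE:
        return "meets SLA, misses target"
    return "violates SLA"


print(status(4.42))
print(status(error_rate(221, 5000)))  # hypothetical counts
```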
This would also help with generic tests because CI would have better control over the input/output.
Let's rather make tmt output digestible by the user, and let's make generic tests work better, without the workarounds we added earlier to quickly bootstrap the testing. Let's give the users the same experience for all environments we have.
When we want to run the "reverse dependency" test, we cannot just run all those individual functional tests and dump the results on unsuspecting users. Additional context is needed, like who owns those tests (could be in tmt metadata), or whether the test was broken before so the result shouldn't be taken too seriously. This extra context is not known to Testing Farm (it just runs tests), so it cannot provide a good experience on its own. If we point people to a landing page that would hold all this extra information and simply link to those individual results in Testing Farm, does it mean that we won't/cannot have the same experience for all environments?
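A rough Python sketch of what such a landing page could carry per test. The record layout, field names, and the example.org URLs are all assumptions made for illustration; nothing here reflects an existing Fedora CI or Testing Farm data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RevdepResult:
    test: str            # name of the dependent package's test
    owner: str           # who owns the test (could come from tmt metadata)
    passed: bool         # outcome of this run
    broken_before: bool  # whether the test was already failing earlier
    results_url: str     # link to the individual result


def needs_attention(r: RevdepResult) -> bool:
    """Only a failure in a previously healthy test is a strong signal."""
    return not r.passed and not r.broken_before


def landing_page(results: List[RevdepResult]) -> List[str]:
    """One summary line per test, with the extra context next to the link."""
    lines = []
    for r in results:
        if r.passed:
            verdict = "PASS"
        elif r.broken_before:
            verdict = "FAIL (already broken, low signal)"
        else:
            verdict = "FAIL (new)"
        lines.append(f"{r.test} [{r.owner}] {verdict} {r.results_url}")
    return lines


results = [
    RevdepResult("pkg-a", "alice", True, False, "https://example.org/a"),
    RevdepResult("pkg-b", "bob", False, True, "https://example.org/b"),
    RevdepResult("pkg-c", "carol", False, False, "https://example.org/c"),
]
for line in landing_page(results):
    print(line)
```

The individual results stay where they are; the page only adds ownership and history so that a reader can tell a new failure from a known-broken test.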
It does, I consider this a valid approach for revdeps tests.
I think tests can own their presentation layer and still provide the same experience across all environments. Of course, there is no reason for each individual functional test to reinvent the wheel. Those tests can greatly benefit from the unified "raw stdout + workdir" UI. If maintainers want something better, they can always generate a custom report and then look it up in the workdir (people know their own tests).
Agreed, I do not see issues here.
To be clear: I am not suggesting to drop the current "delegate" API, I am just suggesting to make it optional and allow CI systems to work with TF-provided resources directly, if it makes sense for the particular use case or integration.
We do not want to provide resources as a service. I believe the only things we are missing are visibility (logs are visible from Zuul) and fixes for some bugs (like the timeouts Python devs are hitting). If we make it less of a black box, the situation will improve.
My team definitely does not have time to maintain another tmt job to run tests ...
Don't worry, nobody is suggesting that it should be the Testing Farm team maintaining any (extra) tmt jobs.
Great \o/, worries gone
Let me ask you this:
What is the Testing Farm?
I've probably never seen any pitch deck for the project, or a written down problem statement that the project is trying to tackle. However, I do remember people describing Testing Farm in the early days as "if OpenStack is down, we will get a VM from Beaker, and if Beaker is down, we will get a VM from cloud". Which sounds like a resource provider to me. And yes, reliably getting those resources was a big problem back then.
No, we will not be providing provisioning as a service. If somebody wants just the provisioner, they can spin up their own instance of our Artemis provisioner.
Having clarity on this fundamental level would help with reasoning about many things.
I guess we should extend our docs with an FAQ covering what Testing Farm is and what it is not planned to be. Would that help?
Best regards, /M
Thanks, Michal
WDYT?
Thanks, Michal
Thanks,
[1] https://pagure.io/fedora-ci/general/issue/44 [2] https://pagure.io/fedora-ci/general/issue/43 [3] https://src.fedoraproject.org/rpms/python-virtualenv/pull-request/59
-- Miro Hrončok -- Phone: +420777974800 IRC: mhroncok
-- Miroslav Vadkerti :: Senior Principal QE :: Testing Farm / Linux QE IRC mvadkert #tft #tmt #osci :: Mobile +420 773 944 252 Remote Czech Republic :: Red Hat Czech s.r.o
CI mailing list -- ci@lists.fedoraproject.org To unsubscribe send an email to ci-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/ci@lists.fedoraproject.org Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
Hi,
On Mon, Feb 14, 2022 at 10:40 AM Fabien Boucher fboucher@redhat.com wrote:
Hi,
Thanks Miroslav for the input.
On Thu, Feb 10, 2022 at 11:14 PM Miroslav Vadkerti mvadkert@redhat.com wrote:
Hi,
On Tue, Feb 8, 2022 at 4:37 PM Fabien Boucher fboucher@redhat.com wrote:
Hello folks,
I'd like to take the opportunity of this thread to ask about:
- Who are the current actors involved in the various pieces of the Fedora Pull Request CI?
OSCI and Testing Farm Team
- What is the plan, if any, for mid-term, long-term ?
I would say we should move to Zuul only.
I know that there is a SIG as stated here https://fedoraproject.org/wiki/SIGs/CI and bi-weekly meetings.
Yeah, those meetings are quite dead, right Jim?
There is also a Taiga project https://teams.fedoraproject.org/project/ci/timeline but it seems we are not using it anymore.
Yep, we mostly use just the fedora-ci/general now to track issues.
For the Zuul-based CI, only Tristan and myself are listed but we were expecting to hand over the responsibility to the community.
Basically, that would mean it will land on us :) I am not sure we would be able to operate it fully, at least not yet. Also, we are quite understaffed, so we are happy we can keep our puzzle running.
https://fedora.softwarefactory-project.io is a scoped view of Zuul for the Fedora Zuul tenant. This means the Fedora Zuul tenant relies on the same Zuul/Nodepool instance as the other tenants (listed at https://softwarefactory-project.io/zuul/tenants). We are committed to keeping the Zuul and Nodepool services running. I think there are two layers:
- the infra (provided through softwarefactory-project.io)
- the Zuul CI config which is hosted in git repos
The idea is to get more involvement in the CI config part from the community.
Ack, that makes sense, thanks for the explanation. I understood it wrong.
I'll let the OSCI team or somebody else chime in. Maybe it would be a good time to talk about Fedora CI in general, its direction, and the main pain points that block further adoption.
I agree.
@Jim can you organize something pretty pls? :)
Fabien
On Tue, Feb 8, 2022 at 1:46 PM Miroslav Vadkerti mvadkert@redhat.com wrote:
Hi,
On Tue, Feb 8, 2022 at 8:51 AM Michal Srb msrb@redhat.com wrote:
On Mon, Feb 7, 2022 at 8:22 PM Miroslav Vadkerti mvadkert@redhat.com wrote:
Hi,
On Mon, Feb 7, 2022 at 5:19 PM Michal Srb msrb@redhat.com wrote:
Hi everybody,
On Mon, Feb 7, 2022 at 9:20 AM Miro Hrončok mhroncok@redhat.com wrote:
Hello CI folks!
Several times somebody asked me "Why there are two CI jobs in Fedora dist-git PRs, when the Zuul's rpm-test and the *Fedora CI - dist-git test* run the exact same tests?"
My answer so far was: "I don't know why it happened, but now it works as a nice backup plan. It is very common that one of them breaks [1]. When you see an unreadable infrastructure failure [2] at one of them, you have a pretty good chance that the other one would still work."
With the idea to switch Zuul to the Testing Farm (rpm-tmt-test), this benefit would be lost. We would just have two CI test runs where both of them have infrastructure failures at the same time (see for example [3]), both of them run on the same system, and both of them run the same tests.
Could you please consider not doing that, at least until the overall stability of the CI improves?
I have a suggestion that I believe could solve more than this problem.
Every CI system can connect to a machine, run some commands and then retrieve results at the end.
The nice thing about Testing Farm is that it should be able to scale really well as it can provision resources (not only) in the cloud. Miro is right that if Zuul always delegates tmt testing to Testing Farm, then a TF outage will affect both Fedora CI and Zuul.
What if, instead of delegating to Testing Farm, we simply request resources (a VM) from TF and use that VM as a standard node in CI? CI (Zuul/Jenkins/GitLab CI,...) would simply connect to a TF-provided node and run the "tmt/you-name-it" command there. That way, if Zuul cannot get a VM from Testing Farm for whatever reason, it could still fall back and run the tests on one of its own nodes (Zuul would know what tmt command to run, currently it doesn't).
We do not want to maintain two identical codebases to run the same tests; we want the same experience everywhere. Once we support two ways of running tmt tests, we have to maintain both.
node ("tf-provided-provisioned-vm") { tmt run ... }
Well, executing "tmt run" on a provisioned node is not a problem for any CI system.
Right, so CI systems can provision themselves, no need for Testing Farm to be involved.
A small plus would be that the CI would see the progress of the run.
Progress reporting is coming (I know it is taking long); I guess that will make this small plus go away.
I am not sure what you mean by "same experience", but if it is the final static html page with results, then this feature should be provided by tmt itself (IMO).
A proper UI is coming; these are just interim steps, as implementing it will take time :(
Just to be clear, I am not arguing that functional (rpm-tmt-test) tests should be bypassing the "delegate" API. Those tests don't need any additional context and it's always as simple as "run my tmt tests, give me tmt log, stdout and workdir and I will be happy (and that reproducer is also cool)". This is what Testing Farm seems to be aiming for.
Not really, we would like to provide a stable base for any kind of tests, not just functional tests.
On Mon, Feb 07, 2022 at 17:17 Michal Srb wrote:
Hi everybody,
On Mon, Feb 7, 2022 at 9:20 AM Miro Hrončok mhroncok@redhat.com wrote:
Hello CI folks!
Several times somebody asked me "Why there are two CI jobs in Fedora dist-git PRs, when the Zuul's rpm-test and the *Fedora CI - dist-git test* run the exact same tests?"
My answer so far was: "I don't know why it happened, but now it works as a nice backup plan. It is very common that one of them breaks [1]. When you see an unreadable infrastructure failure [2] at one of them, you have a pretty good chance that the other one would still work."
With the idea to switch Zuul to the Testing Farm (rpm-tmt-test), this benefit would be lost. We would just have two CI test runs where both of them have infrastructure failures at the same time (see for example [3]), both of them run on the same system, and both of them run the same tests.
Could you please consider not doing that, at least until the overall stability of the CI improves?
I have a suggestion that I believe could solve more than this problem.
Every CI system can connect to a machine, run some commands and then retrieve results at the end.
The nice thing about Testing Farm is that it should be able to scale really well as it can provision resources (not only) in the cloud. Miro is right that if Zuul always delegates tmt testing to Testing Farm, then a TF outage will affect both Fedora CI and Zuul.
What if, instead of delegating to Testing Farm, we simply request resources (a VM) from TF and use that VM as a standard node in CI? CI (Zuul/Jenkins/GitLab CI,...) would simply connect to a TF-provided node and run the "tmt/you-name-it" command there. That way, if Zuul cannot get a VM from Testing Farm for whatever reason, it could still fall back and run the tests on one of its own nodes (Zuul would know what tmt command to run, currently it doesn't).
Zuul test instances are provided by a service named nodepool, and we should create a nodepool driver for Testing Farm, so that this would be transparent. That is not very complicated and it would be a great addition to: https://zuul-ci.org/docs/nodepool/configuration.html
Otherwise I think Zuul is already doing what you suggest, but through the API instead of directly connecting to the node.
This would also help with generic tests because CI would have better control over the input/output.
To be clear: I am not suggesting to drop the current "delegate" API, I am just suggesting to make it optional and allow CI systems to work with TF-provided resources directly, if it makes sense for the particular use case or integration.
WDYT?
Thanks, Michal
Thanks,
[1] https://pagure.io/fedora-ci/general/issue/44 [2] https://pagure.io/fedora-ci/general/issue/43 [3] https://src.fedoraproject.org/rpms/python-virtualenv/pull-request/59
-- Miro Hrončok -- Phone: +420777974800 IRC: mhroncok
On Mon, Feb 07, 2022 at 19:21 Tristan Cacqueray wrote:
Zuul test instances are provided by a service named nodepool, and we should create a nodepool driver for Testing Farm, so that this would be transparent. That is not very complicated and it would be a great addition to: https://zuul-ci.org/docs/nodepool/configuration.html
After further investigation, I don't think we should do the nodepool integration I previously suggested. If I understand correctly, Testing Farm aims to provide a user-friendly interface to run tmt with complex test plans, and it should not provide direct access through ssh or a kubectl exec API.
Thus it seems the main issue with the current setup is that we don't have a log stream of the test output while the job is running. Perhaps we just need to improve the Testing Farm trigger [3] to provide more information. For example, the trigger could stream:
```
Requesting node...
Setting up access...
Node is ready, running $test...
<test output>
```
I think that would solve the immediate issue reported:
- Zuul infra for such integration is lightweight, the trigger is performed from a minimal container.
- tmt run as a regular task.
That should be transparent for fedora CI users.
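A rough sketch of what such a trigger loop could look like is below. It assumes the trigger can poll some status call and map state changes to log lines; the state names and the `poll` callable are hypothetical and do not describe the actual Testing Farm API. Streaming the test output itself would need support on the Testing Farm side and is not covered.

```python
import time
from typing import Callable, Iterator


def stream_progress(poll: Callable[[], str], interval: float = 0.0) -> Iterator[str]:
    """Yield a log line each time the polled state changes, until a final state."""
    # Hypothetical state names mapped to the messages proposed above.
    messages = {
        "queued": "Requesting node...",
        "provisioning": "Setting up access...",
        "running": "Node is ready, running test...",
    }
    final_states = {"complete", "error"}
    last = None
    while True:
        state = poll()
        if state != last:
            yield messages.get(state, f"State: {state}")
            last = state
        if state in final_states:
            return
        time.sleep(interval)


# Fake poller standing in for the real request-status call.
fake_states = iter(["queued", "queued", "provisioning", "running", "running", "complete"])
lines = list(stream_progress(lambda: next(fake_states)))
for line in lines:
    print(line)
```

Repeated identical states are collapsed, so the console only shows transitions rather than one line per poll.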
-Tristan
[3] the current testing farm integration is: https://pagure.io/fedora-zuul-jobs/blob/master/f/roles/testing-farm-run-test...
On Tue, Mar 8, 2022 at 4:33 PM Tristan Cacqueray tdecacqu@redhat.com wrote:
On Mon, Feb 07, 2022 at 19:21 Tristan Cacqueray wrote:
On Mon, Feb 07, 2022 at 17:17 Michal Srb wrote:
Hi everybody,
po 7. 2. 2022 o 9:20 Miro Hrončok mhroncok@redhat.com napísal(a):
Hello CI folks!
Several times somebody asked me "Why there are two CI jobs in Fedora dist-git PRs, when the Zuul's rpm-test and the *Fedora CI - dist-git test* run
the
exact same tests?"
My answers so far was: "I don't know why it happened, but now it works
as
a nice backup plan. It is very common that one of them breaks [1]. When
you
see an unreadable infrastructure failure [2] at one of them, you have a
pretty
good chance tat the other one would still work."
With the idea to switch Zuul to the Testing Farm (rpm-tmt-test), this benefit would be lost. We would just have two CI tests runs where both of them have infrastructure failures at the same times (see for example [3]), both
of
them run on the same system, and both of them run the same tests.
Could you please consider not doing that, at least until the overall stability of the CI improves?
I have a suggestion that I believe could solve more than this problem.
Every CI system can connect to a machine, run some commands and then retrieve results at the end.
The nice thing about Testing Farm is that it should be able to scale
really
well as it can provision resources (not only) in the cloud. Miro is right that if Zuul always delegates tmt testing to Testing Farm, then TF outage will affect both Fedora CI and Zuul.
What if, instead of delegating to Testing Farm, we simply request
resources
(a VM) from TF and use that VM as a standard node in CI? CI (Zuul/Jenkins/GitLab CI,...) would simply connect to a TF-provided node
and
run the "tmt/you-name-it" command there. That way, if Zuul cannot get a
VM
from Testing Farm for whatever reason, it could still fallback and run
the
tests on one of its own nodes (Zuul would know what tmt command to run, currently it doesn't).
Zuul test instances are provided by a service named nodepool, and we should create a nodepool driver for Testing Farm, so that this would be transparent. That is not very complicated and it would be a great addition to: https://zuul-ci.org/docs/nodepool/configuration.html
After further investigations, I don't think we should do that nodepool integration I previously suggested. If I understand correctly, Testing Farm aims to provides a user friendly interface to run tmt using complex testing plan, and it should not provide direct access through ssh or a kubectl exec API.
Thus it seems like the main issue with the current setup is that we don't have a log stream of the test output while the job is running. And perhaps, we just need to improve testing farm trigger[3] to provide more infos. For example, the trigger could stream:
Requesting node... Setting up access... Node is ready, running $test... <test output>I think that would solve the immediate issue reported:
- Zuul infra for such integration is lightweight, the trigger is performed from a minimal container.
- tmt runs as a regular task.
That should be transparent for Fedora CI users.
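A minimal sketch of what such a streaming trigger could look like, written in Python purely for illustration. It is not the actual fedora-zuul-jobs role or the Testing Farm API: `stream_job_log` and the three callables passed to it are hypothetical stand-ins for whatever the trigger really does at each stage.

```python
from typing import Callable, Iterable, Iterator


def stream_job_log(
    request_node: Callable[[], str],
    set_up_access: Callable[[str], None],
    run_test: Callable[[str], Iterable[str]],
    test_name: str,
) -> Iterator[str]:
    """Yield one status line per stage, then the test output line by line."""
    yield "Requesting node..."
    node = request_node()
    yield "Setting up access..."
    set_up_access(node)
    yield f"Node is ready, running {test_name}..."
    # Forward test output as it is produced instead of after the job ends.
    for line in run_test(node):
        yield line


# Hypothetical stand-ins for the real provisioning and test steps.
def fake_request_node() -> str:
    return "node-1"


def fake_set_up_access(node: str) -> None:
    pass


def fake_run_test(node: str) -> Iterable[str]:
    return ["PASS /smoke"]


for log_line in stream_job_log(fake_request_node, fake_set_up_access, fake_run_test, "/smoke"):
    print(log_line)
```

Because the function is a generator, each status line is available to the reader before the next stage starts, which is the property the message asks for.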
-Tristan
[3] the current testing farm integration is:
https://pagure.io/fedora-zuul-jobs/blob/master/f/roles/testing-farm-run-test...
Thanks Tristan, I am glad we now see things the same way. After we are out of error budget mode we will implement a WebSocket API endpoint which will provide a reasonable user-facing console log.
Filed this RH issue to track it:
https://issues.redhat.com/browse/TFT-1110
Best regards, /M
Otherwise I think Zuul is already doing what you suggest, but through the API instead of directly connecting to the node.
This would also help with generic tests because CI would have better control over the input/output.
To be clear: I am not suggesting to drop the current "delegate" API, I am just suggesting to make it optional and allow CI systems to work with TF-provided resources directly, if it makes sense for the particular use case or integration.
WDYT?
Thanks, Michal
Thanks,
[1] https://pagure.io/fedora-ci/general/issue/44 [2] https://pagure.io/fedora-ci/general/issue/43 [3] https://src.fedoraproject.org/rpms/python-virtualenv/pull-request/59
-- Miro Hrončok -- Phone: +420777974800 IRC: mhroncok _______________________________________________ CI mailing list -- ci@lists.fedoraproject.org To unsubscribe send an email to ci-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines:
https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives:
https://lists.fedoraproject.org/archives/list/ci@lists.fedoraproject.org
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
On Mon, Feb 07, 2022 at 09:20:08AM +0100, Miro Hrončok wrote:
Hello CI folks!
Several times somebody asked me "Why are there two CI jobs in Fedora dist-git PRs, when the Zuul's rpm-test and the *Fedora CI - dist-git test* run the exact same tests?"
My answer so far was: "I don't know why it happened, but now it works as a nice backup plan. It is very common that one of them breaks [1]. When you see an unreadable infrastructure failure [2] at one of them, you have a pretty good chance that the other one would still work."
Doesn't this however reinforce the notion that infrastructure can be flaky because we have more of them, rather than focusing on one system of choice and making its stability the top priority?
On 15. 02. 22 22:51, Jan Pazdziora wrote:
Doesn't this however reinforce the notion that infrastructure can be flaky because we have more of them, rather than focusing on one system of choice and making its stability the top priority?
Definitely! Making it stable should be a priority. But until it actually is stable, I propose we keep the redundancy.
On Wed, Mar 9, 2022 at 11:57 AM Miro Hrončok mhroncok@redhat.com wrote:
Definitely! Making it stable should be a priority. But until it actually is stable, I propose we keep the redundancy.
When we are on the topic of stability, we should get together sometime and review your current list of problems to identify the most pressing ones.
I guess I will schedule some mtg together with all representatives? Or should we wait a bit more?
Thanks, /M
On 09. 03. 22 12:00, Miroslav Vadkerti wrote:
When we are on the topic of stability, we should get together sometime and review your current list of problems to identify the most pressing ones.
Most of the infra failures are solved fast. It is the amount of them that worries me. In the last 4 weeks, I've encountered 18 CI infrastructure problems (technically 17, one was encountered by @ksurma). That's bad.
My conclusions:
- the CI is highly unstable and now we have the data to prove that
- if I stop sending pull requests, it might get better, apparently, it always happens to me
I guess I will schedule some mtg together with all representatives? Or should we wait a bit more?
Does adding more meetings make it more stable? :P Happy to meet with you, ping me off list.
Hi,
On Wed, Mar 9, 2022 at 12:22 PM Miro Hrončok mhroncok@redhat.com wrote:
Most of the infra failures are solved fast. It is the amount of them that worries me. In the last 4 weeks, I've encountered 18 CI infrastructure problems (technically 17, one was encountered by @ksurma). That's bad.
My conclusions:
- the CI is highly unstable and now we have the data to prove that
+1
- if I stop sending pull requests, it might get better, apparently, it always happens to me
pls don't
I guess I will schedule some mtg together with all representatives? Or should we wait a bit more?
Does adding more meetings make it more stable? :P Happy to meet with you, ping me off list.
I will try to make some summary out of it myself first. I wanted to do that with more people, but whatever :)
Let's see what comes out of it.
Will let you know once I am done.
Best regards, /M
If there are design choices such as this:
https://pagure.io/fedora-ci/general/issue/329#comment-785660
Then I'll prefer different CI to Zuul.
Vít
On Tue, Mar 15, 2022 at 3:59 PM Vít Ondruch vondruch@redhat.com wrote:
If there are design choices such as this:
https://pagure.io/fedora-ci/general/issue/329#comment-785660
Then I'll prefer different CI to Zuul.
Wait, ideally, your PR should be mergeable to the target branch right?
Hmm, seems this needs wider discussion: Packit is taking the same approach, and it is actually considered best practice from my POV :(
Best regards, /M
On Tue, Mar 15, 2022 at 11:06 AM Miroslav Vadkerti mvadkert@redhat.com wrote:
Wait, ideally, your PR should be mergeable to the target branch right?
Hmm, seems this needs wider discussion: Packit is taking the same approach, and it is actually considered best practice from my POV :(
Best practice is to actually have both pre-merge and post-merge tests. You want to see the impact of your change, and you want to see the impact of merging your change.
On Tue, Mar 15, 2022 at 4:21 PM Neal Gompa ngompa13@gmail.com wrote:
Best practice is to actually have both pre-merge and post-merge tests. You want to see the impact of your change, and you want to see the impact of merging your change.
I guess I work on too small projects :)
On 15. 03. 22 at 16:05, Miroslav Vadkerti wrote:
Wait, ideally, your PR should be mergeable to the target branch right?
The PR is a long-running WIP PR (ideally it should stay open for the whole timespan of one Ruby development cycle). I open the PR quite early to follow the upstream development, but constantly rebasing and whatnot would be too painful, and it would certainly not make sense to include all the interim steps in rawhide, although they are very useful for reference. So in the end, the final product is something between `fedpkg import` and `git squash`.
Vít
On Tue, Mar 15, 2022 at 4:24 PM Vít Ondruch vondruch@redhat.com wrote:
The PR is a long-running WIP PR (ideally it should stay open for the whole timespan of one Ruby development cycle). I open the PR quite early to follow the upstream development, but constantly rebasing and whatnot would be too painful, and it would certainly not make sense to include all the interim steps in rawhide, although they are very useful for reference. So in the end, the final product is something between `fedpkg import` and `git squash`.
I see, thanks for sharing the use case. Makes sense to me, let's see what Zuul upstream will say.
-- Miroslav Vadkerti :: Senior Principal QE :: Testing Farm / Linux QE IRC mvadkert #tft #tmt #osci :: Mobile +420 773 944 252 Remote Czech Republic :: Red Hat Czech s.r.o
Correct, we "copied" this behaviour from Zuul, with the exception of making it configurable :)
On Tue, Mar 15, 2022 at 4:06 PM Miroslav Vadkerti mvadkert@redhat.com wrote:
Hmm, seems this needs wider discussion: Packit is taking the same approach, and it is actually considered best practice from my POV :(