Hello,
I would like to introduce a CI/CD workflow for Fedora distgits, built around Pull Requests, that my team and I have started to develop.
We chose Zuul [1] as the CI engine for this workflow because we have strong experience with it and because several of its features help a lot in a packaging context. To list some of them: Zuul is cross-repository aware, meaning that dependencies between Pull Requests can be defined (the job workspace is populated with the dependent changes). Zuul provides a way to share artifacts between jobs, so a parent job can do a scratch build on Koji and child jobs can use the built artifacts (RPMs) to run various validations. Zuul also comes with a component called Nodepool that manages the VMs or containers providing F30/F31/Rawhide nodes for jobs. A more exhaustive list of features is available here [2].
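As a minimal sketch of how cross-repository dependencies can be declared, the snippet below extracts `Depends-On:` footers from a Pull Request description. The regular expression, the function name, and the example repository URL are illustrative assumptions, not Zuul's actual implementation.

```python
import re
from typing import List

# Illustrative pattern: one "Depends-On: <change URL>" footer per line.
# This is an assumption for the sketch, not the regex Zuul itself uses.
DEPENDS_ON = re.compile(r"^Depends-On:[ \t]*(\S+)[ \t]*$", re.IGNORECASE | re.MULTILINE)


def find_dependencies(pr_description: str) -> List[str]:
    """Return the change URLs named in Depends-On footers, in order of appearance."""
    return DEPENDS_ON.findall(pr_description)


# Hypothetical PR description; "example-lib" is not a real package reference.
description = """Update the spec for the new upstream release.

Depends-On: https://src.fedoraproject.org/rpms/example-lib/pull-request/1
"""

print(find_dependencies(description))
```

With the dependencies known, a CI engine can check out the dependent changes into the job workspace before building.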
For a couple of months, we have been experimenting with packaging and validation jobs for Fedora's distgits. The idea is to build a flexible packaging workflow for packagers willing to use Pull Requests on Pagure. The Pull Request system offers a place to wire Zuul jobs with the PR life cycle. Indeed, jobs are triggered based on the Pagure Pull Request status (PR opened/changed/merged). At the moment, the most complete workflow we have implemented is as follows:
- When a PR is opened, Zuul runs a Koji scratch build job, then, in parallel, it runs linter, rpminspect, and test (embedded functional tests/STI) jobs.
- When a PR is approved (when the metadata tag "gateit" is set by one of the distgit admins) and the CI status is green, Zuul merges the PR.
- When the PR is closed and merged, Zuul runs the regular Koji build.
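The three steps above can be sketched as a small decision function. This is only a model of the described workflow under stated assumptions: the event names ("opened", "changed", "tagged", "merged") and the `plan` function are hypothetical, while the job names are the ones used later in this thread.

```python
# Validation jobs that run in parallel once the scratch build is available.
VALIDATION_JOBS = ["rpm-linter", "rpm-rpminspect", "rpm-test"]


def plan(event, tags=(), ci_green=False):
    """Return the ordered actions for a PR event as (action, jobs) tuples."""
    if event in ("opened", "changed"):
        # Scratch build first, then the validation jobs in parallel.
        return [("run", ["rpm-scratch-build"]), ("run-parallel", list(VALIDATION_JOBS))]
    if event == "tagged" and "gateit" in tags and ci_green:
        # Approved by a distgit admin and CI is green: merge the PR.
        return [("merge", [])]
    if event == "merged":
        # After the merge, run the regular Koji build.
        return [("run", ["rpm-build"])]
    return []


print(plan("opened"))
print(plan("tagged", tags=["gateit"], ci_green=True))
print(plan("merged"))
```

Note that a "gateit" tag alone does nothing in this model when the CI status is not green.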
Another great feature of Zuul is how changes to CI configuration can be done. In fact, anyone can propose changes via Pull Request to a Zuul job, a project pipeline template, ... and see that change run without affecting the rest of the CI. This makes the CI more robust and more user friendly as everyone can be involved in the CI configuration.
To give a bit of insight about what my team and I are doing: we maintain https://softwarefactory-project.io and https://review.rdoproject.org that both use Zuul to provide the CI for more than 1800 repositories (~95% of distgits). Our instance of Zuul runs (among others) the Pagure driver [3] for pagure.io and src.fedoraproject.org.
For the moment, just three distgits are configured to let Zuul manage the packaging workflow via PR, but we are looking for early adopters to experiment and help improve the jobs and workflows. The process to attach a distgit to Zuul is described here [4]. Having Zuul manage the PR workflow does not conflict with the regular direct-push workflow.
We would be really happy to help anyone willing to be involved :)
Fabien
[1]: https://zuul-ci.org/
[2]: https://fedoraproject.org/wiki/Zuul-based-ci#What_is_Zuul.2FNodepool
[3]: https://zuul-ci.org/docs/zuul/admin/drivers/pagure.html
[4]: https://fedoraproject.org/wiki/Zuul-based-ci#How_to_Zuul_attach_a_Pagure_rep...
Hi Fabien,
I've got at least 800 packages as candidates for this which are tied together quite a bit. How do I test this?
CI mailing list -- ci@lists.fedoraproject.org To unsubscribe send an email to ci-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/ci@lists.fedoraproject.org
Hi Igor,
On Wed, Nov 13, 2019 at 10:29 AM Igor Gnatenko ignatenkobrain@fedoraproject.org wrote:
I've got at least 800 packages as candidates for this which are tied together quite a bit. How do I test this?
I would suggest starting with a few packages first.
But yes, for more packages this needs to be automated. I don't yet have the tooling to onboard projects automatically, but at first glance the Pagure API provides endpoints to update project ACLs and options, so it should be possible to automatically update the project settings as required here [1]. However, for the PR approval part, it seems the endpoint to manage allowed PR metadata tags is missing.
The rest is simply YAML files to edit and two PRs to open. This can be done automatically as well.
[1]: https://fedoraproject.org/wiki/Zuul-based-ci#Configure_the_repository_for_Zu...
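A rough sketch of what such onboarding automation might look like for the ACL part. It only builds the HTTP requests and does not send them. The endpoint path (`/api/0/<repo>/git/modifyacls`), the form fields, the `zuul` user name, and the package names are assumptions to be checked against the Pagure API documentation and the wiki page above.

```python
import urllib.parse
import urllib.request

PAGURE_URL = "https://src.fedoraproject.org"


def grant_acl_request(repo, user, acl, api_token):
    """Build (but do not send) a request granting `acl` on `repo` to `user`."""
    # Assumed endpoint and form fields; verify against the Pagure API docs.
    url = f"{PAGURE_URL}/api/0/{repo}/git/modifyacls"
    form = urllib.parse.urlencode({"user_type": "user", "name": user, "acl": acl})
    return urllib.request.Request(
        url,
        data=form.encode("utf-8"),
        headers={"Authorization": f"token {api_token}"},
        method="POST",
    )


def onboarding_requests(packages, api_token, user="zuul"):
    """One admin-ACL request per distgit; the 'zuul' user name is hypothetical."""
    return [grant_acl_request(f"rpms/{name}", user, "admin", api_token) for name in packages]


# Hypothetical package names and a placeholder token.
requests = onboarding_requests(["example-a", "example-b"], api_token="not-a-real-token")
for request in requests:
    print(request.get_method(), request.full_url)
```

Sending each request with `urllib.request.urlopen` would be the next step, once the endpoint and the required token permissions are confirmed.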
On Wed, Nov 13, 2019 at 6:23 AM Fabien Boucher fboucher@redhat.com wrote:
Hi Igor,
On Wed, Nov 13, 2019 at 10:29 AM Igor Gnatenko ignatenkobrain@fedoraproject.org wrote:
I've got at least 800 packages as candidates for this which are tied together quite a bit. How do I test this?
I would suggest starting with a few packages first.
But yes, for more packages this needs to be automated. I don't yet have the tooling to onboard projects automatically, but at first glance the Pagure API provides endpoints to update project ACLs and options, so it should be possible to automatically update the project settings as required here [1]. However, for the PR approval part, it seems the endpoint to manage allowed PR metadata tags is missing.
The rest is simply YAML files to edit and two PRs to open. This can be done automatically as well.
Why does zuul need to be an admin on the repository?
Hi Neal,
On Wed, Nov 13, 2019 at 12:40 PM Neal Gompa ngompa13@gmail.com wrote:
Why does zuul need to be an admin on the repository?
That's a good question. Ideally only commit access would be needed (Zuul is also a gating system; it merges the code), but dealing with the events and the API brings some difficulties at the authentication level. Here is the explanation. Zuul needs to receive Pull Request and Git repo events, and it also needs to be able to act on the PR via the API. To receive events, Zuul relies on the Pagure Web Hook feature: Zuul serves an HTTP endpoint to which Pagure sends payloads when events occur. Payloads need to be authenticated, and to do so Zuul needs to know the Web Hook token configured in the repository settings in Pagure. To use the API, Zuul needs the repository API key. Both the Web Hook token and the API key are unique per repository on Pagure. For each configured Pagure repository, Zuul discovers the Web Hook token and creates/reuses an API key via the Pagure API (connector endpoint), and this requires admin rights on the related repository.
I'm not aware of other ready-to-use solutions for that use case. To mitigate this, Pagure could in the future provide another user role level with commit access + access to the connector endpoint [1]. In fact, having this would ease third-party application integration with Pagure. For instance, GitHub has the concept of an application, and Zuul relies on it to integrate easily with GitHub repositories.
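To illustrate why the receiver needs the per-repository Web Hook token, here is a minimal sketch of payload authentication. It assumes the sender attaches an HMAC-SHA1 hex digest of the raw request body, keyed with the token; the algorithm, the function names, and the sample values are assumptions for illustration, not a description of the actual Pagure or Zuul code.

```python
import hashlib
import hmac


def sign_payload(body: bytes, webhook_token: str) -> str:
    """Compute the hex signature a sender would attach (assumed HMAC-SHA1)."""
    return hmac.new(webhook_token.encode("utf-8"), body, hashlib.sha1).hexdigest()


def payload_is_authentic(body: bytes, signature_hex: str, webhook_token: str) -> bool:
    """Check a received signature against the one computed from the shared token."""
    expected = sign_payload(body, webhook_token)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature_hex)


# Hypothetical token and payload.
token = "per-repository-webhook-token"
body = b'{"topic": "pull-request.new"}'
signature = sign_payload(body, token)

print(payload_is_authentic(body, signature, token))
print(payload_is_authentic(b'{"topic": "tampered"}', signature, token))
```

Because the token differs for every repository, the receiver has to look it up per repository, which is where the admin-level connector endpoint comes in.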
I hope my explanation makes sense :)
On Wed, 2019-11-13 at 10:17 +0100, Fabien Boucher wrote:
Hello,
I would like to introduce a CI/CD workflow for Fedora distgits around Pull Requests my team and I have started to develop.
Are the results of these tests reported to resultsdb? Does this flow publish messages to fedora-messaging or fedmsg? Are either of those things done in ways that are compatible with the several existing implementations of "test stuff for a Fedora build/update/compose" (Taskotron, "the pipeline", openQA, autocloud...)
See https://pagure.io/fedora-ci/messages .
Hello,
On Tue, Nov 19, 2019 at 8:07 PM Adam Williamson adamwill@fedoraproject.org wrote:
On Wed, 2019-11-13 at 10:17 +0100, Fabien Boucher wrote:
Hello,
I would like to introduce a CI/CD workflow for Fedora distgits around Pull Requests my team and I have started to develop.
Are the results of these tests reported to resultsdb? Does this flow publish messages to fedora-messaging or fedmsg? Are either of those things done in ways that are compatible with the several existing implementations of "test stuff for a Fedora build/update/compose" (Taskotron, "the pipeline", openQA, autocloud...)
Just curious, is the whole "test stuff for a Fedora build/update/compose ..." documented somewhere?
Thanks, Michal
See https://pagure.io/fedora-ci/messages .
Adam Williamson Fedora QA Community Monkey IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net http://www.happyassassin.net
On Wed, 2019-11-20 at 08:57 +0100, Michal Srb wrote:
Hello,
On Tue, Nov 19, 2019 at 8:07 PM Adam Williamson adamwill@fedoraproject.org wrote:
On Wed, 2019-11-13 at 10:17 +0100, Fabien Boucher wrote:
Hello,
I would like to introduce a CI/CD workflow for Fedora distgits around Pull Requests my team and I have started to develop.
Are the results of these tests reported to resultsdb? Does this flow publish messages to fedora-messaging or fedmsg? Are either of those things done in ways that are compatible with the several existing implementations of "test stuff for a Fedora build/update/compose" (Taskotron, "the pipeline", openQA, autocloud...)
Just curious, is the whole "test stuff for a Fedora build/update/compose ..." documented somewhere?
It's all documented *somewhere*, but there isn't exactly a central Here Are All The Things That Test Stuff, mainly because there isn't really a central team or plan or vision or anything. 'Fedora CI' is one thing run by one bunch of people, Taskotron and openQA are both run by Fedora QA (so we do have some pages which cover both and the relation between them), autocloud was maintained by someone else and is now not really maintained by anyone, and now this thing seems to have come along from yet another bunch of folks...
Hi, Adam,
On Wed, Nov 20, 2019 at 8:30 PM Adam Williamson adamwill@fedoraproject.org wrote:
On Wed, 2019-11-20 at 08:57 +0100, Michal Srb wrote:
Hello,
On Tue, Nov 19, 2019 at 8:07 PM Adam Williamson adamwill@fedoraproject.org wrote:
On Wed, 2019-11-13 at 10:17 +0100, Fabien Boucher wrote:
Hello,
I would like to introduce a CI/CD workflow for Fedora distgits around Pull Requests my team and I have started to develop.
Are the results of these tests reported to resultsdb? Does this flow publish messages to fedora-messaging or fedmsg? Are either of those things done in ways that are compatible with the several existing implementations of "test stuff for a Fedora build/update/compose" (Taskotron, "the pipeline", openQA, autocloud...)
Just curious, is the whole "test stuff for a Fedora build/update/compose ..." documented somewhere?
It's all documented *somewhere*, but there isn't exactly a central Here Are All The Things That Test Stuff, mainly because there isn't really a central team or plan or vision or anything. 'Fedora CI' is one thing run by one bunch of people, Taskotron and openQA are both run by Fedora QA (so we do have some pages which cover both and the relation between them), autocloud was maintained by someone else and is now not really maintained by anyone, and now this thing seems to have come along from yet another bunch of folks...
To clarify, we (as the "Fedora CI" group) and the Zuul team aligned our goals at Flock: the Zuul team takes on pull request verification and pre-merge checks, while Fedora CI focuses on gating and on the tests that are triggered by builds in Koji after the merge to dist-git.
Unlike conventional CI systems, Zuul is much better integrated with the code review process. For example Zuul can manage dependent pull requests across different projects, even different Git Forges, or can provide merging and promotion actions, which our current CI implementation does not support.
We'd like to give the Fedora community a chance to use this advanced system and provide feedback on it. And hopefully it will increase the adoption of pull requests in the packager workflow.
Since our post-merge infrastructure and workflows are much more complex, we go there with the custom system (Bodhi, Greenwave, ResultsDB...) and continue our work in that direction. The Zuul initiative is independent from it (at least for now).
We don't want to duplicate the content of our tests though. That's why we agreed on aligning our test interfaces. For example, the Zuul pipeline supports dist-git tests (STI) and runs them the same way as Fedora CI does.
It may be a good idea to align the interface for generic tests as well. Something we should research, probably.
On Wed, 2019-11-20 at 21:15 +0100, Aleksandra Fedorova wrote:
Since our post-merge infrastructure and workflows are much more complex, we go there with the custom system (Bodhi, Greenwave, ResultsDB..) and continue our work in that direction. Zuul initiative is independent from it (at least for now).
I'd say it should still publish messages so other things can see what it's doing, and it should still publish results to resultsdb so other things can look them up.
Hi Adam,
On Tue, Nov 19, 2019 at 8:07 PM Adam Williamson adamwill@fedoraproject.org wrote:
Are the results of these tests reported to resultsdb? Does this flow publish messages to fedora-messaging or fedmsg?
Results are published neither via fedora-messaging nor to resultsdb, but I don't see any technical obstacle to doing so. A Zuul job is composed of pre-run, run, and post-run playbooks. The post-run playbook could be used to run an Ansible role dedicated to message publication.
How do we publish to resultsdb? Is resultsdb listening to the fedora-messaging bus? Is any authentication required to send on the bus?
Are either of those things done in ways that are compatible with the several existing implementations of "test stuff for a Fedora build/update/compose" (Taskotron, "the pipeline", openQA, autocloud...)
Yes, as much as we can, but for sure there is room for improvement and we'll be happy to receive guidance from Fedora CI folks. This Zuul job workflow integrates with Pagure and Koji and has some form of compatibility with the standard-test-roles. As for Taskotron and openQA, I understand they are job runners; Zuul CI also handles that step of the process, so I don't see how to integrate with them.
Hi,
To give a concrete example, here is a PR that has been validated by Zuul, approved by the package maintainer and published by Zuul on Koji.
https://src.fedoraproject.org/rpms/nodepool/pull-request/6
Four jobs have run to validate the PR (triggered at PR creation/update):
- rpm-scratch-build: Package build as scratch on Koji
- rpm-linter: Run the rpmlint command on packages built by Koji
- rpm-rpminspect: Run the rpminspect command on packages built by Koji
- rpm-test: Run the included tests/tests.yml functional tests on a Fedora Rawhide VM
Then the maintainer, after a look at the test results, flagged the PR with the 'gateit' flag to trigger the publication part of the workflow. Then Zuul:
- merged the PR
- ran the rpm-build job to build on Koji
More details on this wiki page: https://fedoraproject.org/wiki/Zuul-based-ci
On Wed, Nov 20, 2019 at 6:09 AM Fabien Boucher fboucher@redhat.com wrote:
Hi,
To give a concrete example, here is a PR that has been validated by Zuul, approved by the package maintainer and published by Zuul on Koji.
https://src.fedoraproject.org/rpms/nodepool/pull-request/6
Four jobs have run to validate the PR (triggered at PR creation/update):
- rpm-scratch-build: Package build as scratch on Koji
- rpm-linter: Run the rpm-lint command on packages built by Koji
- rpm-rpminspect: Run the rpminspect command on packages built by Koji
- rpm-test: Run the included tests/tests.yml functional tests on a Fedora Rawhide VM
Then the maintainer, after a look at the test results, flagged the PR with the 'gateit' flag to trigger the publication part of the workflow. Then Zuul:
- merged the PR
- ran the rpm-build job to build on Koji
More details on this wiki page: https://fedoraproject.org/wiki/Zuul-based-ci
Can we have Zuul support fast-forward merges? Merge commits are really irritating for managing multi-branch (i.e. current Dist-Git package maint) workflows.
Hi Neal,
On Wed, Nov 20, 2019 at 1:45 PM Neal Gompa ngompa13@gmail.com wrote:
Can we have Zuul support fast-forward merges? Merge commits are really irritating for managing multi-branch (i.e. current Dist-Git package maint) workflows.
Yes, the "Always merge" option I suggest in the Pagure project settings is optional. By default, internally, to prepare repositories to be tested, Zuul uses the git merge strategy to integrate PRs on the target branches. Keeping the commit integration strategy similar between Zuul and the Pagure project is recommended, but apart from specific jobs/use-cases, having them differ won't cause any issues.
Fabien Boucher fboucher@redhat.com writes:
Hi Neal,
On Wed, Nov 20, 2019 at 1:45 PM Neal Gompa ngompa13@gmail.com wrote:
Can we have Zuul support fast-forward merges? Merge commits are really irritating for managing multi-branch (i.e. current Dist-Git package maint) workflows.
Yes the "Always merge" option I suggest in the Pagure project settings is optional. By default, internally, to prepare repositories to be tested, Zuul uses the git merge strategy to integrate PRs on the target branches. Keeping the commit integration strategy similar between Zuul and Pagure project is recommended but apart for specific jobs/use-cases having them different won't cause any issues.
Zuul's "merge" merge strategy corresponds to the default git merge strategy ("recursive") which will perform a fast-forward if no merge is required. This is also equivalent to the default in Pagure. I think that if you set that strategy in Zuul, and disable "Always merge" in Pagure, you will have matching behavior.
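The fast-forward condition described above can be sketched on a toy commit graph: a fast-forward is possible exactly when the target branch head is an ancestor of the PR head. The graph representation, the function names, and the `always_merge` flag (standing in for the Pagure "Always merge" option) are assumptions for illustration.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` by following parents."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def integration_kind(parents, target_head, pr_head, always_merge=False):
    """Decide how a PR would land on the target branch in this toy model."""
    if not always_merge and is_ancestor(parents, target_head, pr_head):
        return "fast-forward"
    return "merge-commit"


# Linear history: A <- B (target head) <- C (PR head).
linear = {"B": ["A"], "C": ["B"]}
# Diverged history: A <- B (target head) and A <- D (PR head).
diverged = {"B": ["A"], "D": ["A"]}

print(integration_kind(linear, "B", "C"))
print(integration_kind(diverged, "B", "D"))
print(integration_kind(linear, "B", "C", always_merge=True))
```

In this model, disabling "Always merge" yields a merge commit only when the histories have actually diverged.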
[1] https://zuul-ci.org/docs/zuul/user/config.html#attr-project.merge-mode
[2] https://docs.pagure.org/pagure/usage/project_settings.html
-Jim
On Wed, 2019-11-20 at 11:52 +0100, Fabien Boucher wrote:
Hi Adam,
On Tue, Nov 19, 2019 at 8:07 PM Adam Williamson adamwill@fedoraproject.org wrote:
Are the results of these tests reported to resultsdb? Does this flow publish messages to fedora-messaging or fedmsg?
Results are not published via fedora-messaging neither on resultsdb. But I don't see technical issue to do it. A Zuul job is composed of pre-run, run, and post-run playbook. The post-run playbook could be used to run an Ansible role dedicated to the message publication.
How do we publish to resultsdb?
If you can work in Python, there's a Python client library that makes this quite easy:
https://pagure.io/taskotron/resultsdb_api
http://docs.resultsdb20.apiary.io/
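As a rough illustration of what a reporter might submit, the sketch below only builds a JSON body and sends nothing. The payload shape (a `testcase` name, an `outcome`, and free-form `data`) and the outcome list are assumptions based on the ResultsDB 2.0 API documentation linked above. The testcase name, the NVR, and the `type` value are hypothetical.

```python
import json

# Outcome values assumed from the ResultsDB defaults; check the API documentation.
ALLOWED_OUTCOMES = {"PASSED", "FAILED", "INFO", "NEEDS_INSPECTION"}


def build_result(testcase, outcome, nvr, ref_url=None, note=None):
    """Build the JSON-serializable body for one test result (not sent anywhere)."""
    if outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    result = {
        "testcase": {"name": testcase},
        "outcome": outcome,
        # 'item' and 'type' follow the Taskotron-style convention discussed in this thread.
        "data": {"item": nvr, "type": "koji_build"},
    }
    if ref_url is not None:
        result["ref_url"] = ref_url
    if note is not None:
        result["note"] = note
    return result


# Hypothetical testcase name and build NVR.
payload = build_result("zuul.rpm-linter", "PASSED", "example-1.0-1.fc32")
print(json.dumps(payload, indent=2, sort_keys=True))
```

A Zuul post-run playbook could call a small script like this and then submit the body with the Python client library mentioned above.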
There's also a higher-level reporting library I wrote which sort of helps/forces you to comply with some 'conventions' for the *format* of the result:
https://pagure.io/taskotron/resultsdb_conventions
though for now it only really defines conventions for results for composes and updates, not for package builds (I keep meaning to go back and revise it to be more in line with the CI Messages message spec, but haven't had the time). This is used by the openQA and autocloud reporters, meaning their results are always in the same format:
https://pagure.io/fedora-qa/fedora_openqa/blob/master/f/fedora_openqa/report...
https://pagure.io/fedora-qa/autocloudreporter
Taskotron and the CI Pipeline links for this:
https://pagure.io/taskotron/libtaskotron/blob/develop/f/libtaskotron/directi...
https://pagure.io/ci-resultsdb-listener/blob/master/f/resultsdb_listener
Sadly even though they are often testing exactly the same thing, the formats used by Taskotron and the CI Pipeline are not the same. I've been trying to push for more consistency between results for some time now (resultsdb_conventions being one of my efforts in this direction) but it can be a bit difficult :/ At present we wind up burdening Greenwave and/or Bodhi with trying to interpret the differently-formatted results from different systems.
Here are sample results from each of the systems, for reference:
Taskotron: https://taskotron.fedoraproject.org/resultsdb/results/35577393
CI Pipeline: https://taskotron.fedoraproject.org/resultsdb/results/35577394
autocloud: https://taskotron.fedoraproject.org/resultsdb/results/35534299
openQA (compose): https://taskotron.fedoraproject.org/resultsdb/results/35576348
openQA (update): https://taskotron.fedoraproject.org/resultsdb/results/35545828
Note the Taskotron and CI Pipeline results are for the *same Koji build*, but look quite different (e.g. the Taskotron result includes the build NVR as the 'item' but the CI Pipeline result calls it 'nvr' or 'original_spec_nvr'). The autocloud and openQA (compose) results are both for tests of the same compose (Fedora-Rawhide-20191119.n.2) and as you can see they're pretty similar. The openQA (update) result is how results filed via resultsdb_conventions for a test of a specific *Bodhi update* look.
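To make the interpretation burden concrete, here is a small sketch of the kind of normalization a consumer ends up writing to find the build NVR in either format. The key names come from the description above. The assumption that values may arrive either as plain strings or as lists, along with the function name and sample data, is hypothetical.

```python
def build_nvr(result_data):
    """Return the build NVR from a result's data, whichever key the reporter used."""
    # Taskotron-style results use 'item'; CI Pipeline results use 'nvr' or 'original_spec_nvr'.
    for key in ("item", "nvr", "original_spec_nvr"):
        value = result_data.get(key)
        if isinstance(value, (list, tuple)):
            # Assumed: some APIs return each data value wrapped in a list.
            value = value[0] if value else None
        if value:
            return value
    return None


# Hypothetical result data in the two styles, for the same build.
taskotron_style = {"item": ["example-1.0-1.fc32"], "type": ["koji_build"]}
pipeline_style = {"nvr": "example-1.0-1.fc32", "original_spec_nvr": "example-1.0-1.fc32"}

print(build_nvr(taskotron_style))
print(build_nvr(pipeline_style))
```

Every additional reporter with its own key names adds another branch to code like this, which is the argument for a shared convention.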
So far as authentication for sending to resultsdb goes, at least at one point we were simply doing this by IP whitelist :/ I know puiterwijk is/was working on implementing Ipsilon-based auth for resultsdb instead, but I don't know if that got fully baked and if anything is actually using it yet. If not, you just have to get the IP of the system that will actually be sending the results to resultsdb added to the whitelist, this is all infra ansible stuff.
Is resultsdb listening to the fedora-messaging bus?
Not directly, no. Several of the reporters do work this way, though - they're fedmsg or fedora-messaging consumers that listen out for messages indicating a test has completed and then construct a result to submit to resultsdb. But this is not done directly by resultsdb itself. Note resultsdb does *publish* to the bus: each time a new result is submitted, it publishes a message. In fact it publishes two, one in an old format on the topic 'taskotron.result.new', and one in a newer format on the topic 'resultsdb.result.new'.
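Since each submitted result produces two messages, a consumer that reacts to new results probably wants to handle only one of the two topics. A minimal sketch, assuming messages are plain dictionaries with a `topic` key and that topics carry an environment prefix such as `org.fedoraproject.prod.`; the message contents are hypothetical.

```python
NEW_FORMAT_TOPIC = "resultsdb.result.new"
OLD_FORMAT_TOPIC = "taskotron.result.new"


def new_format_only(messages):
    """Keep only new-format result messages so each result is handled once."""
    return [m for m in messages if m["topic"].endswith("." + NEW_FORMAT_TOPIC)]


# Hypothetical messages: the same result announced in both formats, plus unrelated traffic.
messages = [
    {"topic": "org.fedoraproject.prod." + OLD_FORMAT_TOPIC, "body": {"id": 1}},
    {"topic": "org.fedoraproject.prod." + NEW_FORMAT_TOPIC, "body": {"id": 1}},
    {"topic": "org.fedoraproject.prod.buildsys.build.state.change", "body": {}},
]

for message in new_format_only(messages):
    print(message["topic"])
```

Matching on the topic suffix keeps the filter independent of the environment prefix.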
Is any authentication required to send on the bus?
Yes. If you're deploying in Fedora infra this can all be handled via the infra ansible bits - see e.g. https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/autoclo... , https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/gro... , https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/inventory/gro... , https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/inventory/gro... which together are most of the bits that handle deployment of autocloudreporter in infra, including all the queue / auth handling. If you're deploying outside of Fedora infra you'd need to set it up another way, the shortish version is you need to ask infra to issue a key and certificate for your publisher. (fedmsg handled it a bit differently, but any new thing should be implemented as a fedora-messaging publisher, we're trying to retire fedmsg).
Are either of those things done in ways that are compatible with the several existing implementations of "test stuff for a Fedora build/update/compose" (Taskotron, "the pipeline", openQA, autocloud...)
Yes, as much as we can, but for sure there are rooms for improvements and we'll be happy to receive guidance from Fedora CI folks. This Zuul jobs workflow integrates with Pagure, Koji and have some form of compatibility with the standard-test-roles. For Taskotron and openQA, I understand they are job runners, Zuul CI also handles that step of the process so I don't see how to integrate with them.
What I meant by 'compatible' is not that these things should be integrated exactly, but that, ideally, they should all publish results to resultsdb in a similar and standardized format, and they should all publish message bus messages in a similar and standardized format. At present we're not in that ideal world even for existing systems, but it'd be best not to make things worse :)
Ideally it'd probably be best if this new system could publish resultsdb results in a format that's similar to *either* Taskotron's *or* the CI Pipeline's (or even that's a superset of both), and publish to fedora-messaging following the 'CI Messages' spec:
https://pagure.io/fedora-ci/messages
Again that spec is sadly not universally adopted yet; openQA's messages should be compliant with it, but those published by Taskotron and the CI Pipeline are not (yet). Taskotron doesn't really publish any messages for 'test queued', 'test running' etc. - AFAICS the *only* messages you get for Taskotron tests are the 'taskotron.result.new' and 'resultsdb.result.new' messages published by ResultsDB (and note that you'll get messages on those topics even for results sent to ResultsDB from other systems, not just from Taskotron). The CI Pipeline publishes various lifecycle messages, but has not been brought into compliance with the CI Messages spec yet. Samples again:
Taskotron (taskotron.result.new): https://apps.fedoraproject.org/datagrepper/id?id=2019-fef416a2-54bb-4bc5-817...
Taskotron (resultsdb.result.new): https://apps.fedoraproject.org/datagrepper/id?id=2019-5075a13c-927f-41c4-bae...
CI Pipeline (ci.pipeline.allpackages-build.image.queued): https://apps.fedoraproject.org/datagrepper/id?id=2019-941c906c-faa2-42bd-bad...
CI Pipeline (ci.pipeline.allpackages-build.image.running): https://apps.fedoraproject.org/datagrepper/id?id=2019-98ac76bf-f1ab-4f4c-8bc...
CI Pipeline (ci.pipeline.allpackages-build.image.complete): https://apps.fedoraproject.org/datagrepper/id?id=2019-5794652d-138d-44ab-95f...
openQA (ci.productmd-compose.test.queued) (CI Messages format): https://apps.fedoraproject.org/datagrepper/id?id=2019-2a138874-fdbe-44b1-a94...
openQA (ci.productmd-compose.test.complete) (CI Messages format): https://apps.fedoraproject.org/datagrepper/id?id=2019-38cc57d3-bbf9-4454-ad2...
openQA (openqa.job.done) (native format): https://apps.fedoraproject.org/datagrepper/id?id=2019-93b8b203-9b9e-4782-bea...
autocloud: https://apps.fedoraproject.org/datagrepper/raw?category=autocloud
Those are only samples, the systems publish on quite a few more topics covering different flows and different stages in each flow. openQA publishes both messages on ci.* topics in CI Messages-compliant form, and messages on openqa.* topics in a format of its own (this is for backwards compatibility as the openqa.* messages existed before CI Messages showed up). CI Pipeline publishes messages on ci.* topics that are *not* CI Messages-compliant. autocloud publishes messages on its own topic and in its own format. Yes I know this is all a mess; that's why I'd like to try and avoid it becoming *even more of a mess* :)
Note I wrote a Python wrapper for CI Messages which should make it convenient to use the message schemas in Python code:
https://pagure.io/fedora-qa/python-ci_messages/
I haven't really used it a lot in anger yet, though, because the only thing I maintain that publishes in CI Messages-compliant format is written in perl so it can't use it :/
Hi Adam,
Thanks for the long and thorough description of the current situation.
Some comments in-line.
On Wed, Nov 20, 2019 at 9:33 PM Adam Williamson adamwill@fedoraproject.org wrote:
On Wed, 2019-11-20 at 11:52 +0100, Fabien Boucher wrote:
Hi Adam,
On Tue, Nov 19, 2019 at 8:07 PM Adam Williamson adamwill@fedoraproject.org wrote:
Are the results of these tests reported to resultsdb? Does this flow publish messages to fedora-messaging or fedmsg?
Results are not published via fedora-messaging neither on resultsdb. But I don't see technical issue to do it. A Zuul job is composed of pre-run, run, and post-run playbook. The post-run playbook could be used to run an Ansible role dedicated to the message publication.
How to publish in resultdb ?
If you can work in Python, there's a Python client library that makes this quite easy:
https://pagure.io/taskotron/resultsdb_api http://docs.resultsdb20.apiary.io/
There's also a higher-level reporting library I wrote which sort of helps/forces you to comply with some 'conventions' for the *format* of the result:
https://pagure.io/taskotron/resultsdb_conventions
though for now it only really defines conventions for results for composes and updates, not for package builds (I keep meaning to go back and revise it to be more in line with the CI Messages message spec, but haven't had the time). This is used by the openQA and autocloud reporters, meaning their results are always in the same format:
https://pagure.io/fedora-qa/fedora_openqa/blob/master/f/fedora_openqa/report... https://pagure.io/fedora-qa/autocloudreporter
Taskotron and the CI Pipeline links for this:
https://pagure.io/taskotron/libtaskotron/blob/develop/f/libtaskotron/directi... https://pagure.io/ci-resultsdb-listener/blob/master/f/resultsdb_listener
Sadly even though they are often testing exactly the same thing, the formats used by Taskotron and the CI Pipeline are not the same. I've been trying to push for more consistency between results for some time now (resultsdb_conventions being one of my efforts in this direction) but it can be a bit difficult :/ At present we wind up burdening Greenwave and/or Bodhi with trying to interpret the differently- formatted results from different systems.
Here are sample results from each of the systems, for reference:
Taskotron: https://taskotron.fedoraproject.org/resultsdb/results/35577393
CI Pipeline: https://taskotron.fedoraproject.org/resultsdb/results/35577394
autocloud: https://taskotron.fedoraproject.org/resultsdb/results/35534299
openQA (compose): https://taskotron.fedoraproject.org/resultsdb/results/35576348
openQA (update): https://taskotron.fedoraproject.org/resultsdb/results/35545828
Note the Taskotron and CI Pipeline results are for the *same Koji build*, but look quite different (e.g. the Taskotron result includes the build NVR as the 'item' but the CI Pipeline result calls it 'nvr' or 'original_spec_nvr'). The autocloud and openQA (compose) results are both for tests of the same compose (Fedora-Rawhide-20191119.n.2) and as you can see they're pretty similar. The openQA (update) result is how results filed via resultsdb_conventions for a test of a specific *Bodhi update* look.
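To illustrate the kind of interpretation work this pushes onto consumers, here is a small sketch of a helper that finds the build NVR in a result's extra data whichever of the three keys mentioned above is present. The function names are hypothetical, and the handling of list values rests on the assumption that ResultsDB returns extra-data values as lists; it is only meaningful for results about Koji builds.

```python
# Keys under which a Koji build NVR may appear, per the samples above:
# the CI Pipeline uses 'nvr' / 'original_spec_nvr', Taskotron uses 'item'.
NVR_KEYS = ("nvr", "original_spec_nvr", "item")


def first_value(value):
    """Return the first element of a list/tuple value, or the value itself."""
    if isinstance(value, (list, tuple)):
        return value[0] if value else None
    return value


def extract_nvr(data):
    """Return the build NVR from a result's extra data, or None if absent."""
    for key in NVR_KEYS:
        nvr = first_value(data.get(key))
        if nvr:
            return nvr
    return None


if __name__ == "__main__":
    taskotron_like = {"item": ["foo-1.0-1.fc31"], "type": ["koji_build"]}
    pipeline_like = {"nvr": ["foo-1.0-1.fc31"]}
    print(extract_nvr(taskotron_like), extract_nvr(pipeline_like))
```

Every consumer that wants to treat the two systems alike needs some equivalent of this mapping, which is the burden a shared convention would remove.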
It looks like resultsdb is acting as some kind of a document-oriented database where people can store whatever they want. Any reason (except historical ones of course:)) why not to enforce some kind of schema on input? I.e. resultsdb would own the schema and others would need to comply, if they want their results to be used by other services down the road.
The advantage of defining and enforcing the schema on resultsdb side would be much better clarity (in my opinion) and clear ownership. Do you want to add a new CI system to the mix? Cool, just store results in resultsdb, and here's the API/schema. End of the story. No need to fiddle with some external schemas, trying to understand why/how, or hoping that some listener will eventually store the results there for me, ...
So far as authentication for sending to resultsdb goes, at least at one point we were simply doing this by IP whitelist :/ I know puiterwijk is/was working on implementing Ipsilon-based auth for resultsdb instead, but I don't know if that got fully baked and if anything is actually using it yet. If not, you just have to get the IP of the system that will actually be sending the results to resultsdb added to the whitelist, this is all infra ansible stuff.
Is resultsdb listening to the fedora-messaging bus?
Not directly, no. Several of the reporters do work this way, though - they're fedmsg or fedora-messaging consumers that listen out for messages indicating a test has completed and then construct a result to submit to resultsdb. But this is not done directly by resultsdb itself. Note resultsdb does *publish* to the bus: each time a new result is submitted, it publishes a message. In fact it publishes two, one in an old format on the topic 'taskotron.result.new', and one in a newer format on the topic 'resultsdb.result.new'.
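Because every new result produces two messages, a consumer that listens to both topics would handle each result twice. A minimal sketch of topic filtering follows; it assumes topics arrive with an environment prefix such as `org.fedoraproject.prod.`, and the function names are hypothetical rather than part of any library.

```python
OLD_TOPIC = "taskotron.result.new"   # older message format
NEW_TOPIC = "resultsdb.result.new"   # newer message format


def matches(topic, suffix):
    """True if 'topic' is 'suffix' itself or ends with '.<suffix>'."""
    return topic == suffix or topic.endswith("." + suffix)


def is_new_result(topic):
    """True for either of the two topics published for a new result."""
    return matches(topic, OLD_TOPIC) or matches(topic, NEW_TOPIC)


def should_process(topic):
    """Handle each result once by consuming only the newer-format topic."""
    return matches(topic, NEW_TOPIC)


if __name__ == "__main__":
    for t in (
        "org.fedoraproject.prod.taskotron.result.new",
        "org.fedoraproject.prod.resultsdb.result.new",
        "org.fedoraproject.prod.openqa.job.done",
    ):
        print(t, is_new_result(t), should_process(t))
```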
Is any authentication required to send on the bus?
Yes. If you're deploying in Fedora infra this can all be handled via the infra ansible bits - see e.g.
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/autoclo...
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/gro...
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/inventory/gro...
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/inventory/gro...
which together are most of the bits that handle deployment of autocloudreporter in infra, including all the queue / auth handling. If you're deploying outside of Fedora infra you'd need to set it up another way; the shortish version is that you need to ask infra to issue a key and certificate for your publisher. (fedmsg handled it a bit differently, but any new thing should be implemented as a fedora-messaging publisher; we're trying to retire fedmsg.)
Are either of those things done in ways that are compatible with the several existing implementations of "test stuff for a Fedora build/update/compose" (Taskotron, "the pipeline", openQA, autocloud...)?
Yes, as much as we can, but for sure there is room for improvement and we'll be happy to receive guidance from Fedora CI folks. This Zuul jobs workflow integrates with Pagure and Koji, and has some form of compatibility with the standard-test-roles. As for Taskotron and openQA, I understand they are job runners; Zuul CI also handles that step of the process, so I don't see how to integrate with them.
What I meant by 'compatible' is not that these things should be integrated exactly, but that, ideally, they should all publish results to resultsdb in a similar and standardized format, and they should all publish message bus messages in a similar and standardized format. At present we're not in that ideal world even for existing systems, but it'd be best not to make things worse :)
Ideally it'd probably be best if this new system could publish resultsdb results in a format that's similar to *either* Taskotron's *or* the CI Pipeline's (or even that's a superset of both), and publish to fedora-messaging following the 'CI Messages' spec:
I completely agree that sending standardized CI messages would be super nice.
Although if Fedora CI systems talk to resultsdb directly, what are the benefits/incentives for migrating them to CI Messages standard? Or in other words, are there services in the infrastructure that actually listen on those raw CI messages? (and cannot just listen on resultsdb notifications).
Thanks, Michal
Again that spec is sadly not universally adopted yet; openQA's messages should be compliant with it, but those published by Taskotron and the CI Pipeline are not (yet). Taskotron doesn't really publish any messages for 'test queued', 'test running' etc. - AFAICS the *only* messages you get for Taskotron tests are the 'taskotron.result.new' and 'resultsdb.result.new' messages published by ResultsDB (and note that you'll get messages on those topics even for results sent to ResultsDB from other systems, not just from Taskotron). The CI Pipeline publishes various lifecycle messages, but has not been brought into compliance with the CI Messages spec yet. Samples again:
Taskotron (taskotron.result.new): https://apps.fedoraproject.org/datagrepper/id?id=2019-fef416a2-54bb-4bc5-817...
Taskotron (resultsdb.result.new): https://apps.fedoraproject.org/datagrepper/id?id=2019-5075a13c-927f-41c4-bae...
CI Pipeline (ci.pipeline.allpackages-build.image.queued): https://apps.fedoraproject.org/datagrepper/id?id=2019-941c906c-faa2-42bd-bad...
CI Pipeline (ci.pipeline.allpackages-build.image.running): https://apps.fedoraproject.org/datagrepper/id?id=2019-98ac76bf-f1ab-4f4c-8bc...
CI Pipeline (ci.pipeline.allpackages-build.image.complete): https://apps.fedoraproject.org/datagrepper/id?id=2019-5794652d-138d-44ab-95f...
openQA (ci.productmd-compose.test.queued) (CI Messages format): https://apps.fedoraproject.org/datagrepper/id?id=2019-2a138874-fdbe-44b1-a94...
openQA (ci.productmd-compose.test.complete) (CI Messages format): https://apps.fedoraproject.org/datagrepper/id?id=2019-38cc57d3-bbf9-4454-ad2...
openQA (openqa.job.done) (native format): https://apps.fedoraproject.org/datagrepper/id?id=2019-93b8b203-9b9e-4782-bea...
autocloud: https://apps.fedoraproject.org/datagrepper/raw?category=autocloud
Those are only samples, the systems publish on quite a few more topics covering different flows and different stages in each flow. openQA publishes both messages on ci.* topics in CI Messages-compliant form, and messages on openqa.* topics in a format of its own (this is for backwards compatibility as the openqa.* messages existed before CI Messages showed up). CI Pipeline publishes messages on ci.* topics that are *not* CI Messages-compliant. autocloud publishes messages on its own topic and in its own format. Yes I know this is all a mess; that's why I'd like to try and avoid it becoming *even more of a mess* :)
Note I wrote a Python wrapper for CI Messages which should make it convenient to use the message schemas in Python code:
https://pagure.io/fedora-qa/python-ci_messages/
I haven't really used it a lot in anger yet, though, because the only thing I maintain that publishes in CI Messages-compliant format is written in perl so it can't use it :/
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
On Thu, 2019-11-21 at 08:40 +0100, Michal Srb wrote:
It looks like resultsdb is acting as some kind of a document-oriented database where people can store whatever they want. Any reason (except historical ones of course:)) why not to enforce some kind of schema on input? I.e. resultsdb would own the schema and others would need to comply, if they want their results to be used by other services down the road.
Josef Skladanka or Tim Flink could answer that better than me, but AIUI the answer is basically: history. The current ResultsDB is more or less ResultsDB 2.0; ResultsDB 1.0 was much more opinionated about how things interacted with it, and that turned out not to work very well. So 2.0 was intentionally made much simpler, to the point where it's essentially just a database for key-value pairs with almost no strict rules about what a 'result' consists of.
The advantage of defining and enforcing the schema on resultsdb side would be much better clarity (in my opinion) and clear ownership. Do you want to add a new CI system to the mix? Cool, just store results in resultsdb, and here's the API/schema. End of the story. No need to fiddle with some external schemas, trying to understand why/how, or hoping that some listener will eventually store the results there for me, ...
Well, the listeners aren't just sort of random things that show up looking for results to submit :) Most of the message consumers that look for completed tests and forward the results to resultsdb are *maintained by the same teams that maintain those test systems* (this is the case for the openQA one - where it's me - and the CI Pipeline one at least). It just turns out to be a sensible way to do it, especially if your test system is not written in Python, which neither openQA (perl) nor the pipeline (Java, mostly) is; using a message consumer approach for resultsdb submission means you can write the resultsdb submission bits in Python and use the Python libraries. The exception is autocloudreporter, which exists simply because I noticed that autocloud wasn't set up to report results to resultsdb and decided to just fix it myself instead of waiting for someone else to do it. (I actually came up with resultsdb_conventions as a logical extension of sharing code with the openQA reporter when writing autocloudreporter...)
Ideally it'd probably be best if this new system could publish resultsdb results in a format that's similar to *either* Taskotron's *or* the CI Pipeline's (or even that's a superset of both), and publish to fedora-messaging following the 'CI Messages' spec:
I completely agree that sending standardized CI messages would be super nice.
Although if Fedora CI systems talk to resultsdb directly, what are the benefits/incentives for migrating them to CI Messages standard? Or in other words, are there services in the infrastructure that actually listen on those raw CI messages? (and cannot just listen on resultsdb notifications).
Well, the main thing is that the CI Messages spec defines messages for a lot *more* than just "this test completed and generated a result" (which is all you get from a resultsdb item or the message generated when one is created). It has messages for "test has been scheduled", "test is running", "test errored out" and various other things.
The other nice thing about the CI Messages spec is that it's built from the ground up to make the messages from different systems inter-compatible; one of the initial ideas of the whole spec was that the message topics *don't depend on the test system*, so you can find messages for tests of a given 'thing' regardless of which test system they come from. The overall idea is that you can build e.g. a web dashboard which can show you the status of *all* tests for a given, say, compose or update or pull request, regardless of what test system ran them. And this is in fact what it's used for inside RH; there's a thing called the 'CI Dashboard' which does exactly this. It'd be nice to have a Fedora deployment of the dashboard too, but it will only work if test systems actually publish messages in the correct format.
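A rough sketch of why system-independent topics help a dashboard: the code below groups events by artifact using only the topic, whichever system sent them. The topic shape `ci.<artifact-type>.test.<state>` is inferred from the openQA samples earlier in the thread; the `(topic, artifact_id, system)` tuples are a hypothetical simplification and not the actual CI Messages body layout, and the function names are made up for illustration.

```python
from collections import defaultdict


def parse_ci_topic(topic):
    """Split '...ci.<artifact-type>.test.<state>' into (artifact_type, state).

    Returns None for topics that do not have that shape.
    """
    parts = topic.split(".")
    try:
        # Skip any bus prefix such as 'org.fedoraproject.prod'.
        start = parts.index("ci")
    except ValueError:
        return None
    rest = parts[start + 1:]
    if len(rest) == 3 and rest[1] == "test":
        return rest[0], rest[2]
    return None


def latest_states(events):
    """Map (artifact_type, artifact_id) -> {system: latest state seen}.

    'events' is an iterable of (topic, artifact_id, system) in arrival order.
    """
    status = defaultdict(dict)
    for topic, artifact_id, system in events:
        parsed = parse_ci_topic(topic)
        if parsed is None:
            continue  # not in the expected topic shape, ignore
        artifact_type, state = parsed
        status[(artifact_type, artifact_id)][system] = state
    return dict(status)


if __name__ == "__main__":
    compose = "Fedora-Rawhide-20191119.n.2"
    events = [
        ("ci.productmd-compose.test.queued", compose, "openqa"),
        ("ci.productmd-compose.test.complete", compose, "openqa"),
        ("ci.pipeline.allpackages-build.image.queued", "foo", "pipeline"),
    ]
    print(latest_states(events))
```

Note that the CI Pipeline topic in the example is skipped because it does not follow the expected shape, which is the practical cost of non-compliant messages.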
In terms of what the things we have in Fedora that are doing useful stuff *right now* actually do, it's a mishmash, because we haven't got all the systems actually being inter-compatible yet. E.g., Greenwave polls results from resultsdb directly, and has logic to cope with the different formats produced by the pipeline, Taskotron and openQA (but we could sure make that nicer if we had stronger conventions about resultsdb formats). I have a few things which listen for bus messages from openQA and autocloud; for now these use the 'native' format messages, but if we actually got everything compliant with CI Messages I'd probably change them over. There may well be other stuff out there; I don't know if anyone actually knows for sure "these are all the things that do stuff with resultsdb results or automatic test system bus messages".
Hi Adam,
Thanks for this extended explanation, it's really helpful!
On Wed, Nov 20, 2019 at 9:33 PM Adam Williamson <adamwill@fedoraproject.org> wrote:
On Wed, 2019-11-20 at 11:52 +0100, Fabien Boucher wrote:
Ideally it'd probably be best if this new system could publish resultsdb results in a format that's similar to *either* Taskotron's *or* the CI Pipeline's (or even that's a superset of both), and publish to fedora-messaging following the 'CI Messages' spec:
Yes, it completely makes sense to report Zuul job status to the Fedora Infra services. Zuul provides an MQTT reporter (https://zuul-ci.org/docs/zuul/admin/drivers/mqtt.html) that a local gateway (which we'll need to write) could use as an input source, transforming messages and forwarding them to the fedora-messaging bus and resultsdb in the right format. So I've created this story in our backlog: https://teams.fedoraproject.org/project/ci/us/63. Any help is welcome. But before going too far with integration, I think we should wait until the system has some early adopters.
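As a starting point for that story, here is a minimal sketch of the transformation step such a gateway might perform: turning one Zuul MQTT report into ResultsDB-style result payloads. It assumes the MQTT payload carries a `change_url` and a `buildset` with a `builds` list whose entries have `job_name`, `result` and `log_url` (to be verified against the Zuul MQTT driver documentation). The `zuul.` testcase prefix, the `pagure_pull_request` type value and the outcome mapping are hypothetical choices, not an agreed convention.

```python
# Mapping from Zuul build results to ResultsDB outcomes (assumption:
# anything that is neither SUCCESS nor FAILURE needs a human to look at it).
ZUUL_TO_RESULTSDB = {"SUCCESS": "PASSED", "FAILURE": "FAILED"}


def builds_to_results(zuul_message):
    """Return one ResultsDB-style payload per build in a Zuul MQTT message."""
    results = []
    buildset = zuul_message.get("buildset", {})
    for build in buildset.get("builds", []):
        outcome = ZUUL_TO_RESULTSDB.get(build.get("result"), "NEEDS_INSPECTION")
        results.append(
            {
                "outcome": outcome,
                # Hypothetical naming scheme for testcases.
                "testcase": {"name": "zuul." + build["job_name"]},
                "ref_url": build.get("log_url"),
                "data": {
                    "item": zuul_message.get("change_url"),
                    "type": "pagure_pull_request",  # hypothetical type value
                },
            }
        )
    return results


if __name__ == "__main__":
    # Hypothetical message and job names, for illustration only.
    message = {
        "change_url": "https://src.fedoraproject.org/rpms/foo/pull-request/1",
        "buildset": {
            "builds": [
                {"job_name": "rpm-scratch-build", "result": "SUCCESS"},
                {"job_name": "rpm-lint", "result": "FAILURE"},
            ]
        },
    }
    for result in builds_to_results(message):
        print(result["testcase"]["name"], result["outcome"])
```

Keeping the mapping in a pure function means the format can be reviewed and adjusted to whatever convention is agreed on before any MQTT or bus plumbing is written.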