Hi Adam,

Thanks for the long and thorough description of the current situation.

Some comments in-line.


On Wed, Nov 20, 2019 at 9:33 PM Adam Williamson <adamwill@fedoraproject.org> wrote:
On Wed, 2019-11-20 at 11:52 +0100, Fabien Boucher wrote:
> Hi Adam,
>
> On Tue, Nov 19, 2019 at 8:07 PM Adam Williamson <adamwill@fedoraproject.org> wrote:
>
> > Are the results of these tests reported to resultsdb? Does this flow
> > publish messages to fedora-messaging or fedmsg?
>
> Results are not published via fedora-messaging or to resultsdb, but
> I don't see any technical issue with doing so. A Zuul job is composed of
> pre-run, run, and post-run playbooks. The post-run playbook could be used
> to run an Ansible role dedicated to message publication.
>
> How do we publish to resultsdb?

If you can work in Python, there's a Python client library that makes
this quite easy:

https://pagure.io/taskotron/resultsdb_api
http://docs.resultsdb20.apiary.io/
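For illustration only: assuming the ResultsDB 2.0 REST API described in the apiary docs (a JSON POST to /api/v2.0/results), a result payload can be put together with nothing but the standard library. The helper names, the example.org URL, the default outcome set and the 'item'/'type' data keys are assumptions for this sketch, not a required format; the resultsdb_api client should save you from doing this by hand.

```python
import json
import urllib.request

# Assumed default ResultsDB outcomes; a deployment may configure more.
OUTCOMES = {"PASSED", "INFO", "FAILED", "NEEDS_INSPECTION"}


def build_result(testcase, outcome, item, item_type, ref_url=""):
    """Hypothetical helper: assemble a ResultsDB 2.0-style result payload."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    return {
        "testcase": {"name": testcase},
        "outcome": outcome,
        "ref_url": ref_url,
        # Taskotron-style keys (assumed); other systems use different ones.
        "data": {"item": item, "type": item_type},
    }


def make_request(base_url, payload):
    """Prepare (but do not send) the POST to the results endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2.0/results",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Actually submitting it (urllib.request.urlopen on the prepared request) only works from a host that resultsdb is configured to accept results from.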

There's also a higher-level reporting library I wrote which sort of
helps/forces you to comply with some 'conventions' for the *format* of
the result:

https://pagure.io/taskotron/resultsdb_conventions

though for now it only really defines conventions for results for
composes and updates, not for package builds (I keep meaning to go back
and revise it to be more in line with the CI Messages message spec, but
haven't had the time). This is used by the openQA and autocloud
reporters, meaning their results are always in the same format:

https://pagure.io/fedora-qa/fedora_openqa/blob/master/f/fedora_openqa/report.py#_330
https://pagure.io/fedora-qa/autocloudreporter

Taskotron and the CI Pipeline links for this:

https://pagure.io/taskotron/libtaskotron/blob/develop/f/libtaskotron/directives/resultsdb_directive.py
https://pagure.io/ci-resultsdb-listener/blob/master/f/resultsdb_listener

Sadly even though they are often testing exactly the same thing, the
formats used by Taskotron and the CI Pipeline are not the same. I've
been trying to push for more consistency between results for some time
now (resultsdb_conventions being one of my efforts in this direction)
but it can be a bit difficult :/ At present we wind up burdening
Greenwave and/or Bodhi with trying to interpret the differently-
formatted results from different systems.

Here are sample results from each of the systems, for reference:

Taskotron: https://taskotron.fedoraproject.org/resultsdb/results/35577393
CI Pipeline: https://taskotron.fedoraproject.org/resultsdb/results/35577394
autocloud: https://taskotron.fedoraproject.org/resultsdb/results/35534299
openQA (compose): https://taskotron.fedoraproject.org/resultsdb/results/35576348
openQA (update): https://taskotron.fedoraproject.org/resultsdb/results/35545828

Note the Taskotron and CI Pipeline results are for the *same Koji
build*, but look quite different (the Taskotron result includes the
build NVR as the 'item', but the CI Pipeline result calls it 'nvr' or
'original_spec_nvr', for example). The autocloud and openQA (compose)
results are both for tests of the same compose (Fedora-Rawhide-
20191119.n.2) and as you can see they're pretty similar. The openQA
(update) result is how results filed via resultsdb_conventions for a
test of a specific *Bodhi update* look.
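To make the consequence concrete: a consumer that wants "the build this result is about" has to know every system's key. A minimal sketch, assuming result dicts shaped like the ResultsDB responses above (data values usually come back as lists of strings); the lookup order is an assumption, and it ignores 'type', so for update or compose results 'item' will not be an NVR.

```python
# Keys checked in order; based on the Taskotron and CI Pipeline formats above.
NVR_KEYS = ("item", "nvr", "original_spec_nvr")


def extract_nvr(result):
    """Return the build NVR from a ResultsDB result dict, or None if absent."""
    data = result.get("data", {})
    for key in NVR_KEYS:
        value = data.get(key)
        if not value:
            continue  # key missing or empty list/string
        # ResultsDB responses normally give data values as lists of strings.
        if isinstance(value, list):
            return value[0]
        return value
    return None
```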

It looks like resultsdb is acting as a kind of document-oriented database where people can store whatever they want. Is there any reason (other than historical ones, of course :)) not to enforce some kind of schema on input? I.e. resultsdb would own the schema, and others would need to comply if they want their results to be used by other services down the road.

The advantage of defining and enforcing the schema on the resultsdb side would be much better clarity (in my opinion) and clear ownership. Do you want to add a new CI system to the mix? Cool, just store results in resultsdb; here's the API/schema. End of story. No need to fiddle with external schemas, try to understand why/how, or hope that some listener will eventually store the results there for me, ...
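Roughly what I have in mind, as a sketch only: resultsdb would check each submission against a schema it owns and reject anything that doesn't comply. The item types and required keys below are hypothetical placeholders; resultsdb does not do this today.

```python
# Hypothetical schema owned by resultsdb: required 'data' keys per item type.
REQUIRED_DATA_KEYS = {
    "koji_build": {"item", "type"},
    "bodhi_update": {"item", "type"},
    "productmd-compose": {"item", "type"},
}


def validate_result(payload):
    """Return a list of problems; an empty list means the payload is accepted."""
    problems = []
    for field in ("testcase", "outcome", "data"):
        if field not in payload:
            problems.append(f"missing top-level field: {field}")
    data = payload.get("data", {})
    item_type = data.get("type")
    if item_type not in REQUIRED_DATA_KEYS:
        problems.append(f"unknown item type: {item_type!r}")
        return problems
    missing = REQUIRED_DATA_KEYS[item_type] - data.keys()
    problems.extend(f"missing data key: {key}" for key in sorted(missing))
    return problems
```

A submission that uses 'nvr' instead of 'item' would simply be rejected with an explicit error, rather than stored and left for Greenwave/Bodhi to interpret.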
 

So far as authentication for sending to resultsdb goes, at least at one
point we were simply doing this by IP whitelist :/ I know puiterwijk
is/was working on implementing Ipsilon-based auth for resultsdb
instead, but I don't know if that got fully baked and if anything is
actually using it yet. If not, you just have to get the IP of the
system that will actually be sending the results to resultsdb added to
the whitelist, this is all infra ansible stuff.
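Conceptually, IP whitelisting boils down to a check like the one below. This is only an illustration: the network ranges are documentation placeholders, the function name is hypothetical, and in Fedora infra the real list is handled through the ansible configuration rather than code like this.

```python
import ipaddress

# Placeholder ranges (RFC 5737 / RFC 3849 documentation networks), not real infra.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/28"),
    ipaddress.ip_network("2001:db8::/64"),
]


def is_allowed(remote_addr):
    """Return True if the submitting host's IP falls inside an allowed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a valid IP address at all
    # An IPv4 address is never "in" an IPv6 network (and vice versa).
    return any(addr in net for net in ALLOWED_NETWORKS)
```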

> Is resultsdb listening on the fedora-messaging bus?

Not directly, no. Several of the reporters do work this way, though -
they're fedmsg or fedora-messaging consumers that listen out for
messages indicating a test has completed and then construct a result to
submit to resultsdb. But this is not done directly by resultsdb itself.
Note resultsdb does *publish* to the bus: each time a new result is
submitted, it publishes a message. In fact it publishes two, one in an
old format on the topic 'taskotron.result.new', and one in a newer
format on the topic 'resultsdb.result.new'.
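Since every result shows up twice, a consumer that reacts to new results should pick one of the two topics. A minimal sketch, using a stand-in Message class so it runs without the bus; with fedora-messaging the callback receives a message object that also has .topic and .body. The listener class is hypothetical, and the "org.fedoraproject.prod." prefix is assumed to be how deployed topics look.

```python
from dataclasses import dataclass, field

# Only react to the newer-format topic; ResultsDB also publishes every result on
# taskotron.result.new in the old format, so handling both would double-count.
NEW_TOPIC = "resultsdb.result.new"


@dataclass
class Message:
    """Minimal stand-in for a bus message with a topic and a JSON body."""
    topic: str
    body: dict = field(default_factory=dict)


class ResultListener:
    """Hypothetical consumer callback that collects new-result message bodies."""

    def __init__(self):
        self.seen = []

    def __call__(self, message):
        # Deployed topics carry an environment prefix, so match the bare
        # topic or the topic as a dot-separated suffix.
        topic = message.topic
        if topic == NEW_TOPIC or topic.endswith("." + NEW_TOPIC):
            self.seen.append(message.body)
```

An instance could then be handed to fedora_messaging.api.consume() as the callback, which is roughly how the existing reporters are wired up.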

> Is any authentication required to send on the bus?

Yes. If you're deploying in Fedora infra this can all be handled via
the infra ansible bits - see e.g.
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/autocloudreporter ,
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/playbooks/groups/openqa.yml#n123 ,
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/inventory/group_vars/autocloudreporter ,
https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/inventory/group_vars/autocloudreporter_common
which together are most of the bits that handle deployment
of autocloudreporter in infra, including all the queue / auth handling.
If you're deploying outside of Fedora infra you'd need to set it up
another way, the shortish version is you need to ask infra to issue a
key and certificate for your publisher. (fedmsg handled it a bit
differently, but any new thing should be implemented as a fedora-
messaging publisher, we're trying to retire fedmsg).
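Outside infra, assuming the usual fedora-messaging TOML config layout, the issued key and certificate end up referenced from a [tls] section (ca_cert, keyfile, certfile) alongside an amqps:// amqp_url. A quick sanity check over the parsed config (e.g. loaded with tomllib on Python 3.11+) might look like this; the function name is hypothetical and the checks are only a sketch of what to look for.

```python
from pathlib import Path


def check_publisher_tls(config):
    """Return problems found in a parsed fedora-messaging-style config dict."""
    problems = []
    # Client-certificate auth only applies over TLS, i.e. an amqps:// URL.
    if not config.get("amqp_url", "").startswith("amqps://"):
        problems.append("amqp_url should use amqps:// so the client certificate is used")
    tls = config.get("tls", {})
    for key in ("ca_cert", "keyfile", "certfile"):
        path = tls.get(key)
        if not path:
            problems.append(f"tls.{key} is not set")
        elif not Path(path).is_file():
            problems.append(f"tls.{key} does not exist: {path}")
    return problems
```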
>
> > Are either of those
> > things done in ways that are compatible with the several existing
> > implementations of "test stuff for a Fedora build/update/compose"
> > (Taskotron, "the pipeline", openQA, autocloud...)
> >
> >
> Yes, as much as we can, but for sure there is room for improvement and
> we'll be happy to receive guidance from Fedora CI folks.
> This Zuul jobs workflow integrates with Pagure and Koji, and has some form of
> compatibility with the standard-test-roles.
> For Taskotron and openQA, I understand they are job runners; Zuul CI also
> handles that step of the process, so I don't see how to integrate with them.

What I meant by 'compatible' is not that these things should be
integrated exactly, but that, ideally, they should all publish results
to resultsdb in a similar and standardized format, and they should all
publish message bus messages in a similar and standardized format. At
present we're not in that ideal world even for existing systems, but
it'd be best not to make things worse :)

Ideally it'd probably be best if this new system could publish
resultsdb results in a format that's similar to *either* Taskotron's
*or* the CI Pipeline's (or even that's a superset of both), and publish
to fedora-messaging following the 'CI Messages' spec:

https://pagure.io/fedora-ci/messages
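One cheap way to get a "superset" on the resultsdb side is to write the identifying value under both systems' keys, so consumers written for either format can find it. A sketch only: the function name is hypothetical, and duplicating the NVR into 'original_spec_nvr' assumes it is the same value for your builds.

```python
def superset_data(nvr, item_type="koji_build", extra=None):
    """Hypothetical: ResultsDB 'data' readable by consumers of either format."""
    data = {
        "item": nvr,               # key used by Taskotron results
        "type": item_type,
        "nvr": nvr,                # keys used by CI Pipeline results
        "original_spec_nvr": nvr,
    }
    if extra:
        # Refuse extra fields that would silently override the identifying keys.
        clash = set(extra) & set(data)
        if clash:
            raise ValueError(f"extra data overrides identifying keys: {sorted(clash)}")
        data.update(extra)
    return data
```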


I completely agree that sending standardized CI messages would be super nice.

Although, if Fedora CI systems talk to resultsdb directly, what are the benefits/incentives for migrating them to the CI Messages standard? In other words, are there services in the infrastructure that actually listen for those raw CI messages (and cannot just listen for resultsdb notifications)?

Thanks,
Michal
 


Again that spec is sadly not universally adopted yet; openQA's messages
should be compliant with it, but those published by Taskotron and the
CI Pipeline are not (yet). Taskotron doesn't really publish any
messages for 'test queued', 'test running' etc. - AFAICS the *only*
messages you get for Taskotron tests are the 'taskotron.result.new' and
'resultsdb.result.new' messages published by ResultsDB (and note that
you'll get messages on those topics even for results sent to ResultsDB
by systems other than Taskotron). The CI Pipeline publishes
various lifecycle messages, but has not been brought into compliance
with the CI Messages spec yet. Samples again:

Taskotron (taskotron.result.new): https://apps.fedoraproject.org/datagrepper/id?id=2019-fef416a2-54bb-4bc5-8172-2c477c8228e4&is_raw=true&size=extra-large
Taskotron (resultsdb.result.new): https://apps.fedoraproject.org/datagrepper/id?id=2019-5075a13c-927f-41c4-bae6-26eba46978ea&is_raw=true&size=extra-large
CI Pipeline (ci.pipeline.allpackages-build.image.queued): https://apps.fedoraproject.org/datagrepper/id?id=2019-941c906c-faa2-42bd-badf-68c91169a5f5&is_raw=true&size=extra-large
CI Pipeline (ci.pipeline.allpackages-build.image.running): https://apps.fedoraproject.org/datagrepper/id?id=2019-98ac76bf-f1ab-4f4c-8bc0-888b96ffa628&is_raw=true&size=extra-large
CI Pipeline (ci.pipeline.allpackages-build.image.complete): https://apps.fedoraproject.org/datagrepper/id?id=2019-5794652d-138d-44ab-95f5-1e6f97aa333e&is_raw=true&size=extra-large
openQA (ci.productmd-compose.test.queued) (CI Messages format): https://apps.fedoraproject.org/datagrepper/id?id=2019-2a138874-fdbe-44b1-a941-a3fc2be8b716&is_raw=true&size=extra-large
openQA (ci.productmd-compose.test.complete) (CI Messages format): https://apps.fedoraproject.org/datagrepper/id?id=2019-38cc57d3-bbf9-4454-ad2e-ef9e2067924f&is_raw=true&size=extra-large
openQA (openqa.job.done) (native format): https://apps.fedoraproject.org/datagrepper/id?id=2019-93b8b203-9b9e-4782-bea9-4fab2c28722a&is_raw=true&size=extra-large
autocloud: https://apps.fedoraproject.org/datagrepper/raw?category=autocloud

Those are only samples; the systems publish on quite a few more topics
covering different flows and different stages in each flow. openQA
publishes both messages on ci.* topics in CI Messages-compliant form,
and messages on openqa.* topics in a format of its own (this is for
backwards compatibility as the openqa.* messages existed before CI
Messages showed up). CI Pipeline publishes messages on ci.* topics that
are *not* CI Messages-compliant. autocloud publishes messages on its
own topic and in its own format. Yes, I know this is all a mess; that's
why I'd like to try and avoid it becoming *even more of a mess* :)
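As a rough way of sorting the topics above, here is a heuristic classifier that assumes CI Messages topics have the shape ci.<artifact-type>.test.<status>. The category and status sets are assumptions taken from the samples, not the full spec. Matching the topic shape says nothing about whether the body is schema-compliant, which is the real test.

```python
# Assumed CI Messages topic shape: ci.<artifact-type>.<category>.<status>.
# These sets are assumptions based on the samples; check the spec for the real list.
CI_CATEGORIES = {"test"}
CI_STATUSES = {"queued", "running", "complete", "error"}


def classify_topic(topic):
    """Classify a prefix-stripped topic by the message format it suggests."""
    parts = topic.split(".")
    if parts[0] != "ci":
        return "system-specific"  # e.g. openqa.*, taskotron.*, autocloud
    if len(parts) == 4 and parts[2] in CI_CATEGORIES and parts[3] in CI_STATUSES:
        return "ci-messages"
    return "ci-noncompliant"  # ci.* topic that doesn't follow the assumed shape
```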

Note I wrote a Python wrapper for CI Messages which should make it
convenient to use the message schemas in Python code:

https://pagure.io/fedora-qa/python-ci_messages/

I haven't really used it a lot in anger yet, though, because the only
thing I maintain that publishes in CI Messages-compliant format is
written in Perl, so it can't use it :/
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
_______________________________________________
CI mailing list -- ci@lists.fedoraproject.org
To unsubscribe send an email to ci-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/ci@lists.fedoraproject.org