per-arch tasks strategy in Taskotron
by Kamil Paral
Hello,
I'd like to discuss our current strategy for per-arch tasks. We currently have no such tasks; all of them are generic (they can run on any host system and test items of one or more arches). I want to convert task-rpmdeplint [1] to an arch-specific task to avoid a race condition [2]. I also believe per-arch tasks will be necessary for supporting dist-git tasks (we'll need to assume that they are arch-specific by default). I found several issues in our current implementation.
First, I submitted a diff that makes the --arch command-line argument a single value instead of a multi-value list:
https://phab.qa.fedoraproject.org/D1171
The benefit is that ${arch} becomes a simple string, which makes it easier to work with in a formula, for example to pass it to a bash script that needs to act on it. It also makes it easy to extend the architecture list that is passed to e.g. the koji directive, which is important for tasks dealing with multilib (rpmdeplint).
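As a rough illustration of extending the architecture list from a single arch string, here is a minimal Python sketch. The helper name `arches_to_download` and the `MULTILIB_COMPANIONS` mapping are hypothetical and not part of libtaskotron or the koji directive; the actual arch names a task needs may differ.

```python
# Hypothetical mapping (an assumption for illustration): extra arches
# that a multilib-aware task also needs when checking a given arch.
MULTILIB_COMPANIONS = {"x86_64": ["i386"]}


def arches_to_download(arch, include_noarch=True):
    """Build the list of arches to fetch when the task checks a single `arch`."""
    arches = [arch]
    # With a plain string, adding the multilib companion is a simple list extend.
    arches.extend(MULTILIB_COMPANIONS.get(arch, []))
    if include_noarch:
        arches.append("noarch")
    return arches
```

With `--arch` being a plain string, a task only has to derive this list once and hand it to whatever downloads the packages.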
My view is that we should simplify things to keep our sanity and a reasonable implementation, at the expense of some performance optimization (running x86_64 and i386 tasks at the same time, because the hardware allows that). So --arch will be a single value, and it will determine which architecture the task should check. If an arch-specific task wants to run for armhfp, i386 and x86_64, the trigger will schedule 3 different jobs on the appropriate hosts, and they will run independently. The task will be able to declare whether it's arch-specific or generic in its formula (or in a scheduling database in the future), which decides whether it's executed once or multiple times (with different --arch arguments). It will also be able to say which arches it does or doesn't support (so that we don't run it on e.g. armhfp even if we have armhfp machines, if the task doesn't support it). If we need performance optimizations in the future, we can allow a task to not be tied to a host of a particular arch (so that we can execute armhfp rpmdeplint on an x86_64 minion). But those runs will always be independent (one job for x86_64, one job for armhfp, one job for i386).
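The scheduling rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Task` class, its `arch_specific` and `supported_arches` fields, and `plan_jobs` are hypothetical names, not existing trigger or formula API, and representing a generic task's arch as `None` is a placeholder choice.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical description of a task as the trigger might see it."""
    name: str
    arch_specific: bool = False
    # Empty list means "no restriction" in this sketch.
    supported_arches: list = field(default_factory=list)


def plan_jobs(task, available_arches):
    """Return (task name, --arch value) pairs, one per independent job."""
    if not task.arch_specific:
        # Generic task: executed once; None stands for "no per-arch value".
        return [(task.name, None)]
    # Arch-specific task: one job per arch that we have hosts for
    # and that the task declares it supports.
    return [
        (task.name, arch)
        for arch in available_arches
        if not task.supported_arches or arch in task.supported_arches
    ]
```

For example, a task declaring support for x86_64 and i386 would get two jobs even if armhfp hosts are available, and each job would run independently of the other.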
Does that sound reasonable? If agreed, I really want to push the aforementioned change before dist-git tasks take off, since then we'd be breaking our formula API (and workflow expectations) for a lot of people.
Second, we don't have any non-x86_64 minions yet. So even if we support the above, we are currently able to execute only x86_64 arch-specific tasks. I assume we'll get an arm builder with armhfp buildslaves in the future, but that will take time. i386 is a different story. We are able to create the minions (we just don't do it yet), and we even have a single i386 builder (sitting idle all the time). However, i386 has been an alternate arch since F26, so I'm not sure we want to invest a lot of time in it (we should obviously focus on primary arches first). Also, creating minions for i386 means doubling the data storage requirements and doubling the number of issues to debug. I wonder whether it's a worthwhile investment of time at this very moment. Our architecture should obviously be prepared to handle arch-dependent tasks (especially from the formula/API perspective, so that we don't force all users to amend their code in the future), but I'm wondering whether we should actually *have* support for multiple arches right now, or rather keep it simple and tell everyone that only x86_64 is supported during the pilot.
WDYT?
If we decide to go with i386 minions right now, we'll need support in the trigger. It will either read the formula for a new field that we add, or it will have a hardcoded rule that it should run task-rpmdeplint (and any dist-git task) for multiple architectures. So it will schedule two jobs for each such task, one with --arch x86_64 and one with --arch i386. Can somebody with trigger experience estimate whether this is a difficult change to make? Also, do you see any further changes required in other tools?
[1] I'm mostly ignoring task-depcheck, because it's going away, but the same issue would also apply there.
[2] https://phab.qa.fedoraproject.org/T894
6 years, 6 months
2017-03-20 Fedora QA Devel Meeting Minutes
by Tim Flink
Apologies for not announcing the meeting again. The minutes and logs
of what was discussed are as follows:
=================================
#fedora-meeting-1: fedora-qadevel
=================================
Minutes: https://meetbot.fedoraproject.org/fedora-meeting-1/2017-03-20/fedora-qade...
Minutes (text): https://meetbot.fedoraproject.org/fedora-meeting-1/2017-03-20/fedora-qade...
Log: https://meetbot.fedoraproject.org/fedora-meeting-1/2017-03-20/fedora-qade...
Meeting summary
---------------
* Roll Call (tflink, 14:02:27)
* Announcements and Information (tflink, 14:03:25)
  * Dist-Git Task Storage Deployed to and Enabled in Production - tflink
    (tflink, 14:03:56)
  * Still investigating issues with nested virt for Cloud/Atomic compose
    tests - roshi (roshi, 14:04:33)
  * base-images are building again (kparal, 14:04:50)
  * bot indexing disabled for resultsdb (kparal, 14:05:12)
  * taskotron landing page now has a fancy instance switcher -- lbrabec
    (kparal, 14:05:40)
* Dist-Git Task Storage (tflink, 14:08:17)
  * draft Taskotron Overview documentation on wiki:
    https://fedoraproject.org/wiki/User:Roshi/QA/Taskotron_Overview -
    roshi (roshi, 14:10:10)
  * LINK: https://koji.fedoraproject.org/koji/buildinfo?buildID=870047
    (tflink, 14:12:30)
* ResultsDB Performance (tflink, 14:18:39)
* Moar Ansible? (tflink, 14:30:37)
  * LINK: https://review.openstack.org/#/c/442180/ (tflink, 14:33:11)
  * LINK: https://review.openstack.org/#/c/438281/ (tflink, 14:33:31)
* Tasking (tflink, 14:39:41)
* Open Floor (tflink, 14:41:43)
Meeting ended at 14:48:07 UTC.
Action Items
------------
Action Items, by person
-----------------------
* **UNASSIGNED**
  * (none)
People Present (lines said)
---------------------------
* tflink (79)
* kparal (33)
* jskladan (11)
* zodbot (9)
* roshi (8)
* mkrizek (4)
* lbrabec (1)
* robyduck (0)
Generated by `MeetBot`_ 0.1.4
.. _`MeetBot`: http://wiki.debian.org/MeetBot
6 years, 6 months
2017-03-13 Fedora QA Devel Meeting Minutes
by Tim Flink
=================================
#fedora-meeting-1: fedora-qadevel
=================================
Minutes: https://meetbot.fedoraproject.org/fedora-meeting-1/2017-03-13/fedora-qade...
Minutes (text): https://meetbot.fedoraproject.org/fedora-meeting-1/2017-03-13/fedora-qade...
Log: https://meetbot.fedoraproject.org/fedora-meeting-1/2017-03-13/fedora-qade...
Meeting summary
---------------
* roll call (tflink, 14:01:03)
* Announcements and Information (tflink, 14:03:45)
  * taskotron-dev works again - mkrizek (mkrizek, 14:04:33)
  * deployment of atomic/cloud checks is in progress - mkrizek, roshi
    (mkrizek, 14:05:07)
  * we have nested virt on dev \o/ (mkrizek, 14:05:52)
  * working on overview documentation for how data flows through
    taskotron - roshi (roshi, 14:06:47)
  * phab updated to a version with working search (tflink, 14:08:47)
* Deploying New Features (tflink, 14:09:33)
* Documentation, Guides and Examples (tflink, 14:18:11)
* Phabricator (tflink, 14:31:25)
  * LINK: https://bugzilla.redhat.com/show_bug.cgi?id=1418940 (tflink,
    14:33:22)
  * ACTION: tflink to move the conversation around phab and pagure to
    qadevel@ (tflink, 14:54:06)
* open floor (tflink, 14:54:10)
Meeting ended at 14:55:49 UTC.
Action Items
------------
* tflink to move the conversation around phab and pagure to qadevel@
Action Items, by person
-----------------------
* tflink
  * tflink to move the conversation around phab and pagure to qadevel@
* **UNASSIGNED**
  * (none)
People Present (lines said)
---------------------------
* tflink (95)
* roshi (43)
* kparal (30)
* mkrizek (18)
* jskladan (5)
* zodbot (5)
* tenk (1)
Generated by `MeetBot`_ 0.1.4
.. _`MeetBot`: http://wiki.debian.org/MeetBot
6 years, 6 months
Cancel/Reschedule 2017-03-06 Fedora QA Devel Meeting
by Tim Flink
I can't make the QA Devel meeting this week so unless someone else
wants to lead the meeting, we'll need to cancel or reschedule it.
12:00 or 13:00 UTC should work for me Tuesday-Thursday if that works
for other folks.
Thoughts? Votes? Volunteers?
Tim
6 years, 6 months
New automated test coverage: openQA tests of critical path updates
by Adam Williamson
Hi folks!
I am currently rolling out some changes to the Fedora openQA deployment
which enable a new testing workflow. From now on, a subset of openQA
tests should be run automatically on every critpath update, both on
initial submission and on any edit of the update.
For the next little while, at least, this won't be incredibly visible.
openQA sends out fedmsgs for all tests, so you can sign up for FMN
notifications to learn about these results. They'll also be
discoverable from the openQA web UI - https://openqa.fedoraproject.org
. The results are also being forwarded to ResultsDB, so they'll be
visible via ResultsDB API queries and the ResultsDB web UI. But for
now, that's it...I think.
Our intent is to set up the necessary bits so that these results will
show up in the Bodhi web UI alongside the results for relevant
Taskotron tests. There's an outside possibility that Bodhi is actually
already set up to find these results in ResultsDB, in which case
they'll just suddenly start showing up in Bodhi - we should know about
that soon enough. :) But most likely Bodhi will need a bit of a tweak
to find them. This is probably a good thing, because we need to let the
tests run for a while to find out how reliable they are, and if there's
an unacceptable number of false negatives/positives. Once we have some
info on that and are happy that we can get things sufficiently reliable
for the results to be useful, we'll hook up the Bodhi integration.
The tests that are run are most of the tests that, on the 'compose
test' workflow, get run on the Server DVD and Workstation Live images
after installation. Between them they do a decent job of covering basic
system functionality. They also cover FreeIPA server and client setup,
and Workstation browser (Firefox) and terminal functionality. So
hopefully, if your critpath update completely breaks one of those basic
workflows, you'll find out about it before pushing it stable.
At present it looks like the Workstation tests may sometimes fail
simply because the base install gets stuck during boot for some reason;
I'm going to look into that this week. In testing so far the Server
tests seem fairly reliable, but I want to gather data from a few days'
worth of test runs to see how those look. Once we start sending results
to Bodhi, I'll try and write up some basic instructions on how to
interpret and debug openQA test results; QA folks will also be
available in IRC and by email for help with this, of course.
You can see sample runs on Server:
https://openqa.stg.fedoraproject.org/tests/overview?groupid=1&build=FEDOR...
and Workstation:
https://openqa.stg.fedoraproject.org/tests/overview?version=25&distri=fed...
The 'desktop_notifications_live' failure is a stale bit of data - that
test isn't actually run any more because obviously it makes no sense in
this context, but because it got run one time in early development,
openQA continues to show it for that update (it won't show for any
*other* update). The `desktop_update_graphical` fail is a good example
of the kind of issue I'll have to look into this week: it seems to have
failed because of an intermittent crasher bug in PackageKit, rather
than an issue in the update. We'll have to look at skipping known-
unreliable tests, or marking them somehow so you know the deal in
Bodhi, or automatically re-running them, or things along those lines.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
6 years, 6 months