hey folks! I mentioned this to jskladan on IRC, but just for the
permanent record, I'm working on optional crash report submission for
openQA.
At first I had the workers clicking through the graphical report
submission process, but that has several problems:
a) it means a pile of needles and keypresses and such
b) workers don't actually know the job ID or URL, so they can't
include it in the bug report
c) requires inventing some kind of way to get a BZ username and
password into the workers without it being logged (doable, but just
unnecessary work, when libreport-plugin-bugzilla already has this set
up)
So instead I'm doing it in report_job_results.py in
openqa_fedora_tools. It builds on D310, Jan's improvement to
upload the contents of /var/tmp after a crash.
Given a job_id, we check if there's a var_tmp.tar.gz for that job, and
if there is, we look for libreport 'problem directories' inside it. If
we find any, we extract them from the tarball and run
'reporter-bugzilla -d (directory)' on them.
That's really it in a nutshell, the rest is just error checks and glue
and frills. There's an attempt to include the web UI job URL in the
bug report for new crash reports (though so far I've been testing with
a problem directory that shows up as a dupe of an existing report, so I
haven't tested this yet), and we capture the IDs of the bugs reported.
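For illustration, here's a minimal Python sketch of that flow. It is not
the code from the patch: it assumes var_tmp.tar.gz has already been
downloaded, treats any directory in the tarball that contains a 'duphash'
file as a libreport problem directory (an assumed heuristic), and uses
hypothetical function names. The subprocess runner is a parameter so the
flow can be exercised without a real reporter-bugzilla.

```python
import os
import posixpath
import subprocess
import tarfile
import tempfile

# Assumed heuristic: a directory holding a 'duphash' file is treated as a
# libreport problem directory. The real patch may detect them differently.
MARKER = "duphash"


def find_problem_dirs(tar):
    """Return the in-archive paths of problem directories in `tar`."""
    dirs = set()
    for name in tar.getnames():
        parent = posixpath.dirname(name)
        if posixpath.basename(name) == MARKER and parent:
            dirs.add(parent)
    return sorted(dirs)


def report_crashes(tarball_path, run=subprocess.run, workdir=None):
    """Extract each problem directory and run 'reporter-bugzilla -d' on it.

    Returns the extracted directory paths; they are left in `workdir`.
    """
    workdir = workdir or tempfile.mkdtemp()
    # Use the safer 'data' extraction filter where this Python supports it.
    extract_args = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    reported = []
    with tarfile.open(tarball_path, "r:gz") as tar:
        for pdir in find_problem_dirs(tar):
            # Extract only this problem directory, not the whole tarball.
            members = [m for m in tar.getmembers()
                       if m.name == pdir or m.name.startswith(pdir + "/")]
            tar.extractall(workdir, members=members, **extract_args)
            path = os.path.join(workdir, pdir)
            run(["reporter-bugzilla", "-d", path], check=True)
            reported.append(path)
    return reported
```

Capturing the reported bug IDs, as the patch does, would mean parsing
reporter-bugzilla's output, which this sketch doesn't attempt.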
I also refactored the reporting functions a bit to avoid code
duplication between calling report_job_results directly and using it
from openqa_trigger, and made it possible to specify the openQA URL in
a config file (so you can do result reporting from a system other than
the openQA host itself - like, fr'instance, a Fedora system with
libreport-plugin-bugzilla installed...)
To test it out you need a job in some openQA instance which has a
var_tmp.tar.gz with a crash directory inside it: I've been testing
with https://openqa.happyassassin.net/tests/2736. You also need to
put a valid BZ username and password in
/etc/libreport/plugins/bugzilla.conf and, unless you're running on the
openQA host itself (there *are* libreport packages for openSUSE in
some OBS repository, but I haven't tried them), you'll want to create
/etc/openqa_fedora.conf with this content:
[site]
url = https://openqa.happyassassin.net
(or whatever URL is appropriate).
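A minimal sketch of how the tools might read that setting, using Python's
standard configparser. The fallback URL for "running on the openQA host
itself" is an assumption, as are the helper names; the /tests/<job ID>
path matches the web UI job URL quoted above.

```python
import configparser

# Assumed default for when no config file (or no [site] url) exists,
# i.e. when running on the openQA host itself.
DEFAULT_URL = "http://localhost"


def openqa_url(paths=("/etc/openqa_fedora.conf",)):
    """Return the openQA base URL from the [site] section, if configured."""
    config = configparser.ConfigParser()
    config.read(paths)  # files that don't exist are silently skipped
    url = config.get("site", "url", fallback=DEFAULT_URL)
    return url.rstrip("/")


def job_url(base_url, job_id):
    """Web UI URL for a job, e.g. to include in a new crash report."""
    return f"{base_url}/tests/{job_id}"
```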
Then you can do this:
python report_job_results.py --crashes 2736
(or whatever the job ID is).
This probably still needs a bit more testing and polish before I
submit it as a differential, but I wanted to give people a heads-up
that I was working on it and explain the general design. My current
patch (against 'develop' branch, to which I've merged the 'live' work
now) is attached.
In case you're wondering what happens with duplicate reports: I tested
and it seems like 'not a lot'. When reporter-bugzilla is called this
way and the crash has already been reported, it only generates BZ
activity if the BZ account in question isn't already on the CC list, in
which case it adds the account to the CC list. If the account is
already on the CC list, it doesn't change the bug at all; it doesn't
even add the extra comment saying 'another user encountered this
issue'. I checked libreport, and it only adds that comment when some
comment text has been provided; we aren't providing any, so it gets
skipped.
If we're still worried about noise on dupes it *is* possible to test
if a bug is a dupe by checking the output of:
reporter-bugzilla -h $(cat duphash)
and completely skip the report submission step if it is, and I
actually had that written, but took it out as it seemed unnecessary.
Easy enough to put it back if we want to, though.
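That dupe check could be sketched in Python roughly as follows; it's the
equivalent of running 'reporter-bugzilla -h $(cat duphash)' from inside
the problem directory. How to interpret the output and decide that a bug
is a dupe is deliberately left out, since that depends on
reporter-bugzilla's output format; the function name is hypothetical.

```python
import os
import subprocess


def duphash_search_output(problem_dir, run=subprocess.run):
    """Run 'reporter-bugzilla -h <duphash>' for a libreport problem directory.

    Returns the command's stdout; deciding from it whether the crash is
    already reported is left to the caller.
    """
    with open(os.path.join(problem_dir, "duphash")) as f:
        # $(cat duphash) word-splits and drops the trailing newline; strip()
        # gives the same single-token result for a normal duphash file.
        duphash = f.read().strip()
    result = run(["reporter-bugzilla", "-h", duphash],
                 capture_output=True, text=True, check=False)
    return result.stdout
```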
In the current version of the patch, things are set up so that
'openqa_trigger current', 'openqa_trigger all' and
'openqa_trigger compose --submit-results' runs will try to report all
crashes, but it's absolutely trivial to change that if we only want to
report crashes via a separate invocation.
Comments welcome!
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
Hello,
Those of you who use libtaskotron from git, please note that libtaskotron now requires one more directory (/var/cache/taskotron) to exist [1]. From now on, you will be notified right at the start of 'runtask' execution if you're missing any of the important directories (execution will exit in that case).
I have also added this directory to conf/tmpfiles.d/taskotron.conf, so if you use automatic pruning of taskotron directories, be sure to update this file on your system.
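As an illustration only (this is not libtaskotron's actual code), a start-of-run check like the one described could look like this in Python. Only /var/cache/taskotron comes from this announcement; the other important directories aren't listed here, so the tuple is left for the reader to extend.

```python
import os
import sys

# Only this entry comes from the announcement; libtaskotron's other
# important directories would be listed here too.
REQUIRED_DIRS = ("/var/cache/taskotron",)


def missing_dirs(dirs=REQUIRED_DIRS):
    """Return the required directories that don't exist."""
    return [d for d in dirs if not os.path.isdir(d)]


def check_dirs_or_exit(dirs=REQUIRED_DIRS):
    """Exit with an error message naming any missing directories."""
    missing = missing_dirs(dirs)
    if missing:
        sys.exit("Missing required directories: " + ", ".join(missing))
```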
Cheers,
Kamil
[1] https://phab.qadevel.cloud.fedoraproject.org/D266
===============================
#fedora-meeting: fedora-qadevel
===============================
Minutes: http://meetbot.fedoraproject.org/fedora-meeting/2015-03-16/fedora-qadevel.2…
Minutes (text): http://meetbot.fedoraproject.org/fedora-meeting/2015-03-16/fedora-qadevel.2…
Log: http://meetbot.fedoraproject.org/fedora-meeting/2015-03-16/fedora-qadevel.2…
Meeting summary
---------------
* Roll Call (tflink, 15:03:52)
* ExecDB (jskladan, 15:05:53)
* ExecDB is now more defensive in parsing Taskbot's push notifications
(jskladan, 15:06:04)
* ExecDB is IMHO ready to be deployed (jskladan, 15:06:04)
* OpenQA (jskladan, 15:09:06)
* OpenQA looks good so far - some occasional quirks, but nothing too
serious (jskladan, 15:09:06)
* covered testcases are to be found here
https://bitbucket.org/rajcze/openqa_fedora_tools/src/master/PhaseSeparation…
(jskladan, 15:09:06)
* jsedlak is now working on covering FedUp testcases (jskladan,
15:09:06)
* most of the work-to-be-done on OpenQA will be Upgrade & NFS (repos,
kickstarts, ...)-related (jskladan, 15:11:52)
* kparal status report (kparal, 15:12:32)
* Taskotron Artifacts (mkrizek, 15:12:32)
* artifacts have been running on dev for a while now; no issues (that
haven't been fixed) have been seen (mkrizek, 15:12:57)
* artifacts are put into a separate dir for each day, to speed up
loading of taskotron.fp.o/artifacts/ (mkrizek, 15:13:07)
* LINK: http://taskotron-dev.fedoraproject.org/artifacts/20150316/
(mkrizek, 15:14:52)
* Disposable Clients Remote Execution (mkrizek, 15:17:12)
* now working on evaluating communication methods between the task
initiator and executor; paramiko and ansible are being evaluated
(mkrizek, 15:17:23)
* kparal status report (kparal, 15:18:44)
* fixed mock crashes for single-arch packages (kparal, 15:18:53)
* LINK: https://phab.qadevel.cloud.fedoraproject.org/D303 (kparal,
15:18:53)
* helped out with implementing the koji download cache, review still
pending, as well as submitting unit tests (kparal, 15:19:09)
* LINK: https://phab.qadevel.cloud.fedoraproject.org/D266 (kparal,
15:19:09)
* still haven't filed a bug about the gtk spinner performance issue;
planning to do it soon. It's slowing down openqa execution (and
manual testing) considerably: in my testing, just the main
installation time goes from 5 minutes to 15 minutes on my otherwise
almost idle laptop. (kparal, 15:19:32)
* tflink status report (tflink, 15:20:38)
* updated systems, seem to still be having problems with trigger :-/
(tflink, 15:20:50)
* still watching for bad depcheck failures (tflink, 15:20:50)
* working on tickets for disposable clients (tflink, 15:20:50)
* Tasking/Planning (tflink, 15:24:55)
* artifacts and execdb should both be feature complete and ready for
deployment (tflink, 15:30:21)
* ACTION: jskladan to ansible-ize execdb (jskladan, 15:42:51)
* LINK: https://phab.qadevel.cloud.fedoraproject.org/T407 (tflink,
15:47:24)
* LINK: https://phab.qadevel.cloud.fedoraproject.org/project/view/20/
(roshi, 15:48:54)
* Open Floor (tflink, 15:53:43)
Meeting ended at 16:05:17 UTC.
Action Items
------------
* jskladan to ansible-ize execdb
Action Items, by person
-----------------------
* jskladan
* jskladan to ansible-ize execdb
* **UNASSIGNED**
* (none)
People Present (lines said)
---------------------------
* tflink (87)
* jskladan (33)
* kparal (23)
* mkrizek (21)
* danofsatx (18)
* roshi (9)
* zodbot (4)
Generated by MeetBot 0.1.4 (http://wiki.debian.org/MeetBot)
Hey, folks. So I've been tweaking and testing my openQA live image
test stuff today, and I think it's probably about ready for merging.
I've got branches of both openqa_fedora and openqa_fedora_tools with
relevant changes:
https://www.happyassassin.net/cgit/openqa_fedora/log/?h=live
https://www.happyassassin.net/cgit/openqa_fedora_tools/log/?h=live
On the openQA side there's all the expected work to add new needles
and test cases, and to modify existing ones where appropriate to handle
both live and non-live cases. It's quite a lot of change and this will
get very long if I try to summarize it, so just poke me with questions
about any specific bits that aren't obvious. I might go back through
and sprinkle some comment-fu on there today.
On the fedora_tools side, various tweaks were needed. Image downloading
is tweaked so that the filenames for different images should always be
unique (we can't use sub-directories, I asked :<).
One of the most awkward bits was making sure we run all the tests that
aren't particularly image-specific *exactly once* for all composes -
not zero times, and not twice or more. This is problematic because
nightly composes have a 'generic boot.iso' image (but no 'server
boot.iso' or 'server DVD' or 'server netinst' or anything), while
TC/RC composes have a 'server boot.iso', 'server DVD', and 'server
netinst' (which I think is always 100% identical to 'server boot.iso',
but fedfind has to consider them two separate things, really), but no
'generic boot.iso'.
So I introduced a new openQA flavor called 'universal', and added a
bit of logic to openqa_trigger which makes it effectively 'nominate'
one of the images from the compose it's running on as the one that
will have the 'universal' tests run against it. It *also* then
schedules every image downloaded - including the one nominated as
'universal' - under its 'natural' flavor name (which is now
payload_imagetype). To match this, on the openQA side, all the tests
which can run with any non-live image are now associated with the
'universal' flavor/product, and only image-specific tests (which
currently means just 'default boot and install') are associated with
the image-specific flavors/products.
The upshot of that is that when you schedule a run against a compose
you get most of the tests run just once against one of the non-live
images, and you also get one instance of default_boot_install per
image, which is I think exactly what we want.
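A rough Python sketch of the nomination idea, under some assumptions:
each image is a dict with 'payload' and 'imagetype' keys, an image counts
as live when its imagetype is 'live', and the first non-live image in the
list gets nominated. The real openqa_trigger logic may choose
differently; the function name is hypothetical.

```python
def schedule_flavors(images):
    """Return (image, flavor) pairs to schedule for one compose.

    Every image is scheduled under its natural flavor, payload_imagetype.
    If the compose has any non-live image, the first one is additionally
    nominated for 'universal', so the non-image-specific tests run once.
    """
    jobs = [(img, f"{img['payload']}_{img['imagetype']}") for img in images]
    nominee = next((img for img in images if img["imagetype"] != "live"), None)
    if nominee is not None:
        jobs.append((nominee, "universal"))
    return jobs


nightly = [
    {"payload": "generic", "imagetype": "boot"},
    {"payload": "workstation", "imagetype": "live"},
]
print([flavor for _, flavor in schedule_flavors(nightly)])
# ['generic_boot', 'workstation_live', 'universal']
```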
The other cute hack I added is a way for the result reporting stuff to
be able to figure out how to report the default_boot_install results
to the wiki correctly. I added an optional key for the per-testcase
dicts in the TESTCASES dict-of-dicts; if a testcase has a 'name_cb'
key, its value should be a callback function which will provide the
correct testcase name when called with the openQA job's 'flavor' as
the sole argument. report_job_results.py checks for the callback and
calls it if it's there, and uses the return value as the --testcase
parameter it passes to relval. Aaaand that results in us reporting the
result against the correct 'test instance' (row in the results table)
for the image the test was run against. SIMPLES! Actually I kinda like
that approach, and the general idea should be extensible in other
cases where we need to do something like this. (We'll probably need a
section callback for the Desktop page, for instance.)
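Here is a minimal sketch of the 'name_cb' hook in Python. The dict
layout follows the description above, but the testcase keys and the
flavor-to-name mapping inside the hypothetical default_install_name()
are made up for illustration; the real callback has to return whatever
name identifies the right row in the wiki results table.

```python
def default_install_name(flavor):
    """Hypothetical callback: map an openQA flavor to a test instance name.

    e.g. 'server_netinst' -> 'Server netinst' (made-up mapping).
    """
    payload, _, imagetype = flavor.partition("_")
    return f"{payload.capitalize()} {imagetype}"


# Illustrative excerpt of a TESTCASES dict-of-dicts; only 'name_cb' matters.
TESTCASES = {
    "QA:Testcase_Boot_default_install": {"name_cb": default_install_name},
    "QA:Testcase_partitioning_guided_empty": {},
}


def relval_testcase(testcase, flavor):
    """Return the value to pass to relval as --testcase for this job."""
    callback = TESTCASES[testcase].get("name_cb")
    return callback(flavor) if callback else testcase


print(relval_testcase("QA:Testcase_Boot_default_install", "workstation_live"))
# Workstation live
```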
Anyhow, it's a tad tricky to test this with just a single run ATM
because we have netinsts (but no Workstation lives) for F22 composes
and Workstation lives (but no netinsts) for Rawhide, but I tested it
out quite a bit with various different composes and it seems to be
working pretty well. You can of course see all the various test runs
at https://openqa.happyassassin.net .
--
Adam Williamson
Looking at the Fedora infra ansible git repo, it seems like there is
a skeleton for managing beaker01.qa.fedoraproject.org, but just the
system and OS, not the Beaker application itself. (Unless I missed
something?)
I'd like to try and contribute a patch for Beaker server/lab controller
roles. Where is the preferred place to iterate on that? Is there an open
Phab issue, or should I just mail this list, or something else?
I'm sure it will take me quite a few attempts to get right :-) since
I haven't done anything with Fedora infra ansible before.
--
Dan Callaghan <dcallagh(a)redhat.com>
Software Engineer, Hosted & Shared Services
Red Hat, Inc.
While working on the live testing stuff, I ran into this landmine:
https://fedoraproject.org/wiki/Changes/NewDefaultConsoleFont
The console font has changed in Rawhide. At the very least it makes the
'root logged in at console' needle fail to match on Rawhide, and it
will probably affect other console needles too. There's a commit on my
'live' branch that fixes that one needle, but I didn't go around
testing any of the others yet (it'd be a bit hard because we don't
have non-live images for Rawhide, due to the python3 issue which means
all non-live nightlies fail).
--
Adam Williamson
Hi, folks. I've been working on getting live image testing into openQA
today. I have forked openqa_fedora at happyassassin and added a 'live'
branch with my work:
https://www.happyassassin.net/cgit/openqa_fedora/log/?h=live
Note that I'm currently just force-pushing to a single commit, so don't
rely on the history.
I haven't set up the scheduling in openqa_fedora_tools yet, but you
can try it by checking out my branch, running ./templates --clean,
putting the Alpha RC3 live image in /var/lib/openqa/factory/isos, and
then running:
/var/lib/openqa/script/client isos post \
    ISO=Fedora-Live-Workstation-x86_64-22_Alpha-3.iso DISTRI=fedora \
    VERSION=rawhide FLAVOR=workstation_live ARCH=x86_64 BUILD=22_Alpha_RC3
I've been tweaking it all day and I'm pretty sure it now works for a
default live install up to GDM, and also the existing tests still work
too. Tomorrow I'm aiming to extend the test a bit further and figure
out scheduling of live jobs.
--
Adam Williamson
Can someone kick testdays.qa.fp.o again? Every time I finally get
around to transferring the anaconda Test Day results into the page,
it's down :(
--
Adam Williamson
Hey folks!
So I notice that qa-01 has not been zypper updated since 2015-02-12.
We should probably keep up with security updates, but...
don't update openQA! They're making major changes on the dev branch
ATM, and they've pushed packages out, but it's really broken. I got
openQA-4.1425914847.c17291a-253.1.noarch on happyassassin and it's
just completely busted. They're trying to split 'geekotest' into two
user accounts - one for the web UI, one for the workers - which is a
good idea, but it's nowhere near done, and right now workers just
won't run. I fixed a couple of the problems but there seem to be a lot
more.
Richard was saying we might want to use the 'stable' openQA releases
rather than the dev channel, even though upstream's instructions do
seem to point you at the dev channel. I haven't checked yet where we
might find a stable release repository.
--
Adam Williamson
Hi folks,
The Beaker team have put together a task that makes it feasible to
pre-test full rebuilds across all architectures (or at least primary
ones) before toolchain updates are landed in Koji:
Docs:
https://beaker-project.org/docs-develop/user-guide/beaker-provided-tasks.ht…
Task: http://beaker.fedoraproject.org/bkr/tasks/25
It's designed primarily for testing toolchain changes, so it currently
just builds everything in alphabetical order and doesn't inject the
results back into the build root. It also doesn't replace Koschei, since
that's better for picking up unexpected interactions between different
components, while this new task is intended specifically to help with
major toolchain updates.
Regards,
Nick.
--
Nick Coghlan
Red Hat Hosted & Shared Services
Software Engineering & Development, Brisbane
HSS Provisioning Architect