October 2016 - qa-devel - Fedora Mailing-Lists

by Josef Skladanka

With the ResultsDB and Trigger rewrites done, I'd like to get started on ExecDB. The current ExecDB is more of a tech preview, meant to show that it's possible to consume the push notifications from Buildbot. The thing is, the code doing it is quite a mess (mostly because the notifications are quite a mess), and it's directly tied not only to Buildbot, but quite probably to the one version of Buildbot we currently use.

I'd like to change the process to a style where ExecDB provides an API, and Buildbot (or possibly any other execution tool we use in the future) just uses that to switch the execution states. ExecDB should be the hub where we go to search for the execution state and statistics of our jobs/tasks. The execution is tied together via a UUID, provided by ExecDB at Trigger time. The UUID is passed through the whole stack, from Trigger to ResultsDB.

The process, as I envision it, is:

1) Trigger consumes FedMsg
2) Trigger creates a new Job in ExecDB, storing data like the FedMsg message id and other relevant information (to make rescheduling possible)
3) ExecDB provides the UUID and marks the Job as SCHEDULED; Trigger then passes the UUID, along with other data, to Buildbot
4) Buildbot runs runtask (and sets the ExecDB job to RUNNING)
5) Libtaskotron is provided the UUID, so it can then be used to report results to ResultsDB
6) Libtaskotron reports to ResultsDB, using the UUID as the Group UUID
7) Libtaskotron ends, creating a status file in a known location
8) The status file contains machine-parsable information about the runtask execution - either "OK" or a description of the "Fault" (network failed, package to be installed did not exist, koji did not respond... you name it)
9) Buildbot parses the status file and reports back to ExecDB, marking the Job either as FINISHED or CRASHED (+details)

This will need changes in the Buildbot steps - a step that switches the job to RUNNING at the beginning, and a step that handles the FINISHED/CRASHED switch. The way I see it, this can be done via a simple curl or HTTPie call from the command line. No big issue here.

We should make sure that ExecDB stores data that:

1) show the execution state
2) allow job re-scheduling
3) describe the reason the Job CRASHED

1 is obviously the state. 2 I think can be satisfied by storing the Fedmsg Message ID and/or the Trigger-parsed data, which are passed to Buildbot. Here I'd like to focus on 3.

My initial idea was to have SCHEDULED, RUNNING and FINISHED states, and four crashed states, to describe where the fault was:

- CRASHED_TASKOTRON for when the error is on "our" side (minion could not be started, git repo with task not cloned...)
- CRASHED_TASK to use when there's an unhandled exception in the Task code
- CRASHED_RESOURCES when the network is down, etc.
- CRASHED_OTHER whenever we are not sure

The point of the crashed "classes" is to be able to act on different kinds of crashes - notify the right party, or even automatically reschedule the job, in the case of a network failure, for example.

After talking this through with Kamil, I'd rather do something slightly different. There would only be one CRASHED state, but the job would contain additional information to

- find the right person to notify
- get more information about the cause of the failure

To do this, we came up with a structure like this:

{state: CRASHED, blame: [TASKOTRON, TASK, UNIVERSE], details: "free-text-ish description"}

The "blame" classes are self-describing, although I'd love to have a better name for "UNIVERSE". We might want to add more, should it make sense, but my main focus is to find the right party to notify. The "details" field will contain the actual cause of the failure (in the case we know it), and although I have it marked as free-text, I'd like to have a set of values defined in the docs, to keep things consistent.

Doing this, we could record that "Koji failed, timed out" (and blame UNIVERSE, and possibly reschedule), or "DNF failed, package not found" (blame TASK if it was in the formula, and notify the task maintainer), or "Minion creation failed" (and blame TASKOTRON, notify us, I guess).

Implementing the crash classification will obviously take some time, but it can be gradual, and we can start by handling the "well known" failures soon, for the bigger gain (kparal had some examples, IIRC).

So - what do you think about it? Is it a good idea? Do you feel like there should be more (I can't really imagine there being fewer) blame targets (like NETWORK, for example), and if so, why, and which? How about the details - should we go with a pre-defined set of values (because enums are better than free-text, but adding more would mean DB changes), or is free-text + docs fine? Or do you see some other, better solution?

joza
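
The job lifecycle and the single-CRASHED-state record proposed above could be modelled roughly as follows. This is only a sketch under stated assumptions, not ExecDB code: the names `Job`, `switch` and `should_reschedule` are hypothetical, "blame" is read as "exactly one of TASKOTRON, TASK, UNIVERSE", a job is assumed to be allowed to crash before it ever reaches RUNNING (e.g. "Minion creation failed"), and rescheduling on UNIVERSE is just one possible policy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
from uuid import uuid4


class State(Enum):
    SCHEDULED = "SCHEDULED"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    CRASHED = "CRASHED"


class Blame(Enum):
    TASKOTRON = "TASKOTRON"
    TASK = "TASK"
    UNIVERSE = "UNIVERSE"


# Allowed state switches (assumption: SCHEDULED -> CRASHED is legal,
# FINISHED and CRASHED are terminal).
ALLOWED = {
    State.SCHEDULED: {State.RUNNING, State.CRASHED},
    State.RUNNING: {State.FINISHED, State.CRASHED},
    State.FINISHED: set(),
    State.CRASHED: set(),
}


@dataclass
class Job:
    """Hypothetical in-memory stand-in for an ExecDB job record."""
    fedmsg_id: str  # stored so the job can be rescheduled later
    uuid: str = field(default_factory=lambda: str(uuid4()))
    state: State = State.SCHEDULED
    blame: Optional[Blame] = None
    details: Optional[str] = None

    def switch(self, new_state, blame=None, details=None):
        """Move the job to new_state, rejecting transitions not in ALLOWED."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {new_state.name} not allowed")
        if new_state is State.CRASHED and blame is None:
            raise ValueError("a CRASHED job needs a blame target")
        self.state = new_state
        self.blame = blame
        self.details = details


def should_reschedule(job):
    """Assumed policy: only crashes blamed on UNIVERSE are retried."""
    return job.state is State.CRASHED and job.blame is Blame.UNIVERSE


job = Job(fedmsg_id="example-message-id")  # placeholder id, not a real FedMsg id
job.switch(State.RUNNING)
job.switch(State.CRASHED, blame=Blame.UNIVERSE, details="Koji failed, timed out")
print(job.state.name, job.blame.name, should_reschedule(job))
```

Keeping `blame` as a small enum and `details` as a string mirrors the trade-off raised in the mail: the notification target stays a fixed set, while new failure descriptions can be added without schema changes.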

7 years, 3 months


Proposal to move things from fedora-qa.git to Pagure

by Adam Williamson

Hey folks! So, if no-one has any objections, I'm intending to move the contents of fedora-qa.git from fedorahosted to Pagure. At the same time, I think it'd make sense to split some things out into their own projects.

My rough plan is to split out at least check-compose, relvalconsumer and stats into separate projects. I'm not sure which of the other things it's worth splitting out. I'll probably put the new projects in the fedora-qa namespace and under the fedora-qa group (if I can).

git seems to have some fairly nifty capabilities for isolating the history of individual files / directories:

https://blogs.atlassian.com/2014/04/tear-apart-repository-git-way/

so we should be able to produce decent histories for each new project.

Does anyone mind me going ahead and doing this? And importantly, is anyone aware of any significant deployments besides the ones I'm already looking after (openQA boxes etc) which use the stuff from this git repo, and would need to be updated to pull from the new project repos?

Thanks, everyone!
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

7 years, 5 months


Proposal to CANCEL: 2016-10-31 Fedora QA Devel Meeting

by Tim Flink

I'm not aware of any topics that need to be discussed/reviewed as a group this week, so I propose that we cancel the weekly Fedora QA devel meeting. If there are any topics that I'm forgetting about and/or you think should be brought up with the group, reply to this thread and we can un-cancel the meeting. Tim

7 years, 5 months


2016-10-24 @ 14:00 UTC - Fedora QA Devel Meeting

by Tim Flink

# Fedora QA Devel Meeting
# Date: 2016-10-24
# Time: 14:00 UTC (https://fedoraproject.org/wiki/Infrastructure/UTCHowto)
# Location: #fedora-meeting-1 on irc.freenode.net

I suspect that this is going to be a really short meeting and I'll be surprised if it takes anywhere near the full hour.

https://phab.qadevel.cloud.fedoraproject.org/w/meetings/20161024-fedoraqa...

If you have any additional topics, please reply to this thread or add them in the wiki doc.

Tim

Proposed Agenda
===============

Announcements and Information
-----------------------------
- Please list announcements or significant information items below so the meeting goes faster

Tasking
-------
- Does anyone need tasks to do?

Potential Other Topics
----------------------
- Docker Testing Status
- Dist-Git Task Storage Proposal

Open Floor
----------
- TBD

7 years, 6 months


Moving all my tools to Pagure

by Adam Williamson

Hey folks! Just a heads up that I'm moving all the repos I maintain to the fedora-qa space on Pagure. That includes:

(python-)wikitcms
relval
fedfind
testdays

The new projects will be:

https://pagure.io/fedora-qa/python-wikitcms
https://pagure.io/fedora-qa/relval
https://pagure.io/fedora-qa/fedfind
https://pagure.io/fedora-qa/testdays

testdays is already migrated, and I'm in the middle of doing wikitcms now (renaming it as it goes). The others I'll get to later today, I hope.

The pages for each tool on happyassassin.net will go away and the URLs will simply redirect to the Pagure project pages. I plan to push a final commit to each repo on happyassassin.net/cgit which will just have a 'MOVED' file or something with the Pagure project info. (Except I haven't bothered for 'testdays', because it's a small thing I don't think anyone else really uses). I'll leave that up for a few weeks or something, then kill cgit from happyassassin entirely.

This saves me maintaining the cgit setup and the front pages, and means there's now a handy place to file issues and pull requests for each project; I'm going to remove the Phabricator issue / PR tracking for these and just go with Pagure, unless anyone yells that they really want to be able to send issues/PRs via Phab.

I guess we should also migrate the stuff from https://fedorahosted.org/fedora-qa/ soon.

Thanks folks!
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

7 years, 6 months


2016-10-17 @ 14:00 UTC - Fedora QA Devel Meeting

by Tim Flink

# Fedora QA Devel Meeting
# Date: 2016-10-17
# Time: 14:00 UTC (https://fedoraproject.org/wiki/Infrastructure/UTCHowto)
# Location: #fedora-meeting-1 on irc.freenode.net

We didn't have a meeting last week and enough things are going on to warrant at least syncing up this week.

https://phab.qadevel.cloud.fedoraproject.org/w/meetings/20161017-fedoraqa...

If you have any additional topics, please reply to this thread or add them in the wiki doc.

Tim

Proposed Agenda
===============

Announcements and Information
-----------------------------
- Please list announcements or significant information items below so the meeting goes faster

Tasking
-------
- Does anyone need tasks to do?

Potential Other Topics
----------------------
- Docker Testing Status
- Trigger Re-Write Status
- Dist-Git Task Storage Proposal

Open Floor
----------
- TBD

7 years, 6 months


Proposal to CANCEL: 2016-10-10 Fedora QA Devel Meeting

by Tim Flink

I'm not going to be able to lead the meeting on Monday so unless someone else wants to take over that role, I propose that we cancel the meeting for next week. Tim

7 years, 6 months


[Fedora QA] #494: F25 Atomic Test Day

by fedora-badges

#494: F25 Atomic Test Day
--------------------------------------+------------------------
 Reporter:  jasonbrooks               |      Owner:  tflink
     Type:  task                      |     Status:  new
 Priority:  major                     |  Milestone:  Fedora 25
Component:  Blocker bug tracker page  |    Version:
 Keywords:                            | Blocked By:
 Blocking:                            |
--------------------------------------+------------------------

We want to have a test day for Fedora Cloud / Atomic 25 on Oct 14. I've started a wiki page for the test day at https://fedoraproject.org/wiki/Test_Day:2016_10_14_Cloud.

--
Ticket URL: <https://fedorahosted.org/fedora-qa/ticket/494>
Fedora QA <http://fedorahosted.org/fedora-qa>
Fedora Quality Assurance

7 years, 6 months


Resultsdb v2.0 - API docs

by Josef Skladanka

Hey gang,

I spent most of today working on the new API docs for ResultsDB, making use of the even better Apiary.io tool. Before I put even more hours into it, please let me know whether you think it's fine at all - I have yet to find a better tool for describing APIs, so I'm definitely biased, but since it's the Documentation, it also needs to be useful.

http://docs.resultsdb20.apiary.io/

I am also trying to put more work towards documenting the attributes and the "usual" queries, so please try and think about this aspect of the docs too.

Thanks,
Joza

7 years, 6 months


What to do with fedora-qa (fedorahosted is dying)

by Adam Williamson

We still have a few miscellaneous things hosted in:

https://git.fedorahosted.org/cgit/fedora-qa.git

Since fedorahosted is dying next February, what should we do with them? Is this the point where we should finally decide whether to use Phabricator's built-in repository support or Pagure for this stuff and the stuff we currently host on bitbucket?

We also still use the fedorahosted *trac* for non-code-related activity tracking, but I guess that's better followed up on test@.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

7 years, 6 months

