ResultsDB 2.0 - DB migration on DEV
by Josef Skladanka
So, I have performed the migration on DEV - there were some problems with
it running out of memory, so I had to tweak it a bit (please have a look at
D1059, that is what I ended up using by hot-fixing on DEV).
There still is a slight problem, though - the migration of DEV took about
12 hours total, which is a bit unreasonable. Most of the time was spent in
`alembic/versions/dbfab576c81_change_schema_to_v2_0_step_2.py` lines 84-93
in D1059. The code takes about 5 seconds to change 1k results. That would
mean at least 15 hours of downtime on PROD, and that, I think, is unrealistic...
And since I don't know how to make it faster (tips are most welcomed), I
suggest that we archive most of the data in STG/PROD before we go forward
with the migration. I'd make a complete backup, and delete all but the
data from the last 3 months (or any other reasonable time span).
We can then populate an "archive" database, and migrate it on its own,
should we decide it is worth it (I don't think it is).
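For what it's worth, one common way to speed up a per-row migration like this is to batch the updates into executemany() calls instead of issuing one UPDATE per row. A minimal, self-contained sketch using sqlite and invented table/column names (not ResultsDB's actual schema):

```python
# Hypothetical sketch: batching per-row UPDATEs into executemany() calls.
# Table and column names are made up, not ResultsDB's real schema.
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, group_uuid TEXT)")
conn.executemany("INSERT INTO result (id) VALUES (?)",
                 [(i,) for i in range(1, 1001)])

def migrate_in_batches(conn, batch_size=500):
    """Issue one executemany() per batch instead of one UPDATE per row."""
    ids = [row[0] for row in conn.execute("SELECT id FROM result")]
    for start in range(0, len(ids), batch_size):
        batch = [(str(uuid.uuid4()), rid)
                 for rid in ids[start:start + batch_size]]
        conn.executemany("UPDATE result SET group_uuid = ? WHERE id = ?",
                         batch)
    conn.commit()

migrate_in_batches(conn)
migrated = conn.execute(
    "SELECT COUNT(*) FROM result WHERE group_uuid IS NOT NULL").fetchone()[0]
```

Whether this actually helps depends on where the time goes in step 2 of the migration, so treat it as a direction to profile, not a fix.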
What do you think?
J.
New ExecDB
by Josef Skladanka
With ResultsDB and Trigger rewrite done, I'd like to get started on ExecDB.
The current ExecDB is more of a tech preview that was meant to show that
it's possible to consume the push notifications from Buildbot. The thing
is that the code doing it is quite a mess (mostly because the notifications
are quite a mess), and it's directly tied not only to Buildbot, but quite
probably to the one version of Buildbot we currently use.
I'd like to change the process to a style, where ExecDB provides an API,
and Buildbot (or possibly any other execution tool we use in the future)
will just use that to switch the execution states.
ExecDB should be the hub, in which we can go to search for execution state
and statistics of our jobs/tasks. The execution is tied together via UUID,
provided by ExecDB at Trigger time. The UUID is passed through the whole
stack, from Trigger to ResultsDB.
The process, as I envision it, is:
1) Trigger consumes FedMsg
2) Trigger creates a new Job in ExecDB, storing data like FedMsg message
id, and other relevant information (to make rescheduling possible)
3) ExecDB provides the UUID, marks the Job as SCHEDULED, and Trigger then
passes the UUID, along with other data, to Buildbot.
4) Buildbot runs runtask (and sets the ExecDB job to RUNNING)
5) Libtaskotron is provided the UUID, so it can then be used to report
results to ResultsDB.
6) Libtaskotron reports to ResultsDB, using the UUID as the Group UUID.
7) Libtaskotron ends, creating a status file in a known location
8) The status file contains machine-parsable information about the
runtask execution - either "OK" or a description of "Fault" (network
failed, package to be installed did not exist, koji did not respond... you
name it)
9) Buildbot parses the status file, and reports back to ExecDB, marking the
Job either as FINISHED or CRASHED (+details)
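The job lifecycle in the steps above can be sketched as a tiny state machine; the state names follow the proposal, everything else here is purely illustrative:

```python
# Illustrative sketch of the proposed ExecDB job lifecycle; state names are
# from the proposal, the class itself is hypothetical.
from enum import Enum

class JobState(Enum):
    SCHEDULED = "SCHEDULED"  # set by ExecDB when Trigger creates the Job
    RUNNING = "RUNNING"      # set by Buildbot when runtask starts
    FINISHED = "FINISHED"    # set from the status file on success
    CRASHED = "CRASHED"      # set from the status file on any Fault

# Allowed transitions between states.
TRANSITIONS = {
    JobState.SCHEDULED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.FINISHED, JobState.CRASHED},
}

class Job:
    def __init__(self, uuid):
        self.uuid = uuid  # handed out by ExecDB at Trigger time
        self.state = JobState.SCHEDULED

    def transition(self, new_state):
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(
                f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```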
This will need changes in the Buildbot steps - a step that switches the job
to RUNNING at the beginning, and a step that handles the FINISHED/CRASHED
switch. The way I see it, this can be done via a simple curl or HTTPie call
from the command line. No big issue here.
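The parsing side could be equally simple. As a sketch, assuming the status file holds a single line that is either "OK" or "Fault: <description>" (the exact format is my assumption, it's not pinned down above):

```python
# Hedged sketch of parsing the proposed status file. The one-line
# "OK" / "Fault: <description>" format is an assumption, not a spec.
def parse_status(text):
    """Return (state, details) for ExecDB: FINISHED on OK, CRASHED otherwise."""
    line = text.strip()
    if line == "OK":
        return ("FINISHED", None)
    if line.startswith("Fault:"):
        return ("CRASHED", line[len("Fault:"):].strip())
    # An unparsable file is itself a crash worth recording.
    return ("CRASHED", f"unparsable status: {line!r}")
```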
We should make sure that ExecDB stores data that:
1) show the execution state
2) allow job re-scheduling
3) describe the reason the Job CRASHED
1 is obviously the state. 2 I think can be satisfied by storing the Fedmsg
Message ID and/or the Trigger-parsed data, which are passed to Buildbot.
Here I'd like to focus on 3:
My initial idea was to have SCHEDULED, RUNNING, FINISHED states, and four
crashed states, to describe where the fault was:
- CRASHED_TASKOTRON for when the error is on "our" side (minion could not
be started, git repo with task not cloned...)
- CRASHED_TASK to use when there's an unhandled exception in the Task code
- CRASHED_RESOURCES when network is down, etc
- CRASHED_OTHER whenever we are not sure
The point of the crashed "classes" is to be able to act on different kinds
of crashes - notify the right party, or even automatically reschedule the
job in the case of network failure, for example.
After talking this through with Kamil, I'd rather do something slightly
different. There would only be one CRASHED state, but the job would contain
additional information to
- find the right person to notify
- get more information about the cause of the failure
To do this, we came up with a structure like this:
{state: CRASHED, blame: [TASKOTRON, TASK, UNIVERSE], details:
"free-text-ish description"}
The "blame" classes are self-describing, although I'd love to have a better
name for "UNIVERSE". We might want to add more, should it make sense, but
my main focus is to find the right party to notify.
The "details" field will contain the actual cause of the failure (in the
case we know it), and although I have it marked as free-text, I'd like to
have a set of values defined in docs, to keep things consistent.
Doing this, we could record that "Koji failed, timed out" (and blame
UNIVERSE, and possibly reschedule) or "DNF failed, package not found"
(blame TASK if it was in the formula, and notify the task maintainer), or
"Minion creation failed" (and blame TASKOTRON, notify us, I guess).
Implementing the crash classification will obviously take some time, but it
can be gradual, and we can start handling the "well known" failures soon,
for the bigger gain (kparal had some examples, IIRC).
So - what do you think about it? Is it a good idea? Do you feel like there
should be more (I can't really imagine there being less) blame targets
(like NETWORK, for example), and if so, why, and which? How about the
details - should we go with a pre-defined set of values (because enums are
better than free-text, but adding more would mean DB changes), or is
free-text + docs fine? Or do you see some other, better solution?
joza
Release validation NG: planning thoughts
by Adam Williamson
Hi folks!
We should probably set up some projects and so on for this so we can
use issue trackers, but I thought before committing to any structure we
could have at least a short mailing list discussion for planning the
'release validation NG' work.
For anyone who forgot / didn't know - 'release validation NG' is my
nickname for the project to write a dedicated system for manual release
validation testing result submission, using resultsdb for storage. The
goal is to make manual validation testing result submission easier and
less error-prone, and also to allow for improved analysis of results
and integration of manual results with results from other systems
(taskotron, openQA, autocloud etc). This would be designed to replace
the system of editable wiki pages that I call 'Wikitcms':
https://fedoraproject.org/wiki/Test_Results:Current_Installation_Test (etc.)
https://fedoraproject.org/wiki/Wikitcms
the latter page is a broad overview of how I see the Wikitcms 'system'
working at present. It's that system we'd be replacing, so it may help
you to read through that page to get some context and background on how
we got here and why 'release validation NG' might be a good idea :)
We have a ticket open with the design team:
https://pagure.io/design/issue/483
where kathryng is helping us with design mock ups based on my initial
rough sketches, which is great. Please do take a look at the mockups
and discussion there and add thoughts if you have any.
My very initial thought on architecture is that we could have two main
components, a webui component and a validator/resultsdb submitter
component.
The webui component would be exactly that, the actual web UI for users
to interact with and submit their results to. It would query the
validator/submitter component to find out what relevant 'test events'
were available, and what tests and environments and so forth for each
event, and then present an appropriate UI to the user for them to fill
in their results.
The validator/submitter component would be responsible for watching out
for new composes and keeping track of tests and 'test environments' (if
we keep that concept); it would have an API with endpoints you could
query for this kind of information in order to construct a result
submission, and for submitting results in some kind of defined form. On
receiving a result it would validate it according to some schemas that
admins of the system could configure (to ensure the report is for a
known compose, image, test and test environment, and do some checking
of stuff like the result status, user who submitted the result, comment
content, stuff like that). Then it'd forward the result to resultsdb.
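As a sketch of that validation step - every schema field and allowed value below is a hypothetical placeholder, not anything agreed on:

```python
# Purely illustrative sketch of the validator/submitter's checking step.
# All field names and allowed values here are hypothetical placeholders.
KNOWN_COMPOSES = {"Fedora-25-20161120.0"}
KNOWN_TESTS = {"QA:Testcase_Boot_default_install"}
VALID_STATUSES = {"pass", "fail", "warn"}

def validate_report(report):
    """Check a submitted result against admin-configured schemas.

    Returns a list of error strings; an empty list means the report is
    valid and would be forwarded on to resultsdb.
    """
    errors = []
    if report.get("compose") not in KNOWN_COMPOSES:
        errors.append("unknown compose")
    if report.get("test") not in KNOWN_TESTS:
        errors.append("unknown test")
    if report.get("status") not in VALID_STATUSES:
        errors.append("invalid status")
    return errors
```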
This is just an idea, though. There are a few reasons I thought it
might make sense to separate these two elements:
* It gives us flexibility in a few important respects:
* The validator/submitter could accept results from other things, not
just the webUI - e.g. relval
* The validator/submitter could send results to other things, not
just ResultsDB - e.g. the wiki
* The validator/submitter could be written to allow expansion to
cover things other than release validation results, e.g. Test Day
results, so a future rewrite of the 'testdays' webapp could use it
* It should help with splitting up the work between people; different
people can work on the web UI and the validator/submitter without
blocking each other too often
So these are just my very early thoughts on the project, it'd be great
to know what other folks think! If we can agree on a basic architecture
and plan we could start setting up projects (I think I'd suggest we do
this in Pagure, but we can also consider Phabricator) and tickets for
the initial work.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
Re: Release validation NG: planning thoughts
by Adam Williamson
On Tue, 2016-11-29 at 19:41 +0530, Kanika Murarka wrote:
> Hey everyone,
> I have some thoughts for the project:-
>
> 1. We can have a notification system, which gives a notifications like:-
> * 'There is a test day coming for this compose in 2 days'
> * 'A new compose has been added'
> Something to motivate and keep reminding testers about test days and new
> composes.
Yeah, this is certainly going to be needed, if only to replace
the Wikitcms event creation notification emails (these are sent by
'relvalconsumer', which is the fedmsg consumer bot that creates the
events).
> 2. Keep a record of no. of validation test done by a tester and highlight
> it once he login. A badge is being prepared for no. of validation testing
> done by a contributor[1].
Well, this information would kind of inevitably be collected at least
in resultsdb and probably wind up in the validator/submitter component's DB
too, depending on exactly how we set things up. For badge purposes,
we're *certainly* going to have this system firing off fedmsgs in all
directions, so the badges can be granted just based on the fedmsgs.
'User W reported a X for test Y on compose Z' (or similar) is a very
obvious fedmsg to emit.
> 3. Someway to show that testing for a particular compose is not required
> now, so testers can move on to newer composes.
We're talking about approximately this in the design ticket. My initial
design idea would *only* show images for the 'current' validation event
if you need to download an image for testing; I don't really see an
awful lot of point in offering older images for download. I suggested
offering events from the previous week or so for selection if you
already have an image downloaded, to prevent people having to download
new images all the time but also prevent us getting uselessly old
reports.
I'd see it as the validator/submitter component's job to keep track of
information about events/composes (however we conceive it), like when
they appeared, and the web UI's job to make decisions about which to
actually show people.
> 4. Also, we can add a 'sort by priority' option in the list of test images.
Yes, something like that, at least. The current system actually does
something more or less like this. The download tables on the wiki pages
are not randomly ordered, but ordered using a weighting provided by
fedfind which includes the importance of the image subvariant as a
factor:
https://pagure.io/fedora-qa/fedfind/blob/master/f/fedfind/helpers.py#_331
It currently penalizes ARM images quite heavily, which is not because
ARM isn't important, but a craven surrender to the practical realities
of wiki tables: they look a lot better if all the ARM disk images are
grouped together than if they're interspersed throughout the table. We
obviously have more freedom to avoid this issue in the design of the
new system.
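To illustrate the kind of weighting fedfind applies - the real logic and numbers live in fedfind/helpers.py; everything below is made up for illustration:

```python
# Illustrative weight-based image ordering, in the spirit of fedfind's
# helpers.py. The weights, subvariants and penalty values are invented.
IMAGES = [
    {"subvariant": "Minimal", "arch": "armhfp"},
    {"subvariant": "Workstation", "arch": "x86_64"},
    {"subvariant": "Server", "arch": "x86_64"},
]

SUBVARIANT_WEIGHT = {"Workstation": 0, "Server": 1, "Minimal": 5}
ARCH_PENALTY = {"armhfp": 100}  # push ARM disk images to the end, together

def sort_key(img):
    """Lower key sorts first: subvariant importance plus any arch penalty."""
    return (ARCH_PENALTY.get(img["arch"], 0)
            + SUBVARIANT_WEIGHT.get(img["subvariant"], 10))

ordered = sorted(IMAGES, key=sort_key)
```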
Thanks for the thoughts!
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
openQA update heads-up
by Adam Williamson
Hey folks! Just a quick heads-up about updates coming to the openQA
instances. I've upgraded staging to F25 today, that seems to have gone
pretty much flawlessly. I'm building current git snapshots of os-
autoinst and openQA at present and will update staging to those as
well, and we'll see how things look over the next few days. Depending
on how that goes I'll aim to upgrade prod to F25 and update it to the
git snapshots shortly afterwards. Hoping this goes a bit smoother than
last cycle and I don't wind up spending another month cleaning up
upstream issues...
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
2016-11-07 @ **15:00 UTC** - Fedora QA Devel Meeting
by Tim Flink
# Fedora QA Devel Meeting
# Date: 2016-11-07
# Time: 15:00 UTC (note time change)
(https://fedoraproject.org/wiki/Infrastructure/UTCHowto)
# Location: #fedora-meeting-1 on irc.freenode.net
Note the time change - as with the other QA meetings, we are keeping in
sync with US DST.
It's the second half of that time of the year again - the time when
nobody is quite sure what time meetings are at because many clocks have
changed.
https://phab.qadevel.cloud.fedoraproject.org/w/meetings/20161107-fedoraqa...
If you have any additional topics, please reply to this thread or add
them in the wiki doc.
Tim
Proposed Agenda
===============
Announcements and Information
-----------------------------
- Please list announcements or significant information items below so
the meeting goes faster
Tasking
-------
- Does anyone need tasks to do?
Potential Other Topics
----------------------
- Docker Testing Status
- Dist-Git Task Storage Proposal (and test case docs)
- Rebuilding Taskotron instances
Open Floor
----------
- TBD
stats-bodhi license
by Adam Williamson
I'm back working on moving fedora-qa to Pagure. I'm now dealing with
the 'stats' scripts, and there's a problem: it appears that stats-bodhi
has never actually been properly licensed. It has no license header or
license text, and AFAICS, never has. I can't simply declare it to be
F/OSS licensed, as I didn't write it.
Can Lukas or Kamil give us a license declaration for this code? Thanks!
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
Proposal to move things from fedora-qa.git to Pagure
by Adam Williamson
Hey folks!
So, if no-one has any objections, I'm intending to move the contents of
fedora-qa.git from fedorahosted to Pagure. At the same time, I think
it'd make sense to split some things out into their own projects. My
rough plan is to split out at least check-compose, relvalconsumer and
stats into separate projects. I'm not sure which of the other things
it's worth splitting out.
I'll probably put the new projects in the fedora-qa namespace and under
the fedora-qa group (if I can). git seems to have some fairly nifty
capabilities for isolating the history of individual files /
directories:
https://blogs.atlassian.com/2014/04/tear-apart-repository-git-way/
so we should be able to produce decent histories for each new project.
Does anyone mind me going ahead and doing this? And importantly, is
anyone aware of any significant deployments besides the ones I'm
already looking after (openQA boxes etc) which use the stuff from this
git repo, and would need to be updated to pull from the new project
repos?
Thanks, everyone!
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net