On Sun, 2018-12-09 at 16:19 +0100, Aleksandra Fedorova wrote:
> Hi,
> > Anyone have any thoughts?
> Let me try to summarize mine.
> # The org.centos.prod.ci.pipeline.allpackages-build.package.ignored testcase
> The problem I see here is that this testcase doesn't fit in any of
> the designs for Greenwave, CI Messages and/or ResultsDB, simply
> because "ignored" is supposed to be the outcome of the test, not the
> testcase itself.
> In the CI Messages schema we define "not applicable" as a possible
> value for the status field [1]. This message is supposed to mean "We
> were trying to run a certain testcase named "<testcase_name>", but
> there was certain logic in the test framework or pipeline which led
> us to skip it".
So, as noted upthread, I have a problem with this: it is not actually
always simple for the thing sending out the message to *know* this.
Let me explain the openQA case a bit more. It does not exactly have a
simple "list of test cases". Rather...well, there are these things
called "job templates", which mean something like "run this 'test
suite' (approximately, a test case) when tests are requested for this
'flavor' (approximately, an image type, like 'Workstation live'),
'arch' and 'version' (which can be a wildcard)".
So what our openQA scheduler code is actually doing here is
approximately this. It takes an update and figures out what Fedora
release it's for; that's the 'version'. It then decides which 'flavors'
it is going to request openQA run the tests for (for updates, the
'flavors' are 'workstation', 'server', 'workstation-upgrade' and
'server-upgrade' - the upgrade tests are in separate flavors so they
can be skipped when the update being tested is for the oldest
currently-supported release, so we don't try and test upgrading from an
EOL release). Then it says "hey, openQA, run the 'X' flavor tests for
this update which is version 'Y'". (Currently we only run update tests
on x86_64, but we'll likely add other arches at some point). It's then
*openQA's* job to figure out what tests that actually means, and
schedule jobs for each test.
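To make that scheduling step concrete, here's a rough sketch of the flavor-selection logic described above - the function name and release numbers are hypothetical illustrations, not the actual fedora-openqa scheduler code:

```python
# A rough sketch of the flavor-selection logic described above; names
# are hypothetical, not the actual fedora-openqa scheduler code.

# Flavors whose tests always run for an update
BASE_FLAVORS = ["workstation", "server"]
# Upgrade flavors: skipped when the update targets the oldest
# supported release, since we'd be testing an upgrade from an EOL one
UPGRADE_FLAVORS = ["workstation-upgrade", "server-upgrade"]


def flavors_for_update(version, oldest_supported):
    """Decide which 'flavors' to ask openQA to run tests for."""
    flavors = list(BASE_FLAVORS)
    if version > oldest_supported:
        # A supported release older than this one exists, so the
        # upgrade path is testable
        flavors += UPGRADE_FLAVORS
    return flavors
```

openQA then expands each requested flavor into concrete jobs via its job templates - which is exactly why the scheduler itself can't know which tests it is "skipping".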
The way the fedmsg stuff works is that, when openQA schedules a test
(or starts running a test or completes a test, etc.), it sends out a
message on an internal sort of message bus-y thing; I wrote a plugin
which then sends out fedmsgs based on those internal messages. But of
course, in this case, nothing *happens* in openQA itself, there is no
event we can possibly send out a fedmsg in response to. And the
scheduler only knows that it's not running tests for this or that
'flavor' and 'version' - it does not know, and cannot know, what actual
tests *would have been run if it did*.
It is *possible* to solve this, I guess. My first thought about how to
do that would be to actually add this feature to openQA. It would be a
pretty weird API request - basically "Here is a request that looks like
the one we send when we want you to run some tests. Now, we want you to
explicitly **NOT** run these tests, then report exactly what it is that
you didn't do". :P
Internally it'd just sort of hook into the job creation code, only it
wouldn't actually do the step where it makes the created jobs 'real';
it'd just create the sort of 'prospective' jobs, send out internal
events, and produce a response to the request, then just throw them
away. It probably wouldn't actually be too hard to do, it'd just be a
rather...odd thing to have.
> This is the result of our testcase, which we want to share with all
> consumers of the _testcase_, to let them do something about it.
> "org.centos.prod.ci.pipeline.allpackages-build.package.ignored", on
> the other hand, is neither a testcase nor a result. It is just an
> informational message on the Message Bus, and probably a redundant
> one, thus it shouldn't be in ResultsDB at all.
> So for me it is not really a question, rather a work item: we need to
> find out how to fix it, and fix it. Most likely by aligning the
> pipeline messages to the CI Messages format.
This part seems fine, sure. Remember I was talking about ResultsDB
results initially here, not fedmsgs, but I guess the results are being
reported by something which listens to the fedmsgs and forwards them,
or something like that?
> # How to use the not_applicable result in gating policy
> Now, given a testcase result with "not_applicable" as an outcome, can
> we use it in Greenwave?
> Afaik the PassingTestCase rule [2] currently only checks whether a
> testcase has a PASS status or is waived. I think it needs to be
> adjusted, and it needs to be configurable.
> We need another rule, something like
> ParametrizedTestCaseRule(testcase=TEST_CASE_NAME, outcomes=[PASS,
> NOT_APPLICABLE, WAIVED])
> or maybe
> PassingOrSkippedTestCaseRule(testcase=TEST_CASE_NAME)
> Or both.
> This way we can use all outcomes in our decision making process.
Yes, this is approximately what I was imagining too.
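As a very rough sketch of what such a rule could look like internally - hypothetical class and outcome names, not actual Greenwave code - it boils down to checking the latest outcome against a configurable set:

```python
# Hypothetical sketch of a Greenwave-style rule with a configurable
# set of satisfying outcomes; not actual Greenwave code.

SATISFIED_DEFAULT = {"PASSED"}


class ParametrizedTestCaseRule:
    """Satisfied when the result for a testcase has one of the
    configured outcomes, or the testcase has been waived."""

    def __init__(self, testcase, outcomes=None):
        self.testcase = testcase
        self.outcomes = set(outcomes or SATISFIED_DEFAULT)

    def is_satisfied(self, results, waived_testcases=()):
        # `results` maps testcase name -> latest outcome string
        outcome = results.get(self.testcase)
        if outcome in self.outcomes:
            return True
        return self.testcase in waived_testcases


# A rule that also accepts NOT_APPLICABLE, per the proposal above
rule = ParametrizedTestCaseRule(
    "dist.upgrade", outcomes={"PASSED", "NOT_APPLICABLE"})
```

The default stays PASS-only, which matches the point below about not treating not_applicable like PASS everywhere.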
> # But should there be a "not applicable" outcome at all, and should
> we treat it like PASS?
> I think it should be possible to report results like that. There are
> certain use cases and extensive test suites where you can skip tests
> partially or temporarily, or based on certain parameters. And if a
> certain CI system provides this kind of flexibility, it should be
> able to communicate it.
> But I think there shouldn't be a "treat not_applicable like PASS"
> approach by default. We need to clearly identify those testcases where
> it makes sense and use it only for this limited subset.
Agreed. It seems entirely reasonable that we might want to write a rule
which really *is* only satisfied on PASS, on the basis that the test in
question should never be skipped in that particular situation, and if
it *isn't* run, that means something is wrong.
> And the smaller the list, the better.
> # How to deal with exceptions to a global Greenwave policy
> I think the better way to treat exceptions is to make them explicit.
> We shouldn't try to identify exceptions based on their test results,
> but rather have a list of them predefined and stored somewhere near
> the global policy itself.
> To justify: CI systems, test suites and test runs can be
> misconfigured. We can in theory disable a certain feature on a lower
> level by mistake, and we need an independent source of truth to
> verify our results against.
> And it goes back again to Greenwave. It currently provides
> RemoteRule, which allows reading additional testcases from a dist-git
> repo on a per-project basis.
> We would need a NotReallyRemoteRule which would allow overrides to
> the global policy on a per-project basis, using a set of rules
> configured additionally on the Greenwave server itself or in another
> centralized storage.
I'm honestly not quite sure what you're talking about here, sorry :)
What's an 'exception' in this context? Are you talking about what
WaiverDB does?
> # ExecDB vs ResultsDB
> This topic probably needs a thread of its own.
> If I understood correctly, ExecDB is the database of test jobs, while
> ResultsDB is the database of test cases.
I don't think that's exactly it, no. ExecDB's description explains it
fairly well:
"ExecDB is a database that stores the execution status of jobs running
inside the Taskotron framework."
basically, it's just the bit of Taskotron where it keeps information
like 'test X on item Y was scheduled', 'test X on item Y is running',
'test X on item Y is complete'.
ResultsDB is intended to be exactly what the name says: a database of
test results. "Test X on item Y was run and the outcome was pass",
"Test Z on item Y ran and the outcome was fail". That kinda thing. Of
course such a thing winds up with a list of all the test cases for
which results have been reported to it, but that's a sort of incidental
detail: AFAIK, that's not one of its *intended purposes*, and it's not
formally intended to be a 'source of truth' as regards what test cases
"exist" in any given context, I don't think. It's really there to be: a
database of test results.
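As a toy contrast between the two - illustrative records only, not either system's actual schema - ExecDB-style data is execution state updated in place per job, while ResultsDB-style data is an append-only log of outcomes:

```python
# Toy contrast; illustrative only, not the actual schemas of
# ExecDB or ResultsDB.

# ExecDB-style: the *execution state* of a job, updated in place
exec_status = {"job-1234": "scheduled"}
exec_status["job-1234"] = "running"
exec_status["job-1234"] = "complete"

# ResultsDB-style: an append-only list of *results*
results = []
results.append({"testcase": "dist.rpmdeplint",
                "item": "FEDORA-2018-x", "outcome": "PASSED"})
results.append({"testcase": "dist.abicheck",
                "item": "FEDORA-2018-x", "outcome": "FAILED"})
```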
> And we were recently discussing whether it is possible to extend
> ResultsDB into a Results State Machine:
> Imagine that we have several CI systems capable of running the same
> test case scenario. If we extend the ResultsDB status field with
> PENDING and IN PROGRESS values, we can use it as a task tracker.
> 1) When a new artifact gets created, we check with Greenwave which
> test cases we need to gate it on.
> 2) Then we create an "(artifact_id, testcase_id, pending)" entry in
> ResultsDB for each of them.
> 3) A CI system periodically checks if there is an "(artifact_id,
> testcase_id, pending)" entry in ResultsDB which is not overridden by
> an "(artifact_id, testcase_id, in progress)" result.
A technical note on this part: ResultsDB doesn't really have any
concept of results "overriding" each other; this is something that has
to be done on the consumer side. All ResultsDB does is store results
and let you access them. Of course, you *can* easily do this on the
consumer side by just filtering to the latest results, or whatever -
assuming your definition of which 'result' is 'current' is a
straightforward one to implement...
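For instance, the consumer-side filtering could be as simple as this sketch - the result records here just mimic what a ResultsDB query might return, they are not the real API:

```python
# Hypothetical consumer-side "latest result wins" filtering; the
# record shape is illustrative, not the real ResultsDB API.

def current_results(results):
    """Keep only the most recent result per (item, testcase) pair.

    `results` is an iterable of dicts with 'item', 'testcase',
    'outcome' and a sortable 'submit_time'.
    """
    latest = {}
    for r in sorted(results, key=lambda r: r["submit_time"]):
        latest[(r["item"], r["testcase"])] = r
    return latest


results = [
    {"item": "FEDORA-2018-x", "testcase": "dist.rpmdeplint",
     "outcome": "PENDING", "submit_time": 1},
    {"item": "FEDORA-2018-x", "testcase": "dist.rpmdeplint",
     "outcome": "RUNNING", "submit_time": 2},
]
# From the consumer's point of view, the later RUNNING entry
# "overrides" the earlier PENDING one
```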
> 4) If it finds one, it triggers the testcase and sends an
> "(artifact_id, testcase_id, in progress)" message to ResultsDB.
> This idea does go in the direction of an ExecDB-style execution
> tracker, but using the (artifact, testcase) pair as a primary key.
> Do you think ExecDB is a better place for it?
My personal opinion is that this is possible but would be quite an
abuse of the system, and it would be better to store this somewhere
else. That's just not what it's for. But the most important opinion
would I guess be Josef's, as he's much closer to this system than I am
:) (And Tim's, of course, but he's still away). CCing Josef to make
sure he's reading.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net