Possible QA Devel Projects for GSoC 2014
by Tim Flink
Fedora has been accepted as a mentoring org for GSoC 2014 and I'm
planning to sign up as a mentor again this year. I'm trying to think of
good projects that we could put on the list of suggestions of things
that we'd like students to work on and figured it was a good topic for
wider discussion.
I'd like to avoid any blockerbugs projects this year so that we can
focus on taskotron and keeping forward momentum. I've made a quick list
of the possible projects that I can think of. Please comment on any of
them that you think would benefit from having a dedicated intern over
the summer or add to the list if you can think of other projects.
Ideally, the projects would be self-contained enough for the student to
demonstrate their progress over the summer but not so isolated that
they wouldn't be interacting with the community. Projects should be far
enough out that we wouldn't be critically blocked on them but close
enough that the effects of their work are visible before the end of
GSoC.
Tim
------------------------------------------------------------
Graphical Installation Testing
------------------------------------------------------------
Continue the work that Jan started with his thesis or look into
integrating something like openqa. The emphasis here is on the
graphical interface since ks-based installation testing could be
covered by stuff already written for beaker
------------------------------------------------------------
Beaker Integration
------------------------------------------------------------
This is on our roadmap and is certainly something that would be useful.
It would require a bit of infrastructure work and likely the
cooperation of the beaker devs but seems like it could be a good
project even if it isn't the most exciting thing ever.
On the other hand, this could end up being rather critical and may not
be something that we want to mostly hand off to a student.
------------------------------------------------------------
Gnome Integration Test Support
------------------------------------------------------------
An over-simplification of this would be to say "take the stuff that's
run as part of gnome continuous [1] and run it on fedora packages". The
goal would be to have gnome's integration test suites running with any
new gnome builds.
[1] https://wiki.gnome.org/action/show/Projects/GnomeContinuous
------------------------------------------------------------
Disposable Client Support
------------------------------------------------------------
This is another of the big features that we'll be implementing before
too long. It's one of the reasons that we made the shift from AutoQA to
taskotron and is blocking features which folks say they want to see
(user-submitted tasks, mostly).
This would involve some investigation into whether OpenStack would be
practical, if there is another provisioning system we could use or if
we'll be forced to roll our own (which I'd rather avoid). There should
be some tie-in with the graphical installation support and possibly the
gnome integration tests.
------------------------------------------------------------
RPM-OSTree Support/Integration
------------------------------------------------------------
I haven't used rpm-ostree enough yet to figure out how good of a fit
it'd be with taskotron but from the description of the project and the
discussions I've had with cwalters, it sounds like it could be a good
fit as part of our provisioning system for disposable clients.
If we're serious about proposing this as a GSoC project, we should
probably explore it a bit more to be certain that we'd want it now but
I figured it was worth putting on the list.
[2] https://github.com/cgwalters/rpm-ostree
------------------------------------------------------------
System for apparent results storage and modification
------------------------------------------------------------
There has to be a better title for this but it would be one of the last
major steps in enabling bodhi/koji to block builds/updates on check
failures. The idea would be to provide an interface which can decide
whether a build/update is OK based on what checks were passed/failed.
It would have a mechanism for manual overrides and algorithmic
overrides (ie, we know that foo has problem X and are working on it,
ignore failures for now) so that we don't upset packagers more than we
need to.
When Josef and I last talked about this, we weren't sure that putting
this functionality into our results storage mechanism was wise. It's a
different concern that has the potential to make a mess out of the
results storage.
10 years
Default invocation of pytest for qadevel projects
by Tim Flink
For the qadevel projects I'm aware of, we've been using pytest for our
unit/functional test runner.
As part of the shared configuration, tests are split up into two
categories, unit and functional. Unit tests are fast, do not touch the
network, database or filesystem (there are some exceptions to that last
part, though). Functional tests tend to be more integration tests that
can set up a database or do other actions which fall outside of the
previous definition of "unit test".
When you run pytest without any extra args, only the unit tests are
run. The '--functional' arg is needed to enable collection and
execution of the functional tests.
Kamil recently made a request [1] to do one of two things:
[1] https://phab.qadevel.cloud.fedoraproject.org/T89
1. Change the default such that functional tests are collected and
exclude them from collection using a '--unit' arg
2. Change the functional arg from '--functional' to something shorter,
like '--func'
* note that there are restrictions on which args we can use. For
example, '-f' is not allowed as single letter args are reserved for
pytest itself
As stated in T89, I don't have a strong opinion on this as long as it's
possible to exclude the functional tests from collection and we make
the same change across all of our active projects. However, I wanted to
put this up for a wider discussion before changing things.
Any other thoughts on this proposed change?
Tim
10 years
Re: D19 Comments and Diff
by Tim Flink
For some background, I sent this email to Kamil yesterday with a diff
containing suggestions on splitting up the unit tests for CheckDetail
and python_utils. It didn't seem like a big deal to me at the time, so
I just sent it to him. Given the level and type of discussion that it
seems to be started, I agree with Kamil that this needs to be discussed
with a wider audience and I'm replying to the list.
The diff has been slightly tweaked with a comment and a test method
name change. I've left some commented out tests in the diff to show
where I think tests could be removed without getting rid of meaningful
test coverage.
I wasn't very clear on this in my first email, but I meant the diff
partially as an example and it still needs a bit of work. I'm planning
to update the diff and add more comments to the review.
The review that spawned this conversation is up at:
https://phab.qadevel.cloud.fedoraproject.org/D19
My comments are inline.
Tim
On Wed, 5 Mar 2014 07:21:09 -0500 (EST)
Kamil Paral <kparal(a)redhat.com> wrote:
> > Kamil,
> >
> > Instead of putting comments directly into D19, I figured it would be
> > easier to put them in the form of a diff.
>
> It might be better to discuss this publicly, so that others can also
> voice their opinions. You can attach files to Phab using Files, and
> then you can somehow easily reference them inside the comment, I
> guess. That might be a good way to upload a related diff.
As stated above, agreed.
> > I realize this is going back on what I said last week, but I think
> > that separating out the tests even farther would be for the best
> > for the following reasons:
> >
> > 1. I still had a bit of trouble understanding what exactly all the
> > tests were supposed to be doing. I think that separating them out
> > and making the names more verbose will aid in maintainability when
> > other people end up reading the tests
>
> Can you specify which lines exactly were unclear to you? What if I
> provided docstrings for such methods? Because I think that using a
> docstring for explanation is much better way than trying to squeeze
> it into a method name (which are then very confusing and complex,
> like test_init_details_set_via_params_output() or
> test_update_result_no_override_higher_priority_fail_pass() - that's
> not readable at all; actually it looks like Java!:-) ). I think
> people would spend more time decrypting the method names than reading
> the docstring.
Yeah, some of the method names are a bit long and obtuse. I could have
chosen better names. I fixed the worst one in the new diff, though :)
I'm generally of the mind that folks shouldn't have to dive into
docstrings on tests in order to understand what is being tested. It is
unavoidable in some cases where you have complex tests but that's
usually a test or code smell which should at least be acknowledged.
> >
> > 2. By splitting up the tests, almost all the assertions are always
> > run. When they're grouped together, the test fails @ the first
> > failed assertion and not all assertions are run.
>
> That's true, but if we have zero-failed-test policy, why does it
> matter? If you patch introduced a problem, you need to fix it, and
> you might discover another one once you do. Yes, it might be a
> sequence of steps rather than seeing everything at once, but at the
> benefit of test readability (items logically grouped together, rather
> than one assertion per method). Also, I'm testing a part of a single
> function in one method, it's not like it can have large consequences.
> I understand why splitting larger pieces to smaller pieces is good,
> but I don't understand why splitting it into one line per method is
> good. All of those tests are trivial - simple functionality of simple
> methods.
One of my counter-points here is that if the tests are so trivial, do
we really need to have them? After a certain point, we can add tests
but if they aren't incredibly meaningful, we'd just be adding
maintenance burden and not gaining much from the additional coverage.
> They have a dozen lines, a few asserts. An error in the
> first line can hide an error or two in the remaining eleven lines,
> but you'll discover that soon enough, once you fix the code and run
> the tests again. Is that a problem?
In this particular case, not a huge problem - the code under test is
relatively trivial. That being said, I don't really want to start off
with tests that
> > I've gone through and made comments in the diff in addition to
> > splitting up most of the tests.
> >
> > As I'm writing this, I had a couple of problems with the fixtures
> > when using my system py.test which go away with py.test 2.5.2 in a
> > virtualenv. This may be some issue with my local env but I wanted to
> > send off the diff instead of sitting on it another day. I can take
> > another look at the tests if they're causing broader problems.
>
> The fixtures are interesting, thanks. I don't really understand why
> it needs 4 methods instead of just 2, but I'll look at it. I'm not
> sure it creates a more readable code in this case, but it will be
> definitely useful in certain cases.
It was a choice on my part not to make the test code any more
complicated than it already is. I may be missing something but AFAIK,
you can't have the same fixture handle different lists in different
situations. Differentiating between the items which should evaluate
True versus False can be done by pairing the list items with an
indicator but I don't really see how that improves things.
> >
> > The diff is attached to this email
> >
> > Tim
> >
>
> I think you're going way to the extreme. When Josef saw how you had
> split test_cmp_result(), he crawled under the desk and didn't want to
> look at it ever again :-) Martin also said he considers it broken
> down too much.
Overreact much? :-P
> We've looked for some good testing practices and we've
> found no actual recommendation to have just a single assert per
> method. Multiple asserts are totally OK. And they make the code much
> more readable. (Also, I think that naming methods by their input
> params, like test_cmp_result_greater_failed_passed(), is not a good
> practice. Instead of having a clear name explaining _purpose_, it
> describes the contents - which you can easily see right on the next
> line).
I may have gone a little too far and not spent enough time on the
tests. I agree that some of the test names could be better and that
there's not a whole lot of benefit to being rigid about "one assert per
test" regardless of whether or not it's generally accepted as a good
thing.
I'll work on updating the diff again with an emphasis on not just
providing an example - removing tests that I don't think are worth
keeping and combining some of the trivial tests.
> In some cases, you've actually lowered the test coverage by splitting
> it up. For example this:
>
> + def test_broken_crashed(self):
> self.cd.result = 'CRASHED'
> assert self.cd.broken()
>
> + def test_broken_failed(self):
> self.cd.result = 'FAILED'
> assert not self.cd.broken()
>
> Before, I was actually testing that a transition from broken to not
> broken state works. By separating those two pieces, we've lost the
> test (I admit I should have had an inline comment in there explaining
> that, my fault). Of course, we can create yet another method for it:
>
> def test_broken_broken_to_not_broken(self):
> self.cd.result = 'CRASHED'
> assert self.cd.broken()
> self.cd.result = 'FAILED'
> assert not self.cd.broken()
>
> But what's the point? It's the same code! Do you see? Sometimes
> interactions between lines are actually beneficial, because you can
> test more than with just single-assert testing. You can of course add
> all those tests even with your approach, but at the expense of
> massive code duplication and a large loss of readability. I wouldn't
> like to maintain a test script that is several times longer just
> because of this. I don't see any actual benefits (that couldn't be
> fixed by adding comments in unclear places).
Technically, yes it is reducing coverage. However, I don't really see
how this particular coverage is all that useful. What do we gain by
testing that transition? The method under test is a simple comparison
to see if the current value is in ['ABORTED', 'CRASHED']. There's no
complexity here and I don't think it's worth keeping that test in
the code base. At most, I see a point in checking to see if a result of
'CRASHED' or 'ABORTED' makes .broken() return True and one of the other
values to make sure that it returns False.
> If you think that single-assert method testing is a really important
> topic, could you start a discussion on qa-devel? Because I'm not
> really happy about this approach, I think my previous code was
> actually more readable. Sorry to disagree with you, this is probably
> a highly subjective topic.
Honestly, I'm glad that we're having conversations around how to do
unit tests instead of whether to require unit tests or not :)
Some of this is subjective and I don't pretend to be an absolute
authority on what should/shouldn't be tested or how tests should be
written.
If I've given the impression that I want to see all unit tests stick to
one assertion per test in all cases no matter what, I apologize. While I
think that having one overall assertion per unit test is ideal, I also
think that it doesn't make sense in all cases. What I want to avoid is
having obtuse tests that have a bunch of assert statements that glob
multiple concerns into the same binary PASS/FAIL. In most cases, if a
test fails, it should be pretty obvious what failed and why.
I also want to avoid increasing the maintenance burden of having a
bunch of tests that look for things which don't really need to be
tested (setting data members, checking default values etc.)
10 years