New subject: D19 Comments and Diff

Wednesday, 5 March 2014

For some background, I sent this email to Kamil yesterday with a diff
containing suggestions on splitting up the unit tests for CheckDetail
and python_utils. It didn't seem like a big deal to me at the time, so
I just sent it to him. Given the level and type of discussion that it
seems to be started, I agree with Kamil that this needs to be discussed
with a wider audience and I'm replying to the list.

The diff has been slightly tweaked with a comment and a test method
name change. I've left some commented out tests in the diff to show
where I think tests could be removed without getting rid of meaningful
test coverage.

I wasn't very clear on this in my first email, but I meant the diff
partially as an example and it still needs a bit of work. I'm planning
to update the diff and add more comments to the review.

The review that spawned this conversation is up at:
https://phab.qadevel.cloud.fedoraproject.org/D19

My comments are inline.

Tim

On Wed, 5 Mar 2014 07:21:09 -0500 (EST)
Kamil Paral <kparal(a)redhat.com&gt; wrote:

...
 > Kamil,
 > 
 > Instead of putting comments directly into D19, I figured it would be
 > easier to put them in the form of a diff.

 It might be better to discuss this publicly, so that others can also
 voice their opinions. You can attach files to Phab using Files, and
 then you can somehow easily reference them inside the comment, I
 guess. That might be a good way to upload a related diff. 
As stated above, agreed.

...
 > I realize this is going back on what I said last week, but I
think
 > that separating out the tests even farther would be for the best
 > for the following reasons:
 > 
 > 1. I still had a bit of trouble understanding what exactly all the
 > tests were supposed to be doing. I think that separating them out
 > and making the names more verbose will aid in maintainability when
 > other people end up reading the tests

 Can you specify which lines exactly were unclear to you? What if I
 provided docstrings for such methods? Because I think that using a
 docstring for explanation is much better way than trying to squeeze
 it into a method name (which are then very confusing and complex,
 like test_init_details_set_via_params_output() or
 test_update_result_no_override_higher_priority_fail_pass() - that's
 not readable at all; actually it looks like Java!:-) ). I think
 people would spend more time decrypting the method names than reading
 the docstring. 
Yeah, some of the method names are a bit long and obtuse. I could have
chosen better names. I fixed the worst one in the new diff, though :)

I'm generally of the mind that folks shouldn't have to dive into
docstrings on tests in order to understand what is being tested. It is
unavoidable in some cases where you have complex tests but that's
usually a test or code smell which should at least be acknowledged.

...
 > 
 > 2. By splitting up the tests, almost all the assertions are always
 > run. When they're grouped together, the test fails @ the first
 > failed assertion and not all assertions are run.

 That's true, but if we have zero-failed-test policy, why does it
 matter? If you patch introduced a problem, you need to fix it, and
 you might discover another one once you do. Yes, it might be a
 sequence of steps rather than seeing everything at once, but at the
 benefit of test readability (items logically grouped together, rather
 than one assertion per method). Also, I'm testing a part of a single
 function in one method, it's not like it can have large consequences.
 I understand why splitting larger pieces to smaller pieces is good,
 but I don't understand why splitting it into one line per method is
 good. All of those tests are trivial - simple functionality of simple
 methods. 
One of my counter-points here is that if the tests are so trivial, do
we really need to have them? After a certain point, we can add tests
but if they aren't incredibly meaningful, we'd just be adding
maintenance burden and not gaining much from the additional coverage.

...
 They have a dozen lines, a few asserts. An error in the
 first line can hide an error or two in the remaining eleven lines,
 but you'll discover that soon enough, once you fix the code and run
 the tests again. Is that a problem? 
In this particular case, not a huge problem - the code under test is
relatively trivial. That being said, I don't really want to start off
with tests that

...
 > I've gone through and made comments in the diff in addition
to
 > splitting up most of the tests.
 > 
 > As I'm writing this, I had a couple of problems with the fixtures
 > when using my system py.test which go away with py.test 2.5.2 in a
 > virtualenv. This may be some issue with my local env but I wanted to
 > send off the diff instead of sitting on it another day. I can take
 > another look at the tests if they're causing broader problems.

 The fixtures are interesting, thanks. I don't really understand why
 it needs 4 methods instead of just 2, but I'll look at it. I'm not
 sure it creates a more readable code in this case, but it will be
 definitely useful in certain cases. 
It was a choice on my part not to make the test code any more
complicated than it already is. I may be missing something but AFAIK,
you can't have the same fixture handle different lists in different
situations. Differentiating between the items which should evaluate
True versus False can be done by pairing the list items with an
indicator but I don't really see how that improves things.

...
 > 
 > The diff is attached to this email
 > 
 > Tim
 > 

 I think you're going way to the extreme. When Josef saw how you had
 split test_cmp_result(), he crawled under the desk and didn't want to
 look at it ever again :-) Martin also said he considers it broken
 down too much. 
Overreact much? :-P

...
 We've looked for some good testing practices and we've
 found no actual recommendation to have just a single assert per
 method. Multiple asserts are totally OK. And they make the code much
 more readable. (Also, I think that naming methods by their input
 params, like test_cmp_result_greater_failed_passed(), is not a good
 practice. Instead of having a clear name explaining _purpose_, it
 describes the contents - which you can easily see right on the next
 line). 
I may have gone a little too far and not spent enough time on the
tests. I agree that some of the test names could be better and that
there's not a whole lot of benefit to being rigid about "one assert per
test" regardless of whether or not it's generally accepted as a good
thing.

I'll work on updating the diff again with an emphasis on not just
providing an example - removing tests that I don't think are worth
keeping and combining some of the trivial tests.

...
 In some cases, you've actually lowered the test coverage by
splitting
 it up. For example this:

 +    def test_broken_crashed(self):
          self.cd.result = 'CRASHED'
          assert self.cd.broken()

 +    def test_broken_failed(self):
          self.cd.result = 'FAILED'
          assert not self.cd.broken()

 Before, I was actually testing that a transition from broken to not
 broken state works. By separating those two pieces, we've lost the
 test (I admit I should have had an inline comment in there explaining
 that, my fault). Of course, we can create yet another method for it:

      def test_broken_broken_to_not_broken(self):
          self.cd.result = 'CRASHED'
          assert self.cd.broken()
          self.cd.result = 'FAILED'
          assert not self.cd.broken()

 But what's the point? It's the same code! Do you see? Sometimes
 interactions between lines are actually beneficial, because you can
 test more than with just single-assert testing. You can of course add
 all those tests even with your approach, but at the expense of
 massive code duplication and a large loss of readability. I wouldn't
 like to maintain a test script that is several times longer just
 because of this. I don't see any actual benefits (that couldn't be
 fixed by adding comments in unclear places). 
Technically, yes it is reducing coverage. However, I don't really see
how this particular coverage is all that useful. What do we gain by
testing that transition? The method under test is a simple comparison
to see if the current value is in ['ABORTED', 'CRASHED']. There's no
complexity here and I don't think it's worth keeping that test in
the code base. At most, I see a point in checking to see if a result of
'CRASHED' or 'ABORTED' makes .broken() return True and one of the other
values to make sure that it returns False.

...
 If you think that single-assert method testing is a really important
 topic, could you start a discussion on qa-devel? Because I'm not
 really happy about this approach, I think my previous code was
 actually more readable. Sorry to disagree with you, this is probably
 a highly subjective topic. 
Honestly, I'm glad that we're having conversations around how to do
unit tests instead of whether to require unit tests or not :)

Some of this is subjective and I don't pretend to be an absolute
authority on what should/shouldn't be tested or how tests should be
written.

If I've given the impression that I want to see all unit tests stick to
one assertion per test in all cases no matter what, I apologize. While I
think that having one overall assertion per unit test is ideal, I also
think that it doesn't make sense in all cases. What I want to avoid is
having obtuse tests that have a bunch of assert statements that glob
multiple concerns into the same binary PASS/FAIL. In most cases, if a
test fails, it should be pretty obvious what failed and why.

I also want to avoid increasing the maintenance burden of having a
bunch of tests that look for things which don't really need to be
tested (setting data members, checking default values etc.)

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

Re: D19 Comments and Diff