On (04/12/15 16:02), Michal Židek wrote:
On 12/04/2015 03:07 PM, Lukas Slebodnik wrote:
>On (04/12/15 14:35), Michal Židek wrote:
>>On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
>>>On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
>>>>On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
>>>>>On (04/12/15 12:11), Jakub Hrozek wrote:
>>>>>>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
>>>>>>>On (03/12/15 20:22), Jakub Hrozek wrote:
>>>>>>>>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik
wrote:
>>>>>>>>>On (02/12/15 17:10), Michal Židek wrote:
>>>>>>>>>>Hi!
>>>>>>>>>>
>>>>>>>>>>I saw some integration tests failures recently,
>>>>>>>>>>and I think there is a race condition between the
>>>>>>>>>>enumeration refresh timeout and the sleeps
>>>>>>>>>>after some operations that wait for this timeout.
>>>>>>>>>>SSSD fails to populate changes from LDAP in time
>>>>>>>>>>and some asserts can fail because of this.
>>>>>>>>>>
>>>>>>>>>>So far I saw 4 tests to fail like this, which
>>>>>>>>>>is already quite a lot.
>>>>>>>>>>
>>>>>>>>>>The attached patch modifies the timeout values
>>>>>>>>>>and hopefully removes the issue.
>>>>>>>>>>
>>>>>>>>>>Michal
>>>>>>>>>
>>>>>>>>>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon
Sep 17 00:00:00 2001
>>>>>>>>>>From: =?UTF-8?q?Michal=20=C5=BDidek?=
<mzidek(a)redhat.com>
>>>>>>>>>>Date: Wed, 2 Dec 2015 16:44:48 +0100
>>>>>>>>>>Subject: [PATCH] ldap_test.py: Modify enum cache
timeouts
>>>>>>>>>>
>>>>>>>>>>There is a race condation between ldap
>>>>>>>>>>enumeration refresh timeout and the sleeps
>>>>>>>>>>that wait for the ldap changes to populate
>>>>>>>>>>to SSSD if the timeout and the sleeps have
>>>>>>>>>>the same value.
>>>>>>>>>>---
>>>>>>>>>>src/tests/intg/ldap_test.py | 30
+++++++++++++++++-------------
>>>>>>>>>>1 file changed, 17 insertions(+), 13 deletions(-)
>>>>>>>>>>
>>>>>>>>>>diff --git a/src/tests/intg/ldap_test.py
b/src/tests/intg/ldap_test.py
>>>>>>>>>>index 757ee20..8ec8dbe 100644
>>>>>>>>>>--- a/src/tests/intg/ldap_test.py
>>>>>>>>>>+++ b/src/tests/intg/ldap_test.py
>>>>>>>>>>@@ -33,7 +33,11 @@ import ldap_ent
>>>>>>>>>>from util import *
>>>>>>>>>>
>>>>>>>>>>LDAP_BASE_DN = "dc=example,dc=com"
>>>>>>>>>>-INTERACTIVE_TIMEOUT = 4
>>>>>>>>>>+INTERACTIVE_TIMEOUT = 2
>>>>>>>>>>+
>>>>>>>>>>+
>>>>>>>>>>+def wait_for_ldap_enum_refresh():
>>>>>>>>>>+ time.sleep(INTERACTIVE_TIMEOUT + 4)
>>>>>>>>>Why does it need to be INTERACTIVE_TIMEOUT + 4
>>>>>>>>>
>>>>>>>>>Could it be INTERACTIVE_TIMEOUT + 3 or + 5
>>>>>>>>>
>>>>>>>>
>>>>>>>>Regardless of the value we choose, can we move this patch
forward? I see
>>>>>>>>the related failure quite often in SSSD.
>>>>>>>Adding timeout without real explanation is not a solution.
>>>>>>>
>>>>>>>The main problem is with empiric values.
>>>>>>>If they are very high then test are slow.
>>>>>>>And there still can be slow/fast machine where lower values
caused troubles.
>>>>>>>
>>>>>>>The ideal solution would be to get rid of enumeration
>>>>>>>in ldap tests.
>>>>>>
>>>>>>Enumeration is a codepath that is different from non-enumeration,
so it
>>>>>>should be tested. Not as priority, not as the only ldap tests, but
it's
>>>>>>a valid case, so it should be there.
>>>>>>
>>>>>>>If we want to test enumeration than it should be in separate
>>>>>>>test.
>>>>>>
>>>>>>Maybe, but we do have enumeration tests and we should fix them.
>>>>>>
>>>>>Adding sleep is not a fix. It's just a workaround
>>>>>because all sleep timeout are just an empiric values.
>>>>>and we should fix test and not adding workaround/hacks.
>>>>>
>>>>>If we cannot fix the test and don't want te rewrite test without
enumeration
>>>>>then we should remove/revert problematic tests.
>>>>>
>>>>>LS
>>>>
>>>>I will send a new patch with an explanation (sort of),
>>>>but it still will be a guess. I am not sure what the
>>>>real safe value should be, only that the sleep's
>>>>after operation should be longer than the ldap
>>>>refresh and enum cache timeouts (and that the
>>>>current values do not cope well wit the CI load).
>>>
>>>Would it be more acceptable then to define the ldap refresh and enum
>>>cache timeouts as variables in the test and sleep for (enum_timeout +
>>>cache_timeout + 1) ?
>>>
>>>At least that would be more readable than a magic constant..
>>
>>Will do. All will be derived from INTERACTIVE_TIMEOUT
>>so that we need to change just one value if needed in the
>>future.
>>
>Will it be reliable?
>
>Will it work on slow(arm) machines?
>
>I plan to run integration test in "%check" phase
>of rpm build. And koji/fedora has rpm machines.
We can mark tests that may fail on slow machines
due to timeouts as "unsafe" and skip them in the
rpm build.
It's not about rpmbuild, it's a general problem.
And marking tests as unsafe does not solve anything.
We need a reliable way how to find out that enumeratin task is finished.
Even grepping log files in loop is better than using
sleep
LS