Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value.
src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent from util import *
LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2
+def wait_for_ldap_enum_refresh():
- time.sleep(INTERACTIVE_TIMEOUT + 4)
Why does it need to be INTERACTIVE_TIMEOUT + 4
Could it be INTERACTIVE_TIMEOUT + 3 or + 5
LS
On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value.
src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent from util import *
LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2
+def wait_for_ldap_enum_refresh():
- time.sleep(INTERACTIVE_TIMEOUT + 4)
Why does it need to be INTERACTIVE_TIMEOUT + 4
Could it be INTERACTIVE_TIMEOUT + 3 or + 5
Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD.
On (03/12/15 20:22), Jakub Hrozek wrote:
On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value.
src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent from util import *
LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2
+def wait_for_ldap_enum_refresh():
- time.sleep(INTERACTIVE_TIMEOUT + 4)
Why does it need to be INTERACTIVE_TIMEOUT + 4
Could it be INTERACTIVE_TIMEOUT + 3 or + 5
Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD.
Adding timeout without real explanation is not a solution.
The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles.
The ideal solution would be to get rid of enumeration in ldap tests. If we want to test enumeration than it should be in separate test. I'm not sure we would be able to get rid of "sleep()" in enumeration test but all such values should pre properly documented why it is big enough ...
LS
On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
On (03/12/15 20:22), Jakub Hrozek wrote:
On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value.
src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent from util import *
LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2
+def wait_for_ldap_enum_refresh():
- time.sleep(INTERACTIVE_TIMEOUT + 4)
Why does it need to be INTERACTIVE_TIMEOUT + 4
Could it be INTERACTIVE_TIMEOUT + 3 or + 5
Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD.
Adding timeout without real explanation is not a solution.
The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles.
The ideal solution would be to get rid of enumeration in ldap tests.
Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there.
If we want to test enumeration than it should be in separate test.
Maybe, but we do have enumeration tests and we should fix them.
I'm not sure we would be able to get rid of "sleep()" in enumeration test but all such values should pre properly documented why it is big enough ...
LS _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
On (04/12/15 12:11), Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
On (03/12/15 20:22), Jakub Hrozek wrote:
On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value.
src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent from util import *
LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2
+def wait_for_ldap_enum_refresh():
- time.sleep(INTERACTIVE_TIMEOUT + 4)
Why does it need to be INTERACTIVE_TIMEOUT + 4
Could it be INTERACTIVE_TIMEOUT + 3 or + 5
Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD.
Adding timeout without real explanation is not a solution.
The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles.
The ideal solution would be to get rid of enumeration in ldap tests.
Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there.
If we want to test enumeration than it should be in separate test.
Maybe, but we do have enumeration tests and we should fix them.
Adding sleep is not a fix. It's just a workaround because all sleep timeout are just an empiric values. and we should fix test and not adding workaround/hacks.
If we cannot fix the test and don't want te rewrite test without enumeration then we should remove/revert problematic tests.
LS
On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
On (04/12/15 12:11), Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
On (03/12/15 20:22), Jakub Hrozek wrote:
On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value.
src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent from util import *
LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2
+def wait_for_ldap_enum_refresh():
- time.sleep(INTERACTIVE_TIMEOUT + 4)
Why does it need to be INTERACTIVE_TIMEOUT + 4
Could it be INTERACTIVE_TIMEOUT + 3 or + 5
Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD.
Adding timeout without real explanation is not a solution.
The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles.
The ideal solution would be to get rid of enumeration in ldap tests.
Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there.
If we want to test enumeration than it should be in separate test.
Maybe, but we do have enumeration tests and we should fix them.
Adding sleep is not a fix. It's just a workaround because all sleep timeout are just an empiric values. and we should fix test and not adding workaround/hacks.
If we cannot fix the test and don't want te rewrite test without enumeration then we should remove/revert problematic tests.
LS
I will send a new patch with an explanation (sort of), but it still will be a guess. I am not sure what the real safe value should be, only that the sleep's after operation should be longer than the ldap refresh and enum cache timeouts (and that the current values do not cope well wit the CI load).
Lukas is right that I pulled the values out of place where no knowledge resides so let us make a compromise. I will push this patch to the CI a lot of times (let's say 40-50) over the weekend and see if it fails.
I am also not sure if lowering the INTERACTIVE_TIMEOUT was a good idea, I did it in order to make the execution a little faster and it is a timeout that needs to be shorter than the sleeps that wait for the ldap change to propagate to sysdb (and I did not want the sleeps to be too long). With the weekend "stress testing" we can hopefully avoid additional adjustments with new patches.
Michal
On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
On (04/12/15 12:11), Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
On (03/12/15 20:22), Jakub Hrozek wrote:
On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote: >Hi! > >I saw some integration tests failures recently, >and I think there is a race condition between the >enumeration refresh timeout and the sleeps >after some operations that wait for this timeout. >SSSD fails to populate changes from LDAP in time >and some asserts can fail because of this. > >So far I saw 4 tests to fail like this, which >is already quite a lot. > >The attached patch modifies the timeout values >and hopefully removes the issue. > >Michal
>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 >From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com >Date: Wed, 2 Dec 2015 16:44:48 +0100 >Subject: [PATCH] ldap_test.py: Modify enum cache timeouts > >There is a race condation between ldap >enumeration refresh timeout and the sleeps >that wait for the ldap changes to populate >to SSSD if the timeout and the sleeps have >the same value. >--- >src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- >1 file changed, 17 insertions(+), 13 deletions(-) > >diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py >index 757ee20..8ec8dbe 100644 >--- a/src/tests/intg/ldap_test.py >+++ b/src/tests/intg/ldap_test.py >@@ -33,7 +33,11 @@ import ldap_ent >from util import * > >LDAP_BASE_DN = "dc=example,dc=com" >-INTERACTIVE_TIMEOUT = 4 >+INTERACTIVE_TIMEOUT = 2 >+ >+ >+def wait_for_ldap_enum_refresh(): >+ time.sleep(INTERACTIVE_TIMEOUT + 4) Why does it need to be INTERACTIVE_TIMEOUT + 4
Could it be INTERACTIVE_TIMEOUT + 3 or + 5
Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD.
Adding timeout without real explanation is not a solution.
The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles.
The ideal solution would be to get rid of enumeration in ldap tests.
Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there.
If we want to test enumeration than it should be in separate test.
Maybe, but we do have enumeration tests and we should fix them.
Adding sleep is not a fix. It's just a workaround because all sleep timeout are just an empiric values. and we should fix test and not adding workaround/hacks.
If we cannot fix the test and don't want te rewrite test without enumeration then we should remove/revert problematic tests.
LS
I will send a new patch with an explanation (sort of), but it still will be a guess. I am not sure what the real safe value should be, only that the sleep's after operation should be longer than the ldap refresh and enum cache timeouts (and that the current values do not cope well wit the CI load).
Would it be more acceptable then to define the ldap refresh and enum cache timeouts as variables in the test and sleep for (enum_timeout + cache_timeout + 1) ?
At least that would be more readable than a magic constant..
Lukas is right that I pulled the values out of place where no knowledge resides so let us make a compromise. I will push this patch to the CI a lot of times (let's say 40-50) over the weekend and see if it fails.
I am also not sure if lowering the INTERACTIVE_TIMEOUT was a good idea, I did it in order to make the execution a little faster and it is a timeout that needs to be shorter than the sleeps that wait for the ldap change to propagate to sysdb (and I did not want the sleeps to be too long). With the weekend "stress testing" we can hopefully avoid additional adjustments with new patches.
Michal _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
On (04/12/15 12:11), Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
On (03/12/15 20:22), Jakub Hrozek wrote:
On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: > On (02/12/15 17:10), Michal Židek wrote: >> Hi! >> >> I saw some integration tests failures recently, >> and I think there is a race condition between the >> enumeration refresh timeout and the sleeps >> after some operations that wait for this timeout. >> SSSD fails to populate changes from LDAP in time >> and some asserts can fail because of this. >> >> So far I saw 4 tests to fail like this, which >> is already quite a lot. >> >> The attached patch modifies the timeout values >> and hopefully removes the issue. >> >> Michal > > >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 >> From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com >> Date: Wed, 2 Dec 2015 16:44:48 +0100 >> Subject: [PATCH] ldap_test.py: Modify enum cache timeouts >> >> There is a race condation between ldap >> enumeration refresh timeout and the sleeps >> that wait for the ldap changes to populate >> to SSSD if the timeout and the sleeps have >> the same value. >> --- >> src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- >> 1 file changed, 17 insertions(+), 13 deletions(-) >> >> diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py >> index 757ee20..8ec8dbe 100644 >> --- a/src/tests/intg/ldap_test.py >> +++ b/src/tests/intg/ldap_test.py >> @@ -33,7 +33,11 @@ import ldap_ent > >from util import * >> >> LDAP_BASE_DN = "dc=example,dc=com" >> -INTERACTIVE_TIMEOUT = 4 >> +INTERACTIVE_TIMEOUT = 2 >> + >> + >> +def wait_for_ldap_enum_refresh(): >> + time.sleep(INTERACTIVE_TIMEOUT + 4) > Why does it need to be INTERACTIVE_TIMEOUT + 4 > > Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >
Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD.
Adding timeout without real explanation is not a solution.
The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles.
The ideal solution would be to get rid of enumeration in ldap tests.
Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there.
If we want to test enumeration than it should be in separate test.
Maybe, but we do have enumeration tests and we should fix them.
Adding sleep is not a fix. It's just a workaround because all sleep timeout are just an empiric values. and we should fix test and not adding workaround/hacks.
If we cannot fix the test and don't want te rewrite test without enumeration then we should remove/revert problematic tests.
LS
I will send a new patch with an explanation (sort of), but it still will be a guess. I am not sure what the real safe value should be, only that the sleep's after operation should be longer than the ldap refresh and enum cache timeouts (and that the current values do not cope well wit the CI load).
Would it be more acceptable then to define the ldap refresh and enum cache timeouts as variables in the test and sleep for (enum_timeout + cache_timeout + 1) ?
At least that would be more readable than a magic constant..
Will do. All will be derived from INTERACTIVE_TIMEOUT so that we need to change just one value if needed in the future.
Lukas is right that I pulled the values out of place where no knowledge resides so let us make a compromise. I will push this patch to the CI a lot of times (let's say 40-50) over the weekend and see if it fails.
I am also not sure if lowering the INTERACTIVE_TIMEOUT was a good idea, I did it in order to make the execution a little faster and it is a timeout that needs to be shorter than the sleeps that wait for the ldap change to propagate to sysdb (and I did not want the sleeps to be too long). With the weekend "stress testing" we can hopefully avoid additional adjustments with new patches.
Michal _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
On (04/12/15 14:35), Michal Židek wrote:
On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
On (04/12/15 12:11), Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
On (03/12/15 20:22), Jakub Hrozek wrote: >On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: >>On (02/12/15 17:10), Michal Židek wrote: >>>Hi! >>> >>>I saw some integration tests failures recently, >>>and I think there is a race condition between the >>>enumeration refresh timeout and the sleeps >>>after some operations that wait for this timeout. >>>SSSD fails to populate changes from LDAP in time >>>and some asserts can fail because of this. >>> >>>So far I saw 4 tests to fail like this, which >>>is already quite a lot. >>> >>>The attached patch modifies the timeout values >>>and hopefully removes the issue. >>> >>>Michal >> >>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 >>>From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com >>>Date: Wed, 2 Dec 2015 16:44:48 +0100 >>>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts >>> >>>There is a race condation between ldap >>>enumeration refresh timeout and the sleeps >>>that wait for the ldap changes to populate >>>to SSSD if the timeout and the sleeps have >>>the same value. >>>--- >>>src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- >>>1 file changed, 17 insertions(+), 13 deletions(-) >>> >>>diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py >>>index 757ee20..8ec8dbe 100644 >>>--- a/src/tests/intg/ldap_test.py >>>+++ b/src/tests/intg/ldap_test.py >>>@@ -33,7 +33,11 @@ import ldap_ent >>>from util import * >>> >>>LDAP_BASE_DN = "dc=example,dc=com" >>>-INTERACTIVE_TIMEOUT = 4 >>>+INTERACTIVE_TIMEOUT = 2 >>>+ >>>+ >>>+def wait_for_ldap_enum_refresh(): >>>+ time.sleep(INTERACTIVE_TIMEOUT + 4) >>Why does it need to be INTERACTIVE_TIMEOUT + 4 >> >>Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >> > >Regardless of the value we choose, can we move this patch forward? I see >the related failure quite often in SSSD. Adding timeout without real explanation is not a solution.
The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles.
The ideal solution would be to get rid of enumeration in ldap tests.
Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there.
If we want to test enumeration than it should be in separate test.
Maybe, but we do have enumeration tests and we should fix them.
Adding sleep is not a fix. It's just a workaround because all sleep timeout are just an empiric values. and we should fix test and not adding workaround/hacks.
If we cannot fix the test and don't want te rewrite test without enumeration then we should remove/revert problematic tests.
LS
I will send a new patch with an explanation (sort of), but it still will be a guess. I am not sure what the real safe value should be, only that the sleep's after operation should be longer than the ldap refresh and enum cache timeouts (and that the current values do not cope well wit the CI load).
Would it be more acceptable then to define the ldap refresh and enum cache timeouts as variables in the test and sleep for (enum_timeout + cache_timeout + 1) ?
At least that would be more readable than a magic constant..
Will do. All will be derived from INTERACTIVE_TIMEOUT so that we need to change just one value if needed in the future.
Will it be reliable?
Will it work on slow(arm) machines?
I plan to run integration test in "%check" phase of rpm build. And koji/fedora has rpm machines.
I already have a POC patch.
LS
On 12/04/2015 03:07 PM, Lukas Slebodnik wrote:
On (04/12/15 14:35), Michal Židek wrote:
On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
On (04/12/15 12:11), Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: > On (03/12/15 20:22), Jakub Hrozek wrote: >> On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: >>> On (02/12/15 17:10), Michal Židek wrote: >>>> Hi! >>>> >>>> I saw some integration tests failures recently, >>>> and I think there is a race condition between the >>>> enumeration refresh timeout and the sleeps >>>> after some operations that wait for this timeout. >>>> SSSD fails to populate changes from LDAP in time >>>> and some asserts can fail because of this. >>>> >>>> So far I saw 4 tests to fail like this, which >>>> is already quite a lot. >>>> >>>> The attached patch modifies the timeout values >>>> and hopefully removes the issue. >>>> >>>> Michal >>> >>> >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 >>>> From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com >>>> Date: Wed, 2 Dec 2015 16:44:48 +0100 >>>> Subject: [PATCH] ldap_test.py: Modify enum cache timeouts >>>> >>>> There is a race condation between ldap >>>> enumeration refresh timeout and the sleeps >>>> that wait for the ldap changes to populate >>>> to SSSD if the timeout and the sleeps have >>>> the same value. >>>> --- >>>> src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- >>>> 1 file changed, 17 insertions(+), 13 deletions(-) >>>> >>>> diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py >>>> index 757ee20..8ec8dbe 100644 >>>> --- a/src/tests/intg/ldap_test.py >>>> +++ b/src/tests/intg/ldap_test.py >>>> @@ -33,7 +33,11 @@ import ldap_ent >>> >from util import * >>>> >>>> LDAP_BASE_DN = "dc=example,dc=com" >>>> -INTERACTIVE_TIMEOUT = 4 >>>> +INTERACTIVE_TIMEOUT = 2 >>>> + >>>> + >>>> +def wait_for_ldap_enum_refresh(): >>>> + time.sleep(INTERACTIVE_TIMEOUT + 4) >>> Why does it need to be INTERACTIVE_TIMEOUT + 4 >>> >>> Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >>> >> >> Regardless of the value we choose, can we move this patch forward? I see >> the related failure quite often in SSSD. > Adding timeout without real explanation is not a solution. > > The main problem is with empiric values. > If they are very high then test are slow. > And there still can be slow/fast machine where lower values caused troubles. > > The ideal solution would be to get rid of enumeration > in ldap tests.
Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there.
> If we want to test enumeration than it should be in separate > test.
Maybe, but we do have enumeration tests and we should fix them.
Adding sleep is not a fix. It's just a workaround because all sleep timeout are just an empiric values. and we should fix test and not adding workaround/hacks.
If we cannot fix the test and don't want te rewrite test without enumeration then we should remove/revert problematic tests.
LS
I will send a new patch with an explanation (sort of), but it still will be a guess. I am not sure what the real safe value should be, only that the sleep's after operation should be longer than the ldap refresh and enum cache timeouts (and that the current values do not cope well wit the CI load).
Would it be more acceptable then to define the ldap refresh and enum cache timeouts as variables in the test and sleep for (enum_timeout + cache_timeout + 1) ?
At least that would be more readable than a magic constant..
Will do. All will be derived from INTERACTIVE_TIMEOUT so that we need to change just one value if needed in the future.
Will it be reliable?
Will it work on slow(arm) machines?
I plan to run integration test in "%check" phase of rpm build. And koji/fedora has rpm machines.
We can mark tests that may fail on slow machines due to timeouts as "unsafe" and skip them in the rpm build. The simplest way to do it would be to use -k "not test_unsafe_" option in INTGCHECK_PYTEST_ARGS and the unsafe tests would have test_unsafe_ prefix. Would that work for you? It is something that we can start using immediately.
I already have a POC patch.
LS
On (04/12/15 16:02), Michal Židek wrote:
On 12/04/2015 03:07 PM, Lukas Slebodnik wrote:
On (04/12/15 14:35), Michal Židek wrote:
On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
On (04/12/15 12:11), Jakub Hrozek wrote: >On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: >>On (03/12/15 20:22), Jakub Hrozek wrote: >>>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: >>>>On (02/12/15 17:10), Michal Židek wrote: >>>>>Hi! >>>>> >>>>>I saw some integration tests failures recently, >>>>>and I think there is a race condition between the >>>>>enumeration refresh timeout and the sleeps >>>>>after some operations that wait for this timeout. >>>>>SSSD fails to populate changes from LDAP in time >>>>>and some asserts can fail because of this. >>>>> >>>>>So far I saw 4 tests to fail like this, which >>>>>is already quite a lot. >>>>> >>>>>The attached patch modifies the timeout values >>>>>and hopefully removes the issue. >>>>> >>>>>Michal >>>> >>>>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 >>>>>From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com >>>>>Date: Wed, 2 Dec 2015 16:44:48 +0100 >>>>>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts >>>>> >>>>>There is a race condation between ldap >>>>>enumeration refresh timeout and the sleeps >>>>>that wait for the ldap changes to populate >>>>>to SSSD if the timeout and the sleeps have >>>>>the same value. >>>>>--- >>>>>src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- >>>>>1 file changed, 17 insertions(+), 13 deletions(-) >>>>> >>>>>diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py >>>>>index 757ee20..8ec8dbe 100644 >>>>>--- a/src/tests/intg/ldap_test.py >>>>>+++ b/src/tests/intg/ldap_test.py >>>>>@@ -33,7 +33,11 @@ import ldap_ent >>>>>from util import * >>>>> >>>>>LDAP_BASE_DN = "dc=example,dc=com" >>>>>-INTERACTIVE_TIMEOUT = 4 >>>>>+INTERACTIVE_TIMEOUT = 2 >>>>>+ >>>>>+ >>>>>+def wait_for_ldap_enum_refresh(): >>>>>+ time.sleep(INTERACTIVE_TIMEOUT + 4) >>>>Why does it need to be INTERACTIVE_TIMEOUT + 4 >>>> >>>>Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >>>> >>> >>>Regardless of the value we choose, can we move this patch forward? I see >>>the related failure quite often in SSSD. >>Adding timeout without real explanation is not a solution. >> >>The main problem is with empiric values. >>If they are very high then test are slow. >>And there still can be slow/fast machine where lower values caused troubles. >> >>The ideal solution would be to get rid of enumeration >>in ldap tests. > >Enumeration is a codepath that is different from non-enumeration, so it >should be tested. Not as priority, not as the only ldap tests, but it's >a valid case, so it should be there. > >>If we want to test enumeration than it should be in separate >>test. > >Maybe, but we do have enumeration tests and we should fix them. > Adding sleep is not a fix. It's just a workaround because all sleep timeout are just an empiric values. and we should fix test and not adding workaround/hacks.
If we cannot fix the test and don't want te rewrite test without enumeration then we should remove/revert problematic tests.
LS
I will send a new patch with an explanation (sort of), but it still will be a guess. I am not sure what the real safe value should be, only that the sleep's after operation should be longer than the ldap refresh and enum cache timeouts (and that the current values do not cope well wit the CI load).
Would it be more acceptable then to define the ldap refresh and enum cache timeouts as variables in the test and sleep for (enum_timeout + cache_timeout + 1) ?
At least that would be more readable than a magic constant..
Will do. All will be derived from INTERACTIVE_TIMEOUT so that we need to change just one value if needed in the future.
Will it be reliable?
Will it work on slow(arm) machines?
I plan to run integration test in "%check" phase of rpm build. And koji/fedora has rpm machines.
We can mark tests that may fail on slow machines due to timeouts as "unsafe" and skip them in the rpm build.
It's not about rpmbuild, it's a general problem. And marking tests as unsafe does not solve anything. We need a reliable way how to find out that enumeratin task is finished.
Even grepping log files in loop is better than using sleep
LS
On (04/12/15 16:58), Lukas Slebodnik wrote:
On (04/12/15 16:02), Michal Židek wrote:
On 12/04/2015 03:07 PM, Lukas Slebodnik wrote:
On (04/12/15 14:35), Michal Židek wrote:
On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
On 12/04/2015 12:29 PM, Lukas Slebodnik wrote: >On (04/12/15 12:11), Jakub Hrozek wrote: >>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: >>>On (03/12/15 20:22), Jakub Hrozek wrote: >>>>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: >>>>>On (02/12/15 17:10), Michal Židek wrote: >>>>>>Hi! >>>>>> >>>>>>I saw some integration tests failures recently, >>>>>>and I think there is a race condition between the >>>>>>enumeration refresh timeout and the sleeps >>>>>>after some operations that wait for this timeout. >>>>>>SSSD fails to populate changes from LDAP in time >>>>>>and some asserts can fail because of this. >>>>>> >>>>>>So far I saw 4 tests to fail like this, which >>>>>>is already quite a lot. >>>>>> >>>>>>The attached patch modifies the timeout values >>>>>>and hopefully removes the issue. >>>>>> >>>>>>Michal >>>>> >>>>>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 >>>>>>From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com >>>>>>Date: Wed, 2 Dec 2015 16:44:48 +0100 >>>>>>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts >>>>>> >>>>>>There is a race condation between ldap >>>>>>enumeration refresh timeout and the sleeps >>>>>>that wait for the ldap changes to populate >>>>>>to SSSD if the timeout and the sleeps have >>>>>>the same value. >>>>>>--- >>>>>>src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- >>>>>>1 file changed, 17 insertions(+), 13 deletions(-) >>>>>> >>>>>>diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py >>>>>>index 757ee20..8ec8dbe 100644 >>>>>>--- a/src/tests/intg/ldap_test.py >>>>>>+++ b/src/tests/intg/ldap_test.py >>>>>>@@ -33,7 +33,11 @@ import ldap_ent >>>>>>from util import * >>>>>> >>>>>>LDAP_BASE_DN = "dc=example,dc=com" >>>>>>-INTERACTIVE_TIMEOUT = 4 >>>>>>+INTERACTIVE_TIMEOUT = 2 >>>>>>+ >>>>>>+ >>>>>>+def wait_for_ldap_enum_refresh(): >>>>>>+ time.sleep(INTERACTIVE_TIMEOUT + 4) >>>>>Why does it need to be INTERACTIVE_TIMEOUT + 4 >>>>> >>>>>Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >>>>> >>>> >>>>Regardless of the value we choose, can we move this patch forward? I see >>>>the related failure quite often in SSSD. >>>Adding timeout without real explanation is not a solution. >>> >>>The main problem is with empiric values. >>>If they are very high then test are slow. >>>And there still can be slow/fast machine where lower values caused troubles. >>> >>>The ideal solution would be to get rid of enumeration >>>in ldap tests. >> >>Enumeration is a codepath that is different from non-enumeration, so it >>should be tested. Not as priority, not as the only ldap tests, but it's >>a valid case, so it should be there. >> >>>If we want to test enumeration than it should be in separate >>>test. >> >>Maybe, but we do have enumeration tests and we should fix them. >> >Adding sleep is not a fix. It's just a workaround >because all sleep timeout are just an empiric values. >and we should fix test and not adding workaround/hacks. > >If we cannot fix the test and don't want te rewrite test without enumeration >then we should remove/revert problematic tests. > >LS
I will send a new patch with an explanation (sort of), but it still will be a guess. I am not sure what the real safe value should be, only that the sleep's after operation should be longer than the ldap refresh and enum cache timeouts (and that the current values do not cope well wit the CI load).
Would it be more acceptable then to define the ldap refresh and enum cache timeouts as variables in the test and sleep for (enum_timeout + cache_timeout + 1) ?
At least that would be more readable than a magic constant..
Will do. All will be derived from INTERACTIVE_TIMEOUT so that we need to change just one value if needed in the future.
Will it be reliable?
Will it work on slow(arm) machines?
I plan to run integration test in "%check" phase of rpm build. And koji/fedora has rpm machines.
We can mark tests that may fail on slow machines due to timeouts as "unsafe" and skip them in the rpm build.
It's not about rpmbuild, it's a general problem. And marking tests as unsafe does not solve anything. We need a reliable way how to find out that enumeratin task is finished.
Even grepping log files in loop is better than using sleep
I seems that nobody wants to rewrite test to get rid of enumeration. adding/increasing timeouts is not reliable. I have some bad experiences in downstream.
So the ideal solution might be to disable unstable tests. If there will not be better solution I will send patch in the middle of next week.
LS
On 12/04/2015 12:42 PM, Lukas Slebodnik wrote:
On (03/12/15 20:22), Jakub Hrozek wrote:
On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value.
src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent from util import *
LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2
+def wait_for_ldap_enum_refresh():
- time.sleep(INTERACTIVE_TIMEOUT + 4)
Why does it need to be INTERACTIVE_TIMEOUT + 4
Could it be INTERACTIVE_TIMEOUT + 3 or + 5
Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD.
Adding timeout without real explanation is not a solution.
The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles.
The ideal solution would be to get rid of enumeration in ldap tests. If we want to test enumeration than it should be in separate test. I'm not sure we would be able to get rid of "sleep()" in enumeration test but all such values should pre properly documented why it is big enough ...
Yeah, these timeouts are messy.
The problem is not the empiric timeout values themselves but rather the guesswork of when certain cache state changes are supposed to happen in relation to them. If we can reason about event deadlines then we can use them.
If not, and the code is too complicated, can we perhaps introduce some explicit synchronization with sssd caching mechanism, where sssd behavior will drive the tests? E.g. the tests would actually wait for sssd to do something with the cache and after sssd reports it is done, the tests will verify the time and the result?
We would still get to test that sssd does something we need in the expected timeframe, but we can make the tests faster - i.e. as fast as sssd can perform and be configured to.
Perhaps add something to the sss_cache tool?
Nick
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
I think I found alternative solution for intermittent failures of ADD REMOVE test with enumeration.
Occurence of failure was increased recently therefore please review attached patch.
I know that Jakub wanted/tried to rewrite test without enumeration. But I was mentioned in this thread that testing on enumeration is valid case.
LS
On 04/18/2016 10:39 AM, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
I think I found alternative solution for intermittent failures of ADD REMOVE test with enumeration.
Not sure if this is safer than my patch. I made the timeouts bigger on purpose, so that we avoid problems with machines under heavy load.
You decided to go the opposite direction by making one of the timeouts shorter.
That being said, maybe your patch is better, because if it does not fail even under heavy load it will make the tests shorter.
Once the CI is up we can try 20 test runs with your patch and if they do not fail I will give you an ACK.
If they fail, we can still fallback to my patch.
Occurence of failure was increased recently therefore please review attached patch.
I know that Jakub wanted/tried to rewrite test without enumeration. But I was mentioned in this thread that testing on enumeration is valid case.
We would have to test the enumeration sooner or later anyway.
LS
On (18/04/16 11:14), Michal Židek wrote:
On 04/18/2016 10:39 AM, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
I think I found alternative solution for intermittent failures of ADD REMOVE test with enumeration.
Not sure if this is safer than my patch. I made the timeouts bigger on purpose, so that we avoid problems with machines under heavy load.
You decided to go the opposite direction by making one of the timeouts shorter.
That being said, maybe your patch is better, because if it does not fail even under heavy load it will make the tests shorter.
Once the CI is up we can try 20 test runs with your patch and if they do not fail I will give you an ACK.
If they fail, we can still fallback to my patch.
No, We will need to find another solution. Increasing timeout is not acceptable from long term perspective. We need to have faster tests and not slower.
LS
On 04/18/2016 12:22 PM, Lukas Slebodnik wrote:
On (18/04/16 11:14), Michal Židek wrote:
On 04/18/2016 10:39 AM, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
I think I found alternative solution for intermittent failures of ADD REMOVE test with enumeration.
Not sure if this is safer than my patch. I made the timeouts bigger on purpose, so that we avoid problems with machines under heavy load.
You decided to go the opposite direction by making one of the timeouts shorter.
That being said, maybe your patch is better, because if it does not fail even under heavy load it will make the tests shorter.
Once the CI is up we can try 20 test runs with your patch and if they do not fail I will give you an ACK.
If they fail, we can still fallback to my patch.
No, We will need to find another solution. Increasing timeout is not acceptable from long term perspective. We need to have faster tests and not slower.
LS
Then lets hope your patch will work :D
On 04/18/2016 12:39 PM, Michal Židek wrote:
On 04/18/2016 12:22 PM, Lukas Slebodnik wrote:
On (18/04/16 11:14), Michal Židek wrote:
On 04/18/2016 10:39 AM, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
I think I found alternative solution for intermittent failures of ADD REMOVE test with enumeration.
Not sure if this is safer than my patch. I made the timeouts bigger on purpose, so that we avoid problems with machines under heavy load.
You decided to go the opposite direction by making one of the timeouts shorter.
That being said, maybe your patch is better, because if it does not fail even under heavy load it will make the tests shorter.
Once the CI is up we can try 20 test runs with your patch and if they do not fail I will give you an ACK.
If they fail, we can still fallback to my patch.
No, We will need to find another solution. Increasing timeout is not acceptable from long term perspective. We need to have faster tests and not slower.
LS
Then lets hope your patch will work :D
Your patch did not fix the issue. It still fails cca 1 out of 5 times because in the enumeration tests.
Maybe the timeout magic needs more mana or there is some other issue which we do not see.
Michal
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value.
src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
Houston, we have a problem.
You patch does nto work either :-(
http://sssd-ci.duckdns.org/logs/job/42/59/summary.html rhel6 make-intgcheck ldap_test.py:496: test_add_remove_group_rfc2307_bis FAILED
http://sssd-ci.duckdns.org/logs/job/42/60/summary.html rhel6 make-intgcheck ldap_test.py:466: test_add_remove_user FAILED
http://sssd-ci.duckdns.org/logs/job/42/61/summary.html rhel6 make-intgcheck ldap_test.py:481: test_add_remove_group_rfc2307 FAILED
LS
On 04/22/2016 09:04 AM, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
Hi!
I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this.
So far I saw 4 tests to fail like this, which is already quite a lot.
The attached patch modifies the timeout values and hopefully removes the issue.
Michal
From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= mzidek@redhat.com Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value.
src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
Houston, we have a problem.
You patch does nto work either :-(
Hmmm... looks like my magic numbers went out of mana.
http://sssd-ci.duckdns.org/logs/job/42/59/summary.html rhel6 make-intgcheck ldap_test.py:496: test_add_remove_group_rfc2307_bis FAILED
http://sssd-ci.duckdns.org/logs/job/42/60/summary.html rhel6 make-intgcheck ldap_test.py:466: test_add_remove_user FAILED
http://sssd-ci.duckdns.org/logs/job/42/61/summary.html rhel6 make-intgcheck ldap_test.py:481: test_add_remove_group_rfc2307 FAILED
LS
sssd-devel@lists.fedorahosted.org