https://fedorahosted.org/sssd/ticket/1460
Please see the commit. I'm wondering if there is still a (small) race condition between the call to pthread_cleanup_pop() and unlocking the mutex. Would it be better to i.e. always call the cleanup handler with pthread_cleanup_pop(1) and disconnect from the fd based on some other condition?
On Fri, Aug 10, 2012 at 06:40:35PM +0200, Jakub Hrozek wrote:
https://fedorahosted.org/sssd/ticket/1460
Please see the commit. I'm wondering if there is still a (small) race condition between the call to pthread_cleanup_pop() and unlocking the mutex. Would it be better to i.e. always call the cleanup handler with pthread_cleanup_pop(1) and disconnect from the fd based on some other condition?
this patch works as expected for me on F17. I have some issues on F16, but this might have other reason I still try to investigate. While reading a bit about pthread I can across robust mutexes http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_... I wonder if we can better avoid races with those and maybe even make the implementation cleaner? E.g. if we get EOWNERDEAD we can call sss_cli_close_socket(); pthread_mutex_consistent(); run the request which finally will call pthread_mutex_unlock().
There are also pthread_mutexattr_getrobust_np() and pthread_mutex_consistent_np() in glibc. I do not have an idea how they differ and which one should be preferred. But the *_np version only need __USE_GNU while the others need __USE_XOPEN2K8.
bye, Sumit
On Mon, Aug 13, 2012 at 01:30:21PM +0200, Sumit Bose wrote:
On Fri, Aug 10, 2012 at 06:40:35PM +0200, Jakub Hrozek wrote:
https://fedorahosted.org/sssd/ticket/1460
Please see the commit. I'm wondering if there is still a (small) race condition between the call to pthread_cleanup_pop() and unlocking the mutex. Would it be better to i.e. always call the cleanup handler with pthread_cleanup_pop(1) and disconnect from the fd based on some other condition?
this patch works as expected for me on F17. I have some issues on F16, but this might have other reason I still try to investigate. While reading a bit about pthread I can across robust mutexes http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_... I wonder if we can better avoid races with those and maybe even make the implementation cleaner? E.g. if we get EOWNERDEAD we can call sss_cli_close_socket(); pthread_mutex_consistent(); run the request which finally will call pthread_mutex_unlock().
As discussed briefly on IRC, I agree. I like any approach that removes the pthread_cleanup_push/pop hack.
There are also pthread_mutexattr_getrobust_np() and pthread_mutex_consistent_np() in glibc. I do not have an idea how they differ and which one should be preferred. But the *_np version only need __USE_GNU while the others need __USE_XOPEN2K8.
bye, Sumit
RHEL5 only contains the _np versions, but they still seem to work. If the approach is deemed OK, I'll detect the available versions during configure.
Traditionally the _np suffix means "non-portable GNU extension".
Please see the attached patch. Would that work for you? If so, I'll extend it to work on all supported RHEL releases.
On Mon, Aug 13, 2012 at 04:40:03PM +0200, Jakub Hrozek wrote:
On Mon, Aug 13, 2012 at 01:30:21PM +0200, Sumit Bose wrote:
On Fri, Aug 10, 2012 at 06:40:35PM +0200, Jakub Hrozek wrote:
https://fedorahosted.org/sssd/ticket/1460
Please see the commit. I'm wondering if there is still a (small) race condition between the call to pthread_cleanup_pop() and unlocking the mutex. Would it be better to i.e. always call the cleanup handler with pthread_cleanup_pop(1) and disconnect from the fd based on some other condition?
this patch works as expected for me on F17. I have some issues on F16, but this might have other reason I still try to investigate. While reading a bit about pthread I can across robust mutexes http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_... I wonder if we can better avoid races with those and maybe even make the implementation cleaner? E.g. if we get EOWNERDEAD we can call sss_cli_close_socket(); pthread_mutex_consistent(); run the request which finally will call pthread_mutex_unlock().
As discussed briefly on IRC, I agree. I like any approach that removes the pthread_cleanup_push/pop hack.
There are also pthread_mutexattr_getrobust_np() and pthread_mutex_consistent_np() in glibc. I do not have an idea how they differ and which one should be preferred. But the *_np version only need __USE_GNU while the others need __USE_XOPEN2K8.
bye, Sumit
RHEL5 only contains the _np versions, but they still seem to work. If the approach is deemed OK, I'll detect the available versions during configure.
Traditionally the _np suffix means "non-portable GNU extension".
Please see the attached patch. Would that work for you? If so, I'll extend it to work on all supported RHEL releases.
Sumit nacked the patch on IRC. I was missing the disconnect logic when lock returns EOWNERDEAD. I also moved the mutexattr to sss_mt_init, I don't think there is any reason to keep it separate. It doesn't have to live as long as the mutex itself.
On Mon, Aug 13, 2012 at 06:13:24PM +0200, Jakub Hrozek wrote:
On Mon, Aug 13, 2012 at 04:40:03PM +0200, Jakub Hrozek wrote:
On Mon, Aug 13, 2012 at 01:30:21PM +0200, Sumit Bose wrote:
On Fri, Aug 10, 2012 at 06:40:35PM +0200, Jakub Hrozek wrote:
https://fedorahosted.org/sssd/ticket/1460
Please see the commit. I'm wondering if there is still a (small) race condition between the call to pthread_cleanup_pop() and unlocking the mutex. Would it be better to i.e. always call the cleanup handler with pthread_cleanup_pop(1) and disconnect from the fd based on some other condition?
this patch works as expected for me on F17. I have some issues on F16, but this might have other reason I still try to investigate. While reading a bit about pthread I can across robust mutexes http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_... I wonder if we can better avoid races with those and maybe even make the implementation cleaner? E.g. if we get EOWNERDEAD we can call sss_cli_close_socket(); pthread_mutex_consistent(); run the request which finally will call pthread_mutex_unlock().
As discussed briefly on IRC, I agree. I like any approach that removes the pthread_cleanup_push/pop hack.
There are also pthread_mutexattr_getrobust_np() and pthread_mutex_consistent_np() in glibc. I do not have an idea how they differ and which one should be preferred. But the *_np version only need __USE_GNU while the others need __USE_XOPEN2K8.
bye, Sumit
RHEL5 only contains the _np versions, but they still seem to work. If the approach is deemed OK, I'll detect the available versions during configure.
Traditionally the _np suffix means "non-portable GNU extension".
Please see the attached patch. Would that work for you? If so, I'll extend it to work on all supported RHEL releases.
Sumit nacked the patch on IRC. I was missing the disconnect logic when lock returns EOWNERDEAD. I also moved the mutexattr to sss_mt_init, I don't think there is any reason to keep it separate. It doesn't have to live as long as the mutex itself.
This patch works as expected on F17, so ACK.
I still have some strange behaviour on F16 and strangely in the case where the fast cache files in the /var/lib/sss/mc directory are available. Here the reproducer get stuck in a futex() call with or without the patch. I looks that glibc itself also uses a mutex here which is neither robust nor unlocked if the tread holding it is killed. I would be nice if someone with a F16 system can verify this.
bye, Sumit
On Wed, Aug 15, 2012 at 02:07:12PM +0200, Sumit Bose wrote:
On Mon, Aug 13, 2012 at 06:13:24PM +0200, Jakub Hrozek wrote:
On Mon, Aug 13, 2012 at 04:40:03PM +0200, Jakub Hrozek wrote:
On Mon, Aug 13, 2012 at 01:30:21PM +0200, Sumit Bose wrote:
On Fri, Aug 10, 2012 at 06:40:35PM +0200, Jakub Hrozek wrote:
https://fedorahosted.org/sssd/ticket/1460
Please see the commit. I'm wondering if there is still a (small) race condition between the call to pthread_cleanup_pop() and unlocking the mutex. Would it be better to i.e. always call the cleanup handler with pthread_cleanup_pop(1) and disconnect from the fd based on some other condition?
this patch works as expected for me on F17. I have some issues on F16, but this might have other reason I still try to investigate. While reading a bit about pthread I can across robust mutexes http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_... I wonder if we can better avoid races with those and maybe even make the implementation cleaner? E.g. if we get EOWNERDEAD we can call sss_cli_close_socket(); pthread_mutex_consistent(); run the request which finally will call pthread_mutex_unlock().
As discussed briefly on IRC, I agree. I like any approach that removes the pthread_cleanup_push/pop hack.
There are also pthread_mutexattr_getrobust_np() and pthread_mutex_consistent_np() in glibc. I do not have an idea how they differ and which one should be preferred. But the *_np version only need __USE_GNU while the others need __USE_XOPEN2K8.
bye, Sumit
RHEL5 only contains the _np versions, but they still seem to work. If the approach is deemed OK, I'll detect the available versions during configure.
Traditionally the _np suffix means "non-portable GNU extension".
Please see the attached patch. Would that work for you? If so, I'll extend it to work on all supported RHEL releases.
Sumit nacked the patch on IRC. I was missing the disconnect logic when lock returns EOWNERDEAD. I also moved the mutexattr to sss_mt_init, I don't think there is any reason to keep it separate. It doesn't have to live as long as the mutex itself.
This patch works as expected on F17, so ACK.
I still have some strange behaviour on F16 and strangely in the case where the fast cache files in the /var/lib/sss/mc directory are available. Here the reproducer get stuck in a futex() call with or without the patch. I looks that glibc itself also uses a mutex here which is neither robust nor unlocked if the tread holding it is killed. I would be nice if someone with a F16 system can verify this.
I forgot, just as a reminder, some code must be added to allow building on RHEL5.
bye, Sumit
On Thu, Aug 16, 2012 at 11:44:35AM +0200, Sumit Bose wrote:
On Wed, Aug 15, 2012 at 02:07:12PM +0200, Sumit Bose wrote:
On Mon, Aug 13, 2012 at 06:13:24PM +0200, Jakub Hrozek wrote:
On Mon, Aug 13, 2012 at 04:40:03PM +0200, Jakub Hrozek wrote:
On Mon, Aug 13, 2012 at 01:30:21PM +0200, Sumit Bose wrote:
On Fri, Aug 10, 2012 at 06:40:35PM +0200, Jakub Hrozek wrote:
https://fedorahosted.org/sssd/ticket/1460
Please see the commit. I'm wondering if there is still a (small) race condition between the call to pthread_cleanup_pop() and unlocking the mutex. Would it be better to i.e. always call the cleanup handler with pthread_cleanup_pop(1) and disconnect from the fd based on some other condition?
this patch works as expected for me on F17. I have some issues on F16, but this might have other reason I still try to investigate. While reading a bit about pthread I can across robust mutexes http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_... I wonder if we can better avoid races with those and maybe even make the implementation cleaner? E.g. if we get EOWNERDEAD we can call sss_cli_close_socket(); pthread_mutex_consistent(); run the request which finally will call pthread_mutex_unlock().
As discussed briefly on IRC, I agree. I like any approach that removes the pthread_cleanup_push/pop hack.
There are also pthread_mutexattr_getrobust_np() and pthread_mutex_consistent_np() in glibc. I do not have an idea how they differ and which one should be preferred. But the *_np version only need __USE_GNU while the others need __USE_XOPEN2K8.
bye, Sumit
RHEL5 only contains the _np versions, but they still seem to work. If the approach is deemed OK, I'll detect the available versions during configure.
Traditionally the _np suffix means "non-portable GNU extension".
Please see the attached patch. Would that work for you? If so, I'll extend it to work on all supported RHEL releases.
Sumit nacked the patch on IRC. I was missing the disconnect logic when lock returns EOWNERDEAD. I also moved the mutexattr to sss_mt_init, I don't think there is any reason to keep it separate. It doesn't have to live as long as the mutex itself.
This patch works as expected on F17, so ACK.
I still have some strange behaviour on F16 and strangely in the case where the fast cache files in the /var/lib/sss/mc directory are available. Here the reproducer get stuck in a futex() call with or without the patch. I looks that glibc itself also uses a mutex here which is neither robust nor unlocked if the tread holding it is killed. I would be nice if someone with a F16 system can verify this.
I forgot, just as a reminder, some code must be added to allow building on RHEL5.
bye, Sumit
Attached is a patch that uses the _np variants on RHEL5. I've tested it on F-17, RHEL-5 (where I had to create the SELinux login directory b/c of https://fedorahosted.org/sssd/ticket/1492) and RHEL-6. All 64bit.
I haven't tested F-16 yet, but I'll spin up a VM.
sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
On Wed, Aug 22, 2012 at 07:00:02PM +0200, Jakub Hrozek wrote:
On Tue, Aug 21, 2012 at 12:02:43PM +0200, Jakub Hrozek wrote:
I haven't tested F-16 yet, but I'll spin up a VM.
The patch worked fine for me on a 64bit F-16 machine..feel free to ping me off-list for access details.
Simo requested two variables to be renamed.
A new patch is attached.
On Thu, Aug 23, 2012 at 01:05:13PM +0200, Jakub Hrozek wrote:
On Wed, Aug 22, 2012 at 07:00:02PM +0200, Jakub Hrozek wrote:
On Tue, Aug 21, 2012 at 12:02:43PM +0200, Jakub Hrozek wrote:
I haven't tested F-16 yet, but I'll spin up a VM.
The patch worked fine for me on a 64bit F-16 machine..feel free to ping me off-list for access details.
Simo requested two variables to be renamed.
A new patch is attached.
ACK
bye, Sumit
On Mon, Aug 27, 2012 at 09:33:35AM +0200, Sumit Bose wrote:
On Thu, Aug 23, 2012 at 01:05:13PM +0200, Jakub Hrozek wrote:
On Wed, Aug 22, 2012 at 07:00:02PM +0200, Jakub Hrozek wrote:
On Tue, Aug 21, 2012 at 12:02:43PM +0200, Jakub Hrozek wrote:
I haven't tested F-16 yet, but I'll spin up a VM.
The patch worked fine for me on a 64bit F-16 machine..feel free to ping me off-list for access details.
Simo requested two variables to be renamed.
A new patch is attached.
ACK
bye, Sumit
Pushed to master.
On Mon, Aug 27, 2012 at 04:34:52PM +0200, Jakub Hrozek wrote:
On Mon, Aug 27, 2012 at 09:33:35AM +0200, Sumit Bose wrote:
On Thu, Aug 23, 2012 at 01:05:13PM +0200, Jakub Hrozek wrote:
On Wed, Aug 22, 2012 at 07:00:02PM +0200, Jakub Hrozek wrote:
On Tue, Aug 21, 2012 at 12:02:43PM +0200, Jakub Hrozek wrote:
I haven't tested F-16 yet, but I'll spin up a VM.
The patch worked fine for me on a 64bit F-16 machine..feel free to ping me off-list for access details.
Simo requested two variables to be renamed.
A new patch is attached.
ACK
bye, Sumit
Pushed to master.
Also pushed to sssd-1-8
sssd-devel@lists.fedorahosted.org