On Mon, Aug 13, 2012 at 04:40:03PM +0200, Jakub Hrozek wrote:
On Mon, Aug 13, 2012 at 01:30:21PM +0200, Sumit Bose wrote:
On Fri, Aug 10, 2012 at 06:40:35PM +0200, Jakub Hrozek wrote:
https://fedorahosted.org/sssd/ticket/1460
Please see the commit. I'm wondering if there is still a (small) race condition between the call to pthread_cleanup_pop() and unlocking the mutex. Would it be better to i.e. always call the cleanup handler with pthread_cleanup_pop(1) and disconnect from the fd based on some other condition?
this patch works as expected for me on F17. I have some issues on F16, but this might have other reason I still try to investigate. While reading a bit about pthread I can across robust mutexes http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_... I wonder if we can better avoid races with those and maybe even make the implementation cleaner? E.g. if we get EOWNERDEAD we can call sss_cli_close_socket(); pthread_mutex_consistent(); run the request which finally will call pthread_mutex_unlock().
As discussed briefly on IRC, I agree. I like any approach that removes the pthread_cleanup_push/pop hack.
There are also pthread_mutexattr_getrobust_np() and pthread_mutex_consistent_np() in glibc. I do not have an idea how they differ and which one should be preferred. But the *_np version only need __USE_GNU while the others need __USE_XOPEN2K8.
bye, Sumit
RHEL5 only contains the _np versions, but they still seem to work. If the approach is deemed OK, I'll detect the available versions during configure.
Traditionally the _np suffix means "non-portable GNU extension".
Please see the attached patch. Would that work for you? If so, I'll extend it to work on all supported RHEL releases.
Sumit nacked the patch on IRC. I was missing the disconnect logic when lock returns EOWNERDEAD. I also moved the mutexattr to sss_mt_init, I don't think there is any reason to keep it separate. It doesn't have to live as long as the mutex itself.