On Wed, Aug 15, 2012 at 02:07:12PM +0200, Sumit Bose wrote:
> On Mon, Aug 13, 2012 at 06:13:24PM +0200, Jakub Hrozek wrote:
> > On Mon, Aug 13, 2012 at 04:40:03PM +0200, Jakub Hrozek wrote:
> > > On Mon, Aug 13, 2012 at 01:30:21PM +0200, Sumit Bose wrote:
> > > > On Fri, Aug 10, 2012 at 06:40:35PM +0200, Jakub Hrozek wrote:
> > > > >
https://fedorahosted.org/sssd/ticket/1460
> > > > >
> > > > > Please see the commit. I'm wondering if there is still a
(small) race
> > > > > condition between the call to pthread_cleanup_pop() and
unlocking the
> > > > > mutex. Would it be better to i.e. always call the cleanup
handler with
> > > > > pthread_cleanup_pop(1) and disconnect from the fd based on some
other
> > > > > condition?
> > > >
> > > > this patch works as expected for me on F17. I have some issues on
F16,
> > > > but this might have other reason I still try to investigate. While
> > > > reading a bit about pthread I can across robust mutexes
> > > >
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexat...
> > > > I wonder if we can better avoid races with those and maybe even make
the
> > > > implementation cleaner? E.g. if we get EOWNERDEAD we can call
> > > > sss_cli_close_socket(); pthread_mutex_consistent(); run the request
> > > > which finally will call pthread_mutex_unlock().
> > > >
> > >
> > > As discussed briefly on IRC, I agree. I like any approach that removes
> > > the pthread_cleanup_push/pop hack.
> > >
> > > > There are also pthread_mutexattr_getrobust_np() and
> > > > pthread_mutex_consistent_np() in glibc. I do not have an idea how
they
> > > > differ and which one should be preferred. But the *_np version only
need
> > > > __USE_GNU while the others need __USE_XOPEN2K8.
> > > >
> > > > bye,
> > > > Sumit
> > >
> > > RHEL5 only contains the _np versions, but they still seem to work. If
> > > the approach is deemed OK, I'll detect the available versions during
> > > configure.
> > >
> > > Traditionally the _np suffix means "non-portable GNU
extension".
> > >
> > > Please see the attached patch. Would that work for you? If so, I'll
> > > extend it to work on all supported RHEL releases.
> >
> > Sumit nacked the patch on IRC. I was missing the disconnect logic when
> > lock returns EOWNERDEAD. I also moved the mutexattr to sss_mt_init, I
> > don't think there is any reason to keep it separate. It doesn't have
to
> > live as long as the mutex itself.
>
> This patch works as expected on F17, so ACK.
>
> I still have some strange behaviour on F16 and strangely in the case
> where the fast cache files in the /var/lib/sss/mc directory are
> available. Here the reproducer get stuck in a futex() call with or
> without the patch. I looks that glibc itself also uses a mutex here
> which is neither robust nor unlocked if the tread holding it is killed.
> I would be nice if someone with a F16 system can verify this.
>
I forgot, just as a reminder, some code must be added to allow building
on RHEL5.
bye,
Sumit
Attached is a patch that uses the _np variants on RHEL5. I've tested it
on F-17, RHEL-5 (where I had to create the SELinux login directory b/c
of
) and RHEL-6. All 64bit.
I haven't tested F-16 yet, but I'll spin up a VM.
_______________________________________________
sssd-devel mailing list
sssd-devel(a)lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel