On Wed, Apr 20, 2016 at 09:59:57AM +0200, Jakub Hrozek wrote:
On Tue, Apr 05, 2016 at 02:54:10PM -0400, Simo Sorce wrote:
> On Tue, 2016-04-05 at 12:57 -0400, Simo Sorce wrote:
> > Thanks, IIRC the int-instead of enum use is intentional, I will look
> > at the others.
>
> The last coverity/clang thing is a false positive, but I initialized
> reply to NULL anyway, I expect now it will start complaining of possible
> NULL dereference :-)
>
> Attached find patches that fixes all other issues (hopefully), one of
> them simply dropped an entire function as it turned out I wasn't using
> it.
>
> Simo.
>
> --
> Simo Sorce * Red Hat, Inc * New York
Ugh, it took me too long to get to this review properly. Sorry about
this..
Since this patchset is large, I will send replies in batches.
> From e215e5534bb56f3887521443ce6c77d13ea3518d Mon Sep 17 00:00:00 2001
> From: Simo Sorce <simo(a)redhat.com>
> Date: Tue, 12 Jan 2016 20:07:59 -0500
> Subject: [PATCH 01/15] Util: Add watchdog helper
ACK
I know Pavel Brezina also reviewed this patch earlier in a separate thread,
so note to self: Add his RB :-)
> From 0dff46755af6063ed4b0339020ae5bb686692de1 Mon Sep 17 00:00:00 2001
> From: Simo Sorce <simo(a)redhat.com>
> Date: Tue, 12 Jan 2016 20:13:28 -0500
> Subject: [PATCH 02/15] Server: Enable Watchdog in all daemons
>
> This allows the services to self monitor.
>
> Related:
>
https://fedorahosted.org/sssd/ticket/2921
Is it intentional that we also enable the watchdog in monitor? I haven't
seen the sssd process being stuck and if it does, we probably have
bigger issues, so it's probably fine, I just need to remember to not
SIGSTOP sssd when testing anymore :)
Otherwise ack.
Actually, more questions...
Can you help me test this patch? I tried to inject sleep() into sssd_be
code and the sleep was just interrupted by the SIGRT delivery. With SSSD,
most of the time the process was stuck was because it was writing a lot of
data with fsync()/fdatasync(). I can't find any information in the Linux
fsync manpage on how fsync behaves wrt signals. openpub manpages indicate
that fsync would return EINTR, which worries me a bit..
> From c576e37c188ded932996c9714b18a71251ef1d63 Mon Sep 17 00:00:00 2001
> From: Simo Sorce <simo(a)redhat.com>
> Date: Wed, 13 Jan 2016 11:51:09 -0500
> Subject: [PATCH 03/15] Monitor: Remove ping infrastructure
Can we also remove the force_timeout option and the functions that use
it? Looking at monitor.c, they are only called from a single failure
situation which is ENOMEM, so I think it's actually a dead code, the
probability we ever call it is very low.
If you agree with removing the force_timeout but you're busy, let me know
and I can prepare a patch atop yours.
The only code that proved to be useful in the field of all this we are
about to remove is the diag_cmd(). We could use it to run pstack on the
'stuck' child to learn where it was actually stuck. But I'm not sure
it's possible to implement it since the watchdog handler is a "plain"
signal handler and the diag_cmd() forks and execs...
After reading man 7 signal, it seems that fork() and exec() are OK to be
called in a signal handler, but the question above is more urgent IMO..
Also, is there a reason to just deprecate the ping() method and not
remove it right away?
>
> That said, I still think it's worth removing the pings in favor of the
> watchdog, the pings proved to be too fragile..
>
> Maybe it would be better to file a ticket to check again if systemd
> watchdog can be used in the future?
>
> To be continued..
> _______________________________________________
> sssd-devel mailing list
> sssd-devel(a)lists.fedorahosted.org
>
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org