Date: Wed, 28 Nov 2012 15:10:17 +0100
From: jhrozek@redhat.com
To: sssd-devel@lists.fedorahosted.org
Subject: Re: [SSSD] [PATCH] when monitor_quit is call and the process not exists

On Thu, Nov 22, 2012 at 12:48:25PM -0500, Ariel Barria wrote:
>
>
>
> > Date: Thu, 22 Nov 2012 15:42:59 +0100
> > From: jhrozek@redhat.com
> > To: sssd-devel@lists.fedorahosted.org
> > Subject: Re: [SSSD] [PATCH] when monitor_quit is call and the process not exists
> >
> > On Wed, Nov 21, 2012 at 03:15:43PM -0500, Ariel Barria wrote:
> > >
> > > Hi.
> > > well, I'm not sure that this patch have the behavior that is it hope. You have the word.
> > > But when monitor_quit is call and the process not exists, this remains in loop and not allow signal.
> > > I try it with Monitor Task and nothing.
> > > I had to reboot
> >
> > Thanks a lot, Ariel, this is quite a bad bug. The concept is good, but
> > can you change the patch a little?
> >
> > instead of:
> > if (error != EINTR) {
> > kill()
> > if (error == ESRCH || error == ECHILD) {
> > killed = true;
> > }
> > }
>
>
> OK.
>
> > I would use:
> > if (error == ECHILD) {
> > /* Skip this process */
> > killed = true;
> > } else if (error != EINTR) {
> > /* Keep the block with the kill function around */
> > }
> >
>
>
> > Also, are you sure about the ESRCH error code? The waitpid man page only
> > mentions ECHILD..
>
>
> well, really not is waipit.
> For example 1
> (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0040): Returned with: 1
> (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Terminating [pam][7494]
> (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [pam][7494]: [3] [No such process] *********
> (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0010): [10][No child processes] while waiting for [pam] ****waitpid
> (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Terminating [ssh][7493]
> (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [ssh][7493]: [3] [No such process]
> (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0010): [10][No child processes] while waiting for [ssh]
> (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Terminating [nss][7492]
> (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [nss][7492]: [3] [No such process]
> (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0010): [10][No child processes] while waiting for [nss]
> (Thu Nov 22 10:17:44 2012) [sssd] [sbus_remove_watch] (0x2000): 0x9ee5f0/0x9f1430
>
> I think that if the process not exists, is not necessary waitpit. you is agree?
> Based on this, the result would be:
> Example 2
> Returned with: 1
> (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Terminating [nss][27350]
> (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [nss][27350]: [No such process]
> (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Terminating [ssh][27349]
> (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [ssh][27349]: [No such process]
> (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Terminating [pam][27348]
> (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [pam][27348]: [No such process]
> (Thu Nov 22 12:42:37 2012) [sssd] [sbus_remove_watch] (0x2000): 0x237a080/0x237beb0
>
>
> otherwise (only ECHILD), the result would be how the first example.
>

I don't think having that extra line really matters, it's just a DEBUG
message (and only in the case when the configuration is wrong anyway).

What about the attached patch? It's the one you sent earlier with a
modification. Would you agree with that version? Would it fix the
problem for you?

Yes, this fix the problem :)
Thanks for comments and your time.

_______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel