On Wed 28 Nov 2012 09:35:10 AM EST, Jakub Hrozek wrote:
On Wed, Nov 28, 2012 at 09:31:29AM -0500, Ariel Barria wrote:
Date: Wed, 28 Nov 2012 15:10:17 +0100 From: jhrozek@redhat.com To: sssd-devel@lists.fedorahosted.org Subject: Re: [SSSD] [PATCH] when monitor_quit is call and the process not exists
On Thu, Nov 22, 2012 at 12:48:25PM -0500, Ariel Barria wrote:
Date: Thu, 22 Nov 2012 15:42:59 +0100 From: jhrozek@redhat.com To: sssd-devel@lists.fedorahosted.org Subject: Re: [SSSD] [PATCH] when monitor_quit is call and the process not exists
On Wed, Nov 21, 2012 at 03:15:43PM -0500, Ariel Barria wrote:
Hi. well, I'm not sure that this patch have the behavior that is it hope. You have the word. But when monitor_quit is call and the process not exists, this remains in loop and not allow signal. I try it with Monitor Task and nothing. I had to reboot
Thanks a lot, Ariel, this is quite a bad bug. The concept is good, but can you change the patch a little?
instead of: if (error != EINTR) { kill() if (error == ESRCH || error == ECHILD) { killed = true; } }
OK.
I would use: if (error == ECHILD) { /* Skip this process */ killed = true; } else if (error != EINTR) { /* Keep the block with the kill function around */ }
Also, are you sure about the ESRCH error code? The waitpid man page only mentions ECHILD..
well, really not is waipit. For example 1 (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0040): Returned with: 1 (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Terminating [pam][7494] (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [pam][7494]: [3] [No such process] ********* (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0010): [10][No child processes] while waiting for [pam] ****waitpid (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Terminating [ssh][7493] (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [ssh][7493]: [3] [No such process] (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0010): [10][No child processes] while waiting for [ssh] (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Terminating [nss][7492] (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [nss][7492]: [3] [No such process] (Thu Nov 22 10:17:44 2012) [sssd] [monitor_quit] (0x0010): [10][No child processes] while waiting for [nss] (Thu Nov 22 10:17:44 2012) [sssd] [sbus_remove_watch] (0x2000): 0x9ee5f0/0x9f1430
I think that if the process not exists, is not necessary waitpit. you is agree? Based on this, the result would be: Example 2 Returned with: 1 (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Terminating [nss][27350] (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [nss][27350]: [No such process] (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Terminating [ssh][27349] (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [ssh][27349]: [No such process] (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Terminating [pam][27348] (Thu Nov 22 12:42:37 2012) [sssd] [monitor_quit] (0x0020): Couldn't kill [pam][27348]: [No such process] (Thu Nov 22 12:42:37 2012) [sssd] [sbus_remove_watch] (0x2000): 0x237a080/0x237beb0
otherwise (only ECHILD), the result would be how the first example.
I don't think having that extra line really matters, it's just a DEBUG message (and only in the case when the configuration is wrong anyway).
What about the attached patch? It's the one you sent earlier with a modification. Would you agree with that version? Would it fix the problem for you?
Yes, this fix the problem :) Thanks for comments and your time.
Thank you for reporting the bug and the contribution!
I'd like to see a second ack on the patch as it was kind of pair programming, so I don't think I can ack on my own.
Ack.