Zombie processes

Thu Oct 7 04:15:14 UTC 2004

On Sun, Oct 03, 2004 at 10:13:08PM -0400, Thomas E. Dukes wrote:

> 
> I have several zombie processes but don't know how to determine who they
> belong to.  I have 3 each of [c++filt] and [addr2line].  They won't kill.
> 

As others have indicated, you cannot kill them because they
are already dead.

A zombie process is a process that has finished but whose
parent process has not yet gathered up its exit status.
It stays in the process table only so that it can
deliver that final exit status to the process that
started it.
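
To watch one come and go, here is a minimal sketch, assuming a
Linux-style ps that accepts the "stat" output keyword.  The inner
shell starts a short child and then exec's into a longer sleep, so
the child's parent never calls wait().  The timings are arbitrary.

```shell
#!/bin/sh
# Start a short child, then replace the shell with a longer sleep.
# That sleep is now the child's parent and never calls wait(), so the
# finished child remains a zombie until the parent itself exits.
sh -c 'sleep 1 & exec sleep 3' &
parent=$!

sleep 2   # by now the 1-second child has exited but is unreaped

# List children of $parent whose state starts with Z (zombie).
zombies=$(ps -A -o ppid= -o stat= -o comm= |
          awk -v p="$parent" '$1 == p && $2 ~ /^Z/')
echo "zombie children of $parent:"
echo "$zombies"

wait "$parent"   # once the parent exits, init inherits and reaps the zombie
```

Killing the zombie itself does nothing; it disappears when its parent
reaps it or when the parent goes away.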

It is valuable to understand that processes have parents and children.
Look for the process ID (PID) and parent process ID (PPID) columns in a
"ps -elf" listing.  The PPID of a zombie tells you which process it
belongs to.  The listing will show you relationships something like this:
   $ ps -efl | edit4clarityIhope
   F S UID   PID   PPID  C  PRI NI ADDR SZ WCHAN  STIME TTY  TIME      CMD
   1 S htt   3665     1  0  78   0 -    593 -     Oct01 ?    00:00:00 /usr/sbin/htt
   0 S htt   3666  3665  0  78   0 -    980 -     Oct01 ?    00:00:00 htt_server -nodaemon

The tool "pstree" will let you see an indented tree of such stuff.
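
To answer the "who do they belong to" question directly, something
like the following sketch may help.  The function name list_zombies
is made up for this example, and the "stat" keyword is an assumption
(procps and the BSDs accept it, but it is not in POSIX).

```shell
#!/bin/sh
# list_zombies (hypothetical helper): print PID, PPID, state and command
# name for every process whose state starts with Z.
list_zombies() {
    # The "=" after each keyword suppresses the header line.
    ps -A -o pid= -o ppid= -o stat= -o comm= | awk '$3 ~ /^Z/'
}

# For each zombie, show the command line of the parent that should reap it.
list_zombies | while read -r pid ppid stat comm; do
    printf 'zombie %s (%s) belongs to parent %s: ' "$pid" "$comm" "$ppid"
    ps -o args= -p "$ppid" || echo '(parent not found)'
done
```

The parent is the one to fix or restart; the zombie itself cannot be
signalled into going away.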

This is useful to programmers in any language: shell scripts, C
programs, Fortran, Ruby, Perl, awk, emacs, ...

See the man pages for wait(2), _exit(2) and exit(3).

A common cause of these is a shell script that 
starts a background process.  Something like this:

       sleep 10 &
       lpr file &

When the subprocess is finished it will return a number which signals
success, failure or more.  In shell we might test the variable ($?).
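
For a background job the status has to be collected with the shell's
"wait" builtin before $? means anything.  A minimal sketch, where the
"sh -c" command is only a stand-in for real work and the status 3 is
arbitrary:

```shell
#!/bin/sh
# Stand-in background job that takes a moment and exits with status 3.
sh -c 'sleep 1; exit 3' &
pid=$!            # $! is the PID of the most recent background job

# ... other work could happen here ...

status=0
wait "$pid" || status=$?   # wait blocks until the job ends and returns its status
echo "background job $pid exited with status $status"
```

Once wait has returned, the child has been reaped and cannot linger
as a zombie.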

For example grep:

    "Normally, exit status is 0 if selected lines are found and 1
    otherwise.  But the exit status is 2 if an error occurred, unless
    the -q or --quiet or --silent option is used and a selected line
    is found."

This exit status permits testing for success or failure and then doing
different things.  Perhaps process a file if and only if it contains
a specific string found by grep.
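
A sketch of that kind of test, using the three grep statuses quoted
above.  The helper name check_for and the sample file contents are
invented for the example.

```shell
#!/bin/sh
# check_for PATTERN FILE (hypothetical helper): report what grep's
# exit status means.
check_for() {
    status=0
    grep -q -- "$1" "$2" || status=$?
    case $status in
        0) echo "found" ;;
        1) echo "not found" ;;
        *) echo "error" ;;
    esac
}

# Sample input, for illustration only.
file=$(mktemp)
printf 'alpha\nbeta\n' > "$file"

# Process the file if and only if it contains the string.
if [ "$(check_for beta "$file")" = "found" ]; then
    echo "processing $file"
fi
```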

One of the things that init will do is issue a wait() system call for
processes that it inherits so they can pass on to the great bit bucket
in the sky.  Commonly init will tidy up after "sloppy" programmers
and they will never see their errors ;-) ;-)

Not all zombies are the result of "sloppy" coding.  It may be that
the tool/program will get to this bit of house cleaning later,
i.e. they can be fine in small numbers.

Also, an interactive shell wait()s for its children as a matter of
course (for example when it runs ps for you), which makes it sort of
difficult for you to see your own zombies.  Because of the way
interactive shells work you might miss this issue in testing, only to
see it later when the script is run by cron, inittab, or whatever.
It is valuable to watch for such cruft from a user account that is
not "you".  Any loop that does something "&" is a potential risk.
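
One way to keep such a loop tidy is to remember each PID and wait
for it afterwards.  A minimal sketch, where the "sh -c" jobs and
their exit codes are stand-ins for real work:

```shell
#!/bin/sh
# Start three stand-in background jobs that exit with status 0, 1 and 2.
pids=""
for n in 1 2 3; do
    sh -c "sleep 1; exit $((n - 1))" &
    pids="$pids $!"          # remember every PID we started
done

# Reap every job and count the ones that reported failure.
failures=0
for pid in $pids; do
    status=0
    wait "$pid" || status=$?
    [ "$status" -eq 0 ] || failures=$((failures + 1))
    echo "job $pid exited with status $status"
done
echo "$failures of 3 jobs failed"
```

Every job is reaped by the script itself, so nothing is left behind
for init to clean up, and failures are not silently lost.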

The _exit(2) man page:

       The function _exit terminates the calling process
       "immediately". Any open file descriptors belonging to the
       process are closed; any children of the process are inherited
       by process 1, init, and the process's parent is sent a
       SIGCHLD signal.

       The value status is returned to the parent process as the
       process’s exit status, and can be collected using one of the
       wait family of calls.

Summary:
It is good to track and check the exit status of your subtasks.

-- 
	T o m  M i t c h e l l 
	Me, I would "Rather" Not.