Adam Litke has posted comments on this change.
Change subject: [WIP]change startup process for vdsm and supervdsm
......................................................................
Patch Set 4: Fails; I would prefer that you didn't submit this
(15 inline comments)
I performed the following tests with your patch:
* Normal service start, stop, restart: OK
* Kill vdsm, what happens? FAIL
vdsm process shuts down but remains in defunct state. Nothing is respawned. The
pokeParent function is unable to identify a problem with the vdsm process since 'kill
-0' still returns 0. See the end of this comment for an idea to fix this.
* Kill vdsm, then restart service: OK.
* Kill supervdsm, what happens? FAIL.
Traceback (most recent call last):
File "/usr/share/vdsm/vdsm", line 98, in startVdsm
serve_clients(log)
File "/usr/share/vdsm/vdsm", line 46, in serve_clients
cif = clientIF.clientIF(log)
File "/usr/share/vdsm/clientIF.py", line 96, in __init__
self._prepareBindings()
File "/usr/share/vdsm/clientIF.py", line 134, in _prepareBindings
self._loadBindingXMLRPC()
File "/usr/share/vdsm/clientIF.py", line 120, in _loadBindingXMLRPC
default_bridge)
File "/usr/share/vdsm/BindingXMLRPC.py", line 48, in __init__
self.server = self._createXMLRPCServer()
File "/usr/share/vdsm/BindingXMLRPC.py", line 137, in _createXMLRPCServer
requestHandler=LoggingHandler, logRequests=True)
File "/usr/lib64/python2.7/SimpleXMLRPCServer.py", line 590, in __init__
SocketServer.TCPServer.__init__(self, addr, requestHandler, bind_and_activate)
File "/usr/lib64/python2.7/SocketServer.py", line 408, in __init__
self.server_bind()
File "/usr/lib64/python2.7/SocketServer.py", line 419, in server_bind
self.socket.bind(self.server_address)
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use
I believe this happens because the vdsm child was still running and when supervdsm was
respawned it tried to start a new child.
* Kill supervdsm, then restart service: OK.
Would this be a better startup sequence...
1) As root, create a new process for supervdsmServer
2) Drop privileges in the parent
3) Start vdsm in the parent
Then, if vdsm detects a problem with supervdsm, it can just exit. This will cause respawn
to restart everything again. So if either process (vdsm, or supervdsm) dies, the whole
thing will be respawned. Maybe there are problems with my idea too. What do you think?
....................................................
File vdsm/supervdsm.py
Line 52: except:
This is a bad condition. I think there should be some logging of the exception that is
causing vdsm to shut down. Also, are you sure you want to do this for all types of
exceptions? I think it's ok but we should double-check.
Line 53: os.unlink(ADDRESS)
I don't understand this. Don't we want supervdsmServer to clean up it's own
socket file when it shuts itself down in response to this failure.
Am I missing something?
Line 54: os.kill(os.getpid(), signal.SIGTERM)
Is this all we need to do to properly shut down vdsm (a question for smarter reviewers
than I)? vdsm does have an API called prepareForShutdown() that performs a lot more work
to gracefully shut down the daemon. Maybe we need to look at using it instead.
Line 68: os.unlink(ADDRESS)
Again, I am dubious about removing the server's socket file from here.
Line 69: os.kill(os.getpid(), signal.SIGTERM)
prepareForShutdown again?
....................................................
File vdsm/supervdsmServer.py
Line 267
This function no longer operates on a parent. Let's rename it to 'pokeVdsm'
and change the arg to vdsmPid to make it more clear.
Line 300: #kill old supervdsm
Add a space between # and kill.
....................................................
File vdsm/vdsm
Line 80: return
Instead of quietly returning here I would prefer that you raise an exception. If the user
doesn't start vdsm as root we want to catch it right away and provide a clear error
message.
I know I contributed this example code :)
Please consider adding a functional test to cover this case.
Line 81: vdsm_uid,vdsm_gid = getpwnam(VDSM_USER)[2:4:]
space between ',' and vdsm_gid
Line 99:
This blank line is not necessary.
Line 110: supervdsmServer.restartSvdsm(manager, vdsmPid)
This is just a one-liner. Why not inline the code directly in the function below?
Line 116: # None key make children inherit it from parent
May I suggest rewording this comment for clarity?
# When key is None, connections are restricted to child processes.
Line 117: manager = _SuperVdsmManager(ADDRESS)
Since you reference the authKey parameter in a comment, best to supply it as an explicit
parameter to the function:
manager = _SuperVdsmManager(ADDRESS, None)
Line 127: os.kill(vdsmPid, signal.SIGTERM)
What happens if vdsmPid isn't defined yet because an exception was thrown by
vdsmProc.start() ? You should either move vdsmProc.start() out of the try block or test
that the value of vdsmPid makes sense.
Line 129: vdsmProc.join()
Trailing whitespace.
--
To view, visit
http://gerrit.ovirt.org/4145
To unsubscribe, visit
http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I69aae6b0b9529c80291d90c6ad14ff82b21aea53
Gerrit-PatchSet: 4
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Royce Lv <lvroyce(a)linux.vnet.ibm.com>
Gerrit-Reviewer: Adam Litke <agl(a)us.ibm.com>
Gerrit-Reviewer: Dan Kenigsberg <danken(a)redhat.com>
Gerrit-Reviewer: Mark Wu <wudxw(a)linux.vnet.ibm.com>
Gerrit-Reviewer: Royce Lv <lvroyce(a)linux.vnet.ibm.com>
Gerrit-Reviewer: Saggi Mizrahi <smizrahi(a)redhat.com>
Gerrit-Reviewer: ShaoHe Feng <shaohef(a)linux.vnet.ibm.com>
Gerrit-Reviewer: Shu Ming <shuming(a)linux.vnet.ibm.com>
Gerrit-Reviewer: Wenyi Gao <wenyi(a)linux.vnet.ibm.com>
Gerrit-Reviewer: Xu He Jie <xuhj(a)linux.vnet.ibm.com>