Liron Ar has posted comments on this change.
Change subject: introducing downloadImageFromStream ......................................................................
Patch Set 1:
(36 comments)
http://gerrit.ovirt.org/#/c/23281/1//COMMIT_MSG Commit Message:
Line 10: allows to upload using streaming content to vdsm images. Line 11: XML-RPC spec doesn't support streaming, as we don't want to use HTTP Line 12: server in addition to the current XML-RPC server (to not use another Line 13: port) we will threat some requests as http requests instead of xml-rpc Line 14: requests.
Streaming content is not correct - no streaming done here. This is simply a
it's not related to the size but to the fact that we don't want to have all file passed at once/in memory - therefore the term streaming is correct imo. if you have better suggestion it'll be great. Line 15: Line 16: General upload information: Line 17: --------------------------------------------------------------- Line 18: PUT requests arriving to the server with content type of
Line 17: --------------------------------------------------------------- Line 18: PUT requests arriving to the server with content type of Line 19: application/octet-stream to default paths that we use Line 20: today for request handling ('/', '/RPC2') will be threated as Line 21: upload requests.
I think it would be more correct to limit the upload support to certain pat
I don't really agree about that. having a special path for it means that when executing other kind of requests we should block this path - let's say that we add /upload - we need now to add hanling on post methods to that path. Line 22: Line 23: The PUT stream upload request inspected headers are: Line 24: -- Mandatory headers: Line 25: *content-length
Line 27: -- Mandatory custom headers: Line 28: *X-Storage-Pool-Id : mandatory for succesfull execution Line 29: *X-Storage-Domain-Id: mandatory for succesfull execution Line 30: *X-Image-Id: mandatory for succesfull execution Line 31: *X-Volume-Id: mandatory for succesfull execution
Consider passing these in the request path:
i'll stick with the headers, the supported paths are defined per all the handler instances and are verified when processing the requests. there's no need to add extra noise while the headers method seems better and simpler to me and we already use it. Line 32: Line 33: -- Optional custom headers; Line 34: *X-Buffer-Size: optional, indicates the passed buffer Line 35:
Line 30: *X-Image-Id: mandatory for succesfull execution Line 31: *X-Volume-Id: mandatory for succesfull execution Line 32: Line 33: -- Optional custom headers; Line 34: *X-Buffer-Size: optional, indicates the passed buffer
This is pointless as we discussed.
Done Line 35: Line 36: The server will return answers with the following codes: Line 37: 200 - request completed, the response body will contain the execution Line 38: results, the returned content type is 'application/json'
Line 36: The server will return answers with the following codes: Line 37: 200 - request completed, the response body will contain the execution Line 38: results, the returned content type is 'application/json' Line 39: 500 - internal error Line 40: 406 - missing content-length header
This list is missing some values, but do not fix it - we don't need to desc
Done Line 41: Line 42: in case that a task was created, its id will be passed in the response Line 43: using the custom header 'X-Task-Id' for polling by the engine. Line 44:
http://gerrit.ovirt.org/#/c/23281/1/vdsm/BindingXMLRPC.py File vdsm/BindingXMLRPC.py:
Line 131: else: Line 132: basehandler = SimpleXMLRPCServer.SimpleXMLRPCRequestHandler Line 133: Line 134: class RequestHandler(LoggingMixIn, basehandler): Line 135:
For timeout, you can put here a class variable:
the correct place for this is in the server and not in the handler. Line 136: log = logging.getLogger("BindingXMLRPC.RequestHandler") Line 137: CUSTOM_HEADER_POOL = 'X-Storage-Pool-Id' Line 138: CUSTOM_HEADER_DOMAIN = 'X-Storage-Domain-Id' Line 139: CUSTOM_HEADER_IMAGE = 'X-Image-Id'
Line 132: basehandler = SimpleXMLRPCServer.SimpleXMLRPCRequestHandler Line 133: Line 134: class RequestHandler(LoggingMixIn, basehandler): Line 135: Line 136: log = logging.getLogger("BindingXMLRPC.RequestHandler")
It would be nice to separate constants from the log.
Done Line 137: CUSTOM_HEADER_POOL = 'X-Storage-Pool-Id' Line 138: CUSTOM_HEADER_DOMAIN = 'X-Storage-Domain-Id' Line 139: CUSTOM_HEADER_IMAGE = 'X-Image-Id' Line 140: CUSTOM_HEADER_VOLUME = 'X-Volume-Id'
Line 133: Line 134: class RequestHandler(LoggingMixIn, basehandler): Line 135: Line 136: log = logging.getLogger("BindingXMLRPC.RequestHandler") Line 137: CUSTOM_HEADER_POOL = 'X-Storage-Pool-Id'
CUSTOM_HEADER_ prefix is not needed and only make the constants too long, l
I removed the "custom" part, i prefer to keep the header in the name to indicate that those are headers. Line 138: CUSTOM_HEADER_DOMAIN = 'X-Storage-Domain-Id' Line 139: CUSTOM_HEADER_IMAGE = 'X-Image-Id' Line 140: CUSTOM_HEADER_VOLUME = 'X-Volume-Id' Line 141: CUSTOM_HEADER_BUFFER = 'X-Buffer-Size'
Line 137: CUSTOM_HEADER_POOL = 'X-Storage-Pool-Id' Line 138: CUSTOM_HEADER_DOMAIN = 'X-Storage-Domain-Id' Line 139: CUSTOM_HEADER_IMAGE = 'X-Image-Id' Line 140: CUSTOM_HEADER_VOLUME = 'X-Volume-Id' Line 141: CUSTOM_HEADER_BUFFER = 'X-Buffer-Size'
Not needed as we discussed
Done Line 142: CUSTOM_HEADER_TASK_ID = 'X-Task-Id' Line 143: HEADER_CONTENT_LENGTH = 'content-length' Line 144: HEADER_CONTENT_TYPE = 'content-type' Line 145: RESPONSE_LENGTH_REQUIRED_ = 411
Line 144: HEADER_CONTENT_TYPE = 'content-type' Line 145: RESPONSE_LENGTH_REQUIRED_ = 411 Line 146: RESPONSE_NOT_ACCEPTABLE = 406 Line 147: RESPONSE_REQUEST_SUCCEEDED = 200 Line 148: RESPONSE_INTERNAL_ERROR = 500
For http status codes, please use httplib:
Done Line 149: Line 150: def setup(self): Line 151: threadLocal.client = self.client_address[0] Line 152: return basehandler.setup(self)
Line 150: def setup(self): Line 151: threadLocal.client = self.client_address[0] Line 152: return basehandler.setup(self) Line 153: Line 154: def addTaskIdHeaderIfPresent(self):
This should be sendTaskHeaderIfPresent, since it sends a header.
Done Line 155: if hasattr(vars, 'task'): Line 156: self.send_header(self.CUSTOM_HEADER_TASK_ID, Line 157: vars.task.getID()) Line 158:
Line 157: vars.task.getID()) Line 158: Line 159: def do_PUT(self): Line 160: ctype, pt = cgi.parse_header(self.headers.getheader(self. Line 161: HEADER_CONTENT_TYPE))
Why this check is out of the try-except block?
Done Line 162: try: Line 163: if ctype == 'application/octet-stream': Line 164: contentLength = self.headers. \ Line 165: getheader(self.HEADER_CONTENT_LENGTH)
Line 159: def do_PUT(self): Line 160: ctype, pt = cgi.parse_header(self.headers.getheader(self. Line 161: HEADER_CONTENT_TYPE)) Line 162: try: Line 163: if ctype == 'application/octet-stream':
The code would be more clear if we bail out as soon as we find an error:
Done Line 164: contentLength = self.headers. \ Line 165: getheader(self.HEADER_CONTENT_LENGTH) Line 166: if not contentLength: Line 167: self.send_response(411)
Line 161: HEADER_CONTENT_TYPE)) Line 162: try: Line 163: if ctype == 'application/octet-stream': Line 164: contentLength = self.headers. \ Line 165: getheader(self.HEADER_CONTENT_LENGTH)
It makes more sense to convert content-length to integer here:
Done Line 166: if not contentLength: Line 167: self.send_response(411) Line 168: self.finish() Line 169: return
Line 163: if ctype == 'application/octet-stream': Line 164: contentLength = self.headers. \ Line 165: getheader(self.HEADER_CONTENT_LENGTH) Line 166: if not contentLength: Line 167: self.send_response(411)
Use constant
Done Line 168: self.finish() Line 169: return Line 170: Line 171: # NEED TO DECIDE ABOUT THOSE
Line 164: contentLength = self.headers. \ Line 165: getheader(self.HEADER_CONTENT_LENGTH) Line 166: if not contentLength: Line 167: self.send_response(411) Line 168: self.finish()
This pattern:
I don't think that its worth extracting to a method Line 169: return Line 170: Line 171: # NEED TO DECIDE ABOUT THOSE Line 172: #self.rfile._sock.settimeout(5)
Line 169: return Line 170: Line 171: # NEED TO DECIDE ABOUT THOSE Line 172: #self.rfile._sock.settimeout(5) Line 173: #self.rfile._sock.set_socket_read_timeout
We don't need it as we discussed, timeout class variable will handle this.
replied before, the correct place for this is in the server and not in the handler. as we agreed f2f, timeout will be added to the server generally and not only for this usecase. Line 174: spUUID = self.headers. \ Line 175: getheader(self.CUSTOM_HEADER_POOL) Line 176: sdUUID = self.headers. \ Line 177: getheader(self.CUSTOM_HEADER_DOMAIN)
Line 171: # NEED TO DECIDE ABOUT THOSE Line 172: #self.rfile._sock.settimeout(5) Line 173: #self.rfile._sock.set_socket_read_timeout Line 174: spUUID = self.headers. \ Line 175: getheader(self.CUSTOM_HEADER_POOL)
This hard to read lines are caused by too much indentation and too long con
Done Line 176: sdUUID = self.headers. \ Line 177: getheader(self.CUSTOM_HEADER_DOMAIN) Line 178: imgUUID = self.headers. \ Line 179: getheader(self.CUSTOM_HEADER_IMAGE)
Line 185: getheader(self.HEADER_CONTENT_LENGTH) Line 186: Line 187: methodArgs = {'method': 'stream', Line 188: 'fileObj': self.rfile, Line 189: 'Content-Length': contentLength,
Use constant
I prefer to keep it as content-length, that's already the terminology that we use there. Line 190: 'Buffer-Size': bufferSize} Line 191: image = API.Image(imgUUID, spUUID, sdUUID) Line 192: response = image.downloadFromStream(methodArgs, Line 193: volUUID)
Line 186: Line 187: methodArgs = {'method': 'stream', Line 188: 'fileObj': self.rfile, Line 189: 'Content-Length': contentLength, Line 190: 'Buffer-Size': bufferSize}
Not needed
Done Line 191: image = API.Image(imgUUID, spUUID, sdUUID) Line 192: response = image.downloadFromStream(methodArgs, Line 193: volUUID) Line 194: json_response = json.dumps(response)
Line 191: image = API.Image(imgUUID, spUUID, sdUUID) Line 192: response = image.downloadFromStream(methodArgs, Line 193: volUUID) Line 194: json_response = json.dumps(response) Line 195: self.send_response(200)
Use constant
Done Line 196: self.send_header(self.HEADER_CONTENT_TYPE, Line 197: 'application/json') Line 198: self.send_header(self.HEADER_CONTENT_LENGTH, Line 199: len(json_response))
Line 201: self.end_headers() Line 202: self.wfile.write(json_response) Line 203: self.finish() Line 204: else: Line 205: self.send_response(406)
This status code is not correct, it should be 415 Unsupported media type:
Done Line 206: self.finish() Line 207: # shouldn't happen Line 208: except Exception: Line 209: self.log.error("execution exception",
Line 202: self.wfile.write(json_response) Line 203: self.finish() Line 204: else: Line 205: self.send_response(406) Line 206: self.finish()
If we use a timeout, we catch socket.timeout and handle it properly - we sh
i don't think that it's needed, we log the exc_info. what we could do here is to re-raise to have the server default handling for it..but it doesn't seem to make a real diff from current handling. Line 207: # shouldn't happen Line 208: except Exception: Line 209: self.log.error("execution exception", Line 210: exc_info=True)
Line 205: self.send_response(406) Line 206: self.finish() Line 207: # shouldn't happen Line 208: except Exception: Line 209: self.log.error("execution exception",
execution?
changed to unexpected error during execution Line 210: exc_info=True) Line 211: try: Line 212: self.send_response(500) Line 213: self.addTaskIdHeaderIfPresent()
Line 208: except Exception: Line 209: self.log.error("execution exception", Line 210: exc_info=True) Line 211: try: Line 212: self.send_response(500)
Use constant
Done Line 213: self.addTaskIdHeaderIfPresent() Line 214: self.end_headers() Line 215: self.finish() Line 216: except Exception:
http://gerrit.ovirt.org/#/c/23281/1/vdsm/storage/imageSharing.py File vdsm/storage/imageSharing.py:
Line 48: Line 49: return size Line 50: Line 51: Line 52: def streamGetSize(methodArgs):
Unneeded
i prefer to keep the convenetion here, see _METHOD_IMPLEMENTATIONS and therefore i leave this method, it's actual implementation is other issue..but if we already define that there is "size" method, i want to implement it. Line 53: size = None Line 54: if 'Content-Length' in methodArgs: Line 55: size = int(methodArgs['Content-Length']) Line 56: return size
Line 59: def streamGetBufferSize(methodArgs): Line 60: size = None Line 61: if 'X-Buffer-Size' in methodArgs: Line 62: size = int(methodArgs['X-Buffer-Size']) Line 63: return size
Not needed
Done Line 64: Line 65: Line 66: def httpDownloadImage(dstImgPath, methodArgs): Line 67: curlImgWrap.download(methodArgs.get('url'), dstImgPath,
Line 73: methodArgs.get("headers", {})) Line 74: Line 75: Line 76: def streamDownloadImage(dstImgPath, methodArgs): Line 77: bytes_left = streamGetSize(methodArgs)
This will not work as streamGetSize may return None, and byte_left -= bytes
Right, it's a leftover from when this verb supported unspecified content length, revmoing. Line 78: buffer_size = streamGetBufferSize(methodArgs) Line 79: stream = methodArgs.get('fileObj') Line 80: Line 81: BS = constants.MEGAB
Line 78: buffer_size = streamGetBufferSize(methodArgs) Line 79: stream = methodArgs.get('fileObj') Line 80: Line 81: BS = constants.MEGAB Line 82: count = bytes_left / BS
count is wrong, since bytes_left is not divisible by BS. For example, if we
it's a copy copy-paste from other invocation that we have..i just removed the count. Line 83: cmd = [constants.EXT_DD, "of=%s" % dstImgPath, "bs=%s" % Line 84: constants.MEGAB, "count=%d" % count] Line 85: p = utils.execCmd(cmd, sudo=False, sync=False) Line 86:
Line 80: Line 81: BS = constants.MEGAB Line 82: count = bytes_left / BS Line 83: cmd = [constants.EXT_DD, "of=%s" % dstImgPath, "bs=%s" % Line 84: constants.MEGAB, "count=%d" % count]
Check if more dd flags are needed for writing to storage - check how copyIm
will be checked. Line 85: p = utils.execCmd(cmd, sudo=False, sync=False) Line 86: Line 87: default_read_size = buffer_size Line 88: if default_read_size is None:
Line 81: BS = constants.MEGAB Line 82: count = bytes_left / BS Line 83: cmd = [constants.EXT_DD, "of=%s" % dstImgPath, "bs=%s" % Line 84: constants.MEGAB, "count=%d" % count] Line 85: p = utils.execCmd(cmd, sudo=False, sync=False)
Rest of this must be in a try-except block, to ensure that dd is killed on
Done Line 86: Line 87: default_read_size = buffer_size Line 88: if default_read_size is None: Line 89: default_read_size = BS
Line 85: p = utils.execCmd(cmd, sudo=False, sync=False) Line 86: Line 87: default_read_size = buffer_size Line 88: if default_read_size is None: Line 89: default_read_size = BS
There is no point to read from a socket and write to a pipe in 1MB chunks,
in the python documentation it's written: "For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096."
we can start with the value. Line 90: Line 91: while bytes_left > 0: Line 92: i = min(default_read_size, bytes_left) Line 93: data = stream.read(i)
Line 88: if default_read_size is None: Line 89: default_read_size = BS Line 90: Line 91: while bytes_left > 0: Line 92: i = min(default_read_size, bytes_left)
i is not a good name for a size variable in this context. People expect i t
Done Line 93: data = stream.read(i) Line 94: if not data: Line 95: break Line 96: p.stdin.write(data)
Line 91: while bytes_left > 0: Line 92: i = min(default_read_size, bytes_left) Line 93: data = stream.read(i) Line 94: if not data: Line 95: break
You must kill the process here. Also, this unexpected situation, so you sho
Done Line 96: p.stdin.write(data) Line 97: p.stdin.flush() Line 98: bytes_left = bytes_left - i Line 99:
Line 93: data = stream.read(i) Line 94: if not data: Line 95: break Line 96: p.stdin.write(data) Line 97: p.stdin.flush()
Maybe put a note here so we remember to check if flush is needed. In a perf
I don't think that this note will somehow help, no real harm in having that here. Line 98: bytes_left = bytes_left - i Line 99: Line 100: p.stdin.close() Line 101: p.wait()
Line 94: if not data: Line 95: break Line 96: p.stdin.write(data) Line 97: p.stdin.flush() Line 98: bytes_left = bytes_left - i
You must assume partial reads:
Done Line 99: Line 100: p.stdin.close() Line 101: p.wait() Line 102: if p.returncode != 0: