Change in vdsm[master]: introducing downloadImageFromStream

19 Jan 2014


      Liron Ar has posted comments on this change.
Change subject: introducing downloadImageFromStream
......................................................................
Patch Set 1:
(36 comments)
http://gerrit.ovirt.org/#/c/23281/1//COMMIT_MSG
Commit Message:
Line 10: allows to upload using streaming content to vdsm images.
Line 11: XML-RPC spec doesn't support streaming, as we don't want to use HTTP
Line 12: server in addition to the current XML-RPC server (to not use another
Line 13: port) we will threat some requests as http requests instead of xml-rpc
Line 14: requests.
...
Streaming content is not correct - no streaming done here. This is simply a
it's not related to the size  but to the fact that we don't want to have all file passed at once/in memory - therefore the term streaming is correct imo. if you have better suggestion it'll be great.
Line 15: 
Line 16: General upload information:
Line 17: ---------------------------------------------------------------
Line 18: PUT requests arriving to the server with content type of
Line 17: ---------------------------------------------------------------
Line 18: PUT requests arriving to the server with content type of
Line 19: application/octet-stream to default paths that we use
Line 20: today for request handling ('/', '/RPC2') will be threated as
Line 21: upload requests.
...
I think it would be more correct to limit the upload support to certain pat
I don't really agree about that.
having a special path for it means that when executing other kind of requests we should block this path - let's say that we add /upload - we need now to add hanling on post methods to that path.
Line 22: 
Line 23: The PUT stream upload request inspected headers are:
Line 24: -- Mandatory headers:
Line 25:         *content-length
Line 27: -- Mandatory custom headers:
Line 28:         *X-Storage-Pool-Id : mandatory for succesfull execution
Line 29:         *X-Storage-Domain-Id: mandatory for succesfull execution
Line 30:         *X-Image-Id: mandatory for succesfull execution
Line 31:         *X-Volume-Id: mandatory for succesfull execution
...
Consider passing these in the request path:
i'll stick with the headers, the supported paths are defined per all the handler instances and are verified when processing the requests. there's no need to add extra noise while the headers method seems better and simpler to me and we already use it.
Line 32: 
Line 33: -- Optional custom headers;
Line 34:         *X-Buffer-Size: optional, indicates the passed buffer
Line 35:
Line 30:         *X-Image-Id: mandatory for succesfull execution
Line 31:         *X-Volume-Id: mandatory for succesfull execution
Line 32: 
Line 33: -- Optional custom headers;
Line 34:         *X-Buffer-Size: optional, indicates the passed buffer
...
This is pointless as we discussed.
Done
Line 35: 
Line 36: The server will return answers with the following codes:
Line 37: 200 - request completed, the response body will contain the execution
Line 38: results, the returned content type is 'application/json'
Line 36: The server will return answers with the following codes:
Line 37: 200 - request completed, the response body will contain the execution
Line 38: results, the returned content type is 'application/json'
Line 39: 500 - internal error
Line 40: 406 - missing content-length header
...
This list is missing some values, but do not fix it - we don't need to desc
Done
Line 41: 
Line 42: in case that a task was created, its id will be passed in the response
Line 43: using the custom header 'X-Task-Id' for polling by the engine.
Line 44:
http://gerrit.ovirt.org/#/c/23281/1/vdsm/BindingXMLRPC.py
File vdsm/BindingXMLRPC.py:
Line 131:         else:
Line 132:             basehandler = SimpleXMLRPCServer.SimpleXMLRPCRequestHandler
Line 133: 
Line 134:         class RequestHandler(LoggingMixIn, basehandler):
Line 135:
...
For timeout, you can put here a class variable:
the correct place for this is in the server and not in the handler.
Line 136:             log = logging.getLogger("BindingXMLRPC.RequestHandler")
Line 137:             CUSTOM_HEADER_POOL = 'X-Storage-Pool-Id'
Line 138:             CUSTOM_HEADER_DOMAIN = 'X-Storage-Domain-Id'
Line 139:             CUSTOM_HEADER_IMAGE = 'X-Image-Id'
Line 132:             basehandler = SimpleXMLRPCServer.SimpleXMLRPCRequestHandler
Line 133: 
Line 134:         class RequestHandler(LoggingMixIn, basehandler):
Line 135: 
Line 136:             log = logging.getLogger("BindingXMLRPC.RequestHandler")
...
It would be nice to separate constants from the log.
Done
Line 137:             CUSTOM_HEADER_POOL = 'X-Storage-Pool-Id'
Line 138:             CUSTOM_HEADER_DOMAIN = 'X-Storage-Domain-Id'
Line 139:             CUSTOM_HEADER_IMAGE = 'X-Image-Id'
Line 140:             CUSTOM_HEADER_VOLUME = 'X-Volume-Id'
Line 133: 
Line 134:         class RequestHandler(LoggingMixIn, basehandler):
Line 135: 
Line 136:             log = logging.getLogger("BindingXMLRPC.RequestHandler")
Line 137:             CUSTOM_HEADER_POOL = 'X-Storage-Pool-Id'
...
CUSTOM_HEADER_ prefix is not needed and only make the constants too long, l
I removed the "custom" part, i prefer to keep the header in the name to indicate that those are headers.
Line 138:             CUSTOM_HEADER_DOMAIN = 'X-Storage-Domain-Id'
Line 139:             CUSTOM_HEADER_IMAGE = 'X-Image-Id'
Line 140:             CUSTOM_HEADER_VOLUME = 'X-Volume-Id'
Line 141:             CUSTOM_HEADER_BUFFER = 'X-Buffer-Size'
Line 137:             CUSTOM_HEADER_POOL = 'X-Storage-Pool-Id'
Line 138:             CUSTOM_HEADER_DOMAIN = 'X-Storage-Domain-Id'
Line 139:             CUSTOM_HEADER_IMAGE = 'X-Image-Id'
Line 140:             CUSTOM_HEADER_VOLUME = 'X-Volume-Id'
Line 141:             CUSTOM_HEADER_BUFFER = 'X-Buffer-Size'
...
Not needed as we discussed
Done
Line 142:             CUSTOM_HEADER_TASK_ID = 'X-Task-Id'
Line 143:             HEADER_CONTENT_LENGTH = 'content-length'
Line 144:             HEADER_CONTENT_TYPE = 'content-type'
Line 145:             RESPONSE_LENGTH_REQUIRED_ = 411
Line 144:             HEADER_CONTENT_TYPE = 'content-type'
Line 145:             RESPONSE_LENGTH_REQUIRED_ = 411
Line 146:             RESPONSE_NOT_ACCEPTABLE = 406
Line 147:             RESPONSE_REQUEST_SUCCEEDED = 200
Line 148:             RESPONSE_INTERNAL_ERROR = 500
...
For http status codes, please use httplib:
Done
Line 149: 
Line 150:             def setup(self):
Line 151:                 threadLocal.client = self.client_address[0]
Line 152:                 return basehandler.setup(self)
Line 150:             def setup(self):
Line 151:                 threadLocal.client = self.client_address[0]
Line 152:                 return basehandler.setup(self)
Line 153: 
Line 154:             def addTaskIdHeaderIfPresent(self):
...
This should be sendTaskHeaderIfPresent, since it sends a header.
Done
Line 155:                 if hasattr(vars, 'task'):
Line 156:                     self.send_header(self.CUSTOM_HEADER_TASK_ID,
Line 157:                                      vars.task.getID())
Line 158:
Line 157:                                      vars.task.getID())
Line 158: 
Line 159:             def do_PUT(self):
Line 160:                 ctype, pt = cgi.parse_header(self.headers.getheader(self.
Line 161:                                              HEADER_CONTENT_TYPE))
...
Why this check is out of the try-except block?
Done
Line 162:                 try:
Line 163:                     if ctype == 'application/octet-stream':
Line 164:                         contentLength = self.headers. \
Line 165:                             getheader(self.HEADER_CONTENT_LENGTH)
Line 159:             def do_PUT(self):
Line 160:                 ctype, pt = cgi.parse_header(self.headers.getheader(self.
Line 161:                                              HEADER_CONTENT_TYPE))
Line 162:                 try:
Line 163:                     if ctype == 'application/octet-stream':
...
The code would be more clear if we bail out as soon as we find an error:
Done
Line 164:                         contentLength = self.headers. \
Line 165:                             getheader(self.HEADER_CONTENT_LENGTH)
Line 166:                         if not contentLength:
Line 167:                             self.send_response(411)
Line 161:                                              HEADER_CONTENT_TYPE))
Line 162:                 try:
Line 163:                     if ctype == 'application/octet-stream':
Line 164:                         contentLength = self.headers. \
Line 165:                             getheader(self.HEADER_CONTENT_LENGTH)
...
It makes more sense to convert content-length to integer here:
Done
Line 166:                         if not contentLength:
Line 167:                             self.send_response(411)
Line 168:                             self.finish()
Line 169:                             return
Line 163:                     if ctype == 'application/octet-stream':
Line 164:                         contentLength = self.headers. \
Line 165:                             getheader(self.HEADER_CONTENT_LENGTH)
Line 166:                         if not contentLength:
Line 167:                             self.send_response(411)
...
Use constant
Done
Line 168:                             self.finish()
Line 169:                             return
Line 170: 
Line 171:                         # NEED TO DECIDE ABOUT THOSE
Line 164:                         contentLength = self.headers. \
Line 165:                             getheader(self.HEADER_CONTENT_LENGTH)
Line 166:                         if not contentLength:
Line 167:                             self.send_response(411)
Line 168:                             self.finish()
...
This pattern:
I don't think that its worth extracting to a method
Line 169:                             return
Line 170: 
Line 171:                         # NEED TO DECIDE ABOUT THOSE
Line 172:                         #self.rfile._sock.settimeout(5)
Line 169:                             return
Line 170: 
Line 171:                         # NEED TO DECIDE ABOUT THOSE
Line 172:                         #self.rfile._sock.settimeout(5)
Line 173:                         #self.rfile._sock.set_socket_read_timeout
...
We don't need it as we discussed, timeout class variable will handle this.
replied before, the correct place for this is in the server and not in the handler.
as we agreed f2f, timeout will be added to the server generally and not only for this usecase.
Line 174:                         spUUID = self.headers. \
Line 175:                             getheader(self.CUSTOM_HEADER_POOL)
Line 176:                         sdUUID = self.headers. \
Line 177:                             getheader(self.CUSTOM_HEADER_DOMAIN)
Line 171:                         # NEED TO DECIDE ABOUT THOSE
Line 172:                         #self.rfile._sock.settimeout(5)
Line 173:                         #self.rfile._sock.set_socket_read_timeout
Line 174:                         spUUID = self.headers. \
Line 175:                             getheader(self.CUSTOM_HEADER_POOL)
...
This hard to read lines are caused by too much indentation and too long con
Done
Line 176:                         sdUUID = self.headers. \
Line 177:                             getheader(self.CUSTOM_HEADER_DOMAIN)
Line 178:                         imgUUID = self.headers. \
Line 179:                             getheader(self.CUSTOM_HEADER_IMAGE)
Line 185:                             getheader(self.HEADER_CONTENT_LENGTH)
Line 186: 
Line 187:                         methodArgs = {'method': 'stream',
Line 188:                                       'fileObj': self.rfile,
Line 189:                                       'Content-Length': contentLength,
...
Use constant
I prefer to keep it as content-length, that's already the terminology that we use there.
Line 190:                                       'Buffer-Size': bufferSize}
Line 191:                         image = API.Image(imgUUID, spUUID, sdUUID)
Line 192:                         response = image.downloadFromStream(methodArgs,
Line 193:                                                             volUUID)
Line 186: 
Line 187:                         methodArgs = {'method': 'stream',
Line 188:                                       'fileObj': self.rfile,
Line 189:                                       'Content-Length': contentLength,
Line 190:                                       'Buffer-Size': bufferSize}
...
Not needed
Done
Line 191:                         image = API.Image(imgUUID, spUUID, sdUUID)
Line 192:                         response = image.downloadFromStream(methodArgs,
Line 193:                                                             volUUID)
Line 194:                         json_response = json.dumps(response)
Line 191:                         image = API.Image(imgUUID, spUUID, sdUUID)
Line 192:                         response = image.downloadFromStream(methodArgs,
Line 193:                                                             volUUID)
Line 194:                         json_response = json.dumps(response)
Line 195:                         self.send_response(200)
...
Use constant
Done
Line 196:                         self.send_header(self.HEADER_CONTENT_TYPE,
Line 197:                                          'application/json')
Line 198:                         self.send_header(self.HEADER_CONTENT_LENGTH,
Line 199:                                          len(json_response))
Line 201:                         self.end_headers()
Line 202:                         self.wfile.write(json_response)
Line 203:                         self.finish()
Line 204:                     else:
Line 205:                         self.send_response(406)
...
This status code is not correct, it should be 415 Unsupported media type:
Done
Line 206:                         self.finish()
Line 207:                 # shouldn't happen
Line 208:                 except Exception:
Line 209:                     self.log.error("execution exception",
Line 202:                         self.wfile.write(json_response)
Line 203:                         self.finish()
Line 204:                     else:
Line 205:                         self.send_response(406)
Line 206:                         self.finish()
...
If we use a timeout, we catch socket.timeout and handle it properly - we sh
i don't think that it's needed, we log the exc_info.
what we could do here is to re-raise to have the server default handling for it..but it doesn't seem to make a real diff from current handling.
Line 207:                 # shouldn't happen
Line 208:                 except Exception:
Line 209:                     self.log.error("execution exception",
Line 210:                                    exc_info=True)
Line 205:                         self.send_response(406)
Line 206:                         self.finish()
Line 207:                 # shouldn't happen
Line 208:                 except Exception:
Line 209:                     self.log.error("execution exception",
...
execution?
changed to unexpected error during execution
Line 210:                                    exc_info=True)
Line 211:                     try:
Line 212:                         self.send_response(500)
Line 213:                         self.addTaskIdHeaderIfPresent()
Line 208:                 except Exception:
Line 209:                     self.log.error("execution exception",
Line 210:                                    exc_info=True)
Line 211:                     try:
Line 212:                         self.send_response(500)
...
Use constant
Done
Line 213:                         self.addTaskIdHeaderIfPresent()
Line 214:                         self.end_headers()
Line 215:                         self.finish()
Line 216:                     except Exception:
http://gerrit.ovirt.org/#/c/23281/1/vdsm/storage/imageSharing.py
File vdsm/storage/imageSharing.py:
Line 48: 
Line 49:     return size
Line 50: 
Line 51: 
Line 52: def streamGetSize(methodArgs):
...
Unneeded
i prefer to keep the convenetion here, see 	_METHOD_IMPLEMENTATIONS and therefore i leave this method, it's actual implementation is other issue..but if we already define that there is "size" method, i want to implement it.
Line 53:     size = None
Line 54:     if 'Content-Length' in methodArgs:
Line 55:         size = int(methodArgs['Content-Length'])
Line 56:     return size
Line 59: def streamGetBufferSize(methodArgs):
Line 60:     size = None
Line 61:     if 'X-Buffer-Size' in methodArgs:
Line 62:         size = int(methodArgs['X-Buffer-Size'])
Line 63:     return size
...
Not needed
Done
Line 64: 
Line 65: 
Line 66: def httpDownloadImage(dstImgPath, methodArgs):
Line 67:     curlImgWrap.download(methodArgs.get('url'), dstImgPath,
Line 73:                        methodArgs.get("headers", {}))
Line 74: 
Line 75: 
Line 76: def streamDownloadImage(dstImgPath, methodArgs):
Line 77:     bytes_left = streamGetSize(methodArgs)
...
This will not work as streamGetSize may return None, and byte_left -= bytes
Right, it's a leftover from when this verb supported unspecified content length, revmoing.
Line 78:     buffer_size = streamGetBufferSize(methodArgs)
Line 79:     stream = methodArgs.get('fileObj')
Line 80: 
Line 81:     BS = constants.MEGAB
Line 78:     buffer_size = streamGetBufferSize(methodArgs)
Line 79:     stream = methodArgs.get('fileObj')
Line 80: 
Line 81:     BS = constants.MEGAB
Line 82:     count = bytes_left / BS
...
count is wrong, since bytes_left is not divisible by BS. For example, if we
it's a copy copy-paste from other invocation that we have..i just removed the count.
Line 83:     cmd = [constants.EXT_DD, "of=%s" % dstImgPath, "bs=%s" %
Line 84:            constants.MEGAB, "count=%d" % count]
Line 85:     p = utils.execCmd(cmd, sudo=False, sync=False)
Line 86:
Line 80: 
Line 81:     BS = constants.MEGAB
Line 82:     count = bytes_left / BS
Line 83:     cmd = [constants.EXT_DD, "of=%s" % dstImgPath, "bs=%s" %
Line 84:            constants.MEGAB, "count=%d" % count]
...
Check if more dd flags are needed for writing to storage - check how copyIm
will be checked.
Line 85:     p = utils.execCmd(cmd, sudo=False, sync=False)
Line 86: 
Line 87:     default_read_size = buffer_size
Line 88:     if default_read_size is None:
Line 81:     BS = constants.MEGAB
Line 82:     count = bytes_left / BS
Line 83:     cmd = [constants.EXT_DD, "of=%s" % dstImgPath, "bs=%s" %
Line 84:            constants.MEGAB, "count=%d" % count]
Line 85:     p = utils.execCmd(cmd, sudo=False, sync=False)
...
Rest of this must be in a try-except block, to ensure that dd is killed on
Done
Line 86: 
Line 87:     default_read_size = buffer_size
Line 88:     if default_read_size is None:
Line 89:         default_read_size = BS
Line 85:     p = utils.execCmd(cmd, sudo=False, sync=False)
Line 86: 
Line 87:     default_read_size = buffer_size
Line 88:     if default_read_size is None:
Line 89:         default_read_size = BS
...
There is no point to read from a socket and write to a pipe in 1MB chunks,
in the python documentation it's written:
"For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096."
we can start with the value.
Line 90: 
Line 91:     while bytes_left > 0:
Line 92:         i = min(default_read_size, bytes_left)
Line 93:         data = stream.read(i)
Line 88:     if default_read_size is None:
Line 89:         default_read_size = BS
Line 90: 
Line 91:     while bytes_left > 0:
Line 92:         i = min(default_read_size, bytes_left)
...
i is not a good name for a size variable in this context. People expect i t
Done
Line 93:         data = stream.read(i)
Line 94:         if not data:
Line 95:             break
Line 96:         p.stdin.write(data)
Line 91:     while bytes_left > 0:
Line 92:         i = min(default_read_size, bytes_left)
Line 93:         data = stream.read(i)
Line 94:         if not data:
Line 95:             break
...
You must kill the process here. Also, this unexpected situation, so you sho
Done
Line 96:         p.stdin.write(data)
Line 97:         p.stdin.flush()
Line 98:         bytes_left = bytes_left - i
Line 99:
Line 93:         data = stream.read(i)
Line 94:         if not data:
Line 95:             break
Line 96:         p.stdin.write(data)
Line 97:         p.stdin.flush()
...
Maybe put a note here so we remember to check if flush is needed. In a perf
I don't think that this note will somehow help, no real harm in having that here.
Line 98:         bytes_left = bytes_left - i
Line 99: 
Line 100:     p.stdin.close()
Line 101:     p.wait()
Line 94:         if not data:
Line 95:             break
Line 96:         p.stdin.write(data)
Line 97:         p.stdin.flush()
Line 98:         bytes_left = bytes_left - i
...
You must assume partial reads:
Done
Line 99: 
Line 100:     p.stdin.close()
Line 101:     p.wait()
Line 102:     if p.returncode != 0:
-- 
To view, visit http://gerrit.ovirt.org/23281
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I768b84799ed9fb2769c6d4240519d036f8988b99
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Liron Ar laravot@redhat.com
Gerrit-Reviewer: Ayal Baron abaron@redhat.com
Gerrit-Reviewer: Federico Simoncelli fsimonce@redhat.com
Gerrit-Reviewer: Liron Ar laravot@redhat.com
Gerrit-Reviewer: Nir Soffer nsoffer@redhat.com
Gerrit-Reviewer: Sergey Gotliv sgotliv@redhat.com
Gerrit-Reviewer: Tal Nisan tnisan@redhat.com
Gerrit-Reviewer: oVirt Jenkins CI Server
Gerrit-HasComments: Yes

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Change in vdsm[master]: introducing downloadImageFromStream