Change in vdsm[master]: introducing downloadImageFromStream

16 Jan 2014


      Nir Soffer has posted comments on this change.
Change subject: introducing downloadImageFromStream
......................................................................
Patch Set 1:
(22 comments)
Partial review: imageSharing, commit message
http://gerrit.ovirt.org/#/c/23281/1//COMMIT_MSG
Commit Message:
Line 3: AuthorDate: 2014-01-12 13:52:24 +0200
Line 4: Commit:     Liron Aravot laravot@redhat.com
Line 5: CommitDate: 2014-01-15 16:43:38 +0200
Line 6: 
Line 7: introducing downloadImageFromStream
Introducing
I think this feature should have better title then the twisted "downloadFromStream". Download from stream does not make sense and is just a hack to get this working with minimal changes in the code.
Line 8: 
Line 9: This patch introduces the downloadImageFromStream verb in vdsm which
Line 10: allows to upload using streaming content to vdsm images.
Line 11: XML-RPC spec doesn't support streaming, as we don't want to use HTTP
Line 10: allows to upload using streaming content to vdsm images.
Line 11: XML-RPC spec doesn't support streaming, as we don't want to use HTTP
Line 12: server in addition to the current XML-RPC server (to not use another
Line 13: port) we will threat some requests as http requests instead of xml-rpc
Line 14: requests.
Streaming content is not correct - no streaming done here. This is simply adding upload support without the size limits implied by xmlrpc.
Line 15: 
Line 16: General upload information:
Line 17: ---------------------------------------------------------------
Line 18: PUT requests arriving to the server with content type of
Line 13: port) we will threat some requests as http requests instead of xml-rpc
Line 14: requests.
Line 15: 
Line 16: General upload information:
Line 17: ---------------------------------------------------------------
A more useful formatting would be:
General upload information
 --------------------------
content...
Line 18: PUT requests arriving to the server with content type of
Line 19: application/octet-stream to default paths that we use
Line 20: today for request handling ('/', '/RPC2') will be threated as
Line 21: upload requests.
Line 17: ---------------------------------------------------------------
Line 18: PUT requests arriving to the server with content type of
Line 19: application/octet-stream to default paths that we use
Line 20: today for request handling ('/', '/RPC2') will be threated as
Line 21: upload requests.
I think it would be more correct to limit the upload support to certain path, for example, "/storage".
Later we may want to extend this to download data directly from storage, and we would like to add GET verb. This will not work if we use the same path of xmlrpc handlers.
Upload to storage:
PUT /stoarage
Download from storage:
GET /storage
Line 22: 
Line 23: The PUT stream upload request inspected headers are:
Line 24: -- Mandatory headers:
Line 25:         *content-length
Line 27: -- Mandatory custom headers:
Line 28:         *X-Storage-Pool-Id : mandatory for succesfull execution
Line 29:         *X-Storage-Domain-Id: mandatory for succesfull execution
Line 30:         *X-Image-Id: mandatory for succesfull execution
Line 31:         *X-Volume-Id: mandatory for succesfull execution
Consider passing these in the request path:
PUT /storage/pool-id/domain-id/image-id/volume-id
GET /storage/pool-id/domain-id/image-id/volume-id
Processing these is quite simple:
...
...
...
'/data/p/d/i/v'[len('/data/'):].split('/')
['p', 'd', 'i', 'v']
Line 32: 
Line 33: -- Optional custom headers;
Line 34:         *X-Buffer-Size: optional, indicates the passed buffer
Line 35:
Line 30:         *X-Image-Id: mandatory for succesfull execution
Line 31:         *X-Volume-Id: mandatory for succesfull execution
Line 32: 
Line 33: -- Optional custom headers;
Line 34:         *X-Buffer-Size: optional, indicates the passed buffer
This is pointless as we discussed.
Line 35: 
Line 36: The server will return answers with the following codes:
Line 37: 200 - request completed, the response body will contain the execution
Line 38: results, the returned content type is 'application/json'
Line 36: The server will return answers with the following codes:
Line 37: 200 - request completed, the response body will contain the execution
Line 38: results, the returned content type is 'application/json'
Line 39: 500 - internal error
Line 40: 406 - missing content-length header
This list is missing some values, but do not fix it - we don't need to describe these details here, it just creates more work to synchronize the implementation with this message.
Line 41: 
Line 42: in case that a task was created, its id will be passed in the response
Line 43: using the custom header 'X-Task-Id' for polling by the engine.
Line 44:
Line 59: 2. schedule a "dummy" task with "blank" function that will just be used to
Line 60: indicate that there's an downloadFromStream execution going on while the
Line 61: request thread will be used for executing the actual download.
Line 62: 
Line 63: Option 2 is implemented with this patch.
I like the dummy task hack.
Line 64: 
Line 65: needed:
Line 66: *socket timeout inspection
Line 67: *tests?
http://gerrit.ovirt.org/#/c/23281/1/vdsm/BindingXMLRPC.py
File vdsm/BindingXMLRPC.py:
Line 161:                                              HEADER_CONTENT_TYPE))
Line 162:                 try:
Line 163:                     if ctype == 'application/octet-stream':
Line 164:                         contentLength = self.headers. \
Line 165:                             getheader(self.HEADER_CONTENT_LENGTH)
It makes more sense to convert content-length to integer here:
try:
      contentLength = int(contentLength)
  except ValueError:
      send 400 error...
Then we can pass the integer down into the application, and we don't have to add functions such as streamGetSize().
Line 166:                         if not contentLength:
Line 167:                             self.send_response(411)
Line 168:                             self.finish()
Line 169:                             return
Line 185:                             getheader(self.HEADER_CONTENT_LENGTH)
Line 186: 
Line 187:                         methodArgs = {'method': 'stream',
Line 188:                                       'fileObj': self.rfile,
Line 189:                                       'Content-Length': contentLength,
...
Use constant
Lets *not* use http constants to pass data to our implementation. This will only make our life harder, having to share the constant, while we can use our shorter and easier to work with term.
Lets use here "size" or "length" key for the content-length.
Line 190:                                       'Buffer-Size': bufferSize}
Line 191:                         image = API.Image(imgUUID, spUUID, sdUUID)
Line 192:                         response = image.downloadFromStream(methodArgs,
Line 193:                                                             volUUID)
http://gerrit.ovirt.org/#/c/23281/1/vdsm/storage/imageSharing.py
File vdsm/storage/imageSharing.py:
Line 48: 
Line 49:     return size
Line 50: 
Line 51: 
Line 52: def streamGetSize(methodArgs):
Unneeded
Line 53:     size = None
Line 54:     if 'Content-Length' in methodArgs:
Line 55:         size = int(methodArgs['Content-Length'])
Line 56:     return size
Line 59: def streamGetBufferSize(methodArgs):
Line 60:     size = None
Line 61:     if 'X-Buffer-Size' in methodArgs:
Line 62:         size = int(methodArgs['X-Buffer-Size'])
Line 63:     return size
Not needed
Line 64: 
Line 65: 
Line 66: def httpDownloadImage(dstImgPath, methodArgs):
Line 67:     curlImgWrap.download(methodArgs.get('url'), dstImgPath,
Line 73:                        methodArgs.get("headers", {}))
Line 74: 
Line 75: 
Line 76: def streamDownloadImage(dstImgPath, methodArgs):
Line 77:     bytes_left = streamGetSize(methodArgs)
This will not work as streamGetSize may return None, and byte_left -= bytes_read does not work with None.
What you want here is simply:
bytes_left = int(methodArsgs['Content-Length'])
It should never raise a key error since we never call it without this key, and if the value is not an int, we would rather die now.
Line 78:     buffer_size = streamGetBufferSize(methodArgs)
Line 79:     stream = methodArgs.get('fileObj')
Line 80: 
Line 81:     BS = constants.MEGAB
Line 78:     buffer_size = streamGetBufferSize(methodArgs)
Line 79:     stream = methodArgs.get('fileObj')
Line 80: 
Line 81:     BS = constants.MEGAB
Line 82:     count = bytes_left / BS
count is wrong, since bytes_left is not divisible by BS. For example, if we get 234 bytes, count will be 0 and we will write no data.
This dd should copy all data it receive until you close the pipe, writing in blocks of 1MB. I guess that not specifying a count wold do what you want, please check.
Line 83:     cmd = [constants.EXT_DD, "of=%s" % dstImgPath, "bs=%s" %
Line 84:            constants.MEGAB, "count=%d" % count]
Line 85:     p = utils.execCmd(cmd, sudo=False, sync=False)
Line 86:
Line 80: 
Line 81:     BS = constants.MEGAB
Line 82:     count = bytes_left / BS
Line 83:     cmd = [constants.EXT_DD, "of=%s" % dstImgPath, "bs=%s" %
Line 84:            constants.MEGAB, "count=%d" % count]
Check if more dd flags are needed for writing to storage - check how copyImage is implemented.
Line 85:     p = utils.execCmd(cmd, sudo=False, sync=False)
Line 86: 
Line 87:     default_read_size = buffer_size
Line 88:     if default_read_size is None:
Line 81:     BS = constants.MEGAB
Line 82:     count = bytes_left / BS
Line 83:     cmd = [constants.EXT_DD, "of=%s" % dstImgPath, "bs=%s" %
Line 84:            constants.MEGAB, "count=%d" % count]
Line 85:     p = utils.execCmd(cmd, sudo=False, sync=False)
Rest of this must be in a try-except block, to ensure that dd is killed on errors:
create process...
 try:
     copy data...
 except Exception:
     kill process
     raise
Line 86: 
Line 87:     default_read_size = buffer_size
Line 88:     if default_read_size is None:
Line 89:         default_read_size = BS
Line 83:     cmd = [constants.EXT_DD, "of=%s" % dstImgPath, "bs=%s" %
Line 84:            constants.MEGAB, "count=%d" % count]
Line 85:     p = utils.execCmd(cmd, sudo=False, sync=False)
Line 86: 
Line 87:     default_read_size = buffer_size
Use constant
Line 88:     if default_read_size is None:
Line 89:         default_read_size = BS
Line 90: 
Line 91:     while bytes_left > 0:
Line 85:     p = utils.execCmd(cmd, sudo=False, sync=False)
Line 86: 
Line 87:     default_read_size = buffer_size
Line 88:     if default_read_size is None:
Line 89:         default_read_size = BS
There is no point to read from a socket and write to a pipe in 1MB chunks, unless you increased the socket and pipe buffers to similar size. Even if you do this, this is not effective usage of your memory. So using this value is just misleading the reader.
I guess that 64KB would be a nice start value for this task.
Line 90: 
Line 91:     while bytes_left > 0:
Line 92:         i = min(default_read_size, bytes_left)
Line 93:         data = stream.read(i)
Line 88:     if default_read_size is None:
Line 89:         default_read_size = BS
Line 90: 
Line 91:     while bytes_left > 0:
Line 92:         i = min(default_read_size, bytes_left)
i is not a good name for a size variable in this context. People expect i to be an iteration variable, increasing or decreasing on each iteration.
How about bytes_to_read?
Line 93:         data = stream.read(i)
Line 94:         if not data:
Line 95:             break
Line 96:         p.stdin.write(data)
Line 91:     while bytes_left > 0:
Line 92:         i = min(default_read_size, bytes_left)
Line 93:         data = stream.read(i)
Line 94:         if not data:
Line 95:             break
You must kill the process here. Also, this unexpected situation, so you should better raise an exception here.
Line 96:         p.stdin.write(data)
Line 97:         p.stdin.flush()
Line 98:         bytes_left = bytes_left - i
Line 99:
Line 93:         data = stream.read(i)
Line 94:         if not data:
Line 95:             break
Line 96:         p.stdin.write(data)
Line 97:         p.stdin.flush()
Maybe put a note here so we remember to check if flush is needed. In a perfect world, the file object keeps a limited buffer size and automatically flush the buffer when the buffer is full.
Line 98:         bytes_left = bytes_left - i
Line 99: 
Line 100:     p.stdin.close()
Line 101:     p.wait()
Line 94:         if not data:
Line 95:             break
Line 96:         p.stdin.write(data)
Line 97:         p.stdin.flush()
Line 98:         bytes_left = bytes_left - i
You must assume partial reads:
bytes_left -= len(data)
Line 99: 
Line 100:     p.stdin.close()
Line 101:     p.wait()
Line 102:     if p.returncode != 0:
-- 
To view, visit http://gerrit.ovirt.org/23281
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I768b84799ed9fb2769c6d4240519d036f8988b99
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Liron Ar laravot@redhat.com
Gerrit-Reviewer: Ayal Baron abaron@redhat.com
Gerrit-Reviewer: Federico Simoncelli fsimonce@redhat.com
Gerrit-Reviewer: Nir Soffer nsoffer@redhat.com
Gerrit-Reviewer: Sergey Gotliv sgotliv@redhat.com
Gerrit-Reviewer: Tal Nisan tnisan@redhat.com
Gerrit-Reviewer: oVirt Jenkins CI Server
Gerrit-HasComments: Yes

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Change in vdsm[master]: introducing downloadImageFromStream