Bob McWhirter wrote:
So to do this we'd need the drivers to support image upload, and the local image library would need to define the superset of image attributes and metadata required across all supported drivers. On the portal modeling side, this local image object will be a separate object type from the cloud image object we've already got. The local image object contains the actual image and can't be used directly to create an instance; it's got to be uploaded into the cloud first. The cloud image object simply holds metadata (user permissions, etc.) for an image already hosted at a particular cloud provider.
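To make sure I'm reading that split right, here's a back-of-the-envelope sketch in Python of the two object types as you describe them. Every name in it (LocalImage, CloudImage, upload) is made up for illustration, not a proposal for the actual portal model:

from dataclasses import dataclass, field

@dataclass
class LocalImage:
    """Holds the actual image bits plus the superset of metadata
    required across all supported drivers; can't launch an instance
    directly and must be uploaded to a provider first."""
    name: str
    path: str                                     # local filesystem location
    metadata: dict = field(default_factory=dict)  # superset of per-driver attributes

@dataclass
class CloudImage:
    """Metadata-only handle for an image already hosted at a particular
    provider; this is the object instances get created from."""
    name: str
    provider: str              # e.g. "ec2", "vmware"
    provider_image_id: str     # the provider's own identifier
    owner: str                 # user permissions / ownership live here

def upload(local: LocalImage, provider: str) -> CloudImage:
    """Push a LocalImage through the provider's driver (stubbed out here)
    and return the CloudImage handle the portal can launch from."""
    image_id = f"{provider}-{local.name}"   # stand-in for the real driver call
    return CloudImage(local.name, provider, image_id, owner="uploader")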
I also wonder if we need an ingestion pipeline to crack open images and tweak copies before sending them on to the final provider: things like installing the ec2-tools RPM when pushing to Amazon, vmware-tools when pushing to a VMware provider, etc.
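If it helps, here's the shape I'd picture for that pipeline: per-provider transform steps run against a copy of the image, so the original stays untouched. A rough Python sketch, with made-up step and function names (ingest, install_ec2_tools, etc.):

import shutil
from typing import Callable, Dict, List

PipelineStep = Callable[[str], None]   # each step operates on an image file path

def install_ec2_tools(image_path: str) -> None:
    # a real step would mount the copy and install the ec2-tools RPM
    print(f"installing ec2-tools into {image_path}")

def install_vmware_tools(image_path: str) -> None:
    # a real step would mount the copy and install vmware-tools
    print(f"installing vmware-tools into {image_path}")

PIPELINES: Dict[str, List[PipelineStep]] = {
    "ec2": [install_ec2_tools],
    "vmware": [install_vmware_tools],
}

def ingest(image_path: str, provider: str) -> str:
    """Copy the source image, run the provider's steps on the copy only,
    and hand back the transformed copy ready for upload."""
    work_copy = f"{image_path}.{provider}"
    shutil.copyfile(image_path, work_copy)
    for step in PIPELINES.get(provider, []):
        step(work_copy)
    return work_copy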
In my personal opinion, borrowing from a little grid computing experience: you'll need this approach, and some generalized tools around it, to be successful.
For some in-house lessons learned from MRG, take a look at Condor's file declarations in job submission, its use of Stork to pull data transfer management out of the job execution logic behind a URI/plugin system, its use of job routing to transform job descriptions for the new/different grid they're going into, etc. You might also want to look at other grant-funded research in data grid systems like SRB or iRODS (irods.org). You'll need to either build or leverage existing technology for:
* A plug-in oriented data transfer management service that understands things like network bandwidth limiting, concurrency limits associated with various file service implementations, etc. (see the sketch after this list).
* A rules engine for transforming the images.
* A resource monitoring/management lingua franca that the rules engine can use.
* An audit trail for proof of control over the disk images (auditing both transfers and changes made).
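For the first and last bullets, here's a rough sketch of what I mean: Stork-style dispatch on the URI scheme, a per-plugin concurrency cap, and an audit record per transfer. Again, every name (TransferPlugin, PLUGINS, AUDIT_LOG) is illustrative, not any real API:

import threading
import time
from urllib.parse import urlparse

AUDIT_LOG = []   # stand-in for a durable audit trail of transfers

class TransferPlugin:
    """One plugin per URI scheme, each with its own concurrency cap
    (an NFS server may tolerate far fewer parallel streams than S3)."""
    def __init__(self, scheme: str, max_concurrent: int):
        self.scheme = scheme
        self._slots = threading.Semaphore(max_concurrent)

    def transfer(self, src: str, dst: str) -> None:
        with self._slots:                  # enforce the concurrency limit
            started = time.time()
            # a real plugin moves the bytes here, honoring bandwidth caps
            AUDIT_LOG.append({"scheme": self.scheme, "src": src,
                              "dst": dst, "started": started,
                              "finished": time.time()})

PLUGINS = {
    "s3": TransferPlugin("s3", max_concurrent=8),
    "nfs": TransferPlugin("nfs", max_concurrent=2),
}

def transfer(src: str, dst: str) -> None:
    # Stork-style dispatch: pick the plugin from the destination URI scheme
    PLUGINS[urlparse(dst).scheme].transfer(src, dst)

So a call like transfer("nfs://builder/f13.img", "s3://bucket/f13.img") would queue behind at most eight other concurrent S3 pushes, and the audit trail would record the transfer either way.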
Again, my $.02. Just glad you guys are looking at addressing the problem set.
-- Lans Carstensen