Hi,
I was just chatting with Hugh, and we went another round or two on the 'Is libcloud stateless or stateful?' question.
Mostly to clear my head, some thoughts on that: there are two usage scenarios I'd want to capture with libcloud:
1. Steve the Scripter writes a few Python scripts to make his miserable, awful, pitiful existence as a developer easier; some of these scripts need to start/stop virtual machines in the cloud (maybe a JBoss cluster from Ivan the Integrator) - Steve uses libcloud to make sure that he can move from cloud A to cloud B without having to rewrite all those scripts.
2. Don the Developer is writing a cloud portal UI. He uses libcloud in the backend of his portal so that he can claim 100% compatibility with N (for some large value of N) popular clouds.
I think these two uses cover a large percentage of the current EC2 API uses. They differ tremendously in the amount of deployment pain Don and Steve are willing to go through. Steve is perfectly happy with a very basic API, but would prefer libcloud setup to be as simple as possible, ideally just installing a library. Don wants to enable lots of spiffy features in his cloud portal UI (SSO, accounting, image mgmt, ...) and is happy with a more complex setup involving lots of auxiliary services (DB, web server, ...).
To serve Steve, we'd want a stateless, thin library; Steve might have to use different ways to authenticate from his script for different clouds, and therefore change that part of his script when he switches to a different cloud backend, but he'll be happy with that as long as those changes are localized and don't require a big rewrite of his scripts. IOW, authentication should be the only area where he needs to be aware of the concrete backend he's using - this usage is very close to what libvirt does.
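To make the "only authentication is backend-specific" idea concrete, here is a minimal sketch. Everything in it is hypothetical: the names `Session`, `auth_cloud_a`, `auth_cloud_b` and `start_vm` are made up for illustration and are not a proposed libcloud API, and the credential shapes are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical handle returned by any backend-specific login."""
    backend: str
    token: str


def auth_cloud_a(access_key: str, secret_key: str) -> Session:
    # Assumed key-pair style login; a real backend would call out to the cloud.
    return Session("cloud-a", f"{access_key}:{secret_key}")


def auth_cloud_b(username: str, password: str) -> Session:
    # Assumed username/password style login for a different cloud.
    return Session("cloud-b", f"{username}/{password}")


def start_vm(session: Session, template: str) -> str:
    """Backend-neutral call: identical no matter which cloud issued the session."""
    return f"{session.backend}:{template}:running"


# The only line Steve edits when he moves from cloud A to cloud B:
session = auth_cloud_a("AKEY", "SKEY")
# session = auth_cloud_b("steve", "secret")

print(start_vm(session, "jboss-node"))
```

The rest of the script only ever sees a `Session`, so switching clouds stays a one-line change.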
Ideally, libcloud deployment for Steve involves little more than installing an RPM (and maybe starting a service) - but it should definitely not require setting up a full-blown DB server, manually loading a schema etc.
How should such a stateless API be exposed? I think an in-process DSO-style API is out of the question, since it would either require doing SOAP calls from C (yuck) or tying the API to a specific implementation language. That means that it needs to be exposed by some RPC mechanism, i.e. it will require running a server; and since we already want to expose a REST API, some sort of web server makes the most sense.
To make both Don and Steve happy, we should consider two different ways to run the server: stateless with reduced functionality as a fairly direct API-passthrough, and stateful with value-add features like accounting etc. 'Stateless' might actually not be completely stateless: we could still offer simple deployment if we use a sqlite backend to store some basic data like auth credentials for different clouds. (Completely stateless has the disadvantage that backend cloud credentials only live for the lifetime of the server, which leads to interesting error handling for the client.)
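As a rough illustration of the 'not completely stateless' variant, the sketch below keeps backend credentials in a sqlite file using Python's standard `sqlite3` module. The table layout and the function names (`open_store`, `register`, `list_clouds`) are assumptions made up for this example, and storing secrets in plain text is for illustration only.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the credential store; the schema is created on demand."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS credentials ("
        "cloud TEXT PRIMARY KEY, username TEXT NOT NULL, secret TEXT NOT NULL)"
    )
    return conn


def register(conn, cloud, username, secret):
    """Store or overwrite the credentials for one backend cloud."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO credentials VALUES (?, ?, ?)",
            (cloud, username, secret),
        )


def list_clouds(conn):
    """Return the names of all clouds with registered credentials."""
    rows = conn.execute("SELECT cloud FROM credentials ORDER BY cloud")
    return [row[0] for row in rows]


store = open_store()  # pass a file path instead to survive server restarts
register(store, "cloud-b", "steve", "hunter2")
register(store, "cloud-a", "steve", "s3cret")
print(list_clouds(store))
```

With a file path instead of `:memory:`, the credentials outlive the server process, and there is still no DB server to set up and no schema to load by hand.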
For both Don and Steve a very basic session to start a bunch of VMs might look like:
1. Authenticate to libcloud
2. Register credentials for backend clouds
3. List available backend clouds[1] (Steve has Alzheimer's and forgot what happened in step (2))
4. List available templates[2] for each backend cloud
5. Run some VMs based on the available templates
6. Shut the VMs down
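The session above could be sketched roughly as follows. `FakeLibcloud` is an in-memory stand-in written only for this example; every class, method and template name in it is hypothetical, and a real client would talk to the server over REST instead.

```python
class FakeLibcloud:
    """In-memory stand-in for the server; all names are hypothetical."""

    def __init__(self):
        self._user = None
        self._clouds = {}  # cloud name -> backend credentials
        self._vms = {}     # vm id -> state

    def _require_login(self):
        if self._user is None:
            raise PermissionError("authenticate first")

    def authenticate(self, user, password):
        # A real server would verify the password; the stand-in accepts anything.
        self._user = user

    def register_credentials(self, cloud, credentials):
        self._require_login()
        self._clouds[cloud] = credentials

    def list_clouds(self):
        self._require_login()
        return sorted(self._clouds)

    def list_templates(self, cloud):
        self._require_login()
        if cloud not in self._clouds:
            raise KeyError(cloud)
        return ["small", "large"]  # made-up template names

    def run_vm(self, cloud, template):
        if template not in self.list_templates(cloud):
            raise ValueError(template)
        vm_id = f"vm-{len(self._vms) + 1}"
        self._vms[vm_id] = "running"
        return vm_id

    def shutdown_vm(self, vm_id):
        self._require_login()
        self._vms[vm_id] = "stopped"

    def vm_state(self, vm_id):
        return self._vms[vm_id]


api = FakeLibcloud()
api.authenticate("steve", "secret")                     # step 1
api.register_credentials("cloud-a", {"key": "AKEY"})    # step 2
api.register_credentials("cloud-b", {"user": "steve"})
clouds = api.list_clouds()                              # step 3
templates = {c: api.list_templates(c) for c in clouds}  # step 4
vms = [api.run_vm(c, templates[c][0]) for c in clouds]  # step 5
for vm in vms:                                          # step 6
    api.shutdown_vm(vm)
```

Note that after step 2 nothing in the script depends on which concrete cloud is behind each name.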
David
[1] At that point, a cloud is an abstraction that consists of things like a cost/billing model and resource quotas; what specific cloud we are talking about should be immaterial
[2] This is one place in the API where we need to decide whether we want to expose the concept of 'VM image' as distinct from 'VM template' in the API