On Wed, Jul 20, 2011 at 12:20:43PM +0100, Mark McLoughlin wrote:
On Tue, 2011-07-19 at 14:11 -0400, Hugh Brock wrote:
Hello all.
With release 0.3.0 about ready to ship it seems like a good time to start talking about features we'd like to see for 0.4.0. I'd like to continue the three-month release cycle we've been on, so that puts our next one around mid-October.
You know, looking at this list, I really wonder - why not release earlier and oftener?
we don't have a massive amount of sub-projects; we should be able to turn around a release very quickly and the more often we do it, the smoother the process will be
we're not intentionally breaking anything on a regular basis, so there shouldn't be any reason not to release more often
shorter release cycles mean the goals for each cycle won't be as far-reaching and hand-wavy; instead, each cycle would have a much more specific set of tasks and features
Why not aim for e.g. every 3 weeks? Or perhaps every 2 weeks?
I have no problem with releasing more often, as long as we do that without incurring the overhead of a full QE cycle. If people will be happy with a minor/major type of release setup where we do frequent releases but only do a really solid release once every 3 months or so, I see no problem with that.
Below are some obvious buckets, please feel free to suggest additional features large or small.
Finally, note I'm not making any claim that the list below is achievable in the timeframe we're talking about (although I would hope it's not that far from what is achievable). I'm more thinking in terms of what would make our 0.4.0 release seem like a coherent whole, and make the largest number of upstream users interested and happy.
I'll start with Conductor features:
Authorization. We have a fair amount of authorization checking in place, but no way to actually set who can do what. Given that a central Conductor feature is the ability to control access to cloud resources, this seems like an important feature. Things we'll need to put this in place:
UX around setting permissions
UX around displaying appropriate "You can't do that" messages where required, or showing/hiding controls as appropriate
Good tests
Not much model code -- I think it's all mostly in place. Correct me if I'm wrong.
I'd characterize this all as "paying closer attention to the self-service UI".
Perhaps simply a 'create_self_service_user' rake task to go along with our 'create_admin_user' task would help an awful lot?
i.e. set up a self-service user by default for developers and encourage everyone to test the UI using both users.
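To make the "who can do what" question concrete, here is a minimal sketch of a role-based check that could drive both the "You can't do that" messages and the showing/hiding of controls. It is written in Python purely for illustration. The role names, the permission table and the `can` / `visible_controls` helpers are all assumptions and do not reflect Conductor's actual model code.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    login: str
    # Hypothetical role names; the real Conductor roles may differ.
    roles: set = field(default_factory=set)


# Hypothetical table: role -> set of (action, resource type) pairs it allows.
ROLE_PERMISSIONS = {
    "admin": {("create", "pool"), ("edit", "provider"), ("launch", "instance")},
    "self_service": {("launch", "instance")},
}


def can(user, action, resource):
    """Return True if any of the user's roles allows the action."""
    return any(
        (action, resource) in ROLE_PERMISSIONS.get(role, set())
        for role in user.roles
    )


def visible_controls(user, controls):
    """Keep only the UI controls the user is allowed to use."""
    return [c for c in controls if can(user, *c)]
```

Testing the UI as both an admin and a self-service user then amounts to checking that the two users see different sets of controls.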
Identity and encryption. Authorization doesn't do a lot of good if anyone can bumble along and impersonate anyone else, so it would be pretty nice to have at least a workaday identity and encryption setup. Conversations with potential users have suggested the following minimum features, feel free to suggest your own:
Conductor will authenticate against an LDAP server. Since most LDAP servers in the real world are Windows Active Directory, we should probably include AD in the set of servers we test against.
Fall back to local user data store, maybe? You can imagine needing a local admin user that isn't in LDAP, for example
Be able to proxy identity when talking to other things that need to know it. Checking identity when saving things to/retrieving things from Image Warehouse is the main requirement for this. I think it's getting a GSSAPI library soon which should help. We will also probably need this for Katello, when we get to talking to it. FWIW Katello is currently using two-legged OAuth for this, so I would think this would be the primary candidate for us too.
A way to encrypt the traffic between Conductor, Deltacloud API, Warehouse, and Katello. The obvious solution for this is SSL certs that are created and signed by the installer, with some way to update/revoke them.
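The "LDAP first, fall back to a local store" idea from the list above might look roughly like the sketch below. It is Python for illustration only. The two check functions are stand-ins: `ldap_check` simulates an unreachable directory rather than calling a real LDAP library, and the local store holds a plain-text password only to keep the example short (a real store would hold salted hashes).

```python
def authenticate(login, password, backends):
    """Try each (name, check) backend in order.

    Returns the name of the first backend that accepts the credentials,
    or None if none of them does.
    """
    for name, check in backends:
        try:
            if check(login, password):
                return name
        except ConnectionError:
            # Backend unreachable: fall through to the next one.
            continue
    return None


def ldap_check(login, password):
    # Stand-in for a real LDAP/AD bind; here the directory is always down.
    raise ConnectionError("directory unreachable")


# Hypothetical local store for users that are not in LDAP.
LOCAL_USERS = {"admin": "s3cret"}


def local_check(login, password):
    return LOCAL_USERS.get(login) == password
```

With `backends = [("ldap", ldap_check), ("local", local_check)]`, a local admin can still log in while the directory is unavailable. Whether an unreachable directory should silently fall back like this is a policy question the thread leaves open.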
Well, there are a few things going on here:
Integrating with existing identity providers would be nice - the common example is LDAP. If you're using Aeolus within a corporate environment which has LDAP or AD, this would be desirable. But OpenID and OAuth etc. would be nice too.
(There are lots of questions around this - e.g. policy for self-service users, whether the admin user can be in the federated identity store etc.)
Authentication, authorization and permissions in iwhd
Authentication and authorization in imagefactory - e.g. you can't have an owner for an image in iwhd, unless imagefactory knows what user is building the image
Allowing deltacloud, iwhd and imagefactory to be deployed on different machines; it's only at this point you need to encrypt the communication to each
Considerations about what other projects like Katello need if they are going to build on (parts of?) Aeolus
OK, I'm willing to admit that's a better list than the one I made...
Admin UX work
We need to give the pool, pool family, and provider management screens the same loving treatment we have given the instance management screens.
We need to make sure self-service really is sane. A big part of self service is image visibility -- i.e. who can launch what where (VMware's "Catalog" concept answers this requirement for them). A good self-service solution is going to take thinking through some use cases and some serious UX work as well.
I'd really like to see a front door to the Conductor app. I'm afraid to call it a "dashboard" because then it will never get built :). I'd love suggestions for what should appear on such a thing.
Other UX work
- I think we should be able to launch single images from Conductor without requiring a deployable XML. To make that easier for users, it would be nice if there was some UI for displaying images that are available to launch.
Absolutely.
The notion of managing single instances would be required for Conductor to expose the Deltacloud API too.
Status reporting
We should reliably display the status of a running instance and its uptime
We should start thinking about how we will handle the richer data about instance health that we will get once Matahari is in place
What kind of monitoring data are we talking about, specifically? Why are we assuming Matahari is the solution here?
Well, I think we're talking about the kind of instance health data you would get from virt-top and (possibly) virt-dmesg. Unfortunately in a cloud environment you can't get to the host and use those tools, so we have to settle for an in-instance agent, which we have been saying is going to be Matahari.
Having said all that, I don't think Conductor cares where the data comes from -- I think we really just need to start thinking about how we display it.
Users should be able to view an audit trail of events for an instance or a set of instances
Users should be able to export those events
Are we simply talking about start/stop events?
To begin with, yes -- but with better monitoring we'll have more of them.
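As a rough illustration of the audit trail and export items, the sketch below filters start/stop events by instance and writes them out as CSV. It is Python for illustration only. The event tuple layout, the column names and the `export_events` helper are assumptions, not an existing Conductor format.

```python
import csv
import io


def export_events(events, instance_ids=None):
    """Return events as CSV text, oldest first.

    events: iterable of (timestamp, instance_id, event_name) tuples.
    instance_ids: optional set limiting the export to those instances.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "instance_id", "event"])
    for ts, instance_id, event in sorted(events, key=lambda e: e[0]):
        if instance_ids is None or instance_id in instance_ids:
            writer.writerow([ts.isoformat(), instance_id, event])
    return out.getvalue()
```

Richer monitoring events would only add more rows (or columns) to the same trail.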
API
- We've been saying for a very long time that we need a real API for managing Conductor and for doing instance stuff in Conductor. If we admit that we have to manage instances that are not part of deployments, then we can also just say that the Deltacloud API we expose only works for instances. I think this is good enough for the next release.
Right, a Deltacloud API implementation in Conductor for instances and images should be the first goal.
It's tempting to think that adding the admin API is a simpler task, but I think my summary showed that it's not as straightforward as it seems:
https://fedorahosted.org/pipermail/aeolus-devel/2011-July/002883.html
Yes.
Infrastructure-around-Conductor features:
Identity and encryption. In addition to the bits that go in Conductor proper, there's going to be a lot of work in the installer and in other projects nearby.
Better self-monitoring. I'd like to see a quick shell command that will give a meaningful report of the status of all the app components.
Way better logging and error reporting.
- All components should be using syslog if at all possible
Why syslog?
My thinking here was that we should, to the extent we can, be using logging facilities we don't have to manage ourselves. I doubt syslog is appropriate for the Rails app, but I would think it would be appropriate for IWHD for example. I'm ultimately more interested in getting the logs managed, rotated properly, and put in a well-known location for support, though.
Logs should be timestamped
We should not be logging credentials or things that are potentially embarrassing
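The logging points above (timestamps, no credentials, a handler we don't manage ourselves) can be sketched with Python's standard `logging` module. This is an illustration only, not how any Aeolus component logs today. The `SECRET_RE` pattern and the `make_logger` helper are assumptions, and a real deployment would need a more careful list of sensitive fields.

```python
import logging
import re

# Hypothetical pattern for key=value pairs that should never reach a log.
SECRET_RE = re.compile(r"(password|secret|token)=\S+", re.IGNORECASE)


class RedactFilter(logging.Filter):
    """Replace credential values in the formatted message."""

    def filter(self, record):
        # Render the message first so values passed as args are covered too.
        record.msg = SECRET_RE.sub(r"\1=[REDACTED]", record.getMessage())
        record.args = ()
        return True


def make_logger(name, handler):
    """Attach a timestamped, redacting handler to the named logger."""
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    handler.addFilter(RedactFilter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Passing `logging.handlers.SysLogHandler(address="/dev/log")` as the handler would hand the records to the local syslog daemon on a typical Linux host, which then takes care of rotation and location.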
Components can be distributed across multiple machines
RHEV-M 3.0 really works as a cloud provider.
"Orchestrator" features (even though these aren't yet separate components, I've bracketed off stuff that concerns post-boot and multi-instance operations as conceptually different topics to work on)
Assemblies
- Users can define assemblies that cause the post-boot config apparatus to install software and set config parameters on instances when they check in after booting
Deployables and deployments
Users can define deployables that contain multiple assemblies.
Users can specify parameters that should be collected from a user when the user launches the deployable.
Users can direct that parameters collected from a user be interpolated in arbitrary spots in the deployable descriptor.
There is a UI for collecting parameters from the launching user
There is a mechanism for passing all the assembly and deployable config information through to the post-boot agent. (I think this could use user-data, *or* a config server.)
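One way to picture the parameter collection and interpolation items above is the Python sketch below. It is an illustration only. The descriptor layout, its element and attribute names, and the `required_params` / `render` helpers are hypothetical and do not represent the actual deployable XML format.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical deployable descriptor with $placeholders for launch-time values.
DESCRIPTOR = Template(
    '<deployable name="$name">\n'
    '  <assembly name="web" hwp="$hwp">\n'
    '    <param name="db_host">$db_host</param>\n'
    '  </assembly>\n'
    '</deployable>\n'
)


def required_params(template):
    """List the placeholder names a launch UI would have to prompt for."""
    names = set()
    for match in template.pattern.finditer(template.template):
        name = match.group("named") or match.group("braced")
        if name:
            names.add(name)
    return sorted(names)


def render(template, values):
    """Interpolate user-supplied values, XML-escaping each one first."""
    safe = {k: escape(v, {'"': "&quot;"}) for k, v in values.items()}
    # substitute() raises KeyError if a required parameter is missing.
    return template.substitute(safe)
```

The rendered text is what would then be handed to the post-boot agent, whether through user-data or a config server.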
Authorization
- Should there be some way of restricting the assemblies/deployables that a user can launch on particular hardware?
Okay, some other things that occur to me:
Move to Rails 3
Re-instate searching in the UI, possibly using scoped_search
At least one real-life example of templates and deployables in use
All good choices. I hope we'll have Rails 3 sorted by the beginning of this iteration.
Thanks, I will incorporate your thoughts in the next revision of the doc.
--H