OK, so I've been mulling this over after some talks with Jay last week. Thought I'd put something in writing. This isn't intended to be a solution; rather, the intent is to capture thoughts on what may be needed for the design, and then we can discuss the best implementation.
It is not clear to me where to draw the line for the component responsible for various pieces of functionality. I will list *most everything* I can think of here and the ensuing discussions can help sort out where that functionality should lie.
In lieu of hard requirements, I have listed some assumptions that I'm starting with to frame the discussion. If you feel that any of these are incorrect, please speak up so we don't spend too much time on bad assumptions.
Assumptions
- For the sake of this discussion, I am listing a lot of functionality that may not be in the initial release. Best to consider it now and make sure the design can handle it. Also, some of this functionality will most likely not end up in the monitoring component, but it still needs to be discussed, if for no other reason than to explain what the trade-offs are...
- The use of this monitoring data will need to cover one to many users. a) There is no real upper bound to the number of users at this point in time; however, we should decide on the order of magnitude (1K, 100K, etc.). b) We must be able to provide data to each individual user concurrently. c) We must provide the capability to arrange users and aggregate data in groups and in a hierarchical manner. (This is an example of functionality that some Deltacloud component should provide; I mention it here because it could impact the design even if the "ownership" ends up elsewhere. Best to decide "where does this functionality belong?" in this discussion and understand its ramifications.)
[jg] I believe this one is outside the scope of spectre, and would be handled by pools/users/permissions in deltacloud portal (though perhaps there could be some simplified representation here of this, if it makes sense.) I agree that it may have an impact on our design here though.
- Billing does not imply a simple end-of-the-month run, but rather the ability to monitor usage and calculate a running "balance" (though it does not need to consider payments outside of "reset" functionality). a) Down the road we will want to be able to provide some enforcement of spending (Jay is only budgeted to spend $100, don't let him go over it). As Jay has pointed out, this is most likely outside the scope of Spectre (but this is why we need to understand the entire problem), except for: 1. Spectre needs to collect whatever data the portal requires for billing. 2. Archiving of this information beyond the time a provider may do so needs to be accounted for by Spectre and configurable per provider.
- Data collection from the various clouds will be a "pull" event. In my mind, pull is preferred because it allows the app to control the load it generates. With a push model, it could lead to server overload.
[jg] I had not considered this as a potential issue. There had been some talk of hoping for push to be supported by some vendors in the future. Maybe it is just my web background, but polling feels less efficient to me.
a) It also needs to be dynamic, so as the user adds additional VMs, the data for those is also gathered. "Poll" is still technically what is happening; we just need a dynamic polling mechanism.
[jg] Some kind of trigger may be a good idea, but could we not also get new vms and such on the next scheduled 'poll' of information?
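To make the "dynamic polling mechanism" idea concrete, here is a minimal sketch. All class and method names (`DynamicPoller`, `list_instances`, `fetch_stats`) are invented for discussion, not a real API; the point is just that each scheduled poll re-lists the instances, so newly added VMs are picked up on the next cycle without a separate trigger.

```ruby
# Hypothetical sketch: on each polling cycle we re-discover the current
# set of VMs, then fetch stats for each. New VMs show up automatically.
class DynamicPoller
  def initialize(cloud)
    @cloud = cloud # object that knows how to talk to one provider
  end

  # One polling cycle: discover current VMs, then fetch stats for each.
  def poll_once
    @cloud.list_instances.map do |vm_id|
      { vm: vm_id, stats: @cloud.fetch_stats(vm_id) }
    end
  end
end

# Toy in-memory "cloud" used only to exercise the poller.
class FakeCloud
  def initialize; @vms = ["vm-1"]; end
  def add_vm(id); @vms << id; end
  def list_instances; @vms.dup; end
  def fetch_stats(_id); { cpu: 0.5 }; end
end

cloud  = FakeCloud.new
poller = DynamicPoller.new(cloud)
first  = poller.poll_once
cloud.add_vm("vm-2")      # user launches a new VM between polls
second = poller.poll_once # the next scheduled poll picks it up
```

This matches Jay's point: no trigger is strictly required as long as discovery happens as part of the regular poll.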
- For monitoring, data will be collected on all "active" Deltacloud users. Here "active" user implies that the user is logged in, or someone is monitoring the group that the user is part of. a) This is just a "stake in the ground." We really should investigate whether it's more efficient to get all the data all the time or only as needed. b) If we only get data as needed, we will need some extra logic to determine if there are "holes" in the data and catch up as needed. ** Jay has mentioned that some flexibility should be built in here; there may be providers who charge for accessing the data based on the access rate.
- In general, I can see a need for "business logic" specific to each vendor's cloud. This will include billing rates, types of data available, etc. I'm wondering if there is a "global" way to handle this in Deltacloud already... a) At some level, the data retrieval / monitoring aspects will need to have cloud-specific knowledge. For instance, CloudWatch keeps the data for two weeks. Exactly where in the design this information needs to be kept still needs to be determined. My main goal in thinking of a business layer is to allow for vendor-specific "modules" to be created and used w/o any changes to the underlying stats layer.
[jg] Agreed, I see this potentially varying a vast amount from one provider to the next. However, I think the business logic for spectre should be limited to retrieving the information available, with any provider-specific logic being more for aggregation or other data views (though perhaps there could be provider-specific modules on the retrieval side of the api). Billing I see being handled in the portal (or possibly even the framework, since that has provider-specific drivers also, so it could encapsulate billing logic in a common api that the portal could use).
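One way to picture the vendor-specific "modules" idea: each provider module answers the same questions (retention window, available metrics), so the stats layer never needs provider-specific branches. Everything below is hypothetical naming for discussion; the two-week figure for CloudWatch comes from the point above, the rest is made up.

```ruby
# Illustrative only: provider modules expose a common interface so the
# stats layer stays provider-agnostic.
module ProviderModule
  class EC2CloudWatch
    def retention_days; 14; end # CloudWatch keeps data ~2 weeks
    def available_metrics; [:cpu, :network_in, :network_out]; end
  end

  class GenericCloud
    def retention_days; 7; end # made-up default for illustration
    def available_metrics; [:cpu]; end
  end
end

# The stats layer only talks to the common interface, e.g. to decide
# whether data must be archived before the provider drops it:
def must_archive_before?(provider, age_in_days)
  age_in_days >= provider.retention_days
end

ec2 = ProviderModule::EC2CloudWatch.new
```

Whether these modules live in Spectre or in the framework's existing drivers (per Jay's comment) is exactly the open question.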
- It is assumed that any Red Hat cloud product will be treated like JAC (Just Another Cloud).
- Users: this could be one of the more complicated pieces to get right. My initial thought is that a single user can have many different cloud associations. However, it's not clear if we need to track the case where a single cloud account could be shared between users and we need to account for each user's usage. Again, please keep in mind that this document is intended to cover all the functionality we see down the road, so it may not be an initial release goal, but we may need to plan for it.
So my thoughts are that the user management, tracking, etc. is done under some other Deltacloud component, and Spectre will just need to be able to store and retrieve data based on some type of unique user / cloud identifier. (Hey, it's hard to pull implementation out of design.)
[jg] To help clarify (and make sure I am right), I believe with what we have so far on the portal side of the design, we would have cloud account X. As far as the provider is concerned, X is the only account. Portal, however, may have N accounts associated with X. Each of these accounts can have 0 or more VMs (and possibly other things), so on the monitoring side, if we can find things based on account X and some identifier for the actual item requested, I think that will get us what we need w/o building in too much business logic that is already handled elsewhere.
However, we still need to define things like a data request. Should we require the caller to be very specific about the user / cloud combinations? And lots of other stuff like that...
- Spectre will need to be able to provide data to a caller for a specific
cloud in a manner that is more efficient than recursively walking a list of users. Thus Spectre must be able to retrieve data based on user or cloud or a combination of both.
- I would expect that the Spectre functionality is provided in a manner that would allow for it to be distributed, federated, or used as a "module" by some other application. My initial thought would be a web-service type of architecture, but let's not get ahead of ourselves...
[jg] Agreed on both points.
Straw man proposal (http://en.wikipedia.org/wiki/Straw_man_proposal)
In general, we need to define something that stores and retrieves data, that has clear APIs for both the top and bottom layers. For this discussion, I would propose that we view Spectre with three distinct layers:
- On the "top" layer, we need to define methods for data insertion and retrieval.
- The middle layer is basically the traffic cop; it is also the least defined.
- The bottom layer provides the interaction with the data store(s). For the bottom layer, we need to define an API that we will call to interact with the data store(s).
I will dive a little deeper into these layers below.
I am assuming that specific interfaces for both the top and bottom layers will need to be created in order to interface to different cloud vendors or data stores. For the sake of discussion, I will refer to these as "modules" although the intent is to help facilitate the discussion, not to imply a specific implementation.
One of the main goals of the resultant architecture is that a new vendor specific module can be created for either the top or bottom layers without impacting design / code of the middle Stats layer.
Let's try this bottom-up.
The goal of defining an API at the bottom of this stack is to allow for different data stores to be used. By clearly defining the API, anyone should be able to write their own interface layer to the data store. Examples of data stores would be RRD, memcache, MySQL, etc.
[jg] I think I agree with this sentiment, but believe you are proposing one level 'down' from what I originally thought. So, we already have the concept of provider 'drivers' for collecting the info. Let's say, for the sake of argument, that we extended our existing 'stats' module to include write functionality. I think the driver would then call stats to save whatever it had collected (maybe too implementation-specific, but not sure how else to describe). Stats in turn would call the actual 'save' against whatever datastore is enabled (this being the additional layer I think you are describing). Is this what you meant?
This would seem to imply that the data-store-specific module must be able to translate a request for "Mark's EC2 usage data from last week" into the specific language needed to access its data store. It will need to format the response from its data store back into a common (but yet to be defined) intermediate data format and pass it "up the stack".
[jg] I was not really thinking of multiple potential languages, mainly just ruby, but this seems like a reasonable requirement as long as we keep it as simple as possible so we don't get bogged down in implementation. First thought would be xml, as most languages can handle that.
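A rough sketch of that bottom-layer contract: every data store module accepts the same common request object and returns rows in a shared intermediate format (shown here as plain hashes, with the store being a trivial in-memory array; `CommonRequest` and all field names are invented for this discussion, and whether the wire format ends up as XML is still open).

```ruby
# Hypothetical common request passed down to any data store module.
CommonRequest = Struct.new(:user, :cloud, :from, :to)

# A toy data store module: translates the common request into its own
# lookup, then hands rows back "up the stack" in the shared
# intermediate format (a hash per sample, for illustration).
class InMemoryStore
  def initialize(rows); @rows = rows; end

  def query(req)
    @rows.select { |r| r[:user] == req.user && r[:cloud] == req.cloud }
         .map    { |r| { user: r[:user], cloud: r[:cloud], cpu: r[:cpu] } }
  end
end

store = InMemoryStore.new([
  { user: "mark", cloud: "ec2", cpu: 0.7 },
  { user: "jay",  cloud: "ec2", cpu: 0.2 },
])
rows = store.query(CommonRequest.new("mark", "ec2", nil, nil))
```

An RRD or MySQL module would implement the same `query(req)` method against its own storage; the stats layer above never sees the difference.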
Configuration data (user credentials, db name, directory structure, etc.)
- Two ways to go here:
a) Each data store module should be responsible for the required configuration information. This should be loaded by the module when it is initialized. In other words, the data service should not need to know any of the details. b) Alternatively, the API should include a "required" set of calls that the service can call to retrieve the configuration data the module requires and store the information. (The module indicates it needs dbname, userid, pwd, etc.) This way the overall service would still be used for controlling the configuration. The parameters would need to be supplied by the module so that the data service could remain data-store agnostic.
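Option (b) could look something like the following sketch, where a module advertises the config keys it needs and the service supplies them, so the service stays data-store agnostic. The class and key names are illustrative only.

```ruby
# Hypothetical data store module that declares its required config keys
# so the overall service can collect and supply them (option b).
class MysqlStoreModule
  def required_config; [:dbname, :userid, :pwd]; end

  def configure(values)
    missing = required_config - values.keys
    raise "missing config: #{missing.inspect}" unless missing.empty?
    @config = values
  end

  def configured?; !@config.nil?; end
end

mod = MysqlStoreModule.new
mod.configure(dbname: "stats", userid: "svc", pwd: "secret")
```

With option (a) the same `configure` work would instead happen inside the module's initializer, reading its own config source.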
Not sure if we should allow the use of multiple modules simultaneously. It would be nice to use a memcache-like mechanism for some things to avoid more expensive lookups.... This could be something like a write through cache. Lots to discuss...
[jg] Initially I thought just one for storage, but upon further reflection, it could be really nice to be able to 'layer' this, imo. So perhaps check a cache of some sort first, then move down to true storage layer if data is not found.
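The 'layered' / write-through-cache idea could be as simple as this sketch: check a fast layer first, fall back to the true storage layer on a miss, and write to both on save. Plain hashes stand in for memcache and the real store; all names are hypothetical.

```ruby
# Illustrative write-through layering: a fast cache in front of the
# real data store. Hashes stand in for both layers here.
class LayeredStore
  def initialize(cache, backing)
    @cache, @backing = cache, backing
  end

  def write(key, value) # write-through: both layers get the data
    @cache[key] = value
    @backing[key] = value
  end

  def read(key) # cache first; on a miss, fill the cache from backing
    @cache.fetch(key) { @cache[key] = @backing[key] }
  end
end

cache, backing = {}, {}
store = LayeredStore.new(cache, backing)
store.write("mark/ec2", 42)
backing["jay/ec2"] = 7 # present only in the backing store
```

The open questions above (how many layers, which data is worth caching) are orthogonal to this shape.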
Middle Layer
This is the traffic cop of the data service. It could be fairly lightweight, just taking input, validating it, and passing it through. For data requests, it could do some coalescing of requests, provide the ability to look in a local data cache, etc.
One major area that will need to be looked at is security. This layer would seem to be the location where any security would be implemented. Not sure how much we want or need.
If this layer does any work on the data, it will be on the "intermediate format".
Top Layer
This is the layer that is called to store or retrieve data. My initial thoughts are that while these two types of functionality are at the same level, they are drastically different in what they do, so I'll treat them separately.
[jg] One thought, since this is what everything would call, if we have auth
in the middle layer, then top needs to accept credentials and pass them down to middle layer.
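Jay's point about credentials passing through could be sketched like this, with the top layer accepting credentials and simply handing them to the middle layer, which owns the auth check. Token-based auth here is an arbitrary stand-in; nothing about the real auth scheme is decided.

```ruby
# Hypothetical layering: the middle layer owns any auth decision; the
# top layer just accepts credentials and passes them down.
class MiddleLayer
  def initialize(valid_tokens); @valid = valid_tokens; end

  def handle(request, token)
    raise "unauthorized" unless @valid.include?(token)
    { ok: true, request: request }
  end
end

class TopLayer
  def initialize(middle); @middle = middle; end

  def retrieve(request, credentials)
    @middle.handle(request, credentials) # pass-through, no auth logic here
  end
end

top = TopLayer.new(MiddleLayer.new(["good-token"]))
```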
Data Input API ("mystery data collection module")
This provides an API for storing data in the data service. It will be called by the cloud specific module. My thoughts are that these modules will be used to pull data from the cloud and push it into the data store.
It is the responsibility of the module to take the data from the cloud and translate it into the intermediate data format.
It should be possible for many different modules to access this API in parallel.
Data Retrieval
There is clearly a need to provide data back to a caller in different formats. This API will need to support that. I think the main design decision is how to structure this. I am almost thinking that sticking with the module design will work well. This will allow end users to add their own modules. It will also allow more levels of data processing to be added w/o polluting the main API. For instance, you could create a module to compute rolling averages.
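The rolling-average example could be one such pluggable module, layered on top of whatever the base retrieval call returns, without touching the main API. The class name and the shape of the input are invented for illustration.

```ruby
# Hypothetical retrieval module: post-processes samples returned by the
# base retrieval API into rolling averages, without changing that API.
class RollingAverageModule
  def initialize(window); @window = window; end

  # samples: oldest-first array of numeric samples.
  def process(samples)
    samples.each_cons(@window).map { |w| w.sum / w.size.to_f }
  end
end

avg3   = RollingAverageModule.new(3)
result = avg3.process([1, 2, 3, 4, 5]) # => [2.0, 3.0, 4.0]
```

Other end-user modules (unit conversion, downsampling, format translation) would plug in the same way.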
So this layer will need some thought to choose the right solution for future needs and maintainability. (Hint: it's easier to drop support for a module than to change the main API down the road...)
This level must be able to translate a request for "Mark's EC2 usage data from last week" into the intermediate language.
It should go w/o saying that this API must support concurrent access.
Higher Level questions
So some design questions that will hopefully lead us to pick the right solution...
- do we need to provide synch APIs, asynch APIs or both ?
[jg] my inclination would be both.
- do we support a data stream vs. "one shot"? (For instance, do we provide a call to allow a continuous stream of data into or out of Spectre?)
[jg] I think this would be nice to allow, so all clients do not have to poll.
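A stream interface could be as little as a subscribe/publish pair next to the one-shot calls, so clients register a block and receive each new sample as it arrives instead of polling. Sketch only; names are made up.

```ruby
# Hypothetical streaming interface: clients subscribe a block and get
# each new sample pushed to them, instead of polling one-shot calls.
class StatStream
  def initialize; @subscribers = []; end
  def subscribe(&blk); @subscribers << blk; end
  def publish(sample); @subscribers.each { |s| s.call(sample) }; end
end

stream   = StatStream.new
received = []
stream.subscribe { |s| received << s } # continuous stream of data out
stream.publish(cpu: 0.9)               # each new sample is pushed
stream.publish(cpu: 0.4)
```

A one-shot call would instead return a single result set and be done; both could share the same underlying retrieval path.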
- how long should we be storing data ?
Next Steps
- Start discussions based on the above content
- Identify vendors and investigate the requirements for getting data from different clouds (EC2, VMware, RHEV-M, Rackspace?)
- Look at high-level questions, build requirements
spectre-devel mailing list spectre-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/spectre-devel