OK, so I've been mulling this over after some talks with Jay last week, and thought I'd put something in writing. This isn't intended to be a solution; rather, the intent is to capture thoughts on what may be needed for the design so we can then discuss the best implementation.
It is not clear to me where to draw the line between the components responsible for various pieces of functionality, so I will list nearly everything I can think of here and let the ensuing discussions sort out where each piece should live.
In the absence of hard requirements, I have listed some assumptions I'm starting with to frame the discussion. If you feel any of these are incorrect, please speak up so we don't spend too much time building on bad assumptions.
Assumptions
-----------
1) For the sake of this discussion, I am listing a lot of functionality that may not be in the initial release. It is best to consider it now and make sure the design can handle it. Some of this functionality will most likely not end up in the monitoring component, but it still needs to be discussed, if for no other reason than to explain what the trade-offs are...
2) The use of this monitoring data will need to cover one to many users.
   a) There is no real upper bound on the number of users at this point in time. However, we should decide on the order of magnitude (1K, 100K, etc.).
   b) It must be able to provide data to each individual user concurrently.
   c) It must provide the capability to arrange users and aggregate data in groups and in a hierarchical manner. (This is an example of functionality that some Deltacloud component should provide; I mention it here because it could impact the design even if the "ownership" ends up elsewhere. Best to decide "where does this functionality belong?" in this discussion and understand its ramifications.)
3) Billing does not imply a simple end-of-month run, but rather the ability to monitor usage and calculate a running "balance" (though it does not need to consider payments beyond "reset" functionality).
   a) Down the road we will want to provide some enforcement of spending (Jay is only budgeted to spend $100; don't let him go over it). As Jay has pointed out, this is most likely outside the scope of Spectre (but this is why we need to understand the entire problem), except that:
      1. Spectre needs to collect whatever data the portal requires for billing.
      2. Archiving this information beyond the time a provider retains it needs to be accounted for by Spectre and be configurable per provider.
4) Data collection from the various clouds will be a "pull" event. In my mind, pull is preferred because it allows the app to control the load it generates; a push model could lead to server overload.
   a) It also needs to be dynamic, so as the user adds additional VMs, the data for those is gathered as well. So "poll" is still technically what is happening; we just need a dynamic polling mechanism.
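To make the dynamic-polling idea concrete, here's a rough Python sketch (all names here are made up for illustration, not a proposed API): the poller re-reads the VM list on every cycle, so newly added VMs get picked up automatically, and the sleep interval keeps the load under our control.

```python
import time

class DynamicPoller:
    """Hypothetical pull-based poller. The callables are placeholders for
    whatever cloud-specific module actually lists VMs and fetches stats."""

    def __init__(self, list_active_vms, fetch_stats, interval_secs=60):
        self.list_active_vms = list_active_vms  # returns current VM ids
        self.fetch_stats = fetch_stats          # pulls stats for one VM
        self.interval_secs = interval_secs

    def poll_once(self):
        """One pull cycle: refresh the VM list, then fetch each VM's data.
        New VMs appear here without any re-registration step."""
        results = {}
        for vm_id in self.list_active_vms():
            results[vm_id] = self.fetch_stats(vm_id)
        return results

    def run(self):
        while True:
            self.poll_once()
            time.sleep(self.interval_secs)  # we control the load we generate
```

The point is just that "dynamic" falls out naturally from re-listing on each cycle rather than maintaining a fixed registration list.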
5) For monitoring, data will be collected on all "active" Deltacloud users. Here an "active" user is one who is logged in, or one who belongs to a group that someone is monitoring.
   a) This is just a "stake in the ground." We really should investigate whether it's more efficient to get all the data all the time or only as needed.
   b) If we only get data as needed, we will need some extra logic to determine if there are "holes" in the data and catch up as needed.
   ** Jay has mentioned that some flexibility should be built in here; there may be providers who charge for accessing the data based on the access rate.
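The "holes" check in (b) could be as simple as comparing consecutive timestamps against the expected sampling interval. A minimal sketch, assuming timestamps in seconds and a fixed interval (both assumptions, since we haven't defined the data model yet):

```python
def find_gaps(timestamps, interval):
    """Return (start, end) pairs wherever consecutive samples are more
    than one expected interval apart; these are the spans to backfill.
    Assumes `timestamps` is sorted and expressed in seconds."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > interval:
            gaps.append((prev, cur))
    return gaps
```

Whatever catch-up logic we build would then re-pull just those spans, which also matters for providers who charge by access rate.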
6) In general, I can see a need for "business logic" specific to each vendor's cloud. This will include billing rates, types of data available, etc. I'm wondering if there is already a "global" way to handle this in Deltacloud...
   a) At some level, the data retrieval / monitoring aspects will need cloud-specific knowledge. For instance, CloudWatch keeps its data for two weeks. Exactly where in the design this information needs to be kept is still to be determined. My main goal in thinking of a business layer is to allow vendor-specific "modules" to be created and used without any changes to the underlying stats layer.
7) It is assumed that any Red Hat cloud product will be treated like JAC (Just Another Cloud).
8) Users: this could be one of the more complicated pieces to get right. My initial thought is that a single user can have many different cloud associations. However, it's not clear if we need to track the case where a single cloud account is shared between users and we need to account for each user's usage. Again, please keep in mind that this document is intended to cover all the functionality we see down the road, so this may not be an initial release goal but we may need to plan for it.
So my thoughts are that user management, tracking, etc. is done under some other Deltacloud component, and Spectre will just need to be able to store and retrieve data based on some type of unique user / cloud identifier. (Hey, it's hard to pull implementation out of design.)
However, we still need to define things like a data request. Should we require the caller to be very specific about the user / cloud combinations? And lots of other stuff like that...
9) Spectre will need to be able to provide data to a caller for a specific cloud in a manner that is more efficient than recursively walking a list of users. Thus Spectre must be able to retrieve data based on user or cloud or a combination of both.
10) I would expect the Spectre functionality to be provided in a manner that would allow it to be distributed, federated, or used as a "module" by some other application. My initial thought would be a web service type of architecture, but let's not get ahead of ourselves...
Straw man proposal (http://en.wikipedia.org/wiki/Straw_man_proposal)
------------------
In general, we need to define something that stores and retrieves data, with clear APIs for both the top and bottom layers. For this discussion, I propose that we view Spectre as three distinct layers:
1) On the "top" level, we need to define methods for data insertion and retrieval.
2) The middle layer is basically the traffic cop; it is also the least defined.
3) The bottom level provides the interaction with the data store(s). For this level, we need to define an API that we will call to interact with the data store(s).
I will dive a little deeper into these layers below.
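As a very rough Python sketch of how the three layers could relate (everything here is illustrative; the class and field names are made up, and the in-memory store just stands in for RRD / MySQL / whatever we pick):

```python
class InMemoryStore:
    """Stand-in for the bottom layer: a data-store-specific module that
    speaks the (yet to be defined) intermediate format."""

    def __init__(self):
        self.records = []

    def write(self, record):
        self.records.append(record)

    def query(self, user=None, cloud=None):
        # Filter by user and/or cloud; None means "don't filter on this".
        return [r for r in self.records
                if (user is None or r["user"] == user)
                and (cloud is None or r["cloud"] == cloud)]

class StatsService:
    """Middle layer, the "traffic cop": validates input and passes it
    through to whatever store module it was configured with."""

    def __init__(self, store):
        self.store = store

    def insert(self, record):
        if "user" not in record or "cloud" not in record:
            raise ValueError("record must identify user and cloud")
        self.store.write(record)

    def retrieve(self, **criteria):
        return self.store.query(**criteria)
```

The top-level insertion/retrieval APIs would then be thin entry points over `StatsService`, and swapping the store never touches the middle layer.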
I am assuming that specific interfaces will need to be created for both the top and bottom layers in order to interface with different cloud vendors or data stores. For the sake of discussion, I will refer to these as "modules," although the intent is to facilitate the discussion, not to imply a specific implementation.
One of the main goals of the resulting architecture is that a new vendor-specific module can be created for either the top or bottom layer without impacting the design / code of the middle stats layer.
Let's try this bottom up.
-------------------------
The goal of defining an API at the bottom of this stack is to allow different data stores to be used. With a clearly defined API, anyone should be able to write their own interface layer to the data store. Examples of data stores would be RRD, memcache, MySQL, etc.
This would seem to imply that the data store specific module must be able to translate a request for "Mark's EC2 usage data from last week" into the specific language needed to access its data store. It will also need to format the response from its data store into a common (but yet to be defined) intermediate data format and pass it "up the stack".
Configuration data (user credentials, db name, directory structure, etc.) - there are two ways to go here:
a) Each data store module should be responsible for its own required configuration information. This should be loaded by the module when it is initialized. In other words, the data service should not need to know any of the details.
b) Alternatively, the API should include a "required" set of calls that the service can use to retrieve and store the configuration data the module needs (the module indicates it needs dbname, userid, pwd, etc.). This way the overall service would still control the configuration. The parameters would need to be supplied by the module so that the data service could remain data store agnostic.
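Option (b) could look something like this sketch (the class, key names, and `required_config` call are all hypothetical, just to show the shape of "module declares, service supplies"):

```python
class MySQLStoreModule:
    """Hypothetical data store module for option (b): it declares which
    configuration keys it needs, so the service can gather them without
    knowing anything MySQL-specific."""

    @staticmethod
    def required_config():
        # The service queries this, collects the values from wherever
        # configuration lives, and passes them back at init time.
        return ["dbname", "userid", "pwd", "host"]

    def __init__(self, config):
        missing = [k for k in self.required_config() if k not in config]
        if missing:
            raise ValueError("missing config keys: %s" % missing)
        self.config = config
```

The service stays data store agnostic: it only ever sees a list of key names and a dict of values, never the meaning of "dbname" or "host".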
Not sure if we should allow the use of multiple modules simultaneously. It would be nice to use a memcache-like mechanism for some things to avoid more expensive lookups... This could be something like a write-through cache. Lots to discuss...
Middle Layer ------------
This is the traffic cop of the data service. It could be fairly lightweight, just taking input, validating it, and passing it through. For data requests, it could do some coalescing of requests, provide the ability to look in a local data cache, etc.
One major area that will need to be looked at is security. This layer would seem to be the place to implement any security. Not sure how much we want or need.
If this layer does any work on the data, it will operate on the "intermediate format".
Top Layer ---------
This is the layer that is called to store or retrieve data. My initial thought is that while these two types of functionality sit at the same level, they are drastically different in what they do, so I'll treat them separately.
Data Input API ("mystery data collection module") --------------
This provides an API for storing data in the data service. It will be called by the cloud-specific modules. My thought is that these modules will be used to pull data from the cloud and push it into the data store.
It is the responsibility of the module to take the data from the cloud and translate it into the intermediate data format.
It should be possible for many different modules to access this API in parallel.
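As a sketch of the translation step, here's what a cloud-specific input module might do with a CloudWatch-style datapoint (the intermediate field names are placeholders, since we haven't defined that format yet):

```python
def to_intermediate(user_id, cloud, native_point):
    """Translate one native datapoint (here, a CloudWatch-style dict)
    into a hypothetical common intermediate record. The cloud-specific
    module owns this mapping; nothing below this layer sees native data."""
    return {
        "user": user_id,
        "cloud": cloud,
        "metric": native_point["MetricName"],
        "value": float(native_point["Average"]),
        "timestamp": native_point["Timestamp"],
    }
```

Each vendor module would carry its own version of this mapping, which is what keeps cloud-specific knowledge out of the stats layer.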
Data Retrieval
--------------
There is clearly a need to provide data back to a caller in different formats, and this API will need to support that. I think the main design decision is how to structure it. I am inclined to stick with the module design here as well. This will allow end users to add their own modules, and it will allow additional levels of data processing to be added without polluting the main API. For instance, you could create a module to compute rolling averages.
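The rolling-average example is about as simple as a processing module gets; something like this could sit on top of the raw retrieval call without the main API knowing it exists:

```python
def rolling_average(values, window):
    """Trailing average over the last `window` points. An example of the
    kind of post-processing a retrieval module could add on top of raw
    data, rather than baking it into the main API."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

Dropping or replacing a module like this costs nothing, which is exactly the maintainability argument below.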
So this layer will need some thought to choose the right solution for future needs and maintainability. (Hint: it's easier to drop support for a module than to change the main API down the road...)
This level must be able to translate a request for "Mark's EC2 usage data from last week" into the intermediate language.
It should go without saying that this API must support concurrent access.
Higher Level questions
----------------------
So some design questions that will hopefully lead us to pick the right solution...
1) Do we need to provide sync APIs, async APIs, or both?
2) Do we support a data stream vs. a "one shot"? (For instance, do we provide a call to allow a continuous stream of data into or out of Spectre?)
3) How long should we be storing data?
Next Steps
----------
1) Start discussions based on the above content.
2) Identify vendors and investigate the requirements for getting data from different clouds (EC2, VMware, RHEV-M, Rackspace?).
3) Look at the high level questions and build requirements.
deltacloud-devel@lists.fedorahosted.org