On 06/28/2011 10:47 AM, Greg Blomquist wrote:
On 06/28/2011 12:05 PM, Perry Myers wrote:
On 06/28/2011 10:26 AM, Steven Dake wrote:
On 06/28/2011 06:24 AM, Joseph VLcek wrote:
On Mon, 2011-06-27 at 13:59 -0700, Steven Dake wrote:
On 06/24/2011 04:16 PM, Steven Dake wrote:
Currently, most Linux distributions that use dbus store a UUID in /var/lib/dbus/machine-id. In our pacemaker-cloud test tools, we must manipulate this file via oz image creation to match a value we know about.
Q1. Is this file freshly created on each image creation/cloning process?
If not, it should be, because Matahari uses this information to uniquely identify a host. If it is copied exactly to each new image, that creates a problem (all hosts appear the same to matahari).
Q2. If/when it is created by image factory, will it be stored in a database or other storage medium?
In pacemaker-cloud we need a mapping from image to internal id so that we know which VM maps to which deployable HA configuration.
If we wait on this point until after our 1.0 release, we could end up with a bunch of images in the field that have either the same machine id or are not mapped in any way that allows us to provide HA functionality.
Regards -steve
After more investigation, Chris and I came up with a workable plan for handling unique VM ids (see the thread with the subject "How Audrey, Conductor, and Audrey's config server interact and their relationship to a unique vm instance id").
Currently the Audrey script runs as a replacement to rc.local. Matahari runs at S99. The general idea is for Audrey script to run as S98 (before Matahari) and write a management-wide unique instance UUID to the file /etc/vm_machine_id (audrey has access to this information).
Matahari could be changed to read /etc/vm_machine_id first. If that file doesn't exist, /var/lib/dbus/machine-id would be read.
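The lookup order described here could be sketched roughly as follows. This is only an illustration in shell, not Matahari code: the function name read_machine_id is made up, and the default paths assume the proposed /etc/vm_machine_id plus the usual D-Bus location /var/lib/dbus/machine-id. Both paths can be passed as arguments so the logic can be tried without touching system files.

```shell
#!/bin/sh
# Hypothetical helper (not part of Matahari): print the first usable machine id.
# $1: preferred id file, default /etc/vm_machine_id (the file proposed in this thread)
# $2: fallback id file,  default /var/lib/dbus/machine-id (assumed D-Bus location)
read_machine_id() {
    vm_id_file="${1:-/etc/vm_machine_id}"
    dbus_id_file="${2:-/var/lib/dbus/machine-id}"

    for f in "$vm_id_file" "$dbus_id_file"; do
        # -s is true only if the file exists and is non-empty
        if [ -s "$f" ]; then
            # Print only the first line of the file
            head -n 1 "$f"
            return 0
        fi
    done

    echo "no machine id found" >&2
    return 1
}
```

Treating an empty /etc/vm_machine_id the same as a missing one is a choice made for this sketch; it avoids handing out a blank id if the file was created but never filled in.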
This creates some difficulty in running the audrey script at a specific runlevel (it requires some changes to oz to insert init scripts).
Another option is for the current rc.local script that Audrey replaces to run the Matahari service starting giblets as its first action.
Comments welcome before I start writing code... -steve
I think the solution of having Audrey store a launch time unique UUID in a file that Matahari can read will work.
It may not be necessary to alter oz to insert init scripts to ensure Audrey runs before Matahari.
Let me explain.
Audrey does not replace /etc/rc.local.
When Image Factory builds the image, it appends a line of code to the end of /etc/rc.local that will start Audrey.
e.g.: [ -f /usr/bin/audrey ] && /usr/bin/audrey
I propose having Image Factory append another line to /etc/rc.local below where it starts Audrey to start Matahari.
e.g.:
[ -f /usr/bin/audrey ] && /usr/bin/audrey
[ -f <Matahari start> ] && <run Matahari start>
We may need to manage timing to ensure /usr/bin/audrey does not return until it has stored the unique UUID in a file, and that it returns an error status if it is unable to do so.
Thoughts?
This sounds fine to me, although the rc.local modification always sounded a bit hacky (rather than using a proper init script). One minor issue is that matahari expects to be started via "service xxx start", and each agent has a separate init script. There are 5 or 6 agents.
Yes, I think having audrey 'start matahari' is very hackish, since in normal systems matahari would start via regular init scripts. So this means for cloud we'd need to disable the normal init scripts and then relegate control to audrey. I don't like that approach...
Let's back up a bit... why does matahari start need to depend on audrey starting? The answer is that we need audrey to put in the /etc/machine-id file.
Hmm...I'm not sure I buy this. I don't think there needs to be even this level of dependency. Why can't another mechanism be in charge of putting the /etc/machine-id in place?
It still looks like matahari and audrey both require a machine-id (and even pacemaker?). If they could rely on the same machine id for their individual purposes, then I think we're closer to a better answer.
So, how do we get that machine-id in place? With ec2, it's "user data". I honestly think with our own cloud offerings (rhev-m) we need a solution for this, too. With rhev-m 3, it's supposed to be solved with hooks (I don't claim to know how those work, just that they've been proposed as the solution for injecting data into a launching instance).
I really think that cloud engine should be responsible for creating this machine-id and injecting it into the launching instance. This gives us one place to generate this id and doesn't require mapping this id to other ids in order to translate a single instance between services that want to interact with the instance.
This puts all the logic back into deltacloud. It becomes deltacloud's responsibility to figure out how to interact with the various cloud providers, so that no matter in what provider an instance is running, we can always get to the machine-id for that instance.
Am I seeing this completely upside down? Or, off in the trees by myself on this?
I had originally thought deltacloud might be involved here in some way to consistently load instance data into the instance (and then provide a consistent way of accessing it inside the vm that doesn't involve writing 10 different access mechanisms).
I don't know enough about deltacloud, but if the insertion functionality is currently a gap in any virtual machine manager (such as paravirt Xen) or RHEVH, the entire Aeolus solution as well as cloud-ha becomes non-functional on those VMMs.
The question for deltacloud evaluation of this functionality then boils down to: 1) is this functionality available on every deltacloud cloud provider implementation, and 2) is the functionality inside the vm consistent (does a user only call a "read_my_id" function or read a file inside the vm)? If it is library access, what is the new dependency inside the vm?
Regards -steve
Well, why not let matahari start at its normal runlevel and respond to queries, etc., but if you call the get-id API, it returns something that indicates '/etc/machine-id not set yet, giving you dbus-id instead'?
Then it's just up to the person doing the querying to wait until the id returned is the /etc/machine-id.
So remove the dep on service start and replace it with intelligent application usage.
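A caller following this approach might poll along these lines. Everything here is an assumption for the sake of the sketch: the query command passed in, and the reply format "<source> <id>" (with source being "machine-id" once /etc/machine-id is in place and "dbus-id" before that), are invented and are not an existing Matahari interface.

```shell
#!/bin/sh
# Hypothetical client-side polling loop.
# $1: a command that prints "<source> <id>" (assumed format, not a Matahari API)
# $2: maximum number of attempts, default 30
# $3: delay between attempts in seconds, default 1
# Prints the id and returns 0 once the source is "machine-id";
# returns 1 if that never happens within the allowed attempts.
wait_for_machine_id() {
    query_cmd="$1"
    max_tries="${2:-30}"
    delay="${3:-1}"

    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # A failing query is treated like "not ready yet"
        reply="$($query_cmd)" || reply=""
        case "$reply" in
            "machine-id "?*)
                # Strip the source prefix and print only the id
                printf '%s\n' "${reply#machine-id }"
                return 0
                ;;
        esac
        tries=$((tries + 1))
        sleep "$delay"
    done
    return 1
}
```

The attempt limit keeps a caller from waiting forever on an instance where /etc/machine-id never shows up; what to do in that case (fall back to the dbus id, or report an error) is left to the caller.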
Thoughts?
Perry
_______________________________________________
aeolus-devel mailing list
aeolus-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/aeolus-devel