Hi,
I was asked this afternoon to look into the bug where some instances (particularly on vSphere, and possibly on other providers) report a MAC address instead of an IP. There are a few things going on, and I wanted to update everybody on the status. We're going to need to wrap this up tomorrow.
The problem is that before the instance has obtained an IP, all that's available is its MAC address. This MAC gets reported over deltacloud as the IP address, so EventLog has it as the "IP", and dbomatic picks it up and dutifully stores it in the database. But no Condor event is triggered when the IP changes; it's not considered a state change. Ergo, we never notice that the MAC has been replaced by an IP.
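To make the failure mode concrete, here is a minimal, self-contained Ruby sketch. Everything in it is hypothetical: `Status`, `EventSink` and `replay` are illustrative stand-ins (not Conductor, dbomatic or Condor code) for "addresses are only written when an event arrives, and events only fire on state changes".

```ruby
# Hypothetical model of the bug: events are only emitted on state changes,
# so an address change that happens without a state change is never stored.
Status = Struct.new(:state, :address)

class EventSink
  attr_reader :stored_address

  # Called once per emitted event; copies whatever address the event carries
  # (stand-in for dbomatic writing the address to the database).
  def on_event(status)
    @stored_address = status.address
  end
end

# Replays successive polls, emitting an event only when the state differs
# from the previous poll (stand-in for Condor's state-change-only events).
def replay(polls, sink)
  previous_state = nil
  polls.each do |status|
    sink.on_event(status) if status.state != previous_state
    previous_state = status.state
  end
  sink
end

polls = [
  Status.new("RUNNING", "00:50:56:aa:bb:cc"), # MAC reported before an IP exists
  Status.new("RUNNING", "10.0.0.5")           # real IP later, same state: no event
]

sink = replay(polls, EventSink.new)
puts sink.stored_address # prints 00:50:56:aa:bb:cc
```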
Worse, after spending a bit of time trying to just read the data out of Condor, we discovered that Condor is never updated with the IP at all in these cases. Even long after deltacloud starts reporting an IP and Condor polls for status, the MAC stored as the IP in Condor is never updated.
It looks like deltacloud will be updated tomorrow to split out the 'address' tag to have type="mac" and type="ipv4", but this won't fully solve our problem: we'll be able to tell MACs and IPs apart, but we still won't get the IP into Conductor.
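Once each `address` element carries a `type` attribute, telling MACs and IPs apart is a simple filter. The sketch below uses Ruby's REXML; the enclosing `<public_addresses>` element and the sample values are assumptions for illustration, not the exact Deltacloud payload. Only the type="mac"/type="ipv4" split comes from this thread.

```ruby
require 'rexml/document'

# Assumed shape of an instance's address list after the Deltacloud change;
# the wrapper element name and the values are hypothetical.
xml = <<~XML
  <public_addresses>
    <address type="mac">00:50:56:aa:bb:cc</address>
    <address type="ipv4">10.0.0.5</address>
  </public_addresses>
XML

# Returns the text of every <address> whose type attribute equals +type+.
def addresses_of_type(doc, type)
  REXML::XPath.match(doc, "//address[@type='#{type}']").map { |el| el.text.strip }
end

doc = REXML::Document.new(xml)
puts addresses_of_type(doc, "ipv4").inspect # prints ["10.0.0.5"]
puts addresses_of_type(doc, "mac").inspect  # prints ["00:50:56:aa:bb:cc"]
```

As the paragraph above notes, this only tells us *what* we have; an instance still reporting only a MAC would need a later lookup to get its IP.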
Short-term, the fix we identified is to query deltacloud whenever we display an IP and take the address from there. It should work to define public_addresses and private_addresses in instance.rb and insert the Deltacloud calls there. (As a slight optimization, we could write the values to the database at that point and use them when present, so we don't have to go out to the network each time.) This isn't ideal in the long term, since it's fairly slow, but it's the best we can do on short notice.
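A minimal sketch of the "use the stored value if it's usable, otherwise ask deltacloud and cache the answer" idea, in plain Ruby. `CachedInstance` is hypothetical, the `fetch` lambda stands in for the real Deltacloud client call, and the dotted-quad check is an assumed, deliberately loose test for "already an IP". The point is that the slow network call happens at most once per instance once a real IP has been stored.

```ruby
# Hypothetical stand-in for an Instance record with a cached address column.
class CachedInstance
  attr_reader :fetch_count

  def initialize(stored_address, fetch)
    @stored_address = stored_address # value currently in the database
    @fetch = fetch                   # callable standing in for the provider query
    @fetch_count = 0
  end

  # Returns the stored address if it already looks like an IPv4 address;
  # otherwise queries the provider and caches the result if it is usable.
  def public_address
    return @stored_address if looks_like_ip?(@stored_address)

    @fetch_count += 1
    fresh = @fetch.call
    @stored_address = fresh if looks_like_ip?(fresh) # "write back to the database"
    @stored_address
  end

  private

  # Loose dotted-quad check; good enough for the sketch, not a full validator.
  def looks_like_ip?(value)
    !value.nil? && value.match?(/\A\d{1,3}(\.\d{1,3}){3}\z/)
  end
end

inst = CachedInstance.new("00:50:56:aa:bb:cc", -> { "10.0.0.5" })
puts inst.public_address # prints 10.0.0.5
puts inst.public_address # served from the cache, no second fetch
puts inst.fetch_count    # prints 1
```

If the provider still only knows the MAC, the sketch keeps returning the MAC and retries on the next call, which matches the "go out to the network until we have an IP" behaviour described above.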
Unfortunately I've got to head out for the night now, so I can't take care of this myself. I think the task will be slightly complicated by finding the right instance on the API side, since vSphere instances (as reported over the API) have crazy UUID-derived names. I have a sneaking suspicion you'll also have to go out to Image Warehouse to get the image or build UUID, and then use that to find the matching instance in deltacloud. From there, though, it should be a simple matter of grabbing the IP, possibly updating it in the database, and returning it. It'll just take queries to a couple of external services.
Is there any way someone not in UTC-5 could pick this up in the morning and try to get a fix out? I'm told it's the last bug blocking the release.
-- Matt
Hi, here is a hotfix. I wasn't even able to test it with the vSphere driver (I keep getting a "Create_Instance_Failure: Failed to perform transfer: Server returned nothing (no headers, no data)" message :/), so I cheated and tested it from script/console.
The instance ID is taken from Condor's DeltacloudProviderId parameter; I assume this ID should be unique even for vSphere.
Jan
From: Jan Provaznik jprovazn@redhat.com
This fixes the problem with a MAC address in the public address attribute: if a MAC address is set, we try to fetch the actual public address directly through the deltacloud API. To fetch the instance directly, I had to add provider_instance_id, which holds the provider-side instance ID.
---
 src/app/models/instance.rb                         |   15 +++++++++++++++
 .../20110726072809_add_provider_instance_id.rb     |    9 +++++++++
 src/dbomatic/dbomatic                              |   14 +++++++++++++-
 3 files changed, 37 insertions(+), 1 deletions(-)
 create mode 100644 src/db/migrate/20110726072809_add_provider_instance_id.rb
diff --git a/src/app/models/instance.rb b/src/app/models/instance.rb
index f16b79d..4700660 100644
--- a/src/app/models/instance.rb
+++ b/src/app/models/instance.rb
@@ -340,6 +340,21 @@ class Instance < ActiveRecord::Base
     [possibles, errors]
   end
 
+  def public_addresses
+    # FIXME: detect MAC format properly
+    addr = read_attribute(:public_addresses)
+    if addr and addr =~ /\w\w:\w\w:\w\w:\w\w:\w\w:\w\w/
+      begin
+        client = provider_account.connect
+        self.public_addresses = client.instance(provider_instance_id).public_addresses.first
+        save!
+      rescue
+        logger.error "failed to fetch public address for #{self.name}: #{$!.message}"
+      end
+    end
+    read_attribute(:public_addresses)
+  end
+
   named_scope :with_hardware_profile, lambda {
     {:include => :hardware_profile}
   }
diff --git a/src/db/migrate/20110726072809_add_provider_instance_id.rb b/src/db/migrate/20110726072809_add_provider_instance_id.rb
new file mode 100644
index 0000000..7b68cba
--- /dev/null
+++ b/src/db/migrate/20110726072809_add_provider_instance_id.rb
@@ -0,0 +1,9 @@
+class AddProviderInstanceId < ActiveRecord::Migration
+  def self.up
+    add_column :instances, :provider_instance_id, :string
+  end
+
+  def self.down
+    remove_column :instances, :provider_instance_id
+  end
+end
diff --git a/src/dbomatic/dbomatic b/src/dbomatic/dbomatic
index 1c1a206..013be84 100755
--- a/src/dbomatic/dbomatic
+++ b/src/dbomatic/dbomatic
@@ -131,6 +131,8 @@ class CondorEventLog < Nokogiri::XML::SAX::Document
         @public_addresses = string
       elsif @tag == "DeltacloudPrivateNetworkAddresses"
         @private_addresses = string
+      elsif @tag == "DeltacloudProviderId"
+        @provider_instance_id = string
       end
     end
   end
@@ -199,6 +201,15 @@ class CondorEventLog < Nokogiri::XML::SAX::Document
     @logger.info "update_instance_addresses completed for #{inst}"
   end
 
+  def update_provider_instance_id(inst)
+    @logger.info "update_provider_instance_id for #{inst}, \
+      setting id: #{@provider_instance_id}"
+
+    inst.provider_instance_id = @provider_instance_id
+    inst.save!
+    @logger.info "update_provider_instance_id completed for #{inst}"
+  end
+
   # Create a new entry for events which we have all the necessary data for
   def end_element(element)
     begin
@@ -211,9 +222,10 @@ class CondorEventLog < Nokogiri::XML::SAX::Document
           @logger.info "Instance #{inst} found, running update events"
           update_instance_state_event(inst)
           update_instance_addresses(inst)
+          update_provider_instance_id(inst)
           @logger.info "Instance #{inst} update events completed"
         end
-        @tag = @event_type = @event_cmd = @event_time = @trigger_type = @grid_resource = @execute_host = @hold_reason = @public_addresses = @private_addresses = nil
+        @tag = @event_type = @event_cmd = @event_time = @trigger_type = @grid_resource = @execute_host = @hold_reason = @public_addresses = @private_addresses = @provider_instance_id = nil
       end
     rescue Exception => e
       @logger.error "#{e.backtrace.shift}: #{e.message}"
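The patch leaves a `# FIXME: detect MAC format properly`. One concrete weakness: `\w` matches any word character (letters, digits, underscore), not just hex digits, so a string like `zz:zz:zz:zz:zz:zz` also counts as a "MAC". A stricter check, sketched below as a standalone helper (`MAC_PATTERN` and `mac_address?` are hypothetical names, not part of the patch), only accepts six colon-separated pairs of hex digits. Like the original pattern it is unanchored, so it still finds a MAC inside a longer attribute string.

```ruby
# Hypothetical helper, not part of the patch: six colon-separated pairs of
# hex digits, delimited by word boundaries. Unlike \w it rejects non-hex
# characters; like the original pattern it can match inside a longer string.
MAC_PATTERN = /\b[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}\b/

def mac_address?(addr)
  !addr.nil? && MAC_PATTERN.match?(addr)
end

puts mac_address?("00:50:56:aa:bb:cc") # prints true
puts mac_address?("10.0.0.5")          # prints false
puts mac_address?("zz:zz:zz:zz:zz:zz") # prints false
```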
On Tue, Jul 26, 2011 at 12:56:53PM +0200, jprovazn@redhat.com wrote:
> From: Jan Provaznik jprovazn@redhat.com
>
> This fixes the problem with a MAC address in the public address attribute: if a MAC address is set, we try to fetch the actual public address directly through the deltacloud API. To fetch the instance directly, I had to add provider_instance_id, which holds the provider-side instance ID.
ACK! I just completed end-to-end testing here, and the automated test suite passed 100%.
-- Matt
On 07/26/2011 12:56 PM, jprovazn@redhat.com wrote:
> Hi, here is a hotfix. I wasn't even able to test it with the vSphere driver (I keep getting a "Create_Instance_Failure: Failed to perform transfer: Server returned nothing (no headers, no data)" message :/), so I cheated and tested it from script/console.
>
> The instance ID is taken from Condor's DeltacloudProviderId parameter; I assume this ID should be unique even for vSphere.
>
> Jan
Just to be clear: this patch needs proper testing, as I wasn't able to reproduce the bug.
Jan
aeolus-devel@lists.fedorahosted.org