git: [patch aggregator] first impl. of adapter for condor backend.
Hey guys,
probably this is going to be very controversial patch. This patch should allow people to be able to use different job-backend if they chose to. I have refactored Condor backend to be easily subtituteable via Adapter pattern. The implementation is just begining, but if at least some of you will like it I will continue to provide at least one other job-backend (DelayedJobs). The intention isn't to replace Condor, but make it optional and not hardcoded.
If there will be no reply or dicussion I will drop this patch.
Thanks for your feedback.
From: Ladislav Martincik lmartinc@redhat.com
--- src/app/controllers/instances_controller.rb | 16 +- src/app/controllers/pools_controller.rb | 3 - src/app/util/condormatic.rb | 155 +------------------ src/config/environment.rb | 2 +- src/lib/deltacloud_api/base_adapter.rb | 23 +++ src/lib/deltacloud_api/condor_adapter.rb | 168 ++++++++++++++++++++ src/lib/deltacloud_api/deltacloud_api.rb | 31 ++++ src/spec/lib/deltacloud_api/deltacloud_api_spec.rb | 32 ++++ 8 files changed, 266 insertions(+), 164 deletions(-) create mode 100644 src/lib/deltacloud_api/base_adapter.rb create mode 100644 src/lib/deltacloud_api/condor_adapter.rb create mode 100644 src/lib/deltacloud_api/deltacloud_api.rb create mode 100644 src/spec/lib/deltacloud_api/deltacloud_api_spec.rb
diff --git a/src/app/controllers/instances_controller.rb b/src/app/controllers/instances_controller.rb index 3ffb287..9e2b193 100644 --- a/src/app/controllers/instances_controller.rb +++ b/src/app/controllers/instances_controller.rb @@ -19,8 +19,6 @@ # Filters added to this controller apply to all controllers in the application. # Likewise, all the methods added will be available for all controllers.
-require 'util/condormatic' - class InstancesController < ApplicationController before_filter :require_user, :get_nav_items before_filter :instance, :only => [:show, :key] @@ -111,7 +109,7 @@ class InstancesController < ApplicationController :task_target => @instance, :action => InstanceTask::ACTION_CREATE}) if @task.save - condormatic_instance_create(@task) + deltacloud_api.instance_create(@task) if Quota.can_start_instance?(@instance, nil) flash[:notice] = "Instance added." else @@ -167,11 +165,11 @@ class InstancesController < ApplicationController
case action when 'stop' - condormatic_instance_stop(@task) + deltacloud_api.instance_stop(@task) when 'destroy' - condormatic_instance_destroy(@task) + deltacloud_api.instance_destroy(@task) when 'start' - condormatic_instance_create(@task) + deltacloud_api.instance_create(@task) else raise ActionError.new("Sorry, action '#{action}' is currently not supported by condor backend.") end @@ -188,7 +186,7 @@ class InstancesController < ApplicationController action ='remove failed' raise ActionError.new("#{action} cannot be performed on this instance.") unless @instance.state == Instance::STATE_ERROR - condormatic_instance_reset_error(@instance) + deltacloud_api.instance_reset_error(@instance) action end
@@ -199,4 +197,8 @@ class InstancesController < ApplicationController require_privilege(Privilege::INSTANCE_VIEW, @instance.pool) end
+ def deltacloud_api + @deltacloud_api ||= DeltacloudAPI::Backend.new + end + end diff --git a/src/app/controllers/pools_controller.rb b/src/app/controllers/pools_controller.rb index 2ba6940..1c479b0 100644 --- a/src/app/controllers/pools_controller.rb +++ b/src/app/controllers/pools_controller.rb @@ -19,8 +19,6 @@ # Filters added to this controller apply to all controllers in the application. # Likewise, all the methods added will be available for all controllers.
-require 'util/condormatic' - class PoolsController < ApplicationController before_filter :require_user, :get_nav_items
@@ -152,5 +150,4 @@ class PoolsController < ApplicationController end redirect_to :action => 'show', :id => @pool.id end - kick_condor end diff --git a/src/app/util/condormatic.rb b/src/app/util/condormatic.rb index 37040cd..02b7adb 100644 --- a/src/app/util/condormatic.rb +++ b/src/app/util/condormatic.rb @@ -20,143 +20,13 @@ require 'nokogiri' require 'socket'
-def escape(str) +module CondormaticHelper + def escape(str) str = str.gsub('\', '\\') str = str.gsub(' ', '\ ') -end - -def condormatic_instance_create(task) - - begin - instance = task.instance - realm = instance.realm rescue nil - - job_name = "job_#{instance.name}_#{instance.id}" - - instance.condor_job_id = job_name - instance.save! - - # I use the 2>&1 to get stderr and stdout together because popen3 does not support - # the ability to get the exit value of the command in ruby 1.8. - pipe = IO.popen("condor_submit 2>&1", "w+") - pipe.puts "universe = grid\n" - Rails.logger.info "universe = grid\n" - pipe.puts "executable = #{job_name}\n" - Rails.logger.info "executable = #{job_name}\n" - - resource = "grid_resource = dcloud $$(provider_url) $$(username) $$(password) $$(image_key) #{escape(instance.name)}" - if realm != nil - resource += " $$(realm_key)" - else - resource += " NULL" - end - resource += " $$(hardwareprofile_key) $$(keypair)\n" - - pipe.puts resource - Rails.logger.info resource - - requirements = "requirements = hardwareprofile == "#{instance.hardware_profile.id}" && image == "#{instance.template.id}"" - requirements += " && realm == "#{realm.id}"" if realm != nil - # We may need to add some stuff to the provider classads like pool id, provider id etc. This is mostly just - # to test and make sure this works for now. - requirements += " && deltacloud_quota_check("#{job_name}", other.cloud_account_id)" - requirements += "\n" - - pipe.puts requirements - Rails.logger.info requirements - - pipe.puts "notification = never\n" - Rails.logger.info "notification = never\n" - pipe.puts "queue\n" - Rails.logger.info "queue\n" - pipe.close_write - out = pipe.read - pipe.close - - Rails.logger.info "$? (return value?) is #{$?}" - raise ("Error calling condor_submit: #{out}") if $? != 0 - - rescue Exception => ex - task.state = Task::STATE_FAILED - Rails.logger.error ex.message - Rails.logger.error ex.backtrace - else - # FIXME: We're kinda lying here.. we don't know the state for the task but I don't think that matters so much - # as we are just going to use the 'task' table as a kind of audit log. - task.state = Task::STATE_PENDING - end - task.instance.save! -end - -# JobStatus for condor jobs: -# -# 0 Unexpanded U -# 1 Idle I -# 2 Running R -# 3 Removed X -# 4 Completed C -# 5 Held H -# 6 Submission_err E -# - -def condor_to_instance_state(state_val) - case state_val - when '0' - return Instance::STATE_PENDING - when '1' - return Instance::STATE_PENDING - when '2' - return Instance::STATE_RUNNING - when '3' - return Instance::STATE_STOPPED - when '4' - return Instance::STATE_STOPPED - when '5' - return Instance::STATE_ERROR - when '6' - return Instance::STATE_CREATE_FAILED - else - return Instance::STATE_PENDING end end
-def condormatic_instance_stop(task) - instance = task.instance_of?(InstanceTask) ? task.instance : task - - Rails.logger.info("calling condor_rm -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") - pipe = IO.popen("condor_rm -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") - out = pipe.read - pipe.close - - Rails.logger.info("condor_rm return status is #{$?}") - Rails.logger.error("Error calling condor_rm (exit code #{$?}) on job: #{out}") if $? != 0 -end - -def condormatic_instance_reset_error(instance) - - condormatic_instance_stop(instance) - Rails.logger.info("calling condor_rm -forcex -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") - pipe = IO.popen("condor_rm -forcex -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") - out = pipe.read - pipe.close - - Rails.logger.info("condor_rm return status is #{$?}") - Rails.logger.error("Error calling condor_rm (exit code #{$?}) on job: #{out}") if $? != 0 -end - -def condormatic_instance_destroy(task) - instance = task.instance - - Rails.logger.info("calling condor_rm -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") - pipe = IO.popen("condor_rm -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") - out = pipe.read - pipe.close - - Rails.logger.info("condor_rm return status is #{$?}") - Rails.logger.error("Error calling condor_rm (exit code #{$?}) on job: #{out}") if $? != 0 -end - - def condormatic_classads_sync Rails.logger.info "Starting condormatic_classads_sync..." index = 0 @@ -244,7 +114,6 @@ def condormatic_classads_sync pipe.puts "cloud_account_id="#{account.id}"" pipe.puts "keypair="#{escape(account.instance_key.name)}"" pipe.close_write - out = pipe.read pipe.close
@@ -254,23 +123,3 @@ def condormatic_classads_sync } Rails.logger.info "done" end - -def kick_condor - begin - socket = Socket.new(Socket::AF_INET, Socket::SOCK_DGRAM, 0) - in_addr = Socket.pack_sockaddr_in(7890, 'localhost') - socket.connect(in_addr) - socket.write("kick") - socket.close - rescue - # if any of the above failed, it's possible that the condor_refreshd - # daemon is not running. This is especially useful when running the - # spec tests, since you don't necessarily want condor running in that - # circumstance. - # FIXME: there are a couple of problems with ignoring errors here. The - # first is that if this does actually fail, then it's unclear when in the - # future we will update the classads next. The second problem is that - # if condor_refreshd died for some reason, but condor itself is running, - # condor could be running with stale data. - end -end diff --git a/src/config/environment.rb b/src/config/environment.rb index 7d8ba2c..c3cad8b 100644 --- a/src/config/environment.rb +++ b/src/config/environment.rb @@ -51,7 +51,7 @@ Rails::Initializer.run do |config| config.gem "compass-960-plugin", :lib => "ninesixty" config.gem "simple-navigation" config.gem "typhoeus" - config.gem "rb-inotify" +# config.gem "rb-inotify"
config.active_record.observers = :instance_observer, :task_observer # Only load the plugins named here, in the order given. By default, all plugins diff --git a/src/lib/deltacloud_api/base_adapter.rb b/src/lib/deltacloud_api/base_adapter.rb new file mode 100644 index 0000000..527843a --- /dev/null +++ b/src/lib/deltacloud_api/base_adapter.rb @@ -0,0 +1,23 @@ +module DeltacloudAPI + + class NotImplemented < Exception; end + + class BaseAdapter + include Singleton + + # Basic API methods every Adapter should implement + def instance_create(*args) + raise NotImplemented.new + end + + def instance_stop(*args) + raise NotImplemented.new + end + + def instance_destroy(*args) + raise NotImplemented.new + end + + end + +end diff --git a/src/lib/deltacloud_api/condor_adapter.rb b/src/lib/deltacloud_api/condor_adapter.rb new file mode 100644 index 0000000..ab04afe --- /dev/null +++ b/src/lib/deltacloud_api/condor_adapter.rb @@ -0,0 +1,168 @@ +module DeltacloudAPI + class CondorAdapter < BaseAdapter + + def initialize + start + end + + def instance_create(task) + begin + instance = task.instance + realm = instance.realm rescue nil + + job_name = "job_#{instance.name}_#{instance.id}" + + instance.condor_job_id = job_name + instance.save! + + # I use the 2>&1 to get stderr and stdout together because popen3 does not support + # the ability to get the exit value of the command in ruby 1.8. + pipe = IO.popen("condor_submit 2>&1", "w+") + pipe.puts "universe = grid\n" + Rails.logger.info "universe = grid\n" + pipe.puts "executable = #{job_name}\n" + Rails.logger.info "executable = #{job_name}\n" + + resource = "grid_resource = dcloud $$(provider_url) $$(username) $$(password) $$(image_key) #{escape(instance.name)}" + if realm != nil + resource += " $$(realm_key)" + else + resource += " NULL" + end + resource += " $$(hardwareprofile_key) $$(keypair)\n" + + pipe.puts resource + Rails.logger.info resource + + requirements = "requirements = hardwareprofile == "#{instance.hardware_profile.id}" && image == "#{instance.template.id}"" + requirements += " && realm == "#{realm.id}"" if realm != nil + # We may need to add some stuff to the provider classads like pool id, provider id etc. This is mostly just + # to test and make sure this works for now. + requirements += " && deltacloud_quota_check("#{job_name}", other.cloud_account_id)" + requirements += "\n" + + pipe.puts requirements + Rails.logger.info requirements + + pipe.puts "notification = never\n" + Rails.logger.info "notification = never\n" + pipe.puts "queue\n" + Rails.logger.info "queue\n" + pipe.close_write + out = pipe.read + pipe.close + + Rails.logger.info "$? (return value?) is #{$?}" + raise ("Error calling condor_submit: #{out}") if $? != 0 + + rescue Exception => ex + task.state = Task::STATE_FAILED + Rails.logger.error ex.message + Rails.logger.error ex.backtrace + else + # FIXME: We're kinda lying here.. we don't know the state for the task but I don't think that matters so much + # as we are just going to use the 'task' table as a kind of audit log. + task.state = Task::STATE_PENDING + end + task.instance.save! + end + + def instance_stop(task) + instance = task.instance_of?(InstanceTask) ? task.instance : task + + Rails.logger.info("calling condor_rm -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") + pipe = IO.popen("condor_rm -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") + out = pipe.read + pipe.close + + Rails.logger.info("condor_rm return status is #{$?}") + Rails.logger.error("Error calling condor_rm (exit code #{$?}) on job: #{out}") if $? != 0 + end + + def instance_reset_error(instance) + instance_stop(instance) + Rails.logger.info("calling condor_rm -forcex -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") + pipe = IO.popen("condor_rm -forcex -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") + out = pipe.read + pipe.close + + Rails.logger.info("condor_rm return status is #{$?}") + Rails.logger.error("Error calling condor_rm (exit code #{$?}) on job: #{out}") if $? != 0 + end + + def instance_destroy(task) + instance = task.instance + + Rails.logger.info("calling condor_rm -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") + pipe = IO.popen("condor_rm -constraint 'Cmd == "#{instance.condor_job_id}"' 2>&1") + out = pipe.read + pipe.close + + Rails.logger.info("condor_rm return status is #{$?}") + Rails.logger.error("Error calling condor_rm (exit code #{$?}) on job: #{out}") if $? != 0 + end + + private + + def start + require 'socket' + begin + socket = Socket.new(Socket::AF_INET, Socket::SOCK_DGRAM, 0) + in_addr = Socket.pack_sockaddr_in(7890, 'localhost') + socket.connect(in_addr) + socket.write("kick") + socket.close + rescue + # if any of the above failed, it's possible that the condor_refreshd + # daemon is not running. This is especially useful when running the + # spec tests, since you don't necessarily want condor running in that + # circumstance. + # FIXME: there are a couple of problems with ignoring errors here. The + # first is that if this does actually fail, then it's unclear when in the + # future we will update the classads next. The second problem is that + # if condor_refreshd died for some reason, but condor itself is running, + # condor could be running with stale data. + end + end + + private + + # JobStatus for condor jobs: + # + # 0 Unexpanded U + # 1 Idle I + # 2 Running R + # 3 Removed X + # 4 Completed C + # 5 Held H + # 6 Submission_err E + # + + def condor_to_instance_state(state_val) + case state_val + when '0' + return Instance::STATE_PENDING + when '1' + return Instance::STATE_PENDING + when '2' + return Instance::STATE_RUNNING + when '3' + return Instance::STATE_STOPPED + when '4' + return Instance::STATE_STOPPED + when '5' + return Instance::STATE_ERROR + when '6' + return Instance::STATE_CREATE_FAILED + else + return Instance::STATE_PENDING + end + end + + def escape(str) + str = str.gsub('\', '\\') + str = str.gsub(' ', '\ ') + end + + end +end diff --git a/src/lib/deltacloud_api/deltacloud_api.rb b/src/lib/deltacloud_api/deltacloud_api.rb new file mode 100644 index 0000000..58d55b2 --- /dev/null +++ b/src/lib/deltacloud_api/deltacloud_api.rb @@ -0,0 +1,31 @@ +require 'singleton' + +module DeltacloudAPI + + mattr_accessor :adapter + self.adapter = :condor + + class Backend + attr_reader :adapter + + def initialize(adapter = nil) + adapter = "#{adapter || DeltacloudAPI.adapter}_adapter" + adapter_clazz = "DeltacloudAPI::#{adapter.camelize}" + + # All adapters should behave as Singleton object + @adapter = adapter_clazz.constantize.instance + rescue NameError + # Try to require it before exiting with error + require File.join(Rails.root, 'lib', 'deltacloud_api', adapter) + @adapter = adapter_clazz.constantize.instance + end + + # For now we pass everything to adapter + # TODO: Should be restricted API based on BaseAdapter + def method_missing(sym, *args, &block) + @adapter.__send__(sym, *args, &block) + end + + end + +end diff --git a/src/spec/lib/deltacloud_api/deltacloud_api_spec.rb b/src/spec/lib/deltacloud_api/deltacloud_api_spec.rb new file mode 100644 index 0000000..3a5e8c0 --- /dev/null +++ b/src/spec/lib/deltacloud_api/deltacloud_api_spec.rb @@ -0,0 +1,32 @@ +require 'spec_helper' +require 'deltacloud_api/deltacloud_api' + +describe DeltacloudAPI::Backend do + after do + DeltacloudAPI.adapter = :condor + end + + it "correctly loads default backend adapter" do + backend = DeltacloudAPI::Backend.new + backend.adapter.should be_a_kind_of DeltacloudAPI::CondorAdapter + end + + it "correctly loads non default backend adapter" do + class ::DeltacloudAPI::LocalMockAdapter < ::DeltacloudAPI::BaseAdapter; end + DeltacloudAPI.adapter = :local_mock + backend = DeltacloudAPI::Backend.new + backend.adapter.should be_a_kind_of ::DeltacloudAPI::LocalMockAdapter + end + + it "throws exceptions if adapter not present" do + DeltacloudAPI.adapter = :nonsense_1234 + lambda do + backend = DeltacloudAPI::Backend.new + end.should raise_error MissingSourceFile + end + + it "forwards all non defined calls to adapter" do + backend = DeltacloudAPI::Backend.new + end + +end
On 12/03/10 - 12:53:59PM, lmartinc@redhat.com wrote:
git: [patch aggregator] first impl. of adapter for condor backend.
Hey guys,
probably this is going to be very controversial patch. This patch should allow people to be able to use different job-backend if they chose to. I have refactored Condor backend to be easily subtituteable via Adapter pattern. The implementation is just begining, but if at least some of you will like it I will continue to provide at least one other job-backend (DelayedJobs). The intention isn't to replace Condor, but make it optional and not hardcoded.
If there will be no reply or dicussion I will drop this patch.
I guess the obvious question is: why? We have condor patches and packages, and it is pretty easy to install. More people testing it out is better, too, rather than fracturing our testing into a production and non-production test.
On Fri, 2010-12-03 at 12:53 +0100, lmartinc@redhat.com wrote:
git: [patch aggregator] first impl. of adapter for condor backend.
Hey guys,
probably this is going to be very controversial patch. This patch should allow people to be able to use different job-backend if they chose to. I have refactored Condor backend to be easily subtituteable via Adapter pattern. The implementation is just begining, but if at least some of you will like it I will continue to provide at least one other job-backend (DelayedJobs). The intention isn't to replace Condor, but make it optional and not hardcoded.
If there will be no reply or dicussion I will drop this patch.
Thanks for your feedback.
You're right, it is controversial :)
I really hope we do not have alternative backends any time soon. There's only a handful of people who test/use the aggregator end to end. If some of these people decide it would be easier to not use condor, it's just that many fewer people testing critical production code paths.
I really cannot stress that enough. It's all about testing.. it needs testing, lots of it, and this just enables it to be tested less.
If there is something difficult about the installation we need to address that (and we are addressing some aspects of it).
To play devils advocate, I could easily create something to replace the web UI as well, but I don't, because that's a huge part of the product and it too needs testing and I hope everyone uses/tests it. You see what I'm saying?
Maybe once the whole thing is more mature this would be a good idea, but for now I don't think it's appropriate.
Ian
On Tue, 2010-12-07 at 09:39 -0800, Ian Main wrote:
On Fri, 2010-12-03 at 12:53 +0100, lmartinc@redhat.com wrote:
git: [patch aggregator] first impl. of adapter for condor backend.
Hey guys,
probably this is going to be very controversial patch. This patch should allow people to be able to use different job-backend if they chose to. I have refactored Condor backend to be easily subtituteable via Adapter pattern. The implementation is just begining, but if at least some of you will like it I will continue to provide at least one other job-backend (DelayedJobs). The intention isn't to replace Condor, but make it optional and not hardcoded.
And with that I'm sorry but I feel this should be NACK'd.
Ian
Ahh ahh, I'm going home crying. :)
Thank you guys for feedback, very appreciated.
-- Ladislav
On Dec 7, 2010, at 7:06 PM, Ian Main wrote:
On Tue, 2010-12-07 at 09:39 -0800, Ian Main wrote:
On Fri, 2010-12-03 at 12:53 +0100, lmartinc@redhat.com wrote:
git: [patch aggregator] first impl. of adapter for condor backend.
Hey guys,
probably this is going to be very controversial patch. This patch should allow people to be able to use different job-backend if they chose to. I have refactored Condor backend to be easily subtituteable via Adapter pattern. The implementation is just begining, but if at least some of you will like it I will continue to provide at least one other job-backend (DelayedJobs). The intention isn't to replace Condor, but make it optional and not hardcoded.
And with that I'm sorry but I feel this should be NACK'd.
Ian
On 12/07/2010 12:39 PM, Ian Main wrote:
On Fri, 2010-12-03 at 12:53 +0100, lmartinc@redhat.com wrote:
git: [patch aggregator] first impl. of adapter for condor backend.
Hey guys,
probably this is going to be very controversial patch. This patch should allow people to be able to use different job-backend if they chose to. I have refactored Condor backend to be easily subtituteable via Adapter pattern. The implementation is just begining, but if at least some of you will like it I will continue to provide at least one other job-backend (DelayedJobs). The intention isn't to replace Condor, but make it optional and not hardcoded.
If there will be no reply or dicussion I will drop this patch.
Thanks for your feedback.
You're right, it is controversial :)
I really hope we do not have alternative backends any time soon. There's only a handful of people who test/use the aggregator end to end. If some of these people decide it would be easier to not use condor, it's just that many fewer people testing critical production code paths.
I really cannot stress that enough. It's all about testing.. it needs testing, lots of it, and this just enables it to be tested less.
I think the counter argument to this would be that if we support pluggable components at multiple levels in our application we can push deltacloud as being more flexible and more able to fit into a custom infrastructure. This would drive more adoption which would drive more testing of deltacloud in general.
We can still only support the one official scheduling backend, through our docs and maintenance infrastructure, but should a customer have some custom scheduling / job component already in place, we can also say that things would work out of the box as is without needing to hack deltacloud (of course making sure that they understand that support of their custom components and interaction with them is up to them).
If there is something difficult about the installation we need to address that (and we are addressing some aspects of it).
To play devils advocate, I could easily create something to replace the web UI as well, but I don't, because that's a huge part of the product and it too needs testing and I hope everyone uses/tests it. You see what I'm saying?
The thing with this is that the deltacloud codebase is able to support custom web ui's, we just don't support alternatives as a team. Likewise if an alternative scheduling system were to be developed, great, but we wouldn't be responsible for any problems that arise from using it. We're not talking about creating the alternative ourselves, just an easy means which to be able to use an alternative in hopes that it'll drive adoption of the whole system in general (and most likely if an end user wishes to develop and test and alternative they would start by setting up and testing condor itself first).
Maybe once the whole thing is more mature this would be a good idea, but for now I don't think it's appropriate.
Ian
Honestly doesn't matter to me either way, just like playing devil's advocate myself ;-)
-Mo
deltacloud-devel@lists.fedorahosted.org