High Availability Feature, looking for information

Jason Guiditta

27 Jul 2011

3:26 p.m.

Hello list (and Steve), I have been tasked with looking into the following nebulous feature item for consideration of being included in Iteration 4: '* HA configuration available?'

This is from an 'infrastructure around aeolus' perspective, if that helps context at all.

There was brief discussion of this in an extremely broad way on a recent call, and given my lack of involvement with anything HA, I find myself with absolutely no idea how to describe what might need doing here, let alone how it might be done. I will be looking through the list archives to see what has been discussed, but all the threads I recall seeing were very long and detailed, and I would rather get a synopsis if someone can summarize the current thinking in this area. Also, any links to specific threads or documentation/designs would be much appreciated. Thanks for any direction here,

-j

Hugh O. Brock

27 Jul

3:43 p.m.

On Wed, Jul 27, 2011 at 10:26:08AM -0400, Jason Guiditta wrote:

Sure.

One of the requirements the SAs brought up is that it should be possible to make Conductor and its attendant pieces highly available. I'm not sure this is something we need to actually implement for release 0.4.0, but it would be nice to at least have a design for it by then. At a minimum we would need the entire app to be able to run in failover mode on two machines. We don't have to provide the infrastructure to deal with the failover, but the app needs to handle being failed over gracefully without data loss or (much) service interruption.

Does that help at all?

--Hugh

-- == Hugh Brock, hbrock@redhat.com == == Engineering Manager, Cloud BU == == Aeolus Project: Manage virtual infrastructure across clouds. == == http://aeolusproject.org == "I know that you believe you understand what you think I said, but I’m not sure you realize that what you heard is not what I meant." --Robert McCloskey

Jason Guiditta

4:05 p.m.

On Wed, 2011-07-27 at 10:43 -0400, Hugh Brock wrote:

If that is really the gist, then yes, it is a start, thanks. It sounds like a feature to see how best to set up rails/$backing-db to be clustered, and possibly separate tasks to research the same for some of the other components (iwhd seems a likely candidate). However, I have a feeling there is much more to it than that. Perhaps this task/feature is a combination of documenting how to do some of these things and writing up what does and does not currently exist for this kind of scenario? Also, probably something around 'what is going to be the aeolus story for HA, and what does it mean?'.

-j

Hugh O. Brock

4:12 p.m.

On Wed, Jul 27, 2011 at 11:05:28AM -0400, Jason Guiditta wrote:

If all we got done was that, I'd be delighted.

--H

Chris Alfonso

4:38 p.m.

On Jul 27, 2011, at 11:14 AM, "Hugh Brock" hbrock@redhat.com wrote:

If there is interest in an easy way to implement an HA database configuration, and you don't mind pushing users to a MySQL-based solution, I'll send out the master-to-slave (m2s) and master-to-master (m2m) configs and describe how we're flipping back and forth between nodes. access.redhat.com/management is using master-to-master, and it's working out quite well. Let me know.

Chris

Jason Guiditta

4:51 p.m.

On Wed, 2011-07-27 at 11:38 -0400, Christopher Alfonso wrote:

Even though our officially supported db is postgres, any HA docs of any kind would be much appreciated. If you are not opposed, we may publish them as optional upstream information on our wiki or something similar.

Chris Alfonso

6:51 p.m.

On 07/27/2011 11:51 AM, Jason Guiditta wrote:

If you're familiar with authoring/maintaining puppet modules you can jump straight to the referenced docs [1,2] and drill down through the puppet manifests. If not, I'll describe what's going on here, and probably what you would really be interested in.

Keep in mind, for the candlepin databases that sit behind access.redhat.com/management we have a VIP sitting in front of two MySQL m2m configured databases. If there is a connectivity problem detected at the database VIP at runtime, connections from the application server switch from the active master to the standby master. If no VIP sits between the application server and the multiple m2m MySQL nodes, it's a bit more challenging to automatically fail over (you get to implement the retry yourself!).
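Where no VIP sits in front of the pair, the retry mentioned above has to live in the application. A minimal Python sketch of that idea, assuming a hypothetical `connect(host)` callable standing in for a real driver call and hypothetical host names `db01`/`db02`; it illustrates ordered client-side failover, not how candlepin itself does it:

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

Conn = TypeVar("Conn")


class AllNodesDown(Exception):
    """Raised when no node in the list accepted a connection."""


def connect_with_failover(
    nodes: Sequence[str],
    connect: Callable[[str], Conn],
    retryable: tuple[type[BaseException], ...] = (OSError,),
) -> tuple[str, Conn]:
    """Try each node in order (active master first, then standby).

    `connect` is a hypothetical stand-in for a real driver call; a real
    driver's own exception classes would need to be passed in `retryable`.
    """
    errors: dict[str, BaseException] = {}
    for node in nodes:
        try:
            return node, connect(node)
        except retryable as exc:
            # Remember why this node failed, then fall through to the next one.
            errors[node] = exc
    raise AllNodesDown(errors)


def fake_connect(host: str) -> str:
    """Hypothetical connector: pretends the active master db01 is unreachable."""
    if host == "db01":
        raise ConnectionError(f"{host} unreachable")
    return f"connection to {host}"


if __name__ == "__main__":
    print(connect_with_failover(["db01", "db02"], fake_connect))
    # prints ('db02', 'connection to db02')
```

Only the connection attempt is retried here. Because MySQL replication is asynchronous, a transaction committed on the failed master just before it went down may not yet exist on the standby, so safely re-issuing writes after a failover is a separate problem.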

The candlepin databases in production are set up using the following yaml files, which just reference which puppet classes to pull in; the class you're interested in is candlepin:db. That module is defined in http://git.corp.redhat.com/cgit/puppet-cfg/modules/candlepin/tree/manifests/.... You can follow the references to the puppet classes that are invoked, but the important thing to take away is that each MySQL node is set up as both a master and a slave, such that node A is a slave to node B and node B is a slave to node A (hence master-to-master replication).

Node 1: http://git.corp.redhat.com/cgit/puppet-cfg/manifests/tree/nodes/db02.candlep...
Node 2: http://git.corp.redhat.com/cgit/puppet-cfg/manifests/tree/nodes/db01.candlep...

There is a bit of automation in place and custom puppet types in the libmysql module, but at the end of the day, the nodes are configured with http://git.corp.redhat.com/cgit/puppet-cfg/modules/libmysql/tree/templates/c... and http://git.corp.redhat.com/cgit/puppet-cfg/modules/libmysql/tree/templates/c... (values filled in from the puppet manifests).
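The templates themselves are only linked (and truncated) above, so their contents are not reproduced here. Purely as an illustration, here is a Python sketch of the kind of per-node `[mysqld]` fragment a two-node master-to-master setup typically needs. The option names are standard MySQL options, but the values and the helper functions are hypothetical and are not taken from the libmysql templates:

```python
from __future__ import annotations


def render_mysqld_fragment(server_id: int, node_count: int = 2) -> str:
    """Render a [mysqld] fragment for one node of a master-to-master set (hypothetical helper)."""
    if not 1 <= server_id <= node_count:
        raise ValueError("server_id must be between 1 and node_count")
    lines = [
        "[mysqld]",
        f"server-id = {server_id}",  # every node in a replication topology needs a unique id
        "log-bin = mysql-bin",  # binary log that the peer node replicates from
        # Interleave AUTO_INCREMENT values so both masters can insert without colliding:
        f"auto_increment_increment = {node_count}",
        f"auto_increment_offset = {server_id}",
    ]
    return "\n".join(lines) + "\n"


def generated_ids(offset: int, increment: int, count: int) -> list[int]:
    """AUTO_INCREMENT values a node would hand out on an empty table: offset, offset + increment, ..."""
    return [offset + k * increment for k in range(count)]


if __name__ == "__main__":
    print(render_mysqld_fragment(1), end="")
    print(generated_ids(1, 2, 3), generated_ids(2, 2, 3))  # [1, 3, 5] [2, 4, 6]
```

The interleaved offsets matter because both nodes accept writes: without them, two inserts arriving at different masters at the same time could be assigned the same primary key and break replication.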

Once the configuration files are laid down and mysqld is running, issuing the CHANGE MASTER TO [3] statement on each node will 'wire' the nodes together.
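Following the description above (node A replicates from node B and vice versa), the wiring step can be sketched in Python as generating one CHANGE MASTER TO statement per node, each pointing at its peer. The host names, replication user, password, and binary-log coordinates are hypothetical; in practice the coordinates would come from SHOW MASTER STATUS on the peer. The statements are only generated here, not executed:

```python
from __future__ import annotations


def sql_string(value: str) -> str:
    """Minimal MySQL string-literal quoting, for illustration only."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_sql(peer: str, user: str, password: str, log_file: str, log_pos: int) -> str:
    """Build a CHANGE MASTER TO statement that makes the current node a slave of `peer`."""
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={sql_string(peer)}, "
        f"MASTER_USER={sql_string(user)}, "
        f"MASTER_PASSWORD={sql_string(password)}, "
        f"MASTER_LOG_FILE={sql_string(log_file)}, "
        f"MASTER_LOG_POS={int(log_pos)};"
    )


def wire_master_master(
    node_a: str, node_b: str, user: str, password: str,
    coords: dict[str, tuple[str, int]],
) -> dict[str, list[str]]:
    """Return the statements to run on each node.

    coords[host] is (File, Position) as reported by SHOW MASTER STATUS on that host.
    Node A replicates from node B, and node B replicates from node A.
    """
    return {
        node_a: [change_master_sql(node_b, user, password, *coords[node_b]), "START SLAVE;"],
        node_b: [change_master_sql(node_a, user, password, *coords[node_a]), "START SLAVE;"],
    }


if __name__ == "__main__":
    plan = wire_master_master(
        "db01.example.com", "db02.example.com", "repl", "secret",
        {"db01.example.com": ("mysql-bin.000001", 106), "db02.example.com": ("mysql-bin.000001", 106)},
    )
    for host, statements in plan.items():
        print(f"-- on {host}")
        print("\n".join(statements))
```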

References:
[1] https://docspace.corp.redhat.com/docs/DOC-59614
[2] http://git.corp.redhat.com/cgit/puppet-cfg/modules/libmysql/
[3] http://dev.mysql.com/doc/refman/5.0/en/change-master-to.html

Thanks!

Chris

Jason Guiditta

9 p.m.

On Wed, 2011-07-27 at 13:51 -0400, Chris Alfonso wrote:

Chris, great information, thanks for passing it along! We may hit you up in the coming weeks once whoever works on this has had a chance to digest it and come up with further questions.

-j

Mo Morsi

29 Jul

12:03 a.m.

If there is interest in a easy way to implement an HA database configuration and you don't mind pushing users to a MySQL based solution

We're using ActiveRecord to access our DB and shouldn't be doing anything specific to postgres.

This will need to be verified and tested, but a great feature to have in 0.4.0 would be the ability to run aeolus against postgres, mysql, or whatever other database a customer may already have deployed.

Furthermore, we can parameterize this in configure so that the user can select which db to use.
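A rough Python sketch of what that parameterization could produce: mapping a user's database choice to a Rails-style database.yml entry. The backend keys, default ports, the database name `conductor`, and the helper itself are assumptions for illustration; `postgresql` and `mysql2` are real ActiveRecord adapter names, but which adapters aeolus would actually support is not settled in this thread:

```python
ADAPTERS = {
    "postgres": {"adapter": "postgresql", "port": 5432},
    "mysql": {"adapter": "mysql2", "port": 3306},
}


def database_yml(backend: str, env: str, database: str, host: str, username: str) -> str:
    """Render one environment's entry for config/database.yml (hypothetical helper)."""
    try:
        settings = ADAPTERS[backend]
    except KeyError:
        raise ValueError(f"unsupported backend {backend!r}; choose one of {sorted(ADAPTERS)}") from None
    fields = {
        "adapter": settings["adapter"],
        "database": database,
        "host": host,
        "port": settings["port"],
        "username": username,
    }
    # Two-space indentation under the environment key, as database.yml uses.
    return f"{env}:\n" + "".join(f"  {key}: {value}\n" for key, value in fields.items())


if __name__ == "__main__":
    print(database_yml("mysql", "production", "conductor", "db-vip.example.com", "aeolus"), end="")
```

Because the application only talks to the database through ActiveRecord, a choice like this changes connection settings rather than code; pointing `host` at a database VIP is one way a failover arrangement like the one described earlier in the thread could sit underneath it.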

-Mo

Chris Alfonso

12:17 a.m.

On Jul 28, 2011, at 7:03 PM, Mo Morsi mmorsi@redhat.com wrote:

I probably didn't state clearly that the replication setup I described relies on MySQL's own replication features for data HA. Each database has its own approach to HA configuration, and some don't support an HA configuration at all without add-ons.

Steven Dake

27 Jul

7:16 p.m.

On 07/27/2011 08:05 AM, Jason Guiditta wrote:

Jason,

Our current plans around HA are focused on providing high levels of service availability for resources/assemblies/deployables. A good understanding of HA theory as well as our plans is explained here:

http://www.redhat.com/summit/2011/presentations/summit/whats_new/thursday/dake_th_1130_high_availability_in_the_cloud.pdf

Since this task concerns HA of the infrastructure itself (rather than the end-user applications), I'd recommend refining this task to the following:

* Contain infrastructure in a deployable *

Contrary to what has been suggested previously, this is not a circular dependency or infinite regression issue, but rather a bootstrapping task, much like creating a toolchain.

Why go to the trouble? We get automatic high availability (from www.pacemaker-cloud.org), the ability to scale out, and an easy configuration model for our users.

If you like, my team will take on this task during the next iteration.

Regards -steve

aeolus-devel mailing list aeolus-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/aeolus-devel

Jason Guiditta

9:14 p.m.

On Wed, 2011-07-27 at 11:16 -0700, Steven Dake wrote:

Steve, that would be excellent, thanks, though I am quite sure we'll need to do some work on our side too. This level of HA you refer to is closer to what I initially thought was meant; I just want to be careful we don't over-promise on this (especially since we now seem to have two levels of HA under discussion). I'll read through your plan today/tomorrow so I have a better idea of what is going on from your side. Meanwhile, if there are things we can do on the aeolus/infra or conductor side to make this smoother/easier, please let us know so we can plan accordingly. Thanks,

-j

Steven Dake

9:34 p.m.

On 07/27/2011 01:14 PM, Jason Guiditta wrote:

Jason,

For the 0.4.0 iteration our priorities are:

1) Aeolus monitoring via the Pacemaker Cloud/Matahari infrastructure
2) Aeolus infrastructure in a deployable, so the Aeolus infrastructure can be made HA later

We can commit roughly 2.5 developers to making these two tasks happen.

In later iterations, we can tackle deployable HA integration, at which point we get the Aeolus infrastructure HA feature set. Note that deployable HA is functioning now and is going into F16.

Regards
-steve

Perry Myers

28 Jul

6:10 p.m.

On 07/27/2011 02:16 PM, Steven Dake wrote:

On 07/27/2011 08:05 AM, Jason Guiditta wrote:

On Wed, 2011-07-27 at 10:43 -0400, Hugh Brock wrote:

On Wed, Jul 27, 2011 at 10:26:08AM -0400, Jason Guiditta wrote:

Hello list (and Steve), I have been tasked with looking into the following nebulous feature item for consideration of being included in Iteration 4: '* HA configuration available?'

This is from an 'infrastructure around aeolus' perspective, if that helps context at all.

There was brief discussion of this in an extremely broad way on a recent call, and given my lack of involvement with anything HA, I find myself with absolutely no idea how to describe what might need doing here, let alone how it might be done. I will be looking through the list archives to see what has been discussed, but all the threads I recall seeing were very long and detailed, and I would rather get a synopsis if someone can summarize the current thinking in this area. Also, any links to specific threads or documentation/designs would be much appreciated. Thanks for any direction here,

Sure.

One of the requirements the SAs brought up is that it should be possible to make Conductor and its attendant pieces highly available. I'm not sure this is something we need to actually implement for release 0.4.0, but it would be nice to at least have a design for it by then. At a minimum we would need the entire app to be able to run in failover mode on two machines. We don't have to provide the infrastructure to deal with the failover, but the app needs to handle being failed over gracefully without data loss or (much) service interruption.

Does that help at all?

--Hugh

If that is really the gist, then yes, it is a start, thanks. It sounds like a feature to work out how best to set up rails/$backing-db to be clustered, and possibly separate tasks to research the same for some of the other components (iwhd seems a likely candidate). However, I have a feeling there is much more to it than that. Perhaps this task/feature is a combination of documenting how to do some of these things and writing up what does and does not currently exist for this kind of scenario? Also, probably something around "what is going to be the Aeolus story for HA, and what does it mean?".

-j

Jason,

Our current plans around HA are focused on providing high levels of service availability for resources/assemblies/deployables. A good understanding of HA theory as well as our plans is explained here:

http://www.redhat.com/summit/2011/presentations/summit/whats_new/thursday/da...

Since this task is on HA of the infrastructure itself (rather than the end-user applications), I'd recommend looking into refining this task to the following:

* Contain infrastructure in a deployable *

Contrary to what has been previously suggested, this is not a circular dependency or infinite regression issue, but rather a bootstrapping task, much like creating a toolchain.

Why go to the trouble? We get automatic high availability (from www.pacemaker-cloud.org), the ability to scale out, and an easy configuration model for our users.

If you like, my team will take on this task during the next iteration.

Steve,

One thing to consider is that (perhaps) not everyone will want to run the Aeolus infrastructure in the cloud that they are managing. I agree that this is a valid deployment model, but it perhaps does not cover the full range of what people will want.

So to address the various deployment models for Aeolus, how about:

1. A separate bare-metal cluster of hosts running the Aeolus infrastructure components, using the 'Cluster Stack' (rgmanager- or Pacemaker-based) to provide HA for those server components

2. Aeolus infrastructure deployed on an internal cloud (RHEV, VMware, etc.), with monitoring and HA provided by Pacemaker Cloud

The former is mostly a documentation effort, plus perhaps the creation of some resource agents specific to Aeolus components, so limited development work is needed there.

The latter is (I think) what you are getting at. I have no objection to it; I just think it is complementary to the rgmanager/Pacemaker solution.

Thoughts?

Perry

aeolus-devel@lists.fedorahosted.org

13 comments

participants (6)

Chris Alfonso
Hugh O. Brock
Jason Guiditta
Mo Morsi
Perry Myers
Steven Dake