==================================================================
#fedora-classroom: Infrastructure Private Cloud Class (2014-03-25)
==================================================================
Meeting started by nirik at 18:00:03 UTC. The full logs are available at
http://meetbot.fedoraproject.org/fedora-classroom/2014-03-25/infrastructu...
Meeting summary
---------------
* intro (nirik, 18:00:03)
* History/current setup (nirik, 18:02:51)
* LINK: https://fed-cloud02.cloud.fedoraproject.org/dashboard/ (nirik, 18:10:48)
* LINK: http://infrastructure.fedoraproject.org/cgit/ansible.git/tree/README has a bunch of cloud specific info. (nirik, 18:11:35)
* IDEA: add openid support to openstack's horizon dashboard
(threebean, 18:13:19)
* LINK: https://wiki.openstack.org/wiki/Nova_openid_service (danofsatx-work, 18:17:06)
* Upcoming plans / TODO (nirik, 18:26:33)
* LINK: https://en.wikipedia.org/wiki/OpenStack#Components (threebean, 18:27:00)
* Open Questions (nirik, 18:40:57)
* LINK: https://fedoraproject.org/wiki/Infrastructure_private_cloud (nirik, 18:41:36)
Meeting ended at 19:02:55 UTC.
People Present (lines said)
---------------------------
* nirik (143)
* mirek-hm (29)
* danofsatx-work (21)
* threebean (19)
* tflink (16)
* smooge (5)
* zodbot (3)
* abadger1999 (3)
* webpigeon (3)
* jsmith (3)
* jamielinux (1)
* blob (1)
* relrod (1)
* janeznemanic (1)
18:00:03 <nirik> #startmeeting Infrastructure Private Cloud Class (2014-03-25)
18:00:03 <zodbot> Meeting started Tue Mar 25 18:00:03 2014 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:03 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:03 <nirik> #meetingname infrastructure-private-cloud-class
18:00:03 <nirik> #topic intro
18:00:03 <zodbot> The meeting name has been set to
'infrastructure-private-cloud-class'
18:00:22 <nirik> hey everyone. who is around for a bit of talking about fedora's
infrastructure private cloud?
18:00:26 * threebean is here
18:00:32 <smooge> hello
18:00:38 <danofsatx-work> here! here!
18:00:44 * mirek-hm is here
18:00:45 * tflink is here
18:00:53 * blob is here
18:00:57 * relrod here
18:01:01 * webpigeon is here
18:01:11 * jamielinux is here out of curiosity
18:01:53 * abadger1999 is here
18:01:57 <janeznemanic> hi
18:02:24 <nirik> cool. ;) ok, I thought I would give a bit of background/history
first, then a bit about ansible integration, then talk about plans...
18:02:51 <nirik> #topic History/current setup
18:02:58 <threebean> that sounds good. can we interrupt you with questions? or
should we hold them for later?
18:03:10 <nirik> threebean: please do... questions are good. ;)
18:03:16 <threebean> cool, cool.
18:03:20 * jsmith is late
18:03:23 <nirik> so, our current setup is an openstack folsom cloud
18:03:31 <nirik> It was mostly manually installed by me.
18:03:38 <nirik> Ie, I installed rpms, ran setup, etc...
18:03:55 <nirik> there are currently 7 nodes in use
18:04:07 <jsmith> Nodes = servers?
18:04:08 <nirik> 1 (fed-cloud02.cloud) is the 'head node'
18:04:15 <nirik> 6 are compute nodes
18:04:20 <nirik> jsmith: yeah, physical boxes.
18:04:37 <jsmith> Perfect, just wanted to make sure I was getting the nomenclature
right
18:04:53 <nirik> The compute nodes only run openstack-nova-compute, and one of them
also runs cinder (the storage service; will get to that in a few)
18:05:04 <nirik> the head node runs all the other stuff.
18:05:18 <nirik> and acts as the gateway to all the other things.
18:05:34 <nirik> it runs network, the db (mysqld), amqp, etc, etc.
18:05:48 <nirik> It also runs cinder for storage.
18:05:50 <mirek-hm> head node is fed-cloud02.cloud.fedoraproject.org BTW
18:06:29 <nirik> when you fire off an instance openstack looks around and schedules
it on a compute node.
18:06:34 <tflink> outside of cinder, is there any shared storage?
18:07:07 <nirik> when you ask for a persistent volume, it allocates it from one of
the cinder servers... it makes a lv in a pool and shares it via iscsi
18:07:14 <nirik> tflink: nope. Not currently.
18:07:45 <mirek-hm> aha, that's the cinder-volumes VG in lvs output
18:07:46 <nirik> all storage is either cinder from 02 or cinder from 08... or each
compute node has a small amount of storage it uses to cache images locally.
18:07:56 <nirik> mirek-hm: yeah.
18:08:04 <tflink> is that a requirement of newer openstack? when I set up my
dev/standalone system, I got the impression that something like gluster was highly
recommended for more than 1 node
18:08:05 <nirik> let's see... what else.
18:08:24 <nirik> tflink: we tried gluster, but it was really really really really
slow at the time.
18:08:42 <nirik> oh, networking:
18:08:44 <danofsatx-work> no swift for image service?
18:09:01 <mirek-hm> nirik: you forgot one exception, Copr-be has one storage
volume allocated on one compute node (800 GB)
18:09:02 <nirik> we have a number of tenants (projects). Each one gets its own
vlan.
18:09:18 <nirik> mirek-hm: thats from the 08 compute node via cinder there. ;)
18:09:44 <nirik> we have a /24 network for external ips. Each instance gets one by
default and can also allocate a static one.
18:10:08 <nirik> danofsatx-work: yeah, we do have swift.
18:10:17 <nirik> it uses storage on fed-cloud02.
18:10:34 <nirik> There's a dashboard available for web (horizon):
18:10:48 <nirik> https://fed-cloud02.cloud.fedoraproject.org/dashboard/
18:11:04 <nirik> The main way we interact tho is via ansible on lockbox01.
18:11:35 <nirik> http://infrastructure.fedoraproject.org/cgit/ansible.git/tree/README has a bunch of cloud specific info.
18:11:45 <nirik> basically we use the ansible ec2 module to spin things up, etc.
18:11:48 <webpigeon> does the web interface work for apprentices?
18:12:04 <danofsatx-work> fi-apprentice group can't login to dashboard :(
18:12:10 <nirik> webpigeon: it doesn't. It also doesn't interact with our
other authentication at all. ;( it's all manually needing to add people.
18:12:10 <mirek-hm> webpigeon: no
18:12:25 <nirik> I keep hoping someday they will add openid support.
18:12:32 * webpigeon wonders if he can get access to have a look :)
18:12:47 <danofsatx-work> I thought I saw that mentioned in the Havana release
notes.....
18:12:49 <nirik> sure, we could probably do that. :)
18:12:53 <mirek-hm> you either have to have an account there, or you can root login to
fed-cloud02, where the admin password is stored in the keystonerc file
18:12:55 <nirik> danofsatx-work: oh? excellent.
18:13:19 <threebean> #idea add openid support to openstack's horizon dashboard
18:13:20 <nirik> ok, lets see what else on current setup... oh, we have some
playbooks that make transient instances...
18:13:39 <nirik> so it can spin up one and then ansible configures it all for
testing something.
18:13:52 <nirik> we also have persistent instances where it's for a specific
use.
18:14:31 <nirik> we picked the ec2 interface over the nova one because we wanted to
be portable to euca or amazon... we could possibly revisit that, but it seems to do most
of what we need.
18:15:06 <nirik> oh, we also hacked an nginx proxy onto fed-cloud02 to proxy https
for us. By default folsom had its ec2 interface on http
18:15:42 <nirik> also, one compute node has been down with a bad disk, but should be
ready to re-add again now. I will likely do that in the next few days.
18:16:09 <nirik> So, any questions on the current setup? or shall I talk plans and
then open up to more questions?
18:16:31 <threebean> so, last week the cloud blew up because... one of the compute
nodes got the ip of the router?
18:16:35 <threebean> where did you look to figure that out?
18:16:38 <mirek-hm> if you are logged in to fed-cloud02 and want to check the status of
services you may run openstack-status, which should give output like
http://fpaste.org/88542/57713621/
18:17:06 <nirik> threebean: yes. I looked on fed-cloud02 with nova-manage at the
ips... or perhaps I saw the .254 one in the dashboard first
18:17:06 <danofsatx-work> https://wiki.openstack.org/wiki/Nova_openid_service
18:18:03 <mirek-hm> openstack services are (re)started by the /etc/init.d/openstack-*
init scripts
18:18:06 <nirik> nice!
18:18:13 <nirik> danofsatx-work: thanks for the link//info.
18:18:36 <danofsatx-work> np ;) - I'm teaching an Openstack class at uni right
now
18:18:45 <threebean> cool. what group(s) of people are allowed to log in to
fed-cloud02 (and the others)? is it all locally managed?
18:19:14 <nirik> threebean: yes, it's all local. sysadmin-main should have their
keys there for root (there aren't any local users, I don't think). Also a few other
folks like mirek-hm
18:19:36 <nirik> the cloud network is completely isolated from our other stuff and
rh's internal networks.
18:19:50 <nirik> so if you need to talk to it you have to go via the external
ips...
18:20:21 <mirek-hm> nirik: btw I will remove seth's ssh key from authorized_keys
18:20:27 <nirik> mirek-hm: ok. ;(
18:20:44 <tflink> does that have anything to do with the network goofiness where
instances can't always talk to each other on non-public IPs?
18:20:44 <nirik> on to a bit of plans?
18:20:50 <threebean> one more Q
18:20:53 <nirik> tflink: nope, thats a openstack issue. ;)
18:20:56 <threebean> how close to full capacity are we on the compute nodes?
18:20:58 <tflink> ok, wasn't sure
18:21:07 <nirik> it's due to the way that they setup firewalls. ;(
18:21:26 <nirik> instances on different compute nodes can talk to each other fine,
but if they happen to be on the same one they cannot.
18:21:34 <nirik> I don't know if thats fixed in newer openstack.
18:21:49 <nirik> threebean: hard to say. ;) There's not any easy "show me
my capacity"
18:22:03 <threebean> heh, ok.
18:22:11 <nirik> each compute node reports on itself...
18:22:20 <threebean> perhaps we need a scripts/cloud-info script in our ansible
repo.
18:22:20 <nirik> so fed-cloud02 (which is also a compute node) says:
18:22:27 <nirik> 2014-03-25 18:21:44 30752 AUDIT nova.compute.resource_tracker [-]
Free ram (MB): 10732
18:22:27 <nirik> 2014-03-25 18:21:44 30752 AUDIT nova.compute.resource_tracker [-]
Free disk (GB): 951
18:22:28 <nirik> 2014-03-25 18:21:44 30752 AUDIT nova.compute.resource_tracker [-]
Free VCPUS: -4
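The scripts/cloud-info idea threebean raises above could start as a small parser for these resource_tracker AUDIT lines. A minimal sketch, assuming only the log format quoted above; the function name and structure are hypothetical, and a real script would tail /var/log/nova/compute.log on each compute node:

```python
import re

# Sample AUDIT lines, taken verbatim from the compute.log excerpt above.
LOG_LINES = [
    "2014-03-25 18:21:44 30752 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 10732",
    "2014-03-25 18:21:44 30752 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 951",
    "2014-03-25 18:21:44 30752 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -4",
]

PATTERN = re.compile(r"Free (ram \(MB\)|disk \(GB\)|VCPUS): (-?\d+)")

def free_resources(lines):
    """Return the most recent free-resource figure per metric."""
    found = {}
    for line in lines:
        m = PATTERN.search(line)
        if m:
            # later lines overwrite earlier ones, so we keep the latest value
            found[m.group(1)] = int(m.group(2))
    return found

print(free_resources(LOG_LINES))
```

Note that "Free VCPUS: -4" is a negative number: the node is overcommitted on CPU, which is why capacity is "hard to say" from these logs alone.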
18:22:57 <mirek-hm> also routing in fedora cloud is a little bit tricky, as external IPs
do not work inside the cloud. Therefore if you communicate from one cloud machine to
another cloud machine you must use the internal IP.
18:23:10 * nirik nods.
18:23:26 <tflink> mirek-hm: that's not always
18:23:27 <tflink> true
18:23:32 <nirik> also, you automatically get an external ip with each instance, but
it's dynamically assigned.
18:23:49 <tflink> but I've had trouble figuring out a way to always let
instances talk with each other
18:23:49 <nirik> so if you want a static/known external you have to assign one, and
then you have 2 of them. ;(
18:24:16 <nirik> and you cannot give up the dynamic one.
18:24:33 <mirek-hm> nirik: what command you used to get that resource information?
18:25:00 <nirik> threebean: there is a newer project called ceilometer or something
that is supposed to give you usage info, but that's not available in the old version we
have.
18:25:10 <tflink> a way to consistently let instances talk to each other over the
network, rather. it's always been a one-off "does this work with instances X and
Y"
18:25:10 <nirik> mirek-hm: tail /var/log/nova/compute.log
18:25:24 <danofsatx-work> I thought that was heat, not ceilometer.....
18:25:28 * danofsatx-work checks real quick
18:25:35 <mirek-hm> it is ceilometer
18:25:48 <mirek-hm> heat is for configuration
18:25:48 <tflink> heat is coordination and setup of instances, no?
18:26:24 <danofsatx-work> ok, yeah....It's the Telemetry service.
18:26:33 <nirik> #topic Upcoming plans / TODO
18:26:48 <danofsatx-work> openstack isn't real good about actually giving the
name of each service, you have to dig for it.
18:27:00 <threebean> https://en.wikipedia.org/wiki/OpenStack#Components
18:27:16 <nirik> so, we have a few new machines we are getting in soon (3). My plan
is to try and set those up as a separate cloud, ideally via ansible
18:27:32 <nirik> so we can easily rebuild it from the ground unlike the current
one.
18:27:42 * danofsatx-work likes that plan
18:27:53 <nirik> also, if we can do openid that would be lovely.
18:27:54 <danofsatx-work> makes for a cleaner, easier transition
18:28:16 <nirik> once its all working, we could then migrate things over until all
is moved
18:28:33 <mirek-hm> yes, because if somebody or something breaks the current cloud, we
have to manually rebuild it, which would take ages.
18:28:34 <nirik> (and move compute nodes as needed)
18:28:58 <nirik> sadly, I haven't been able to find a good ansible recipe for
setting up a cloud, but I can look some more.
18:29:18 <nirik> it would be nice to just have it in our regular ansible repo with
everything else.
18:29:18 <mirek-hm> can we use TripleO for installation?
18:29:33 <nirik> mirek-hm: not sure. I have heard of that, but wasn't clear what
it did.
18:29:40 <mirek-hm> I will ask colleagues if that can be scripted
18:30:11 <mirek-hm> nirik: it is installation of OpenStack using OpenStack which use
OpenStack :)
18:30:19 <nirik> cool. That would be nice. The way I read it is that it needs a
cloud installed already?
18:30:34 <danofsatx-work> what about RH's own product, packstack?
18:30:38 <mirek-hm> you first start with a disk image, which you boot; then it installs
the undercloud and then it installs the normal cloud
18:30:59 <nirik> triple-o also looks a bit experimental.
18:31:03 <nirik> danofsatx-work: thats another option yeah.
18:31:15 <nirik> it's puppet/chef, but if we have to...
18:31:47 <mirek-hm> TripleO at devconf http://www.youtube.com/watch?v=Qcpe2gjdRz0&list=PLjT7F8YwQhr928YsRxmO...
18:32:51 <nirik> Also on upcoming plans... in Q4 we are supposed to get some
dedicated storage for the cloud...
18:33:23 <threebean> for the new 3-node cloud.. do you expect we would move to a
more modern openstack release?
18:33:31 <mirek-hm> nirik: how big?
18:33:40 <threebean> looks like icehouse is supposed to be released in ~2 weeks.
18:33:52 <nirik> threebean: definitely.
18:34:11 <nirik> mirek-hm: unknown yet, as much as we can get for the money. ;)
Ideally something thats 2 units in a HA...
18:35:15 <nirik> also in Q3 we are getting some more compute nodes.
18:36:00 <abadger1999> nirik: What are our plans for what we're going to use
the cloud for? (like -- are we going to move more and more services over to it?)
18:36:23 <nirik> abadger1999: currently, I think it's a good fit for some things
and not for others.
18:36:31 <nirik> devel and test instances -> definitely
18:36:34 <nirik> coprs -> yep
18:37:04 <nirik> our main applications -> not sure... I like the control we have
currently with them
18:37:10 <abadger1999> <nod>
18:37:13 <mirek-hm> nirik: extrapolated data says that copr will run out of disk
space at Christmas, so Q3 is really the last date
18:37:32 <nirik> mirek-hm: well, can you split into 2 volumes?
18:37:49 <nirik> all the other compute nodes have similar space... so we can fire up
cinder on them and add more 800GB volumes.
18:38:05 <mirek-hm> nirik: probably, if I can concatenate them using lvm
18:38:32 <danofsatx-work> is the old cloud hardware being rolled into the new
instance, or decommissioned?
18:38:33 <nirik> not sure if that will work, but we can see...
18:38:40 <nirik> danofsatx-work: rolled into the new one.
18:38:51 <nirik> it's got a few more years on its warranty
18:40:11 <nirik> mirek-hm: we could also try and move the storage to Q3 and move the
new compute nodes to Q4. that might be best
18:40:13 <nirik> (if we can do it)
18:40:57 <nirik> #topic Open Questions
18:41:22 <nirik> So, we have a wiki page for our cloud, but it's pretty out of
date. We should likely revamp it
18:41:36 <nirik> https://fedoraproject.org/wiki/Infrastructure_private_cloud
18:41:47 <nirik> That has some more older long term plans, etc.
18:42:10 <mirek-hm> will we add some arm64 machine to cloud as they arrive?
18:42:51 <nirik> I'd like to add one there yeah...
18:42:52 <tflink> I assume that the plan going forward is to retain network
isolation from the rest of infra?
18:42:57 <nirik> tflink: yep.
18:43:21 <nirik> that has some downsides...
18:43:39 <nirik> but overall I think it's very good given the carefree nature of
instances.
18:43:52 <tflink> yeah, I think it's a positive thing overall
18:43:53 <nirik> A few other thoughts to toss out for comment:
18:44:07 <tflink> just not for all my possible use cases :)
18:44:22 <nirik> 1. Should we continue to autoassign an external ip on the new
cloud? or require you to get one if you need it?
18:44:29 <nirik> it has advantages/disadvantages
18:44:55 <tflink> how many instances are using non-floating IPs right now?
18:45:04 <mirek-hm> can I reach the wild internet when I have an internal IP only (e.g. via
NAT)?
18:45:35 <nirik> mirek-hm: not sure actually.
18:45:38 <danofsatx-work> EC2 only assigns (static) public IPs when asked for,
and it's the only IP you get
18:45:53 <nirik> danofsatx-work: we have it set to autoassign one.
18:45:59 <nirik> (in the current cloud)
18:46:16 <nirik> auto_assign_floating_ip = True
18:46:25 <nirik> tflink: I'm not sure how to tell. ;(
18:46:26 <mirek-hm> danofsatx-work: EC2 instances have both internal and external IPs
18:47:04 <danofsatx-work> I understand, I'm trying to figure out how to
translate my thoughts into coherent IRC language ;)
18:48:16 <danofsatx-work> ok, scratch my last....it's not making sense anymore
in my head :(
18:48:39 <nirik> there's 101 external ips being used right now.
18:49:20 <threebean> heh, how did you figure that out? :P
18:49:41 <nirik> and 34 instances seem to have 2 external ips.
18:50:07 <nirik> nova-manage floating list | grep 209 | awk '{print $3 " " $2 }' | grep -v None | wc -l
18:50:22 <threebean> great, thanks.
18:50:23 <nirik> ugly, but should be right.
18:50:24 <tflink> ah, not as many as I suspected
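The counting logic of nirik's grep/awk pipeline can be sketched in a few lines. A hedged illustration: the column layout assumed here for `nova-manage floating list` output (project, floating IP, instance id, pool, interface) and the sample rows are hypothetical, inferred only from what the awk fields imply:

```python
from collections import Counter

# Hypothetical `nova-manage floating list` output; real Folsom output may differ.
SAMPLE = """\
proj-a 209.132.184.10 i-0001 nova eth0
proj-a 209.132.184.11 None nova eth0
proj-b 209.132.184.12 i-0002 nova eth0
proj-b 209.132.184.13 i-0002 nova eth0
"""

def assigned_floating_ips(output):
    """Return (instance, ip) pairs for 209.x floating IPs bound to an instance,
    mirroring: nova-manage floating list | grep 209 | awk ... | grep -v None | wc -l"""
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        # keep rows in the external 209.x range whose instance column is not None
        if len(fields) >= 3 and fields[1].startswith("209") and fields[2] != "None":
            pairs.append((fields[2], fields[1]))
    return pairs

ips = assigned_floating_ips(SAMPLE)
print(len(ips))  # total external IPs in use
doubles = [inst for inst, n in Counter(i for i, _ in ips).items() if n >= 2]
print(doubles)   # instances holding two or more external IPs
```

The second count corresponds to nirik's observation that some instances end up with 2 external IPs (one auto-assigned dynamic, one static).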
18:50:47 <nirik> so, I guess we should see if instances with no external can reach
out any. I'm not sure they can.
18:50:55 <nirik> but we can test that on the new cloud.
18:51:32 <danofsatx-work> unless there's a router set up to route the internal
traffic out, no they can't get out.
18:51:45 <nirik> question 2. We have been very lax about maint (mostly because this
current cloud is so fragile). Would folks be ok with a more proactive one on the new cloud?
ie, more frequent reboots/maint windows?
18:52:14 <nirik> danofsatx-work: all the instances would hit the head node, but I
don't know if nova-network will nat them or not.
18:52:19 <danofsatx-work> I would be, but I come from a different background than
the rest of y'all
18:52:55 <threebean> yeah, that would be fine. I worry that I'm responsible for
wasted resources somewhere, some node I forgot about.
18:53:48 <threebean> erm, I misinterpreted the question.. for some reason I read it
as being more proactive about cleanup.
18:53:48 <nirik> last time we rebooted stuff we were able to actually suspend
instances... and mostly they came back ok
18:53:56 <nirik> that too. :)
18:54:16 <nirik> more reporting would be nice... like a weekly snapshot of
'here's all running instances'
18:54:24 <nirik> sometimes it's hard to tell what an instance was for tho
18:54:52 <mirek-hm> nirik: +1 to keep auto_assign_floating_ip = True; +1 to planned
outages on the new Fedora Cloud
18:54:57 <threebean> oo, send some numbers to collectd?
18:55:53 <tflink> re: more proactive maintenance - I'd like to see coordination
with standard backup times. ie - run shortly after backup runs so that any possible
data-loss would be minimized
18:55:57 <nirik> yeah, that might be nice. However, no vpn, so not sure it could
talk to log02.
18:56:00 <smooge> well getting that data is hard because of the nat
18:56:12 <nirik> or it could run its own I guess.
18:56:24 <smooge> I would say we might want to look at having a system on that
network which could do that for us
18:56:40 <smooge> the various things that we would like but can't because of
separate networks
18:56:52 <nirik> actually, it could be a cloud instance even. ;)
18:57:20 <mirek-hm> March 2014: Active Instances: 66 Active RAM: 352GB This
Month's VCPU-Hours: 40433.81 This Month's GB-Hours: 2067464.96
18:57:29 <smooge> well it might be nice if the box was able to run when the cloud
wasn't
18:57:42 <nirik> smooge: details, details.
18:57:49 <nirik> mirek-hm: that's only one tenant tho right?
18:58:23 <mirek-hm> nirik: that is from the dashboard overview of the admin user, so I would
say that it counts everything
18:58:24 <nirik> tflink: agreed.
18:58:46 <nirik> mirek-hm: I think it's only the project it has active at the
time... if you change that the numbers change right?
18:59:12 <mirek-hm> yes
18:59:24 <nirik> so we would need to sum all those.
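Summing the per-tenant numbers, as nirik suggests, is a one-liner once the figures are collected. A sketch under stated assumptions: the Horizon overview only shows the currently selected project, so the per-tenant dict below would have to be gathered by switching projects; the tenant names and all figures except the March 2014 example quoted above are invented for illustration:

```python
# Hypothetical per-tenant usage figures; only the first row echoes the real
# March 2014 admin-view numbers quoted in the meeting (66 instances, 352 GB RAM).
TENANT_USAGE = {
    "copr":      {"instances": 66, "ram_gb": 352, "vcpu_hours": 40433.81},
    "qa":        {"instances": 12, "ram_gb": 48,  "vcpu_hours": 5120.00},
    "transient": {"instances": 9,  "ram_gb": 36,  "vcpu_hours": 2048.50},
}

def cloud_totals(usage):
    """Sum each metric across all tenants to get cloud-wide usage."""
    totals = {}
    for stats in usage.values():
        for metric, value in stats.items():
            totals[metric] = totals.get(metric, 0) + value
    return totals

print(cloud_totals(TENANT_USAGE))
```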
19:00:58 <nirik> ok, any other questions or comments?
19:01:02 <nirik> or shall we wrap up?
19:01:13 * nirik is happy to answer anything outside meeting, etc.
19:01:31 <danofsatx-work> any room for an apprentice on the cloud team?
19:02:03 <nirik> sure. always room for help... perhaps assistance setting up the new
cloud?
19:02:31 <danofsatx-work> yeah, I can do that...I need to learn ansible, and I am
building clouds at school and work anyhow ;)
19:02:39 <nirik> excellent. ;)
19:02:50 <nirik> ok, thanks for coming everyone... lets continue over in
#fedora-admin...
19:02:55 <nirik> #endmeeting