changing a few things in our host mgmt tools

Kevin Fenzi kevin at scrye.com
Fri Mar 18 15:48:00 UTC 2011


On Fri, 18 Mar 2011 11:04:32 -0400
seth vidal <skvidal at fedoraproject.org> wrote:

> Hi folks,
>  some thoughts have been slowly coalescing in my head about how we're
> managing our boxes/services and I have some suggestions I've passed by
> various folks but I wanted to check them out with everyone:
> 
> 
> 1. puppetd sucks..... memory. Right now we have puppetd running on
> every box and it wakes up every half hour and runs itself. This is
> fine but in the time where it is not doing anything it just eats
> memory for no good reason. I'd like to suggest we move to a
> cron-driven model instead of puppetd. I'd write a simple cron job
> that runs every half hour to run puppetd, if a lock file is not
> found. Pretty straightforward, of course. 

I think this is a fine idea. ;) 
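
For what it's worth, the wrapper could be about as small as the sketch
below. This is only a rough illustration, not seth's actual script: the
lock path and the run_once name are made up, it uses its own lock file
rather than any lock puppet itself may keep, and it assumes puppetd's
--onetime and --no-daemonize options for a single foreground run.

```python
import os
import subprocess

# Hypothetical values for illustration only.
LOCK_FILE = "/var/run/puppet-cron.lock"
PUPPET_CMD = ["puppetd", "--onetime", "--no-daemonize"]


def run_once(cmd, lock_path):
    """Run cmd unless lock_path already exists.

    Returns the command's exit status, or None if the run was skipped
    because another run still holds the lock.
    """
    try:
        # O_EXCL makes creation fail if the file exists, so checking for
        # the lock and taking it happen in a single step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return None
    try:
        with os.fdopen(fd, "w") as lock:
            lock.write(str(os.getpid()))
        return subprocess.call(cmd)
    finally:
        # Always drop the lock, even if the command could not be started.
        os.unlink(lock_path)
```

Cron would call run_once(PUPPET_CMD, LOCK_FILE) every half hour. Since
cron mails whatever a job prints to the MAILTO address, error output
from the run would reach people without extra plumbing.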

> 2. monitoring if puppetd has run properly:
>    two things we want to know about puppet runs:
>    a. when they last happened per-box
>    b. if they fell over in a horrible way.
> 
>     (a) can be known by looking at the $nodename.yaml file which lives
> on the puppetmaster. I've written a script to check if that file is
> older than 1 hour and report the nodename if it is.
>     (b) can be done via the cron job - ie: taking error output from
> the puppet run and mailing to people until we fix it! :)

Sounds good. There are a few boxes where we don't run puppet (the
sign* boxes, some of the backup boxes?).

Options here: 

1) If we don't intend to manage them with puppet, perhaps we should
completely disable them/comment them out for normal operations?
I know the sign* machines' puppet module is intended to set up
everything needed on those machines, with a blank db and ready to
configure. So we would only use it when setting up new instances, and
keep it disabled the rest of the time.

2) Fix the puppet modules on them so that for normal operations they
only do a small number of things... fasClient, etc. I think this is not
intended, however, for security reasons.
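
Either way, the check for 2(a) would need to know which boxes to skip.
A rough sketch of how that might look, assuming one $nodename.yaml per
node in a directory on the puppetmaster; this is not seth's script, and
the function name, the exclude patterns and the directory argument are
all hypothetical:

```python
import fnmatch
import os
import time


def stale_nodes(yaml_dir, max_age_seconds=3600, exclude=(), now=None):
    """Return node names whose .yaml file is older than max_age_seconds.

    exclude is a list of shell-style patterns (for example "sign*") for
    boxes we deliberately do not run puppet on.
    """
    now = time.time() if now is None else now
    stale = []
    for filename in sorted(os.listdir(yaml_dir)):
        if not filename.endswith(".yaml"):
            continue
        node = filename[:-len(".yaml")]
        if any(fnmatch.fnmatch(node, pattern) for pattern in exclude):
            continue
        age = now - os.path.getmtime(os.path.join(yaml_dir, filename))
        if age > max_age_seconds:
            stale.append(node)
    return stale
```

With option 1 the sign* boxes would go in the exclude list; with option
2 they would be checked like everything else.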

> 3. sign** boxes. problems here:
>    a. These boxes are falling out of date, repeatedly, b/c they aren't
> in our normal updating path.
>    b. these boxes don't email out to the same locations as the other
> boxes
>    c. these boxes don't get faspassword updates properly
>    d. these boxes don't get config changes normally via puppet
> 
>    (a) I'd like to suggest that they be put into a normal updating
> path and/or we setup a nag mail to tell us about them
>    (b) obviously, fix their mail configs
>    (c) fasclient is failing b/c of a missing token b/c, most likely,
> of (d)
> 
>   I'm open to suggestions on those but it is a bit annoying b/c while
> I understand their 'sensitivity' I think our way of treating them is
> making the problem WORSE not better.

a) I'd agree. Nag mail on updates might be the easy path.
b) Yep.
c) Perhaps we should just make them non-fas accounts there? Like
backup?
d) We either need to fix the puppet module to not tamper with any db
stuff in normal operations, or not use puppet on them except to set up
the initial config.

I know one of the things I was going to look at doing was making a new
sign-{bridge|vault} pair with puppet to see what all it did and whether
it got everything set up, etc.

So, short term, I would say we should apply updates, fix mail, set up
nag mail for updates, and fix fasclient, and leave the puppet issue for
later, after we look at what all is going on in that module.
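
The nag mail part could be something like the sketch below. It leans on
`yum check-update` exiting with status 100 when updates are pending;
the function name, the addresses and the subject line are placeholders,
not anything we have in place.

```python
from email.message import EmailMessage

# `yum check-update` exits 100 when updates are available, 0 when none are.
UPDATES_AVAILABLE = 100


def build_nag(hostname, returncode, output, to_addr="admin@example.com"):
    """Build a nag mail from a `yum check-update` result.

    Returns None when no updates are pending, so nothing gets sent.
    The default to_addr is a placeholder address.
    """
    if returncode != UPDATES_AVAILABLE:
        return None
    msg = EmailMessage()
    msg["Subject"] = "%s has pending updates" % hostname
    msg["From"] = "root@%s" % hostname
    msg["To"] = to_addr
    msg.set_content(output)
    return msg
```

A daily cron job on each sign* box would run `yum -q check-update`, pass
the exit status and output to build_nag(), and hand any resulting
message to the local MTA, which only works once the mail config on
those boxes is fixed.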

kevin