changing a few things in our host mgmt tools

Toshio Kuratomi a.badger at gmail.com
Fri Mar 18 16:28:51 UTC 2011


On Fri, Mar 18, 2011 at 11:04:32AM -0400, seth vidal wrote:
> Hi folks,
>  some thoughts have been slowly coalescing in my head about how we're
> managing our boxes/services and I have some suggestions I've passed by
> various folks but I wanted to check them out with everyone:
> 
> 
> 1. puppetd sucks..... memory. Right now we have puppetd running on every
> box and it wakes up every half hour and runs itself. This is fine but in
> the time where it is not doing anything it just eats memory for no good
> reason. I'd like to suggest we move to a cron-driven model instead of
> puppetd. I'd write a simple cron job that runs every half hour to run
> puppetd, if a lock file is not found. Pretty straightforward, of
> course. 
> 
+1

Might need to update kickstarts and/or the SOP pages:

http://fedoraproject.org/wiki/Kickstart_Infrastructure_SOP
http://fedoraproject.org/wiki/Puppet_Infrastructure_SOP
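
For the cron job itself, here is a rough sketch of what the wrapper could
look like. It is in Python only for illustration. The lock file path and the
puppetd flags are my assumptions, not something we've settled on, and the
function names are made up. It skips the run when the lock file exists and
only writes output when the run fails, so cron's normal mail-on-output
behaviour would cover the "mail us the errors" part:

```python
import os
import subprocess
import sys

# Both values below are assumptions for illustration; adjust for real hosts.
LOCK_FILE = "/var/lib/puppet/state/puppetdlock"
PUPPET_CMD = ["puppetd", "--onetime", "--no-daemonize"]


def run_unless_locked(cmd, lock_file):
    """Run cmd unless lock_file exists.

    Returns None when the run is skipped, otherwise a tuple of
    (exit_code, combined stdout/stderr text).
    """
    if os.path.exists(lock_file):
        return None
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def main():
    result = run_unless_locked(PUPPET_CMD, LOCK_FILE)
    if result is None:
        # Lock file present: stay quiet so cron sends no mail.
        return 0
    code, output = result
    if code != 0:
        # cron mails whatever the job writes, so only speak up on failure.
        sys.stderr.write(output)
    return code
```

Ending the script with sys.exit(main()) and calling it from a crontab entry
every 30 minutes would give the half-hourly schedule described above.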

> 2. monitoring if puppetd has run properly:
>    two things we want to know about puppet runs:
>    a. when they last happened per-box
>    b. if they fell over in a horrible way.
> 
>     (a) can be known by looking at the $nodename.yaml file which lives
> on the puppetmaster. I've written a script to check if that file is
> older than 1 hour and report the nodename if it is.
>     (b) can be done via the cron job, i.e., taking error output from the
> puppet run and mailing it to people until we fix it! :)
> 
+1
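
I haven't seen your script, but for anyone following along, the check in (a)
could be as small as the sketch below. The directory is an assumption about
where the puppetmaster keeps the per-node yaml files, and stale_nodes is a
made-up name:

```python
import glob
import os
import time

# Assumed location of the $nodename.yaml files on the puppetmaster.
YAML_DIR = "/var/lib/puppet/yaml/node"
MAX_AGE = 60 * 60  # one hour, in seconds


def stale_nodes(yaml_dir, max_age, now=None):
    """Return node names whose yaml file is older than max_age seconds."""
    if now is None:
        now = time.time()
    stale = []
    for path in sorted(glob.glob(os.path.join(yaml_dir, "*.yaml"))):
        if now - os.path.getmtime(path) > max_age:
            # Strip the directory and the ".yaml" suffix to get the node name.
            stale.append(os.path.basename(path)[:-len(".yaml")])
    return stale
```

Printing the result of stale_nodes(YAML_DIR, MAX_AGE) from cron on the
puppetmaster would mail out the list of boxes that have not checked in.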

> 3. sign** boxes. problems here:
>    a. These boxes are falling out of date, repeatedly, b/c they aren't
> in our normal updating path.
>    b. these boxes don't email out to the same locations as the other
> boxes
>    c. these boxes don't get faspassword updates properly
>    d. these boxes don't get config changes normally via puppet
> 
>    (a) I'd like to suggest that they be put into a normal updating path
> and/or we set up a nag mail to tell us about them
>    (b) obviously, fix their mail configs
>    (c) fasclient is failing b/c of a missing token b/c, most likely, of
> (d)
> 
>   I'm open to suggestions on those but it is a bit annoying b/c while I
> understand their 'sensitivity' I think our way of treating them is
> making the problem WORSE not better.
> 
I agree with your assessment.  I guess we need to tell releng our concerns
and figure out what needs to be done.  For (a): perhaps have releng okay us,
or a specific subset of sysadmins, to run updates on these boxes along with
all the other updates.

-Toshio