changing a few things in our host mgmt tools
Toshio Kuratomi
a.badger at gmail.com
Fri Mar 18 16:28:51 UTC 2011
On Fri, Mar 18, 2011 at 11:04:32AM -0400, seth vidal wrote:
> Hi folks,
> some thoughts have been slowly coalescing in my head about how we're
> managing our boxes/services and I have some suggestions I've passed by
> various folks but I wanted to check them out with everyone:
>
>
> 1. puppetd sucks..... memory. Right now we have puppetd running on every
> box and it wakes up every half hour and runs itself. This is fine but in
> the time where it is not doing anything it just eats memory for no good
> reason. I'd like to suggest we move to a cron-driven model instead of
> puppetd. I'd write a simple cron job that runs every half hour to run
> puppetd, if a lock file is not found. Pretty straightforward, of
> course.
>
+1
We might need to update the kickstarts and/or the SOP pages:
http://fedoraproject.org/wiki/Kickstart_Infrastructure_SOP
http://fedoraproject.org/wiki/Puppet_Infrastructure_SOP
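
For illustration only, here is a minimal sketch of what the cron-driven wrapper described in point 1 might look like. It is not seth's actual script. The lock path and the puppetd command line are assumptions, and it assumes the lock file is meant to keep runs from overlapping, so the wrapper creates the lock itself and removes it when the run finishes.

```python
import os
import subprocess

# Hypothetical values; neither the path nor the exact command comes from the thread.
LOCK_FILE = "/var/lock/puppet-cron.lock"
PUPPET_CMD = ["puppetd", "--onetime", "--no-daemonize"]


def run_once(lock_file=LOCK_FILE, cmd=PUPPET_CMD):
    """Run cmd unless lock_file exists.

    Returns the command's exit code, or None if the run was skipped
    because the lock file was already present.
    """
    try:
        # O_EXCL makes creation atomic: it fails if the lock already exists.
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    try:
        # Record our PID so a stale lock can be traced to a process.
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        return subprocess.call(cmd)
    finally:
        # Always release the lock, even if the command could not be started.
        os.remove(lock_file)
```

A crontab entry firing every 30 minutes could call a small script that invokes `run_once()`. Since cron mails any output to the configured recipient, error output from the run would reach people without extra code.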
> 2. monitoring if puppetd has run properly:
> two things we want to know about puppet runs:
> a. when they last happened per-box
> b. if they fell over in a horrible way.
>
> (a) can be known by looking at the $nodename.yaml file which lives
> on the puppetmaster. I've written a script to check if that file is
> older than 1 hour and report the nodename if it is.
> (b) can be done via the cron job - ie: taking error output from the
> puppet run and mailing to people until we fix it! :)
>
+1
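
As a rough sketch of the staleness check in (a), and not the script seth wrote, something along these lines would report nodes whose YAML file has not been touched in the last hour. The directory path is an assumption; the one-hour threshold is the one mentioned above.

```python
import os
import time

# Hypothetical location of the per-node YAML files on the puppetmaster.
YAML_DIR = "/var/lib/puppet/yaml/node"
MAX_AGE_SECONDS = 60 * 60  # the one-hour threshold from the mail


def stale_nodes(yaml_dir=YAML_DIR, max_age=MAX_AGE_SECONDS, now=None):
    """Return sorted node names whose <nodename>.yaml is older than max_age."""
    now = time.time() if now is None else now
    stale = []
    for entry in os.listdir(yaml_dir):
        # Split off only the final extension so dotted hostnames stay intact.
        name, ext = os.path.splitext(entry)
        if ext != ".yaml":
            continue
        mtime = os.path.getmtime(os.path.join(yaml_dir, entry))
        if now - mtime > max_age:
            stale.append(name)
    return sorted(stale)
```

Printing the result of `stale_nodes()` from a cron job on the puppetmaster would be enough to get a nag mail listing the boxes that have stopped checking in.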
> 3. sign** boxes. problems here:
> a. These boxes are falling out of date, repeatedly, b/c they aren't
> in our normal updating path.
> b. these boxes don't email out to the same locations as the other
> boxes
> c. these boxes don't get faspassword updates properly
> d. these boxes don't get config changes normally via puppet
>
> (a) I'd like to suggest that they be put into a normal updating path
> and/or we set up a nag mail to tell us about them
> (b) obviously, fix their mail configs
> (c) fasclient is failing b/c of a missing token b/c, most likely, of
> (d)
>
> I'm open to suggestions on those but it is a bit annoying b/c while I
> understand their 'sensitivity' I think our way of treating them is
> making the problem WORSE not better.
>
I agree with your assessment. I guess we need to tell releng our concerns
and figure out what needs to be done. For (a): perhaps have releng okay us,
or a specific subset of sysadmins, to run updates on these boxes along with
all the other updates.
-Toshio