Hi Ho Everybody!
There will likely be a short period of time over the next couple of days
where Zabbix will be unable to report/etc (manual DB updates).
I'll also be taking a moment to 'restructure' (for want of a better
word) the way we monitor external hosts etc. One of the reasons we
went for this self-punishment was to get a feel for how it compares to
Nagios. I think it has performed well, and we'll be able to orphan off
Cacti at the same time :).
Now here is the important bit:
We have a variety of applications that have recently had things
added/altered/moved etc., or that have simply never been added to
Nagios, so here is your challenge:
If you run/work on/do something with the Infrastructure that meets any
of the following criteria:
* Is seen by the public (fas, etc)
* Can cause problems to the normal routine (i.e. rawhide builds etc -
did it succeed?)
* Is important in some other way
* Has a nice statistic that people might want to know/track...
THEN PLEASE... let us know...
What we need to know is:
* How can such a thing be monitored? - Open ports/service, number of
processes, age of a file, running a command and checking the output,
running a custom script (to make it easier for us, if you can create
such a script it'd be helpful) etc.
* How often would it need to be checked?
* What does 'failure' mean wrt the check (if one exists - statistics
don't need this)?
* How can such a 'failure' be fixed automatically (ditto for above)?
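To give an idea of the kind of custom script that would help, here is a minimal sketch in Python covering three of the check types listed above (open port, age of a file, running a command and checking the output). The function names, timeouts and thresholds are all made up for illustration; nothing here is Zabbix's own API, and how a script like this gets hooked into Zabbix is left out.

```python
import os
import socket
import subprocess
import time


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def file_age_seconds(path, now=None):
    """Return the number of seconds since `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def command_output_contains(argv, expected, timeout=30):
    """Run a command; True if it exits 0 and its stdout contains `expected`."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0 and expected in result.stdout
```

A wrapper would call one of these, print a short status line and exit non-zero on 'failure'. For example, a build check might treat file_age_seconds() on a log file being larger than one day as a failure, with the path and the threshold supplied by whoever runs the service.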
Then we can add them all together, stir the pot and be happy happy
Be extravagant too; while we mightn't want to implement every single
check you suggest, you might think of something that might have been
(sysadmin-noc: I still need to work out the best way of scaling this,
but I think I've nearly got it, and a SOP will be written when it's
Nigel Jones <dev(a)nigelj.com>