host-lifecycle-policy
by Mike McGrath
It dawns on me I never actually sent this to the list for comments and we
haven't officially adopted it yet.
http://infrastructure.fedoraproject.org/csi/host-lifecycle-policy/en-US/h...
It's still in flux, but give it a read over. In places it may seem to add
more work (like the post-kickstart checklist), but really it just lists
what needs to be done as part of bringing a host online. Backups and
monitoring are things that often get forgotten.
I'm working on a very similar doc for bringing services online.
-Mike
Won't be around off-hours
by Toshio Kuratomi
My wife is going to a funeral out of state for a week so I'm home with the
kids. I'll still be here normal working hours but not so much outside of
those.
-Toshio
logging infrastructure and notes
by Seth Vidal
I did a little spelunking around our system and I have some suggestions
for the logging infrastructure. We have enough hosts and complexity that
log analysis will help us know when something is misconfigured or flapping
in a weird way.
1. logs in /var/log/hosts on log1 are not consistently named: sometimes
they are reported by IP, sometimes by short hostname, sometimes
by FQDN. This needs to be made consistent.
2. we need to make sure we clean up old logs from the above, too.
3. the structure of the log dir doesn't seem to match what we'd normally
see in /var/log on any host. Logs are written to a different dir per
day, which is great, but it'd be good if rsyslog used the same
file structure as a normal set of logs so that standard log analysis
tools will work on them.
4. I installed pflogsumm on log1 so I could do a little postfix mail log
analysis, and found some issues that way too. Regularly generating these
reports, especially the error reports, would help us figure out what we
need to improve. We are clearly sending/redelivering A LOT more mail than
we're receiving, so bumping our smtp process count would help.
5. Grouping the logs by type of service would also help us look at
group/service trends and issues, especially if an issue is only popping
up on one box.
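As a rough illustration of items 1 and 2, here is a Python sketch. It is not anything that exists on log1. It assumes a hypothetical inventory dict that maps each host's FQDN to the other names it might log under (short name, IP), and it assumes the per-host `/var/log/hosts/<host>/YYYY/MM/DD` layout. The names `build_alias_map`, `canonical_host` and `prune_old_days` are illustrative. Using the FQDN as the canonical name is also an assumption: it avoids short-name collisions such as app01 vs app01.stg.

```python
import os
import shutil
from datetime import date, timedelta


def build_alias_map(inventory):
    """Map every name a host may log under (FQDN, short name, IP) to its FQDN."""
    aliases = {}
    for fqdn, other_names in inventory.items():
        for name in (fqdn, *other_names):
            aliases[name.lower().rstrip(".")] = fqdn
    return aliases


def canonical_host(ident, aliases):
    # Unknown identifiers come back unchanged so nothing is silently dropped.
    return aliases.get(ident.lower().rstrip("."), ident)


def _subdirs(path):
    return sorted(entry.name for entry in os.scandir(path) if entry.is_dir())


def prune_old_days(base, keep_days, today):
    """Delete base/<host>/YYYY/MM/DD directories older than keep_days days."""
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for host in _subdirs(base):
        for y in _subdirs(os.path.join(base, host)):
            for m in _subdirs(os.path.join(base, host, y)):
                for d in _subdirs(os.path.join(base, host, y, m)):
                    try:
                        day = date(int(y), int(m), int(d))
                    except ValueError:
                        continue  # not a YYYY/MM/DD directory; leave it alone
                    if day < cutoff:
                        path = os.path.join(base, host, y, m, d)
                        shutil.rmtree(path)
                        removed.append(path)
    return removed
```

Ideally the naming would be fixed where rsyslog writes the files, so a helper like this mostly suits a one-off cleanup of existing directories. Empty year/month directories left behind by pruning are not removed here.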
Just some initial thoughts.
-sv
Outage Notification - Fri Jan 15 10:30:00 UTC
by Mike McGrath
An outage began at Fri Jan 15 10:30:00 UTC and is
ongoing.
To convert UTC to your local time, take a look at
http://fedoraproject.org/wiki/Infrastructure/UTCHowto
or run:
date -d 'Fri Jan 15 10:30:00 UTC'
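If `date` isn't handy, the same conversion can be done in Python. The notice omits the year, so this sketch assumes 2010, based on the Thu Jan 14 2010 meeting log in this archive. It converts to a fixed UTC offset rather than a named time zone, so it does not account for daylight saving time. `utc_to_offset` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def utc_to_offset(stamp, hours):
    """Convert a 'Fri Jan 15 10:30:00 2010' UTC timestamp to a fixed UTC offset."""
    utc = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y").replace(tzinfo=timezone.utc)
    return utc.astimezone(timezone(timedelta(hours=hours)))


print(utc_to_offset("Fri Jan 15 10:30:00 2010", -5))
```

Calling `.astimezone()` with no argument on the UTC datetime converts to the machine's own local time zone instead.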
Affected Services:
Fedora People
Unaffected Services:
Buildsystem
CVS / Source Control
Database
DNS
Fedora Hosted
Fedora Talk
Mail
Mirror System
Torrent
Translation Services
Websites
Ticket Link:
https://fedorahosted.org/fedora-infrastructure/ticket/1930
Reason for Outage:
The Xen host that fedorapeople runs on is currently not responding to pings.
I've contacted the hosting provider but have not heard back yet. It is
unclear when this host will be back online.
Contact Information:
Please join #fedora-admin on irc.freenode.net or respond to this email to
track the status of this outage.
Meeting Log - 2010-01-14
by Ricky Zhou
20:01 < mmcgrath> .startmeeting Infrastructure
20:01 < mmcgrath> Who's here?
20:01 -!- belegdol [n=belegdol@fedora/belegdol] has joined #fedora-meeting
20:01 * wzzrd is here
20:01 * nirik hands mmcgrath a #
20:01 * sijis is here
20:01 < PhrkOnLsh> mind if i watch the show?
20:01 -!- a-k [n=akistler@2002:6390:aba9:3:20d:56ff:fe10:bb8d] has joined #fedora-meeting
20:02 < mmcgrath> PhrkOnLsh: not at all
20:02 < nirik> mmcgrath: you want #startmeeting there, not .startmeeting. ;)
20:02 < mmcgrath> #startmeeting Infrastructure
20:02 < zodbot> Meeting started Thu Jan 14 20:02:37 2010 UTC. The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:02 < zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:02 -!- zodbot changed the topic of #fedora-meeting to: (Meeting topic: Infrastructure)
20:02 < mmcgrath> damnit
20:02 < mmcgrath> Who's here?
20:02 * a-k is
20:02 * wzzrd is here
20:02 * nirik is hanging around in the back
20:02 -!- _lmr_ [n=lmr(a)187.64.216.6] has joined #fedora-meeting
20:02 * PhrkOnLsh .
20:02 * skvidal is here
20:03 < mmcgrath> Ok, lets get started
20:03 < mmcgrath> #topic Infrastructure -- Tickets
20:03 -!- zodbot changed the topic of #fedora-meeting to: Infrastructure -- Tickets (Meeting topic: Infrastructure)
20:03 < mmcgrath> Looks like no meeting notes so that's good.
20:03 < mmcgrath> I'll go over just a couple of things that happened this week worth mentioning
20:03 < mmcgrath> #topic Fedora hosted
20:03 -!- zodbot changed the topic of #fedora-meeting to: Fedora hosted (Meeting topic: Infrastructure)
20:04 < mmcgrath> hosted got its new memory replaced.
20:04 < mmcgrath> it wasn't as smooth as it could have been but it did get done and within the outage window time.
20:04 < mmcgrath> #topic SPOF for koji and bastion
20:04 -!- zodbot changed the topic of #fedora-meeting to: SPOF for koji and bastion (Meeting topic: Infrastructure)
20:04 < mmcgrath> koji and bastion are still SPOF right now.
20:04 -!- mdomsch [n=Matt_Dom(a)cpe-70-124-57-11.austin.res.rr.com] has joined #fedora-meeting
20:04 < mmcgrath> mostly because we've not re-configured heartbeat.
20:04 < mmcgrath> but I'm also trying to re-think our vpn setup.
20:04 < mmcgrath> to be more robust with outage scenarios.
20:05 < mmcgrath> not allowing outbound udp makes that difficult.
20:05 < mmcgrath> though I suspect specific outbound udp requests could be allowed.
20:05 < mmcgrath> #topic /mnt/koji
20:05 -!- zodbot changed the topic of #fedora-meeting to: /mnt/koji (Meeting topic: Infrastructure)
20:05 < dgilmore> im waiting on the try-n-buy to be approved
20:05 * ricky is here for a bit
20:05 < mmcgrath> So this one's kind of happened behind closed doors, not intentional just lots of "do you think this would work?" "how much budget?" etc, etc.
20:06 < mmcgrath> but yeah, dgilmore's been working on trying to get what will be the new /mnt/koji
20:06 < mmcgrath> and I've been working to figure out what will be the new backup of /mnt/koji.
20:06 < mmcgrath> this is all good for several reasons.
20:06 < mmcgrath> 1) it'll allow us to use all of our tapes for *everything* else and do daily, weekly and monthly backups as we should be doing.
20:06 < skvidal> mmcgrath: what's the plan for the storage for the backup? Still tape or are you switching to disks?
20:06 < mmcgrath> 2) /mnt/koji will hopefully be much faster.
20:07 < mmcgrath> skvidal: switching to disks. I'm hoping we'll be able to make snapshots of it for testing.
20:07 < mmcgrath> but dgilmore and I also talked about using bacula to backup to disk.
20:07 < mmcgrath> in the end though I think it will be a better solution because if /mnt/koji does die right now a horrible death.
20:07 < mmcgrath> it'd take a long time to get a new /mnt/koji up
20:07 < mmcgrath> whereas if we have disks to backup to I think the impact would be lessened.
20:08 < mmcgrath> I do, however, have concerns just because it's so easy to wipe disks.
20:08 < dgilmore> mmcgrath: i agree
20:08 < mmcgrath> blowing away the /mnt/koji backup right now is a significantly more difficult task I think.
20:08 < mmcgrath> but at the end of the day I think it'll be worth it to us.
20:09 < mmcgrath> we'll be spending something like 5X what we spent for the /mnt/koji now so hopefully it'll all work out :)
20:09 < mmcgrath> ehh actually that's not quite true, probably closer to 4X
20:09 < mmcgrath> but still.
20:09 < dgilmore> and ive requested extra budget for next year to grow it and hopefully help it more
20:09 < mmcgrath> here's to hoping it's all fast and usable and safe and backed up :)
20:09 < mmcgrath> anyone else have any questions or concerns on that?
20:10 < mmcgrath> alllrighty
20:10 < mmcgrath> #topic ssh_known_hosts
20:10 -!- zodbot changed the topic of #fedora-meeting to: ssh_known_hosts (Meeting topic: Infrastructure)
20:10 < mmcgrath> smooge: around to talk about this?
20:11 < mmcgrath> the smooge is likely busy :)
20:11 -!- walters [n=walters@nat/redhat/x-dtvgzmjuntdndbxu] has quit Remote closed the connection
20:11 < mmcgrath> After the move our ssh_known_hosts got way out of whack and instead of fixing it, he's been redoing it from scratch
20:11 < mmcgrath> and including more possible names so it should be even more useful than it was.
20:11 < smooge> here
20:11 < smooge> sorry
20:11 < smooge> one sec
20:11 < mmcgrath> smooge: no worries, just wanted a quick blurb about ssh_known_hosts so people know we're working on it.
20:12 < mdomsch> do our shells automatically use it, or do we need to do something in .ssh/config for it?
20:12 < smooge> ok sorry. I have updated ssh_known_hosts to all systems I could find in DNS and
20:12 < mmcgrath> mdomsch: they'd use it automatically but you should remove your .ssh/known_hosts
20:12 < smooge> removed ones that were no longer available
20:12 < mmcgrath> mdomsch: worst case, you'll get a conflict in known_hosts and you can just remove it.
20:12 < mdomsch> ok
20:12 -!- DraZoro1 [n=CT(a)vc-41-18-142-115.umts.vodacom.co.za] has joined #fedora-meeting
20:13 < smooge> I have tested it on app01.stg and was able to go to hosts inside fedora. I have commited and pushed
20:13 < mmcgrath> but the search order is ~/.known_hosts then /etc/ssh/ssh_known_hosts
20:13 < smooge> The only things that aren't in it are IPV6 addresses
20:13 < mmcgrath> smooge: excellent, thanks.
20:13 < mmcgrath> smooge: have you updated the SOP to match the new search order?
20:13 < smooge> new search order?
20:13 < smooge> and which SOP
20:13 < dgilmore> thats the default search order
20:13 < mmcgrath> http://fedoraproject.org/wiki/Infrastructure/SOP/ssh_known_hosts
20:14 < mmcgrath> smooge: I think right now we just have short hostname, and actual IP of the host.
20:14 < smooge> ah ok will fix
20:14 < mmcgrath> in the SOP I mean, you expanded on that so it'd be good to get it in the example :)
20:14 < mmcgrath> smooge: thanks
20:14 < mmcgrath> Ok, any other questions on that?
20:14 -!- fedbot [n=supybot(a)scrye.com] has quit Remote closed the connection
20:14 < mmcgrath> zodbot: hey look, fedbot's a big quitter.
20:14 < mmcgrath> ok
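For reference, an ssh_known_hosts entry is a comma-separated list of host names or addresses, followed by the key type and the base64 key. That is how a single entry can cover the short hostname, FQDN and IP mentioned above. OpenSSH consults both the per-user ~/.ssh/known_hosts and the system-wide /etc/ssh/ssh_known_hosts. The helper below is a hypothetical sketch for generating such lines. The host names, address and key in the example are made up.

```python
def known_hosts_line(names, keytype, b64key):
    """Build one ssh_known_hosts entry that covers every alias of a host."""
    unique = []
    for name in names:
        name = name.strip()
        # Skip blanks and repeats; order is kept so the primary name stays first.
        if name and name not in unique:
            unique.append(name)
    return f"{','.join(unique)} {keytype} {b64key}"


# Hypothetical host and placeholder key, for illustration only.
print(known_hosts_line(["app01", "app01.example.org", "10.0.0.5", "app01"],
                       "ssh-rsa", "AAAAexamplekey"))
```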
20:14 < mmcgrath> #topic Search Engine
20:14 -!- zodbot changed the topic of #fedora-meeting to: Search Engine (Meeting topic: Infrastructure)
20:14 < mmcgrath> a-k: take it!
20:14 -!- fedbot [n=supybot(a)scrye.com] has joined #fedora-meeting
20:15 < a-k> The two search candidates I wanted to put in public test are Xapian and Nutch
20:16 < a-k> Xapian installed fine and crawled the wiki for 90 minutes, then died
20:16 < a-k> I think it's fixable, so I still have some hope for Xapian
20:16 < mmcgrath> a-k: has the pt instance worked out for you so far?
20:16 < a-k> For Nutch, I've got Tomcat installed, but not configured yet
20:16 < a-k> This is pt3, BTW
20:16 < a-k> I haven't checked if Tomcat's ports are open
20:16 < mmcgrath> If you need more ram or disk let me know, sometimes we're stuck but sometimes not
20:16 < a-k> Yeah, pt is working fine
20:16 < a-k> I intended to reverse proxy Tomcat through Apache, so no need for extra open ports
20:16 < mmcgrath> a-k: I've never used any of them, what are the key differences?
20:17 -!- mchua is now known as mchua_afk
20:17 < a-k> Xapian is pretty flexible and customizable, so the custom keyword requirement should be satisfied, wherever that goes
20:17 < mmcgrath> then died a good death or a bad death?
20:17 < a-k> Nutch is very pre-packaged, so not so flexible
20:17 -!- DraZoro1 [n=CT(a)vc-41-18-142-115.umts.vodacom.co.za] has left #fedora-meeting []
20:17 -!- belegdol [n=belegdol@fedora/belegdol] has left #fedora-meeting ["Leaving"]
20:17 < a-k> Xapian didn't like a long URL, plus it may not like non-UTF8 so much
20:18 < mmcgrath> ahhh
20:18 < mmcgrath> interesting
20:18 < a-k> The URL thing I think is fixable with how I do the crawl
20:18 < mmcgrath> <nod>
20:18 < a-k> Plus, I still intend to keep looking at other candidates on my list
20:18 < mmcgrath> maybe not the non-UTF8 thing though huh?
20:18 < a-k> So these aren't necessarily the only two choices
20:18 < mmcgrath> sure
20:18 < abadger1999> Both java solutions?
20:18 * mmcgrath is happy to hear progress is being made.
20:19 < dgilmore> a-k: did we look at the one archive.org uses
20:19 < dgilmore> ?
20:19 < a-k> Yeah, UTF-8 could be the thing that kills most of the candidates, if that's going to be a real requirement
20:19 < mmcgrath> a-k: yeah what was the wiki link again?
20:19 < skvidal> why wouldn't utf-8 be a real requirement?
20:19 < a-k> dgilmore: archive.org uses archiving, not indexing
20:19 < abadger1999> Or xapian is the C++ solution. /me was thinking lucene.
20:19 < mmcgrath> skvidal: I think non-utf-8 is the possible requirement
20:20 < a-k> .link http://fedoraproject.org/wiki/Infrastructure/Search
20:20 < a-k> #link http://fedoraproject.org/wiki/Infrastructure/Search
20:20 < mmcgrath> a-k: thanks
20:20 < dgilmore> a-k: they have a crawler http://crawler.archive.org/
20:20 < a-k> Xapian is C, with a little Perl
20:20 < a-k> Nutch is Java, hence Tomcat
20:21 < a-k> dgilmore: I can look at archive.org again
20:21 < a-k> I think that's it, unless there are more questions
20:21 < mmcgrath> a-k: just because I know people will ask, can you make sure that those that have been eliminated have a specific "eliminated because: " section?
20:21 -!- dhillon-v10 [n=dhillon-(a)66.222.202.68.cfl.res.rr.com] has joined #fedora-meeting
20:22 < a-k> I can do that
20:22 * nirik wonders if any solutions here will tie into the mailman/lists archives?
20:22 < smooge> eliminated because: It kills kittens and makes you eat them
20:22 < nirik> to replace pipermail.
20:22 < smooge> why would a search engine replace pipermail?
20:22 -!- rishi [n=rishi@gnu-india/supporter/debarshi] has joined #fedora-meeting
20:22 < skvidal> smooge: he means a new archiver
20:23 < skvidal> nirik: I don't think any of the others are any more maintained
20:23 < mmcgrath> a-k: if you want additional help feel free to ask on the list and recruit :) This is a pretty massive project. Especially for your first one for Fedora. If you need anything feel free to ask :)
20:23 < a-k> Sure. Thanks.
20:23 < mmcgrath> anyone have anything else right now?
20:23 < nirik> sure, personally I think pipermail is ok, but there is a ticket wanting us to replace it with monharc or something.
20:24 * dgilmore is ok with pipermail
20:24 < skvidal> nirik: mhonarc is more trouble than its worth, ime
20:24 < skvidal> mmcgrath: logging? anyone interested in working on it?
20:24 * nirik nods. perhaps 3.0 will come out someday with pipermail improvements.
20:25 < wzzrd> I'm kinda working on it, fwiw
20:25 < skvidal> nirik: perhaps monkeys will fly out of my arse
20:25 < smooge> skvidal, I am interested in doing it
20:25 < wzzrd> the logging that is
20:25 < smooge> I was starting to do it after I finished DNS/NTP
20:25 < mmcgrath> skvidal: sure
20:25 < mmcgrath> #topic logging
20:25 -!- zodbot changed the topic of #fedora-meeting to: logging (Meeting topic: Infrastructure)
20:25 < smooge> wzzrd, what are you doing
20:25 < skvidal> smooge: okay
20:25 < smooge> skvidal, what did you see
20:26 < skvidal> there are a number of things to work on
20:26 < mmcgrath> So logging is a pretty large topic with lots of sub parts
20:26 < skvidal> first - cleaning up the logs we have
20:26 < smooge> skvidal, or would I be too many cooks
20:26 < mmcgrath> lets start on what I gave wzzrd a week or two back.
20:26 < skvidal> okay
20:26 < mmcgrath> which is log analysis.
20:26 < skvidal> which is the wrong place to start
20:26 < mmcgrath> wzzrd: have you had a chance to look at some of the suggestions on the list?
20:26 < skvidal> until you have logs under control
20:26 < mmcgrath> skvidal: none of which will get done while this meeting is going on :)
20:27 < wzzrd> i've checked out epylog a bit further
20:27 < skvidal> umm - if you don't want to discuss logging, that's fine
20:27 * skvidal is sorry for making the meeting longer
20:27 < mmcgrath> skvidal: keep your pants on we're talking about logging right now, I'm just trying to do so in a way that doesn't discourage wzzrd, a new potential sysadmin member.
20:28 < mmcgrath> wzzrd: any luck with it?
20:28 < wzzrd> but i want to raise the matter of realtime parsing vs. cron-based once-a-day parsing, before i dive into something head over heels
20:28 < mmcgrath> I have very little experience with it but I know skvidal has used it as has Jeff_S and some others.
20:28 < mmcgrath> I don't think we need realtime parsing at this time.
20:28 < skvidal> wzzrd: realtime parsing and analysis are seldom done by the same tool
20:28 < mmcgrath> at least I don't think it would buy us much.
20:29 < wzzrd> epylog is nice, but i think it would require the logs from a group of servers going into *one* file on the loghost if you want a single report for that group
20:29 < skvidal> wzzrd: and realtime parsing is handy if there are specific triggers you know to look for - but only useful insofar as they can raise a warning in nagios
20:29 < mmcgrath> but yeah, I haven't heard anyone really argue for realtime so we can just assume non-realtime for the moment :)
20:29 < wzzrd> mmcgrath, skvidal: ok, no real-time
20:29 < wzzrd> just making sure
20:29 < mmcgrath> epylog can't look at multiple files?
20:29 < skvidal> mmcgrath: yes, it can - but it requires editing its configs
20:29 < skvidal> mmcgrath: remember what I was saying on the infrastructure list about mimicking the file structure of /var/log
20:30 < skvidal> mmcgrath: I wasn't making that up :)
20:30 < mmcgrath> skvidal: just edits though? not like some crazy bastardization?
20:30 < skvidal> mmcgrath: significant edits
20:30 < mmcgrath> skvidal: I still have no idea what you're talking about with that?
20:30 < skvidal> look in /var/log on your laptop/desktop
20:30 < mmcgrath> you mean I should have a /var/log/messages that has all messages from all of our hosts going into it?
20:30 < skvidal> no
20:30 < mmcgrath> so I should have a /var/log/hosts/bastion/messages?
20:30 < mmcgrath> I know what is in /var/log/ on my laptop
20:30 < skvidal> you understand there are certain files that commonly exist in /var/log
20:30 < skvidal> good
20:30 < mmcgrath> yeah?
20:31 < mmcgrath> like funcd
20:31 < mmcgrath> which won't exist on log1
20:31 < skvidal> and those files are expected to have content consistent with a lot of log parsers
20:31 < wzzrd> skvidal: epylog.conf doesn't allow it afais, and I don't think it comes with a module that allows for the parsing of multiple files
20:31 < skvidal> funcd doesn't log via syslog in that way
20:31 < mmcgrath> isn't that creating just the opposite of what wzzrd is talking about though?
20:31 < skvidal> wzzrd: It really does.
20:31 -!- mether [n=Rahul(a)115.240.11.90] has joined #fedora-meeting
20:31 < skvidal> mmcgrath: no
20:31 < mmcgrath> you're talking about creating more files and he's needing less
20:31 < skvidal> no
20:31 < skvidal> I'm not
20:31 < skvidal> if y'all would let me explain
20:32 < skvidal> instead of peppering me with remarks
20:32 < skvidal> It might help
20:32 * wzzrd shuts up
20:32 < skvidal> we want logs per-host
20:32 < skvidal> but we also want logs per-by-service/group
20:32 < skvidal> so let's say all of the app servers belong to the appgroup
20:33 < skvidal> we can setup rsyslog so that if a log comes in from app01 (for example) that it gets sent to /var/log/hosts/app01/2010/01/14/ AND to /var/log/groups/app-servers/2010/01/14
20:33 < skvidal> then if we want to do log analysis for the appservers we tell epylog to look at /var/log/groups/app-servers/2010/01/14
20:34 < skvidal> if we want it to do analysis for a specific app server we tell it to look at: /var/log/hosts/app01/2010/01/14/
20:34 < mmcgrath> epylog doesn't understand /var/log/hosts/app* ?
20:34 < skvidal> inside each of those dirs will be the syslog files normally generated by /var/log
20:35 < skvidal> mmcgrath: not when it is parsing log files - it expects the log files to be in the normal location relative to the base log path
20:35 < mmcgrath> I guess I'm not understanding how what is in my laptop in /var/log isn't what's in /var/log/hosts/app01/2010/01/14
20:35 < skvidal> okay
20:35 < skvidal> let's look at an example
20:36 < skvidal> login to bastion
20:36 < skvidal> cd /var/log
20:36 < skvidal> ls *log
20:36 < skvidal> ls *log
20:36 < skvidal> anaconda.log boot.log ha-log ldirectord.log sa-update.log yum.log
20:36 < skvidal> anaconda.syslog faillog lastlog maillog tallylog
20:37 < skvidal> the files syslog is writing are the only ones we can deal with for a remote logging server
20:37 -!- mbonnet_ [n=nmikeb@nat/redhat/x-mffdfhflbfhsdlsy] has quit "Getting off stoned server - dircproxy 1.2.0"
20:37 < skvidal> so messages, maillog, spooler, boot.log and cron
20:37 < skvidal> that's all we have access to
20:37 -!- ajax_ [n=ajackson@nat/redhat/x-dufagrlcirofytgc] has joined #fedora-meeting
20:37 < skvidal> now
20:37 -!- ldimagg__ [n=ldimaggi@nat/redhat/x-raflashvigxdpfbm] has joined #fedora-meeting
20:37 < skvidal> if you look on log1
20:37 -!- pjones [n=nnnnnnnn@fedora/pjones] has quit Read error: 54 (Connection reset by peer)
20:37 < skvidal> in
20:37 < skvidal> /var/log/hosts/bastion01/2010/01/14
20:37 < skvidal> for example
20:37 -!- halfline_ [n=rstrode@nat/redhat/x-orzrcxfjwbumvjdj] has joined #fedora-meeting
20:37 -!- ldimaggi___ [n=ldimaggi@nat/redhat/x-iylrghkpugcssyrv] has joined #fedora-meeting
20:37 < skvidal> cron.log kernel.log mail.log messages.log secure.log
20:37 < skvidal> you have all of those files
20:37 -!- ajax__ [n=ajackson@nat/redhat/x-ciqdvigcdiekghja] has joined #fedora-meeting
20:38 < skvidal> do you see how those files do not match the filenames and separation that are normally in /var/log?
20:38 -!- mbonnet_ [n=nnmikeb@nat/redhat/x-kjygxqitpvxutbom] has joined #fedora-meeting
20:38 < skvidal> ie mail.log vs maillog
20:38 < skvidal> kernel.log existing AT ALL
20:38 < skvidal> messages.log vs messages
20:38 < smooge> oi
20:38 < mmcgrath> yeah, so you're talking, mostly, about renaming the files?
20:38 < skvidal> that's what I mean by the difference
20:38 < mmcgrath> k
20:38 < skvidal> mmcgrath: and what content goes into them
20:39 < skvidal> ie: kernel.log shouldn
20:39 < skvidal> 't exist at all really
20:39 < skvidal> its content should be in messages
20:39 < wzzrd> skvidal: crap you were right, i was looking in the wrong place... i didn't quite grasp how epylog's internals worked yet, i suppose...
20:39 < mmcgrath> k, I don't see any problem with that.
20:39 < skvidal> that's what I mean about fixing the structure of our remote logs
20:39 < skvidal> then once we do that
20:39 -!- pjones- [n=pjones@fedora/pjones] has joined #fedora-meeting
20:39 < skvidal> and we log by 'type of server/service'
20:39 < mmcgrath> wzzrd: k, you want to continue working and learning epylog?
20:40 < skvidal> then we can run generic log tools like epylog and generate lovely results
20:40 -!- halfline [n=rstrode@nat/redhat/x-bhhgxvexpbwoodqd] has quit Read error: 60 (Operation timed out)
20:40 < wzzrd> mmcgrath: sure, eager to help out
20:40 < skvidal> w/o having to beat our brains out modifying epylog to access logs we don't want
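A sketch of the layout described in this exchange: each incoming message is written to a per-host directory and also to one directory per group the host belongs to. The files use the names a stock RHEL /etc/syslog.conf uses (maillog, cron, secure, boot.log, messages) rather than mail.log, kernel.log and messages.log, and kernel output folds into messages. In production this routing would live in rsyslog templates, not Python. The `groups` mapping and function names are hypothetical, and severity handling is ignored for simplicity.

```python
from datetime import date

# File names a stock RHEL /etc/syslog.conf uses per facility; anything not
# listed (kernel messages included) lands in "messages".
FACILITY_FILE = {
    "mail": "maillog",
    "cron": "cron",
    "authpriv": "secure",
    "local7": "boot.log",
}


def log_file_for(facility):
    return FACILITY_FILE.get(facility, "messages")


def destinations(host, facility, day, groups, base="/var/log"):
    """Return every file one message should be appended to on the log host:
    one per-host path plus one path per group the host belongs to."""
    name = log_file_for(facility)
    stamp = f"{day:%Y/%m/%d}"
    paths = [f"{base}/hosts/{host}/{stamp}/{name}"]
    for group, members in sorted(groups.items()):
        if host in members:
            paths.append(f"{base}/groups/{group}/{stamp}/{name}")
    return paths


# Hypothetical group membership, for illustration only.
GROUPS = {"app-servers": {"app01", "app02"}, "proxies": {"proxy01"}}
for path in destinations("app01", "mail", date(2010, 1, 14), GROUPS):
    print(path)
```

With this layout, epylog would analyse the whole group when pointed at /var/log/groups/app-servers/2010/01/14, and a single box when pointed at /var/log/hosts/app01/2010/01/14.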
20:40 < mmcgrath> skvidal: so in theory the amount of logs we store is going to about double. Which do you think we should keep longer? the host level logs or the service level logs?
20:40 -!- pjones- is now known as pjones
20:40 < skvidal> mmcgrath: it's not going to double
20:40 < skvidal> right now we've made the mistake of doing *.* from syslog.conf on our logclients
20:41 < skvidal> instead of trimming the crap out
20:41 < skvidal> no one needs spooler.debug sent remotely
20:41 < mmcgrath> how do you know what to include and what not to?
20:41 < skvidal> years of experience?
20:41 < skvidal> :)
20:41 < skvidal> seriously- you keep warning and above
20:41 < skvidal> and drop a lot of the info and debug crap
20:41 < mmcgrath> skvidal: but still, lets say we only sent what warning and above right this second.
20:41 < skvidal> we can get rid of a lot of crap that's not helpful
20:41 < mmcgrath> when we start storing services too, we're storing all logs twice right?
20:42 < skvidal> <shrug> sure but it's just not that much content
20:42 < mmcgrath> skvidal: so, for example, we wouldn't be sending mail logs?
20:42 < skvidal> I think we should send maillogs
20:42 -!- lmacken_ [n=lmacken@nat/redhat/x-mwvftkmgxvqfjsea] has joined #fedora-meeting
20:42 < skvidal> unless you want to do maillog analysis ON the mailservers
20:42 < skvidal> which seems like a bad use of their cpu time
20:42 < mmcgrath> I've never setup a central logger that didn't store everything, skvidal do you happen to want to take lead on trimming that stuff down?
20:43 < dgilmore> i should just stop getting email and there would be a lot less log data
20:43 < mmcgrath> <nod> I'd prefer to keep log analysis on the logger.
20:43 < mmcgrath> dgilmore: that's very true :)
20:43 < skvidal> mmcgrath: sure
20:43 < skvidal> mmcgrath: smooge you wanna work w/me on this?
20:43 < smooge> I am happy to. I love log analysis
20:43 < mmcgrath> skvidal: yup yup.
20:43 < skvidal> smooge: I'm glad I'm not alone :)
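The trimming proposed here boils down to a simple predicate. Clients forward warning and above, drop the info/debug noise (such as the spooler debug output), and still send all mail logs so maillog analysis can run on the log host. On the clients this would be written as a syslog/rsyslog selector (roughly `*.warning;mail.*` aimed at the log host), not as code. The Python sketch below only makes the rule explicit. The threshold and the mail exception come from the discussion; the function itself is hypothetical.

```python
# syslog severity levels (RFC 5424); a lower number means more severe.
SEVERITY = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def should_forward(facility, severity, threshold="warning", keep_all=("mail",)):
    """Decide whether a client should send a message to the central log host.

    Everything at `threshold` or more severe is forwarded; facilities listed
    in `keep_all` (mail by default) are forwarded at every level.
    """
    if facility in keep_all:
        return True
    return SEVERITY[severity] <= SEVERITY[threshold]
```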
20:43 < mmcgrath> so on this same topic... there's still one thing I'd like to get converted.
20:44 < mmcgrath> most of our hosts are still not using rsyslog
20:44 < mmcgrath> I'd like to convert them to rsyslog, some have been but not most.
20:44 < smooge> mmcgrath, on my list of things to fix after ntpd
20:44 * mmcgrath is just mentioning that.
20:44 < skvidal> mmcgrath: for rhel5?
20:44 < mmcgrath> it could be as easy as yum install rsyslog
20:44 < skvidal> did rhel5 switch to rsyslog?
20:44 < mmcgrath> skvidal: yeah
20:44 < skvidal> okie doke
20:44 < smooge> mmcgrath, skvidal it is basically 5 commands
20:44 < mmcgrath> skvidal: it does for fresh installs, but if you updated, it didn't do a replace.
20:44 < smooge> 1) yum install rsyslog
20:44 < mmcgrath> as of 5.3 I think.
20:44 < skvidal> smooge: no problem - I just hadn't heard the switch was official in rhel5
20:45 < smooge> its an alternate
20:45 < smooge> syslogd is still preferred because of age
20:45 < smooge> oh wait.. I missed the new install part
20:46 < mmcgrath> smooge: is this something that's going to be possible / easy in puppet or are we going to have to get func involved?
20:46 < smooge> func might be easiest
20:46 * mmcgrath thinks perhaps he's been overthinking it.
20:46 < mmcgrath> smooge: what about new installs?
20:46 < mmcgrath> well, we can figure that after the meeting I guess :)
20:46 < smooge> yeah..
20:47 < mmcgrath> wzzrd: any other questions on your side?
20:47 < mmcgrath> does anyone have any thoughts about exactly what we're looking for in these reports?
20:47 < wzzrd> well, im not sure whether this is ok to ask
20:48 < mmcgrath> you can ask whatever you want. If it's for the root password though we probably won't answer.
20:48 < wzzrd> i think it would be easy to have some sort of mentor,
20:48 < wzzrd> you know, to ask some questions to
20:48 < skvidal> mmcgrath: for mail- I'm looking for errors and to make sure we don't have too much overrun/disk/cpu issues, for the rest of systems I'm looking to start getting a baseline on what 'normal' looks like and then fixing up problems
20:48 < skvidal> wzzrd: ask me
20:48 < wzzrd> skvidal: great, thanks!
20:48 < skvidal> I'm around often and I know a fair bit about the epylog code base
20:48 -!- ajax [n=ajackson@nat/redhat/x-doivlmfizynhhosw] has quit Read error: 60 (Operation timed out)
20:49 < skvidal> and I know the author of epylog personally and am willing to annoy him
20:49 < wzzrd> skvidal: you seem to be pretty well informed in this logging business :)
20:49 -!- tomspur [n=Tom(a)p54A54B74.dip.t-dialin.net] has joined #fedora-meeting
20:49 < wzzrd> lol
20:49 < mmcgrath> wzzrd: yeah for epylog ask skvidal I have no experience with it. If you have questions about fedora or how we're doing something just ask anyone in #fedora-admin, skvidal smooge and I are almost always in there.
20:49 < wzzrd> ok appreciate it
20:49 -!- ldimaggi_ [n=ldimaggi@nat/redhat/x-gjpbecmbtnvijmjl] has quit Read error: 60 (Operation timed out)
20:49 < dgilmore> wzzrd: skvidal, mmcgrath, myself, ricky, smooge and others will be more than happy to answer questions
20:50 < mmcgrath> Ok, anyone have any other questions on logging?
20:50 < smooge> not me
20:50 < mmcgrath> ok, with that I'll open the floor for anything and everything
20:51 < mmcgrath> #topic Infrastructure -- Open Floor
20:51 -!- zodbot changed the topic of #fedora-meeting to: Infrastructure -- Open Floor (Meeting topic: Infrastructure)
20:51 < mmcgrath> jds2001: you around?
20:51 < mmcgrath> our newest -main member has been pretty quiet.
20:51 < dgilmore> mmcgrath: i wore him out on Saturday
20:51 < mmcgrath> hehe
20:51 < mmcgrath> oh! i know one thing.
20:51 < skvidal> dgilmore: umm - that sounds
20:51 < skvidal> umm
20:51 < skvidal> wrong
20:52 < mmcgrath> I'm still in the process of getting our secondary,alt,archive stuff to download.fedora.redhat.com
20:52 < dgilmore> skvidal: sure.
20:52 < skvidal> :)
20:52 < dgilmore> skvidal: i made him work a lot on saturday while migrating the lists
20:52 < dgilmore> ;)
20:52 < mmcgrath> After the move we have root squashed for all /pub content (which is good)
20:52 < dgilmore> that better
20:52 < mmcgrath> but now I can't do anything with my dirs that are there because they're root owned.
20:53 < ricky> Heh, yow :-)
20:53 < dgilmore> oops
20:53 < mmcgrath> yeah
20:53 -!- lmacken [n=lmacken@fedora/lmacken] has quit Read error: 113 (No route to host)
20:53 < mmcgrath> I do still have several concerns.
20:53 < mmcgrath> same as smooge
20:53 < mmcgrath> will the disks be able to keep up?
20:53 < mmcgrath> will the snap-mirror process work correctly with the new load?
20:54 < mmcgrath> Ok, anyone have anything else?
20:54 < mmcgrath> If not we'll close the meeting in 30
20:54 -!- ajax_ [n=ajackson@nat/redhat/x-dufagrlcirofytgc] has quit Connection timed out
20:54 < smooge> concerns? I have no concerns.. I have reality
20:54 < smooge> cold clear that it probably won't keep up
20:54 < mmcgrath> smooge: yeah, I have a feeling we're in for a ride here.
20:54 -!- ldimagg__ [n=ldimaggi@nat/redhat/x-raflashvigxdpfbm] has quit Read error: 110 (Connection timed out)
20:55 < mmcgrath> but who knows, we might get surprised :)
20:55 < nirik> mmcgrath: will this change what I have to do to push the nightly live composes?
20:55 < smooge> yeah.. I am buying lotto tickets just in case
20:55 < mmcgrath> nirik: it'll just change where you write to
20:55 < mmcgrath> nirik: hopefully for the better because it'll be all mirrored and stuff
20:56 < nirik> cool.
20:56 < mmcgrath> ok, with that!
20:56 < mmcgrath> #meetingend
20:56 < mmcgrath> #endmeeting
20:56 -!- zodbot changed the topic of #fedora-meeting to: Channel is used by various Fedora groups and committees for their regular meetings | Note that meetings often get logged | For questions about using Fedora please ask in #fedora | See http://fedoraproject.org/wiki/Meeting_channel for meeting schedule
20:56 < zodbot> Meeting ended Thu Jan 14 20:56:53 2010 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot .
20:56 < zodbot> Minutes: http://meetbot.fedoraproject.org/fedora-meeting/2010-01-14/fedora-meeting...
20:56 < zodbot> Minutes (text): http://meetbot.fedoraproject.org/fedora-meeting/2010-01-14/fedora-meeting...
20:56 < mmcgrath> damnit
20:56 < zodbot> Log: http://meetbot.fedoraproject.org/fedora-meeting/2010-01-14/fedora-meeting...
20:56 < mmcgrath> Thanks for coming everyone!
Log management
by Maxim Burgerhout
After last week's meeting, Mike asked me to look into a solution for
log management, reporting and alerting. As the new guy, I happily
jumped in and so here's a short status report on what I found and what
some of the options are. I would appreciate any input.
First of all, there are a lot of different projects trying to deliver
this functionality. Many of them, though, are dead, commercial,
and/or aimed mainly at parsing Apache log files.
I am assuming we want an open source tool, so, skipping commercial
products like Splunk and skipping the dead projects, it boils down to
the following as the most promising. If I turn out to be missing some
great project, please tell me and I'll look into it.
1. octopussy (http://www.8pussy.org/dokuwiki/doku.php)
Log aggregation tool, providing a one-stop web UI to view all logs for
a bunch of servers, complete with alerting to email, Jabber or Nagios,
creation of graphs, etc. Imports data into a MySQL database. The
downsides are that its developer base is pretty narrow, it is
relatively complex to configure, and it is Debian-centric in both
documentation and available packaging. On the other hand, octopussy
comes closer to Splunk than anything else in the open source world that
I know of. Sadly, all the other Splunk-like apps are commercial. Last
release: December 2009, last commit a couple of days ago.
2. logreport / lire (http://www.logreport.org/)
Log parser that runs from cron or manually; parses logs from many
different applications and generates HTML, text or PDF reports that can
optionally be mailed to people. Slightly odd curses configuration
frontend. Works by importing log files into what is called the 'dlf
store' and renders periodic reports from that data. Can receive
log files over mail. The downside is that development is pretty slow,
even though it is backed by a foundation (about one release per year;
the last one was in March 2009). Low mailing list traffic. Does not do
alerting or real-time parsing. DLF uses sqlite as a backend, afaict, and
I'm not sure how well that scales in the long run. Last release: March
2009, no activity on the mailing list or in CVS since then.
3. epylog (https://fedorahosted.org/epylog/)
Has some fans in the Infra group already, it seems. Modular log
parser, run from cron. Stores offsets for already parsed files, so the
next run can start where the previous one left off. Generates reports
in HTML in a configurable location or sends them out by mail. Custom
report delivery methods can be configured. Does not do alerting or
real-time parsing. Hosted on Fedora Hosted, written in Python (the
other two apps are mainly in Perl). Very easy to write custom modules
for. Last release: none recently, but Subversion is active.
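Keeping a per-file offset, as described for epylog, is a common technique for incremental log parsing. The Python sketch below shows the general idea and is not epylog's actual code. It stores the byte offset and inode from the last read in a small JSON state file (a hypothetical format), and it starts over from the beginning when the file has been rotated (new inode) or truncated.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return the complete lines appended to log_path since the previous call."""
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as fh:
            state = json.load(fh)
    st = os.stat(log_path)
    offset = state.get("offset", 0)
    # Start over if the file was rotated (different inode) or truncated.
    if state.get("inode") != st.st_ino or st.st_size < offset:
        offset = 0
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Only consume complete lines; a partially written last line waits for the next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    with open(state_path, "w") as fh:
        json.dump({"inode": st.st_ino, "offset": offset + end}, fh)
    return lines
```

Because only complete lines are returned, a line that is still being written when the run starts is picked up on the next run instead of being split.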
Of these projects, only epylog is already packaged for Fedora, and
Octopussy is the only one that does real-time log parsing and alerting.
What should be the next step in this? I probably need to do some more
testing, but before I do that, let me know what you think is important
in a log monitoring and reporting application. Are there any specific
tests you would like to see done for these apps?
Let me know what you think.
Maxim Burgerhout
maxim(a)wzzrd.com
----------------
GPG Fingerprint
EB11 5E56 E648 9D99 E8EF 05FB C513 6FD4 1302 B48A
Jon Stanley
by Mike McGrath
I am happy to announce that Jon Stanley, proud Cubs fan[1], has been
promoted to sysadmin-main. He's been working with Fedora for years and
has been active in several infrastructure groups for a long time, most
recently on the mailing list migration.
So welcome to sysadmin-main Jon!
-Mike
[1] I actually think he's a Cardinals fan, I could be wrong. No, I think
I am wrong, yeah he's a Cardinals fan :)