On 11/25/2009 03:04 PM, Denys Vlasenko wrote:
On Tue, 2009-11-24 at 11:24 +0100, Jiri Moskovcak wrote:
On 11/23/2009 05:28 PM, Denys Vlasenko wrote:
On Thu, 2009-11-19 at 15:44 +0100, Jiri Moskovcak wrote:
I was thinking about catching crashes in abrtd (I know it is pretty stable, but still...). We decided not to catch them, because when the daemon is not running there is no quota on the dumpdir size. So here is my proposal:
We can hardwire /usr/sbin/abrtd to be handled specially by the hook, so that its coredump is always saved into the same dir (like /var/cache/abrt/abrt-dump/), overwriting the previous coredump. This way we'll avoid filling up the HDD. There could still be a problem: if abrtd crashes in a loop, creating a coredump each time might be I/O and time consuming. This can be solved by checking the timestamp of the last crash and setting some threshold.
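The proposal above (one fixed dump location that is overwritten, plus a timestamp threshold against crash loops) can be sketched roughly as follows. This is only an illustration in Python, not the hook's actual code: the function names and the 20-second threshold are hypothetical, and it assumes the dump is a single file whose mtime records the last crash.

```python
import contextlib
import os
import shutil
import tempfile
import time

# Hypothetical threshold; the thread does not name a value.
MIN_SECONDS_BETWEEN_DUMPS = 20


def should_save_dump(dump_path, now=None, min_interval=MIN_SECONDS_BETWEEN_DUMPS):
    """Return True if no dump exists yet or the last one is old enough."""
    if now is None:
        now = time.time()
    try:
        last_dump_time = os.stat(dump_path).st_mtime
    except FileNotFoundError:
        return True
    return now - last_dump_time >= min_interval


def save_dump(dump_path, core_stream, now=None, min_interval=MIN_SECONDS_BETWEEN_DUMPS):
    """Overwrite the single dump file, unless the previous crash was too recent.

    Returns True if the dump was written, False if it was skipped.
    """
    if not should_save_dump(dump_path, now, min_interval):
        return False
    directory = os.path.dirname(dump_path) or "."
    # Write to a temporary file first so a partial dump never replaces a good one.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out:
            shutil.copyfileobj(core_stream, out)
        os.replace(tmp_path, dump_path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    return True
```

A helper invoked through a piped core_pattern receives the core image on its standard input, so a call might look like save_dump("/var/cache/abrt/abrtd-coredump", sys.stdin.buffer). Disk usage stays bounded by one coredump, and a crash loop costs at most one write per threshold interval.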
Good idea. I added the code (see attached) which saves abrtd's coredump to /var/cache/abrt/abrtd-coredump _file_, not dir.
Thanks, I read the patch and it seems OK to me, but I'd rather handle the abrt crash the same way as any other crash: save it to some dir and let abrtd process it, so users can easily report it. This should be safe as abrtd is not respawned automatically,
You can't be sure of that, it depends on how admins run it. For one, on my home machine I run most services under daemontools, IOW: a service is restarted when it exits.
As I see it, we made a step in the right direction: we were not saving abrtd crashes, now we do. When, and *if*, this proves to be not enough, we may make the abrtd coredump "visible" to abrtd itself.
OK, let's see how this will work for users (abrtd seems to be quite stable, so in those few cases we can ask users to send the coredump or generate a backtrace manually).
so there is no danger of abrt dumping itself forever (this can happen only for the hook). The only reason abrtd has been ignored until now is that the ccpp hook doesn't dump anything if abrtd is not running, and then there's no one to watch the dump quota.
We may need to add quota watching to ccpp for another reason: users report "dump storms" filling up their partitions:
Agreed. I noticed that the latest kernel, 2.6.32, does not dump core even if a helper is set in core_pattern; there is a message in syslog saying something about RLIMIT_CORE being set to 0 for process 'foo'. So I'm wondering if we can temporarily disable coredumping per process.
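For reference, a process can lower its own soft RLIMIT_CORE to 0 and restore it later. The sketch below shows that in Python; the helper name core_dumps_disabled is hypothetical. It only affects the calling process (and children that inherit the limit), so it does not show a hook changing the limit of some other process, and whether a piped core_pattern helper is skipped at a limit of 0 depends on the kernel version.

```python
import contextlib
import resource


@contextlib.contextmanager
def core_dumps_disabled():
    """Temporarily set this process's soft RLIMIT_CORE to 0."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Lower only the soft limit; the hard limit stays, so it can be raised again.
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    try:
        yield
    finally:
        resource.setrlimit(resource.RLIMIT_CORE, (soft, hard))
```

Inside a "with core_dumps_disabled():" block the soft limit is 0; on exit the previous value is restored.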
Jirka
-- vda