[PATCH] udev-rules: Restart kdump service on cpu ADD/REMOVE events

WANG Chao chaowang at redhat.com
Mon Sep 15 13:00:10 UTC 2014


On 09/05/14 at 04:16pm, Vivek Goyal wrote:
> This patch changes restart of kdump service from cpu online/offline events
> to cpu add/remove events.
> 
> Some people have complained that they are running cpu online/offline tests
> at high frequency and kdump restarts at high frequency and systemd disables
> the service. As a temporary fix, we committed a patch to never disable 
> kdump service.
> 
> In general it probably is a good idea to restart kdump service on cpu
> add/remove events.
> 
> Toshi Kani confirmed the following.
> 
> - File for /sys/devices/system/cpu/cpuX/crash_notes will be created first
>   before ADD event goes out. That means we cannot miss creating ELF notes
>   for newly created cpu.
> 
> - For REMOVE event files under /sys/devices/system/cpu/cpuX/ are removed
>   first and then REMOVE event goes out. That means we will remove the elf
>   note header for removed cpu.
> 
> - There are some race conditions like a cpu is removed but system crashes
>   before kdump service restarts. In that case vmcore.c has to be more robust
>   to be able to inspect elf notes and discard empty ones.
> 
>   Also it is possible that after cpu remove, crash notes memory got reused
>   for something else and after crash vmcore.c might see some random data.
>   It does basic size checks and discards elf notes if checks don't pass.
> 
>   Above race conditions can happen even with OFFLINE event and there is
>   no good way to remove these altogether. So making vmcore.c more robust
>   is the right solution here.
> 
> Signed-off-by: Vivek Goyal <vgoyal at redhat.com>

Restarting the kdump service on ADD/REMOVE events seems more reliable. And
because vmcore can discard empty notes at runtime, we don't have to
rebuild the ELF notes.
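
To illustrate the kind of check described above (skip empty notes, drop
notes that fail a basic size check), here is a rough sketch in Python. It
is not the vmcore.c code; the helper names (valid_notes, make_note) are
made up for illustration, and it assumes little-endian notes with the
usual 4-byte padding of the name and descriptor fields.

```python
import struct

# ELF note header: n_namesz, n_descsz, n_type (assumed little-endian here).
NHDR = struct.Struct("<III")


def align4(n):
    """Round n up to the next multiple of 4, as ELF note fields are padded."""
    return (n + 3) & ~3


def valid_notes(buf):
    """Return (name, type, desc) tuples for the well-formed notes in buf.

    Parsing stops at the first empty note (n_namesz == 0) or at the first
    note whose declared size would run past the end of the buffer.
    """
    notes = []
    off = 0
    while off + NHDR.size <= len(buf):
        namesz, descsz, ntype = NHDR.unpack_from(buf, off)
        if namesz == 0:
            break  # empty note: nothing useful was written here
        total = NHDR.size + align4(namesz) + align4(descsz)
        if off + total > len(buf):
            break  # size check failed: possibly stale or random data
        name_start = off + NHDR.size
        name = buf[name_start:name_start + namesz].rstrip(b"\0")
        desc_start = name_start + align4(namesz)
        desc = buf[desc_start:desc_start + descsz]
        notes.append((name, ntype, desc))
        off += total
    return notes


def make_note(name, ntype, desc):
    """Build one padded note; hypothetical helper for trying the parser."""
    name_z = name + b"\0"
    return (NHDR.pack(len(name_z), len(desc), ntype)
            + name_z.ljust(align4(len(name_z)), b"\0")
            + desc.ljust(align4(len(desc)), b"\0"))
```

Feeding valid_notes() a zero-filled buffer, or a header that claims a
descriptor larger than the buffer, yields an empty list, while a note
built with make_note() is returned intact.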

Acked-by: WANG Chao <chaowang at redhat.com>

> ---
>  98-kexec.rules |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: kexec-tools-fedora/98-kexec.rules
> ===================================================================
> --- kexec-tools-fedora.orig/98-kexec.rules	2014-06-03 13:19:04.813120747 -0400
> +++ kexec-tools-fedora/98-kexec.rules	2014-09-04 10:59:59.093304225 -0400
> @@ -1,4 +1,4 @@
> -SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump.service"
> -SUBSYSTEM=="cpu", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump.service"
> +SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump.service"
> +SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump.service"
>  SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump.service"
>  SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump.service"

