On 05/13/2017 at 04:35 AM, Jerry Hoemann wrote:
On Tue, May 09, 2017 at 07:52:09PM +0800, Xunlei Pang wrote:
> We met a problem on AMD machines, when using "nr_cpus=4" for
> kdump, and crash happens on cpus other than cpu0, kdump kernel
> will fail to boot and eventually reset.
>
> After some debugging, we found that it stuck at the kernel path
> do_boot_cpu()-> ... ->wakeup_secondary_cpu_via_init():
> apic_icr_write(APIC_INT_LEVELTRIG|APIC_INT_ASSERT|APIC_DM_INIT,
> phys_apicid);
> that is, it stuck at sending INIT from AP to BP and reset, which
> is actually what "disable_cpu_apicid=X" tries to solve. Printing
> the value of @phys_apicid showed that it was the value of "apicid"
> rather than that of "initial apicid" shown by /proc/cpuinfo.
>
> As described in x86 specification:
> "In MP systems, the local APIC ID is also used as a processor ID by the
> BIOS and the operating system. Some processors permit software to modify
> the APIC ID. However, the ability of software to modify the APIC ID is
> processor model specific. Because of this, operating system software
> should avoid writing to the local APIC ID register. The value returned by
> bits 31-24 of the EBX register (when the CPUID instruction is executed with a
> source operand value of 1 in the EAX register) is always the Initial APIC ID
> (determined by the platform initialization). This is true even if software
> has changed the value in the Local APIC ID register."
>
> From kernel commit 151e0c7de("x86, apic, kexec: Add disable_cpu_apicid
> kernel parameter"), we can see in generic_processor_info(), it uses
> a)read_apic_id() and b)@apicid to compare with @disabled_cpu_apicid.
>
Do you plan to clarify the kernel documentation:
Documentation/admin-guide/kernel-parameters.txt?

Yes, will do after this patch is finalized.

Regards,
Xunlei

Thanks
Jerry

> a)@apicid which is actually @phys_apicid above-mentioned is from the
> following calltrace(on the problematic AMD machine):
> generic_processor_info+0x37/0x300
> acpi_register_lapic+0x30/0x90
> acpi_parse_lapic+0x40/0x50
> acpi_table_parse_entries_array+0x171/0x1de
> acpi_boot_init+0xed/0x50f
> The value of @apicid (from the ACPI MADT) is equal to the value of "apicid"
> shown by /proc/cpuinfo, as confirmed by our debug printk.
> b)read_apic_id() gets the value from LAPIC ID register which is "apicid"
> as well.
>
> While the value of "initial apicid" is from cpuid instruction.
>
> One example of "apicid" and "initial apicid" of cpu0 from /proc/cpuinfo
> on AMD machine:
> apicid : 32
> initial apicid : 0
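
To check whether a given machine shows the same mismatch, the two fields can be listed side by side. This is only a minimal sketch: list_apicids is a hypothetical helper, not part of kdumpctl, and it assumes the usual x86 /proc/cpuinfo layout in which the "apicid" line precedes the "initial apicid" line for each processor.

```shell
# Hypothetical helper (not part of kdumpctl): print one line per CPU with
# "processor apicid initial_apicid", read from a cpuinfo-style file.
# Assumes "apicid" appears before "initial apicid" within each CPU stanza.
list_apicids()
{
    awk -F'[ \t]*:[ \t]*' '
        $1 == "processor"      { cpu = $2 }
        $1 == "apicid"         { apicid = $2 }
        $1 == "initial apicid" { print cpu, apicid, $2 }
    ' "${1:-/proc/cpuinfo}"
}
```

Running list_apicids with no argument reads /proc/cpuinfo; any row whose second and third columns differ is a CPU where "apicid" and "initial apicid" disagree.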
>
> Therefore, we should assign the /proc/cpuinfo "apicid" value to
> "disable_cpu_apicid=X".
>
> We've never hit this issue before, because we usually tested "nr_cpus=1",
> and mostly on Intel machines, and "apicid" and "initial apicid" have the
> same value in most cases on Intel machines.
>
> Signed-off-by: Xunlei Pang <xlpang(a)redhat.com>
> ---
> kdumpctl | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kdumpctl b/kdumpctl
> index 4d6b3e8..46b65d2 100755
> --- a/kdumpctl
> +++ b/kdumpctl
> @@ -77,15 +77,15 @@ remove_cmdline_param()
> }
>
> #
> -# This function returns the "initial apicid" of the
> -# boot cpu (cpu 0) if present.
> +# This function returns the "apicid" of the boot
> +# cpu (cpu 0) if present.
> #
> -get_bootcpu_initial_apicid()
> +get_bootcpu_apicid()
> {
> awk ' \
> BEGIN { CPU = "-1"; } \
> $1=="processor" && $2==":" { CPU = $NF; } \
> - CPU=="0" && /initial apicid/ { print $NF; } \
> + CPU=="0" && /^apicid/ { print $NF; } \
> ' \
> /proc/cpuinfo
> }
> @@ -206,7 +206,7 @@ prepare_cmdline()
>
> cmdline="${cmdline} ${KDUMP_COMMANDLINE_APPEND}"
>
> - id=`get_bootcpu_initial_apicid`
> + id=`get_bootcpu_apicid`
> if [ ! -z ${id} ] ; then
> cmdline=`append_cmdline "${cmdline}" disable_cpu_apicid ${id}`
> fi
> --
> 1.8.3.1
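
For anyone who wants to try the new awk match without an affected AMD box, here is a rough sketch that runs the same pattern against a sample file. The function name get_bootcpu_apicid_from and its file argument are made up for illustration; the patch itself always reads /proc/cpuinfo. The "^" anchor is what keeps the "initial apicid" line from matching as well.

```shell
# Hypothetical variant of get_bootcpu_apicid() that takes the cpuinfo file
# as an argument, so it can be exercised against a sample file.
get_bootcpu_apicid_from()
{
    awk '
        BEGIN { CPU = "-1" }
        # Track which processor stanza we are in.
        $1 == "processor" && $2 == ":" { CPU = $NF }
        # Anchored match: "initial apicid" does not start with "apicid".
        CPU == "0" && /^apicid/ { print $NF }
    ' "$1"
}

# Sample stanzas modelled on the AMD values quoted in the commit message.
sample=$(mktemp)
printf 'processor\t: 0\napicid\t\t: 32\ninitial apicid\t: 0\n\n' >  "$sample"
printf 'processor\t: 1\napicid\t\t: 33\ninitial apicid\t: 1\n'   >> "$sample"

bootcpu_apicid=$(get_bootcpu_apicid_from "$sample")
rm -f "$sample"
echo "disable_cpu_apicid=${bootcpu_apicid}"
```

With the sample values above, the helper picks the "apicid" field of cpu 0 rather than its "initial apicid".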