On 01/06/17 at 05:10pm, Xunlei Pang wrote:
On 01/06/2017 at 04:46 PM, Dave Young wrote:
> Hi, Xunlei
>
> Thanks for the patch, a few comments replied inline.
> On 01/06/17 at 02:37pm, Xunlei Pang wrote:
>> Check the number of cpus for x86_64 kdump kernel to boot with.
>> We met an issue for x86_64: kdump runs out of vectors with the
>> default "nr_cpus=1", when requesting tons of irqs.
>>
>> This patch detects such situation and warns users about the risk.
>>
>> Signed-off-by: Xunlei Pang <xlpang(a)redhat.com>
>> ---
>> v1->v2:
>> - When detecting risky cpu vectors, we just warn users instead of
>> modifying "nr_cpus=X" forcely.
>> - Improved code comments.
>> - Replaced nr_old with nr_origin, and improved some logic.
>>
>> kdumpctl | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 81 insertions(+)
>>
>> diff --git a/kdumpctl b/kdumpctl
>> index b2068cc..b6fc1f9 100755
>> --- a/kdumpctl
>> +++ b/kdumpctl
>> @@ -105,6 +105,85 @@ append_cmdline()
>> echo $cmdline
>> }
>>
>> +# Check the number of cpus for kdump kernel to boot with.
>> +# We met an issue for x86_64: kdump runs out of vectors with
>> +# "nr_cpus=1" when requesting tons of irqs, so here we check
>> +# "nr_cpus=X" and warn users if kdump probably can't work.
>> +check_kdump_cpus()
>> +{
>> + local nr nr_search nr_origin nr_min nr_max
>> + local arch=$(uname -m) cmdline=$KDUMP_COMMANDLINE
>> +
>> + # Special treatment for x86_64 only currently.
>> + if [ $arch != "x86_64" ]; then
>> + return
>> + fi
>> +
>> + # We only care about "nr_cpus=X" format for x86.
>> + nr_search=$(echo $cmdline | grep -o "nr_cpus=[0-9]*" | wc -l)
>> + if [ $nr_search -eq 0 ] ; then
>> + # Do not need to process if no valid "nr_cpus=X" specified.
> This comment sounds not necessary..
ok, will remove it.
>
>> + return
>> + fi
>> +
>> + # Get value X of "nr_cpus=X"
> Ditto.
ok
>
>> + nr_search=$(echo $cmdline | grep -o "nr_cpus=[0-9]*" | cut -d
"=" -f2 | grep "[0-9]" | sort)
> Is it ok to check $nr_search -eq 0 here and drop the previous chunk?
It's on purpose, a little different if there is wrong "nr_cpus=" without
numbers.
How about only check for "nr_cpus=1" as we set it as default, if one set
as other value, they must have tested it. So that we do not need these
corner cases checking, then all the error handling can be dropped.
>
>> + # In case there are multiple "nr_cpus=X", get the mininum value.
>> + for nr in $nr_search; do
>> + if [ $nr -gt 0 ]; then
>> + nr_origin=$nr
>> + break
>> + fi
>> + done
> What is the convention in kernel for these duplicated params, is it
> using the last one? We'd better to use same logic as kernel uses..
The one with the lowest valid number will be used.
>
>> + if [ -z "$nr_origin" ]; then
>> + echo "Warning: Wrong \"nr_cpus=\" kernel cmdline
detected"
> In previous code "Do not need to process if no valid nr_cpus=X
> specified", we could just return without warning. And is it possible to
> detect this earlier without an additional if else?
This is for detecting "nr_cpus=0".
>
>> + return
>> + fi
>> +
>> + # Online cpus in first kernel.
>> + nr_max=$(nproc)
> The manpage says it is for:
> "print the number of processing units available"
> So does it means some policy like cgroup can control the value? If so
> maybe we should get the number from other place like /proc/cpuinfo.
Good point, will modify it.
>
>> +
>> + # Calculate estimated minium cpus required by irqs(vectors).
>> + nr_min=$(ls /proc/irq/ -l | grep ^d | wc -l)
>> +
>> + # We roughly use 256-32(see kernel FIRST_EXTERNAL_VECTOR)=224 as
>> + # maximum supported vectors can be allocated to io devices percpu.
>> + # As nr_min is a ballpart figure, also some high-numbered vectors
>> + # are consumed by the kernel(see FIRST_SYSTEM_VECTOR), we need a
>> + # variance for safety.
>> + #
>> + # We got a large machine with 240 cpus, 6TB memory, 8 iommus, and
>> + # 12 io-apics, 132 irqs under /proc/irq/, it can boot successfully
>> + # with "nr_cpus=1". (256-32-132)=92, so choosing 64 as the variance
>> + # seems ok. Then we get the max external irqs supported per cpu:
>> + # (256-32-64)=160 as the dividend.
>> + nr_min=$(($nr_min + 160 - 1))
>> + nr_min=$(($nr_min / 160))
>> + if [ $nr_min -gt 1 ]; then
>> + # The system seems to have tons of interrupts. while interrupts
>> + # with multiple-cpu affinity can consume multiple vectors, with
>> + # one vector for each cpu within the affinity mask. Fortunately
>> + # for x2apic which is widely used on large modern machines, in
>> + # default case of boot, device bringup etc will use a single cpu
>> + # for the interrupt affinity to minimize vector pressure.
>> + #
>> + # For further safety, we add one more cpu and round it up to an
>> + # even number which is commonly-used.
>> + nr_min=$(($nr_min + 1))
>> + nr_min=$(($nr_min + $nr_min % 2))
>> + fi
>> +
>> + if [ $nr_min -gt $nr_max ]; then
>> + nr_min=$nr_max
>> + fi
>> +
>> + if [ $nr_origin -ge $nr_min ]; then
>> + return
>> + fi
>> +
>> + echo "Warning: CPU vectors under pressure with
\"nr_cpus=$nr_origin\", please try \"nr_cpus=$nr_min\" or more"
> Drop "please" and the tech details should be better. How about below:
> Warning: nr_cpus=$nr_origin is not enough for kdump kernel boot-up, try
> nr_cpus=$nr_min or larger numbers instead.
ok, will update. Thanks!
Regards,
Xunlei
>
>> +}
>> +
>> # This function performs a series of edits on the command line.
>> # Store the final result in global $KDUMP_COMMANDLINE.
>> prepare_cmdline()
>> @@ -134,6 +213,8 @@ prepare_cmdline()
>> fi
>>
>> KDUMP_COMMANDLINE=$cmdline
>> +
>> + check_kdump_cpus
>> }
>>
>>
>> --
>> 1.8.3.1
>> _______________________________________________
>> kexec mailing list -- kexec(a)lists.fedoraproject.org
>> To unsubscribe send an email to kexec-leave(a)lists.fedoraproject.org
Thanks
Dave