On 11/22/17 at 02:34pm, Baoquan He wrote:
On 11/22/17 at 01:30pm, Dave Young wrote:
> On 11/22/17 at 01:18pm, Baoquan He wrote:
> > On 11/22/17 at 12:40pm, Dave Young wrote:
> > > On 11/22/17 at 11:25am, Baoquan He wrote:
> > > > On 11/22/17 at 11:17am, Dave Young wrote:
> > > > > Hi Baoquan,
> > > > >
> > > > > On 11/21/17 at 05:07pm, Baoquan He wrote:
> > > > > > In commit:
> > > > > >
> > > > > > commit 2040103bd7 ("kdumpctl: sanity check of
nr_cpus for x86_64 in case running out of vectors")
> > > > > >
> > > > > > ... function check_kdump_cpus() was introduced to check if
number of cpu
> > > > > > is enough to boot to provide enough interrupt vectors.
> > > > > >
> > > > > > As we know, there are 256 interrupt vectors on each cpu to
correspond to
> > > > > > irq. Except of those system reserved vectors, only part of
256 will be
> > > > > > used for device. So if there are many IO devices on a
system, E.g more
> > > > > > than 256 irqs are needed to map, then specifying
'nr_cpus=1' in kernel
> > > > > > cmdline may not work. Especially on large NUMA system, with
multiple
> > > > > > cpus and multiple devices.
> > > > > >
> > > > > > However, usually firmware engineer will consider the irq
number
> > > > > > corresponding to IO devices and CPU number, so it's not
a problem for
> > > > > > normal kernel. In kdump kernel, it may be a problem since
we default to
> > > > > > specify 'nr_cpus=1' in kernel cmdline. Or maybe
not, because for many
> > > > > > devices which is not needed for vmcore dumping, drivers
related to them
> > > > > > are not added to kdump kernel, then the message
'Warning: nr_cpus=1 may
> > > > > > not be enough for kdump boot' could be false positive
to a large extent.
> > > > > > So in this patch, change 'Warning: ...' to
'Note: ...' so that QA or
> > > > > > customers won't open bug for a false positive report.
> > > > > >
> > > > > > Signed-off-by: Baoquan He <bhe(a)redhat.com>
> > > > > > ---
> > > > > > kdumpctl | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/kdumpctl b/kdumpctl
> > > > > > index d4e00503a221..012369da8879 100755
> > > > > > --- a/kdumpctl
> > > > > > +++ b/kdumpctl
> > > > > > @@ -171,7 +171,7 @@ check_kdump_cpus()
> > > > > > return
> > > > > > fi
> > > > > >
> > > > > > - echo -n "Warning: nr_cpus=1 may not be enough for
kdump boot,"
> > > > > > + echo -n "[Note]: nr_cpus=1 may not be enough for
kdump boot,"
> > > > >
> > > > > It looks better to remove the brackets.
> > > >
> > > > OK, will change. thx
> > > >
> > > > >
> > > > > > echo " try nr_cpus=$nr_min or larger instead"
> > > > >
> > > > > this sentence can also be changed to for example:
> > > > > try nr_cpus=$nr_min or larger instead if nr_cpus=1 does not
work.
> > > >
> > > > This is one sentence with the above 'Note' one since the
above printing
> > > > uses 'echo -n'.
> > >
> > > Even if it is one sentence, it still sounds like leading user to use
nr_cpu=X
> > > without a test. It should be better to notify people to use larger
> > > nr_cpus only when nr_cpus=1 does not work in a test.
> > >
> > > Original sentence like below, it looks to suggest people to use
> > > nr_cpus=$nr_min once he see this note message.
> > >
> > I don't get what you mean. The origin sentence is:
> >
> > "nr_cpus=1 may not be enough for kdump boot, try nr_cpus=$nr_min or larger
instead"
> >
> > This happens when kernel boot if kdump.service is enabled by default or
> > run systemctl restart kdump. People see the 'Note' message, and they
may
> > do a test to check it. And the former half sentence has told if
> > 'nr_cpus=1 may not be enough', then .... Are you suggesting to change
it
> > as
> >
> > "If nr_cpus=1 does not work for kdump boot, try nr_cpus=$nr_min or larger
instead"
> >
> > Otherwise the original half one and the adding one you suggested are
> > duuplicate. I can change it as above one.
>
> Hmm, seems I did not describe it clearly, let me retry :)
>
> I means original "nr_cpus=1 may not be enough for kdump boot, try
nr_cpus=$nr_min or larger instead"
> seems suggest people to use nr_cpus=$nr_min once they see this note
> message even without a test. So maybe say it more clear like below:
> "nr_cpus=1 may not be enough for kdump boot, try nr_cpus=$nr_min or larger
instead in case nr_cpus=1 does not work"
This looks too long when print in screen. I personaly think the old
one is good enough, or adjust as I said. But I am fine if you insist.
I did not notice the 'echo -n' so original message is over 80 chars
already..
>
> About the nr_cpus=1 test in 1st kernel, it can prove the algorithm is
> correct if 1st kernel can not boot with nr_cpus=1
>
> Since xunlei did not find a machine to do a real test, so it is better
> to test it since we see such message now on a real machine.
Well, I checked the formula corresponding to kernel code, and think it's
reasonable. It's not a precise calculation, just a estimation value. So
I personally think the testing in 1st kernel doesn't make sense, since
'nr_cpus=1' working in 1st kernel wont' tell the calculation not good,
and 'nr_cpus=1' not working in 1st kernel won't prove it's very precise.
My personal thought. If you insist, I can take a test.
It is an estimation, but personally I think a test is still worth..
We missed another thing about this nr_cpus issue, that is for example
one already set nr_cpus=4 and 4>=nr_min, but we still tell that
"nr_cpus=1 may not be enough" it sounds odd.
How about move this function and warning out of kdumpctl as another
script like /usr/bin/kdump_estimate_nr_cpus.sh, if people need they can
call it by themselves, we just document it in kdump howto but do not run
it in kdumpctl automatically?
> >
> > > >
> > > > >
> > > > > > }
> > > > > >
> > > > > > --
> > > > > > 2.5.5
> > > > > > _______________________________________________
> > > > > > kexec mailing list -- kexec(a)lists.fedoraproject.org
> > > > > > To unsubscribe send an email to
kexec-leave(a)lists.fedoraproject.org
> > > > >
> > > > > Thanks
> > > > > Dave
> > >
> > > Thanks
> > > Dave
>
> Thanks
> Dave
Thanks
Dave