Re: [PATCH] Avoid falling into infinite loop restart when using a problematic system

Tuesday, 25 December 2018

On 12/25/18 at 10:24am, lijiang wrote:
...

 On 2018-12-24 12:17, Dave Young wrote:
 > On 12/21/18 at 07:38pm, Dave Young wrote:
 >> On 12/21/18 at 04:47pm, Buland Singh wrote:
 >>> On 12/21/18 4:14 PM, Dave Young wrote:
 >>>> On 12/21/18 at 12:59pm, Buland Singh wrote:
 >>>>> On 12/21/18 12:29 PM, Kairui Song wrote:
 >>>>>> Hi, Dave, Lianbo
 >>>>>>
 >>>>>> My concern is that crash loop may generate tons of dump cores, and the
 >>>>>> dump target may get filled up by dump cores,
 >>>>>> that may have larger potential risk. Else I think it's good to leave
 >>>>>> it as it is.
 >>>>>>
 >>>>>> On Fri, Dec 21, 2018 at 2:05 PM lijiang <lijiang(a)redhat.com> wrote:
 >>>>>>>
 >>>>>>> On 2018-12-21 10:49, Dave Young wrote:
 >>>>>>>> + more people
 >>>>>>>> On 12/20/18 at 04:49pm, lijiang wrote:
 >>>>>>>>> On 2018-12-20 13:57, Dave Young wrote:
 >>>>>>>>>> On 12/20/18 at 01:06pm, Lianbo Jiang wrote:
 >>>>>>>>>>> By default, early kdump reboots the system after capturing the vmcore.
 >>>>>>>>>>> If the problematic system is continuously crashing due to some issue
 >>>>>>>>>>> during early boot stage, the system may fall into infinite loop restart
 >>>>>>>>>>> like this:
 >>>>>>>>>>>
 >>>>>>>>>>>       boot -----> crash -----> early kdump (dump vmcore)
 >>>>>>>>>>>         ^                              |
 >>>>>>>>>>>         '.........(reboot).............'
 >>>>>>>>>>>
 >>>>>>>>>>> But now, the system crash at early stage is only captured by early kdump,
 >>>>>>>>>>> and the rest is captured by normal kdump. That is to say, when normal kdump
 >>>>>>>>>>> service starts, it will load it again and override early kdump. It is
 >>>>>>>>>>> helpful to control the logic of early kdump and normal kdump separately
 >>>>>>>>>>> in final action (it is called by kdump-capture.service). For example,
 >>>>>>>>>>> early kdump always passes the 'rd.earlykdump' to the second kernel when
 >>>>>>>>>>> early kdump is enabled, but normal kdump doesn't pass the 'rd.earlykdump'
 >>>>>>>>>>> to the second kernel at any time. So they can be distinguished in the
 >>>>>>>>>>> second kernel.
 >>>>>>>>>>
 >>>>>>>>>> Hmm, I'm confused about the param passing above.
 >>>>>>>>>>
 >>>>>>>>>
 >>>>>>>>> I copied some messages from another email, please refer to this one:
 >>>>>>>>> [--->
 >>>>>>>>> The rd.earlykdump is added to kernel command line in grub.cfg. However, early kdump
 >>>>>>>>> and normal kdump can get the same parameters from /proc/cmdline in the first kernel.
 >>>>>>>>>
 >>>>>>>>> Early kdump passes the rd.earlykdump to the second kernel, but normal kdump doesn't
 >>>>>>>>> need it, normal kdump needs to remove the rd.earlykdump.
 >>>>>>>>>
 >>>>>>>>> So this can distinguish early kdump and normal kdump in the second kernel. It helps
 >>>>>>>>> to control the logic of kdump capture service. For example: default action/final action.
 >>>>>>>>> ]
 >>>>>>>>
 >>>>>>>> The description is confusing, "early kdump passes ... to the second
 >>>>>>>> kernel", for example about this,  the real thing is one person adds the
 >>>>>>>> param in 1st kernel cmdline, kexec-tools takes/inherits and passes it to 2nd
 >>>>>>>> kernel.
 >>>>>>>>
 >>>>>>>
 >>>>>>> Yes. Good point. Thanks for your explanation.
 >>>>>>>
 >>>>>>>> Anyway this is patch log issue.
 >>>>>>>>
 >>>>>>>>>
 >>>>>>>>>> Early or non early just means about the service loading phase, in
 >>>>>>>>>
 >>>>>>>>> Yes. This patch used the same method as you said. When normal kdump service starts,
 >>>>>>>>> it will reload. At the same time, early kdump will be overwritten by normal kdump.
 >>>>>>>>
 >>>>>>>> Probably "early kdump load" is better than "early kdump" in words.
 >>>>>>>>
 >>>>>>>>>
 >>>>>>>>>> initramfs or not, I notice dracut/systemd will print some message about
 >>>>>>>>>> they are running in initramfs, so probably you can check how to get it
 >>>>>>>>>> with same way,  if this is not initramfs then just unload before the
 >>>>>>>>>> check in kdump loading.
 >>>>>>>>>>
 >>>>>>>>>> The picture like below:
 >>>>>>>>>>
 >>>>>>>>>> Kernel boot ->
 >>>>>>>>>>
 >>>>>>>>>>      initramfs ---
 >>>>>>>>>>           early kdump load
 >>>>>>>>>> ---- Mark A  ----
 >>>>>>>>>>      initramfs switch root
 >>>>>>>>>>
 >>>>>>>>>>           system startup (real root fs)
 >>>>>>>>>>                  service a
 >>>>>>>>>>                  service b ... (eg. networking etc.)
 >>>>>>>>>>                  kdump service start
 >>>>>>>>>> -----Mark B -----
 >>>>>>>>>>                           load kdump kernel again
 >>>>>>>>>>
 >>>>>>>>>>
 >>>>>>>>>> The problem will happen between Mark A and Mark B, during this period,
 >>>>>>>>>> there could be repeated crash -> earlykdump_load,  there might be some
 >>>>>>>>>> random crash as well since during the real root fs service startup,
 >>>>>>>>>> for example after network is ready if some network workload cause a
 >>>>>>>>>> panic, it maybe not 100% reproducible,  so it seems we still need to
 >>>>>>>>>> make the poweroff configurable. eg.
 >>>>>>>>>>
 >>>>>>>>>> default is poweroff, but one can choose if he can.
 >>>>>>>>>>
 >>>>>>>>>
 >>>>>>>>> Yes, default is poweroff for early kdump. Unless kdump capture service
 >>>>>>>>> hits an error or enters the emergency service, one can choose the default
 >>>>>>>>> action. (configure default=xxx in kdump.conf)
 >>>>>>>>
 >>>>>>>> For default action instead of final action if you hardcode it, then even if
 >>>>>>>> one sets default as reboot it still powers off.
 >>>>>>>>
 >>>>>>>
 >>>>>>> If really need, that can be improved.
 >>>>>>>
 >>>>>>>> [snip]
 >>>>>>>>
 >>>>>>>>>>>
 >>>>>>>>>>> +check_rd_earlykdump()
 >>>>>>>>>>> +{
 >>>>>>>>>>> +    egrep "rd.earlykdump" /proc/cmdline
 >>>>>>>>>>> +}
 >>>>>>>>>>> +
 >>>>>>>>>>>    start()
 >>>>>>>>>>>    {
 >>>>>>>>>>>      check_dump_feasibility
 >>>>>>>>>>> @@ -969,7 +974,13 @@ start()
 >>>>>>>>>>>      check_current_status
 >>>>>>>>>>>      if [ $? == 0 ]; then
 >>>>>>>>>>>              echo "Kdump already running: [WARNING]"
 >>>>>>>>>>> -          return 0
 >>>>>>>>>>> +          check_rd_earlykdump
 >>>>>>>>>>> +          #if earlykdump loaded, it will stop and start.
 >>>>>>>>>>> +          if [ $? -eq 0 ]; then
 >>>>>>>>>>> +                  stop
 >>>>>>>>>>
 >>>>>>>>>> kdumpctl start can run not only by system startup services, one can also
 >>>>>>>>>> run it manually or in udev rule.
 >>>>>>>>>>
 >>>>>>>>>> The checking of kernel cmdline seems not enough.
 >>>>>>>>>>
 >>>>>>>>>
 >>>>>>>>> Here it means that if kdump has been loaded, check whether early kdump did it.
 >>>>>>>>> If yes, let normal kdump load it again, otherwise no need to do anything.
 >>>>>>>>
 >>>>>>>> As we discussed you do not get my points here :)
 >>>>>>>>
 >>>>>>>> check_rd_earlykdump will be always true once kernel bootup, so there is
 >>>>>>>> no way to tell the first normal kdump load from the later ones.
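
A minimal sketch of why the check cannot tell the loads apart, assuming a
hypothetical helper name has_earlykdump_param (it is not part of kdumpctl).
However the test is written, its only input is the kernel command line, and
that stays the same for the whole boot, so the first kdumpctl start and every
later one get the same answer. The sketch matches the exact token rather than
a substring, which is a choice made here and not what the patch does:

```shell
#!/bin/bash
# Hypothetical helper (not in kdumpctl): succeed only when the exact token
# rd.earlykdump, optionally followed by =value, is on the given command line.
has_earlykdump_param()
{
    local cmdline="$1" word
    local -a words
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        case "$word" in
            rd.earlykdump|rd.earlykdump=*) return 0 ;;
        esac
    done
    return 1
}

# On a live system the argument would be "$(cat /proc/cmdline)".
sample="BOOT_IMAGE=/vmlinuz root=/dev/sda1 rd.earlykdump crashkernel=256M"
if has_earlykdump_param "$sample"; then
    echo "rd.earlykdump was given at boot"
fi
```

Calling it twice with the same command line cannot give two different
results, which is the limitation described above.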
 >>>>>>>>
 >>>>>>>> The early boot time panic to address for this patch is the 100%
 >>>>>>>> reproducible panic,  for this kind of panic admin should be able to see
 >>>>>>>> it when he boots the machine. So rethinking about this the best way may
 >>>>>>>> be just a wontfix.
 >>>>>>>>
 >>>>>>>> Let's consider below use cases:
 >>>>>>>>
 >>>>>>>> * First install:
 >>>>>>>>
 >>>>>>>> install os ->
 >>>>>>>> ---A
 >>>>>>>>       reboot ->
 >>>>>>>>           ...
 >>>>>>>> ---B
 >>>>>>>>           kdump service start
 >>>>>>>>              -> create kdump initrd
 >>>>>>>>           ...
 >>>>>>>>           boot finished
 >>>>>>>>           recreate default initrd and enable early kdump
 >>>>>>>>           goto A
 >>>>>>>>
 >>>>>>>> Panic happened between A and B, if it is predictable, eg. 100%
 >>>>>>>> reproducible, then admin should already see it, then he/she can control
 >>>>>>>> and stop the repeating crash/kdump loop
 >>>>>>>> if the panic is not 100% reproducible then use reboot as final action is just fine
 >>>>>>>>
 >>>>>>>> * Other use cases eg. updating kernel or some other components:
 >>>>>>>> It is similar with the install os use case because if one updates kernel
 >>>>>>>> or critical components it is likely they need to regenerate kdump initrd,
 >>>>>>>> and then repack it into early kdump default initrd, in this case admin
 >>>>>>>> should be able to see the panic loop and handle it.
 >>>>>>>> Also if the panic is not 100% reproducible then we are just fine.
 >>>>>>>>
 >>>>>>>> If we choose to split early and late kdump load, there could be other
 >>>>>>>> side effects, and make the logic even more complicated.
 >>>>>>>>
 >>>>>>>> So...  the better way may be just leave it as is, and maybe add some
 >>>>>>>> documentation.
 >>>>>>>>
 >>>>>>>
 >>>>>>> It is a good way to document the risks that may exist.
 >>>>>>>
 >>>>>>> Just like the public transportation, we all know that it has the risk, but we still choose it.
 >>>>>>>
 >>>>>>> Thanks.
 >>>>>>>
 >>>>>>>> Thoughts?
 >>>>>>>>
 >>>>>>>> Thanks
 >>>>>>>> Dave
 >>>>>>>>
 >>>>>>
 >>>>>
 >>>>> Hello Dave et al.
 >>>>>
 >>>>> Kindly check the below condition, assumption and share your thoughts.
 >>>>
 >>>> Buland, thank you for the reply.
 >>>>
 >>>>>
 >>>>> Condition:
 >>>>>
 >>>>> [0x1] System is up and running with the kernel version X.
 >>>>> [0x2] Admin performed the kernel Y upgrade.
 >>>>> [0x3] Running kernel X crashed.
 >>>>> [0x4] Normal kdump rebooted the system and captured the kernel crash dump of the kernel X.
 >>>>> [0x5] System rebooted with the newly installed kernel Y.
 >>>>> [0x6] Let's assume that due to some unknown reason the booting kernel Y also crashed (assume that the panic is 100% reproducible).
 >>>>> [0x7] Early kdump started dumping the kernel crash dump of the booting kernel Y.
 >>>>> [0x8] Early kdump rebooted the system and stuck in the loop.
 >>>>>
 >>>>> Assumption: 1
 >>>>>
 >>>>> What if the problematic system is in data center and admin is not aware of this situation?
 >>>>>
 >>>>> [0x1] The dump target will be filled with the multiple copies of the kernel crash dump?
 >>>>>
 >>>>>        [ 1 kernel crash dump of the kernel X and 'n' kernel crash dump of the kernel Y]
 >>>>>
 >>>>> [0x2] The system will reboot in the loop?
 >>>>
 >>>> It is hard to define,  for original kdump service without early kdump
 >>>> load, it is also possible after one replaced a kernel the new kernel panics
 >>>> during boot phase just after kdump get loaded.
 >>>>
 >>>> Admin at least should test a reboot after updating a kernel?
 >>>
 >>> Agree, but not sure if all the admins will follow this rule :)
 >>>
 >>>
 >>>>> Assumption: 2
 >>>>>
 >>>>> What if the dump target is on the local disk?
 >>>>>
 >>>>> [0x1] Admin needs to power off the system manually to retrieve the kernel crash dump of the kernel X and Y from the resume mode.
 >>>>> [0x2] Admin needs to remove the multiple copies of the kernel crash dump of the booting kernel Y.
 >>>>> [0x3] Admin might get confused while differentiating between the kernel crash dump of the kernel X and Kernel Y.
 >>>>
 >>>> As for the worst case we all admit this is a problem,  we are more than
 >>>> happy to make admin be easier and fix it :)
 >>>>
 >>>> As I said we are exploring this problem to see if we have a good
 >>>> solution.
 >>>>
 >>>> But if we can assume predictable panic can be avoided then the situation
 >>>> will be better.
 >>>
 >>> One suggestion, can we have a separate default behavior for normal kdump and early kdump?
 >>
 >> Yes if we can.
 >>
 >> Kdump service start either early or late we just use a syscall to load
 >> another kernel/initrd into pre-reserved memory.
 >>
 >> So once kdump service started we can not differentiate whether kdump kernel was
 >> loaded early or late.
 > 
 > But still not sure about it.  As we can see for normal kdump there will
 > be similar issue existed, eg. between C and D if a reproducible panic
 > also happens every time during late boot phase
 > 
 > So it is hard to define this is early kdump only, just more likely for
 > early kdump.
 > 
 > ---A
 > initramfs kdump load
 >    switch root
 >       other services startup
 > ---B
 >       kdump service start
 > ---C
 >       other services start up
 >       ...
 > ---D
 >       boot finished
 > 
 > If we consider this as a early kdump only/must-fix issue, thinking about
 > it we should split into two issues:
 > 
 > 1. how to determine early load and then reload while kdumpctl start 
 > 
 > If "kdump" service is not active but kdump kernel loaded then
 > it should have been "early loaded".  Then something will like this:
 > 

 Thanks for your suggestion.

 If we executed the command: systemctl restart kdump.service and then start kdump.service again,
 it won't exactly distinguish early kdump and normal kdump.

 I'm also looking for other solutions. 

As we discussed in the meeting, I meant "is not active": the pseudo code below
is missing a "!".

It should be good if no better way.
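
With the missing "!" put back, the decision in the quoted pseudo code below
could be sketched like this. The function name decide_start_action and its
0/1 arguments are made up for illustration. On a real system the first value
would probably come from /sys/kernel/kexec_crash_loaded and the second from
"systemctl is-active --quiet kdump.service"; that wiring is an assumption,
not what kdumpctl does today.

```shell
#!/bin/bash
# decide_start_action LOADED ACTIVE   (hypothetical helper)
#   LOADED: 1 if a crash kernel is already loaded, else 0
#   ACTIVE: 1 if kdump.service is already active, else 0
# Prints which path "kdumpctl start" would take.
decide_start_action()
{
    local loaded="$1" active="$2"

    if [ "$loaded" -eq 1 ]; then
        if [ "$active" -ne 1 ]; then
            # Loaded, but the service is NOT active: treat as early loaded,
            # so stop and load again with the normal setup.
            echo reload
        else
            # Service already running: print the warning and return.
            echo skip
        fi
    else
        # Nothing loaded yet: go ahead to load and start.
        echo load
    fi
}

decide_start_action 1 0   # prints: reload
```

The service state is only a heuristic here, which is the concern lijiang
raised above about a manual restart followed by another start.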

> 
> > kdumpctl start()
> >   if kdump is loaded:
> >   	if systemd kdump.service is active
> > 		# early loaded
> > 		stop and continue to load again
> > 	else
> > 		print a warning service is already running and then
> > 		return
> >   else
> > 	go ahead to load and start
> > 
> > 2. how to set the reboot action for early and late loading:
> > 
> > use a cmdline option like rd.earlykdump.noreboot,  default value is true, if
> > one wants to use the alternative he/she can add rd.earlykdump.noreboot=0 
> > 
> > Adding extra config options is also a choice but seems much more complicated,
> > not only final action, also default action,  for default action in case of a
> > partially saved vmcore it still will occupy the whole disk.  So it seems we
> > should just replace reboot with shutdown for any cases.
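
A rough sketch of how such a switch could be read. It assumes the
rd.earlykdump.noreboot name proposed above, which does not exist today, and a
hypothetical function early_final_action. An absent parameter, or any value
other than 0, keeps the proposed default of not rebooting:

```shell
#!/bin/bash
# early_final_action CMDLINE   (hypothetical helper)
# Print the final action for an early-loaded kdump: "poweroff" unless the
# proposed (not yet existing) rd.earlykdump.noreboot=0 is on the command line.
# If the parameter is repeated, the last occurrence wins.
early_final_action()
{
    local cmdline="$1" word action=poweroff
    local -a words
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        case "$word" in
            rd.earlykdump.noreboot=0) action=reboot ;;
            rd.earlykdump.noreboot|rd.earlykdump.noreboot=*) action=poweroff ;;
        esac
    done
    echo "$action"
}

# On a live system the argument would be "$(cat /proc/cmdline)".
early_final_action "root=/dev/sda1 rd.earlykdump rd.earlykdump.noreboot=0"   # prints: reboot
```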
> > 
> >>
> >>>
> >>> Eg:
> >>>
> >>> In /etc/kdump.conf file.
> >>> default  reboot     (Default action to be taken by normal kdump and early kdump if dumping fails)
> >>> ndefault reboot     (Default action to be taken once the dumping is successful by normal kdump)
> >>> edefault poweroff   (Default action to be taken once the dumping is successful by early kdump)
> >>>
> >>> Note: Create 'ndefault' & 'edefault' options identical to 'default' option so that an admin can alter the behavior as per the requirement.
> >>>
> >>> [0x1] If kernel crash dump is captured by normal kdump then 'reboot' the system (by default) or take action as per 'ndefault' value.
> >>> [0x2] If kernel crash dump is captured by early kdump then 'poweroff' the system (by default) or take action as per 'edefault' value.
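
To make the proposal concrete, here is a sketch of how the two suggested keys
could be looked up. ndefault and edefault are only the names proposed above
and are not existing kdump.conf options; get_conf_value and final_action_for
are hypothetical helpers, and the config path is passed in rather than
hard-coding /etc/kdump.conf:

```shell
#!/bin/bash
# get_conf_value KEY FILE   (hypothetical helper)
# Print the value of the last "KEY value" line in FILE, if any.
get_conf_value()
{
    awk -v key="$1" '$1 == key { val = $2 } END { if (val != "") print val }' "$2"
}

# final_action_for early|normal FILE   (hypothetical helper)
# Early kdump falls back to poweroff, normal kdump to reboot, matching
# the defaults suggested in the proposal above.
final_action_for()
{
    local kind="$1" conf="$2" key fallback value
    case "$kind" in
        early)  key=edefault; fallback=poweroff ;;
        normal) key=ndefault; fallback=reboot ;;
        *)      return 1 ;;
    esac
    value=$(get_conf_value "$key" "$conf")
    echo "${value:-$fallback}"
}

# Example with a throwaway config file.
conf=$(mktemp)
printf 'default reboot\nedefault halt\n' > "$conf"
final_action_for early "$conf"    # prints: halt
final_action_for normal "$conf"   # prints: reboot
rm -f "$conf"
```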
> >>>
> >>
> >> We can introduce something so that early kdump service load and set
> >> poweroff as default,  but when normal kdump service start we have two
> >> choices:
> >>
> >> 1. go ahead to use the early loaded setup without reload like we have in
> >> the code now, in this case normal kdump will also use poweroff no matter
> >> what is set in /etc/kdump.conf
> >>
> >> 2. reload kernel/initrd with normal setup in /etc/kdump.conf.  In this
> >> way, we need get if this is a late service startup.  Because if we
> >> blindly reload then it will affect later udev triggered kdump restart.
> >>
> >> For example manually start the service like kdumpctl start, originally
> >> it will just print a msg about the service is already running.  But now
> >> the "start" == "stop, then start", this may be not a big problem but
> >> looks odd.
> >>
> >> Correcting myself about the udev hotplug triggered events, it should be ok
> >> because it will call restart that means stop then start, this change
> >> will not affect it.
> >>
> >> BTW, the default action here should be "final action",  there is a
> >> "default" action can be configured in kdump.conf which is used for kdump
> >> kernel to do after vmcore saving failed, it is more like a failsafe
> >> action.  But the default "default" is also "reboot".  When kdump
> >> successfully saves a vmcore it will go to "final action" (==reboot)
> >> which is not configurable.
> >>
> >>> --
> >>> Buland
> >>>
> >>>
> >>
> >> Thanks
> >> Dave
