Re: [PATCH] Avoid falling into infinite loop restart when using a problematic system

Tuesday, 25 December 2018

On 2018-12-26 09:17, Dave Young wrote:
...
 On 12/25/18 at 10:24am, lijiang wrote:
>
> On 2018-12-24 12:17, Dave Young wrote:
>> On 12/21/18 at 07:38pm, Dave Young wrote:
>>> On 12/21/18 at 04:47pm, Buland Singh wrote:
>>>> On 12/21/18 4:14 PM, Dave Young wrote:
>>>>> On 12/21/18 at 12:59pm, Buland Singh wrote:
>>>>>> On 12/21/18 12:29 PM, Kairui Song wrote:
>>>>>>> Hi, Dave, Lianbo
>>>>>>>
>>>>>>> My concern is that a crash loop may generate tons of dump cores, and the
>>>>>>> dump target may get filled up by them,
>>>>>>> which may be the larger potential risk. Otherwise I think it's good to leave
>>>>>>> it as it is.
>>>>>>>
>>>>>>> On Fri, Dec 21, 2018 at 2:05 PM lijiang <lijiang(a)redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 2018-12-21 10:49, Dave Young wrote:
>>>>>>>>> + more people
>>>>>>>>> On 12/20/18 at 04:49pm, lijiang wrote:
>>>>>>>>>> On 2018-12-20 13:57, Dave Young wrote:
>>>>>>>>>>> On 12/20/18 at 01:06pm, Lianbo Jiang wrote:
>>>>>>>>>>>> By default, early kdump reboots the system after capturing the vmcore.
>>>>>>>>>>>> If the problematic system is continuously crashing due to some issue
>>>>>>>>>>>> during the early boot stage, the system may fall into an infinite restart
>>>>>>>>>>>> loop like this:
>>>>>>>>>>>>
>>>>>>>>>>>>       boot -----> crash -----> early kdump (dump vmcore)
>>>>>>>>>>>>         ^                              |
>>>>>>>>>>>>         '.........(reboot).............'
>>>>>>>>>>>>
>>>>>>>>>>>> But now, a system crash at the early stage is only captured by early kdump,
>>>>>>>>>>>> and the rest is captured by normal kdump. That is to say, when the normal kdump
>>>>>>>>>>>> service starts, it will load it again and override early kdump. It is
>>>>>>>>>>>> helpful to control the logic of early kdump and normal kdump separately
>>>>>>>>>>>> in the final action (it is called by kdump-capture.service). For example,
>>>>>>>>>>>> early kdump always passes 'rd.earlykdump' to the second kernel when
>>>>>>>>>>>> early kdump is enabled, but normal kdump never passes 'rd.earlykdump'
>>>>>>>>>>>> to the second kernel. So they can be distinguished in the
>>>>>>>>>>>> second kernel.
>>>>>>>>>>>
>>>>>>>>>>> Hmm, I'm confused about the param passing above.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I copied some text from another email, please refer to this:
>>>>>>>>>> [--->
>>>>>>>>>> The rd.earlykdump is added to the kernel command line in grub.cfg. However, early kdump
>>>>>>>>>> and normal kdump get the same parameters from /proc/cmdline in the first kernel.
>>>>>>>>>>
>>>>>>>>>> Early kdump passes the rd.earlykdump to the second kernel, but normal kdump doesn't
>>>>>>>>>> need it, so normal kdump needs to remove the rd.earlykdump.
>>>>>>>>>>
>>>>>>>>>> This distinguishes early kdump from normal kdump in the second kernel. It helps
>>>>>>>>>> to control the logic of the kdump capture service. For example: default action/final action.
>>>>>>>>>> ]
>>>>>>>>>
>>>>>>>>> The description is confusing, for example "early kdump passes ... to the second
>>>>>>>>> kernel": what really happens is that a person adds the
>>>>>>>>> param to the 1st kernel cmdline, and kexec-tools takes/inherits it and passes it to the 2nd
>>>>>>>>> kernel.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes. Good point. Thanks for your explanation.
>>>>>>>>
>>>>>>>>> Anyway this is patch log issue.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Early or non early just means about the service loading phase, in
>>>>>>>>>>
>>>>>>>>>> Yes. This patch uses the same method you described. When the normal kdump service starts,
>>>>>>>>>> it will reload. At the same time, early kdump will be overwritten by normal kdump.
>>>>>>>>>
>>>>>>>>> Probably "early kdump load" is better wording than "early kdump".
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> initramfs or not. I notice dracut/systemd will print some message saying
>>>>>>>>>>> they are running in initramfs, so probably you can check how to detect it
>>>>>>>>>>> the same way; if this is not initramfs then just unload before the
>>>>>>>>>>> check in kdump loading.
>>>>>>>>>>>
>>>>>>>>>>> The picture like below:
>>>>>>>>>>>
>>>>>>>>>>> Kernel boot ->
>>>>>>>>>>>
>>>>>>>>>>>      initramfs ---
>>>>>>>>>>>           early kdump load
>>>>>>>>>>> ---- Mark A  ----
>>>>>>>>>>>      initramfs switch root
>>>>>>>>>>>
>>>>>>>>>>>           system startup (real root fs)
>>>>>>>>>>>                  service a
>>>>>>>>>>>                  service b ... (eg. networking etc.)
>>>>>>>>>>>                  kdump service start
>>>>>>>>>>> -----Mark B -----
>>>>>>>>>>>                           load kdump kernel again
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The problem will happen between Mark A and Mark B. During this period
>>>>>>>>>>> there could be repeated crash -> earlykdump_load, and there might be some
>>>>>>>>>>> random crashes as well during the real root fs service startup,
>>>>>>>>>>> for example if some network workload causes a panic after the network is ready.
>>>>>>>>>>> That may not be 100% reproducible, so it seems we still need to
>>>>>>>>>>> make the poweroff configurable. eg.
>>>>>>>>>>>
>>>>>>>>>>> default is poweroff, but one can choose another action.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes, the default is poweroff for early kdump. Only when the kdump capture service
>>>>>>>>>> hits an error or enters the emergency service can one choose the default
>>>>>>>>>> action (configure default=xxx in kdump.conf).
>>>>>>>>>
>>>>>>>>> For default action instead of final action, if you hardcode it, then even if
>>>>>>>>> one sets default to reboot it will still poweroff.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If really needed, that can be improved.
>>>>>>>>
>>>>>>>>> [snip]
>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +check_rd_earlykdump()
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    egrep "rd.earlykdump" /proc/cmdline
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>>    start()
>>>>>>>>>>>>    {
>>>>>>>>>>>>      check_dump_feasibility
>>>>>>>>>>>> @@ -969,7 +974,13 @@ start()
>>>>>>>>>>>>      check_current_status
>>>>>>>>>>>>      if [ $? == 0 ]; then
>>>>>>>>>>>>              echo "Kdump already running: [WARNING]"
>>>>>>>>>>>> -          return 0
>>>>>>>>>>>> +          check_rd_earlykdump
>>>>>>>>>>>> +          #if earlykdump loaded, it will stop and start.
>>>>>>>>>>>> +          if [ $? -eq 0 ]; then
>>>>>>>>>>>> +                  stop
>>>>>>>>>>>
>>>>>>>>>>> kdumpctl start can be run not only by system startup services; one can also
>>>>>>>>>>> run it manually or from a udev rule.
>>>>>>>>>>>
>>>>>>>>>>> Checking the kernel cmdline does not seem to be enough.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here it means that if kdump has been loaded, check whether early kdump did it.
>>>>>>>>>> If yes, let normal kdump load it again, otherwise there is no need to do anything.
>>>>>>>>>
>>>>>>>>> As we discussed, you did not get my point here :)
>>>>>>>>>
>>>>>>>>> check_rd_earlykdump will always be true once the kernel has booted, so there is
>>>>>>>>> no way to tell the first normal kdump load from a later one.
>>>>>>>>>
>>>>>>>>> The early boot time panic this patch addresses is the 100%
>>>>>>>>> reproducible panic. For this kind of panic the admin should be able to see
>>>>>>>>> it when he boots the machine. So, rethinking this, the best way may
>>>>>>>>> be just a wontfix.
>>>>>>>>>
>>>>>>>>> Let's consider below use cases:
>>>>>>>>>
>>>>>>>>> * First install:
>>>>>>>>>
>>>>>>>>> install os ->
>>>>>>>>> ---A
>>>>>>>>>       reboot ->
>>>>>>>>>           ...
>>>>>>>>> ---B
>>>>>>>>>           kdump service start
>>>>>>>>>              -> create kdump initrd
>>>>>>>>>           ...
>>>>>>>>>           boot finished
>>>>>>>>>           recreate default initrd and enable early kdump
>>>>>>>>>           goto A
>>>>>>>>>
>>>>>>>>> If a panic happens between A and B and it is predictable, eg. 100%
>>>>>>>>> reproducible, then the admin should already see it, and he/she can step in
>>>>>>>>> and stop the repeating crash/kdump loop.
>>>>>>>>> If the panic is not 100% reproducible then using reboot as the final action is just fine.
>>>>>>>>>
>>>>>>>>> * Other use cases eg. updating kernel or some other components:
>>>>>>>>> This is similar to the install os use case, because if one updates the kernel
>>>>>>>>> or critical components it is likely they need to regenerate the kdump initrd,
>>>>>>>>> and then repack it into the early kdump default initrd. In this case the admin
>>>>>>>>> should be able to see the panic loop and handle it.
>>>>>>>>> Also if the panic is not 100% reproducible then we are just fine.
>>>>>>>>>
>>>>>>>>> If we choose to split early and late kdump load, there could be other
>>>>>>>>> side effects, and it makes the logic even more complicated.
>>>>>>>>>
>>>>>>>>> So...  the better way may be to just leave it as is, and maybe add some
>>>>>>>>> documentation.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It is a good idea to document the risks that may exist.
>>>>>>>>
>>>>>>>> Just like public transportation: we all know that it has risks, but we still choose it.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Hello Dave et al.
>>>>>>
>>>>>> Kindly check the below condition, assumption and share your thoughts.
>>>>>
>>>>> Buland, thank you for the reply.
>>>>>
>>>>>>
>>>>>> Condition:
>>>>>>
>>>>>> [0x1] System is up and running with the kernel version X.
>>>>>> [0x2] Admin performed the kernel Y upgrade.
>>>>>> [0x3] Running kernel X crashed.
>>>>>> [0x4] Normal kdump rebooted the system and captured the kernel crash dump of the kernel X.
>>>>>> [0x5] System rebooted with the newly installed kernel Y.
>>>>>> [0x6] Let's assume that due to some unknown reason the booting kernel Y also crashed (assume that the panic is 100% reproducible).
>>>>>> [0x7] Early kdump started dumping the kernel crash dump of the booting kernel Y.
>>>>>> [0x8] Early kdump rebooted the system and got stuck in the loop.
>>>>>>
>>>>>> Assumption: 1
>>>>>>
>>>>>> What if the problematic system is in a data center and the admin is not aware of this situation?
>>>>>>
>>>>>> [0x1] The dump target will be filled with multiple copies of the kernel crash dump?
>>>>>>
>>>>>>        [ 1 kernel crash dump of the kernel X and 'n' kernel crash dumps of the kernel Y]
>>>>>>
>>>>>> [0x2] The system will reboot in the loop?
>>>>>
>>>>> It is hard to define. For the original kdump service without early kdump
>>>>> load, it is also possible that after one replaces a kernel the new kernel panics
>>>>> during the boot phase just after kdump gets loaded.
>>>>>
>>>>> The admin should at least test a reboot after updating a kernel?
>>>>
>>>> Agree, but not sure if all the admins will follow this rule :)
>>>>
>>>>
>>>>>> Assumption: 2
>>>>>>
>>>>>> What if the dump target is on the local disk?
>>>>>>
>>>>>> [0x1] Admin needs to power off the system manually to retrieve the kernel crash dump of the kernel X and Y from the resume mode.
>>>>>> [0x2] Admin needs to remove the multiple copies of the kernel crash dump of the booting kernel Y.
>>>>>> [0x3] Admin might get confused while differentiating between the kernel crash dump of the kernel X and kernel Y.
>>>>>
>>>>> As for the worst case, we all admit this is a problem, and we are more than
>>>>> happy to make the admin's life easier and fix it :)
>>>>>
>>>>> As I said, we are exploring this problem to see if we have a good
>>>>> solution.
>>>>>
>>>>> But if we can assume predictable panics can be avoided then the situation
>>>>> will be better.
>>>>
>>>> One suggestion, can we have a separate default behavior for normal kdump and early kdump?
>>>
>>> Yes if we can.
>>>
>>> Kdump service start, either early or late, just uses a syscall to load
>>> another kernel/initrd into pre-reserved memory.
>>>
>>> So once the kdump service has started we cannot differentiate whether the kdump kernel was
>>> loaded early or late.
>>
>> But still not sure about it.  As we can see, for normal kdump a
>> similar issue exists, eg. between C and D if a reproducible panic
>> also happens every time during the late boot phase.
>>
>> So it is hard to define this as early kdump only, it is just more likely for
>> early kdump.
>>
>> ---A
>> initramfs kdump load
>>    switch root
>>       other services startup
>> ---B
>>       kdump service start
>> ---C
>>       other services start up
>>       ...
>> ---D
>>       boot finished
>>
>> If we consider this an early kdump only/must-fix issue, thinking about
>> it we should split it into two issues:
>>
>> 1. how to determine early load and then reload while kdumpctl start
>>
>> If "kdump" service is not active but kdump kernel loaded then
>> it should have been "early loaded".  Then it would look something like this:
>>
>
> Thanks for your suggestion.
>
> If we execute the command systemctl restart kdump.service and then start kdump.service again,
> this won't reliably distinguish early kdump from normal kdump.
>
> I'm also looking for other solutions.

 As we discussed in the meeting, I meant "is not active"; the pseudo code below
 is missing a "!".

 It should be good if there is no better way.

Previously, early kdump was called by the dracut cmdline hook, so systemd doesn't know early kdump's
status. The call trace is like this: systemd -> dracut cmdline hook -> early kdump.

Now, I plan to do some tests and let systemd call early kdump directly, like this: systemd -> early kdump

                systemd-journal.socket                          systemd-journal.socket
                         |                                                |
                         v                                                v
                dracut-cmdline.service ->early kdump   ------>       early kdump
                         |                                                |
                         v                                                v
                dracut-pre-udev.service                          dracut-cmdline.service
                         |                                                |
                         v                                                v
                systemd-udevd.service                            dracut-pre-udev.service
                         ......                                           |
                                                                          v
                                                                 systemd-udevd.service
                                                                         ......
This way systemd knows early kdump's status, and when normal kdump starts we can check early kdump's status with
systemctl status earlykdump.
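
A minimal sketch of that check, for illustration only. The helper name
early_kdump_loaded_by_systemd is made up here, and earlykdump.service is the
unit name assumed from the plan above (no such unit exists yet). It uses
"systemctl is-active" rather than "systemctl status" because the exit code is
easier to script against. A oneshot unit would also need RemainAfterExit=yes
to keep reporting "active" after it finishes, and whether that state survives
the switch root out of the initramfs is exactly what still needs testing.

```shell
#!/bin/bash
# Sketch only: ask systemd whether the (hypothetical) early kdump unit ran.
# $1: unit name, defaults to the assumed name earlykdump.service.
# Returns 0 if systemd reports the unit as active, non-zero otherwise.
early_kdump_loaded_by_systemd() {
    local unit="${1:-earlykdump.service}"
    systemctl is-active --quiet "$unit"
}
```

In kdumpctl start this could be used as
"if early_kdump_loaded_by_systemd; then stop; fi" before loading again.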

I still need to test both of the above solutions.

Thanks.

>>
>>> kdumpctl start()
>>>   if kdump is loaded:
>>>   	if systemd kdump.service is active
>>> 		# early loaded
>>> 		stop and continue to load again
>>> 	else
>>> 		print a warning service is already running and then
>>> 		return
>>>   else
>>> 	go ahead to load and start
>>>
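
For illustration, the pseudo code above with the missing "!" applied could be
sketched in shell roughly as follows. This is not the actual kdumpctl code:
the function names and the KEXEC_CRASH_LOADED override are invented for the
sketch. /sys/kernel/kexec_crash_loaded and "systemctl is-active" are the only
real interfaces it relies on. As noted earlier in the thread, this check
cannot tell the cases apart after a manual "systemctl restart kdump.service".

```shell
#!/bin/bash
# Sketch only: decide what "kdumpctl start" should do.

# Returns 0 if a crash kernel is currently loaded.
# KEXEC_CRASH_LOADED is a hypothetical override so the sketch can be tried
# without a real crash kernel.
kdump_is_loaded() {
    local f="${KEXEC_CRASH_LOADED:-/sys/kernel/kexec_crash_loaded}"
    [ "$(cat "$f" 2>/dev/null)" = "1" ]
}

# Returns 0 if systemd reports kdump.service as active.
kdump_service_is_active() {
    systemctl is-active --quiet kdump.service
}

# Prints one of: load, reload, already-running
decide_start_action() {
    if ! kdump_is_loaded; then
        # Nothing loaded yet: go ahead to load and start.
        echo load
    elif ! kdump_service_is_active; then
        # Loaded, but not by kdump.service: assume it was early loaded,
        # so stop and load again with the normal setup.
        echo reload
    else
        # Loaded by the normal service: only warn and return.
        echo already-running
    fi
}
```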
>>> 2. how to set the reboot action for early and late loading:
>>>
>>> use a cmdline option like rd.earlykdump.noreboot, whose default value is true; if
>>> one wants the alternative he/she can add rd.earlykdump.noreboot=0
>>>
>>> Adding extra config options is also a choice but seems much more complicated,
>>> not only for the final action but also the default action; for the default action, in case of a
>>> partially saved vmcore, it will still occupy the whole disk.  So it seems we
>>> should just replace reboot with shutdown in all cases.
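
A rough sketch of how such a cmdline switch could be read in the second
kernel. rd.earlykdump.noreboot is only the option proposed above, not an
existing parameter, and the function name is invented for this example. The
default (option absent, or any value other than 0) is treated as "do not
reboot", i.e. poweroff.

```shell
#!/bin/bash
# Sketch only: pick the final action from the kernel command line.
# $1: file to read, defaults to /proc/cmdline (overridable for testing).
# Prints "poweroff" unless rd.earlykdump.noreboot=0 is present.
early_kdump_final_action() {
    local cmdline_file="${1:-/proc/cmdline}"
    local -a args
    local arg noreboot=1

    # Split the single cmdline line into words without glob expansion.
    read -r -a args < "$cmdline_file"

    for arg in "${args[@]}"; do
        case "$arg" in
            rd.earlykdump.noreboot=0) noreboot=0 ;;
            rd.earlykdump.noreboot|rd.earlykdump.noreboot=*) noreboot=1 ;;
        esac
    done

    if [ "$noreboot" -eq 1 ]; then
        echo poweroff
    else
        echo reboot
    fi
}
```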
>>>
>>>>
>>>>>
>>>>> Eg:
>>>>>
>>>>> In /etc/kdump.conf file.
>>>>> default  reboot     (Default action to be taken by normal kdump and early kdump if dumping fails)
>>>>> ndefault reboot     (Default action to be taken once the dumping is successful by normal kdump)
>>>>> edefault poweroff   (Default action to be taken once the dumping is successful by early kdump)
>>>>>
>>>>> Note: Create 'ndefault' & 'edefault' options identical to 'default' option so that an admin can alter the behavior as per the requirement.
>>>>>
>>>>> [0x1] If kernel crash dump is captured by normal kdump then 'reboot' the system (by default) or take action as per 'ndefault' value.
>>>>> [0x2] If kernel crash dump is captured by early kdump then 'poweroff' the system (by default) or take action as per 'edefault' value.
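
To make the proposal above concrete, here is a small sketch of how the capture
side might look up those options. 'ndefault' and 'edefault' are only the
option names suggested in this mail and are not existing kdump.conf keywords;
the function name and its arguments are likewise made up for the example.

```shell
#!/bin/bash
# Sketch only: choose the action to take after a successful dump.
# $1: "early" or "normal"
# $2: config file, defaults to /etc/kdump.conf
# Prints the configured action, or the proposed built-in default.
final_action_from_conf() {
    local mode="$1" conf="${2:-/etc/kdump.conf}"
    local key fallback value

    case "$mode" in
        early)  key=edefault; fallback=poweroff ;;
        normal) key=ndefault; fallback=reboot ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac

    # Use the last line whose first field is exactly the key; commented-out
    # lines never match because their first field starts with "#".
    value=$(awk -v k="$key" '$1 == k { v = $2 } END { print v }' "$conf" 2>/dev/null)

    echo "${value:-$fallback}"
}
```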
>>>>>
>>>>
>>>> We can introduce something so that the early kdump service loads and sets
>>>> poweroff as default, but when the normal kdump service starts we have two
>>>> choices:
>>>>
>>>> 1. go ahead and use the early loaded setup without reload like we have in
>>>> the code now, in this case normal kdump will also use poweroff no matter
>>>> what is set in /etc/kdump.conf
>>>>
>>>> 2. reload kernel/initrd with the normal setup in /etc/kdump.conf.  In this
>>>> case, we need to know whether this is a late service startup.  Because if we
>>>> blindly reload then it will affect later udev triggered kdump restarts.
>>>>
>>>> For example, manually starting the service with kdumpctl start originally
>>>> just prints a msg that the service is already running.  But now
>>>> the "start" == "stop, then start"; this may not be a big problem but
>>>> looks odd.
>>>>
>>>> Correcting myself about the udev hotplug triggered events: it should be ok
>>>> because it will call restart, which means stop then start, so this change
>>>> will not affect it.
>>>>
>>>> BTW, the default action here should be "final action".  There is a
>>>> "default" action that can be configured in kdump.conf, which is what the kdump
>>>> kernel does after vmcore saving has failed; it is more like a failsafe
>>>> action.  But the default "default" is also "reboot".  When kdump
>>>> successfully saves a vmcore it will go to "final action" (==reboot)
>>>> which is not configurable.
>>>>
>>>>> --
>>>>> Buland
>>>>>
>>>>>
>>>>
>>>> Thanks
>>>> Dave
