On 2018-12-26 09:17, Dave Young wrote:
On 12/25/18 at 10:24am, lijiang wrote:
>
> On 2018-12-24 12:17, Dave Young wrote:
>> On 12/21/18 at 07:38pm, Dave Young wrote:
>>> On 12/21/18 at 04:47pm, Buland Singh wrote:
>>>> On 12/21/18 4:14 PM, Dave Young wrote:
>>>>> On 12/21/18 at 12:59pm, Buland Singh wrote:
>>>>>> On 12/21/18 12:29 PM, Kairui Song wrote:
>>>>>>> Hi, Dave, Lianbo
>>>>>>>
>>>>>>> My concern is that a crash loop may generate tons of dump cores, and
>>>>>>> the dump target may get filled up by them, which may be a larger
>>>>>>> potential risk. Otherwise I think it's good to leave it as it is.
>>>>>>>
>>>>>>> On Fri, Dec 21, 2018 at 2:05 PM lijiang <lijiang(a)redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 2018-12-21 10:49, Dave Young wrote:
>>>>>>>>> + more people
>>>>>>>>> On 12/20/18 at 04:49pm, lijiang wrote:
>>>>>>>>>> On 2018-12-20 13:57, Dave Young wrote:
>>>>>>>>>>> On 12/20/18 at 01:06pm, Lianbo Jiang wrote:
>>>>>>>>>>>> By default, early kdump reboots the system after capturing the vmcore.
>>>>>>>>>>>> If the problematic system is continuously crashing due to some issue
>>>>>>>>>>>> during the early boot stage, the system may fall into an infinite
>>>>>>>>>>>> restart loop like this:
>>>>>>>>>>>>
>>>>>>>>>>>> boot -----> crash -----> early kdump (dump vmcore)
>>>>>>>>>>>>  ^                                   |
>>>>>>>>>>>>  '.............(reboot)..............'
>>>>>>>>>>>>
>>>>>>>>>>>> But now, a system crash at the early stage is only captured by early kdump,
>>>>>>>>>>>> and the rest are captured by normal kdump. That is to say, when the normal
>>>>>>>>>>>> kdump service starts, it will load again and override early kdump. It is
>>>>>>>>>>>> helpful to control the logic of early kdump and normal kdump separately
>>>>>>>>>>>> in the final action (it is called by kdump-capture.service). For example,
>>>>>>>>>>>> early kdump always passes 'rd.earlykdump' to the second kernel when
>>>>>>>>>>>> early kdump is enabled, but normal kdump never passes 'rd.earlykdump'
>>>>>>>>>>>> to the second kernel. So they can be distinguished in the
>>>>>>>>>>>> second kernel.
>>>>>>>>>>>
>>>>>>>>>>> Hmm, I'm confused about the param passing above.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I copied some text from another email, please refer to this:
>>>>>>>>>> [--->
>>>>>>>>>> The rd.earlykdump is added to the kernel command line in grub.cfg. However,
>>>>>>>>>> early kdump and normal kdump get the same parameters from /proc/cmdline
>>>>>>>>>> in the first kernel.
>>>>>>>>>>
>>>>>>>>>> Early kdump passes rd.earlykdump to the second kernel, but normal kdump
>>>>>>>>>> doesn't need it, so normal kdump needs to remove rd.earlykdump.
>>>>>>>>>>
>>>>>>>>>> That makes it possible to distinguish early kdump and normal kdump in the
>>>>>>>>>> second kernel. It helps to control the logic of the kdump capture service,
>>>>>>>>>> for example the default action/final action.
>>>>>>>>>> ]
>>>>>>>>>
>>>>>>>>> The description is confusing. Take "early kdump passes ... to the second
>>>>>>>>> kernel" for example: what really happens is that a person adds the
>>>>>>>>> param to the 1st kernel cmdline, and kexec-tools takes/inherits it and
>>>>>>>>> passes it to the 2nd kernel.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes. Good point. Thanks for your explanation.
>>>>>>>>
>>>>>>>>> Anyway this is patch log issue.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Early or non early is just about the service loading phase, in
>>>>>>>>>>
>>>>>>>>>> Yes. This patch uses the same method you described. When the normal kdump
>>>>>>>>>> service starts, it will reload. At the same time, early kdump will be
>>>>>>>>>> overwritten by normal kdump.
>>>>>>>>>
>>>>>>>>> Probably "early kdump load" is better wording than "early kdump".
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> initramfs or not. I notice dracut/systemd print some messages saying
>>>>>>>>>>> they are running in the initramfs, so probably you can check how they
>>>>>>>>>>> detect it and do it the same way; if this is not the initramfs then
>>>>>>>>>>> just unload before the check in kdump loading.
>>>>>>>>>>>
>>>>>>>>>>> The picture like below:
>>>>>>>>>>>
>>>>>>>>>>> Kernel boot ->
>>>>>>>>>>>
>>>>>>>>>>>   initramfs ---
>>>>>>>>>>>     early kdump load
>>>>>>>>>>>     ---- Mark A ----
>>>>>>>>>>>     initramfs switch root
>>>>>>>>>>>
>>>>>>>>>>>   system startup (real root fs)
>>>>>>>>>>>     service a
>>>>>>>>>>>     service b ... (eg. networking etc.)
>>>>>>>>>>>     kdump service start
>>>>>>>>>>>     ----- Mark B -----
>>>>>>>>>>>     load kdump kernel again
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The problem will happen between Mark A and Mark B. During this period
>>>>>>>>>>> there could be a repeated crash -> earlykdump_load cycle. There might be
>>>>>>>>>>> some random crashes as well during the real root fs service startup,
>>>>>>>>>>> for example after the network is ready, if some network workload causes a
>>>>>>>>>>> panic. That may not be 100% reproducible, so it seems we still need to
>>>>>>>>>>> make the poweroff configurable, eg.
>>>>>>>>>>>
>>>>>>>>>>> default is poweroff, but one can change it if needed.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes, the default is poweroff for early kdump, unless the kdump capture
>>>>>>>>>> service hits an error or enters the emergency service; in that case one
>>>>>>>>>> can choose the default action (configure default=xxx in kdump.conf).
>>>>>>>>>
>>>>>>>>> For the default action, as opposed to the final action: if you hardcode
>>>>>>>>> it, then even if one sets default to reboot it will still power off.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If really need, that can be improved.
>>>>>>>>
>>>>>>>>> [snip]
>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +check_rd_earlykdump()
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    egrep "rd.earlykdump" /proc/cmdline
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>>  start()
>>>>>>>>>>>>  {
>>>>>>>>>>>>      check_dump_feasibility
>>>>>>>>>>>> @@ -969,7 +974,13 @@ start()
>>>>>>>>>>>>      check_current_status
>>>>>>>>>>>>      if [ $? == 0 ]; then
>>>>>>>>>>>>          echo "Kdump already running: [WARNING]"
>>>>>>>>>>>> -        return 0
>>>>>>>>>>>> +        check_rd_earlykdump
>>>>>>>>>>>> +        #if earlykdump loaded, it will stop and start.
>>>>>>>>>>>> +        if [ $? -eq 0 ]; then
>>>>>>>>>>>> +            stop
>>>>>>>>>>>
>>>>>>>>>>> kdumpctl start can be run not only by system startup services; one can
>>>>>>>>>>> also run it manually or from a udev rule.
>>>>>>>>>>>
>>>>>>>>>>> Checking the kernel cmdline does not seem to be enough.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here it means that if kdump has been loaded, check whether early kdump did it.
>>>>>>>>>> If yes, let normal kdump load it again, otherwise there is no need to do anything.
>>>>>>>>>
>>>>>>>>> As we discussed you do not get my points here :)
>>>>>>>>>
>>>>>>>>> check_rd_earlykdump will always be true once the kernel has booted, so
>>>>>>>>> there is no way to tell the first normal kdump load from later ones.
>>>>>>>>>
>>>>>>>>> The early boot time panic this patch addresses is the 100% reproducible
>>>>>>>>> panic. For this kind of panic the admin should be able to see it when
>>>>>>>>> booting the machine. So on reflection the best way may be just a wontfix.
>>>>>>>>>
>>>>>>>>> Let's consider below use cases:
>>>>>>>>>
>>>>>>>>> * First install:
>>>>>>>>>
>>>>>>>>> install os ->
>>>>>>>>> ---A
>>>>>>>>> reboot ->
>>>>>>>>> ...
>>>>>>>>> ---B
>>>>>>>>> kdump service start
>>>>>>>>> -> create kdump initrd
>>>>>>>>> ...
>>>>>>>>> boot finished
>>>>>>>>> recreate default initrd and enable early kdump
>>>>>>>>> goto A
>>>>>>>>>
>>>>>>>>> If a panic happens between A and B and it is predictable, eg. 100%
>>>>>>>>> reproducible, then the admin should already see it and can take control
>>>>>>>>> and stop the repeating crash/kdump loop.
>>>>>>>>> If the panic is not 100% reproducible then using reboot as the final
>>>>>>>>> action is just fine.
>>>>>>>>>
>>>>>>>>> * Other use cases, eg. updating the kernel or some other components:
>>>>>>>>> This is similar to the install os use case, because if one updates the
>>>>>>>>> kernel or critical components it is likely they need to regenerate the
>>>>>>>>> kdump initrd and then repack it into the early kdump default initrd. In
>>>>>>>>> this case the admin should be able to see the panic loop and handle it.
>>>>>>>>> Also if the panic is not 100% reproducible then we are just fine.
>>>>>>>>>
>>>>>>>>> If we choose to split early and late kdump load, there could be other
>>>>>>>>> side effects, and it makes the logic even more complicated.
>>>>>>>>>
>>>>>>>>> So... the better way may be to just leave it as is, and maybe add some
>>>>>>>>> documentation.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It is a good way to document the risks that may exist.
>>>>>>>>
>>>>>>>> It is just like public transportation: we all know that it has risks,
>>>>>>>> but we still choose it.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Hello Dave et al.
>>>>>>
>>>>>> Kindly check the condition and assumptions below and share your thoughts.
>>>>>
>>>>> Buland, thank you for the reply.
>>>>>
>>>>>>
>>>>>> Condition:
>>>>>>
>>>>>> [0x1] System is up and running with the kernel version X.
>>>>>> [0x2] Admin performed the kernel Y upgrade.
>>>>>> [0x3] Running kernel X crashed.
>>>>>> [0x4] Normal kdump rebooted the system and captured the kernel crash dump of the kernel X.
>>>>>> [0x5] System rebooted with the newly installed kernel Y.
>>>>>> [0x6] Let's assume that due to some unknown reason the booting kernel Y also crashed (assume that the panic is 100% reproducible).
>>>>>> [0x7] Early kdump started dumping the kernel crash dump of the booting kernel Y.
>>>>>> [0x8] Early kdump rebooted the system and got stuck in the loop.
>>>>>>
>>>>>> Assumption: 1
>>>>>>
>>>>>> What if the problematic system is in a data center and the admin is not aware of this situation?
>>>>>>
>>>>>> [0x1] The dump target will be filled with multiple copies of the kernel crash dump?
>>>>>>
>>>>>>       [1 kernel crash dump of the kernel X and 'n' kernel crash dumps of the kernel Y]
>>>>>>
>>>>>> [0x2] The system will reboot in a loop?
>>>>>
>>>>> It is hard to define. For the original kdump service without early kdump
>>>>> load, it is also possible that after one replaces a kernel, the new kernel
>>>>> panics during the boot phase just after kdump gets loaded.
>>>>>
>>>>> Admin at least should test a reboot after updating a kernel?
>>>>
>>>> Agree, but not sure if all the admins will follow this rule :)
>>>>
>>>>
>>>>>> Assumption: 2
>>>>>>
>>>>>> What if the dump target is on the local disk?
>>>>>>
>>>>>> [0x1] Admin needs to power off the system manually to retrieve the kernel crash dump of the kernel X and Y from the resume mode.
>>>>>> [0x2] Admin needs to remove the multiple copies of the kernel crash dump of the booting kernel Y.
>>>>>> [0x3] Admin might get confused while differentiating between the kernel crash dump of the kernel X and kernel Y.
>>>>>
>>>>> As for the worst case, we all admit this is a problem, and we are more than
>>>>> happy to make things easier for the admin and fix it :)
>>>>>
>>>>> As I said, we are exploring this problem to see if we have a good
>>>>> solution.
>>>>>
>>>>> But if we can assume a predictable panic can be avoided then the situation
>>>>> will be better.
>>>>
>>>> One suggestion, can we have a separate default behavior for normal kdump and early kdump?
>>>
>>> Yes if we can.
>>>
>>> Whether the kdump service starts early or late, we just use a syscall to load
>>> another kernel/initrd into pre-reserved memory.
>>>
>>> So once the kdump service has started we cannot differentiate whether the
>>> kdump kernel was loaded early or late.
>>
>> But I'm still not sure about it. As we can see, a similar issue exists
>> for normal kdump, eg. between C and D, if a reproducible panic
>> also happens every time during the late boot phase.
>>
>> So it is hard to define this as an early kdump only issue; it is just more
>> likely with early kdump.
>>
>> ---A
>> initramfs kdump load
>> switch root
>> other services startup
>> ---B
>> kdump service start
>> ---C
>> other services start up
>> ...
>> ---D
>> boot finished
>>
>> If we consider this an early kdump only/must-fix issue, thinking about
>> it we should split it into two issues:
>>
>> 1. how to determine early load and then reload during kdumpctl start
>>
>> If the "kdump" service is not active but the kdump kernel is loaded then
>> it should have been "early loaded". Then it will be something like this:
>>
>
> Thanks for your suggestion.
>
> If we execute the command systemctl restart kdump.service and then start
> kdump.service again, it won't exactly distinguish early kdump and normal kdump.
>
> I'm also looking for other solutions.
As we discussed in the meeting, I meant "is not active"; the pseudo code below
is missing a "!".
It should be good enough if there is no better way.
Previously, early kdump was called by the dracut cmdline hook, so systemd doesn't know
early kdump's status. The call trace is like this: systemd -> dracut cmdline hook -> early kdump.
Now, I plan to do some tests and let systemd call early kdump directly, like this:
systemd -> early kdump

systemd-journal.socket                      systemd-journal.socket
          |                                           |
          v                                           v
dracut-cmdline.service -> early kdump  ------>   early kdump
          |                                           |
          v                                           v
dracut-pre-udev.service                     dracut-cmdline.service
          |                                           |
          v                                           v
systemd-udevd.service                       dracut-pre-udev.service
        ......                                        |
                                                      v
                                            systemd-udevd.service
                                                    ......

So systemd can know early kdump's status, and when normal kdump starts, we can check
early kdump's status with systemctl status earlykdump.
I still need to do some tests on the above two solutions.
Thanks.
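
For reference, a minimal sketch of the first approach (kdump kernel loaded but
kdump.service not active means "early loaded"), with the "!" from the pseudo
code restored. This is only an illustration, not the kdumpctl code: the name
decide_start_action and its two arguments are made up here. On a live system
the first argument could come from /sys/kernel/kexec_crash_loaded and the
second from "systemctl is-active kdump.service".

```shell
# Hypothetical helper, not part of kdumpctl: decide what "start" should do.
# $1: "1" if a crash kernel is currently loaded, anything else otherwise
#     (e.g. the content of /sys/kernel/kexec_crash_loaded)
# $2: state of the normal kdump service
#     (e.g. the output of: systemctl is-active kdump.service)
decide_start_action()
{
    local loaded="$1" service_state="$2"

    if [ "$loaded" != "1" ]; then
        echo "load"             # nothing loaded yet: normal start
    elif [ "$service_state" != "active" ]; then
        echo "reload"           # loaded, but not by kdump.service: early load
    else
        echo "already-running"  # loaded by the normal service: warn and return
    fi
}
```

As noted above, this check alone still cannot tell the cases apart after a
manual systemctl restart, so it is a starting point rather than a full answer.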
>>
>>> kdumpctl start()
>>> if kdump is loaded:
>>> if systemd kdump.service is active
>>> # early loaded
>>> stop and continue to load again
>>> else
>>> print a warning service is already running and then
>>> return
>>> else
>>> go ahead to load and start
>>>
>>> 2. how to set the reboot action for early and late loading:
>>>
>>> use a cmdline option like rd.earlykdump.noreboot whose default value is true; if
>>> one wants the alternative he/she can add rd.earlykdump.noreboot=0
>>>
>>> Adding extra config options is also a choice but seems much more complicated:
>>> not only the final action but also the default action. For the default action,
>>> in case of a partially saved vmcore it will still occupy the whole disk. So it
>>> seems we should just replace reboot with shutdown in all cases.
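
A rough sketch of how the proposed rd.earlykdump.noreboot switch could be read.
Both the option and the function name earlykdump_final_action are only the
proposal from this thread, nothing that exists today. The command line is
passed in as an argument so the logic can be tried without rebooting; in the
kdump kernel it would be "$(cat /proc/cmdline)".

```shell
# Hypothetical: print the final action for an early-loaded kdump kernel.
# $1: kernel command line string
earlykdump_final_action()
{
    local arg noreboot=1    # proposed default: do not reboot

    # Intentionally unquoted so the command line is split on whitespace.
    for arg in $1; do
        case "$arg" in
            rd.earlykdump.noreboot=*) noreboot="${arg#*=}" ;;
        esac
    done

    if [ "$noreboot" = "0" ]; then
        echo "reboot"
    else
        echo "poweroff"
    fi
}
```

For example, earlykdump_final_action "ro rd.earlykdump rd.earlykdump.noreboot=0"
selects reboot, while a command line without the option keeps poweroff.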
>>>
>>>>
>>>>>
>>>>> Eg:
>>>>>
>>>>> In /etc/kdump.conf file.
>>>>> default reboot (Default action to be taken by normal kdump and early kdump if dumping fails)
>>>>> ndefault reboot (Default action to be taken once the dumping is successful by normal kdump)
>>>>> edefault poweroff (Default action to be taken once the dumping is successful by early kdump)
>>>>>
>>>>> Note: Create 'ndefault' & 'edefault' options identical to the 'default' option so that an admin can alter the behavior as per the requirement.
>>>>>
>>>>> [0x1] If the kernel crash dump is captured by normal kdump then 'reboot' the system (by default) or take action as per the 'ndefault' value.
>>>>> [0x2] If the kernel crash dump is captured by early kdump then 'poweroff' the system (by default) or take action as per the 'edefault' value.
>>>>>
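
To make the proposal above concrete, here is a small sketch of looking up such
options with a built-in fallback. 'ndefault' and 'edefault' are only the
suggested names from this mail and are not existing kdump.conf options;
get_kdump_option is a made-up helper, and the config path is an argument so it
can be tried on a scratch file instead of /etc/kdump.conf.

```shell
# Hypothetical lookup: print the value of option $2 from config file $1,
# or the fallback $3 when the option is not set.
get_kdump_option()
{
    local conf="$1" key="$2" fallback="$3" value

    # Use the last "key value" line; lines starting with '#' never match
    # because their first field is not exactly the key.
    value=$(awk -v k="$key" '$1 == k { v = $2 } END { print v }' "$conf")
    echo "${value:-$fallback}"
}
```

With this, early kdump would call something like
get_kdump_option /etc/kdump.conf edefault poweroff and normal kdump
get_kdump_option /etc/kdump.conf ndefault reboot.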
>>>>
>>>> We can introduce something so that early kdump service load and set
>>>> poweroff as default, but when normal kdump service start we have two
>>>> choices:
>>>>
>>>> 1. go ahead to use the early loaded setup without reload like we have in
>>>> the code now, in this case normal kdump will also use poweroff no matter
>>>> what is set in /etc/kdump.conf
>>>>
>>>> 2. reload kernel/initrd with normal setup in /etc/kdump.conf. In this
>>>> way, we need get if this is a late service startup. Because if we
>>>> blindly reload then it will affect later udev triggered kdump restart.
>>>>
>>>> For example, manually starting the service with kdumpctl start originally
>>>> just prints a msg that the service is already running. But now
>>>> "start" == "stop, then start"; this may not be a big problem but
>>>> looks odd.
>>>>
>>>> Correcting myself about the udev hotplug triggered events: it should be ok
>>>> because it will call restart, which means stop then start, so this change
>>>> will not affect it.
>>>>
>>>> BTW, the default action here should be "final action". There is a
>>>> "default" action that can be configured in kdump.conf, which is what the
>>>> kdump kernel does after vmcore saving fails; it is more like a failsafe
>>>> action. But the default "default" is also "reboot". When kdump
>>>> successfully saves a vmcore it will go to the "final action" (==reboot),
>>>> which is not configurable.
>>>>
>>>>> --
>>>>> Buland
>>>>>
>>>>>
>>>>
>>>> Thanks
>>>> Dave