On Thu, Feb 28, 2019 at 08:41:53PM +0100, Gerhard Wiesinger wrote:
> On 28.02.2019 10:35, Maxime Ripard wrote:
>> On Wed, Feb 27, 2019 at 07:58:14PM +0100, Gerhard Wiesinger wrote:
>>> On 27.02.2019 10:20, Maxime Ripard wrote:
>>>> On Sun, Feb 24, 2019 at 09:04:57AM +0100, Gerhard Wiesinger wrote:
>>>>> Hello,
>>>>>
>>>>> I have 3 Banana Pi R1 boards: one runs a self-compiled kernel
>>>>> 4.7.4-200.BPiR1.fc24.armv7hl on an old Fedora 25 and is VERY STABLE; the
>>>>> 2 others run the latest Fedora 29 with kernel 4.20.10-200.fc29.armv7hl.
>>>>> I tried many kernels from around 4.11 (kernel-4.11.10-200.fc25.armv7hl)
>>>>> up to 4.20.10, but all of them either crashed without any output on the
>>>>> serial console or hit kernel panics after a short period of time
>>>>> (minutes, hours, at most days).
>>>>>
>>>>> Latest known working and stable self compiled kernel: kernel
>>>>> 4.7.4-200.BPiR1.fc24.armv7hl:
>>>>>
>>>>> https://www.wiesinger.com/opensource/fedora/kernel/BananaPi-R1/
>>>>>
>>>>> With 4.8.x the DSA b53 switch infrastructure was introduced, which
>>>>> didn't work (until ca8931948344c485569b04821d1f6bcebccd376b and kernel
>>>>> 4.18.x):
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/dri...
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/driv...
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/d...
>>>>>
>>>>> It has been fixed with kernel 4.18.x:
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/driv...
>>>>>
>>>>>
>>>>> So the current status is that the kernel crashes regularly; see some
>>>>> samples below. It is typically an "Unable to handle kernel paging
>>>>> request at virtual address" error.
>>>>>
>>>>> Another interesting thing: a Banana Pro (which also has an Allwinner
>>>>> A20 in the same revision) works well running the same Fedora 29 and
>>>>> the latest kernels (e.g. kernel 4.20.10-200.fc29.armv7hl).
>>>>>
>>>>> Since it happens on 2 different devices and with different power
>>>>> supplies (all with enough power), and since the same type of device
>>>>> works well on the old kernel, a hardware issue is very unlikely.
>>>>>
>>>>> I guess it has something to do with virtual memory.
>>>>>
>>>>> Any ideas?
>>>>> [47322.960193] Unable to handle kernel paging request at virtual addres 5675d0
>>>> That line is a bit suspicious
>>>>
>>>> Anyway, cpufreq is known to cause this kind of error when the
>>>> voltage / frequency association is not correct.
>>>>
>>>> Given the stack trace and that the BananaPro doesn't have cpufreq
>>>> enabled, my first guess would be that it's what's happening. Could you
>>>> try using the performance governor and see if it's more stable?
>>>>
>>>> If it is, then using this:
>>>>
>>>> https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
>>>>
>>>> will help you find the offending voltage-frequency couple.
>>> To me it looks like they all have the same CPU governor config
>>> (Banana Pro, old kernel stable one, new kernel unstable ones)
>> The Banana Pro doesn't have a regulator set up, so it will only change
>> the frequency, not the voltage.
>>
>>> They all have the ondemand governor set:
>>>
>>> I set on the 2 unstable "new kernel Banana Pi R1":
>>>
>>> # Set to max performance
>>> echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>>> echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
>> What are the results?
> Stable for more than 1.5 days now. Normally they would already have
> crashed within such a long uptime. So it looks like the performance
> governor fixes it.
>
> I guess the crashes occur because of CPU voltage and clock changes that
> lead to invalid data (e.g. invalid RAM contents being read, register
> problems, etc.).
>
> Any ideas how to fix it for ondemand mode, too?

Run https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test

> But it doesn't explain why it works with kernel 4.7.4 without any
> problems.

My best guess would be that cpufreq wasn't enabled at that time, or
was enabled without voltage scaling.

Where can I see the voltage scaling parameters?

In the DTS I don't see any difference between kernel 4.7.4 and 4.20.10
regarding voltage:

dtc -I dtb -O dts \
    -o /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dts \
    /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dtb
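
A minimal sketch for pulling the frequency/voltage table out of the
decompiled file so that two kernels can be compared line by line. It
assumes the cpu node uses the older "operating-points" property (pairs of
kHz and microvolt cells, which dtc prints as hex); if the DTS uses
"operating-points-v2" instead, it prints nothing. The helper name
print_opps is hypothetical.

```shell
# Hypothetical helper: list the <kHz uV> pairs of the first
# "operating-points" property found in a decompiled DTS file.
print_opps() {
    local dts_file=$1 freq volt
    grep -m1 'operating-points = <' "$dts_file" |
        sed -e 's/.*<//' -e 's/>.*//' |      # keep only the cell list
        tr -s '[:space:]' '\n' |             # one cell per line
        grep -v '^$' |
        paste - - |                          # re-join cells into pairs
        while read -r freq volt; do
            # dtc emits hex cells (0x...); $(( )) converts them to decimal
            printf '%d kHz  %d uV\n' "$((freq))" "$((volt))"
        done
}
```

Usage would be something like
"print_opps /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dts",
run once per kernel's DTS, followed by a diff of the two outputs.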

There is another strange thing (tested with
kernel-5.0.0-0.rc8.git1.1.fc31.armv7hl, kernel-4.19.8-300.fc29.armv7hl,
kernel-4.20.13-200.fc29.armv7hl, kernel-4.20.10-200.fc29.armv7hl):
a kworker ALWAYS shows high CPU usage of around 10%:

  PID USER  PR  NI  VIRT  RES  SHR S  %CPU  %MEM    TIME+ COMMAND
18722 root  20   0     0    0    0 I   9.5   0.0  0:47.52 [kworker/1:3-events_freezable_power_]

  PID USER  PR  NI  VIRT  RES  SHR S  %CPU  %MEM    TIME+ COMMAND
  776 root  20   0     0    0    0 I   8.6   0.0  0:02.77 [kworker/0:4-events]

Therefore the CPU doesn't switch to low frequencies (see below).

Any ideas?
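
To quantify how much time a core really spends at each frequency, rather
than sampling it once per second, the cpufreq statistics could be
summarised. This is a minimal sketch, assuming the kernel was built with
CONFIG_CPU_FREQ_STAT so that stats/time_in_state exists (one line per
frequency in kHz, followed by the time spent there). The function name
freq_residency is hypothetical, and only percentages are computed, so the
time unit does not matter.

```shell
# Hypothetical helper: percentage of time spent at each frequency,
# read from a cpufreq time_in_state file (defaults to cpu0).
freq_residency() {
    local stats_file=${1:-/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state}
    LC_ALL=C awk '
        { ticks[$1] = $2; total += $2 }
        END {
            for (f in ticks)
                printf "%d kHz %.1f%%\n", f, (total ? 100 * ticks[f] / total : 0)
        }' "$stats_file" | sort -n           # lowest frequency first
}
```

Calling freq_residency without arguments reports cpu0; passing the
time_in_state path of another CPU reports that one.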

BTW: Still stable at about 2.5 days on both devices. So the solution IS
the performance governor.
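
The two echo commands quoted earlier could be generalised to every CPU
that exposes cpufreq. A minimal sketch; the function name set_governor and
its optional second argument (an alternative root directory, useful only
for a dry run against a copy of the tree) are hypothetical. It has to run
as root, and the setting is not kept across a reboot.

```shell
# Hypothetical helper: write one governor to every CPU that exposes cpufreq.
set_governor() {
    local governor=$1 root=${2:-/sys/devices/system/cpu} file
    for file in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$file" ] || continue   # glob did not match: no cpufreq here
        echo "$governor" > "$file"
    done
}
```

For example, "set_governor performance" applies the workaround, and
"set_governor ondemand" switches back for further testing.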
Ciao,
Gerhard
================================================================================================================================================================
# monitor frequency
while true; do
    echo "========================================"
    echo -n "CPU_FREQ0: "; cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
    echo -n "CPU_FREQ1: "; cat /sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_cur_freq
    sleep 1
done
================================================================================================================================================================
# Kernel 4.7.4:
========================================
CPU_FREQ0: 144000
CPU_FREQ1: 144000
========================================
CPU_FREQ0: 144000
CPU_FREQ1: 144000
========================================
CPU_FREQ0: 144000
CPU_FREQ1: 144000
========================================
# Kernel 4.20.10
========================================
CPU_FREQ0: 864000
CPU_FREQ1: 720000
========================================
CPU_FREQ0: 960000
CPU_FREQ1: 960000
========================================
CPU_FREQ0: 960000
CPU_FREQ1: 960000
========================================
CPU_FREQ0: 144000
CPU_FREQ1: 144000
========================================
CPU_FREQ0: 720000
CPU_FREQ1: 960000
========================================
CPU_FREQ0: 960000
CPU_FREQ1: 864000
========================================
CPU_FREQ0: 720000
CPU_FREQ1: 864000
========================================
CPU_FREQ0: 528000
CPU_FREQ1: 864000