On 05.03.2019 10:28, Maxime Ripard wrote:
On Sat, Mar 02, 2019 at 09:42:08AM +0100, Gerhard Wiesinger wrote:
> On 01.03.2019 10:30, Maxime Ripard wrote:
>> On Thu, Feb 28, 2019 at 08:41:53PM +0100, Gerhard Wiesinger wrote:
>>> On 28.02.2019 10:35, Maxime Ripard wrote:
>>>> On Wed, Feb 27, 2019 at 07:58:14PM +0100, Gerhard Wiesinger wrote:
>>>>> On 27.02.2019 10:20, Maxime Ripard wrote:
>>>>>> On Sun, Feb 24, 2019 at 09:04:57AM +0100, Gerhard Wiesinger wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have 3 Banana Pi R1 boards: one runs a self-compiled kernel
>>>>>>> 4.7.4-200.BPiR1.fc24.armv7hl with old Fedora 25 and is VERY STABLE; the
>>>>>>> 2 others run the latest Fedora 29, kernel 4.20.10-200.fc29.armv7hl. I
>>>>>>> tried a lot of kernels from around 4.11
>>>>>>> (kernel-4.11.10-200.fc25.armv7hl) up to 4.20.10, but all either crashed
>>>>>>> without any output on the serial console or hit kernel panics after a
>>>>>>> short period of time (minutes, hours, at most days).
>>>>>>>
>>>>>>> Latest known working and stable self-compiled kernel: kernel
>>>>>>> 4.7.4-200.BPiR1.fc24.armv7hl:
>>>>>>>
>>>>>>> https://www.wiesinger.com/opensource/fedora/kernel/BananaPi-R1/
>>>>>>>
>>>>>>> With 4.8.x the DSA b53 switch infrastructure was introduced, which
>>>>>>> didn't work (until ca8931948344c485569b04821d1f6bcebccd376b and kernel
>>>>>>> 4.18.x):
>>>>>>>
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/dri...
>>>>>>>
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/driv...
>>>>>>>
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/d...
>>>>>>>
>>>>>>> It has been fixed with kernel 4.18.x:
>>>>>>>
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/driv...
>>>>>>>
>>>>>>>
>>>>>>> So the current status is that the kernel crashes regularly, see some
>>>>>>> samples below. It is typically an "Unable to handle kernel paging
>>>>>>> request at virtual addres"
>>>>>>>
>>>>>>> Another interesting thing: a Banana Pro (which also has an Allwinner
>>>>>>> A20 in the same revision) works well running the same Fedora 29 and
>>>>>>> latest kernels (e.g. kernel 4.20.10-200.fc29.armv7hl).
>>>>>>>
>>>>>>> Since it happens on 2 different devices and with different power
>>>>>>> supplies (all with enough power, and of the same type that works well
>>>>>>> with the old, working kernel), a hardware issue is very unlikely.
>>>>>>>
>>>>>>> I guess it has something to do with virtual memory.
>>>>>>>
>>>>>>> Any ideas?
>>>>>>> [47322.960193] Unable to handle kernel paging request at virtual addres 5675d0
>>>>>> That line is a bit suspicious
>>>>>>
>>>>>> Anyway, cpufreq is known to cause this kind of error when the
>>>>>> voltage / frequency association is not correct.
>>>>>>
>>>>>> Given the stack trace and that the BananaPro doesn't have cpufreq
>>>>>> enabled, my first guess would be that it's what's happening. Could you
>>>>>> try using the performance governor and see if it's more stable?
>>>>>>
>>>>>> If it is, then using this:
>>>>>>
>>>>>> https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
>>>>>>
>>>>>> will help you find the offending voltage-frequency couple.
>>>>> To me it looks like they all have the same config regarding the cpu
>>>>> governor (Banana Pro, old stable kernel, new unstable kernels)
>>>> The Banana Pro doesn't have a regulator set up, so it will only change
>>>> the frequency, not the voltage.
>>>>
>>>>> They all have the ondemand governor set:
>>>>>
>>>>> I set on the 2 unstable "new kernel Banana Pi R1":
>>>>>
>>>>> # Set to max performance
>>>>> echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>>>>> echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
>>>> What are the results?
>>> Stable for around 1.5 days now. Normally they would have crashed within
>>> such a long uptime. So it looks like the performance governor fixes it.
>>>
>>> I guess the crashes occur because of CPU voltage and clock changes leading
>>> to invalid data (e.g. invalid RAM contents might be read, register
>>> problems, etc).
>>>
>>> Any ideas how to fix it for ondemand mode, too?
>> Run https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
>>
>>> But that doesn't explain why it works with kernel 4.7.4 without any
>>> problems.
>> My best guess would be that cpufreq wasn't enabled at that time, or
>> without voltage scaling.
>>
> Where can I see the voltage scaling parameters?
>
> In the DTS I don't see any difference between kernel 4.7.4 and 4.20.10 regarding
> voltage:
>
> dtc -I dtb -O dts -o
> /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dts
> /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dtb
This can also be due to configuration being changed, driver support, etc.
Where will the voltages for scaling then be set in detail (drivers, etc.)?
> There is another strange thing (tested with
> kernel-5.0.0-0.rc8.git1.1.fc31.armv7hl, kernel-4.19.8-300.fc29.armv7hl,
> kernel-4.20.13-200.fc29.armv7hl, kernel-4.20.10-200.fc29.armv7hl):
>
> There is ALWAYS high CPU usage of around 10% in kworker:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 18722 root 20 0 0 0 0 I 9.5 0.0 0:47.52 [kworker/1:3-events_freezable_power_]
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 776 root 20 0 0 0 0 I 8.6 0.0 0:02.77 [kworker/0:4-events]
The first one looks like it's part of the workqueue code.
Any guessed reason for that?
> Therefore CPU doesn't switch to low frequencies (see below).
You said previously that those crashes were happening when the board
was changing frequency, so I'm confused?
For the ondemand setting: due to the high load of kworker, the frequency
does not often change to lower values (but it does sometimes, and the
board also crashes regularly).
For the performance setting: the frequency is fixed (to the maximum in the
current configuration) and the board is stable.
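How often the ondemand governor actually leaves the top frequency can be
checked from the cpufreq statistics. A minimal sketch, assuming the kernel
was built with cpufreq stats so that stats/time_in_state exists (one line
per frequency: kHz, then the time spent there); freq_share is a
hypothetical helper name and the default path is an assumption:

```shell
# Hypothetical helper: print the share of time spent at each frequency.
freq_share() {
    # $1: path to a time_in_state file (default location is an assumption)
    awk '{ t[$1] = $2; total += $2 }
         END {
             for (f in t)
                 printf "%d kHz %d%%\n", f, (total ? int(100 * t[f] / total) : 0)
         }' "${1:-/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state}" |
        sort -n    # awk iterates in arbitrary order, so sort by frequency
}
```

Calling freq_share without an argument reads the assumed cpu0 location; the
percentages are truncated to whole numbers.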
> Any ideas?
Run the cpustress program I told you to use already twice.
Had no time to try it yet. Will do. See also my comment below regarding
idle CPU and high CPU.
> BTW: Still stable at about 2.5 days on both devices. So the solution IS the
> performance governor.
No, the performance governor prevents any change in frequency. My
guess is that a lower frequency operating point is not working and is
crashing the CPU.
Yes, there might be at least 2 scenarios:
1.) Frequency switching itself is the problem
2.) Lower frequency/voltage operating points are not stable.
For both scenarios: it might be possible that the crash happens on idle
CPU, high CPU load or just randomly. Therefore just "waiting" might be
better than 100% CPU utilization. But I will also test 100% CPU.
Therefore it would be good to see where the voltages for different
frequencies for the SoC are defined (to compare).
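One place to look might be the CPU node of the live device tree. A minimal
sketch, assuming the board's DT still uses the legacy "operating-points"
property (big-endian 32-bit cells in <kHz uV> pairs) and that it is exposed
under /proc/device-tree/cpus/cpu@0/ (the path is an assumption); decode_opp
is a hypothetical helper name:

```shell
# Hypothetical helper: decode a raw "operating-points" property into
# "<kHz> kHz <uV> uV" lines, one per operating point.
decode_opp() {
    # $1: path to the raw property file
    od -An -v -tu1 "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                val = val * 256 + $i          # accumulate big-endian bytes
                n++
                if (n % 4 == 0) {             # one 32-bit cell is complete
                    if (n % 8 == 0)
                        printf "%d kHz %d uV\n", freq, val
                    else
                        freq = val            # first cell of a pair: frequency
                    val = 0
                }
            }
        }'
}
```

Running decode_opp /proc/device-tree/cpus/cpu@0/operating-points on the old
and the new kernel would then give two lists that can be compared with diff.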
I'm currently testing 2 different settings on the 2 new Banana Pi R1
with newest kernel (see below), so 2 static frequencies:
# Set to specific frequency 144000 (currently testing on Banana Pi R1 #1)
# Set to specific frequency 312000 (currently testing on Banana Pi R1 #2)
If that's fine I'll test also further frequencies (with different loads).
Thanks.
Ciao,
Gerhard
# Set to max performance (stable)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
# Set to ondemand (not stable)
echo "ondemand" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "ondemand" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
# Set to specific frequency 144000 (currently testing on Banana Pi R1 #1)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
# Set to specific frequency 312000 (currently testing on Banana Pi R1 #2)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "312000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "312000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "312000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "312000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
# Set to specific frequency 528000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "528000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "528000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "528000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "528000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
# Set to specific frequency 720000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "720000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "720000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "720000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "720000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
# Set to specific frequency 864000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "864000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "864000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "864000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "864000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
# Set to specific frequency 912000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "912000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "912000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "912000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "912000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
# Set to specific frequency 960000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
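The per-frequency blocks above could be folded into one function. A minimal
sketch under stated assumptions: pin_freq and CPU_ROOT are hypothetical
names (CPU_ROOT only exists so the function can be pointed at a test tree),
and the write order is chosen on the assumption that the kernel may reject
or clamp a write that would put scaling_min_freq above scaling_max_freq,
which is worth verifying on the target kernel:

```shell
# Assumed default; override CPU_ROOT to run against a fake sysfs tree.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Hypothetical helper: pin every CPU to one fixed frequency (in kHz).
pin_freq() {
    target="$1"
    for dir in "$CPU_ROOT"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        echo performance > "$dir/scaling_governor"
        cur_max=$(cat "$dir/scaling_max_freq")
        if [ "$target" -gt "$cur_max" ]; then
            # Raising: widen the upper limit first so min never exceeds max.
            echo "$target" > "$dir/scaling_max_freq"
            echo "$target" > "$dir/scaling_min_freq"
        else
            # Lowering or equal: drop the lower limit first.
            echo "$target" > "$dir/scaling_min_freq"
            echo "$target" > "$dir/scaling_max_freq"
        fi
    done
}
```

For example, pin_freq 312000 would correspond to the "specific frequency
312000" block, applied to every CPU that has a cpufreq directory.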