Hi,
Starting here: http://fedoraproject.org/wiki/KernelRegressionTestGuidelines I grab the current test from git and run as root 'sh runtests.sh -t stress' and I experience the following, each of which is confusing so I don't know if it's expected behavior, or a bug, or what to do with this information if anything. The kernel is 4.4.0-1.fc24.x86_64, on otherwise updated Fedora 23 systems (an old Mac and a new NUC).
1. One system, dropcaches fails non-deterministically. I can't tell what the pattern is. When it fails the log reports:
Starting test ./default/cachedrop
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.0516614 s, 2.0 GB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.00891013 s, 1.2 GB/s
TestError: Can't free dentries and inodes and pagecache
484736 484724
Could not run tests
2. The sysfs-perms test always fails on both systems:
Starting test ./default/sysfs-perms
Found world-writable files in sysfs.
./runtest.sh: line 9: ignore-files.sh: command not found
3. There are quite a few SELinux AVC denials during the selinux DAC test, but at about the same time I also see these segfaults. Are they expected?
[128460.313903] anonmap[12806]: segfault at 7fabf1da4000 ip 00007fabf1da4000 sp 00007fff3ab17538 error 15
[128460.936435] execbss[12811]: segfault at 6020b0 ip 00000000006020b0 sp 00007fffaeade218 error 15 in execbss[602000+1000]
[128461.135687] execdata[12819]: segfault at 6020a0 ip 00000000006020a0 sp 00007ffecc006378 error 15 in execdata[602000+1000]
[128461.309108] execheap[12827]: segfault at eea130 ip 0000000000eea130 sp 00007ffe8f2f9ae8 error 15
[128461.502195] execstack[12835]: segfault at 7fff102b3810 ip 00007fff102b3810 sp 00007fff102b3808 error 15
[128461.701582] shlibbss[12840]: segfault at 7f0b168a4060 ip 00007f0b168a4060 sp 00007ffc0c983188 error 15 in shlibtest2.so[7f0b168a3000+2000]
[128461.903294] shlibdata[12846]: segfault at 7f3c0ae25040 ip 00007f3c0ae25040 sp 00007ffd7752ee38 error 15 in shlibtest2.so[7f3c0ae25000+2000]
[128462.192413] mprotheap[12862]: segfault at 2555130 ip 0000000002555130 sp 00007ffd64752c88 error 15
4. The stress test log file on both systems is 95M and the upload page won't accept that, so I take it only the minimal and default tests are uploadable and the stress test isn't something that's really interesting to the kernel team? The browser message when trying to upload is: Request Entity Too Large The data value transmitted exceeds the capacity limit. Even if it's bzip'd, it's 3+MB and that's still too big for the web interface.
5. fedora_submit.py fails:
[chris@f23m kernel-tests]$ python fedora_submit.py -u chrismurphy -p pw -l logs/kernel-test-1452835548.log.txt
Traceback (most recent call last):
  File "fedora_submit.py", line 45, in <module>
    password=password
  File "/usr/lib/python2.7/site-packages/fedora/client/openidbaseclient.py", line 283, in login
    openid_insecure=self.openid_insecure)
  File "/usr/lib/python2.7/site-packages/fedora/client/openidproxyclient.py", line 138, in openid_login
    raise AuthError(output['message'])
fedora.client.AuthError: Authentication failed
No idea what to do with that. Uploading this file (default sized) through the web interface does work.
6. Both systems skip the module signing test.
Starting test ./default/modsign
Module signing not enabled
Could not run tests
This makes sense on the EFI system that doesn't support secure boot, but the other one does.
$ mokutil --sb-state
SecureBoot enabled
So... why is module signing not enabled?
7. With the -t stress option, testing hadn't completed after 15+ hours on either system, and both had become unresponsive to local and remote login, so I ended up hard resetting them. How long should it run, and are there some tests where it's expected that the system is unresponsive for more than an hour at a time?
For one machine, the kernel-test log modification time was ~ 6 hours older than the time of the hard reset, so the system may have just locked up. Both journals are unrevealing, they lack any entries for those last 6 or more hours (I'm somewhat regularly hitting bug 1295612, so I've started running rsyslog as of today to see if it'll write out what's either not written to the journal or is getting corrupted and can't be viewed by journalctl). The last 10 lines of one kernel-test log look like this:
ipcrm -m 720896
ipcrm -m 229377
ipcrm -m 917510
ipcrm -m 720896
ipcrm -m 229377
ipcrm -m 917510
ipc_str complete
ipcrm -m 720896
ipcrm -m 229377
ipcrm -m 917510
Thanks,
Chris Murphy
On Fri, Jan 15, 2016 at 1:52 PM, Chris Murphy chrismurphy@fedoraproject.org wrote:
Hi,
Starting here: http://fedoraproject.org/wiki/KernelRegressionTestGuidelines I grab the current test from git and run as root 'sh runtests.sh -t stress' and I experience the following, each of which is confusing so I don't know if it's expected behavior, or a bug, or what to do with this information if anything. The kernel is 4.4.0-1.fc24.x86_64, on otherwise updated Fedora 23 systems (an old Mac and a new NUC).
- One system, dropcaches fails non-deterministically. I can't tell what the pattern is. When it fails the log reports:
Starting test ./default/cachedrop
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.0516614 s, 2.0 GB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.00891013 s, 1.2 GB/s
TestError: Can't free dentries and inodes and pagecache
484736 484724
Could not run tests
This one fails in an interesting way: it seems to be a race condition that we have not tracked down yet. We used to see it frequently on the autotest guests, but it has gone away there, and I don't remember the last time I saw it. I know we increased the memory on the guests. I will see if I can track it down.
- The sysfs-perms test always fails on both systems:
Starting test ./default/sysfs-perms
Found world-writable files in sysfs.
./runtest.sh: line 9: ignore-files.sh: command not found
Do you have an ignore-files.sh in the default/sysfs-perms/ subdir of the checkout? It should be there.
- There are quite a few SELinux AVC denials during the selinux DAC test, but at about the same time I also see these segfaults. Are they expected?
[128460.313903] anonmap[12806]: segfault at 7fabf1da4000 ip 00007fabf1da4000 sp 00007fff3ab17538 error 15
[128460.936435] execbss[12811]: segfault at 6020b0 ip 00000000006020b0 sp 00007fffaeade218 error 15 in execbss[602000+1000]
[128461.135687] execdata[12819]: segfault at 6020a0 ip 00000000006020a0 sp 00007ffecc006378 error 15 in execdata[602000+1000]
[128461.309108] execheap[12827]: segfault at eea130 ip 0000000000eea130 sp 00007ffe8f2f9ae8 error 15
[128461.502195] execstack[12835]: segfault at 7fff102b3810 ip 00007fff102b3810 sp 00007fff102b3808 error 15
[128461.701582] shlibbss[12840]: segfault at 7f0b168a4060 ip 00007f0b168a4060 sp 00007ffc0c983188 error 15 in shlibtest2.so[7f0b168a3000+2000]
[128461.903294] shlibdata[12846]: segfault at 7f3c0ae25040 ip 00007f3c0ae25040 sp 00007ffd7752ee38 error 15 in shlibtest2.so[7f3c0ae25000+2000]
[128462.192413] mprotheap[12862]: segfault at 2555130 ip 0000000002555130 sp 00007ffd64752c88 error 15
Expected and correct, this means the test is working.
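For anyone wondering what "error 15" in those lines stands for, here is a rough sketch of decoding it. It assumes the value is the x86 page-fault error code printed in hexadecimal, so 15 means 0x15. The function name decode_pf_error is made up for this illustration and is not part of the kernel-tests checkout.

```shell
#!/bin/sh
# Decode an x86 page-fault error code as seen in "segfault at ... error NN".
# Assumption: the kernel prints the code in hex, so "15" means 0x15.
# decode_pf_error is a hypothetical helper, not part of kernel-tests.
decode_pf_error() {
    code=$((0x$1))
    # Bit 0: set = protection violation, clear = page not present.
    if [ $((code & 1)) -ne 0 ]; then present="protection violation"; else present="page not present"; fi
    # Bit 1: set = write access, clear = read access.
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    # Bit 2: set = fault happened in user mode.
    if [ $((code & 4)) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    # Bit 4: set = the access was an instruction fetch.
    if [ $((code & 16)) -ne 0 ]; then access="instruction fetch"; fi
    echo "$present, $access, $mode"
}

decode_pf_error 15
```

Under that assumption the code decodes to a user-mode instruction fetch from a mapped but non-executable page. That fits the log above, where the faulting address equals the ip value in every line.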
- The stress test log file on both systems is 95M and the upload page won't accept that, so I take it only the minimal and default tests are uploadable and the stress test isn't something that's really interesting to the kernel team? The browser message when trying to upload is: Request Entity Too Large The data value transmitted exceeds the capacity limit. Even if it's bzip'd, it's 3+MB and that's still too big for the web interface.
The upload log system is not meant to accept stress logs. The stress test never actually ends, and these logs can get huge. It doesn't mean we are not interested in your results if you see something change or wrong here, but we have to keep the web interface manageable, which means pretty much default.
- fedora_submit.py fails:
[chris@f23m kernel-tests]$ python fedora_submit.py -u chrismurphy -p pw -l logs/kernel-test-1452835548.log.txt
Traceback (most recent call last):
  File "fedora_submit.py", line 45, in <module>
    password=password
  File "/usr/lib/python2.7/site-packages/fedora/client/openidbaseclient.py", line 283, in login
    openid_insecure=self.openid_insecure)
  File "/usr/lib/python2.7/site-packages/fedora/client/openidproxyclient.py", line 138, in openid_login
    raise AuthError(output['message'])
fedora.client.AuthError: Authentication failed
No idea what to do with that. Uploading this file (default sized) through the web interface does work.
This seems to be a problem with the openidproxyclient that has changed recently. I will look into this next week.
- Both systems skip the module signing test.
Starting test ./default/modsign
Module signing not enabled
Could not run tests
This makes sense on the EFI system that doesn't support secure boot, but the other one does.
$ mokutil --sb-state
SecureBoot enabled
So... why is module signing not enabled?
This is not actually dependent on secureboot at all. It should be fixed now (git pull). Thanks for the report!
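Since module signing is a kernel build option rather than a firmware setting, one way to check it by hand is to look at the kernel config. This is only a sketch: it assumes the config file is installed at /boot/config-$(uname -r), and it is not necessarily the check the modsign test itself performs. The helper name module_sig_enabled is invented here.

```shell
#!/bin/sh
# module_sig_enabled CONFIG_FILE
# Succeeds if the given kernel config has CONFIG_MODULE_SIG=y.
# Hypothetical helper, not taken from the kernel-tests repository.
module_sig_enabled() {
    grep -q '^CONFIG_MODULE_SIG=y$' "$1"
}

# Assumed location of the running kernel's config file.
config="/boot/config-$(uname -r)"

if [ -r "$config" ] && module_sig_enabled "$config"; then
    echo "CONFIG_MODULE_SIG=y in $config"
else
    echo "module signing not enabled, or $config is not readable"
fi
```

The result is independent of what mokutil --sb-state reports, which matches the point that the test should not depend on Secure Boot.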
- The -t stress option, testing hasn't completed after 15+ hours on both systems, and both were unresponsive to local and remote login so I ended up hard resetting them. How long should it run and are there some tests where it's expected the system is unresponsive for more than an hour at a time?
For one machine, the kernel-test log modification time was ~ 6 hours older than the time of the hard reset, so the system may have just locked up. Both journals are unrevealing, they lack any entries for those last 6 or more hours (I'm somewhat regularly hitting bug 1295612, so I've started running rsyslog as of today to see if it'll write out what's either not written to the journal or is getting corrupted and can't be viewed by journalctl). The last 10 lines of one kernel-test log look like this:
ipcrm -m 720896
ipcrm -m 229377
ipcrm -m 917510
ipcrm -m 720896
ipcrm -m 229377
ipcrm -m 917510
ipc_str complete
ipcrm -m 720896
ipcrm -m 229377
ipcrm -m 917510
The stress test is a stress test, but it shouldn't actually crash the machine; it should just run forever. If it is crashing, it would be good to find out why.
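Because the stress run is open-ended, it can help to bound it so there is a defined point at which to look at the logs. A minimal sketch using timeout from GNU coreutils follows. The two-hour limit and the run_bounded wrapper name are arbitrary choices for illustration, not something the test suite provides, and this will not help if the kernel itself locks up.

```shell
#!/bin/sh
# run_bounded LIMIT CMD...
# Run CMD and stop it once LIMIT has elapsed (LIMIT uses timeout's syntax,
# e.g. 30s, 10m, 2h). Hypothetical wrapper, not part of kernel-tests.
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with 124 when the time limit was reached.
    if [ "$status" -eq 124 ]; then
        echo "stopped after $limit (time limit reached)"
    else
        echo "exited on its own with status $status"
    fi
    return "$status"
}

# Intended use, as root from the kernel-tests checkout:
#   run_bounded 2h sh runtests.sh -t stress
```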
I appreciate your feedback, it is responsible for getting the modsign test fixed, and will remind me to look into the openidproxyclient bit next week. Please let me know if you run across anything else.
Justin
Do you have an ignore-files.sh in the default/sysfs-perms/ subdir of the checkout? It should be there.
Yes. Is the line malformed? It looks different than line 5 which works.
5 COUNT=$(find /sys -type f -perm 666 | ./ignore-files.sh | wc -l)
9 find /sys -type f -perm 666 | ignore-files.sh
I'm going to guess the script is looking in kernel-tests for ignore-files.sh because of this, rather than in the same directory as runtest.sh. If I run the command manually:
[root@f23s ~]# find /sys -type f -perm 666 | /home/chris/kernel-tests/default/sysfs-perms/ignore-files.sh
/sys/kernel/debug/btrfs/test
So it looks like btrfs/test might need to be added to ignore-files.sh and line 9 should be ./ignore-files.sh (?)
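A likely explanation for the difference between the two lines: a command name without a slash is looked up only in the directories listed in $PATH, while ./name is resolved relative to the current directory. The sketch below reproduces that in a scratch directory. The filter script is a stand-in written for this example, not the real ignore-files.sh, and it assumes no ignore-files.sh is reachable through $PATH.

```shell
#!/bin/sh
# Reproduce "command not found" for a bare script name in a pipeline.
startdir=$(pwd)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in filter (hypothetical): drops one known-good path from stdin.
cat > ignore-files.sh <<'EOF'
#!/bin/sh
grep -v '/sys/known/ok'
EOF
chmod +x ignore-files.sh

# Bare name: the shell searches $PATH only, so the lookup fails
# (exit status 127) unless the current directory is in $PATH.
bare_status=0
printf '/sys/known/ok\n/sys/other\n' | ignore-files.sh 2>/dev/null || bare_status=$?

# Explicit relative path: resolved against the current directory.
rel_output=$(printf '/sys/known/ok\n/sys/other\n' | ./ignore-files.sh)

echo "bare name exit status: $bare_status"
echo "relative path output: $rel_output"

cd "$startdir" || exit 1
rm -rf "$workdir"
```

Since line 5 of runtest.sh already uses ./ignore-files.sh, using the same form on line 9 would keep the two lookups consistent.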
The stress test is a stress test, but it shouldn't actually crash the machine; it should just run forever. If it is crashing, it would be good to find out why.
A clue may get written into /var/log/messages. I'll let it run overnight and check in the afternoon tomorrow, right now it's cooperating rather well.
I appreciate your feedback, it is responsible for getting the modsign test fixed, and will remind me to look into the openidproxyclient bit next week. Please let me know if you run across anything else.
Sure thing.
Chris Murphy
Yes. Is the line malformed? It looks different than line 5 which works.
5 COUNT=$(find /sys -type f -perm 666 | ./ignore-files.sh | wc -l)
9 find /sys -type f -perm 666 | ignore-files.sh
I'm going to guess the script is looking in kernel-tests for ignore-files.sh because of this, rather than in the same directory as runtest.sh. If I run the command manually:
[root@f23s ~]# find /sys -type f -perm 666 | /home/chris/kernel-tests/default/sysfs-perms/ignore-files.sh
/sys/kernel/debug/btrfs/test
So it looks like btrfs/test might need to be added to ignore-files.sh and line 9 should be ./ignore-files.sh (?)
I tried this change on a 2nd system:
- find /sys -type f -perm 666 | ignore-files.s
+ find /sys -type f -perm 666 | ./ignore-files.s
And I no longer get the './runtest.sh: line 9: ignore-files.sh: command not found' message. But I do get an extra result compared to the 1st system:
[root@f23m sysfs-perms]# sh runtest.sh
Found world-writable files in sysfs.
/sys/kernel/debug/ieee80211/phy0/rc/fixed_rate_idx
/sys/kernel/debug/btrfs/test
Chris Murphy
Interesting, I will look into both of those on Monday.
Thanks, Justin