On Thu, Jul 5, 2018 at 12:56 PM, Jeff Backus <jeff.backus@gmail.com> wrote:
Hi Adam,
Thanks for your response.
On Wed, Jul 4, 2018 at 4:33 PM, Adam Williamson <adamwill@fedoraproject.org> wrote:
On Thu, 2018-06-28 at 13:22 -0400, Jeff Backus wrote:
Hi folks,
I'm going to take a whack at triaging x86-related bugs. I see a few listed on the x86 tracker bug and the x86 exclude arch bug, which I'll take a closer look at.
Before I start searching for anything i686-related, are there any issues you'd like us to take a look at first?
Rawhide's been completely broken on x86 since...about 20180611.n.0, definitely 20180627.n.0. Installer images don't seem to make it out of dracut. I haven't had time to look into the problem in any detail, but all the openQA tests are failing, all the time.
Yeah, based on a cursory look at the logs, it looks like the 6/10 image started failing with some kind of core dump during initial boot, affecting at least the Workstation image, and by 6/27 all i686 images were affected. I'm going to see if I can narrow it down.
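Narrowing it down amounts to a bisection over the nightly composes. Below is a minimal C sketch. It assumes the composes are sorted oldest to newest, that the first is known good and the last known bad, and that once a compose breaks, every later one stays broken. `first_bad_compose` and the demo predicate `example_is_bad` are hypothetical names. In practice "is bad" means booting the image by hand or checking its openQA result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the compose is broken (e.g. its installer image fails to
 * boot). Hypothetical: in practice this is a manual boot or an openQA result. */
typedef bool (*compose_is_bad_fn)(const char *compose_id);

/* composes[] is sorted oldest to newest; composes[0] must be good and
 * composes[n - 1] bad. Returns the index of the first bad compose while
 * only testing O(log n) of them. */
static size_t first_bad_compose(const char *const *composes, size_t n,
                                compose_is_bad_fn is_bad)
{
        assert(n >= 2);
        size_t good = 0, bad = n - 1; /* invariant: composes[good] ok, composes[bad] broken */

        while (bad - good > 1) {
                size_t mid = good + (bad - good) / 2;

                if (is_bad(composes[mid]))
                        bad = mid;
                else
                        good = mid;
        }
        return bad;
}

/* Demo predicate (hypothetical): treats every compose from 20180611.n.0 on as
 * broken. IDs of the form YYYYMMDD.n.N sort correctly with strcmp() as long
 * as the final number stays single-digit. */
static bool example_is_bad(const char *compose_id)
{
        return strcmp(compose_id, "20180611.n.0") >= 0;
}
```

Each probe halves the remaining range, so only a handful of images need to be booted even across several weeks of nightlies.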
jeff
On Thu, Jul 05, 2018 at 01:55:46PM -0400, Jeff Backus wrote:
I was able to get to a dracut prompt with the 6/30 image. It looks like the udev Kernel Device Manager is trying to core dump? I'm seeing this message several times in the log after it tries to start systemd-udevd.service:

    systemd-coredump[1663]: Failed to connect to coredump service: No such file or directory
Interestingly, I am seeing the following right after the attempt to start systemd-udevd:

    Assertion 'clock_gettime(map_clock_id(clock_id), &ts) == 0' failed at ../src/basic/time-util.c:53, function now(). Aborting.
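One way to check whether the kernel itself rejects one of the clocks is to call clock_gettime() directly, outside of systemd. Below is a minimal C sketch. It assumes now() is called with CLOCK_REALTIME, CLOCK_MONOTONIC or CLOCK_BOOTTIME, and that map_clock_id() maps the *_ALARM variants onto those. probe_clock() and probe_all_clocks() are hypothetical helpers, not systemd functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* A clock to probe, with a printable name. */
struct clock_entry {
        clockid_t id;
        const char *name;
};

/* Assumed to cover the clocks now() is called with; the *_ALARM variants are
 * assumed to be mapped onto CLOCK_REALTIME/CLOCK_BOOTTIME by map_clock_id(). */
static const struct clock_entry probed_clocks[] = {
        { CLOCK_REALTIME,  "CLOCK_REALTIME"  },
        { CLOCK_MONOTONIC, "CLOCK_MONOTONIC" },
        { CLOCK_BOOTTIME,  "CLOCK_BOOTTIME"  },
};

/* Returns 0 if clock_gettime() succeeds for id, otherwise the errno it set. */
static int probe_clock(clockid_t id)
{
        struct timespec ts;

        if (clock_gettime(id, &ts) == 0)
                return 0;
        return errno;
}

/* Prints one status line per clock and returns how many of them failed. */
static int probe_all_clocks(void)
{
        int failures = 0;

        for (size_t i = 0; i < sizeof(probed_clocks) / sizeof(probed_clocks[0]); i++) {
                int err = probe_clock(probed_clocks[i].id);

                if (err == 0) {
                        printf("%-16s ok\n", probed_clocks[i].name);
                } else {
                        printf("%-16s FAILED: %s\n", probed_clocks[i].name, strerror(err));
                        failures++;
                }
        }
        return failures;
}
```

Called from a small main() and built as a 32-bit binary (e.g. with gcc -m32) on the affected kernel, a FAILED line would point at the clock behind the assertion. An all-ok run would suggest looking instead at how systemd makes the call.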
I'll try to get the log off of the machine. Thoughts or suggestions?
Thanks! jeff
On Thu, Jul 5, 2018 at 5:07 PM, Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> wrote:
Might be a problem in the systemd code that selects the clock type (monotonic/realtime/etc.). You could try taking an older image, compiling systemd, and running the tests. Even something as simple as:

    sudo dnf build-dep systemd
    git clone https://github.com/systemd/systemd
    cd systemd
    meson -Dman=false build && ninja -C build && ninja -C build test
If that passes, then the next step would be to take that older image, install just the newer kernel, and repeat the tests.
Zbyszek
On Thu, Jul 5, 2018 at 5:47 PM, Jeff Backus <jeff.backus@gmail.com> wrote:
Thanks! I'll give it a whirl tomorrow.
jeff
On Fri, Jul 6, 2018 at 6:32 PM, Jeff Backus <jeff.backus@gmail.com> wrote:
Here's a quick update, since I may not be able to get back to this until early next week. I was able to build the version of systemd in the master branch of the GH repo (commit 8255430). All tests that were executed pass under kernel 4.16.16-300. About 10 tests were skipped. The following skipped tests seem to be of particular interest:
- test-udev
- test-boot-timestamps
As a sanity check, I inserted an "assert_se(1==0)" right after the line that triggered the assertion when booting from the installer, and numerous tests failed. So that code is at least being exercised by the tests.
I then installed the latest kernel available for F28, 4.17.3-200, and the test results were the same. I am now trying to build 4.18.0-0.rc3, since that appears to be the latest version in Rawhide.
jeff
On Fri, Jul 06, 2018 at 07:51:18PM -0400, Jeff Backus wrote:
I updated to 4.18.0-0.rc3 and the problem showed up. I only had a moment to glance at the logs, but I am seeing the udev-related systemd segfault. This is with the systemd version my F28 system already had installed, not the one I compiled from GH, so it looks like an issue in the kernel. I'll try to extract more info from the journal when I get a chance. Are there any boot options, etc. that would produce additional helpful info?
Thanks! jeff
See https://www.freedesktop.org/wiki/Software/systemd/Debugging/.
Zbyszek
kernel@lists.fedoraproject.org