Hey folks! Just a heads up that I merged a branch I've had lying around for weeks that enables *some* of the openQA desktop tests on the aarch64 Workstation image.
I left out some with known issues. desktop_notifications_postinstall needs to boot to runlevel 3, which is actually a bit tricky because of UEFI - we don't get the boot menu with a usable timeout and the trick we use to work around that on x86_64 (where we run the tests on BIOS) doesn't work on UEFI because the setting is in the UEFI vars which are not part of the main disk image.
To fix this I think I need to have the 'image deploy' test upload its UEFI vars disk image and have the desktop_notifications_postinstall test attach that as well as the main disk image, but I need to look into the ins and outs of that a bit and this is my last work day of the year, so I'll do it next year.
The desktop_login test is generally fragile (things tending to get stuck or time out), but if it gets that far, it *always* fails when it tests locking the screen; on aarch64 this seems to cause the VM to permanently stop updating the display, or something. Again I haven't had time to look into this, and I want to enable the other tests without waiting for it.
The desktop_browser test is also failing, but I left that in because it's not a test bug, it's a distro bug. Firefox builds have just been disabled on aarch64 since 2020-11-20, so current composes don't have Firefox in them on aarch64 at all. There's a bug related to this: https://bugzilla.redhat.com/show_bug.cgi?id=1897675 which I just gave a bit of a bump, because it shouldn't really be acceptable to just turn off our default browser's build on one of our primary arches for weeks at a time :(
This also breaks several of the other tests which use Firefox, like the Cockpit and FreeIPA browser tests.
The tests should run on openQA prod from the next compose. The branch has been deployed on openQA lab (staging) for weeks (including the broken tests), so you can see how it's been behaving there.
Thanks folks!
Hey Adam,
Sorry for the delayed reply here. I wanted to reply properly, then it got lost in my inbox.
> Hey folks! Just a heads up that I merged a branch I've had lying around for weeks that enables *some* of the openQA desktop tests on the aarch64 Workstation image.
Thanks for this, it's really awesome!
> I left out some with known issues. desktop_notifications_postinstall needs to boot to runlevel 3, which is actually a bit tricky because of UEFI - we don't get the boot menu with a usable timeout and the trick we use to work around that on x86_64 (where we run the tests on BIOS) doesn't work on UEFI because the setting is in the UEFI vars which are not part of the main disk image.
Can you use "efibootmgr --timeout XX" to set a usable timeout at some point during the process?
> To fix this I think I need to have the 'image deploy' test upload its UEFI vars disk image and have the desktop_notifications_postinstall test attach that as well as the main disk image, but I need to look into the ins and outs of that a bit and this is my last work day of the year, so I'll do it next year.
> The desktop_login test is generally fragile (things tending to get stuck or time out), but if it gets that far, it *always* fails when it tests locking the screen; on aarch64 this seems to cause the VM to permanently stop updating the display, or something. Again I haven't had time to look into this, and I want to enable the other tests without waiting for it.
Huh, weird, I would have thought given it's basically the same driver stack it would have been closest here.
> The desktop_browser test is also failing, but I left that in because it's not a test bug, it's a distro bug. Firefox builds have just been disabled on aarch64 since 2020-11-20, so current composes don't have Firefox in them on aarch64 at all. There's a bug related to this: https://bugzilla.redhat.com/show_bug.cgi?id=1897675 which I just gave a bit of a bump, because it shouldn't really be acceptable to just turn off our default browser's build on one of our primary arches for weeks at a time :(
That's been an ongoing issue with the firefox maintenance for years sadly :(
> This also breaks several of the other tests which use Firefox, like the Cockpit and FreeIPA browser tests.
> The tests should run on openQA prod from the next compose. The branch has been deployed on openQA lab (staging) for weeks (including the broken tests), so you can see how it's been behaving there.
How are they generally after a few weeks running?
Peter
On Tue, 2021-01-19 at 23:30 +0000, Peter Robinson wrote:
> Hey Adam,
> Sorry for the delayed reply here. I wanted to reply properly, then it got lost in my inbox.
> > Hey folks! Just a heads up that I merged a branch I've had lying around for weeks that enables *some* of the openQA desktop tests on the aarch64 Workstation image.
> Thanks for this, it's really awesome!
> > I left out some with known issues. desktop_notifications_postinstall needs to boot to runlevel 3, which is actually a bit tricky because of UEFI - we don't get the boot menu with a usable timeout and the trick we use to work around that on x86_64 (where we run the tests on BIOS) doesn't work on UEFI because the setting is in the UEFI vars which are not part of the main disk image.
> Can you use "efibootmgr --timeout XX" to set a usable timeout at some point during the process?
Well...
> > To fix this I think I need to have the 'image deploy' test upload its UEFI vars disk image and have the desktop_notifications_postinstall test attach that as well as the main disk image, but I need to look into the ins and outs of that a bit and this is my last work day of the year, so I'll do it next year.
...IIRC (I didn't refresh my memory on this since the shutdown) we can set it "at some point in the process", but that point is during the "image deploy" test...but it gets written *to the efi vars storage*, I believe, and we don't currently hand that off from the "image deploy" test to the other tests that follow it. We hand off the *main 'system disk' image*, which for BIOS boot includes the boot manager config, but we've just never set things up so the EFI var storage gets stashed by the "image deploy" test and reused by subsequent tests. So we can set it, sure, but the setting effectively gets "lost" between being set and the point where we'd actually need it.
Now I've been reminded of this, I'll try and get back to it soon...I've said that about five things already this week...sigh.
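To illustrate the hand-off problem described above, here is a minimal sketch. It assumes the test VM is started by QEMU with the UEFI variable store attached as a second pflash drive, separate from the system disk. The function name, the file names, and the `saved_vars` parameter are hypothetical and are not taken from openQA's actual code. The point is only that a boot timeout written to the variable store by one test is lost unless that same store image is attached again by the next test.

```python
from typing import List, Optional


def qemu_drive_args(system_disk: str,
                    firmware_code: str,
                    vars_template: str,
                    saved_vars: Optional[str] = None) -> List[str]:
    """Build illustrative QEMU drive arguments for a UEFI test VM.

    saved_vars is a hypothetical variable-store image uploaded by an
    earlier test (for example, 'image deploy'). If it is missing, the VM
    falls back to the pristine template, so any boot menu timeout set
    earlier is not present.
    """
    # Pick the variable store: the one handed over by the earlier test,
    # or the untouched template (a real setup would use a fresh copy).
    vars_image = saved_vars if saved_vars is not None else vars_template

    return [
        # Read-only firmware code.
        "-drive", f"if=pflash,format=raw,unit=0,readonly=on,file={firmware_code}",
        # Writable UEFI variable store: boot entries and timeout live here.
        "-drive", f"if=pflash,format=raw,unit=1,file={vars_image}",
        # Main system disk: the only image currently passed between tests.
        "-drive", f"if=virtio,format=qcow2,file={system_disk}",
    ]


if __name__ == "__main__":
    # Without a saved store, the follow-up test boots with template settings.
    print(qemu_drive_args("disk.qcow2", "code.fd", "vars-template.fd"))
    # With the store from the deploy test attached, the timeout would survive.
    print(qemu_drive_args("disk.qcow2", "code.fd", "vars-template.fd",
                          saved_vars="deploy-vars.fd"))
```

On BIOS the equivalent setting travels inside the system disk, which is why the x86_64 workaround needs no extra image.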
> > The desktop_login test is generally fragile (things tending to get stuck or time out), but if it gets that far, it *always* fails when it tests locking the screen; on aarch64 this seems to cause the VM to permanently stop updating the display, or something. Again I haven't had time to look into this, and I want to enable the other tests without waiting for it.
> Huh, weird, I would have thought given it's basically the same driver stack it would have been closest here.
Yeah, it's odd for sure. Again haven't found the roundtuits to look into it further yet.
> > The desktop_browser test is also failing, but I left that in because it's not a test bug, it's a distro bug. Firefox builds have just been disabled on aarch64 since 2020-11-20, so current composes don't have Firefox in them on aarch64 at all. There's a bug related to this: https://bugzilla.redhat.com/show_bug.cgi?id=1897675 which I just gave a bit of a bump, because it shouldn't really be acceptable to just turn off our default browser's build on one of our primary arches for weeks at a time :(
> That's been an ongoing issue with the firefox maintenance for years sadly :(
Yeah, we'll just have to keep asking mstransky not to turn off aarch64 :(
> > This also breaks several of the other tests which use Firefox, like the Cockpit and FreeIPA browser tests.
> > The tests should run on openQA prod from the next compose. The branch has been deployed on openQA lab (staging) for weeks (including the broken tests), so you can see how it's been behaving there.
> How are they generally after a few weeks running?
Good question! Rawhide's a bit flaky in general ATM, but the answer seems to be..."a bit flaky" :) I'm seeing quite a few failures that are likely performance-related; here are the ones from yesterday for instance:
https://openqa.fedoraproject.org/tests/758486 - desktop_terminal, failed because root auth failed, likely a dropped keystroke
https://openqa.fedoraproject.org/tests/758487 - desktop_update_graphical, system seems to have crashed after updates were installed and system rebooted, not sure about that one; I'm rerunning it
https://openqa.fedoraproject.org/tests/758485 - desktop_browser, seems like when openQA "saw" the "addon_add" needle the browser was actually kinda lagging a bit, and so openQA's click probably got eaten
I'll try and throw in some mitigations, but all we can really do is slow the tests down - make them wait after matching needles, make them type slower and slower - and you know how that can snowball. One thing I'll be interested to see is whether the tests get less flaky after we switch away from debug kernels for F34; IIRC Justin was thinking about getting away from debug kernels entirely, so if that happens and they are an issue here, it might help in future.