dustymabe reported a new issue against the project: `releng` that you are following: `` We are seeing network issues when pulling the ostree content during the image builds for f29AH (run during bodhi updates composes). This seems to be getting more consistent lately. I'm opening this bug to track the issue.
Here are current cases where it has happened:
- [Fedora-29-updates-testing-20190602.0](https://pagure.io/dusty/failed-composes/issue/1948)
- [Fedora-29-updates-20190531.0](https://pagure.io/dusty/failed-composes/issue/1937#comment-573321)
- [Fedora-29-updates-20190601.0](https://pagure.io/dusty/failed-composes/issue/1941) ``
To reply, visit the link below or just reply to this email https://pagure.io/releng/issue/8407
dustymabe added a new comment to an issue you are following: `` cc @sinnykumari - do you think you can run a test image build on a ppc64le machine to see if you can reproduce the issue? ``
sinnykumari added a new comment to an issue you are following: `` Did AH image build on ppc64le from Fedora-29-updates-testing-20190602.0 and Fedora-29-updates-20190601.0 configs. Didn't observe any issue, build succeeded fine. ``
dustymabe added a new comment to an issue you are following: `` [Fedora-29-updates-20190624.0](https://pagure.io/dusty/failed-composes/issue/2019) ``
sinnykumari added a new comment to an issue you are following: `` Something is wrong on the ppc64le builders: it took ~10 hours for both the Fedora-29-updates-20190624.0 and Fedora-29-updates-testing-20190624.0 composes. Of that, around 7 hours went to ppc64le ISO creation alone https://koji.fedoraproject.org/koji/taskinfo?taskID=35780975 ``
sinnykumari added a new comment to an issue you are following: `` @kevin @sharkcz thoughts on why we are having these ppc64le-related failures and delays? ``
sharkcz added a new comment to an issue you are following: `` uff, I wonder what made it so slow ... Looks like the IO to the nested VM's disk was super slow. ``
dustymabe added a new comment to an issue you are following: ``
> Looks like the IO to the nested VM's disk was super slow.
ouch :(
> uff, I wonder what made it so slow ...
Do you think you could investigate? We'd like to do a Fedora Atomic Host release this week and we'd like to ship the ppc64le artifacts as part of it. ``
kevin added a new comment to an issue you are following: `` I think a first thing to do would be to update/reboot the power9 hosts. They are Fedora 30 and now on an older kernel. I'll try and do that today. ``
dustymabe added a new comment to an issue you are following: ``
> I think a first thing to do would be to update/reboot the power9 hosts. They are Fedora 30 and now on an older kernel. I'll try and do that today.
Thanks. After you do that, can you re-run a koji task (for example 35785502 from Fedora-29-updates-20190624.0) and see if it completes successfully? ``
kevin added a new comment to an issue you are following: `` resubmitting failed image compose tasks isn't a good idea... there's already a record of it failing, if it worked it could cause confusion, also inputs it uses could change.
Can I just fire off another f29-updates-testing compose? or f29-updates?
``
dustymabe added a new comment to an issue you are following: ``
> resubmitting failed image compose tasks isn't a good idea... there's already a record of it failing, if it worked it could cause confusion, also inputs it uses could change. Can I just fire off another f29-updates-testing compose? or f29-updates?
yeah - that works ``
sharkcz added a new comment to an issue you are following: `` I suspect it's not only the composes, the last kernel builds are also taking way too many hours on ppc64le. ``
sinnykumari added a new comment to an issue you are following: `` Compose [Fedora-29-updates-20190625.0](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-29-updates-2019062...) finished within the expected time; it looks like rebooting the builders helped. Let's observe for a few more days and see if the problem comes back. ``
kevin added a new comment to an issue you are following: `` So, the power8 boxes have 10,000 rpm SAS drives. The power9 hosts have 7200 rpm SATA drives.
I think seek times may be causing problems when there are a lot of builds going on. I can try reducing density of builders perhaps, or perhaps we can replace storage with SSDs or the like.
``
sinnykumari added a new comment to an issue you are following: `` Observed AH composes from the last two days for both F29 updates and updates-testing; all of them succeeded within the expected time. ``
dustymabe added a new comment to an issue you are following: `` https://pagure.io/dusty/failed-composes/issue/2042 ``
sinnykumari added a new comment to an issue you are following: `` Yeah, it seems the ppc64le issues are showing up again. Cloud images are failing, and ISO creation is also taking longer: it took [~5 hrs](https://koji.fedoraproject.org/koji/taskinfo?taskID=35972578) for the ISO from Fedora-29-{updates,updates-testing}-20190701.0 ``
jlebon added a new comment to an issue you are following: ``
> I can try reducing density of builders perhaps, or perhaps we can replace storage with SSDs or the like.
Did either of those two things happen? Or are we still experiencing slow IO on ppc64le builders across various tasks? One task Sinny brought up seems to suggest it's still happening (https://koji.fedoraproject.org/koji/taskinfo?taskID=36279258) but it's hard to gain visibility into what's really going on. ``
mohanboddu added a new comment to an issue you are following: ``
> Or are we still experiencing slow IO on ppc64le builders across various tasks?
It is still an issue afaik. ``
sinnykumari added a new comment to an issue you are following: `` @nirik any plan to move the ppc64le image builder to F30 soon? Image builds are failing frequently on ppc64le and I am not able to reproduce the failures locally (running F30, though) despite several attempts. My local ppc64le VM also runs very slowly and takes a lot of time, but it doesn't fail with a timeout error during the ostree repo pull. ``
sinnykumari added a new comment to an issue you are following: `` F29 AH image builds in ppc64le composes have been running successfully since 20190724 for both updates and updates-testing.
From an IRC conversation with Kevin: we have cache mode unsafe enabled on the ppc64le builder, which made things about 10x faster.
Thanks, Kevin and everyone, for helping resolve this issue.
We can close this ticket for now! ``
The status of the issue: `tracking issue: ppc64le network failures during f29AH composes` of project: `releng` has been updated to: Closed as Fixed by kevin.
rel-eng@lists.fedoraproject.org