https://bugzilla.redhat.com/show_bug.cgi?id=2450007
Bug ID: 2450007 Summary: Review Request: nvidia-container-toolkit - Build and run containers leveraging NVIDIA GPUs Product: Fedora Version: rawhide Hardware: All OS: Linux Status: NEW Component: Package Review Severity: medium Priority: medium Assignee: nobody@fedoraproject.org Reporter: jonathansteffan@gmail.com QA Contact: extras-qa@fedoraproject.org CC: package-review@lists.fedoraproject.org Target Milestone: --- Classification: Fedora
Spec URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-... SRPM URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-... Description: Build and run containers leveraging NVIDIA GPUs. Fedora Account System Username: jsteffan
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
Fedora Review Service fedora-review-bot@fedoraproject.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |AutomationTriaged
--- Comment #1 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Copr build: https://copr.fedorainfracloud.org/coprs/build/10250149 (succeeded)
Review template: https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-rev...
Please take a look if any issues were found.
--- This comment was created by the fedora-review-service https://github.com/FrostyX/fedora-review-service
If you want to trigger a new Copr build, add a comment containing new Spec and SRPM URLs or [fedora-review-service-build] string.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
Mikel Olasagasti Uranga mikel@olasagasti.info changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |mikel@olasagasti.info
--- Comment #2 from Mikel Olasagasti Uranga mikel@olasagasti.info --- - Add explicit BuildRequires on make
%make_build
- Why custom __golang_extldflags are required? Add a comment
%global __golang_extldflags -Wl,-z,lazy -Wl,--export-dynamic
- Current template for the %prep part does the following:
%prep %goprep -p1 tar -xf %{S:1}
While you're doing:
%goprep -A %setup -q -T -D -a1 %{forgesetupargs} tar xf %{SOURCE3} %autopatch -p1
Feels a pre 1.19 go2rpm template was used and later reapplied somehow. It's purely cosmetic request, but if you could normalize it would be better.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #3 from Mikel Olasagasti Uranga mikel@olasagasti.info --- [fedora-review-service-build]
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #4 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Copr build: https://copr.fedorainfracloud.org/coprs/build/10304665 (succeeded)
Review template: https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-rev...
Please take a look if any issues were found.
--- This comment was created by the fedora-review-service https://github.com/FrostyX/fedora-review-service
If you want to trigger a new Copr build, add a comment containing new Spec and SRPM URLs or [fedora-review-service-build] string.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #5 from Mikel Olasagasti Uranga mikel@olasagasti.info --- - Checking upstream Makefile, NVIDIA sets to:
EXTLDFLAGS = -Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files -Wl,-z,lazy
https://github.com/NVIDIA/nvidia-container-toolkit/blob/main/Makefile#L63
- Version ldflags are not set:
-X $(CLI_VERSION_PACKAGE).gitCommit=$(GIT_COMMIT) -X $(CLI_VERSION_PACKAGE).version=$(CLI_VERSION)"
Export the following after "%global gomodulesmode GO111MODULE=on"
export GO_LDFLAGS="-X github.com/NVIDIA/nvidia-container-toolkit/internal/info.version=%{version} \ -X github.com/NVIDIA/nvidia-container-toolkit/internal/info.gitCommit=%{commit}"
The problem is that gitCommit/commit would need to be modified on each version update. You can have something else written there like "unkonwn" or "Fedora". Check how is printed with the upstream version and if it has to have any known format.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #6 from Jonathan Steffan jonathansteffan@gmail.com --- Changelog:
- Update __golang_extldflags to match upstream, link to upstream - Set info.version and info.gitCommit - Add make BR
Spec URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-... SRPM URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-...
Why I'm doing the awkwardly looking %prep is because of an odd mix between forge, go macros, what I need to patch, and the added sources. I'll experiment a bit to see where this is breaking down.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #7 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Created attachment 2136591 --> https://bugzilla.redhat.com/attachment.cgi?id=2136591&action=edit The .spec file difference from Copr build 10304665 to 10311924
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #8 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Copr build: https://copr.fedorainfracloud.org/coprs/build/10311924 (succeeded)
Review template: https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-rev...
Please take a look if any issues were found.
--- This comment was created by the fedora-review-service https://github.com/FrostyX/fedora-review-service
If you want to trigger a new Copr build, add a comment containing new Spec and SRPM URLs or [fedora-review-service-build] string.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #9 from Jonathan Steffan jonathansteffan@gmail.com --- Changelog:
- Remove patch, instead copy in staged required binary to expected location: https://github.com/NVIDIA/nvidia-container-toolkit/issues/1758 - Update %prep to be more modern
Spec URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-... SRPM URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-...
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #10 from Mikel Olasagasti Uranga mikel@olasagasti.info --- [fedora-review-service-build]
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #11 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Created attachment 2136942 --> https://bugzilla.redhat.com/attachment.cgi?id=2136942&action=edit The .spec file difference from Copr build 10311924 to 10317718
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #12 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Copr build: https://copr.fedorainfracloud.org/coprs/build/10317718 (succeeded)
Review template: https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-rev...
Please take a look if any issues were found.
--- This comment was created by the fedora-review-service https://github.com/FrostyX/fedora-review-service
If you want to trigger a new Copr build, add a comment containing new Spec and SRPM URLs or [fedora-review-service-build] string.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
Tom.Rix@amd.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Assignee|nobody@fedoraproject.org |Tom.Rix@amd.com Status|NEW |ASSIGNED CC| |Tom.Rix@amd.com
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #13 from Tom.Rix@amd.com --- (In reply to Mikel Olasagasti Uranga from comment #5)
- Checking upstream Makefile, NVIDIA sets to:
EXTLDFLAGS = -Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files -Wl,-z,lazy
https://github.com/NVIDIA/nvidia-container-toolkit/blob/main/Makefile#L63
rpmlint is also complaining a lot about symbols. What resolves these symbols ?
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #14 from Tom.Rix@amd.com --- Source3: https://github.com/NVIDIA/dgx-selinux/archive/%%7Bdgx_selinux_commit%7D/dgx-...
From the readme https://github.com/NVIDIA/dgx-selinux/blob/master/README.md This is expected to run on bare metal, not just containers.
I think this needs its own package. And it can follow the selinux packaging policy https://fedoraproject.org/wiki/PackagingDrafts/SELinux_Independent_Policy
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #15 from Jonathan Steffan jonathansteffan@gmail.com --- (In reply to Tom.Rix from comment #13)
(In reply to Mikel Olasagasti Uranga from comment #5)
- Checking upstream Makefile, NVIDIA sets to:
EXTLDFLAGS = -Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files -Wl,-z,lazy
https://github.com/NVIDIA/nvidia-container-toolkit/blob/main/Makefile#L63
rpmlint is also complaining a lot about symbols. What resolves these symbols ?
They are resolved at runtime by the linker:
Library location (pkg/lookup/library.go): A First() combinator tries three strategies in order, returning the first hit: 1. Direct dlopen attempt 2. Predefined filesystem search paths 3. Parse /etc/ld.so.cache
ld.so.cache parsing (internal/ldcache/ldcache.go): The binary cache is mmap-ed directly. Each entry2 struct has a Flags field with ELF architecture bits (0x0300 = x86_64, 0x0a00 = aarch64, etc.) used to filter 32-bit vs 64-bit libs. No ldconfig subprocess — raw struct parsing.
dlopen/dlsym (vendor/github.com/NVIDIA/go-nvml/pkg/dl/dl.go): CGO wrapper around libc dlopen/dlsym. Flags: RTLD_LAZY | RTLD_GLOBAL | RTLD_NODELETE. Every call is wrapped with runtime.LockOSThread() for thread safety.
Symlink chain resolution (pkg/lookup/symlinks/symlink.go): Once a library path is found, the full symlink chain is walked (e.g., libcuda.so → libcuda.so.1 → libcuda.so.550.1.0) and all aliases are collected — so containers get every soname variant.
Versioned symbol negotiation (internal/nvsandboxutils/lib.go): After dlopen, updateVersionedSymbols() probes for _v2/_v3 suffixed symbols via dlsym. If found, the function pointer is upgraded — forward compatibility without ABI breaks.
Summary: locate via ldcache mmap → dlopen → dlsym with version probing → expose full symlink chain.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #16 from Jonathan Steffan jonathansteffan@gmail.com --- (In reply to Tom.Rix from comment #14)
Source3: https://github.com/NVIDIA/dgx-selinux/archive/%%7Bdgx_selinux_commit%7D/dgx- selinux-%{dgx_selinux_commit}.tar.gz
From the readme https://github.com/NVIDIA/dgx-selinux/blob/master/README.md This is expected to run on bare metal, not just containers.
I think this needs its own package. And it can follow the selinux packaging policy https://fedoraproject.org/wiki/PackagingDrafts/SELinux_Independent_Policy
Agreed. I'll drop it and add a new policy package for inclusion.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #17 from Jonathan Steffan jonathansteffan@gmail.com --- Changelog:
- Drop the additional SELinux policy, keep the boolean management
Spec URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-... SRPM URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-...
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #18 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Copr build: https://copr.fedorainfracloud.org/coprs/build/10452587 (succeeded)
Review template: https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-rev...
Please take a look if any issues were found.
--- This comment was created by the fedora-review-service https://github.com/FrostyX/fedora-review-service
If you want to trigger a new Copr build, add a comment containing new Spec and SRPM URLs or [fedora-review-service-build] string.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #19 from Jonathan Steffan jonathansteffan@gmail.com --- Changelog:
- Fix the SELinux changes (oops don't remove the %files section :))
Spec URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-... SRPM URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-...
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #20 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Created attachment 2141077 --> https://bugzilla.redhat.com/attachment.cgi?id=2141077&action=edit The .spec file difference from Copr build 10452587 to 10456018
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #21 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Copr build: https://copr.fedorainfracloud.org/coprs/build/10456018 (succeeded)
Review template: https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-rev...
Please take a look if any issues were found.
--- This comment was created by the fedora-review-service https://github.com/FrostyX/fedora-review-service
If you want to trigger a new Copr build, add a comment containing new Spec and SRPM URLs or [fedora-review-service-build] string.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #22 from Tom.Rix@amd.com ---
Symlink chain resolution (pkg/lookup/symlinks/symlink.go): Once a library path is found, the full symlink chain is walked (e.g., libcuda.so → libcuda.so.1 → libcuda.so.550.1.0) and all aliases are collected — so containers get every soname variant.
cuda is of course not open source. What are the legal issues for this method ? I am hoping you can point to a past ruling saying it is ok.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #23 from Jonathan Steffan jonathansteffan@gmail.com --- (In reply to Tom.Rix from comment #22)
Symlink chain resolution (pkg/lookup/symlinks/symlink.go): Once a library path is found, the full symlink chain is walked (e.g., libcuda.so → libcuda.so.1 → libcuda.so.550.1.0) and all aliases are collected — so containers get every soname variant.
cuda is of course not open source. What are the legal issues for this method ? I am hoping you can point to a past ruling saying it is ok.
Dynamically linking is the licensing demarcation point. Everything shipped in this package is licensed with an approved license and should be fine for Fedora. If the relevant DSOs are available in the probed locations, they are exposed inside the container. If not, nothing happens and the CDI configuration is unable to be assembled.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #24 from Jonathan Steffan jonathansteffan@gmail.com --- For the approval, well it's already approved as it's already been included in Fedora: https://src.fedoraproject.org/rpms/golang-github-nvidia-container-toolkit
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #25 from Jonathan Steffan jonathansteffan@gmail.com --- Which reminded me :)
Changelog:
- Add Obsoletes for https://src.fedoraproject.org/rpms/golang-github-nvidia-container-toolkit < 1.18
Spec URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-... SRPM URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-...
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #26 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Created attachment 2141321 --> https://bugzilla.redhat.com/attachment.cgi?id=2141321&action=edit The .spec file difference from Copr build 10456018 to 10461429
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #27 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Copr build: https://copr.fedorainfracloud.org/coprs/build/10461429 (succeeded)
Review template: https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-rev...
Please take a look if any issues were found.
--- This comment was created by the fedora-review-service https://github.com/FrostyX/fedora-review-service
If you want to trigger a new Copr build, add a comment containing new Spec and SRPM URLs or [fedora-review-service-build] string.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
Tom.Rix@amd.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(jonathansteffan@g | |mail.com)
--- Comment #28 from Tom.Rix@amd.com --- (In reply to Jonathan Steffan from comment #24)
For the approval, well it's already approved as it's already been included in Fedora: https://src.fedoraproject.org/rpms/golang-github-nvidia-container-toolkit
Why are we doing a package review then ?
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
Jonathan Steffan jonathansteffan@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(jonathansteffan@g | |mail.com) |
--- Comment #29 from Jonathan Steffan jonathansteffan@gmail.com --- (In reply to Tom.Rix from comment #28)
Why are we doing a package review then ?
It's a new package name and a new spec, because we are changing what the package ships. In addition, this new package uses vendoring.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #30 from Tom.Rix@amd.com --- I am going to recommend the cuda part to be removed.
For licensing I was specifically looking for an nvidia and cuda references that would allow wrapping of cuda in fedora. If we do this and zluda or similar hook it up their own cuda wrapper, will fedora or its users be sued ?
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #31 from Jonathan Steffan jonathansteffan@gmail.com --- I'm not sure I follow. If we don't think this content is permissive in Fedora, I need to file a ticket to get it approved? It would be silly to block this code IMHO.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #32 from Jonathan Steffan jonathansteffan@gmail.com --- Changelog:
- Update to 1.19.1
Spec URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-... SRPM URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-...
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #33 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Created attachment 2142275 --> https://bugzilla.redhat.com/attachment.cgi?id=2142275&action=edit The .spec file difference from Copr build 10461429 to 10501619
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
Fedora Review Service fedora-review-bot@fedoraproject.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords|AutomationTriaged |
--- Comment #34 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Copr build: https://copr.fedorainfracloud.org/coprs/build/10501619 (failed)
Build log: https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-rev...
Please make sure the package builds successfully at least for Fedora Rawhide.
- If the build failed for unrelated reasons (e.g. temporary network unavailability), please ignore it. - If the build failed because of missing BuildRequires, please make sure they are listed in the "Depends On" field
--- This comment was created by the fedora-review-service https://github.com/FrostyX/fedora-review-service
If you want to trigger a new Copr build, add a comment containing new Spec and SRPM URLs or [fedora-review-service-build] string.
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #35 from Jonathan Steffan jonathansteffan@gmail.com --- Changelog:
- Fix test determinism https://github.com/NVIDIA/nvidia-container-toolkit/pull/1846
Spec URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-... SRPM URL: https://jsteffan.fedorapeople.org/nvidia-container-toolkit/nvidia-container-...
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
--- Comment #36 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Created attachment 2142420 --> https://bugzilla.redhat.com/attachment.cgi?id=2142420&action=edit The .spec file difference from Copr build 10501619 to 10504408
https://bugzilla.redhat.com/show_bug.cgi?id=2450007
Fedora Review Service fedora-review-bot@fedoraproject.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |AutomationTriaged
--- Comment #37 from Fedora Review Service fedora-review-bot@fedoraproject.org --- Copr build: https://copr.fedorainfracloud.org/coprs/build/10504408 (succeeded)
Review template: https://download.copr.fedorainfracloud.org/results/@fedora-review/fedora-rev...
Please take a look if any issues were found.
--- This comment was created by the fedora-review-service https://github.com/FrostyX/fedora-review-service
If you want to trigger a new Copr build, add a comment containing new Spec and SRPM URLs or [fedora-review-service-build] string.
package-review@lists.fedoraproject.org