Alexey,
Good evening. I have finally made the time to circle back to this and do some testing.

I found this, which was interesting (I think you were assisting) https://blog.rook.io/prototyping-an-nfs-connection-to-ldap-using-sssd-7c27f624f1a4 

It seemed to share some parallels so I decided to test swapping the order of lookup in the nsswitch.conf for a test stateless instance.

passwd: sss files
group:  sss files
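
A quick way to confirm the effective order on an instance is to parse the file directly. This is only a minimal sketch: the `lookup_order` helper and the `SAMPLE` text are hypothetical, and the sample is written with the `database:` colon that the nsswitch.conf syntax expects.

```python
import re

def lookup_order(nsswitch_text, database):
    """Return the ordered list of sources configured for one database."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        name, _, sources = line.partition(":")
        if name.strip() == database:
            # Remove action items such as [NOTFOUND=return] before splitting.
            sources = re.sub(r"\[[^\]]*\]", " ", sources)
            return sources.split()
    return []

# Hypothetical sample mirroring the swapped order described above.
SAMPLE = """\
passwd: sss files
group:  sss files
"""

print(lookup_order(SAMPLE, "passwd"))
print(lookup_order(SAMPLE, "group"))
```

On a real system the text would come from /etc/nsswitch.conf rather than from `SAMPLE`.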

After 15 minutes (exactly), a poll of the mounted NFS file systems reflected resolved users and groups as normal, without requiring a lookup operation (for any valid user) as before.
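
To pin the delay down more precisely, the poll can be timed. The sketch below is an illustration under stated assumptions, not the procedure used above: it assumes unmapped owners on the NFS mount show up as uid 65534 ("nobody"), that a stat of a file on the mount is enough to observe the mapping, and the `time_until_mapped` helper and the path in the usage comment are hypothetical.

```python
import os
import time

NOBODY_UID = 65534  # assumption: unmapped owners appear as "nobody"

def time_until_mapped(path, interval=30.0, max_wait=1800.0,
                      stat=os.stat, sleep=time.sleep, clock=time.monotonic):
    """Poll the owner of `path` until it is no longer the nobody uid.

    Returns the seconds elapsed when a mapped owner is first seen,
    or None if `max_wait` seconds pass without one.
    """
    start = clock()
    while True:
        if stat(path).st_uid != NOBODY_UID:
            return clock() - start
        if clock() - start >= max_wait:
            return None
        sleep(interval)

# Usage (hypothetical path on the NFS mount):
#   elapsed = time_until_mapped("/mnt/nfs/some/file", interval=10.0)
#   print("owner resolved after", elapsed, "seconds")
```

NFS attribute caching may add some delay of its own to what this reports, so the result is an upper bound rather than an exact timer value.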

I'm having trouble tracing this back to the sssd timer that would likely explain it.


Thoughts?


-- lawrence

On Wed, Feb 19, 2025 at 8:50 AM Lawrence Kearney <hangarbait@gmail.com> wrote:
Alexey,
Please forgive the delay in response. I'm heavily involved with a PS engagement/deployment for the next couple of weeks (this one included) and free time is scarce. This is important, though, so I will keep working on it; again, please forgive any delays in response.

We use the daemon for AD user/group resolution, access control, and authentication for cluster users at the edge (AD joined job submission nodes, data transfer nodes, etc.) and internally (compute nodes using LDAP). Users are permitted to authenticate to compute nodes if they have active jobs on them. The SLURM "pam_slurm_adopt.so" module controls that access, whereas AD groups control access on the cluster edge systems. Those same AD groups will also be used for SLURM-based quality of service settings in an internal database. The enterprise provides the AD environment and we have no appetite to implement a shadow AD or LDAP service for the research compute side of things.

As mentioned, I've deployed hundreds of these configurations and these stateless configurations are the only ones to behave this way. Very curious, but as ephemeral systems are expected to be redeployed as a matter of operations, this nuance could certainly get annoying :-) .


-- lawrence


On Tue, Feb 18, 2025 at 3:14 AM Alexey Tikhonov <atikhono@redhat.com> wrote:
> What is different is these OS instances are Rocky 9.5 Linux containers deployed as stateless systems.

Also out of curiosity: how do you use SSSD in those containers?
What is the use case?