On Fri, Apr 10, 2015 at 03:26:02PM +0200, Thomas HUMMEL wrote:
We tried sssd 1.12.4 and it does not fix the problem.
Continuing the debugging, we wanted to know whether the problem comes from
slurm, glibc or sssd. Here is what we tried:
1. We hacked the slurmd code to add a getgroups() call before and after slurm
calls initgroups():
    debug2("Uncached user/gid: %s/%ld", job->user_name,
           (long)job->gid);
    /* Added: number of supplementary groups before initgroups() */
    debug2("Before initgroups number of groups for %s/%ld : %d",
           job->user_name, (long)job->gid, getgroups(0, NULL));
    if ((rc = initgroups(job->user_name, job->gid))) {
        /* EPERM while not running as root is only logged at debug level */
        if ((errno == EPERM) && (getuid() != (uid_t) 0)) {
            debug("Error in initgroups(%s, %ld): %m",
                  job->user_name, (long)job->gid);
        } else {
            error("Error in initgroups(%s, %ld): %m",
                  job->user_name, (long)job->gid);
        }
        return -1;
    }
    /* Added: number of supplementary groups after initgroups() */
    debug2("After initgroups number of groups for %s/%ld : %d",
           job->user_name, (long)job->gid, getgroups(0, NULL));
    return 0;
-> when the problem occurs (note that slurmd is running as root before dropping
privileges):
Apr 10 17:10:28 myriad-n407 slurmstepd[7219]: Before initgroups number of groups for njoly/3044 : 0
Apr 10 17:10:28 myriad-n407 slurmstepd[7219]: After initgroups number of groups for njoly/3044 : 1
-> when the problem does not occur:
Apr 10 17:32:14 myriad-n407 slurmstepd[11075]: Before initgroups number of groups for njoly/3044 : 0
Apr 10 17:32:14 myriad-n407 slurmstepd[11075]: After initgroups number of groups for njoly/3044 : 11
So our understanding is that slurm is not to blame: the group list is already
incomplete as soon as initgroups() returns.
Note: in previous tests where we put a getgroups() call elsewhere in the code,
we sometimes saw more than one group retrieved. So sometimes only a subset of
the supplementary groups is retrieved.
2. We stopped sssd, removed its cache files (mc/* and db/*), and put the user
in /etc/passwd and all of his groups (supplementary as well as primary) in
/etc/group:
-> the problem no longer occurs.
So we think that glibc is not to blame either: the same initgroups() call
returns the full list when the groups come from /etc/group.
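Both experiments only compare group counts. A possible next step, sketched
here under our own assumptions, would be to log the GIDs themselves right
after initgroups() in slurmstepd and diff a failing run against a good one,
to see whether the same groups always go missing. current_groups() and
report_missing_groups() are hypothetical helpers, not slurm or sssd functions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fetch the calling process's supplementary GIDs.
 * Returns a malloc'd array (caller frees) and stores its length in *n,
 * or returns NULL on error. */
gid_t *current_groups(int *n)
{
    int count = getgroups(0, NULL);      /* size query only */
    if (count < 0)
        return NULL;

    /* Allocate at least one slot so malloc(0) never yields NULL. */
    gid_t *list = malloc((size_t)(count > 0 ? count : 1) * sizeof(*list));
    if (list == NULL)
        return NULL;

    count = getgroups(count, list);      /* now fetch the GIDs themselves */
    if (count < 0) {
        free(list);
        return NULL;
    }
    *n = count;
    return list;
}

/* Hypothetical helper: print each GID of EXPECTED (e.g. saved from a good
 * run) that is absent from ACTUAL (a failing run) to OUT, if OUT is not
 * NULL, and return how many were missing. */
int report_missing_groups(FILE *out, const gid_t *expected, int n_expected,
                          const gid_t *actual, int n_actual)
{
    int missing = 0;
    for (int i = 0; i < n_expected; i++) {
        int found = 0;
        for (int j = 0; j < n_actual && !found; j++)
            found = (expected[i] == actual[j]);
        if (!found) {
            if (out != NULL)
                fprintf(out, "missing gid %ld\n", (long)expected[i]);
            missing++;
        }
    }
    return missing;
}
```

In the hacked slurmd, current_groups() would be called where the "After
initgroups" debug2() line is, and its result logged instead of the bare count.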
Conclusion: it seems to us that this really is an sssd problem. Could you point
us to a place in the sssd source code where we could start investigating
further? We are unable to build a test case without slurm.
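One idea for a slurm-free test case, as a minimal sketch only: in glibc,
getgrouplist() and initgroups() gather groups through the same NSS
initgroups lookup, but getgrouplist() needs no root privileges, so it can be
called in a loop from an ordinary process. count_grouplist() and
probe_grouplist() are hypothetical names, and the initial 64-GID buffer is an
arbitrary choice. It does not reproduce slurmstepd's fork and privilege
context, so a clean run would not rule sssd out.

```c
#define _DEFAULT_SOURCE          /* getgrouplist() is a glibc/BSD extension */
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return how many groups NSS reports for USER (GID is always included),
 * or -1 on allocation failure. */
int count_grouplist(const char *user, gid_t gid)
{
    int cap = 64;                /* arbitrary initial buffer size */
    gid_t *groups = NULL;

    for (;;) {
        gid_t *tmp = realloc(groups, (size_t)cap * sizeof(*groups));
        if (tmp == NULL) {
            free(groups);
            return -1;
        }
        groups = tmp;

        int n = cap;
        if (getgrouplist(user, gid, groups, &n) >= 0) {
            free(groups);
            return n;            /* n now holds the actual count */
        }
        /* Buffer too small: glibc stores the needed size in n.
         * Fall back to doubling in case it did not grow. */
        cap = (n > cap) ? n : cap * 2;
    }
}

/* Call count_grouplist() ROUNDS times and store the smallest and largest
 * counts seen (both -1 if ROUNDS <= 0). min != max means the NSS backend
 * answered inconsistently. Returns 0 on success, -1 on error. */
int probe_grouplist(const char *user, gid_t gid, int rounds,
                    int *min_out, int *max_out)
{
    int min = -1, max = -1;
    for (int r = 0; r < rounds; r++) {
        int n = count_grouplist(user, gid);
        if (n < 0)
            return -1;
        if (min < 0 || n < min)
            min = n;
        if (n > max)
            max = n;
    }
    *min_out = min;
    *max_out = max;
    return 0;
}
```

Called from a small main() with the user and primary GID from the logs above
(njoly, 3044) and a few hundred rounds, a min/max mismatch would reproduce the
truncation without slurm.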
Thanks
--
Thomas Hummel | Institut Pasteur
<hummel(a)pasteur.fr> | Groupe Exploitation et Infrastructure