I think it would save time on both ends unless we can reproduce
ourselves :-)
We've got a "recipe" and configuration files to reproduce the bug from
scratch on a vanilla CentOS 6 distro (the LDAP part is inspired by
http://wiki.openiam.com/pages/viewpage.action?pageId=7635198):
# yum install sssd sssd-common openldap-servers openldap-clients perl-LDAP.noarch
# cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG
# chown -R ldap:ldap /var/lib/ldap
# cd /etc/openldap && mv slapd.d slapd.d.original
# cp /root/slapd-minimal.conf /etc/openldap/slapd.conf   # use the one provided with this message
# chown ldap:ldap /etc/openldap/slapd.conf
# chmod 600 /etc/openldap/slapd.conf
Add this line to /etc/sysconfig/ldap:
SLAPD_OPTIONS="-h \"ldap://127.0.0.1 ldaps://127.0.0.1\""
# service slapd start
# chkconfig slapd on
Check that you can connect (the Manager password is "openldap") :
# ldapsearch -h localhost -x -w openldap -D 'cn=Manager,dc=example,dc=com' \
    -b 'dc=example,dc=com' 'objectclass=*'
Now populate the LDAP server with the provided file (one user, "user1", with
password "openldap", belonging to 29 secondary groups):
# ldapadd -h localhost -x -w openldap -D 'cn=Manager,dc=example,dc=com' \
    -f /root/ldap-init.ldif
You can check that everything went fine with the previous ldapsearch command.
Copy our sssd configuration file:
# cp /root/sssd-minimal.conf /etc/sssd/sssd.conf
# chown root:root /etc/sssd/sssd.conf && chmod 600 /etc/sssd/sssd.conf
# service sssd start
# chkconfig sssd on
# # not sure if the authconfig is strictly necessary here
# authconfig --enablesssd --enablesssdauth --enablelocauthorize \
    --enablemkhomedir --enablepamaccess --updateall --nostart
# service sssd restart
In /etc/nsswitch.conf, check for :
passwd: files sss
shadow: files sss
group: files sss
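If you would rather check this from a script than by eye, here is a minimal
sketch. It is not part of the attached files; the function name
databases_missing_sss is made up for illustration, and it only looks for the
"sss" source on the three databases listed above.

```python
def databases_missing_sss(nsswitch_text, databases=("passwd", "shadow", "group")):
    """Return the databases whose nsswitch.conf line does not list 'sss'."""
    sources = {}
    for line in nsswitch_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        sources[name.strip()] = rest.split()
    # A database that is absent from the file counts as missing too.
    return [db for db in databases if "sss" not in sources.get(db, [])]
```

Typical use would be to pass it open("/etc/nsswitch.conf").read(); an empty
list means all three databases go through sssd.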
# cat /etc/sssd/sssd.conf
[sssd]
config_file_version = 2
services = nss, pam
domains = ldap_local
[nss]
filter_users = root,ldap,named,avahi,haldaemon,dbus,radiusd,news,nscd
override_shell = /bin/bash
[pam]
[domain/ldap_local]
override_homedir = /home/%u
auth_provider = ldap
ldap_schema = rfc2307
ldap_search_base = ou=people,dc=example,dc=com
ldap_group_search_base = ou=group,dc=example,dc=com
id_provider = ldap
ldap_uri = ldap://localhost/
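A quick sanity check on a file like this is that every name in "domains" has
a matching [domain/NAME] section. The sketch below does that with Python's
configparser; it assumes Python 3, the function name undefined_domains is
hypothetical, and sssd's own parser may accept or reject things this one
does not.

```python
import configparser


def undefined_domains(conf_text):
    """Return domains listed under [sssd] that have no [domain/NAME] section."""
    # interpolation=None keeps values such as /home/%u from being
    # treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    raw = parser.get("sssd", "domains", fallback="")
    listed = [name.strip() for name in raw.split(",") if name.strip()]
    return [name for name in listed if not parser.has_section("domain/" + name)]
```

For the file shown above this should return an empty list, since ldap_local
is both listed and defined.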
You can now run your script or mine. Just adapt the initgroups.py call or use
the one provided with this message:
python initgroups.py user1 50001 29 $num_proc $delay
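For anyone reading without the attachment, here is a rough sketch of what
such a checker could look like. It is NOT the attached initgroups.py. The
meaning of the arguments (user, primary gid, expected number of secondary
groups, number of processes, maximum delay in ms) is an assumption based on
the call above, and it uses os.getgrouplist (Python 3.3+, no root needed)
rather than a real initgroups call, on the assumption that both end up in the
same NSS lookup.

```python
import os
import random
import sys
import time
from multiprocessing import Process, Queue


def count_secondary_groups(user, gid, lookup=os.getgrouplist):
    """Count the groups reported for `user`, excluding the primary `gid`."""
    return sum(1 for g in lookup(user, gid) if g != gid)


def probe(user, gid, max_delay_ms, results):
    """Worker: wait a random delay, resolve the groups, report back."""
    delay_ms = random.randint(0, max_delay_ms)
    time.sleep(delay_ms / 1000.0)
    try:
        found = count_secondary_groups(user, gid)
    except Exception:
        found = -1  # the lookup itself failed
    results.put((os.getpid(), delay_ms, found))


def run(user, gid, expected, num_proc, max_delay_ms):
    """Start `num_proc` concurrent probes; return how many saw a wrong count."""
    results = Queue()
    workers = [Process(target=probe, args=(user, gid, max_delay_ms, results))
               for _ in range(num_proc)]
    for worker in workers:
        worker.start()
    failed = 0
    for _ in workers:
        pid, delay_ms, found = results.get()
        if found != expected:
            failed += 1
            print("process %d: %d instead of %d (sleep %dms)"
                  % (pid, found, expected, delay_ms))
    for worker in workers:
        worker.join()
    print("%d/%d failed" % (failed, num_proc))
    return failed


# Only act as a command-line tool when all five arguments are given.
if __name__ == "__main__" and len(sys.argv) == 6:
    user = sys.argv[1]
    gid, expected, num_proc, max_delay_ms = (int(a) for a in sys.argv[2:6])
    sys.exit(1 if run(user, gid, expected, num_proc, max_delay_ms) else 0)
```

Saved under a hypothetical name such as probe_groups.py, it would be run
right after restarting sssd, e.g. "python3 probe_groups.py user1 50001 29 20 50".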
And run:
# ./run_initgroups.sh
Stopping sssd: [ OK ]
Starting sssd: [ OK ]
.wrongs number of secondary groups in process 17626 : 0 instead of 29 (sleep 16ms)
wrongs number of secondary groups in process 17630 : 0 instead of 29 (sleep 26ms)
wrongs number of secondary groups in process 17634 : 0 instead of 29 (sleep 49ms)
wrongs number of secondary groups in process 17615 : 0 instead of 29 (sleep 53ms)
4/24 failed
OR
# ./reproduce.sh
Stopping sssd: [ OK ]
Starting sssd: [ OK ]
wrongs number of secondary groups in process 15664 : 0 instead of 29 (sleep 10ms)
wrongs number of secondary groups in process 15672 : 0 instead of 29 (sleep 9ms)
wrongs number of secondary groups in process 15673 : 0 instead of 29 (sleep 10ms)
3/20 failed
Stopping sssd: [ OK ]
Starting sssd: [ OK ]
wrongs number of secondary groups in process 15747 : 0 instead of 29 (sleep 3ms)
wrongs number of secondary groups in process 15734 : 0 instead of 29 (sleep 4ms)
wrongs number of secondary groups in process 15735 : 0 instead of 29 (sleep 10ms)
wrongs number of secondary groups in process 15748 : 0 instead of 29 (sleep 3ms)
wrongs number of secondary groups in process 15743 : 0 instead of 29 (sleep 7ms)
wrongs number of secondary groups in process 15745 : 0 instead of 29 (sleep 7ms)
wrongs number of secondary groups in process 15736 : 0 instead of 29 (sleep 5ms)
wrongs number of secondary groups in process 15742 : 0 instead of 29 (sleep 4ms)
wrongs number of secondary groups in process 15731 : 0 instead of 29 (sleep 10ms)
wrongs number of secondary groups in process 15732 : 0 instead of 29 (sleep 14ms)
wrongs number of secondary groups in process 15739 : 0 instead of 29 (sleep 4ms)
wrongs number of secondary groups in process 15749 : 0 instead of 29 (sleep 4ms)
Please tell me whether you're able to reproduce this. If not, something on our
machine must be interacting oddly with sssd. Besides some local configuration
(ntp, network, SELinux disabled, some system packages), I don't see anything
unusual.
If you cannot reproduce it, I'll give you access to the machine (ssh or VMware
ova export or something like that if you prefer).
Thank you for helping.
Jean-Baptiste