I ran into a perplexing problem recently:
We have all of our users and groups stored in IPA, including some "service
accounts" that we run services under. As we started migrating to CentOS 7,
we found that some services are configured to store their PID files in
/run (or /var/run), which is a tmpfs, so their PID directories are gone
after every reboot and the services fail to start.
We learned that we could create a conf file in /usr/lib/tmpfiles.d that
would create the necessary directories at boot. Well, it didn't work. It
took us a while to figure out, but the issue is that the directory's
user/group ownership was set to names that are looked up from IPA (via
sssd), and systemd-tmpfiles was failing with:
systemd-tmpfiles[2171]: [/usr/lib/tmpfiles.d/my-service.conf:1] Unknown group 'my_group'.
systemd-tmpfiles-setup.service: main process exited, code=exited, status=1/FAILURE
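For readers unfamiliar with tmpfiles.d, an entry of the kind described
looks roughly like the sketch below. The path, mode, and user name are
placeholders assumed for illustration; only the group name my_group comes
from the log above.

```
# /usr/lib/tmpfiles.d/my-service.conf (hypothetical contents)
# Type  Path             Mode  User     Group     Age
d       /run/my-service  0755  my_user  my_group  -
```

The user and group fields are resolved by name when systemd-tmpfiles
processes the line, so both must be resolvable at that point in the boot.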
But it seems that because sssd.service starts only after
systemd-tmpfiles-setup.service, we have a race condition. Here is the
critical chain for sssd.service:
sssd.service +271ms
└─basic.target @976ms
  └─sockets.target @975ms
    └─rpcbind.socket @975ms
      └─sysinit.target @969ms
        └─systemd-update-utmp.service @963ms +5ms
          └─auditd.service @933ms +28ms
-->         └─systemd-tmpfiles-setup.service @903ms +29ms
              └─rhel-import-state.service @874ms +28ms
                └─local-fs.target @872ms
                  └─run-user-20137.mount @20.363s
                    └─local-fs-pre.target @680ms
                      └─lvm2-monitor.service @260ms +418ms
                        └─lvm2-lvmetad.service @306ms
                          └─lvm2-lvmetad.socket @260ms
                            └─-.slice
At first, I thought it might be due to the lookup order in
/etc/nsswitch.conf, but I changed from:
group: files sss
to:
group: sss files
and that didn't seem to make a difference.
Curiously, it is not complaining that it can't find the user, only the
group.
Once the system is up, I can log in and run:
getent group my_group
and the group resolves just fine.
So if sssd is waiting on systemd-tmpfiles, how on earth can we ever use
tmpfiles.d with users/groups stored in IPA if sssd isn't "up" yet?
I am not sure how to handle this. I am just wondering if anyone has come
across this before and if there is a solution.
Thanks