I'm trying to better understand how sssd caches automount maps. I've had everything working for quite some time; autofs will get the maps (which are stored in ldap) from sssd when the system boots and in general there are no issues. However, there are a couple of problems I've been having:
* Autofs is mostly nonfunctional When a system boots without the network.
* Even with a network, autofs sometimes simply fails to start with: automount[917]: setautomntent: lookup(sss): setautomntent: No such file or directory
---
With no network, autofs starts but the only thing which operates is the /net map because that's in /etc/auto.master. The rest of the maps simply aren't there, and if the network returns, autofs doesn't notice (which I know is an issue with autofs).
I thought that sssd would be able to cache the master map entries which come from ldap when the network is offline, but that doesn't seem to be the case. I know that it couldn't actually mount anything, but if it at had access to a cached version the map, it would at least start properly.
---
The startup failure is rather difficult to reproduce. My guess is that autofs is simply coming up before sssd is ready to provide the master map, but I haven't been able to confirm that. I also haven't been able to see how much of autofs is actually functioning in that case (because people keep rebooting their desktops before I can examine one in detail).
---
By skipping ldap and storing the entire master map in /etc/auto.master, everything seems to work better. At least, the first problem goes away entirely, and I haven't yet been able to reproduce the second problem.
With a local auto.master and no network, autofs starts up OK but automount -m shows no data at all for the maps. This also surprises me because I figured they'd be cached. As soon as the network returns the maps appear.
Is this the normal behavior? Am I expecting sssd to cache things it isn't supposed to be caching?
- J<
On Tue, Mar 29, 2016 at 01:14:10PM -0500, Jason L Tibbitts III wrote:
I'm trying to better understand how sssd caches automount maps. I've had everything working for quite some time; autofs will get the maps (which are stored in ldap) from sssd when the system boots and in general there are no issues. However, there are a couple of problems I've been having:
- Autofs is mostly nonfunctional When a system boots without the network.
This is not expected.
- Even with a network, autofs sometimes simply fails to start with: automount[917]: setautomntent: lookup(sss): setautomntent: No such file or directory
This might be OK, it's an annoying error message that we might want to fix, but basically automounter is querying SSSD for more maps until SSSD returns ENOENT. What you're seeing is just strerror(ENOENT), which indicates all maps were enumerated, but not necessarilly that there were no maps.
With no network, autofs starts but the only thing which operates is the /net map because that's in /etc/auto.master. The rest of the maps simply aren't there, and if the network returns, autofs doesn't notice (which I know is an issue with autofs).
I thought that sssd would be able to cache the master map entries which come from ldap when the network is offline, but that doesn't seem to be the case. I know that it couldn't actually mount anything, but if it at had access to a cached version the map, it would at least start properly.
Yes, that's what I would expect, too. It would be nice to see the autofs responder logs. I think you could also simulate this w/o restart by telling SSSD to go offline with SIGUSR1 and then running automounter -m.
The startup failure is rather difficult to reproduce. My guess is that autofs is simply coming up before sssd is ready to provide the master map, but I haven't been able to confirm that. I also haven't been able to see how much of autofs is actually functioning in that case (because people keep rebooting their desktops before I can examine one in detail).
Do you know if SSSD is running at that point? Isn't it 'just' a race between sssd and autofs services?
By the way, this reminds me of https://bugzilla.redhat.com/show_bug.cgi?id=1113639 but that issue only happens with an empty cache, it looks like sssd doesn't even revert to cache in your case.
By skipping ldap and storing the entire master map in /etc/auto.master, everything seems to work better. At least, the first problem goes away entirely, and I haven't yet been able to reproduce the second problem.
With a local auto.master and no network, autofs starts up OK but automount -m shows no data at all for the maps. This also surprises me because I figured they'd be cached. As soon as the network returns the maps appear.
Is this the normal behavior? Am I expecting sssd to cache things it isn't supposed to be caching?
- J<
sssd-users mailing list sssd-users@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-users@lists.fedorahosted.org
"JH" == Jakub Hrozek jhrozek@redhat.com writes:
JH> Yes, that's what I would expect, too. It would be nice to see the JH> autofs responder logs.
I can crank up the debugging. At least this one is trivial to reproduce (just boot with the network unplugged).
JH> Do you know if SSSD is running at that point? Isn't it 'just' a race JH> between sssd and autofs services?
It's not really easy to tell, but here's a snippet from a log (just the first entry I found on a machine where I know this has happened):
Mar 28 11:45:24 ld65.e.math.uh.edu sssd[745]: Starting up Mar 28 11:45:24 ld65.e.math.uh.edu sssd[be[default]][750]: Starting up Mar 28 11:45:24 ld65.e.math.uh.edu sssd[nss][765]: Starting up Mar 28 11:45:24 ld65.e.math.uh.edu sssd[pam][766]: Starting up Mar 28 11:45:24 ld65.e.math.uh.edu sssd[autofs][764]: Starting up Mar 28 11:45:24 ld65.e.math.uh.edu audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sssd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
The network came up here:
Mar 28 11:45:26 ld65.e.math.uh.edu NetworkManager[722]: <info> (enp3s0): Activation: successful, device activated.
And then autofs startup completed:
Mar 28 11:45:31 ld65.e.math.uh.edu audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=autofs
And the error:
Mar 28 11:45:31 ld65.e.math.uh.edu automount[900]: setautomntent: lookup(sss): setautomntent: No such file or directory
So it looks like sssd was at least started earlier, and the network was up by the time the error was emitted.
JH> By the way, this reminds me of JH> https://bugzilla.redhat.com/show_bug.cgi?id=1113639 but that issue JH> only happens with an empty cache, it looks like sssd doesn't even JH> revert to cache in your case.
It did just occur to me that maybe I've configured the cache time way down:
autofs_provider = ldap ldap_autofs_entry_value = nisMapEntry ldap_autofs_entry_object_class = nisObject ldap_autofs_map_object_class = nisMap ldap_autofs_map_name = nisMapName ldap_autofs_entry_key = cn entry_cache_autofs_timeout = 60
I did this way back when because machines were hanging onto old maps for a really long time after I changed them (usually when moving home directories around). Might this have something to do with the issue?
- J<
On Tue, Mar 29, 2016 at 03:49:00PM -0500, Jason L Tibbitts III wrote:
"JH" == Jakub Hrozek jhrozek@redhat.com writes:
JH> Yes, that's what I would expect, too. It would be nice to see the JH> autofs responder logs.
I can crank up the debugging. At least this one is trivial to reproduce (just boot with the network unplugged).
JH> Do you know if SSSD is running at that point? Isn't it 'just' a race JH> between sssd and autofs services?
It's not really easy to tell, but here's a snippet from a log (just the first entry I found on a machine where I know this has happened):
Mar 28 11:45:24 ld65.e.math.uh.edu sssd[745]: Starting up Mar 28 11:45:24 ld65.e.math.uh.edu sssd[be[default]][750]: Starting up Mar 28 11:45:24 ld65.e.math.uh.edu sssd[nss][765]: Starting up Mar 28 11:45:24 ld65.e.math.uh.edu sssd[pam][766]: Starting up Mar 28 11:45:24 ld65.e.math.uh.edu sssd[autofs][764]: Starting up Mar 28 11:45:24 ld65.e.math.uh.edu audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sssd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
The network came up here:
Mar 28 11:45:26 ld65.e.math.uh.edu NetworkManager[722]: <info> (enp3s0): Activation: successful, device activated.
And then autofs startup completed:
Mar 28 11:45:31 ld65.e.math.uh.edu audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=autofs
And the error:
Mar 28 11:45:31 ld65.e.math.uh.edu automount[900]: setautomntent: lookup(sss): setautomntent: No such file or directory
So it looks like sssd was at least started earlier, and the network was up by the time the error was emitted.
JH> By the way, this reminds me of JH> https://bugzilla.redhat.com/show_bug.cgi?id=1113639 but that issue JH> only happens with an empty cache, it looks like sssd doesn't even JH> revert to cache in your case.
It did just occur to me that maybe I've configured the cache time way down:
autofs_provider = ldap ldap_autofs_entry_value = nisMapEntry ldap_autofs_entry_object_class = nisObject ldap_autofs_map_object_class = nisMap ldap_autofs_map_name = nisMapName ldap_autofs_entry_key = cn entry_cache_autofs_timeout = 60
I did this way back when because machines were hanging onto old maps for a really long time after I changed them (usually when moving home directories around). Might this have something to do with the issue?
I don't think so, even if the cache was expired, then we should try to fetch the maps, but on failure to do so, we should fall back to the cache.
Note I am experiencing the same behaviour - automounter tends to stick with the old maps for quite a long time after I change them in AD. Not sure if it is bug in SSSD though - probably not, as 'automount -m' shows maps correctly.
I mean, eventually automounter does what is expected (i.e. relocate to the new mount), but it just takes quite a long time.
I did this way back when because machines were hanging onto old maps for a really long time after I changed them (usually when moving home directories around). Might this have something to do with the issue?
I don't think so, even if the cache was expired, then we should try to fetch the maps, but on failure to do so, we should fall back to the cache. _______________________________________________ sssd-users mailing list sssd-users@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-users@lists.fedorahosted.org
-----
The information contained in this e-mail and in any attachments is confidential and is designated solely for the attention of the intended recipient(s). If you are not an intended recipient, you must not use, disclose, copy, distribute or retain this e-mail or any part thereof. If you have received this e-mail in error, please notify the sender by return e-mail and delete all copies of this e-mail from your computer system(s). Please direct any additional queries to: communications@s3group.com. Thank You. Silicon and Software Systems Limited (S3 Group). Registered in Ireland no. 378073. Registered Office: South County Business Park, Leopardstown, Dublin 18.
The "setautomntent: No such file or directory" issue just happened again. automount -m shows pretty much what I'd expect it to show; all of the expected maps are there. And /net works (it's the only map on this machine that's specified in auto.master).
Restarting autofs will of course fix things. I'm going to turn up autofs and sssd debugging and then reboot in the hopes that this issue will happen again.
[...]
And it did. How odd. In fact, it seems that once this happens at boot, it will continue to happen through multiple reboots until I restart autofs manually with the system booted.
I set debug_level = 9 in the autofs and domain/default sections of sssd.conf. However, I've no idea what I'm looking for. If I look in sssd_autofs.log for the time when the error did occur, I can see the request and the following bits:
(Tue Mar 29 18:06:18 2016) [sssd[autofs]] [lookup_automntmap_step] (0x2000): Looking up automount maps from the DP (Tue Mar 29 18:06:18 2016) [sssd[autofs]] [setautomntent_send] (0x2000): lookup_automntmap_step is refreshing the cache, re-entering the mainloop (Tue Mar 29 18:06:18 2016) [sssd[autofs]] [sss_dp_get_reply] (0x1000): Got reply from Data Provider - DP error code: 1 errno: 11 error message: Fast reply - offline (Tue Mar 29 18:06:18 2016) [sssd[autofs]] [lookup_automntmap_cache_updated] (0x0020): Unable to get information from Data Provider Error: 1, 11, Fast reply - offline Will try to return what we have in cache
As a note, this is current Fedora 23, sssd-1.13.3-5.fc23.x86_64.
I don't want to spam the list with the full logs but I can send them personally if it would help. I do need to return the machine to service, though, so I'm not going to be able to do any debugging until this happens again.
- J<
On Tue, Mar 29, 2016 at 06:17:22PM -0500, Jason L Tibbitts III wrote:
The "setautomntent: No such file or directory" issue just happened again. automount -m shows pretty much what I'd expect it to show; all of the expected maps are there. And /net works (it's the only map on this machine that's specified in auto.master).
Restarting autofs will of course fix things. I'm going to turn up autofs and sssd debugging and then reboot in the hopes that this issue will happen again.
[...]
And it did. How odd. In fact, it seems that once this happens at boot, it will continue to happen through multiple reboots until I restart autofs manually with the system booted.
I set debug_level = 9 in the autofs and domain/default sections of sssd.conf. However, I've no idea what I'm looking for. If I look in sssd_autofs.log for the time when the error did occur, I can see the request and the following bits:
(Tue Mar 29 18:06:18 2016) [sssd[autofs]] [lookup_automntmap_step] (0x2000): Looking up automount maps from the DP (Tue Mar 29 18:06:18 2016) [sssd[autofs]] [setautomntent_send] (0x2000): lookup_automntmap_step is refreshing the cache, re-entering the mainloop (Tue Mar 29 18:06:18 2016) [sssd[autofs]] [sss_dp_get_reply] (0x1000): Got reply from Data Provider - DP error code: 1 errno: 11 error message: Fast reply - offline (Tue Mar 29 18:06:18 2016) [sssd[autofs]] [lookup_automntmap_cache_updated] (0x0020): Unable to get information from Data Provider Error: 1, 11, Fast reply - offline Will try to return what we have in cache
As a note, this is current Fedora 23, sssd-1.13.3-5.fc23.x86_64.
I don't want to spam the list with the full logs but I can send them personally if it would help. I do need to return the machine to service, though, so I'm not going to be able to do any debugging until this happens again.
OK, feel free to send the logs to me please.
sssd-users@lists.fedorahosted.org