https://fedorahosted.org/sssd/ticket/1584
At first I thought the race condition between sssd_nss and sss_cache should be solved with some sort of file locking mechanism. When I started working on it, however, the places where we would need to check whether the file is locked or free were too many and were spread across the monitor, sssd_nss and the sss_cache tool, and it was not clear how access would be controlled.
So I decided to do it this way:
1. sss_cache tries to send SIGHUP to the monitor as usual.
2. If signal_sssd reports that sssd is not running, sss_cache proceeds with the memcache invalidation.
3. As part of the memcache invalidation, sss_cache:
   - first opens the mc file,
   - then checks whether sssd is running (with pgrep),
   - if sssd is running, stops the invalidation,
   - if sssd is still not running, proceeds with the invalidation.
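A minimal C sketch of step 3, only to make the ordering explicit. Everything in it is hypothetical: the header layout (struct mc_header_sketch), the MC_STATUS_* values and the function names are not SSSD's real memory cache format or API, and the liveness check simply runs a shell command (pgrep by default) through system(). What matters is the order: the file is opened before sssd is checked, and the status is written through that already-open descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical header layout, NOT the real SSSD memory cache header. */
struct mc_header_sketch {
    uint32_t magic;
    uint32_t status;
};

#define MC_STATUS_ALIVE    0u /* hypothetical value */
#define MC_STATUS_RECYCLED 1u /* hypothetical value */

/* Default liveness check; exit status 0 means a matching process exists. */
#define SSSD_CHECK_CMD "pgrep -x sssd >/dev/null 2>&1"

/* Returns 1 if check_cmd ran and exited with status 0, otherwise 0. */
static int sssd_is_running(const char *check_cmd)
{
    int rc = system(check_cmd);
    return rc != -1 && WIFEXITED(rc) && WEXITSTATUS(rc) == 0;
}

/* Writes RECYCLED into the status field through an already-open fd. */
static int mc_set_recycled_sketch(int fd)
{
    uint32_t st = MC_STATUS_RECYCLED;
    off_t off = (off_t)offsetof(struct mc_header_sketch, status);
    ssize_t n = pwrite(fd, &st, sizeof(st), off);
    if (n == -1)
        return errno;
    return n == (ssize_t)sizeof(st) ? 0 : EIO;
}

/* Step 3. Returns 0 if the file was marked recycled, EBUSY if sssd is
 * running (nothing written), or an errno value on failure. */
int mc_invalidate(const char *path, const char *check_cmd)
{
    assert(path != NULL && check_cmd != NULL);

    /* Open first: the fd now refers to whatever file exists at this moment. */
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return errno;

    /* Only then check whether sssd is running; if so, leave the file alone. */
    if (sssd_is_running(check_cmd)) {
        close(fd);
        return EBUSY;
    }

    /* sssd looked down: mark the file we opened (not the path) recycled. */
    int ret = mc_set_recycled_sketch(fd);
    close(fd);
    return ret;
}

/* Reads the status field back; used to check the result. */
int mc_read_status(const char *path, uint32_t *out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;
    ssize_t n = pread(fd, out, sizeof(*out),
                      (off_t)offsetof(struct mc_header_sketch, status));
    int ret = n == (ssize_t)sizeof(*out) ? 0 : (n == -1 ? errno : EIO);
    close(fd);
    return ret;
}
```

In real use the call would be mc_invalidate(path, SSSD_CHECK_CMD); taking the command as a parameter is only a convenience so that "true" or "false" can stand in for a running or stopped sssd when trying the sketch out.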
Note that if sssd starts after (or during) the pgrep check, so that we do not catch it running and assume it is off, this is not a problem: we already hold a file descriptor for the file that existed before sssd started, because we open the file before calling pgrep. sssd_nss always creates a new memory cache file on startup, so we only mark the old file as recycled, never the new one. We do not care about the old file, because sssd_nss will delete it anyway, and marking it as recycled is harmless. I think the worst thing that can happen is a race between sssd_nss and the cache tool while both are marking the OLD file as recycled, but I cannot think of a situation where this would be dangerous: there are no intermediate ("temp") states that could cause harm, because both processes set the status to the same value.
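The argument above relies on an open file descriptor continuing to point at the old file after its path has been reused. A small sketch that replays that window, again with a made-up 8-byte header and hypothetical helper names. It assumes that sssd_nss startup means: mark the existing file recycled, unlink it, and create a fresh file at the same path. That sequence is an assumption for illustration, not SSSD's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout, NOT SSSD's: bytes 0-3 magic, bytes 4-7 status. */
#define MC_MAGIC           0xC0FFEEu
#define MC_STATUS_ALIVE    0u
#define MC_STATUS_RECYCLED 1u
#define MC_STATUS_OFF      ((off_t)sizeof(uint32_t))

static int mc_mark_recycled_fd(int fd)
{
    uint32_t st = MC_STATUS_RECYCLED;
    return pwrite(fd, &st, sizeof(st), MC_STATUS_OFF) == (ssize_t)sizeof(st) ? 0 : -1;
}

static int mc_status_fd(int fd, uint32_t *out)
{
    return pread(fd, out, sizeof(*out), MC_STATUS_OFF) == (ssize_t)sizeof(*out) ? 0 : -1;
}

/* Stand-in for sssd_nss startup (assumed sequence): mark any existing file
 * recycled, unlink it, then create a brand-new file (new inode) at path. */
static int mc_nss_startup(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd != -1) {
        int rc = mc_mark_recycled_fd(fd);
        close(fd);
        if (rc == -1)
            return -1;
    } else if (errno != ENOENT) {
        return -1;
    }
    if (unlink(path) == -1 && errno != ENOENT)
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;
    uint32_t hdr[2] = { MC_MAGIC, MC_STATUS_ALIVE };
    ssize_t n = write(fd, hdr, sizeof(hdr));
    close(fd);
    return n == (ssize_t)sizeof(hdr) ? 0 : -1;
}

/* Replays the race window; reports the status seen through sss_cache's
 * stale fd (old file) and through the path (new file). 0 on success. */
int mc_stale_fd_demo(const char *path, uint32_t *old_status, uint32_t *new_status)
{
    assert(path != NULL && old_status != NULL && new_status != NULL);
    int ret = -1;
    int new_fd = -1;

    if (mc_nss_startup(path) == -1)   /* cache file left by an earlier run */
        return -1;

    int old_fd = open(path, O_RDWR);  /* sss_cache opens BEFORE pgrep */
    if (old_fd == -1)
        return -1;

    /* pgrep found nothing, but sssd starts now and sssd_nss replaces the
     * file (marking the old one recycled first). */
    if (mc_nss_startup(path) == -1)
        goto out;

    /* sss_cache now marks "its" file: the old inode, which already holds
     * the same value, so the order of the two writes does not matter. */
    if (mc_mark_recycled_fd(old_fd) == -1 || mc_status_fd(old_fd, old_status) == -1)
        goto out;

    new_fd = open(path, O_RDONLY);
    if (new_fd != -1) {
        ret = mc_status_fd(new_fd, new_status);
        close(new_fd);
    }
out:
    close(old_fd);
    return ret;
}
```

The write through the stale descriptor lands on the old inode only, so the new file still reads as alive, and because both writers store the same value the outcome is identical whichever of them writes last.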
Another thing I like about this approach is that we do not have to coordinate sssd, sssd_nss and sss_cache with one another; we only need to control the interaction between sssd and sss_cache, which is much easier to reason about.
Can someone see any problems that I missed?
NOTE: The function sss_mc_set_recycled is copied from a different module. I did not want to make it non-static and put it in a header file, because it is not intended to be used directly.
I tested it and it works fine for me.
The patch is attached.
Thanks,
Michal