On Tue, 2013-08-20 at 10:32 +0200, Lukas Slebodnik wrote:
> On (19/08/13 16:49), Simo Sorce wrote:
> >On Mon, 2013-08-19 at 21:05 +0200, Lukas Slebodnik wrote:
> >> MAIN_PROBLEM: We know that main problem is when hash table entry referes
> >> to record, which hashes are different.
> >> (hash_key != rec->hash1 && hash_key != rec->hash2)
> >>
> >> This can only happend if data in hash table are changed. Data in hash table
can
> >> be changed in two situation (adding record, removing record)
> >>
> >> A) Adding record to hash table:
> >> 1) hash_table[hash_key] == MC_INVALID VAL
> >> there is no record for hash_key and we can safely refer to new
record.
> >> hash_table[hash_key] refers to valid first record.
> >> MAIN_PROBLEM cannot occur in this situation.
> >> 2) hash_table[hash_key] refers to valid object.
> >> adding record to chain.
> >> hash_table[hash_key] refers to valid record.
> >> MAIN_PROBLEM cannot occur in this situation.
> >>
> >> AS we can see, there is no problem with adding record to hash table.
> >>
> >>
> >> B) Removing record from hash table
> >> We need to remove record from chain (function sss_mc_rm_rec_from_chain)
> >> 1) Record is the first in chain and *HAS NOT* successor
> >> (rec->next == MC_INVALID_VAL)
> >> hash_table[hash_key] will be set to MC_INVALID_VAL
> >> this mean that there is no record for hash_key.
> >> MAIN_PROBLEM cannot occur in this situation.
> >> 2) Record is the first in chain and *HAS* successor
> >> hash_table[hash_key] will be set to rec->next
> >> *** MAIN_PROBLEM can occur ***
> >> This will not be problem with attached patch
> >> 3) Record is in the middle of chain.
> >> record will be removed from chain. (nothing else)
> >> hash_table[hash_key] refers to valid record, because
> >> 1st record is untouched.
> >> MAIN_PROBLEM cannot occur in this situation.
> >> 4) Record is at the end of chain.
> >> The situation is the same as in previous point.
> >> MAIN_PROBLEM cannot occur in this situation.
> >>
> >>
> >> We can see that only one operation is problematic (B3).
> >>
> >> Attached patch fixes this problematic situation. Chains will not be
changed.
> >> on hash_table[hash_key] will refer to first valid record after removed
record.
> >>
> >> Simo, thank you very much for review. I hope this version is correct.
> >
> >After quite some thinking I couldn't find a way to break this, indeed I
> >guess the only case where a record can remain orphaned and accessible on
> >a chain is if it is the second record on the chain but brought there
> >only because it is a next record on another chain, and the first record
> >on that chain is removed.
> >
> >So this new patch seem to address the problem correctly, however I would
> >like to see comments before bot the critical assignments in the code
> >that points out that the difference in treatment is intentional (so that
> >if someone needs to modify the code at a later time he will not 'fix'
> >the discrepancy bymistake).
> done.
>
> >
> >Also I think we need to rewrite the comment of the commit, I still do
> >not think it is clear enough to convey what the actual issue is.
> >
> >What about this:
> >
> >"The code uses 2 hashes for each record, but only one hash table to
> >index them both, furthermore each record has only one single 'next'
> >pointer.
> >
> >This means that in certain conditions a record main end up being on a
> >hash chain even though its hashes do not match the hash chain. This can
> >happen when another record 'drags' it in from another hash chain where
> >they both belong.
> >
> >If the record without matching hashes happens to be the second of the
> >chain and the first record is removed, then the non matching record is
> >left on the wrong chain. On removal of the non-matching record the hash
> >chain will not be updated and the hash chain will end up pointing to an
> >invalid slot.
> >This slot may be later reused for another record and may not be the
> >first slot of this new record. In this case the hash chain will point to
> >arbitrary data and may cause issues if the slot is interpreted as the
> >head of a record.
> >
> >By skipping any block that has no matching hashes upon removing the
> >first record in a chain we insure that dangling references cannot be
> >left in the hash table"
>
> I am highly influenced by debugging this, so my point of view is not objective.
> I reused your proposal without any change.
>
> I also changed resolves section to
https://fedorahosted.org/sssd/ticket/2049,
> because I used reproducers from BZ997406 and reporter used sssd from master
> (including Michal's patch).
>
> Functionality of 1st patch is not changed.
>
> Thank you very much for review.
Looks good to me!
Simo.
While I was investigating the root of the problem I wrote two helper functions,
because it was impossible to find out solution from coredump.
Those functions check each entry in hash table, whether hash_key refers to
valid slot. Functions are called after adding new record and after removing
record. I think patches can be used as a base for unit tests.
Patches are really ugly. Macro EXCEPTION causes SEGFAULT :-)
But it was an aim, because I debug optimized code at the beginning.
I had an idea about optimized code, but luckily it was not a true.
LS