On (20/08/13 21:50), Simo Sorce wrote:
>On Tue, 2013-08-20 at 10:32 +0200, Lukas Slebodnik wrote:
>> On (19/08/13 16:49), Simo Sorce wrote:
>> >On Mon, 2013-08-19 at 21:05 +0200, Lukas Slebodnik wrote:
>> >> MAIN_PROBLEM: We know that main problem is when hash table entry
referes
>> >> to record, which hashes are different.
>> >> (hash_key != rec->hash1 && hash_key != rec->hash2)
>> >>
>> >> This can only happend if data in hash table are changed. Data in hash
table can
>> >> be changed in two situation (adding record, removing record)
>> >>
>> >> A) Adding record to hash table:
>> >> 1) hash_table[hash_key] == MC_INVALID VAL
>> >> there is no record for hash_key and we can safely refer to new
record.
>> >> hash_table[hash_key] refers to valid first record.
>> >> MAIN_PROBLEM cannot occur in this situation.
>> >> 2) hash_table[hash_key] refers to valid object.
>> >> adding record to chain.
>> >> hash_table[hash_key] refers to valid record.
>> >> MAIN_PROBLEM cannot occur in this situation.
>> >>
>> >> AS we can see, there is no problem with adding record to hash table.
>> >>
>> >>
>> >> B) Removing record from hash table
>> >> We need to remove record from chain (function
sss_mc_rm_rec_from_chain)
>> >> 1) Record is the first in chain and *HAS NOT* successor
>> >> (rec->next == MC_INVALID_VAL)
>> >> hash_table[hash_key] will be set to MC_INVALID_VAL
>> >> this mean that there is no record for hash_key.
>> >> MAIN_PROBLEM cannot occur in this situation.
>> >> 2) Record is the first in chain and *HAS* successor
>> >> hash_table[hash_key] will be set to rec->next
>> >> *** MAIN_PROBLEM can occur ***
>> >> This will not be problem with attached patch
>> >> 3) Record is in the middle of chain.
>> >> record will be removed from chain. (nothing else)
>> >> hash_table[hash_key] refers to valid record, because
>> >> 1st record is untouched.
>> >> MAIN_PROBLEM cannot occur in this situation.
>> >> 4) Record is at the end of chain.
>> >> The situation is the same as in previous point.
>> >> MAIN_PROBLEM cannot occur in this situation.
>> >>
>> >>
>> >> We can see that only one operation is problematic (B3).
>> >>
>> >> Attached patch fixes this problematic situation. Chains will not be
changed.
>> >> on hash_table[hash_key] will refer to first valid record after removed
record.
>> >>
>> >> Simo, thank you very much for review. I hope this version is correct.
>> >
>> >After quite some thinking I couldn't find a way to break this, indeed I
>> >guess the only case where a record can remain orphaned and accessible on
>> >a chain is if it is the second record on the chain but brought there
>> >only because it is a next record on another chain, and the first record
>> >on that chain is removed.
>> >
>> >So this new patch seem to address the problem correctly, however I would
>> >like to see comments before bot the critical assignments in the code
>> >that points out that the difference in treatment is intentional (so that
>> >if someone needs to modify the code at a later time he will not
'fix'
>> >the discrepancy bymistake).
>> done.
>>
>> >
>> >Also I think we need to rewrite the comment of the commit, I still do
>> >not think it is clear enough to convey what the actual issue is.
>> >
>> >What about this:
>> >
>> >"The code uses 2 hashes for each record, but only one hash table to
>> >index them both, furthermore each record has only one single 'next'
>> >pointer.
>> >
>> >This means that in certain conditions a record main end up being on a
>> >hash chain even though its hashes do not match the hash chain. This can
>> >happen when another record 'drags' it in from another hash chain
where
>> >they both belong.
>> >
>> >If the record without matching hashes happens to be the second of the
>> >chain and the first record is removed, then the non matching record is
>> >left on the wrong chain. On removal of the non-matching record the hash
>> >chain will not be updated and the hash chain will end up pointing to an
>> >invalid slot.
>> >This slot may be later reused for another record and may not be the
>> >first slot of this new record. In this case the hash chain will point to
>> >arbitrary data and may cause issues if the slot is interpreted as the
>> >head of a record.
>> >
>> >By skipping any block that has no matching hashes upon removing the
>> >first record in a chain we insure that dangling references cannot be
>> >left in the hash table"
>>
>> I am highly influenced by debugging this, so my point of view is not objective.
>> I reused your proposal without any change.
>>
>> I also changed resolves section to
https://fedorahosted.org/sssd/ticket/2049,
>> because I used reproducers from BZ997406 and reporter used sssd from master
>> (including Michal's patch).
>>
>> Functionality of 1st patch is not changed.
>>
>> Thank you very much for review.
>
>Looks good to me!
>
>Simo.
>
Reproducer for this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=997406#c9
While I was investigating the root of the problem I wrote two helper functions,
because it was impossible to find out solution from coredump.
Those functions check each entry in hash table, whether hash_key refers to
valid slot. Functions are called after adding new record and after removing
record. I think patches can be used as a base for unit tests.
Patches are really ugly. Macro EXCEPTION causes SEGFAULT :-)
But it was an aim, because I debug optimized code at the beginning.
I had an idea about optimized code, but luckily it was not a true.
LS