[389-users] Strange Disk IO issue

Thu May 17 00:48:16 UTC 2012

On 05/16/2012 04:01 PM, Rich Megginson wrote:
> On 05/16/2012 04:06 PM, Nathan Kinder wrote:
>> On 05/16/2012 01:09 PM, Brad Schuetz wrote:
>>> On 05/16/2012 11:54 AM, Nathan Kinder wrote:
>>>> On 05/16/2012 11:19 AM, Brad Schuetz wrote:
>>>>> On 05/16/2012 06:16 AM, Paul Robert Marino wrote:
>>>>>> The exact timing of the issue is too strange. Is there a backup job
>>>>>> running at midnight, or some other timed job that could be eating
>>>>>> the RAM or disk IO? Possibly one that relies on LDAP queries that
>>>>>> would otherwise be innocuous.
>>>>>>
>>>>>>
>>>>> It doesn't happen at midnight; it's 24 hours from when the process
>>>>> was started. I can restart dirsrv at 3:17pm on Wednesday, and at
>>>>> right around 3:17pm on Thursday that server will go to 100% disk IO
>>>>> usage.
>>>> The default tombstone purge interval is 1 day, which seems to fit what
>>>> you are seeing.  The tombstone reap thread will start every 24 hours
>>>> to find tombstone entries that can be deleted.  The default retention
>>>> period for tombstones is 1 week.  It is possible that you have a large
>>>> number of tombstone entries that need to be deleted.  This will occur
>>>> independently on all of your server instances.  This is controlled by
>>>> the "nsDS5ReplicaTombstonePurgeInterval" and "nsDS5ReplicaPurgeDelay"
>>>> attributes in your "cn=replica,cn=<suffixDN>,cn=mapping
>>>> tree,cn=config" entry.
>>>>
>>> I have no "nsDS5ReplicaTombstonePurgeInterval" value set (so it's using
>>> that default), and "nsDS5ReplicaPurgeDelay" is set to 3600
>> Ok, so this means every 24 hours, the tombstone reap thread will look
>> for tombstones older than 1 hour and remove them.
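To make the timing concrete, here is a minimal sketch of the arithmetic described above. It assumes the reap timer effectively starts when the dirsrv process starts (which matches the behavior reported in this thread, but is not confirmed here), and the function and constant names are illustrative only, not part of the server.

```python
from datetime import datetime, timedelta

# Values taken from the thread; the names below are illustrative only.
PURGE_INTERVAL = timedelta(days=1)     # default nsDS5ReplicaTombstonePurgeInterval
PURGE_DELAY = timedelta(seconds=3600)  # nsDS5ReplicaPurgeDelay set on this server


def next_reap(server_start):
    """Approximate time of the first reap pass after a restart (assumption)."""
    return server_start + PURGE_INTERVAL


def purge_cutoff(reap_time):
    """Tombstones created before this time are eligible for removal."""
    return reap_time - PURGE_DELAY


start = datetime(2012, 5, 16, 15, 17)  # restart at 3:17pm on Wednesday
reap = next_reap(start)
print(reap)                # 2012-05-17 15:17:00
print(purge_cutoff(reap))  # 2012-05-17 14:17:00
```

With these settings, nearly every tombstone on the server becomes eligible in a single pass, which is consistent with the large IO spike.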
>>>
>>>
>>>> You can search for "(objectclass=nstombstone)" as Directory Manager to
>>>> see how many tombstone entries you have.
>>> I have a LOT of tombstone entries, over 200k on this one server (I'm
>>> guessing because I've been restarting the process for over a week now
>>> and not letting it run the cleanup process).
>> That's possible if you really do 200k delete operations in 1 week,
>> but that sounds like a lot.  It would seem that these tombstones have
>> been building up for a longer time than 1 week.
>>>
>>> So, any suggestions on what I can do to fix this?  The process that's
>>> reaping the entries is using so much IO that queries time out; older
>>> versions of the software did not exhibit this behavior.  In fact, I can
>>> reinitialize the entire replica faster than this thing is reaping the
>>> entries: it takes 7 minutes to reinit a replica, but when this issue
>>> first started I let dirsrv run much longer than that before
>>> restarting it.
>> Due to the number of matching entries for the tombstone search, it is
>> having to walk your entire database, which is why you see the IO
>> spiking.
>
> Perhaps also try increasing nsslapd-idlistscanlimit so that it can
> hold the entire candidate list of tombstones to delete -
> http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Administration_Guide/Managing_Indexes.html#About_Indexes-Overview_of_the_Searching_Algorithm
>

Is there any way that I can remove the nsTombstone entries from the
master server so I can get this under control?  I think I found out why
I have so many nsTombstone entries in the first place so I'd like to get
the current ones deleted and see how much delete activity I'll have
moving forward.

I saw this bug report,
<https://bugzilla.redhat.com/show_bug.cgi?id=617862>, which seems to
indicate that I should be able to ldapdelete the nsTombstone entries
using their full DN; however, it always says object not found.
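
For reference, the tombstone search suggested earlier in the thread could be scripted roughly as follows. This is only a sketch: it builds an OpenLDAP-style ldapsearch command line and counts the entries in its LDIF output, and it does not delete anything. The suffix, URI, helper names, and sample DNs are hypothetical placeholders; only the "(objectclass=nstombstone)" filter and the Directory Manager bind come from the discussion above.

```python
import shlex


def tombstone_search_argv(suffix, uri="ldap://localhost:389",
                          bind_dn="cn=Directory Manager"):
    """Build an ldapsearch command that lists tombstone DNs under suffix."""
    return [
        "ldapsearch", "-LLL", "-x",
        "-H", uri,
        "-D", bind_dn, "-W",          # -W prompts for the bind password
        "-b", suffix,
        "(objectclass=nstombstone)",  # filter suggested in the thread
        "1.1",                        # return DNs only, no attributes
    ]


def count_entries(ldif_text):
    """Count entries in LDIF output by counting 'dn:' (or 'dn::') lines."""
    return sum(1 for line in ldif_text.splitlines() if line.startswith("dn:"))


# Hypothetical suffix; substitute the real one.
print(shlex.join(tombstone_search_argv("dc=example,dc=com")))

# Made-up sample output, only to show the counting step.
sample = (
    "dn: nsuniqueid=aaaa,uid=user1,ou=People,dc=example,dc=com\n"
    "\n"
    "dn: nsuniqueid=bbbb,uid=user2,ou=People,dc=example,dc=com\n"
)
print(count_entries(sample))  # 2
```

Saving the command's output to a file and counting the entries before and after a reap pass would show how quickly the backlog is shrinking.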

I'd rather not fully export and reimport the master and then reinit all
the replicas if I can avoid it.

--
Brad