On 08/21/2013 09:14 AM, Jeffrey Dunham wrote:
The reason I asked about nsslapd-threadnumber is that during the spike, all operations slow down.  Binds, adds, searches, etc. all show increasing etimes until we've processed the majority of the writes, and then etimes fall back to 0.  The customer in this case is doing 1k adds to a subtree; each entry has 10 attributes, three of which are indexed.  I will also try the microsecond logging in test to see if I can recreate the issue and maybe see something there.  Hopefully that explanation gives you a little more insight into our issue.  I really don't want this one bad customer to affect the others.
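
For anyone following along, the etime rise can be pulled out of the access log with something like the sketch below.  It assumes the access log is in the usual place (/var/log/dirsrv/slapd-INST/access) and that every RESULT line carries an etime= field.  The function name etime_by_second is made up for illustration; it is not part of 389-ds-base.

```shell
# Hypothetical helper: for each one-second timestamp in an access log,
# print the number of RESULT lines and the largest etime seen.
etime_by_second() {
    # $1: path to an access log
    awk '/ RESULT / {
        ts = substr($1, 2)                 # strip the leading "[" from the timestamp
        for (i = 1; i <= NF; i++)
            if ($i ~ /^etime=/) {
                e = substr($i, 7) + 0      # numeric value after "etime="
                n[ts]++
                if (!(ts in max) || e > max[ts]) max[ts] = e
            }
    }
    END { for (t in n) print t, "ops=" n[t], "max_etime=" max[t] }' "$1" | sort
}

# Usage (path is an assumption):
# etime_by_second /var/log/dirsrv/slapd-INST/access
```

The output is sorted as plain strings, which is only chronological within a single day, so it is best run against the slice of the log around the burst.  Fractional etimes (with microsecond logging enabled) should pass through unchanged.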

Ok.  Please see my tuning recommendations.


"Replication on the supplier side or replication on the consumer side."
The consumer takes the burst of writes into its own database fine through replication, but they're obviously coming in on a single replication session.  It's using the same hardware and DS version.

Replication updates are done on a single thread.


FWIW we're using 1.2.11 on RHEL5.4,
Did you build this yourself?
we're switching over to 1.3.1 on RHEL6 in a few months.
Are you planning to build this yourself?


On Wed, Aug 21, 2013 at 7:09 AM, Rich Megginson <rmeggins@redhat.com> wrote:
On 08/20/2013 08:39 PM, Jeffrey Dunham wrote:
We have a customer that has been multi-threading behind multiple servers and writing to our master server.  These writes come in heavy spikes (1k over 5-second intervals), very much burst traffic, and all of the writes add new entries to the same ou.

What is the platform?  What version of 389-ds-base?  How much RAM do you have?  What is the size of your database?


While we have plans to throttle them I had a few questions:

a) If they're writing to the same ou and updating the same indexes, is each write blocked until the previous one succeeds?

Yes.


So in this case, multi-threading behind multiple boxes does not give them any performance benefit.  I would guess this is the case, but I want to be sure, because replication, which goes through a single thread iirc, seems to be fine.

Replication on the supplier side or replication on the consumer side.



b) are there any performance tweaks that can help?  I thought maybe looking at nsslapd-threadnumber.

To speed up writes?  That might help, but not much, since your bottleneck is that only one write can happen at a time.

The first thing you should do is optimize your db and entry cache usage.  You can use the https://github.com/richm/scripts/wiki/dbmon.sh script to monitor your cache usage, and find out how much RAM you need for your caches, and find out how much RAM you have left over for other tuning.

1) Try putting the db home directory on a RAM disk.  By default, bdb uses memory mapped files in /var/lib/dirsrv/slapd-INST/db.  These have to be flushed to disk.  Change nsslapd-db-home-directory to point to a RAM fs.

mkdir /dev/shm/slapd-INST ; chown nobody:nobody /dev/shm/slapd-INST ; chmod 0700 /dev/shm/slapd-INST

Then shut down dirsrv, edit /etc/dirsrv/slapd-INST/dse.ldif, and
in the dn: cn=config,cn=ldbm database,cn=plugins,cn=config entry, add
nsslapd-db-home-directory: /dev/shm/slapd-INST

NOTE: This will use the amount of RAM specified by nsslapd-dbcachesize, so make sure you have enough RAM.

https://access.redhat.com/site/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Configuration_Command_and_File_Reference/Database_Plug_in_Attributes.html#Database_Attributes_under_cnconfig_cnldbm_database_cnplugins_cnconfig
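
To check the RAM requirement in the note above before moving the db home directory, something along these lines might help.  It is only a sketch: ramfs_has_room is a hypothetical helper, and it assumes nsslapd-dbcachesize is expressed in bytes and that df reports the free space of the RAM fs accurately.

```shell
# Hypothetical helper: succeed if the filesystem holding $1 has at
# least $2 bytes available.
ramfs_has_room() {
    dir=$1
    need_bytes=$2
    # df -P gives one line per filesystem; -k forces 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$((avail_kb * 1024))" -ge "$need_bytes" ]
}

# Usage (paths are assumptions; read dse.ldif while the server is stopped):
# size=$(awk '/^nsslapd-dbcachesize:/ { print $2 }' /etc/dirsrv/slapd-INST/dse.ldif)
# ramfs_has_room /dev/shm "$size" || echo "not enough room in /dev/shm"
```

This only tells you the RAM fs can hold the db cache; you still need to account for the entry caches and everything else running on the box.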


2) Use different physical disks for your db directory, transaction log directory, and server log directory.  If you can afford it, use a disk controller with a write-back cache for the disk used for the transaction logs.

3) If you can afford the possibility of data loss, you can disable durable transactions.
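
A rough sketch of what that change could look like.  The attribute is nsslapd-db-durable-transaction in the same cn=config,cn=ldbm database,cn=plugins,cn=config entry as above.  The helper name durable_txn_ldif is made up for illustration, and the ldapmodify flags and bind DN in the usage line are assumptions about your client tools and setup.

```shell
# Hypothetical helper: print the LDIF that sets durable transactions
# to "on" or "off" in the ldbm database config entry.
durable_txn_ldif() {
    # $1: on or off
    cat <<EOF
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-durable-transaction
nsslapd-db-durable-transaction: $1
EOF
}

# Usage (bind DN and flags are assumptions):
# durable_txn_ldif off | ldapmodify -x -D "cn=Directory Manager" -W
```

As far as I know the server has to be restarted for this to take effect.  You can also add the attribute to dse.ldif by hand while the server is shut down, the same way as the db home directory change.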