After Ludwig's suggestion to separate the machines, and after some heart-stopping LVM
issues last week, I've now established a load testing environment with two
VMs: one for ldclt and one for DS.
I'm currently testing only the "thread" parameter, since the
initial tests showed steep changes in performance based on thread count. I have run two
search tests, one with the DS instance set to a single thread and the other set to 16. The
VM in use is a dual core on top of an i5 2.9GHz host, with 4GB of RAM and 40GB of SSD storage.
I'm going to attach two SVGs to this email for the two tests: 1837 is the single-thread
result, 1942 is the 16-thread result.
There are still some quirks with perf resolving symbol names, but the stacks otherwise
look correct. I'll be submitting my tooling in a PR when I have completed some more testing.
The idea of a flamegraph is to allow visual inspection to determine high-latency or
resource-consuming areas of the codebase. These flamegraphs show some initial changes
at an early glance. We should expect these changes are due to the presence of lock
contention, as a single-thread instance should have NO lock contention, whereas a
multi-threaded instance will show it.
connection_threadmain (our worker threads, with poll/read) shows a decrease in time
usage. This makes sense, since there are no locks in connection_threadmain, which means
that our "delays", or latency, are coming from other code paths.
We spend fractionally more time in libback. This is to be expected, though, since there
are extra threads and they can be in different stages of a search, so we expect to see
slightly more concurrent threads in each frame.
We spend an increased amount of time in acl_access_allowed: 7.7% vs 5.9%. This would
indicate that we may have some delays due to ACL processing, or other locking in that region.
There is also a minor increase in time spent in log_result (3.6% to 3.9%). It's
already a bit concerning that we spend ~4% of our time in logging, when it's only, I
think, 3 or 4 function calls in do_search out of hundreds. The change appears to be due
to a higher number of vsnprintf calls, likely due to the slightly increased throughput of
the multi-threaded instance.
I think my next step will be to update and look at a lock contention profile from
SystemTap, as well as potentially adding timing markers to the code base and using
SystemTap to record function entry/exit timings to determine where latency changes are
occurring. It appears we have no single source of pain, but rather what I have suspected:
lots of little locks throughout the code base that each cause a small, yet noticeable,
slowdown.
Note: you can click into the content of the .svg to "zoom" to specific function
areas and see their percentages etc.