On 24 Oct 2018, at 17:56, thierry bordaz <tbordaz(a)redhat.com>
On 10/21/2018 03:55 AM, William Brown wrote:
> Thanks for this write up Thierry,
>> On 19 Oct 2018, at 17:15, thierry bordaz <tbordaz(a)redhat.com> wrote:
>> C10K is a scalability problem that a server can face when dealing with events of
thousands of connections (i.e. clients) at the same time. Events can be new connections,
new operations on the established connections, closure of connection (from client or
>> For 389-ds, C10K problem was resolved with a new framework Nunc-Stans .
Nunc-stans was first enabled in RHDS 7.4 and improved/fixed in 7.5. Robustness issues 
and  were reported in 7.5 and it was decided to disable Nunc-stans. It is not known if
those issues exist or not in 7.4.
>> William posted a PR to fix those two issues . Nunc-stans is a complex
framework, with its own dynamic. Review of this PR is not easy and even a careful review
may not guaranty it will fix  and  and may not introduce others unexpected side
>> From there we discussed two options (but there may be others):
>> • Review and merge the PR , then later run some intensive tests aiming to
verify , and checking the robustness in order to reenable NS
> I think this is the best solution. If NS is “not working” today, then merging the PR
won’t make it “not work” any less ;)
> Given we have a reliable way to disabled it at build time, I think that merging,
testing, and then eventually discussion and decision of enabling by default is the best
The existing set of tests reveal a single failure. That shows the PR is really
promising but at the same time that tests are crucial with NS. Like the two robustness
bugs (conn leak and deadlock) only happening is some specific conditions.
I think review on large NS PR is bit formal step and the most important are
tests/measure. We may agree to push the NS patches base on existing tests results. New
tests/measure (for benefits and  and ) on top of the current PR could be the option.
Let's wait for additional feedback
Sure. I will work on Viktor's comments and found issues with paged search, and I’ll
commit some more time to running the tests if I can.
The reason I’m pushing to “just merge and test/fix later” is that at the moment I have
very little time, and I don’t want to see my work bit-rot over time, and get further and
further away. It’s going to make future fixes and improvements to the space easier once
Saying that I think it would be good to have a deadline or a list of people we want
comments from. Mark, Simon, Ludwig, Viktor, yourself and myself come to mind … we need to
know what our approach is, and this ties back to the “feature gating” discussion that I
hope you saw. We need a way to have features we are working on, so we can test and reason
about them, but without letting them out in production - and having ways to roll the back.
>> • Build some tests for
>> • measure the benefit of NS as  and  do not prevent some performance
>> • identify possible reproducers for  and 
>> • create robustness and long duration NS specific tests
>> • review and merge the PR 
>> As PR  is not intended for perf improvement, the step 2.1 will impact the
priority according to the performance benefits.
> I think that the test plan is good. 2 and 3 will be hard to reproduce I think, they
seem like they require complex state and interactions. The goal of the PR I have open is
to reduce the complexity of NS integration into DS, so that tests for situations like 2
and 3 can be more easily created (and hopefully the simpler integration is going to
resolve some issues).
> The “benefit” of NS is hard to say - because the other parts of the server core are
still needing threading improvement, NS may help some aspects (connection acceptance
performance for example), we won’t see all the of the benefits until we improve other
parts. There are many bottlenecks to fix! An example is lib globs high use of atomics
(should be COW), and the plugin architecture.
> At a theoretical level, NS will be faster as we don’t need to lock and poll over the
connection table, but today the CT may not be the bottleneck, so NS may not make this
“faster” just by enabling.
>> Comments are welcomed
>> Regarding 2.1 plan we made the following notes for the test plan:
>> The benefit of Nunc-Stans can only be measure with a large number of connections
(i.e. client) above 1000. That means a set of clients (sometime all) should keep their
connection opened. Clients should run on several hosts so that clients are not the
>> For the two types of events (new connection and new operations), the measurement
>> • Event: New connections
>> • Start all clients in parallel to establish connections (keeping them opened)
take the duration to get 1000, 2000, ... 10000 connections and check there are drop or
>> • Establish 1000 connections and monitor during to open 100 more, the same
starting with 2000, 10000
>> • Client should not run any operations during the monitoring
>> • Event: New operations
>> • Start all clients and when 1000 connections are established, launch simple
operations (e.g. search -s base -b "" objectclass) and monitor how many of them
can be handled. The same with 2000, ... 10000.
>> • response time and workqueue length could be monitored to be sure the
bottleneck are not the worker.
> At one point I started (but did not finish) ldclt integration with lib389. It would
be great to have a series of stress and acceptance tests like these available in lib389
for our release validations, so I believe that their creation is valuable regardless of
>>  http://www.port389.org/docs/389ds/design/nunc-stans.html
>>  https://bugzilla.redhat.com/show_bug.cgi?id=1608746
>>  https://bugzilla.redhat.com/show_bug.cgi?id=1605554
>>  https://pagure.io/389-ds-base/pull-request/49636
>> 389-devel mailing list -- 389-devel(a)lists.fedoraproject.org
>> To unsubscribe send an email to 389-devel-leave(a)lists.fedoraproject.org
>> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>> List Archives:
> 389-devel mailing list -- 389-devel(a)lists.fedoraproject.org
> To unsubscribe send an email to 389-devel-leave(a)lists.fedoraproject.org
> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
389-devel mailing list -- 389-devel(a)lists.fedoraproject.org
To unsubscribe send an email to 389-devel-leave(a)lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines