William, Pierre,
Thanks for your patient explanation. I am convinced that the AL is
likely a better solution and probably easier to implement.
One last question (I promise :) ): regarding the drawback (i.e. the
difficulty of balancing the load between split conntables), I was
thinking that a counter of active connections in each sub-conntable
could do the job. The counter would only be in contention with the
polling thread when the listener detects a new connection. Do you
think that would be good enough?
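A minimal sketch of the per-sub-conntable counter idea, assuming C11 atomics. `subtable_t`, `subtables`, `subtable_assign()`, `subtable_release()` and `NUM_SUBTABLES` are hypothetical names, not 389-ds code. The listener picks the sub-conntable with the fewest active connections and increments its counter; the owning polling thread decrements it when the connection closes, so the counter is only touched on those two events.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical number of sub-conntables (illustration only). */
#define NUM_SUBTABLES 2

/* Hypothetical per-sub-conntable state: only the counter matters here. */
typedef struct {
    atomic_int active;   /* connections currently owned by this sub-table */
} subtable_t;

/* Static storage: every counter starts at zero. */
static subtable_t subtables[NUM_SUBTABLES];

/* Listener side: pick the least-loaded sub-table and charge it.
 * The scan is racy (counts may change between the loads and the
 * increment); that only makes the balance approximate, not wrong. */
static size_t subtable_assign(void)
{
    size_t best = 0;
    int best_count = atomic_load_explicit(&subtables[0].active,
                                          memory_order_relaxed);
    for (size_t i = 1; i < NUM_SUBTABLES; i++) {
        int c = atomic_load_explicit(&subtables[i].active,
                                     memory_order_relaxed);
        if (c < best_count) {
            best = i;
            best_count = c;
        }
    }
    atomic_fetch_add_explicit(&subtables[best].active, 1,
                              memory_order_relaxed);
    return best;
}

/* Polling-thread side: called when a connection owned by idx closes. */
static void subtable_release(size_t idx)
{
    assert(idx < NUM_SUBTABLES);
    int prev = atomic_fetch_sub_explicit(&subtables[idx].active, 1,
                                         memory_order_relaxed);
    assert(prev > 0);   /* releasing more than was assigned is a bug */
    (void)prev;         /* silence unused warning when NDEBUG is set */
}
```

With two listeners, both may occasionally pick the same sub-conntable in the same instant; the next assignment corrects for it, which is usually enough for load spreading.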
thanks
thierry
On 12/15/21 12:07 PM, Pierre Rogier wrote:
To confirm that the connection struct really is large:
(gdb) p sizeof (Connection)
$205 = 456
About pinning a connection to a conntable: the benefit is better cache
affinity, because we would always be using the same polling thread (but
that is mitigated by the fact that the connection is also handled by the
worker threads).
The drawback is that it is more difficult to spread the load across
the threads.
On Tue, Dec 14, 2021 at 10:30 PM William Brown <william.brown(a)suse.com> wrote:
> On 14 Dec 2021, at 19:45, Thierry Bordaz <tbordaz(a)redhat.com> wrote:
>
>
>
> On 12/13/21 7:01 PM, Pierre Rogier wrote:
>>
>>
>> On Sun, Dec 12, 2021 at 11:45 PM William Brown <william.brown(a)suse.com> wrote:
>>
>>
>> > On 10 Dec 2021, at 22:43, Pierre Rogier <progier(a)redhat.com> wrote:
>> >
>> > Hi Thierry,
>> > not sure I agree with your concern:
>> > As I understand things, each listener thread is associated with an active list
>>
>> I don't think so.
>>
>> Ouch! I see I wrote "listener" when I was thinking about "polling threads".
>> Sorry for the confusion!
>> William, I agree with you about the advantage of having several
>> polling threads and have no concern about James' design.
>> I only tried to answer Thierry's concern about optimizing cache usage and
>> whether we should split the CT (connection table) rather than only
>> the AL (active list).
>>
>> (I just do not think it is important, because the connection
>> slot is quite large and, anyway, the connection is constantly
>> oscillating between the polling thread(s) and the worker threads.
>> Furthermore, I do not like the idea of statically binding a
>> connection slot to a polling thread.)
>
>
> Thanks Pierre, all you said makes sense (as usual :) ). My
> understanding is that the polling threads will basically run
> setup_pr_read_pds/handle_pr_read_ready on their own active list.
> Those routines access a large set of fields of the connection
> struct. It would be better if the active list pointer (per polling
> thread) were located in a different cache line than those fields.
> For the same reason, I am concerned that the active list pointer of
> a given polling thread may sit in the same cache line as the active
> list pointers of the other polling threads. I need to dive into
> James' patches to understand whether those concerns are real. I also
> admit that the potential impact is more theoretical than proven.
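To illustrate the false-sharing concern, here is a sketch that gives each polling thread's active-list head its own cache line with C11 `alignas`. It assumes a 64-byte cache line and that the per-thread heads live in a small array; `active_list_head_t`, `active_lists`, `CACHE_LINE` and `NUM_POLLERS` are hypothetical, and the actual patch may keep these pointers elsewhere.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache line size: typical on x86-64, not guaranteed everywhere. */
#define CACHE_LINE 64
/* Hypothetical number of polling threads / active lists. */
#define NUM_POLLERS 2

struct conn;   /* opaque stand-in for the real Connection */

/* Hypothetical per-thread list head, forced onto its own cache line so
 * that one thread updating its head does not invalidate the line that
 * holds another thread's head. Cost: one pointer grows to 64 bytes. */
typedef struct {
    alignas(CACHE_LINE) struct conn *head;
} active_list_head_t;

static active_list_head_t active_lists[NUM_POLLERS];

/* Compile-time check that the padding does what we expect. */
static_assert(sizeof(active_list_head_t) == CACHE_LINE,
              "each list head must fill exactly one cache line");
```

The same trick applied inside every connection slot would multiply that padding by the table size, which is the space cost mentioned later in the thread.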
Thierry, line 1641 of slap.h is the struct conn (typedef
Connection). Just visually looking at it, this structure is poorly
laid out regarding padding (wasted space) and is much bigger
than 64 bytes, so cache alignment is the least of our problems
here. We'd gain more just by re-arranging the fields per C
padding rules to save space.
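A sketch of the padding point, using made-up fields rather than the real members of `struct conn`: the same five members take noticeably less space when ordered from largest to smallest alignment. `struct conn_padded` and `struct conn_packed` are hypothetical illustrations.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: these are NOT the real fields of struct conn. */

/* Poor ordering: each 1-byte field is followed by padding so that the
 * next pointer stays aligned, plus tail padding after the int. */
struct conn_padded {
    char  state;
    void *ops;
    char  flags;
    void *next;
    int   refcnt;
};

/* Same members ordered by decreasing alignment: the small fields pack
 * together and only tail padding remains. */
struct conn_packed {
    void *ops;
    void *next;
    int   refcnt;
    char  state;
    char  flags;
};

/* Decreasing-alignment order leaves no internal padding, so it is never
 * larger than another ordering of the same members. */
static_assert(sizeof(struct conn_packed) <= sizeof(struct conn_padded),
              "reordering by alignment should not grow the struct");
```

On a typical LP64 ABI (for example x86-64 Linux), `struct conn_padded` is 40 bytes and `struct conn_packed` is 24; the exact figures depend on the ABI. Tools such as `pahole` can show where the holes are in the real struct.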
Cache alignment isn't a problem in the conntable either, since we
iterate the active lists via the conn->c_next pointer, not via the
conntable, so we don't need to worry about that side.
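A sketch of what iterating "via the conn->c_next pointer" means, assuming a drastically simplified `Connection` with only a socket descriptor and the `c_next` link; `active_list_push()` and `active_list_count()` are hypothetical helpers. The loop only touches connections that are on its own list and never scans the rest of the conntable array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified connection: the real struct conn has
 * many more fields; only the intrusive link matters here. */
typedef struct conn {
    int          c_sd;     /* socket descriptor */
    struct conn *c_next;   /* next connection on the same active list */
} Connection;

/* Push a connection onto the front of an active list. Cheap sanity
 * check: a connection linked mid-list has a non-NULL c_next. */
static void active_list_push(Connection **head, Connection *c)
{
    assert(c->c_next == NULL);
    c->c_next = *head;
    *head = c;
}

/* Walk one active list by following c_next from its head. */
static size_t active_list_count(const Connection *head)
{
    size_t n = 0;
    for (const Connection *c = head; c != NULL; c = c->c_next) {
        n++;
    }
    return n;
}
```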
>
> Regarding statically vs. dynamically binding connection slots, what
> benefits/drawbacks are you thinking of?
If we pin a connection to a conntable, we are effectively splitting
the server into two, meaning one conntable can fill up while the
other stays empty.
>
> best regards
> thierry
>
>>
>> That said, to fully ease Thierry's concern, we could easily run a
>> test after adding padding to the connection struct to align it with
>> the cache line size.
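A sketch of that padding test, assuming a 64-byte cache line and using a 456-byte opaque payload as a stand-in for the real fields (the size gdb reported above); `conn_aligned_t` and `CACHE_LINE` are hypothetical. Aligning the struct rounds its size up to a whole number of cache lines, so adjacent slots in the table never share one.

```c
#include <assert.h>
#include <stdalign.h>

/* Assumed cache line size (typical for x86-64, not universal). */
#define CACHE_LINE 64

/* Hypothetical stand-in for Connection: 456 bytes of payload instead of
 * the real named fields. */
typedef struct {
    alignas(CACHE_LINE) unsigned char payload[456];
} conn_aligned_t;

/* sizeof is always a multiple of the alignment, so each slot now
 * occupies whole cache lines. */
static_assert(sizeof(conn_aligned_t) % CACHE_LINE == 0,
              "each slot must occupy whole cache lines");
```

456 bytes is 7.125 cache lines, so each slot grows to 8 lines (512 bytes): 56 bytes of padding per slot, roughly 12% more memory for the table.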
>>
>> BTW: I think it should be AL instead of CT in William's diagram.
>>
>>
>>
>> Where we have N listeners and M active lists, the N listeners
>> can populate connections into the M active lists (potentially
>> skipping blocking with a try_lock if an active list is
>> currently busy).
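A sketch of N listeners feeding M active lists with a try-lock, assuming each active list is a head pointer guarded by its own `pthread_mutex_t`; `active_list_t`, `lists`, `active_list_add()` and `NUM_ACTIVE_LISTS` are hypothetical. A listener tries its preferred list first, moves on if that list is busy, and only blocks when every list is busy.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical number of active lists. */
#define NUM_ACTIVE_LISTS 2
static_assert(NUM_ACTIVE_LISTS == 2, "initializer below has two entries");

typedef struct conn {
    struct conn *c_next;   /* intrusive link, as in the real struct */
} Connection;

/* Hypothetical active list: a head pointer protected by its own mutex. */
typedef struct {
    pthread_mutex_t lock;
    Connection     *head;
} active_list_t;

static active_list_t lists[NUM_ACTIVE_LISTS] = {
    { PTHREAD_MUTEX_INITIALIZER, NULL },
    { PTHREAD_MUTEX_INITIALIZER, NULL },
};

/* Add a new connection, skipping busy lists via pthread_mutex_trylock().
 * Returns the index of the list that received the connection. */
static size_t active_list_add(Connection *c, size_t preferred)
{
    assert(preferred < NUM_ACTIVE_LISTS);
    for (size_t k = 0; k < NUM_ACTIVE_LISTS; k++) {
        size_t i = (preferred + k) % NUM_ACTIVE_LISTS;
        if (pthread_mutex_trylock(&lists[i].lock) == 0) {
            c->c_next = lists[i].head;
            lists[i].head = c;
            pthread_mutex_unlock(&lists[i].lock);
            return i;
        }
    }
    /* Every list was busy: fall back to blocking on the preferred one. */
    pthread_mutex_lock(&lists[preferred].lock);
    c->c_next = lists[preferred].head;
    lists[preferred].head = c;
    pthread_mutex_unlock(&lists[preferred].lock);
    return preferred;
}
```

Skipping a busy list trades strict placement for listener latency; combined with a per-list counter, the skip could also prefer the least-loaded free list.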
>>
>> If we did what you are thinking, we'd have half our conntable
>> for LDAPS and half for LDAP, and depending on the deployment, we
>> could end up with one CT completely idle and the other full,
>> which is not what we want, and not what James has done here :)
>>
>> > and the active list links are in the middle of the connection slot
>> > (which is IMHO large enough to span several cache lines), so I do
>> > not think we will really have cache line problems (related to the
>> > multiple listener threads).
>> > Splitting the CT would mean that a connection would be tied
>> > forever to a listener, which may lead to one listener being
>> > overloaded while others are idle.
>> > The round-robin solution (applied when opening the connection) limits
>> > this risk and tends to spread the load over the CT, at the price of
>> > a cache reload when a slot is reopened (which IMHO is a good compromise).
>> > That said, with James' design it is easy to test the
>> > "connection slot associated with a fixed listener thread" approach
>> > by replacing the round robin with a modulo on the slot index.
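A sketch contrasting the two assignment policies discussed here; `pick_round_robin()`, `pick_by_slot()`, `rr_next` and `NUM_POLLERS` are hypothetical names. Switching between them is a one-line change at connection-open time, which is why the modulo variant is cheap to test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical number of polling threads / active lists. */
#define NUM_POLLERS 2
static_assert(NUM_POLLERS > 0, "need at least one polling thread");

/* Round robin: each newly opened connection goes to the next list in
 * turn, whatever conntable slot it lands in, so load follows arrival
 * order. */
static atomic_size_t rr_next;

static size_t pick_round_robin(void)
{
    return atomic_fetch_add(&rr_next, 1) % NUM_POLLERS;
}

/* Fixed binding ("modulo on slot index"): a slot always maps to the same
 * list. Better cache affinity, but if the busy connections happen to sit
 * in slots with the same residue, one thread gets all the load. */
static size_t pick_by_slot(size_t slot_index)
{
    return slot_index % NUM_POLLERS;
}
```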
>> > A last point about the cache: the connection handling is not bound
>> > to a listener but always oscillates between the listener thread and
>> > a worker thread, so I suspect that having a fixed listener or not
>> > will have little impact on cache handling.
>>
>> It's more about the select/poll than the listeners: rather
>> than calling select() over the full set of connections, there are
>> multiple threads, each selecting on its own CT and then setting up
>> the ready connections to be dispatched to the worker thread pool.
>> As a result, this has a few benefits:
>>
>> * James gave the threads a higher scheduler priority, so
>>   they'll likely react quicker (if Linux respects that flag).
>> * The set being set up for select() is smaller, so there is lower
>>   latency before we are "selecting" again.
>> * We are able to respond quicker to each IO on an fd because
>>   each selector thread is doing half the work, meaning we get that
>>   IO into the worker faster.
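A sketch of one polling pass for a single polling thread, using POSIX `poll()` as a stand-in for the NSPR `PR_Poll()` the server actually uses; `poll_active_list()` and `MAX_PER_LIST` are hypothetical. Each thread builds its descriptor set only from its own list, which is where the shorter set-up time and quicker turnaround come from.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical upper bound on connections per active list. */
#define MAX_PER_LIST 1024

/* One polling pass over this thread's own descriptors fds[0..n-1].
 * Writes ready descriptors to ready[] (to be handed to worker threads)
 * and returns how many there are, or -1 if poll() fails. */
static int poll_active_list(const int *fds, size_t n, int timeout_ms,
                            int *ready)
{
    struct pollfd pfds[MAX_PER_LIST];
    assert(n <= MAX_PER_LIST);

    /* Set-up cost is proportional to this list only, not the whole CT. */
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    if (rc < 0) {
        return -1;
    }

    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            ready[count++] = pfds[i].fd;
        }
    }
    return count;
}
```

Negative descriptors are ignored by `poll()`, so a list with free slots can keep `-1` placeholders without special handling.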
>>
>>
>> This diagram may help explain it
>>
>>
>>                       ┌───────────┐
>>                       │           │    ┌──────────────┐      ┌────────────────┐
>> ┌───────┐             │           │    │              │      │                │
>> │ LDAPS │──────┐      │           │    │              │      │                │
>> └───────┘      ├─────▶│   CT 1    │───▶│   Select 1   │──┐   │                │
>>                │      │           │    │              │  │   │                │
>>                │      │           │    └──────────────┘  │   │                │
>> ┌───────┐      │      └───────────┘                      │   │                │
>> │ LDAPI │──────┤                                         ├──▶│ Worker Threads │
>> └───────┘      │      ┌───────────┐                      │   │                │
>>                │      │           │    ┌──────────────┐  │   │                │
>>                │      │           │    │              │  │   │                │
>> ┌───────┐      ├─────▶│   CT 2    │───▶│   Select 2   │──┘   │                │
>> │ LDAP  │──────┘      │           │    │              │      │                │
>> └───────┘             │           │    └──────────────┘      └────────────────┘
>>                       └───────────┘
>>
>> >
>> >
>> > On Fri, Dec 10, 2021 at 11:40 AM Thierry Bordaz <tbordaz(a)redhat.com> wrote:
>> >
>> >
>> > On 12/9/21 6:28 PM, James Chapman wrote:
>> >> Hi,
>> >>
>> >> I haven't created a PR yet; here is a link to the issue:
>> >> https://github.com/389ds/389-ds-base/issues/4812
>> >>
>> >> Thanks
>> >>
>> >> On Wed, Nov 24, 2021 at 10:40 PM William Brown <william.brown(a)suse.com> wrote:
>> >>
>> >>
>> >> > On 24 Nov 2021, at 22:03, James Chapman <jachapma(a)redhat.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Nov 23, 2021 at 10:22 PM William Brown <william.brown(a)suse.com> wrote:
>> >> >
>> >> >
>> >> > > On 23 Nov 2021, at 23:40, James Chapman <jachapma(a)redhat.com> wrote:
>> >> > >
>> >> > > I have done some work on 389 DS connection management
>> >> > > that I would appreciate the community's feedback on. You can
>> >> > > find a draft patch attached for review.
>> >> > >
>> >> > > Problem statement
>> >> > > Currently the connection table (CT) contains an array of
>> >> > > established connections that are monitored for activity by a
>> >> > > single thread. As the number of established connections
>> >> > > increases, so does the overhead of monitoring them. The single
>> >> > > thread that monitors established connections for activity
>> >> > > becomes a bottleneck, limiting the ability of the server to
>> >> > > handle new connections.
>> >> > >
>> >> > > Solution
>> >> > > One solution to this problem is to segment the CT into
>> >> > > smaller portions, with each portion having a dedicated thread to
>> >> > > monitor its connections. But rather than dividing the CT into
>> >> > > smaller portions, the approach I preferred was to add multiple
>> >> > > active lists to the CT, where each active list has its own
>> >> > > dedicated thread.
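A sketch of "one dedicated thread per active list" with POSIX threads. `poller_ctx_t`, `poller_main()`, `run_pollers()` and `NUM_ACTIVE_LISTS` are hypothetical, and the thread body is only a placeholder for the real poll-and-dispatch loop; the point is that every thread shares the same conntable but owns exactly one list index.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical: two active lists, as the tests in this thread suggest. */
#define NUM_ACTIVE_LISTS 2
static_assert(NUM_ACTIVE_LISTS >= 1, "need at least one active list");

/* Hypothetical per-thread context: which list this thread owns. */
typedef struct {
    size_t        list_index;
    unsigned long polls;   /* placeholder for real work done */
} poller_ctx_t;

/* Placeholder body: a real thread would loop, polling only the
 * connections on ctx->list_index and dispatching ready ones to workers. */
static void *poller_main(void *arg)
{
    poller_ctx_t *ctx = arg;
    ctx->polls++;
    return NULL;
}

/* Start one thread per active list, then wait for all of them.
 * Returns 0 on success, -1 if a thread could not be created. */
static int run_pollers(poller_ctx_t ctx[NUM_ACTIVE_LISTS])
{
    pthread_t tids[NUM_ACTIVE_LISTS];
    size_t started = 0;
    int rc = 0;

    for (size_t i = 0; i < NUM_ACTIVE_LISTS; i++) {
        ctx[i].list_index = i;
        ctx[i].polls = 0;
        if (pthread_create(&tids[i], NULL, poller_main, &ctx[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (size_t i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    return rc;
}
```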
>> >
>> >
>> > James, I am really sorry to get back to you so late, but I have a
>> > concern that popped up about multiple active lists within a shared CT.
>> >
>> > The CT will remain shared, so I imagine that, for example,
>> > CT[1234] (slot 1234 of the connection table) will contain the lists.
>> > Let's imagine you have 10 listeners (of established connections).
>> > Will those 10 threads access the slot CT[1234]?
>> > If they do, then my concern is that when this slot is updated
>> > (lock taken, for example), the cache lines containing the lists
>> > will likely be invalidated. So a listener accessing CT[1234] may
>> > impact another listener running on another CPU. If this problem
>> > exists, I think it could be mitigated by cache-line aligning the
>> > list structure, but that would likely waste space.
>> > What is the main concern with splitting the CT into chunks and
>> > giving a chunk to each listener? Are multiple lists safer/easier
>> > to implement?
>> >
>> >
>> >
>> > regards
>> > thierry
>> >
>> >> > >
>> >> > > Benefit
>> >> > > With a single thread monitoring each CT active list,
>> >> > > connections can be monitored in parallel, removing the
>> >> > > bottleneck mentioned above.
>> >> > > Instead of a single CT active list containing all established
>> >> > > connections, there will be multiple CT active lists that share
>> >> > > the total number of established connections between them.
>> >> > > With this change I noticed a ~20% increase in the number of
>> >> > > connections per second the server can handle.
>> >> >
>> >> > This is good; it really does help us here. It would be
>> >> > better to move to epoll, but I think that would be too invasive
>> >> > and hard for the current connection code, as it would basically
>> >> > be a rewrite.
>> >> >
>> >> > I did try epoll() a while ago, just to see if it performs
>> >> > better than PR_Poll(), but I ran into some issues with file
>> >> > descriptor permissions, so I ditched it.
>> >> >
>> >> > But I think the multiple active lists idea is much simpler,
>> >> > especially given we can only have a single accept() anyway.
>> >> >
>> >> > Could it also be worth changing how we monitor
>> >> > connections? Rather than iterating over the CT, each connection
>> >> > could publish an update to a channel on a "state" change, and
>> >> > the monitor thread would then aggregate all that info to get a
>> >> > snapshot of the current connection state?
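A sketch of the event-channel idea, assuming a mutex-protected ring buffer stands in for the "channel"; `conn_event_t`, `event_chan_t`, `chan`, `chan_push()`, `monitor_drain()` and `CHAN_CAP` are hypothetical. Connections push state-change events, and the monitor thread drains them into running totals instead of walking the CT.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state-change events tracked by the monitor. */
typedef enum {
    CONN_OPENED,
    CONN_CLOSED,
    CONN_READ_READY,
    CONN_EVENT_COUNT   /* sentinel: number of event kinds */
} conn_event_t;

/* Hypothetical bounded channel: a ring buffer guarded by a mutex. */
#define CHAN_CAP 256

typedef struct {
    pthread_mutex_t lock;
    conn_event_t    buf[CHAN_CAP];
    size_t          head, len;
} event_chan_t;

static event_chan_t chan = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Producer side: called by a connection when its state changes.
 * Returns 0 on success, -1 if the channel is full (event dropped). */
static int chan_push(conn_event_t ev)
{
    int rc = -1;
    pthread_mutex_lock(&chan.lock);
    if (chan.len < CHAN_CAP) {
        chan.buf[(chan.head + chan.len) % CHAN_CAP] = ev;
        chan.len++;
        rc = 0;
    }
    pthread_mutex_unlock(&chan.lock);
    return rc;
}

/* Monitor side: drain all pending events into running totals, giving a
 * snapshot (e.g. opened - closed = currently open) without a CT scan. */
static void monitor_drain(unsigned long totals[CONN_EVENT_COUNT])
{
    pthread_mutex_lock(&chan.lock);
    while (chan.len > 0) {
        conn_event_t ev = chan.buf[chan.head];
        assert(ev < CONN_EVENT_COUNT);
        totals[ev]++;
        chan.head = (chan.head + 1) % CHAN_CAP;
        chan.len--;
    }
    pthread_mutex_unlock(&chan.lock);
}
```

A bounded buffer means events can be dropped under load (`chan_push()` returns -1); whether that is acceptable for monitoring, or whether producers should block instead, is a design choice.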
>> >> >
>> >> > Yes, I can look into this.
>> >>
>> >> Happy to review that too :)
>> >>
>> >> >
>> >> > >
>> >> > > Opens
>> >> > > I tested this change with 100, 500, 1k, 5k and 10k
>> >> > > concurrent connections, and found that two CT active lists is
>> >> > > the optimal config. I think we should hardcode the number of CT
>> >> > > active lists to two and hide it from the user/sysadmin, or would
>> >> > > it be better as a configurable parameter?
>> >> >
>> >> > Hardcode. Every single tunable setting is something that
>> >> > we then have to support until the heat death of the universe,
>> >> > because we have no way to "remove" support for anything. In most
>> >> > cases no one will ever change it, nor will they understand the
>> >> > impact of changing it to the same level we do.
>> >> >
>> >> > See also: research that literally says most tunables go
>> >> > unused:
>> >> >
>> >> >
>> >> > https://experts.illinois.edu/en/publications/hey-you-have-given-me-too-ma...
>> >> >
>> >> > That makes sense alright.
>> >> >
>> >> > I'll review the code further later, but is it worth
>> >> > making this a PR instead?
>> >> > Sure, I will harden the patch a bit and create a PR.
>> >>
>> >> No problem mate, great work :)
>> >>
>> >> >
>> >> > Thanks for your feedback
>> >> >
>> >> > >
>> >> > > Thanks
>> >> > > Jamie
>> >> > >
<connection-table-multi-lists.patch>_______________________________________________
>> >> > > 389-devel mailing list --
389-devel(a)lists.fedoraproject.org
<mailto:389-devel@lists.fedoraproject.org>
>> >> > > To unsubscribe send an email to
389-devel-leave(a)lists.fedoraproject.org
<mailto:389-devel-leave@lists.fedoraproject.org>
>> >> > > Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
<
https://docs.fedoraproject.org/en-US/project/code-of-conduct/>
>> >> > > List Guidelines:
https://fedoraproject.org/wiki/Mailing_list_guidelines
<
https://fedoraproject.org/wiki/Mailing_list_guidelines>
>> >> > > List Archives:
https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproje...
<
https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproje...
>> >> > > Do not reply to spam on the list, report it:
https://pagure.io/fedora-infrastructure
<
https://pagure.io/fedora-infrastructure>
>> >> >
>> >> > --
>> >> > Sincerely,
>> >> >
>> >> > William Brown
>> >> >
>> >> > Senior Software Engineer, Identity and Access Management
>> >> > SUSE Labs, Australia
>> >
>> >
>> > --
>> > --
>> >
>> > 389 Directory Server Development Team