Rebase lib389 and Cockpit in 1.4.3 with master branch. This was needed
to fix all the react/patternfly security vulnerabilities. Rebase was
clean except for replication changelog.
Directory Server Development Team
I have been working on converting slapd from using NSPR PR_Poll to using epoll(7), forked from release 1.4.4.
The fork can be found here: https://github.com/lslile/389-ds-base/tree/epoll. The patch is also attached.
I would appreciate any feedback from the community on my progress so far and any assistance with bringing this change to completion. I also hope that it can be integrated with James Chapman's Connection Table splitting proposal and his further proposal regarding listener threading.
I believe my code still contains an error, causing it to occasionally lose track of a connection under heavy load, but I have so far been unable to find the error. It doesn't seem to happen when I have logging at SLAPI_LOG_CONNS, so it is possible I have caused or encountered a race condition.
I have tried not to deviate too far from the existing code so far. The major changes at this point are:
- Listeners moved to a listen_table (setup_epoll_listen_pds)
- listen_table is a list of Connections so they can be handled in the same way as a client Connection
- Differentiated by Connection->conn_state = CONN_STATE_LISTEN
- Connection_Table->listen_count is no longer maintained
- Eliminates listener pd handling from setup_pr_read_pds
- Adds or removes all listener pds from epoll
- Triggered from main event loop based on connection count limits
- Connection_Table->fd is currently only maintained for listeners
- This could likely be fully eliminated, but I'm not sure what to do with signalpipe to accomplish this end
- Connection_Table->epollfd has been added to hold the epoll fd set
- Adds descriptors to epoll immediately
- Eliminates the need for setup in setup_pr_read_pds
- epoll_pr_idle_pds ( timeout related section of setup_pr_read_pds )
- Should only handle client timeouts or special cases for re-adding a descriptor to epoll
- Eliminates the need for the remainder of setup_pr_read_pds
- setup_pr_read_pds is not used with epoll
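To make the list above more concrete, here is a minimal standalone sketch of the general pattern: listeners and clients live in the same epoll set, are told apart by a state field, and new descriptors are added to epoll as soon as they are accepted. This is not code from the patch. The `sketch_*` names and the `sketch_conn` struct are hypothetical stand-ins for slapd's Connection and conn_state, and there is no threading, TLS, or signalpipe handling.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for slapd's Connection and its conn_state field. */
enum sketch_conn_state { SKETCH_STATE_CLIENT = 0, SKETCH_STATE_LISTEN = 1 };

typedef struct sketch_conn {
    int fd;
    enum sketch_conn_state state;
} sketch_conn;

/* Register a connection with the epoll set immediately; the event's
 * data.ptr carries the connection so no separate fd array is needed. */
static int sketch_epoll_add(int epollfd, sketch_conn *c)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.ptr = c;
    return epoll_ctl(epollfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Create a listener on 127.0.0.1 with a kernel-assigned port (demo only). */
static sketch_conn *sketch_listener_new(void)
{
    struct sockaddr_in addr;
    sketch_conn *c = calloc(1, sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->fd = socket(AF_INET, SOCK_STREAM, 0);
    if (c->fd < 0) {
        free(c);
        return NULL;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(c->fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(c->fd, 16) != 0) {
        close(c->fd);
        free(c);
        return NULL;
    }
    c->state = SKETCH_STATE_LISTEN;
    return c;
}

/* One pass of the event loop. Returns the number of clients accepted
 * during this pass, or -1 if epoll_wait fails. */
static int sketch_poll_once(int epollfd, int timeout_ms)
{
    struct epoll_event events[16];
    int accepted = 0;
    int i;
    int n = epoll_wait(epollfd, events, 16, timeout_ms);
    if (n < 0) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        sketch_conn *c = events[i].data.ptr;
        if (c->state == SKETCH_STATE_LISTEN) {
            /* Listener is readable: accept and add the client to epoll now. */
            sketch_conn *client;
            int fd = accept(c->fd, NULL, NULL);
            if (fd < 0) {
                continue;
            }
            client = calloc(1, sizeof(*client));
            if (client == NULL) {
                close(fd);
                continue;
            }
            client->fd = fd;
            client->state = SKETCH_STATE_CLIENT;
            if (sketch_epoll_add(epollfd, client) != 0) {
                close(fd);
                free(client);
                continue;
            }
            accepted++;
        } else {
            /* Client is readable: a real server would hand this off to a
             * worker. Here we read once and drop the connection on EOF/error. */
            char buf[256];
            ssize_t r = read(c->fd, buf, sizeof(buf));
            if (r <= 0) {
                epoll_ctl(epollfd, EPOLL_CTL_DEL, c->fd, NULL);
                close(c->fd);
                free(c);
            }
        }
    }
    return accepted;
}
```

A caller would create the set with `epoll_create1(0)`, add each listener with `sketch_epoll_add`, and call `sketch_poll_once` in a loop. Pausing accepts at a connection limit would then amount to an `EPOLL_CTL_DEL` on each listener fd, followed by a later re-add.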
Is epoll(7) available on all platforms supported by 389-ds? Because I don't know, I have hesitated to remove any NSPR related code at this point.
In my testing I have found that epoll provides a measurable boost in client servicing; however, my current testing methodology is not rigorous enough to provide statistically sound measurements.
I believe conversion from PR_Poll to epoll(7) fits well with the "389 ds connection management proposal" that James Chapman had raised.
When epoll is accepting a large number of concurrent connections there are obvious stalls that indicate the need for one or more listener threads to be created to separate client connection processing from connected client servicing.
I also think that James' idea of creating multiple Connection Tables could be simplified with epoll.
- Connection_Table->epollfd could be converted to an array of epoll fd sets
- one thread and epoll fd for each listener
- one thread and epoll fd for each "Connection Table" processor
- Re-balancing connections between "Connection Table" processors could then be accomplished by adding and deleting the fd in the appropriate "Connection Table" epoll fd sets
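As a rough illustration of the re-balancing idea, the sketch below keeps an array of epoll fds and moves a descriptor from one set to another with a delete followed by an add. It is only a sketch under assumptions: the `sketch_tables` struct and the function names are hypothetical, the table count is fixed at two, and a real implementation would need locking so that the owning thread is not still processing the fd while it is moved.

```c
#include <sys/epoll.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_NUM_TABLES 2 /* assumed fixed count, for illustration only */

/* Hypothetical: one epoll fd per "Connection Table" processor. */
typedef struct sketch_tables {
    int epollfd[SKETCH_NUM_TABLES];
} sketch_tables;

/* Create one epoll set per table. Returns 0 on success, -1 on failure. */
static int sketch_tables_init(sketch_tables *t)
{
    int i;
    for (i = 0; i < SKETCH_NUM_TABLES; i++) {
        t->epollfd[i] = epoll_create1(0);
        if (t->epollfd[i] < 0) {
            while (--i >= 0) {
                close(t->epollfd[i]);
            }
            return -1;
        }
    }
    return 0;
}

/* Add fd to the given table's epoll set (level-triggered read interest). */
static int sketch_table_add(sketch_tables *t, int table, int fd)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(t->epollfd[table], EPOLL_CTL_ADD, fd, &ev);
}

/* Re-balance: delete fd from one table's set and add it to another's.
 * Passing NULL as the event for EPOLL_CTL_DEL is accepted on Linux 2.6.9+. */
static int sketch_table_move(sketch_tables *t, int from, int to, int fd)
{
    if (epoll_ctl(t->epollfd[from], EPOLL_CTL_DEL, fd, NULL) != 0) {
        return -1;
    }
    return sketch_table_add(t, to, fd);
}

/* Non-blocking check: how many descriptors in this table are ready now? */
static int sketch_table_ready(sketch_tables *t, int table)
{
    struct epoll_event events[16];
    return epoll_wait(t->epollfd[table], events, 16, 0);
}
```

Because the interest is level-triggered, data that arrived before the move is still reported by the destination set afterwards, so a pending request is not lost by the move itself.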
Thanks in advance for all input or assistance.
We have a slapd preoperation + extendedop plugin that largely works well
for us. We have recently discovered that it leaks a small amount of
memory, which, over time, creates a headache.
I would appreciate advice on the proper way to solve this problem. In
short, our code allocates a small bit of state information for each
connection. We register a callback with SLAPI_PLUGIN_PRE_UNBIND_FN
which cleans up that state information.
However, that function is only invoked for authenticated connections;
simple anonymous binds never invoke the unbind callback, and hence the
leak. That is, our problem seems to be when the bind is both simple
(LDAP_AUTH_NONE) and anonymous (null DN).
If we set bindSubtree to '*' (ALL_DATA), then our unbind function is
invoked for the anonymous connections. It is not clear to us what the
ramifications are of that setting, and what effect making that change
will have on our plugin. Advice would be greatly appreciated.
Alternately, inspection of factory.c suggests that
slapi_register_object_extension() could be used as a way for creating
and destroying the connection specific information we need to track.
However, those callbacks appear to be intended only for slapd internal
server plugins, not for external plugins. In order to make that work,
we would need a way of connecting a Slapi_PBlock to the opaque
Connection structure that is passed to the constructor and destructor. I
imagine that we could achieve that by modifying the slapi library to
expose something like slapi_get_conn_id() which would take the opaque
Connection * pointer and return the connection id. My concern is that
needing that call is a signal that it's The Wrong Approach (TM).
If it helps, some bits of how we construct our plugin are shared here:
Clue bats welcome; any advice is appreciated.