Hi,
I have been working on converting slapd from using NSPR PR_Poll to using epoll(7), forked
from release 1.4.4.
The fork can be found here
https://github.com/lslile/389-ds-base/tree/epoll. The patch is
also attached.
I would appreciate any feedback from the community on my progress so far and any
assistance with bringing this change to completion. I also hope that it might well be
integrated with James Chapman's Connection Table splitting proposal and his further
proposal regarding listener threading.
I believe my code still contains an error, causing it to occasionally lose track of a
connection under heavy load, but I have so far been unable to find the error. It
doesn't seem to happen when I have logging at SLAPI_LOG_CONNS, so it is possible I
have caused or encountered a race condition.
I have tried not to deviate too far from the existing code. The major changes so far
are:
- Listeners moved to a listen_table (setup_epoll_listen_pds)
  - listen_table is a list of Connections so they can be handled in the same way as a
    client Connection
  - Differentiated by Connection->conn_state = CONN_STATE_LISTEN
  - Connection_Table->listen_count is no longer maintained
  - Eliminates listener pd handling from setup_pr_read_pds
- epoll_arm_listen_pds
  - Adds or removes all listener pds from epoll
  - Triggered from main event loop based on connection count limits
- Connection_Table->fd is currently only maintained for listeners
  - This could likely be fully eliminated, but I'm not sure what to do with
    signalpipe to accomplish this end
- Connection_Table->epollfd has been added to hold the epoll fd set
- handle_new_connection
  - Adds descriptors to epoll immediately
  - Eliminates the need for setup in setup_pr_read_pds
- epoll_pr_idle_pds (timeout-related section of setup_pr_read_pds)
  - Should only handle client timeouts or special cases for re-adding a descriptor to
    epoll
  - Eliminates the need for the remainder of setup_pr_read_pds
- setup_pr_read_pds is not used with epoll
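
To make the listener handling above concrete, here is a minimal standalone sketch of
the general pattern. It is not the code from the patch. The names sketch_conn,
sketch_add, sketch_arm_listener and sketch_poll_once are hypothetical stand-ins, and
the struct is a heavily reduced placeholder for the real Connection. It only shows
listeners and clients sharing one epoll set, told apart by a state field, with
accepted descriptors added to epoll immediately.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-ins for slapd's Connection and its conn_state values. */
enum sketch_state { SKETCH_LISTEN, SKETCH_CLIENT };

struct sketch_conn {
    int fd;
    enum sketch_state state;
};

/* Register a connection; the epoll event carries a pointer back to it. */
static int sketch_add(int epollfd, struct sketch_conn *conn)
{
    struct epoll_event ev;

    assert(conn != NULL);
    ev.events = EPOLLIN;
    ev.data.ptr = conn;
    return epoll_ctl(epollfd, EPOLL_CTL_ADD, conn->fd, &ev);
}

/* Add or remove a listener, e.g. when a connection limit is reached or cleared. */
static int sketch_arm_listener(int epollfd, struct sketch_conn *listener, int enable)
{
    assert(listener != NULL && listener->state == SKETCH_LISTEN);
    if (enable) {
        return sketch_add(epollfd, listener);
    }
    return epoll_ctl(epollfd, EPOLL_CTL_DEL, listener->fd, NULL);
}

/* One pass of the event loop. Returns the number of connections accepted,
 * or -1 if epoll_wait fails. */
static int sketch_poll_once(int epollfd, int timeout_ms)
{
    struct epoll_event events[64];
    int accepted = 0;
    int n = epoll_wait(epollfd, events, 64, timeout_ms);

    if (n < 0) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        struct sketch_conn *conn = events[i].data.ptr;

        if (conn->state == SKETCH_LISTEN) {
            /* Listener: accept and add the new descriptor to epoll right away. */
            struct sketch_conn *client = malloc(sizeof(*client));
            int fd = accept(conn->fd, NULL, NULL);

            if (client == NULL || fd < 0) {
                if (fd >= 0) {
                    close(fd);
                }
                free(client);
                continue;
            }
            client->fd = fd;
            client->state = SKETCH_CLIENT;
            if (sketch_add(epollfd, client) != 0) {
                close(fd);
                free(client);
                continue;
            }
            accepted++;
        } else {
            /* Client: a real server would hand this off to a worker thread.
             * Here we just drain some data and drop the client on EOF or error.
             * Closing the last reference to the fd also removes it from epoll. */
            char buf[512];

            if (read(conn->fd, buf, sizeof(buf)) <= 0) {
                close(conn->fd);
                free(conn);
            }
        }
    }
    return accepted;
}
```

In this sketch a main loop would call sketch_poll_once repeatedly and call
sketch_arm_listener(epollfd, listener, 0) once its own connection count hits the
limit, then re-arm with enable set to 1 when connections are released.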
Is epoll(7) available on all platforms supported by 389-ds? Because I don't know, I
have hesitated to remove any NSPR related code at this point.
In my testing I have found that epoll provides a measurable boost in client servicing;
however, my current testing methodology is not regimented enough to provide
statistically sound measurements.
I believe conversion from PR_Poll to epoll(7) fits well with the "389 ds connection
management proposal" that James Chapman had raised.
When epoll is accepting a large number of concurrent connections there are obvious stalls
that indicate the need for one or more listener threads, so that accepting new client
connections is separated from servicing already-connected clients.
I also think that James' idea of creating multiple Connection Tables could be
simplified with epoll.
- Connection_Table->epollfd could be converted to an array of epoll fd sets
  - one thread and epoll fd for each listener
  - one thread and epoll fd for each "Connection Table" processor
- Re-balancing connections between "Connection Table" processors could then
  be accomplished by adding and deleting the fd in the appropriate "Connection
  Table" epoll fd sets
Thanks in advance for all input or assistance.
--Larry