On 05/21/2018 03:51 PM, Simo Sorce wrote:
On Mon, 2018-05-21 at 11:52 +0200, Pavel Březina wrote:
> On 05/18/2018 09:50 PM, Simo Sorce wrote:
>> On Fri, 2018-05-18 at 16:11 +0200, Sumit Bose wrote:
>>> On Fri, May 18, 2018 at 02:33:32PM +0200, Pavel Březina wrote:
>>>> Hi folks,
>>>> I sent a mail about the new sbus implementation (I'll refer to it as sbus2).
>> Sorry Pavel,
>> but I need to ask, why a new bus instead of something like varlink ?
> This is old work; we did not know about varlink until this work was
> already finished. But since we still provide a public D-Bus API, we need a
> way to work with it anyway.
Ack, thanks, wasn't sure how old the approach was, so I just asked :-)
>>>> Now, I'm integrating it into SSSD. The work is quite difficult since it
>>>> touches all parts of SSSD and the changes are usually interconnected, but
>>>> I am slowly moving towards the goal.
>>>> At this moment, I'm trying to take the "minimum changes" path so the code can
>>>> be built and function with sbus2; however, taking full advantage of it
>>>> will take further improvements (that will not be very difficult).
>>>> There is one big change that I would like to take though, that needs to be
>>>> discussed. It is about how we currently handle sbus connections.
>>>> In the current state, the monitor and each backend creates a private sbus
>>>> server. The current implementation of a private sbus server is not a message bus; it
>>>> only serves as an address to create point-to-point nameless connections, so
>>>> each client must maintain several connections:
>>>> - each responder is connected to monitor and to all backends
>>>> - each backend is connected to monitor
>>>> - we have monitor + number of backends private servers
>>>> - each private server maintains about 10 active connections
>>>> This has several disadvantages - there are many connections, we cannot
>>>> broadcast signals, and if a process wants to talk to another process it needs to
>>>> connect to its server and maintain the connection. Since responders do not
>>>> currently provide a server, they cannot talk to each other.
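As a back-of-the-envelope illustration of the two topologies (the responder and backend counts below are assumptions for the example, not measured SSSD numbers), a minimal Python sketch:

```python
# Toy model of the connection counts described above. In the current
# design each responder connects to the monitor and to every backend,
# and each backend connects to the monitor.

def current_connections(responders: int, backends: int) -> int:
    """Total point-to-point connections in the current design."""
    responder_links = responders * (1 + backends)  # monitor + each backend
    backend_links = backends                       # each backend -> monitor
    return responder_links + backend_links

def single_bus_connections(responders: int, backends: int) -> int:
    """With one central bus, each process holds a single connection."""
    return responders + backends

# Assumed example: 5 responders and 2 backend domains.
print(current_connections(5, 2))     # -> 17
print(single_bus_connections(5, 2))  # -> 7
```

The gap widens as responders or domains are added, since the current design grows multiplicatively while the single-bus design grows linearly.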
>> This design has a key advantage: a single process going down does not
>> affect all other processes' communication. How do you recover if the
>> "switch-board" goes down during message processing with sbus ?
> The "switch-board" will be restarted and other processes will reconnect.
> The same way as it is today when one backend dies.
Yes, but what about in-flight operations ?
Both client and server will abort and retry ?
Will the server just keep around data forever ?
It'd be nice to understand the mechanics of recovery to make sure the
actual clients do not end up being impacted by lack of service.
>>>> sbus2 implements a proper private message bus, so it can work in the same way
>>>> as the session or system bus. It is a server that maintains the connections,
>>>> keeps track of their names and then routes messages from one connection
>>>> to another.
>>>> My idea is to have only one sbus server, managed by the monitor.
>> This conflicts with the idea of getting rid of the monitor process, do
>> not know if this is currently still pursued but it was brought up over
>> and over many times that we might want to use systemd as the "monitor"
>> and let socket activation deal with the rest.
> I chose monitor process for the message bus, since 1) it is stable, 2)
> it is idle most of the time. However, it can be a process on its own.
Not sure that moving it to another process makes a difference, the
concern would be the same I think.
> That being said, it does not conflict with removing the monitor
> functionality. We only leave a single message bus.
Right but at that point might as well retain monitoring ...
>>>> Other processes
>>>> will connect to this server with a named connection (e.g. sssd.nss,
>>>> sssd.backend.dom1, sssd.backend.dom2). We can then send a message to this
>>>> message bus (only one connection) and set the destination to a name (e.g.
>>>> to invalidate memcache). We can also send signals to this bus and it will
>>>> broadcast them to all connections that listen to these signals. So, it is the
>>>> proper way to do it. It will simplify things and allow us to send
>>>> signals and have better IPC in general.
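The routing and broadcast behaviour described above can be modelled with a small, purely illustrative sketch (this is not the sbus2 API; only the connection names come from the mail, everything else is made up):

```python
# Toy model of a message bus that tracks named connections, routes
# unicast messages by destination name, and broadcasts signals to every
# subscribed connection.

class ToyBus:
    def __init__(self):
        self.connections = {}   # well-known name -> inbox (list of messages)
        self.subscribers = {}   # signal name -> set of connection names

    def register(self, name):
        self.connections[name] = []

    def subscribe(self, name, signal):
        self.subscribers.setdefault(signal, set()).add(name)

    def send(self, dest, message):
        if dest not in self.connections:
            raise LookupError(f"destination {dest!r} not available")
        self.connections[dest].append(message)

    def emit(self, signal, payload):
        for name in self.subscribers.get(signal, ()):
            self.connections[name].append((signal, payload))

bus = ToyBus()
for name in ("sssd.nss", "sssd.pam", "sssd.backend.dom1"):
    bus.register(name)
bus.subscribe("sssd.nss", "memcache_invalidate")
bus.subscribe("sssd.pam", "memcache_invalidate")

bus.send("sssd.backend.dom1", "getAccountInfo")   # unicast by name
bus.emit("memcache_invalidate", "user deleted")   # broadcast to listeners
print(bus.connections["sssd.nss"])  # -> [('memcache_invalidate', 'user deleted')]
```

Each process holds exactly one connection to the bus, yet can reach any other named process or all listeners at once.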
>>>> I know we want to eventually get rid of the monitor; the process would act only
>>>> as an sbus server. It would become a single point of failure, but the
>>>> process can be restarted automatically by systemd in case of a crash.
>>>> Also, here is a bonus question - do any of you remember why we use a private
>>>> server at all?
>>> In the very original design there was a "switch-board" process that
>>> received a request from one component and forwarded it to the right
>>> target. I guess at this time we didn't know a lot about DBus to
>>> implement this properly. In the end we thought it was a useless overhead
>>> and removed it. I think we didn't think about signals to all components
>>> or the backend sending requests to the frontends.
>>>> Why don't we connect to system message bus?
>>> Mainly because we do not trust it to handle plain text passwords and
>>> other credentials with the needed care.
>> That and because at some point there was a potential chicken-egg issue
>> at startup, and also because we didn't want to handle additional error
>> recovery if the system message bus was restarted.
>> Fundamentally the system message bus is useful only for services
>> offering a "public" service, otherwise it is just an overhead, and has
>> security implications.
> Thank you for explanation.
>>>> I do not see any benefit in having a private server.
>> There is no way to break into sssd via a bug in the system message bus.
>> This is one good reason, aside for the other above.
>> Fundamentally we needed a private structured messaging system we could
>> easily integrate with tevent. The only usable option back then was
>> dbus, and given we already had ideas about offering some public
>> interface over the message bus we went that way so we could later reuse
>> the integration.
>> Today we'd probably go with something a lot more lightweight like varlink.
>>> If I understood you correctly we not only have 'a' private server but
>>> several for a typical minimal setup (monitor, pam, nss, backend).
>>> Given your arguments above I think using a private message bus would
>>> have benefits. Currently two questions came to my mind. First, what
>>> happens to ongoing requests if the monitor dies and is restarted. E.g.
>>> if the backend is processing a user lookup request and the monitor is
>>> restarted, can the backend just send the reply to the freshly started
>>> instance and the nss responder will finally get it? Or is there some
>>> state lost which would force the nss responder to resend the request?
> It works the same way as now. If a backend dies, responders will reconnect
> once it is up again. So no messages are lost.
> If the message bus dies, clients will reconnect and then send pending
> replies. Also, the sbus code will be pretty much stable so it is far less
> likely to crash (of course I expect some issues during review).
So you expect requests to be still serviceable if the message bus dies.
How does a client find out if a service dies and needs to send a new
request ? Will it have to time out and try again ? Or is there any
messaging that lets a client know it has to restart asap ?
And if the message bus dies and a service dies before it comes back up
how does a client find out ?
>> How would the responder even know the other side died, is there a way
>> for clients to know that services died and all requests in flight need
>> to be resent ?
> If a client sends a request to a destination that is not available, it will
> return a specific error code. The client can decide how to deal with it
> (return cached data or an error, or resend the message once it is available).
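Sketched as client-side logic (illustrative only; the error name and helper are invented stand-ins, not the real sbus error codes):

```python
# Model of the choice described above: on a "destination unavailable"
# error the client may retry, and finally fall back to cached data.

class DestinationUnavailable(Exception):
    """Stand-in for the specific error code mentioned above."""

def lookup(send, cache, key, retries=2):
    """Try the backend up to retries+1 times, then serve cached data."""
    for _ in range(retries + 1):
        try:
            return send(key)
        except DestinationUnavailable:
            continue  # destination not (yet) back, try again
    return cache.get(key)  # backend never came back: cached answer
```

A client that prefers returning an error over stale data would simply re-raise instead of reading the cache.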
I am not concerned with messages sent while a service is down, more
about what happens while the client is waiting.
> There are D-Bus signals (NameOwnerChanged/NameOwnerLost/...) that can be
> used as well.
> Given our use case, we can queue it in message bus until the destination
> is available (this is currently not implemented, but it is doable).
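For reference, the D-Bus NameOwnerChanged signal carries three arguments (name, old owner, new owner), and an empty owner string marks appearance or disappearance of the name. A small interpretive sketch (the reactions are suggestions, not actual SSSD behaviour):

```python
# Classify a D-Bus NameOwnerChanged signal. An empty new_owner means the
# well-known name just lost its owner (service died or disconnected); an
# empty old_owner means the name was just claimed.

def classify_owner_change(name, old_owner, new_owner):
    if old_owner and not new_owner:
        return f"{name}: owner lost, mark destination unavailable"
    if new_owner and not old_owner:
        return f"{name}: owner appeared, flush queued messages"
    return f"{name}: owner replaced, treat as service restart"

print(classify_owner_change("sssd.backend.dom1", ":1.7", ""))
```

A client watching this signal learns about a peer's death immediately instead of waiting for a request timeout.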
It is important to recover speedily; if at any point a crash leads to a
cascade of timeouts, this will be very disruptive and will have a much
bigger impact than the current behavior.
The reconnection mechanism is the same as it is now, I just polished it and
fixed one or two corner cases. Code is available here:
1) Dispatcher wants to send/read messages and it finds out that the
connection was dropped
2) We try to reconnect until we get a new connection or we reach the retry
limit
3) We get a new connection. I am not sure if the dispatcher will process
queued messages or they are overwritten, this is internal D-Bus stuff.
The behavior is the same as with the current sbus implementation.
Depending on 3), a timeout may or may not occur. This needs to be tested.
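The three steps can be sketched as a plain retry loop (illustrative; the retry limit and backoff are assumptions, not the actual sbus values):

```python
import time

# Step 1 happens in the dispatcher; this models steps 2 and 3: retry
# connect() until it succeeds or the retry limit is reached.

def reconnect(connect, max_retries=3, delay=0.0):
    """Return a fresh connection, or None after max_retries failures."""
    for _ in range(max_retries):
        try:
            return connect()   # step 3: got a new connection
        except ConnectionError:
            time.sleep(delay)  # step 2: back off, then try again
    return None                # gave up: callers will see an error/timeout
```

Whether in-flight messages survive the reconnect then depends on the dispatcher's queue handling, as noted above.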
>>> The second is about the overhead. Do you have any numbers on how much
>>> longer e.g. the nss responder has to wait e.g. for a backend offline
>>> reply? I would expect that we lose more time at other places;
>>> nevertheless it would be good to have some basic understanding about the
>>> overhead.
> This needs to be measured. But we currently implement sbus servers in
> busy processes so logically, it takes more time to process a message
> than routing via a single-purpose process.
I do not think this follows. Processing messages is relatively fast;
the problem with a 3rd process is 2 more context switches. Context
switches add latency, thrash more caches, and may cause a severe
performance drop depending on the workload.
>> Latency is what we should be worried about, one other reason to go with
>> direct connections is that you did not have to wait for 3 processes to
>> be awake and be scheduled (client/monitor/server) but only 2
>> (client/server). On busy machines the latency can be (relatively) quite
>> high if an additional process needs to be scheduled just to pass along
>> a message.
> This needs to be measured in such an environment.
Yes, it would be nice to have some numbers with clients never hitting
the fast cache and looping through requests that have to go all the way
to the backend each time. For example creating an ldap server with 10k
users each only with a private group and then issuing 10k getpwnam
requests, and see the difference between current code and new code.
Running multiple tests in the same conditions will be important.
I.e. first a dummy run to prime LDAP server caches, then a number of runs
to average over.
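The measurement protocol above, as a sketch (the lookup callable is a stand-in for a getpwnam round trip; the run count is an assumption):

```python
import time

# One dummy pass primes server-side caches; the timed passes are then
# averaged. With the fast cache never hit on the client side, every
# lookup travels all the way to the backend, which is the path to compare
# between the current code and the new code.

def benchmark(lookup, names, runs=3):
    for name in names:                   # dummy run: prime caches
        lookup(name)
    timings = []
    for _ in range(runs):                # timed runs to average over
        start = time.perf_counter()
        for name in names:
            lookup(name)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

# e.g. benchmark(getpwnam_roundtrip, [f"user{i}" for i in range(10000)])
```

Running both code bases against the same primed LDAP server with identical name lists keeps the comparison fair.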
I'm not going to do this change unless the code works completely with a
message bus in each backend. If we agree that this is something that can
be done, we can get numbers then and switch back if it has a negative
performance impact.