[fedmsg] Proposal on a replay mechanism

Tue Jul 9 11:35:59 UTC 2013

Quoting Ralph Bean (2013-07-09 03:08:24)
> On Mon, Jul 08, 2013 at 03:39:56PM +0200, Simon Chopin wrote:
> > Hi,
> > 
> > As some of you might know, I am the student working on adapting fedmsg
> > for Debian as part of the Google Summer of Code program.
> > 
> > One of the requirements for fedmsg to be part of Debian infrastructure
> > is to be resilient in case a network link drops, as we have services
> > dispatched all over the world. Currently, if a client drops out, it has
> > no way of catching up on what happened when it was offline.
> > 
> > To solve this, I was thinking of the following: all the endpoints that
> > must be able to replay some messages should provide two URLs, say
> > tcp://foo.bar:3000 and tcp+pair://foo.bar:3001, the latter listening
> > for PAIR-type[1] connections. The clients on the simple URL are like the
> > current clients, but the PAIR socket allows the other clients to request
> > the missing messages.
> > 
> > The query would come on the $prefix.replay.$topic topic (say,
> > org.fedoraproject.dev.replay.buildsys.build.state.change), and specify
> > the IDs to resend, or a time interval (for manual queries), and the
> > answer(s) would come on the same topic.
> > 
> > To be able to detect a missing message, the "i" field would have to be
> > topic-bound instead of being at the endpoint level.
> > 
> > Thoughts?
> > 
> > Cheers,
> > Simon
> > 
> > [1] https://learning-0mq-with-pyzmq.readthedocs.org/en/latest/pyzmq/patterns/pair.html
> 
> Hi Simon, thanks for taking this up.
> 
> I like the idea of the special replay topic.  That makes for a pretty
> clean API for requesting replay of messages.  FWIW, a patch was just
> introduced in git that adds a "uuid" field to every message in
> addition to "i".  That could be used to request specific messages.

I'm guessing it would be an additional way to query messages, yes.
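
For illustration, here is a rough sketch of the client side of the
proposal: detecting a gap from a topic-bound "i" counter and building the
matching replay query. The class and function names, and the "seq_ids",
"uuids" and "time_range" keys, are assumptions made up for this example;
nothing here is existing fedmsg API, and the real wire format is still
open.

```python
class GapDetector:
    """Track the last "i" seen per topic and report skipped values.

    Assumes "i" is a per-topic counter that increases by one per message,
    as proposed in this thread (hypothetical; not current fedmsg behaviour).
    """

    def __init__(self):
        self.last_seen = {}

    def observe(self, topic, seq):
        """Record a message and return the sequence IDs missed before it."""
        last = self.last_seen.get(topic)
        missing = []
        if last is not None and seq > last + 1:
            missing = list(range(last + 1, seq))
        if last is None or seq > last:
            self.last_seen[topic] = seq
        return missing


def build_replay_request(prefix, topic, seq_ids=None, uuids=None,
                         time_range=None):
    """Build a query for the $prefix.replay.$topic topic.

    Exactly one selector must be given: sequence IDs, uuids, or a
    (start, end) time interval. The body layout is an assumption.
    """
    selectors = {
        "seq_ids": seq_ids,
        "uuids": uuids,
        "time_range": time_range,
    }
    chosen = {k: v for k, v in selectors.items() if v is not None}
    if len(chosen) != 1:
        raise ValueError("give exactly one of seq_ids, uuids, time_range")
    key, value = next(iter(chosen.items()))
    return {
        "topic": "%s.replay.%s" % (prefix, topic),
        "msg": {key: list(value)},
    }
```

A client would call observe() on every incoming message and, whenever it
returns a non-empty list, send the result of build_replay_request() over
the PAIR socket.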

> One problem I see is in the implementation details.  How long is an
> endpoint expected to hold on to its old messages before discarding
> them?  Whereas currently, an application that gets a fedmsg hook added
> doesn't retain much extra state as a result, this replay-request
> proposal would require a bookkeeping mechanism added to every
> endpoint (in our case, every mod_wsgi/httpd process, among others).

Not every endpoint. For instance, I don't expect your wiki endpoints to
provide such a mechanism, as the systems depending on them are not critical
(unless I'm missing something). For those that are critical, well, I'd
say it is the price to pay.
As for how long, IMO there is no single answer to the question. Each
service should have its own policy. In my initial implementation I plan
to store the messages using SQLAlchemy, without a time limit, but I'm open
to suggestions on how to handle this differently.
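
To give an idea of what a per-service policy could look like, here is a
minimal sketch of such a store. It uses the standard library's sqlite3
module in place of SQLAlchemy, purely to keep the example self-contained.
The ReplayStore name, the table layout and the max_age parameter are all
assumptions for illustration; max_age=None corresponds to the "no time
limit" default described above.

```python
import json
import sqlite3
import time


class ReplayStore:
    """Keep published messages so they can be resent on request."""

    def __init__(self, path=":memory:", max_age=None):
        # max_age is the retention period in seconds; None keeps everything.
        self.max_age = max_age
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " topic TEXT NOT NULL,"
            " seq INTEGER NOT NULL,"
            " timestamp REAL NOT NULL,"
            " body TEXT NOT NULL,"
            " PRIMARY KEY (topic, seq))"
        )

    def add(self, topic, seq, body, timestamp=None):
        """Store one message under its topic-bound sequence ID."""
        ts = time.time() if timestamp is None else timestamp
        with self.db:
            self.db.execute(
                "INSERT INTO messages VALUES (?, ?, ?, ?)",
                (topic, seq, ts, json.dumps(body)),
            )

    def by_ids(self, topic, seq_ids):
        """Return the stored messages matching the given sequence IDs."""
        seq_ids = list(seq_ids)
        if not seq_ids:
            return []
        marks = ",".join("?" * len(seq_ids))
        rows = self.db.execute(
            "SELECT body FROM messages WHERE topic = ? AND seq IN (%s)"
            " ORDER BY seq" % marks,
            [topic] + seq_ids,
        )
        return [json.loads(row[0]) for row in rows]

    def by_time(self, topic, start, end):
        """Return the messages with start <= timestamp <= end."""
        rows = self.db.execute(
            "SELECT body FROM messages WHERE topic = ?"
            " AND timestamp BETWEEN ? AND ? ORDER BY seq",
            (topic, start, end),
        )
        return [json.loads(row[0]) for row in rows]

    def expire(self, now=None):
        """Apply the retention policy; return the number of rows deleted."""
        if self.max_age is None:
            return 0
        now = time.time() if now is None else now
        with self.db:
            cur = self.db.execute(
                "DELETE FROM messages WHERE timestamp < ?",
                (now - self.max_age,),
            )
        return cur.rowcount
```

A wiki-like service could simply not instantiate a store at all, while a
critical one could pick max_age (or None) to match its own needs.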

> Have you considered using the datagrepper API for a replay mechanism?
> https://apps.fedoraproject.org/datagrepper  Although we haven't
> implemented it in practice yet, I have been anticipating using that
> more in the future.  I.e., if a consumer crashes and comes back
> online, it could request a list of every message during that timespan
> from the central store.

The problem with having a central store for messages is that it would only
work if the failure is on the client side. If the link between the
publisher and datagrepper is down, data will be irretrievably lost and
nobody will notice.

Regards,
Simon