pubsubhubbub-ifying planet?

Matt Domsch Matt_Domsch at dell.com
Sun Jun 13 18:59:59 UTC 2010


There's this relatively new protocol, PubSubHubbub, in which a server
publishing an RSS feed pings a hub server whenever it publishes an
update to the feed's XML file.  Feed aggregators subscribed through
that hub, such as Google Reader and others, are then notified
immediately that the updated feed is available, and can refresh it
right away rather than wait for some timed cronjob to do so.

With respect to Planet Fedora, there are 2 things we _could_ do to
make it more timely.  Currently, planet.fp.o gets updated every 20
minutes by cronjob, rescanning all its feeds.

1) If those feeds were themselves publishing their PubSubHubbub hub
   address, we could stop rescanning them every 20 minutes and rescan
   only when notified that they have new content (plus, say, daily to
   be sure we don't miss something).  WordPress and others have a
   plugin to ping a hub, so that's easy for our users to do, and they
   may already be doing so.
2) Every time we finish publishing an updated aggregated RSS feed, we
   could send a 'ping' to a public PubSubHubbub hub.  Aggregators
   subscribed through that hub could then immediately pull our updated
   feed.
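
A rough sketch of the rescan policy from item 1, just to make the idea
concrete.  The function name, its arguments and the two constants are
made up for illustration; nothing like this exists in Venus today.

```python
from datetime import datetime, timedelta

# Intervals taken from the text above: the current 20 minute cronjob,
# and a once-a-day safety rescan for feeds that announce a hub.
POLL_INTERVAL = timedelta(minutes=20)
SAFETY_INTERVAL = timedelta(days=1)


def should_rescan(has_hub, last_scan, now, notified=False):
    """Decide whether one feed should be fetched on this cron run.

    has_hub   -- True if the feed advertises a PubSubHubbub hub
    last_scan -- datetime of the last successful fetch
    now       -- current datetime
    notified  -- True if a hub told us the feed has new content
    """
    if notified:
        # A hub notification always triggers a fetch.
        return True
    # Hub-enabled feeds only get the daily safety rescan;
    # everything else keeps the existing 20 minute schedule.
    interval = SAFETY_INTERVAL if has_hub else POLL_INTERVAL
    return now - last_scan >= interval
```

The daily rescan is kept even for hub-enabled feeds so that a lost
notification delays a post by at most a day.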

From a planet.fp.o publisher perspective, it's really simple.
1) Include a couple bits in the RSS feed itself: an atom namespace
   reference, and in the <channel>, an <atom:link> that references the
   hub server.  In this way, each aggregator can look at that
   atom:link tag and configure itself to subscribe to the pings when
   those feeds are updated.  This should be trivial to patch into
   Venus, our current planet software.

2) Ping a hub.  Here's a python module to do it:
   http://pypi.python.org/pypi/PubSubHubbub_Publisher/1.0 (other
   languages available too:
   http://code.google.com/p/pubsubhubbub/wiki/PublisherClients ). This
   could be done in Venus directly, or as a stand-alone program run
   right after the updated feed is published.  The python code is
   trivial too.
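
The two publisher steps above could look roughly like this using only
the Python standard library.  This is a sketch, not the
PubSubHubbub_Publisher module linked above and not a Venus patch: the
function names are invented here, and the form fields (hub.mode=publish
and hub.url) follow the PubSubHubbub draft spec as I understand it.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Keep the conventional "atom:" prefix when the tree is serialized.
ET.register_namespace("atom", ATOM_NS)


def add_hub_link(rss_xml, hub_url):
    """Step 1: return rss_xml with <atom:link rel="hub"> in <channel>."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    link = ET.Element("{%s}link" % ATOM_NS, {"rel": "hub", "href": hub_url})
    channel.insert(0, link)
    # Serializing also emits the xmlns:atom declaration on <rss>.
    return ET.tostring(root, encoding="unicode")


def build_ping(hub_url, feed_url):
    """Step 2: build (but do not send) the POST that pings the hub."""
    body = urllib.parse.urlencode(
        {"hub.mode": "publish", "hub.url": feed_url})
    return urllib.request.Request(
        hub_url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send_ping(request):
    """Send the ping and return the hub's HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Run right after the updated feed is written, usage would be something
like send_ping(build_ping(HUB, FEED)), where HUB and FEED stand for the
hub address and the public URL of our aggregated feed.  A status in the
2xx range means the hub accepted the ping.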

Doing this, subscribers using Google Reader or other advanced feed
aggregators will get new content immediately, rather than on their
aggregator's polling interval, whatever that is.


From a planet.fp.o as subscriber perspective, it's trickier.
http://code.google.com/p/pubsubhubbub/wiki/SubscriberClients doesn't
list any simple Python libs to "just do it".

There is a plugin for the Tornado web server (which we don't currently
use), and support for Drupal and others.   This step might be "wait
and see"...
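
For a sense of what the subscriber side involves, here is a sketch of
the easy half: finding the hub a feed advertises and building the
subscription request.  The function names are invented for this
example, and the form fields (hub.mode, hub.topic, hub.callback,
hub.verify) are taken from the 0.3 draft of the spec as I understand
it, so check them against whichever hub we would actually use.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM_LINK = "{http://www.w3.org/2005/Atom}link"


def find_hub(feed_xml):
    """Return the href of the first <atom:link rel="hub">, or None.

    Works for RSS (link inside <channel>) and Atom (link under <feed>)
    because it searches the whole tree.
    """
    root = ET.fromstring(feed_xml)
    for link in root.iter(ATOM_LINK):
        if link.get("rel") == "hub":
            return link.get("href")
    return None


def build_subscribe(hub_url, topic_url, callback_url):
    """Build (but do not send) a subscription request for one feed."""
    params = {
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed we want pushed to us
        "hub.callback": callback_url,  # where the hub should notify us
        "hub.verify": "async",         # assumption: 0.3 draft field
    }
    body = urllib.parse.urlencode(params)
    return urllib.request.Request(
        hub_url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

The hard half is the callback URL.  It has to be a publicly reachable
HTTP endpoint on our side that confirms the subscription when the hub
checks it and then accepts the hub's notifications, which is the piece
a web server plugin provides and a cronjob does not.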

Thoughts?

Thanks,
Matt

-- 
Matt Domsch
Technology Strategist
Dell | Office of the CTO

