wiki madness

Toshio Kuratomi a.badger at gmail.com
Fri Nov 2 22:07:55 UTC 2007


Matt Domsch wrote:
> On Fri, Nov 02, 2007 at 11:06:11AM -0700, Toshio Kuratomi wrote:
>> Chuck Anderson wrote:
>>
>>> Won't there be performance problems with a TurboGears-based wiki?  I 
>>> thought MirrorManager was having issues with TG performance and had to 
>>> enable form-data caching to get acceptable performance at the cost of 
>>> possibly stale data.  I don't know the details behind it, but that was 
>>> the reason I was given for why when you edit forms in MM it sometimes 
>>> returns old pre-edit field values.
>>>
>> We might have performance issues but I'm confident they'll be different 
>> performance issues than we're currently experiencing ;-)
>>
>> The issues we're running into with moin right now are largely caused by 
>> Moin's philosophy of having to run off the filesystem, not a db.  This 
>> means 1) we're unable to spread the load among multiple different app 
>> servers so we are constrained to a single server's memory and CPU 
>> resources, 2) it makes multiple views of data much harder than it needs 
>> to (in the subscription list case, Moin has to walk the filesystem, 
>> finding each user's prefs file, parsing it for a watchlist, if the 
>> watchlist exists, checking if the page and page categories are in that 
>> watchlist, and finally being able to send the notification.  With a db, 
>> we'd have a separate table for the watchlist and have indexes for the 
>> userid and the pagename.  Searching for a page wouldn't have to open a 
>> file for every single one of our users, instead it would access a single 
>> table and pull out the users which were in the watchlist.)
>>
>> With MirrorManager I know we've had memory and db query speed issues 
>> trying to serve the mirrorlist directly from the TG app.  I wasn't aware 
>> that mirrormanager was having trouble keeping up with its management 
>> functionality, Matt is that still true or is caching a leftover from 
>> when the two functions were combined?
> 
> I'm sure it's still true, it predated having any mirrorlist
> functionality at all.
> 
> The short story is, TG (well, SQLObject) either caches data very
> aggressively, so you can see stale data on changes, or not at all, so
> each field read in each row results in a DB query.  Even with
> object.sync() calls scattered through the UI actions like I did,
> leaving caching enabled we do still see stale data on occasion.
> Disabling caching, generating the UI pages or certainly the publiclist
> pages takes _forever_, hundreds of thousands of small DB queries.
> 
> Maybe SQLAlchemy has a better caching mechanism, I don't know.
> 
I've just taken an extremely quick look at this and I don't know where 
the stale data problem is coming from, but it does look like SQLObject 
could make more db calls than SQLAlchemy even with caching on.  The 
first part of this is okay::

   In [30]: import model

   In [31]: sites = model.Site.select(orderBy='name')

   In [32]: for site in sites:
      ....:     pass
      ....:
    1/Select  :  SELECT site.id, site.name, site.password, site.org_url,
                 site.private, site.admin_active, site.user_active,
                 site.created_at, site.created_by,
                 site.all_sites_can_pull_from_me, site.downstream_comments
                 FROM site WHERE 1 = 1 ORDER BY name
    1/QueryR  :  SELECT site.id, site.name, site.password, site.org_url,
                 site.private, site.admin_active, site.user_active,
                 site.created_at, site.created_by,
                 site.all_sites_can_pull_from_me, site.downstream_comments
                 FROM site WHERE 1 = 1 ORDER BY name
    1/COMMIT  :  auto

This second part is inefficient::

   In [33]: site.hosts
    1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
    1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
    1/COMMIT  :  auto
    1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
    1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
    1/COMMIT  :  auto
   Out[33]:

[Snip values of site.hosts]

   In [34]: site.hosts
    1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
    1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
    1/COMMIT  :  auto
    1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
    1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
    1/COMMIT  :  auto
   Out[34]:

The list of hosts is retrieved from the db each time the variable is 
accessed even though caching is enabled.  This will make a difference if 
you access a variable more than once, for instance, printing all the 
site.hosts.name in a menu of links at the top of the page and then 
looping through site.hosts to print out a complete record for each.
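
As a rough illustration of why the repeated access matters, here is a minimal sketch using sqlite3 and a hypothetical Site class whose hosts property queries the db on every access. This is only a stand-in for the behaviour seen above, not SQLObject itself; the table layout, host names, and the render_naive/render_once helpers are all made up for the example. Binding the relation to a local variable once should avoid the extra round trip.

```python
import sqlite3


class Site:
    """Hypothetical stand-in for an ORM object (not SQLObject itself)."""

    def __init__(self, conn, site_id):
        self.conn = conn
        self.id = site_id
        self.query_count = 0  # counts how often the db is hit

    @property
    def hosts(self):
        # Mimics the behaviour observed above: every access runs a query.
        self.query_count += 1
        rows = self.conn.execute(
            "SELECT name FROM host WHERE site_id = ? ORDER BY name",
            (self.id,),
        )
        return [row[0] for row in rows]


def build_demo():
    """Create an in-memory db with made-up hosts and return a Site."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE host (id INTEGER PRIMARY KEY, site_id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO host (site_id, name) VALUES (?, ?)",
        [(173, "mirror-a"), (173, "mirror-b"), (9, "other")],
    )
    return Site(conn, 173)


def render_naive(site):
    # Two separate accesses of site.hosts means two queries.
    menu = [name for name in site.hosts]
    records = ["record for %s" % name for name in site.hosts]
    return menu, records


def render_once(site):
    # Fetch the list a single time and reuse it for both sections.
    hosts = site.hosts
    menu = list(hosts)
    records = ["record for %s" % name for name in hosts]
    return menu, records
```

The trade-off is the usual one: the local copy can go stale if another request changes the hosts while the page is being built, so this only makes sense within a single page render.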

For the stale data problem I'd have to know how to reproduce it.  Is the 
data stale when two people are editing the same information?  Is it 
stale on a page refresh?  Etc.

-Toshio



