On Tue, Mar 13, 2012 at 5:46 PM, Frido Roose <frido_roose@trimble.com> wrote:

On Tue, Mar 13, 2012 at 5:08 PM, David Teigland <teigland@redhat.com> wrote:
> > I increased the sanlock io_timeout to 30 seconds (default = 10), because
> > the sanlock dir is on a GFS2 volume and can be blocked for some time while
> > fencing and journal recovery takes place.

This is one of the reasons why you should not put sanlock leases on gfs2.
They should be put directly on a shared block device.


OK, I overlooked this when answering, but I was thinking the same...

 

> I'm a little bit confused about the io_timeout option for sanlock.  I
> increased the io_timeout to 30 seconds, but it seems like the overall
> initialization becomes slower now.
> libvirtd is the client through the sanlock plugin.
> sanlock runs as "sanlock daemon -R 1 -o 30"
>
> Restarting sanlock + libvirtd takes about 60 seconds before libvirtd
> acquires the lease (or at least, before libvirtd starts responding).
>
> After a reboot, for some reason, this delay increases up to 360 seconds...
>  I have no idea why this would take longer...
>
> The libvirtd guys don't seem to know why this happens... so I hope to find
> an answer on this list...  I didn't find any other timeouts that are
> configurable.  From the source code, it looks like most of the timeouts are
> based on the io_timeout.

Yes, all the timeouts are derived from the io_timeout and are dictated by
the recovery requirements and the algorithm the host_id leases are based
on: "Light-Weight Leases for Storage-Centric Coordination" by Gregory
Chockler and Dahlia Malkhi.

Here are the actual equations copied from sanlock_internal.h.
"delta" refers to host_id leases that take a long time to acquire at startup
"free" corresponds to starting up after a clean shutdown
"held" corresponds to starting up after an unclean shutdown

You should find that with 30 sec io timeout these come out to 1 min / 4 min
which you see when starting after a clean / unclean shutdown.



Thanks!  This information explains the difference in delay I see between a clean and an unclean startup.
The reboot delay was in fact after a fence operation, so an unclean restart.

I guess having a delta_acquire_held_min of 300 seconds is to be sure that no host with this host_id would acquire the lock in the meantime.

I'm not sure if this still makes sense on top of GFS2, but that reminds me that you said sanlock was meant to be used with a block device.
Maybe this is something that the libvirtd devs need to be aware of.  I'll start a discussion about this on the libvirt list, as the person who tried to help me didn't understand the delays either.


 
 * io_timeout_seconds: defined by us
 *
 * id_renewal_seconds: defined by us
 *
 * id_renewal_fail_seconds: defined by us
 *
 * watchdog_fire_timeout: /dev/watchdog will fire without being petted this long
 * = 60 constant
 *
 * host_dead_seconds: the length of time from the last successful host_id
 * renewal until that host is killed by its watchdog.
 * = id_renewal_fail_seconds + watchdog_fire_timeout
 *
 * delta_large_delay: from the algorithm
 * = id_renewal_seconds + (6 * io_timeout_seconds)
 *
 * delta_short_delay: from the algorithm
 * = 2 * io_timeout_seconds
 *
 * delta_acquire_held_max: max time it can take to successfully
 * acquire a non-free delta lease
 * = io_timeout_seconds (read) +
 *   max(delta_large_delay, host_dead_seconds) +
 *   io_timeout_seconds (read) +
 *   io_timeout_seconds (write) +
 *   delta_short_delay +
 *   io_timeout_seconds (read)
 *
 * delta_acquire_held_min: min time it can take to successfully
 * acquire a non-free delta lease
 * = max(delta_large_delay, host_dead_seconds)
 *
 * delta_acquire_free_max: max time it can take to successfully
 * acquire a free delta lease.
 * = io_timeout_seconds (read) +
 *   io_timeout_seconds (write) +
 *   delta_short_delay +
 *   io_timeout_seconds (read)
 *
 * delta_acquire_free_min: min time it can take to successfully
 * acquire a free delta lease.
 * = delta_short_delay
 *
 * delta_renew_max: max time it can take to successfully
 * renew a delta lease.
 * = io_timeout_seconds (read) +
 *   io_timeout_seconds (write)
 *
 * delta_renew_min: min time it can take to successfully
 * renew a delta lease.
 * = 0
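
For anyone who wants to plug in their own numbers, here is a rough Python sketch that just evaluates the equations above. The function name delta_lease_timeouts is made up for this mail and is not part of sanlock. The header only says id_renewal_seconds and id_renewal_fail_seconds are "defined by us", so they are passed in explicitly. The example values 60 and 240 are an assumption, picked because they reproduce the 300 second delta_acquire_held_min mentioned above for a 30 second io_timeout.

```python
def delta_lease_timeouts(io_timeout_seconds, id_renewal_seconds,
                         id_renewal_fail_seconds, watchdog_fire_timeout=60):
    """Evaluate the delta lease timing equations from the quoted comment.

    All values are in seconds. The two renewal parameters are inputs
    because the quoted comment does not say how they are derived.
    """
    host_dead_seconds = id_renewal_fail_seconds + watchdog_fire_timeout
    delta_large_delay = id_renewal_seconds + 6 * io_timeout_seconds
    delta_short_delay = 2 * io_timeout_seconds

    # Wait that dominates acquiring a lease that was not released cleanly.
    held_wait = max(delta_large_delay, host_dead_seconds)

    return {
        "host_dead_seconds": host_dead_seconds,
        "delta_large_delay": delta_large_delay,
        "delta_short_delay": delta_short_delay,
        # read + held_wait + read + write + short delay + read
        "delta_acquire_held_max": (io_timeout_seconds + held_wait +
                                   io_timeout_seconds + io_timeout_seconds +
                                   delta_short_delay + io_timeout_seconds),
        "delta_acquire_held_min": held_wait,
        # read + write + short delay + read
        "delta_acquire_free_max": (io_timeout_seconds + io_timeout_seconds +
                                   delta_short_delay + io_timeout_seconds),
        "delta_acquire_free_min": delta_short_delay,
        # read + write
        "delta_renew_max": io_timeout_seconds + io_timeout_seconds,
        "delta_renew_min": 0,
    }


if __name__ == "__main__":
    # Assumed example: io_timeout 30, renewal 60, renewal fail 240.
    for name, seconds in delta_lease_timeouts(30, 60, 240).items():
        print(f"{name}: {seconds}")
```

With those assumed inputs, delta_acquire_free_min comes out to 60 seconds and delta_acquire_held_min to 300 seconds, which is in line with the clean versus unclean startup delays discussed in this thread.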


_______________________________________________
sanlock-devel mailing list
sanlock-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/sanlock-devel