----- Original Message -----
From: "Nir Soffer" <nsoffer(a)redhat.com>
To: sanlock-devel(a)lists.fedorahosted.org
Cc: "Federico Simoncelli" <fsimonce(a)redhat.com>, "Allon
Mureinik" <amureini(a)redhat.com>, "David Teigland"
<teigland(a)redhat.com>, "Dan Kenigsberg" <danken(a)redhat.com>
Sent: Wednesday, May 14, 2014 4:29:16 PM
Subject: Releasing host id takes 4-6 seconds - is this expected behavior?
Hi David,
I'm working on minimizing the time to stop vdsm domain monitors [1]. When we
stop a monitor, the last thing it does is release the domain host id,
invoking the python rem_lockspace API in sync mode.
In my test, I have a system with 30 domain monitor threads. Signaling the
threads to stop takes about 10 milliseconds, but joining the threads takes
about 10 seconds.
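To illustrate the pattern, here is a toy sketch (not vdsm code): the Monitor
class and the 0.1-second sleep are stand-ins for the real monitor thread and
the ~5.5-second blocking rem_lockspace call, but the shape of the measurement
is the same - signaling is cheap, joining absorbs the blocking release:

```python
import threading
import time

class Monitor(threading.Thread):
    """Toy stand-in for a vdsm domain monitor thread (hypothetical)."""

    def __init__(self):
        super().__init__()
        self._stop_event = threading.Event()

    def run(self):
        # Wait until signaled to stop, then simulate the blocking
        # rem_lockspace() call (shortened to 0.1s instead of ~5.5s).
        self._stop_event.wait()
        time.sleep(0.1)

    def stop(self):
        self._stop_event.set()

monitors = [Monitor() for _ in range(30)]
for m in monitors:
    m.start()

t0 = time.monotonic()
for m in monitors:
    m.stop()                       # signaling: a few milliseconds total
signal_time = time.monotonic() - t0

t0 = time.monotonic()
for m in monitors:
    m.join()                       # joining waits for the blocking release
join_time = time.monotonic() - t0

print(f"signal: {signal_time:.3f}s  join: {join_time:.3f}s")
```

Because the blocking call releases the GIL (here a sleep, in vdsm a C call
into sanlock), the threads wait concurrently, so the join time is roughly one
release duration rather than 30 of them.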
Profiling vdsm reveals that the time is spent in the rem_lockspace call. In
this example, there were 30 calls (one per thread), and each call took 5.560
seconds (wall time).
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    30    0.000    0.000  166.829    5.561  sd.py:469(BlockStorageDomain.releaseHostId)
    30    0.001    0.000  166.829    5.561  clusterlock.py:203(SANLock.releaseHostId)
    30  166.813    5.560  166.813    5.560  sanlock:0(rem_lockspace)
Is this expected behavior?
Can we expect that removing a lockspace will be much faster?
The reason we are concerned about this is that the current stop timeout for
vdsm is 10 seconds. So if you have many storage domains (I have seen
production systems with 40 domains), vdsm may be killed before all monitor
threads are stopped, leading to an orphaned lockspace, and, if the machine is
the SPM, an orphaned SPM resource.
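Until rem_lockspace itself is faster, one way to keep the total stop time
bounded is to issue all the releases concurrently and wait with a deadline.
A hypothetical sketch (release_host_id is a placeholder for
SANLock.releaseHostId, with its blocking time shortened for the example):

```python
import concurrent.futures
import time

def release_host_id(domain):
    # Placeholder for SANLock.releaseHostId(); the real call blocks
    # for roughly 5.5 seconds, shortened here to 0.05s.
    time.sleep(0.05)
    return domain

domains = [f"domain-{i}" for i in range(40)]

with concurrent.futures.ThreadPoolExecutor(max_workers=len(domains)) as pool:
    futures = [pool.submit(release_host_id, d) for d in domains]
    # Wait at most `timeout` seconds for all releases; anything still
    # pending after the deadline shows up in `not_done`.
    done, not_done = concurrent.futures.wait(futures, timeout=1.0)

print(f"released: {len(done)}  still pending: {len(not_done)}")
```

With this shape the wall time for stopping is roughly one release duration
(plus scheduling overhead) regardless of the number of domains, and the
caller can decide what to do with any releases that miss the deadline.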
Small nit: resources are associated with a process, so the SPM resource
cannot be orphaned.
--
Federico