On 8 Jan 2014, at 4:34 pm, Gao,Yan <ygao(a)suse.com> wrote:
On 01/08/14 12:19, Andrew Beekhof wrote:
>
> On 8 Jan 2014, at 2:42 pm, Gao,Yan <ygao(a)suse.com> wrote:
>
>> Hi Andrew,
>> On 01/08/14 05:35, Andrew Beekhof wrote:
>>>
>>> On 8 Jan 2014, at 4:58 am, Gao,Yan <ygao(a)suse.com> wrote:
>>>
>>>> Hi Andrew, David,
>>>>
>>>> This is a scenario from a user:
>>>> Two nodes with 64 DRBD resources, running the latest throttling code.
>>>> The user wants the highest possible concurrency, so they started
>>>> with the following configuration:
>>>>
>>>> LRMD_MAX_CHILDREN=64
>>>
>>> This is almost certainly a horribly inappropriate value to use.
>>> How many cores do these boxes have? I'm guessing less than 16.
>> Each of them has 24 cores.
>
> Thats actually pretty respectable :)
> Alas, it probably makes things worse for the cib - since the updates
> are likely taking longer than the actual operations.
>
> These guys are going to love the new cib code :)
Really looking forward to that :-)
Have you run it yet? It should be in a good state.
Regards,
Gao,Yan
>
>>
>>>
>>>> load-threshold=0%
>>>>
>>>> 1. Start/Promote 64 DRBD resources [PASS].
>>>>
>>>> 2. Shutdown one node at 09:32:51. Quite a few failures like the
>>>> following were encountered when the notify actions invoked "crm_master":
>>>>
>>>> Jan 5 09:33:08 liona drbd(happy21-drbdclone)[6714]: ERROR: happy21:
>>>> Called /usr/sbin/crm_master -Q -l reboot -v 10000
>>>> Jan 5 09:33:08 liona drbd(happy21-drbdclone)[6714]: ERROR: happy21:
>>>> Exit code 107
>>>> Jan 5 09:33:08 liona drbd(happy21-drbdclone)[6714]: ERROR: happy21:
>>>> Command output:
>>>> Jan 5 09:33:08 liona lrmd[25655]: notice: operation_finished:
>>>> happy21-drbdclone_notify_0:6714:stderr [ Could not establish cib_rw
>>>> connection: Resource temporarily unavailable (11) ]
>>>> Jan 5 09:33:08 liona lrmd[25655]: notice: operation_finished:
>>>> happy21-drbdclone_notify_0:6714:stderr [ Error signing on to the CIB
>>>> service: Transport endpoint is not connected ]
>>>> ...
>>>>
>>>>
>>>> Over a minute after the node was told to shut down, the throttling
>>>> code says:
>>>>
>>>> Jan 5 09:34:08 lionb crmd[13212]: notice: throttle_mode: High CIB
>>>> load detected: 0.960333
>>>>
>>>> According to the code, cib_max_cpu is 0.95 here. Apparently, before
>>>> the cib's load was found to have exceeded 0.95, the cib had already
>>>> been overloaded.
>>>
>>> The throttling code can only do so much.
>>> By setting LRMD_MAX_CHILDREN=64, there are still around 128 updates
>>> queued for the cib to process.
>> Indeed.
>>>
>>>>
>>>> Of course, one of the options here is to tune down "load-threshold"
>>>> to find an appropriate value for the deployment -- it might not be
>>>> easy to find a value that is optimal for all possible scenarios.
>>>
>>> I'll let David judge the patch, but not specifying ridiculous values
>>> for LRMD_MAX_CHILDREN would be the best initial path forward.
>>> It's fine to want "as high as possible concurrency", but subverting
>>> the throttling code by setting unrealistic job limits achieves the
>>> opposite.
>> Yes, agreed. We've been telling them that tuning down the concurrency
>> could actually speed up the overall process.
>>
>> Thanks a lot for your comments and suggestions!
>>
>> Regards,
>> Gao,Yan
>>
>>>
>>>>
>>>> Meanwhile, the user sought a way to prevent such failures -- by
>>>> lengthening the listen queue of libqb's IPC:
>>>>
>>>> diff -uNr libqb/lib/util_int.h libqbfio/lib/util_int.h
>>>> --- libqb/lib/util_int.h    2013-10-23 08:44:54.000000000 -0600
>>>> +++ libqbfio/lib/util_int.h 2014-01-06 13:12:18.471097320 -0700
>>>> @@ -99,7 +99,7 @@
>>>> */
>>>> void qb_socket_nosigpipe(int32_t s);
>>>>
>>>> -#define SERVER_BACKLOG 5
>>>> +#define SERVER_BACKLOG 128
>>>>
>>>> #ifndef UNIX_PATH_MAX
>>>> #define UNIX_PATH_MAX 108
>>>>
>>>>
>>>> And it did help: the failures are no longer encountered in the tests.
>>>>
>>>> We'd want the user to tune down the thresholds, since we believe the
>>>> cib is being overloaded. Meanwhile, with the larger listen backlog,
>>>> the cib requests apparently get a better chance of receiving a
>>>> (somewhat delayed) response instead of being rejected immediately.
>>>> Do you think this change makes sense?
>>>>
>>>> Regards,
>>>> Gao,Yan
>>>> --
>>>> Gao,Yan <ygao(a)suse.com>
>>>> Software Engineer
>>>> China Server Team, SUSE.
>>>> _______________________________________________
>>>> quarterback-devel mailing list
>>>> quarterback-devel(a)lists.fedorahosted.org
>>>> https://lists.fedorahosted.org/mailman/listinfo/quarterback-devel
>>>
>>>
>>>
>>> _______________________________________________
>>> Pcmk-devel mailing list
>>> Pcmk-devel(a)oss.clusterlabs.org
>>> http://oss-2.clusterlabs.org/mailman/listinfo/pcmk-devel
>>>
>>
>>
>
>
>
>
--
Gao,Yan <ygao(a)suse.com>
Software Engineer
China Server Team, SUSE.
_______________________________________________
quarterback-devel mailing list
quarterback-devel(a)lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/quarterback-devel