On Thu, Jan 23, 2014 at 10:53:51AM +0100, Marek Grac wrote:
[..]
>I think this is a problem. How would we know in advance how
>long the dump will take to finish? It will vary depending
>on so many things (size of memory, speed of network, etc.).
You don't need to know this in advance. This is set on the cluster
side, and the administrator should be able to set this timeout to a
proper value.
How would the cluster admin know how long it will take to save the
dump, and what the right value for this parameter is?
>
>By default, why can't this value be very high? Or this value could act
>more like a watchdog. As long as you keep getting ticks, you keep
>resetting an internal counter. If you don't get a tick (a message from
>the node which is saving the vmcore) for 60 seconds, then you assume
>that something went wrong with the node and power cycle it.
>
>Trying to keep an upper limit of 60 seconds and assuming the dump will
>finish in this time will not help.
This is a general fence agent setting in the cluster, and fence_kdump
is the only one that uses a 'ticking' mechanism; all others should
finish in a much more fixed time. Setting this value for the kdump
agent is fine, as fence_kdump itself contains a different timeout
mechanism which is based on 'ticks'. I agree that it should be
explained in documentation/kbase, but it is not something that can be
changed at the fence agent level.
So are you saying that the 60 seconds above is not the total time taken
to dump? Instead, it is the duration within which at least one message
from fence_kdump should be received, after which the timer resets. Then
it should receive another message within 60 seconds, and it keeps going
like this.
IOW, as long as fence_kdump keeps on sending a message to manager/nodes
every 60 seconds, theoretically the dump could take infinitely long?
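
To make that reading concrete, here is a minimal sketch of a timer that
resets on every message. It is only an illustration of the behaviour
described above, not fence_kdump's actual code: the function name
fence_time, its parameters, and the 30-second message interval are all
hypothetical.

```python
def fence_time(tick_times, timeout=60.0, end=None):
    """Return when the inactivity timer would expire, or None if it never does.

    tick_times: sorted times (in seconds) at which a message arrives.
    timeout:    maximum allowed gap between two consecutive messages.
    end:        time at which observation stops (optional).
    """
    last = 0.0  # the timer starts at t = 0
    for t in tick_times:
        if t - last > timeout:
            # Gap too long: the node is fenced before this message arrives.
            return last + timeout
        last = t  # message arrived in time, so the timer is reset
    if end is not None and end - last > timeout:
        # Messages stopped and the silence outlasted the timeout.
        return last + timeout
    return None


# A 2-hour dump with a message every 30 s never trips the 60 s timer.
steady = [30.0 * i for i in range(1, 241)]
print(fence_time(steady, end=7200.0))   # None

# Messages stop after 90 s: fencing happens 60 s after the last one.
stalled = [30.0, 60.0, 90.0]
print(fence_time(stalled, end=7200.0))  # 150.0
```

Under this model the timeout bounds only the silence between messages,
so the total dump time is unbounded as long as messages keep arriving.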
Thanks
Vivek