On 06/06/14 at 02:08pm, Vivek Goyal wrote:
On Fri, Jun 06, 2014 at 01:55:09PM +0800, WANG Chao wrote:
> On 06/04/14 at 09:57am, Vivek Goyal wrote:
> > On Wed, Jun 04, 2014 at 11:13:45AM +0800, WANG Chao wrote:
> >
> > [..]
> > > > > if [ $_ret -ne 0 ]; then
> > > > > + echo "ssh failed after multiple tries"
> > > > > echo "Could not create $DUMP_TARGET:$SAVE_PATH, you probably need to run \"kdumpctl propagate\"" >&2
> > > >
> > > > Hold on. So assume that the network is up but the keys are not
> > > > propagated or are not valid: will we still keep on retrying? That
> > > > does not sound right.
> > > >
> > > > We need to retry only if the network interface is not up. If ssh
> > > > fails because of no keys or wrong keys, then we should not retry.
> > >
> > > I'm not sure how we can do this; the return code from ssh is always
> > > 255 in any case of failure, i.e. wrong key, no key, network issue.
> >
> > Hey from DUMP_TARGET, can't we figure out which local network interface
> > it is routed through and then check the status of that network interface?
>
> When network isn't ready, we can't really figure out which interface
> routes to DUMP_TARGET.
>
> There can be situations that local network is up, but there's something
> wrong with the network connection between the host and local system, or
> host network is initializing.
I think we need to ask networking folks and also check how apache waits
for the interfaces.
>
> In this case, should we fail right away without trying a few more times?
> So I'm not too particular about stopping the retries when the local
> network is up and ssh fails.
>
> I think it's not too bad to fail after 180 seconds. If it's a
> configuration issue (wrong key, no key..), user could fix it after the
> first time the kdump service fails, and the next time there would be no
> such issues and the retry will be only for polling network connection.
In the simplest form we could probably use something like "ping" and try
to ping the target.
But this will be a problem if the target is configured not to respond to
ping requests.
Yep, that could be the case...
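
One way around the ICMP problem might be to probe the ssh port itself
instead of pinging. The following is only a rough sketch, not something
taken from kdumpctl: the function name ssh_port_reachable is made up for
illustration, and it assumes bash (for /dev/tcp) and the coreutils
"timeout" command are available, and that the target listens on port 22
unless told otherwise.

```shell
#!/bin/bash
# Hypothetical helper (not part of kdumpctl): succeed if a TCP connection
# to host:port can be opened within the given number of seconds.
# Assumes bash's /dev/tcp pseudo-device and coreutils "timeout".
ssh_port_reachable() {
    local host=$1
    local port=${2:-22}   # default ssh port, an assumption
    local wait=${3:-5}    # seconds before giving up

    # $0 and $1 inside the child shell are the host and port passed below.
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

A caller could retry only while "ssh_port_reachable" fails, and stop
retrying once the port answers but ssh itself still fails. A firewall
that drops packets silently would still look like a network problem here.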
>
> What do you think?
I am really not convinced that we should continue to retry if the keys
are wrong. Expect a string of bugs on this.
The question is how we can distinguish the case of wrong keys from a
network disconnection. The ssh utility always returns 255 on failure.
What's more, a network disconnection can have various causes:
- local network isn't ready yet (no ip address)
- host network isn't ready yet.
- network connection somehow fails:
  - router isn't working at the time.
  - packets are lost because the connection isn't stable.
I agree that we should treat the issue of wrong keys differently from
other issues. But the question is how we can separate them. Once that is
figured out, we can handle this kind of failure differently ...
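
Since the exit status is 255 either way, one possible way to separate
the cases is to look at what ssh prints on stderr. This is just a sketch
under assumptions: classify_ssh_error is a made-up name, the matched
strings are the usual OpenSSH/libc messages in the C locale, and that
text is not a stable interface, so anything unrecognized is reported as
"unknown" rather than guessed at.

```shell
#!/bin/bash
# Hypothetical helper (not part of kdumpctl): classify an ssh failure
# from its captured stderr. Prints "auth", "network" or "unknown".
# Assumes the messages were produced with LC_ALL=C.
classify_ssh_error() {
    case "$1" in
        *"Permission denied"*|*"Host key verification failed"*)
            echo auth ;;      # wrong key, no key, or bad host key
        *"Connection refused"*|*"Connection timed out"*|\
        *"No route to host"*|*"Network is unreachable"*|\
        *"Could not resolve hostname"*)
            echo network ;;   # worth retrying
        *)
            echo unknown ;;
    esac
}

# Possible use (not run here): capture stderr only, then classify it.
#   err=$(LC_ALL=C ssh -o BatchMode=yes "$DUMP_TARGET" true 2>&1 >/dev/null)
#   [ "$(classify_ssh_error "$err")" = "auth" ] && echo "do not retry"
```

With something like this, the retry loop could keep polling on "network"
and give up immediately on "auth". How "unknown" should be treated is an
open question.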
Thanks
WANG Chao