need help!

Rick Stevens ricks at alldigital.com
Mon Nov 3 17:54:11 UTC 2014


On 11/02/2014 09:04 AM, bruce issued this missive:
> Hi.
> 
> Got a network of fed/centos boxes.. The boxes are a combination of
> eth, and wifi, with ra3070 chipset.
> 
> We reverse tunnel into a couple of the boxes that are wifi, as well as eth.
> The boxes have dhcp.
> We're using a dyndns kind of service so we can access boxes via name
> instead of straight ip address.
> 
> Running into an issue where it appears that a couple of the boxes hang.
> 
> I can ssh into a box.. and when i try to enter a cmd.. the ssh/term shell hangs.
> 
> If I (sometimes) can access the same box from a different window/term
> then it appears that the box/term is back to being active..
> 
> At the same time, I can sometimes ping a box by name, and I get a "no
> route" for the name.. and if I then use the straight ip address, it
> still doesn't return..
> 
> This then seems to resolve itself after a few mins...
> 
> Any thoughts on what I can start to look at to resolve what the heck
> is going on!!
> 
> I have no access to the dhcp server, or the wifi router.
> 
> thanks guys!!
> 
> centos6.5 and I think fed 19.. the issue appears to be on the centos for now..

The most common reasons I've found for this sort of weird behaviour are:

1. The DNS service isn't keeping up with the addresses DHCP is doling
out, or the DNS TTLs are too long, so stale records stay cached after an
address has changed. I'd recommend that you have the DNS administrator
cut back the TTLs. This won't fix any Windows clients (they're notorious
for ignoring DNS TTLs anyway), but it may help your situation. In highly
dynamic environments, I'd set them to 300 seconds (5 minutes).

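2. The machines are not being given the correct address(es) for the DNS
server(s) by the DHCP server, or they're being given incorrect routes.
You can see what a box actually received by looking at /etc/resolv.conf
and the output of "ip route".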

3. Reverse DNS isn't working on the DNS server, so sshd can't reliably
do reverse lookups on the connecting client's address (something often
overlooked).

4. Because it resolves itself over time and you get "no route to host"
errors, it's possible the addresses are being reused by DHCP quite
often and the ARP caches on the various machines aren't being purged
often enough to keep up. You can test by deleting the ARP entry for a
specific IP using "arp -d <ipaddress-to-delete>" and trying to connect
to it again.
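
If you'd rather gather some evidence for point 4 before deleting ARP
entries by hand, here is a rough sketch in Python. It assumes Linux
boxes where /proc/net/arp is readable, and the function names
(parse_arp_table, changed_macs, read_arp_cache) are made up for this
example rather than taken from any tool. The idea is to take two
snapshots of the ARP cache some time apart and list any IP whose MAC
address changed in between, which would suggest the address was handed
to a different machine.

```python
"""Compare two snapshots of the Linux ARP cache to spot reused IPs."""

ATF_COM = 0x02  # kernel flag bit: the entry is complete (a MAC was learned)


def parse_arp_table(text):
    """Parse /proc/net/arp-style text into a dict of {ip: mac}.

    Incomplete entries (no MAC learned yet) are skipped.
    """
    entries = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or malformed line
        ip, flags, mac = fields[0], fields[2], fields[3]
        if not (int(flags, 16) & ATF_COM):
            continue  # incomplete entry, nothing to compare
        entries[ip] = mac.lower()
    return entries


def changed_macs(before, after):
    """Return {ip: (old_mac, new_mac)} for IPs whose MAC differs."""
    return {
        ip: (before[ip], after[ip])
        for ip in before
        if ip in after and before[ip] != after[ip]
    }


def read_arp_cache(path="/proc/net/arp"):
    """Read and parse the live ARP cache (Linux only)."""
    with open(path) as handle:
        return parse_arp_table(handle.read())
```

To use it, call read_arp_cache() once, wait a few minutes (ideally
spanning one of the hangs), call it again, and pass both results to
changed_macs(). An empty result doesn't rule out the ARP theory, since
an entry may simply have expired between snapshots, but a non-empty one
is a strong hint that addresses are being recycled.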
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    ricks at alldigital.com -
- AIM/Skype: therps2        ICQ: 22643734            Yahoo: origrps2 -
-                                                                    -
-      "Microsoft is a cross between The Borg and the Ferengi.       -
-  Unfortunately they use Borg to do their marketing and Ferengi to  -
-               do their programming."  -- Simon Slavin              -
----------------------------------------------------------------------

