On Thu, 17 Jan 2019 15:31:36 +1100 Cameron Simpson wrote:
ISTR that the TCP connection timeout is about 30s. Can you "netstat -anp | grep SYN" during the delay and see if somethings trying a TCP connection to something now answering?
That didn't detect anything, so I used the big hammer of running strace on sshd and following all the pids and I see it connect fd 4 to /run/dbus/system_bus_socket
[pid 3275] connect(4, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 29) = 0
Then this happens a tad later:
[pid 3275] recvmsg(4, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
This poll is apparently where things hang. I see fd 4 is connected just above to /run/dbus/system_bus_socket
[pid 3275] ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=24, tv_nsec=999872000}, NULL, 8) = 0 (Timeout)
As soon as this stuff starts showing up in the trace, the login completes.
[pid 3275] openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = 7 ...
An lsof shows that pid 1 (systemd) does have the socket open:
systemd 1 root 145u unix 0x00000000caf35a01 0t0 17299 /run/dbus/system_bus_socket type=STREAM
So it looks like systemd is just sometimes speechless (which would explain the systemctl commands hanging as well).