https://bugzilla.redhat.com/show_bug.cgi?id=2185785
Bug ID: 2185785
Summary: sss_ssh_knownhostsproxy does not exit after disconnect from libssh, leaks memory
Product: Fedora
Version: 37
Status: NEW
Component: sssd
Assignee: sssd-maintainers@lists.fedoraproject.org
Reporter: mpitt@redhat.com
QA Contact: extras-qa@fedoraproject.org
CC: abokovoy@redhat.com, atikhono@redhat.com, jhrozek@redhat.com, lslebodn@redhat.com, luk.claes@gmail.com, mzidek@redhat.com, pbrezina@redhat.com, sbose@redhat.com, ssorce@redhat.com, sssd-maintainers@lists.fedoraproject.org
Target Milestone: ---
Classification: Fedora
Description of problem:
In https://github.com/cockpit-project/cockpit/issues/18310 we got a report of leaked sss_ssh_knownhostsproxy processes, which consume quite a lot of RAM and keep SSH connections to the target hosts open even after the parent ssh client has gone away.
The user logs in to cockpit locally, then starts a remote cockpit session through SSH (specifically cockpit-ssh, which uses libssh), then logs out. Logging out sends SIGTERM to the cockpit-ssh process. That process exits, but its sss_ssh_knownhostsproxy child does not: it gets reparented to pid 1 and keeps the SSH connection open.
Version-Release number of selected component (if applicable):
sssd-common-2.8.2-1.fc37.x86_64
libssh-0.10.4-2.fc37.x86_64
cockpit-bridge-289-1.fc37.x86_64
How reproducible: Always
Steps to Reproduce:
1. Join a machine to a FreeIPA domain, and log in as an IPA user. This should create /etc/ssh/ssh_config.d/04-ipa.conf with a ProxyCommand for sss_ssh_knownhostsproxy.
2. Set up an SSH key and add it to ~/.ssh/authorized_keys; you should be able to do "ssh `hostname`" *without* an "unknown host key" prompt (thanks to sss_ssh_knownhostsproxy) and *without* a password prompt (due to using key login).
3. dnf install cockpit-bridge
4. Run an SSH session through libssh, and kill it:

   (printf '\n\n\n\n\n\n'; sleep 20) | /usr/libexec/cockpit-ssh `hostname` &
   sleep 1 && pkill -e cockpit-ssh
Actual results:
The SSH logind session hangs on shutdown:
   Since: Tue 2023-04-11 05:22:06 UTC; 1min 36s ago
  Leader: 2935
     TTY: web console
  Remote: ::ffff:172.27.0.2
 Service: cockpit; type web; class user
   State: closing
    Unit: session-11.scope
          └─3025 /usr/bin/sss_ssh_knownhostsproxy -p 22 x0.cockpit.lan
The cockpit-ssh process is gone, but there are three leaked processes:
admin@c+    5572  0.0  0.8  16624  5632 pts/1    S    07:40   0:00 /usr/bin/sss_ssh_knownhostsproxy -p 22 x0.cockpit.lan
root        5573  0.0  2.0  47060 13184 ?        Ss   07:40   0:00 sshd: admin@cockpit.lan [priv]
admin@c+    5594  0.0  1.1  47060  7320 ?        S    07:40   0:00 sshd: admin@cockpit.lan@notty
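For anyone who wants to confirm the same state on their own machine, a small filter on the parent pid should be enough. This is only a sketch: `orphans_of_init` is a hypothetical helper name (not part of sssd or cockpit), and it assumes the leaked process really ends up under pid 1, as observed here, rather than under some other subreaper.

```shell
# Hypothetical helper: read "PID PPID COMMAND..." lines on stdin and
# keep only those whose parent (second field) is pid 1.
orphans_of_init() {
    awk '$2 == 1'
}

# Intended use (left commented out so that sourcing this snippet has no side effects):
#   ps -e -o pid=,ppid=,args= | grep '[s]ss_ssh_knownhostsproxy' | orphans_of_init
```

The `[s]` in the grep pattern merely keeps grep from matching its own command line.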
strace -p 5572 says
restart_syscall(<... resuming interrupted read ...>
but it's not clear what it is trying to read from.
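One way to narrow this down might be to look at the descriptors the stuck process still holds open. The following is a minimal sketch that only relies on the Linux /proc filesystem; `list_fds` is a hypothetical helper name, and it has to run as the same user as the process (or as root) to be allowed to read the links.

```shell
# Hypothetical helper: print every open file descriptor of a process
# together with the file, pipe or socket it refers to.
# Usage: list_fds PID
list_fds() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Skip the unexpanded glob if the process is gone or unreadable.
        [ -L "$fd" ] || continue
        # Each entry is a symlink naming the object behind the descriptor.
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}
```

For the case above this would be called as `list_fds 5572`. Sockets show up as `socket:[inode]`, and that inode number can then be looked up in the output of `ss -tnpe` to see which connection the descriptor belongs to.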
This does *not* reproduce with "ssh `hostname` sleep 20" and killing that ssh process. So this is some condition that only libssh triggers.
I know that this isn't an ideal reproducer for you. Do you have any idea how to debug this further, for example by enabling some debug logging? (It's a user process, so it can't log to /var/log/sssd/.)
Thanks!