I have 3 machines with clean F37 installs. One of the F37 machines has 4GB of RAM, and I maintain it as a backup and normally only log in via ssh and do dnf updates via command line. In the last few weeks this has become extremely difficult to do due to being automatically logged out, presumably by systemd-oomd. It happens even if I boot in multiuser, which ought to reduce memory use. From what little I've read and what experimentation I've done so far, it appears that being logged into a DE (maybe only GNOME or KDE?) protects against this, but non-DE logins (including ssh), and any commands running in them, are not protected. This goes against the expectation that non-DE access should be LESS likely to run out of memory, especially if there isn't even a DE running. How hard would it be for systemd-oomd to be configured to protect non-DE logins and anything running in them?
I've also read that configuring non-zram swap might be a cure. As I said, these are clean F37 installs, and if that's necessary for reasonable behavior when there's not enough RAM, the installer should be doing it automatically. In my case, I don't think that's the cause, since the free command suggests that I'm only using a fraction of both the memory and swap even when the automatic logging out is happening.
On Sun, Mar 12, 2023 at 10:38 AM Andre Robatino robatino@fedoraproject.org wrote:
> I have 3 machines with clean F37 installs. One of the F37 machines has 4GB of RAM, and I maintain it as a backup and normally only log in via ssh and do dnf updates via command line. In the last few weeks this has become extremely difficult to do due to being automatically logged out, presumably by systemd-oomd.
You should look for reasons with journalctl and monitor memory usage in a terminal (using top, bpytop).
> It happens even if I boot in multiuser, which ought to reduce memory use.
Again, you should be monitoring memory use instead of guessing. It would not be surprising if some recent "improvement" increased memory use.
> From what little I've read and what experimentation I've done so far, it appears that being logged into a DE (maybe only GNOME or KDE?) protects against this, but non-DE logins (including ssh), and any commands running in them, are not protected. This goes against the expectation that non-DE access should be LESS likely to run out of memory, especially if there isn't even a DE running. How hard would it be for systemd-oomd to be configured to protect non-DE logins and anything running in them?
> I've also read that configuring non-zram swap might be a cure. As I said, these are clean F37 installs, and if that's necessary for reasonable behavior when there's not enough RAM, the installer should be doing it automatically. In my case, I don't think that's the cause, since the free command suggests that I'm only using a fraction of both the memory and swap even when the automatic logging out is happening.
Since you are using ssh, you should consider whether a network connection is getting dropped. My network monitoring software reports when a network interface goes offline. Recently there have been drops of inactive wifi connections with Fedora 37.
I suspect these might be a result of efforts to reduce power consumption.
It's not just ssh. It happens even if I boot in multiuser and log in via the console, so any non-DE login is affected. Sometimes, I can't even run a simple "dnf check" command (after having a previous transaction aborted by a logout) without being logged out again. And the free command indicates that I never come close to using either all the memory or swap.
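A note on the free output: systemd-oomd reacts to memory pressure (the kernel's PSI stall figures) and to swap use rather than to how much memory is free, which is consistent with the "Memory Pressure Limit" lines that oomctl prints. A minimal sketch for reading the pressure figures directly, assuming the kernel exposes /proc/pressure/memory and the cgroup v2 memory.pressure files; psi_avg10 is a hypothetical helper name, not an existing tool, and looking at the "full" line is an assumption about which figure matters most:

```shell
# psi_avg10 FILE KIND: print the avg10 value from the "some" or "full" line
# of a PSI file such as /proc/pressure/memory or a cgroup's memory.pressure.
psi_avg10() {
    awk -v kind="$2" '$1 == kind { sub(/^avg10=/, "", $2); print $2 }' "$1"
}

# Show system-wide and user.slice pressure, skipping files that do not exist.
for f in /proc/pressure/memory /sys/fs/cgroup/user.slice/memory.pressure; do
    if [ -r "$f" ]; then
        echo "$f full avg10: $(psi_avg10 "$f" full)%"
    fi
done
```

Running this in a second login while the dnf or rsync job is active would show whether pressure is climbing even though free still reports plenty of unused memory.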
On Sun, 12 Mar 2023 13:37:46 -0000 "Andre Robatino" robatino@fedoraproject.org wrote:
> I have 3 machines with clean F37 installs. One of the F37 machines has 4GB of RAM, and I maintain it as a backup and normally only log in via ssh and do dnf updates via command line. In the last few weeks this has become extremely difficult to do due to being automatically logged out, presumably by systemd-oomd. It happens even if I boot in multiuser, which ought to reduce memory use. From what little I've read and what experimentation I've done so far, it appears that being logged into a DE (maybe only GNOME or KDE?) protects against this, but non-DE logins (including ssh), and any commands running in them, are not protected. This goes against the expectation that non-DE access should be LESS likely to run out of memory, especially if there isn't even a DE running. How hard would it be for systemd-oomd to be configured to protect non-DE logins and anything running in them?
> I've also read that configuring non-zram swap might be a cure. As I said, these are clean F37 installs, and if that's necessary for reasonable behavior when there's not enough RAM, the installer should be doing it automatically. In my case, I don't think that's the cause, since the free command suggests that I'm only using a fraction of both the memory and swap even when the automatic logging out is happening.
I don't have any problems with any of the things that you do on F37, and I also initially log in to multiuser. This sounds like there is some configuration issue on your system causing a problem. Or could the memory be failing with intermittent faults? Or maybe you have a setting that creates the problems (seems like a longshot if you are using a default install). If you can find a cause, it would be good to let the maintainers of systemd-oomd know with a bugzilla.
You could, as root, run

    systemctl stop systemd-oomd.service

If there is in fact an OOM condition, your system will hang. But, as George said, it might be better to run one of the tops in a terminal and see what is happening with memory (top, or htop, or itop), or to check dmesg or journalctl -r.

I took the additional step of running

    systemctl mask systemd-oomd.service

so that it never will run. I have never had an issue, though I do have disk based swap, so when I get close to memory issues (around 90% usage), I notice the slowdown or hear the disk activating a lot.
Does your machine have 4GB or less of RAM? If you have more, it may be much less likely to trigger. I just verified that when I log into GNOME on the machine in question, an rsync of a single large file that never works when done remotely works fine, it only fails when attempted from a non-DE login (ssh or console). This should be easy to reproduce with a VM with 1 or 2 GB of RAM (I don't know what the current minimum is). I have a F38 VM that I normally allocate 2GB and boot in multiuser, just to update. I was starting to notice the problem in that as well, so at first I increased the RAM to 4GB and didn't notice it anymore. I don't know why the VM requires less RAM than the F37 bare metal machine. Anyway, once I suspected that the problem was with the non-DE login, I lowered the allocation to 2GB and booted in graphical, and logged into GNOME, and the problem was gone. I don't like doing that, though, since the updating takes longer with all the nonessential stuff going on in GNOME (which is why I normally use multiuser). The VM is running on a different host with 16GB so I doubt this has anything to do with hardware issues.
I did try "systemctl mask systemd-oomd" yesterday, but then discovered that if I reinstall systemd-oomd-defaults, it starts running again, even though it's still masked, so that's not reliable. The same thing would probably happen on any systemd update, unless I just removed systemd-oomd-defaults itself.
BTW, I did notice that the problem was gone during the time that systemd-oomd wasn't running, so that's definitely the cause. Unfortunately, the mask command alone isn't enough to prevent it from running, I'd have to either remove systemd-oomd-defaults or edit some config files. And this really should be fixed, it's just wrong that anything running in a DE is protected from being killed, while a non-DE login, or any command run from within that, isn't.
On 12 Mar 2023, at 15:58, Andre Robatino robatino@fedoraproject.org wrote:
> BTW, I did notice that the problem was gone during the time that systemd-oomd wasn't running, so that's definitely the cause. Unfortunately, the mask command alone isn't enough to prevent it from running, I'd have to either remove systemd-oomd-defaults or edit some config files. And this really should be fixed, it's just wrong that anything running in a DE is protected from being killed, while a non-DE login, or any command run from within that, isn't.
It may be dnf that is using up the memory. Depending on the complexity of the upgrade, dnf's memory use will go up or down. I find that on my 2GiB RPi I have to update packages in groups to avoid the OOM caused by dnf.
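Updating in groups can be scripted. The following is only a sketch of that idea: upgrade_in_batches is a hypothetical helper name, the batch size is arbitrary, and the function prints the dnf commands instead of running them so the batches can be reviewed first. Whether smaller transactions actually keep dnf under the limit on a given machine is not guaranteed.

```shell
# upgrade_in_batches N: read package names on stdin and print one
# "dnf -y upgrade" command per batch of at most N names.
# The commands are printed, not executed; pipe the output to "sh" as root to run them.
upgrade_in_batches() {
    xargs -n "$1" echo dnf -y upgrade
}

# Example with made-up input: three names, batches of two.
printf 'kernel\nglibc\nbash\n' | upgrade_in_batches 2
```

The list of names has to come from somewhere; one option, assuming dnf's repoquery --upgrades option behaves as documented, is to feed it from dnf -q repoquery --upgrades --qf '%{name}' | sort -u.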
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
I've heard dnf uses a lot of memory, but in my 2GB F38 VM, I can run a large DNF transaction with no problem when logged into GNOME. On 4GB F37 bare metal, in multiuser, even updating one letter at a time isn't enough, even "dnf check" can fail. I read somewhere that GNOME and KDE are the only two DEs that have the necessary protection, but don't quote me on that. I haven't even read up on what "cgroup" means yet.
On Sun, 12 Mar 2023 15:47:48 -0000 "Andre Robatino" robatino@fedoraproject.org wrote:
> Does your machine have 4GB or less of RAM? If you have more, it may be much less likely to trigger. I just verified that when I log into
Yeah, 16 GB.
> GNOME on the machine in question, an rsync of a single large file that never works when done remotely works fine, it only fails when attempted from a non-DE login (ssh or console). This should be easy to reproduce with a VM with 1 or 2 GB of RAM (I don't know what the current minimum is). I have a F38 VM that I normally allocate 2GB and boot in multiuser, just to update. I was starting to notice the problem in that as well, so at first I increased the RAM to 4GB and didn't notice it anymore. I don't know why the VM requires less RAM than the F37 bare metal machine. Anyway, once I suspected that the problem was with the non-DE login, I lowered the allocation to 2GB and booted in graphical, and logged into GNOME, and the problem was gone. I don't like doing that, though, since the updating takes longer with all the nonessential stuff going on in GNOME (which is why I normally use multiuser). The VM is running on a different host with 16GB so I doubt this has anything to do with hardware issues.
This sounds like a bug, as if systemd-oomd is imposing much stricter standards when login is sshd or console. I agree with you, how can a resource intensive task like a desktop succeed when a minimal implementation fails? I think the current minimum memory is 2GB (I vaguely recall that I recently read on the test(?) list that people in some situations are having issues with that amount). Like you, I notice that dnf is using almost no memory when I do updates from multiuser. Maybe systemd-oomd is somehow interpreting that usage incorrectly?
> I did try "systemctl mask systemd-oomd" yesterday, but then discovered that if I reinstall systemd-oomd-defaults, it starts running again, even though it's still masked, so that's not reliable. The same thing would probably happen on any systemd update, unless I just removed systemd-oomd-defaults itself.
That hasn't happened to me. When I check with systemctl -a -t service it is still masked, and the status of systemd-oomd is failed, dead. And that is after updates to systemd. But I don't have the package systemd-oomd-defaults installed. Reinstalling that seems like it might be a reset of the mask to the default. Or maybe, if that is the package that actually sets up OOMD, I don't even have OOMD installed. That would explain why my experience is different.
On Sun, Mar 12, 2023 at 9:38 AM Andre Robatino robatino@fedoraproject.org wrote:
> I have 3 machines with clean F37 installs. One of the F37 machines has 4GB of RAM, and I maintain it as a backup and normally only log in via ssh and do dnf updates via command line. In the last few weeks this has become extremely difficult to do due to being automatically logged out, presumably by systemd-oomd. [...]
DNF is known to have excessive memory requirements lately. It started around Fedora 35.
* https://bugzilla.redhat.com/show_bug.cgi?id=1907030
* https://www.reddit.com/r/Fedora/comments/isfw07/excessive_ram_usage_during_d...
* https://discussion.fedoraproject.org/t/dnf-operations-use-large-amount-of-ra...
* etc.
Jeff
I just tried removing systemd-oomd-defaults and it's still possible to run systemd-oomd so I was wrong in thinking that would prevent it.
Andre Robatino composed on 2023-03-12 15:58 (UTC):
> BTW, I did notice that the problem was gone during the time that systemd-oomd wasn't running, so that's definitely the cause. Unfortunately, the mask command alone isn't enough to prevent it from running, I'd have to either remove systemd-oomd-defaults or edit some config files. And this really should be fixed, it's just wrong that anything running in a DE is protected from being killed, while a non-DE login, or any command run from within that, isn't.
Just a thought: I've never found anything still running automatically after I ran these systemctl commands on it, in this order:
stop > disable > mask
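Spelled out for systemd-oomd, that sequence could look like the sketch below. quiesce_unit and the RUNNER variable are hypothetical names used only for illustration; by default the function just prints the three commands, and it executes them only when RUNNER is set (for example to sudo).

```shell
# quiesce_unit UNIT: apply stop, then disable, then mask to one unit.
# With RUNNER unset the commands are only printed (dry run);
# set RUNNER=sudo to actually execute them.
quiesce_unit() {
    unit="$1"
    for action in stop disable mask; do
        ${RUNNER:-echo} systemctl "$action" "$unit"
    done
}

quiesce_unit systemd-oomd.service
```

Keeping the dry run as the default makes it easy to check the order before touching a service that the rest of the thread reports as hard to keep disabled.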
I've heard that, but for me, with limited RAM, both DNF transactions and an rsync of a very large file fail in multiuser (by causing logout while they're running) while they both succeed in GNOME on the same machine.
Hi.
On Sun, 12 Mar 2023 17:33:42 +0000 "Andre Robatino" wrote:
> I just tried removing systemd-oomd-defaults and it's still possible to run systemd-oomd so I was wrong in thinking that would prevent it.
Right: systemd-oomd-defaults only configures systemd-oomd.
rpm -ql --scripts systemd-oomd-defaults
/usr/lib/systemd/oomd.conf.d
/usr/lib/systemd/oomd.conf.d/10-oomd-defaults.conf
/usr/lib/systemd/system/system.slice.d/10-oomd-per-slice-defaults.conf
/usr/lib/systemd/system/user-.slice.d/10-oomd-per-slice-defaults.conf
/usr/lib/systemd/user/slice.d/10-oomd-per-slice-defaults.conf
I checked on a small VM.
With it installed, oomctl shows some CGroups monitored
root# oomctl
Dry Run: no
Swap Used Limit: 90.00%
Default Memory Pressure Limit: 60.00%
Default Memory Pressure Duration: 30s
System Context:
        Memory: Used: 349.5M Total: 1.9G
        Swap: Used: 0B Total: 0B
Swap Monitored CGroups:
Memory Pressure Monitored CGroups:
        Path: /user.slice/user-0.slice
                Memory Pressure Limit: 50.00%
                <snip>
        Path: /user.slice/user-0.slice/user@0.service
                <snip>
root#
With it removed, and after a reboot (needed), no CGroups are monitored
root# oomctl
Dry Run: no
Swap Used Limit: 90.00%
Default Memory Pressure Limit: 60.00%
Default Memory Pressure Duration: 30s
System Context:
        Memory: Used: 457.5M Total: 1.9G
        Swap: Used: 0B Total: 0B
Swap Monitored CGroups:
Memory Pressure Monitored CGroups:
root#
Thanks. I confirmed on the affected machine that immediately after removing systemd-oomd-defaults, it was still monitoring the same CGroups, but after rebooting, it was monitoring none, even though systemd-oomd was still enabled and running.
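The before/after check can be reduced to counting the "Path:" entries in the oomctl output. A small sketch, assuming the output layout shown in this thread; count_monitored is a hypothetical helper name:

```shell
# count_monitored: read oomctl output on stdin and print how many
# monitored cgroup paths ("Path:" lines) it lists.
count_monitored() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -c '^[[:space:]]*Path:' || true
}

# Only query oomctl if it is installed on this machine.
if command -v oomctl >/dev/null 2>&1; then
    echo "monitored cgroups: $(oomctl | count_monitored)"
fi
```

A count of 0 after removing systemd-oomd-defaults and rebooting corresponds to the second oomctl listing above, where systemd-oomd is running but has nothing to act on.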
On Sun, 12 Mar 2023 13:37:46 -0000, Andre Robatino wrote:
> I have 3 machines with clean F37 installs.
Dunno yet whether it's also F37 or just F38, but here on F38 even gnome-terminal is getting killed when running a simple tar command in it that works on creating a ~20 GB archive.
I filed https://bugzilla.redhat.com/show_bug.cgi?id=2177722 for my original problem (involving non-DE logins) and it was indeed fixed with the latest systemd updates in both F37 and F38. I know other people have reported it happening in GNOME but have never seen it personally. There are closed bugs and some open bugs for what you're describing, for example
https://bugzilla.redhat.com/show_bug.cgi?id=2162708
https://bugzilla.redhat.com/show_bug.cgi?id=2173394
so check Bugzilla.