Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM
by Ben Cotton
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary ==
Install earlyoom package, and enable it by default. This will cause
the kernel oomkiller to trigger sooner, but will not affect which
process it chooses to kill off. The idea is to recover from out of
memory situations sooner, rather than the typical complete system hang
in which the user has no other choice but to force power off.
== Owner ==
* Name: [[User:chrismurphy| Chris Murphy]]
* Email: bugzilla(a)colorremedies.com
== Detailed Description ==
Workstation working group has discussed "better interactivity in
low-memory situations" for some months. In certain use cases,
typically compiling, if all RAM and swap are completely consumed,
system responsiveness becomes so abysmal that a reasonable user can
consider the system "lost", and resorts to forcing a power off. This
is objectively a very bad UX. The broad discussion of this problem, and
some ideas for near term and long term solutions, is located here:
Recent long discussions on "Better interactivity in low-memory situations"<br>
https://pagure.io/fedora-workstation/issue/98<br>
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.o...<br>
Fedora editions and spins have the in-kernel OOM (out-of-memory)
manager enabled. The manager's concern is keeping the kernel itself
functioning. It has no concern about user space function or
interactivity. This proposed change attempts to improve the user
experience, in the short term, by triggering the in-kernel process
killing mechanism sooner. Instead of the system becoming completely
unresponsive for tens of minutes, hours or days, the expectation is an
offending process (determined by oom_score, same as now) will be
killed off within seconds or a few minutes. This is an incremental
improvement in user experience, but admittedly still suboptimal. There
is additional work on-going to improve the user experience further.
Workstation working group discussion specific to enabling earlyoom by default
https://pagure.io/fedora-workstation/issue/119
Other in-progress solutions:<br>
https://gitlab.freedesktop.org/hadess/low-memory-monitor<br>
Background information on this complicated problem:<br>
https://www.kernel.org/doc/gorman/html/understand/understand016.html<br>
https://lwn.net/Articles/317814/<br>
== Benefit to Fedora ==
There are two major benefits to Fedora:
* improved user experience by more quickly regaining control over
one's system, rather than having to force power off in low-memory
situations where there's aggressive swapping. Once a system becomes
unresponsive, it's completely reasonable for the user to assume the
system is lost, but that includes high potential for data loss.
* reducing forced poweroff as the main workaround will increase data
collection, improving understanding of low memory situations and how
to handle them better
== Scope ==
* Proposal owners:
a. Modify {{code|https://pagure.io/fedora-comps/blob/master/f/comps-f32.xml.in}}
to include earlyoom package for Workstation.<br>
b. Modify {{code|https://src.fedoraproject.org/rpms/fedora-release/blob/master/f/80-workstation.preset}}
to include:
<pre>
# enable earlyoom by default on workstation
enable earlyoom.service
</pre>
* Other developers:
Restricted to Workstation edition, unless other editions/spins want to opt-in.
* Release engineering: [https://pagure.io/releng/issues #9141] (a
check of an impact with Release Engineering is needed)
* Policies and guidelines: N/A
* Trademark approval: N/A
== Upgrade/compatibility impact ==
earlyoom.service will be enabled on upgrade. An upgraded system should
exhibit the same behaviors as a clean installed system.
== How To Test ==
* Fedora 30/31 users can test today, any edition or spin:<br>
{{code|sudo dnf install earlyoom}}<br>
{{code|sudo systemctl enable --now earlyoom}}
And then attempt to cause an out of memory situation. Examples:<br>
{{code|tail /dev/zero}}<br>
{{code|https://lkml.org/lkml/2019/8/4/15}}
* Fedora Workstation 32 (and Rawhide) users will see this service is
already enabled. It can be toggled with {{code|sudo systemctl
start/stop earlyoom}} where start means earlyoom is running, and stop
means earlyoom is not running.
== User Experience ==
The most egregious instances this change is trying to mitigate:
a. RAM is completely used
b. Swap is completely used
c. System becomes unresponsive to the user as swap thrashing has ensued
--> earlyoom disabled, the user often gives up and forces power off
(in my own testing this condition lasts >30 minutes with no kernel
triggered oom killer and no recovery)
--> earlyoom enabled, the system likely still becomes unresponsive but
oom killer is triggered in much less time (seconds or a few minutes,
in my testing, once less than 10% of RAM and 10% of swap remain)
earlyoom starts sending SIGTERM once both memory and swap are below
their respective PERCENT setting, default 10%. It sends SIGKILL once
both are below their respective KILL_PERCENT setting, default 5%.
The package includes configuration file /etc/default/earlyoom which
sets option {{code|-r 60}} causing a memory report to be entered into
the journal every minute.
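
The threshold rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not earlyoom's actual code (earlyoom itself is written in C). The function names, the sample figures, and the choice of MemAvailable and SwapFree from /proc/meminfo as the measured values are assumptions made for this sketch.

```python
import signal

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values

def percent(part, total):
    """Percentage of total; a zero total (e.g. no swap) counts as 0% free."""
    return 100.0 * part / total if total else 0.0

def pick_signal(meminfo, term_pct=10.0, kill_pct=5.0):
    """Return the signal the rule above would send, or None.

    Both memory AND swap must be below a threshold for it to apply.
    term_pct and kill_pct stand in for the PERCENT and KILL_PERCENT
    settings (defaults 10% and 5% as stated in the text).
    """
    mem = percent(meminfo["MemAvailable"], meminfo["MemTotal"])
    swap = percent(meminfo["SwapFree"], meminfo["SwapTotal"])
    if mem < kill_pct and swap < kill_pct:
        return signal.SIGKILL
    if mem < term_pct and swap < term_pct:
        return signal.SIGTERM
    return None

# Hypothetical sample: 7.5% of RAM available, 7.5% of swap free.
SAMPLE = """MemTotal: 8000000 kB
MemAvailable: 600000 kB
SwapTotal: 4000000 kB
SwapFree: 300000 kB
"""
print(pick_signal(parse_meminfo(SAMPLE)))
```

Note that low memory alone is not enough under this rule: with plenty of free swap, neither signal is sent.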
== Dependencies ==
earlyoom package has no dependencies
== Contingency Plan ==
* Contingency mechanism: Owner will revert all changes
* Contingency deadline: Final freeze
* Blocks release? No
* Blocks product? No
== Documentation ==
{{code|man earlyoom}}<br><br>
https://www.kernel.org/doc/gorman/html/understand/understand016.html
== Release Notes ==
Earlyoom service is enabled by default, which will cause kernel
oom-killer to trigger sooner. To revert to previous behavior:<br>
{{code|sudo systemctl disable earlyoom.service}}
And to customize see {{code|man earlyoom}}.
--
Ben Cotton
He / Him / His
Fedora Program Manager
Red Hat
TZ=America/Indiana/Indianapolis
3 years, 10 months
Remote wipe options for Fedora?
by Martin Langhoff
Are there options for remote-wipe features for Fedora (or RHEL for that
matter)?
Ideally something integrated into the early boot process, as well as a
persistent service that is non-trivial to tamper with. It would naturally
need a network/internet based service as control point.
Googling and searching the mailing list has not turned up any leads.
It is a can of worms, naturally, and I am well aware of limitations, and
tricky tradeoffs in remote-wipe schemes. For some use cases, including one
affecting me, it can reduce attack surface. I am hoping that some solutions
exist, I would be happy to improve, package, integrate...
regards,
martin
--
martin.langhoff(a)gmail.com
- ask interesting questions ~ http://linkedin.com/in/martinlanghoff
- don't be distracted ~ http://github.com/martin-langhoff
by shiny stuff
3 years, 10 months
z3 soname bump
by Jerry James
I will soon push a change to the z3 package, in Rawhide only, which
will result in an soname bump. The actual contents of libz3 will not
change, however. The only Fedora consumer outside of the z3 package
itself is cppcheck, which currently fails to build due to the recent
cmake change. If the cppcheck maintainers want me to try to fix that,
I will do so; otherwise, I am happy to let them fix it themselves.
The gory details for those interested:
The z3 project has two build systems: an old one, based on generating
Makefiles with python scripts, and a new one based on cmake. We have
been using the old one, because the cmake build system does not
support building z3's OCaml interface. However, the old build system
has a number of ... features ... that we had to work around, leading
to a fair amount of uncleanness in the spec file.
I have decided to take the plunge and switch to using the cmake build
system, with manual steps afterward to build the OCaml interface. The
old build system gave the z3 library an soname of "libz3.so", which
the spec file modified to be "libz3.so.0", with a versioned library
libz3.so.0.0.0. The cmake build system gives the library an soname of
"libz3.so.4.8" (where 4 and 8 are the major and minor version numbers,
respectively), with a versioned library libz3.so.4.8.8.0. This lets
me throw out a bunch of cruft, while introducing a much smaller amount
of cruft due to the OCaml interface. I think it's a win.
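
To make the naming difference concrete, here is a small Python sketch that derives both sets of names. It mirrors only what is described above; the helper names and the use of a four-component version string such as "4.8.8.0" are assumptions for illustration, not anything taken from the z3 build systems.

```python
def old_names():
    """Names produced by the old build system after the spec file's fixup."""
    return "libz3.so.0", "libz3.so.0.0.0"

def cmake_names(version):
    """Names produced by the cmake build: soname uses major.minor only."""
    major, minor = version.split(".")[:2]
    soname = f"libz3.so.{major}.{minor}"
    versioned_file = f"libz3.so.{version}"
    return soname, versioned_file

print(old_names())
print(cmake_names("4.8.8.0"))
```

Because the soname itself changes (from libz3.so.0 to libz3.so.4.8), anything linked against the old name has to be rebuilt even though the library contents are the same.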
--
Jerry James
http://www.jamezone.org/
3 years, 10 months
ar (binutils) segfaulting in Rawhide - known bug?
by Richard W.M. Jones
Just upgraded a development machine to:
binutils-2.34.0-10.fc33.x86_64
gcc-10.1.1-2.fc33.x86_64
glibc-2.31.9000-21.fc33.x86_64
and a very simple C compile (non-LTO) is now segfaulting:
make[3]: Entering directory '/home/rjones/d/nbdkit/common/protocol'
/bin/sh ../../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c -o libprotocol_la-protostrings.lo `test -f 'protostrings.c' || echo './'`protostrings.c
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -fPIC -DPIC -o .libs/libprotocol_la-protostrings.o
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -o libprotocol_la-protostrings.o >/dev/null 2>&1
mv -f .deps/libprotocol_la-protostrings.Tpo .deps/libprotocol_la-protostrings.Plo
/bin/sh ../../libtool --tag=CC --mode=link gcc -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -O0 -g -Wp,-U_FORTIFY_SOURCE -o libprotocol.la libprotocol_la-protostrings.lo
libtool: link: ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o
../../libtool: line 1734: 2572327 Segmentation fault (core dumped) ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o
Core was generated by `ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000000000 in ?? ()
binutils-2.34.0-10.fc33.x86_64
(gdb) bt
Missing separate debuginfos, use: dnf debuginfo-install
#0 0x0000000000000000 in ?? ()
#1 0x00007f15bd3e03d0 in make_relative_prefix_1.part ()
from /lib64/libbfd-2.34.0.20200522.so
#2 0x00007f15bd3d22db in bfd_plugin_object_p.lto_priv ()
from /lib64/libbfd-2.34.0.20200522.so
#3 0x00007f15bd3401ce in bfd_check_format_matches ()
from /lib64/libbfd-2.34.0.20200522.so
#4 0x00007f15bd340e7a in _bfd_write_archive_contents ()
from /lib64/libbfd-2.34.0.20200522.so
#5 0x00007f15bd348b2a in bfd_close () from /lib64/libbfd-2.34.0.20200522.so
#6 0x0000559ee83994b6 in write_archive ()
#7 0x0000559ee8396ac3 in main ()
I can't find any BZ for this. Any ideas what it could be?
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
3 years, 10 months
Headsup: dbus 1.12.10-1.fc29 is missing systemd dbus.service file,
breaking almost everything
by Hans de Goede
Hi All,
Just a quick headsup for users following Fedora 29, the
dbus 1.12.10-1.fc29 build is missing the systemd dbus.service
file, breaking almost everything.
Instead it contains a dbus-daemon.service file, but the
dbus.socket file expects a matching dbus.service, not
dbus-daemon.service.
So either hold off on applying updates until this is fixed
or exclude dbus.
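
The mismatch can be illustrated with a short Python sketch that, given a list of unit file names, reports every .socket unit without a same-named .service unit. It relies on systemd's default of activating the service that shares the socket unit's name. The function name and the sample file lists are assumptions, and a real check would also have to account for a Service= setting or unit aliases.

```python
def sockets_missing_service(unit_files):
    """Return .socket units that have no same-named .service unit."""
    names = set(unit_files)
    missing = []
    for unit in sorted(names):
        if unit.endswith(".socket"):
            service = unit[: -len(".socket")] + ".service"
            if service not in names:
                missing.append(unit)
    return missing

# Hypothetical file lists modelled on the situation described above.
broken = ["dbus.socket", "dbus-daemon.service"]
fixed = ["dbus.socket", "dbus.service"]
print(sockets_missing_service(broken))
print(sockets_missing_service(fixed))
```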
Regards,
Hans
3 years, 10 months
Lots of FTBFS bugs filed for S390x "BuildrootError: Requested repo
(1785390) is DELETED" / "rpm.error: error reading package header" errors
by Hans de Goede
Hi All,
<grumpy mode>
I just noticed that a lot of my packages got FTBFS bugs because of
failing to build on s390x. The first set of rebuilds failed with:
"BuildrootError: Requested repo (1785390) is DELETED"
The second set of rebuilds failed with:
"rpm.error: error reading package header"
errors.
The last error was also seen quite a bit during the F32 mass rebuild ...
I just checked 3 semi-random packages of the 9 FTBFS bugs filed
so far, still at the letter a only and I already got 9! And all 3
have this issue rather than being true FTBFS errors.
Now I can try to resubmit these, and resubmit again and resubmit
again, until they succeed as I did for a bunch of packages
(but not this much) during the previous mass-rebuild, but that
seems like a significant waste of my and other contributors' time.
With my Red Hat hat off and my Fedora contributor hat on, I really
think we need to get these s390x build issues escalated. 99%
of the reason to support s390x is a certain downstream
derivative of Fedora. If they care so much about this, they really
ought to fix these s390x build issues, which seem to have been
plaguing us for at least a full cycle now (at least the second
problem mentioned above).
Alternatively, maybe we need to re-introduce secondary arches and
make s390x build failures non-fatal? I dunno, but IMHO we need to
do something; continuing as usual with this is not a good
answer here.
</grumpy mode>
Regards,
Hans
3 years, 10 months
Automatic logout due to quota
by Steven Grubb
Hello,
I was using my desktop system when I got logged out. After logging back in, I found this message in my logs:
Aug 1 13:08:22 x2 journal[1751]: UID 1000 exceeded its 'bytes' quota on UID 1000.
which was then followed by:
Aug 1 13:08:22 x2 dbus-broker[1751]: Peer :1.200 is being disconnected as it does not have the resources to perform an operation.
Aug 1 13:08:22 x2 dbus-broker[1751]: Peer :1.176 is being disconnected as it does not have the resources to receive a signal it subscribed to
...
then
Aug 1 13:08:22 x2 /usr/libexec/gdm-x-session[1703]: cinnamon-session[1754]: WARNING: t+7310.22685s: Lost name on bus: org.gnome.SessionManager
Aug 1 13:08:22 x2 /usr/libexec/gdm-x-session[1703]: cinnamon-session[1754]: CRITICAL: t+7310.22738s: We failed, but the fail whale is dead. Sorry....
Aug 1 13:08:22 x2 cinnamon-session[1754]: WARNING: t+7310.22685s: Lost name on bus: org.gnome.SessionManager
Aug 1 13:08:22 x2 cinnamon-session[1754]: CRITICAL: t+7310.22738s: We failed, but the fail whale is dead. Sorry....
Aug 1 13:08:23 x2 /usr/libexec/gdm-x-session[1703]: Cinnamon warning: CurrentTime used to choose focus window; focus window may not be correct
Aug 1 13:08:23 x2 cinnamon-session[1754]: WARNING: t+7310.45021s: Playing logout sound '/usr/share/cinnamon-control-center/sounds/logout.ogg'
Does anyone have any idea what this quota message is about? I don't use quotas on my system since it's not shared.
Thanks,
-Steve
3 years, 10 months
Did check 0.15.x in rawhide break packages' test suites?
by Fabio Valentini
Hi all,
I'm looking through F33 FTBFS issues, and I see an increasing number
of packages that fail to build because their test suites (using check)
fail to build with errors like this:
/usr/include/check.h:502:27: note: declared here
502 | CK_DLL_EXP void CK_EXPORT _ck_assert_failed(const char *file, int line,
| ^~~~~~~~~~~~~~~~~
path_utils/path_utils_ut.c:698:3: error: too few arguments to function
'_ck_assert_failed'
For example, gsignond-plugin-oauth, gsignond-plugin-sasl,
libaccounts-glib, signon-glib, ding-libs fail with these issues.
Any idea if that's a problem with the "check" 0.15.x builds
themselves, or if packages need to adapt to changed API here?
Fabio
3 years, 10 months