So I went to do a dnf system-upgrade from F35 to F36 on a test machine,
as part of my usual testing. In the middle of the process, it appears
that /var filled up and that left the system in an unfortunate state.
Surprisingly (to me), it did boot with a random mix of F35 and F36
packages. Even though it's a throwaway test box, I wanted to play
around with fixing it a bit and to understand why it ran out of
space, instead of just reinstalling.
Turns out that "dnf --releasever 36 --nogpgcheck remove --duplicates"
was able to effectively reinstall everything in the system, and while
running this /var filled up again. When that happened, dnf couldn't even
be aborted; I had to kill -9. The culprit is the write-ahead log,
/var/lib/rpm/rpmdb.sqlite-wal. I resized /var and reran, and by the end
of the process it had grown to over 9GB:
-rw-r--r--. 1 root root 9124576392 May 13 13:11 rpmdb.sqlite-wal
Of course it immediately went to 0 once the transaction completed,
though rpmdb.sqlite went from:
-rw-r--r--. 1 root root 281739264 May 11 14:24 rpmdb.sqlite
to:
-rw-r--r--. 1 root root 730648576 May 13 13:15 rpmdb.sqlite
which seems... odd for what's effectively just reinstalling the existing
packages.
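
For reference, the general SQLite behaviour described above (the -wal
file growing with one large transaction and dropping to zero afterwards)
can be reproduced on a scratch database. This is only a sketch using
Python's sqlite3 module: the table, row count and payload size are made
up, and whether rpm configures its database the same way is an
assumption, not something checked here.

```python
import os
import sqlite3
import tempfile


def wal_growth_demo(rows=2000, payload=1024):
    """Return (wal_bytes_after_commit, wal_bytes_after_truncate).

    Uses a throwaway database in a temporary directory; nothing here
    touches the real rpmdb.
    """
    with tempfile.TemporaryDirectory() as workdir:
        db_path = os.path.join(workdir, "scratch.sqlite")
        wal_path = db_path + "-wal"

        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        conn = sqlite3.connect(db_path, isolation_level=None)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload BLOB)")

        # One big transaction: every changed page goes through the WAL.
        conn.execute("BEGIN")
        for _ in range(rows):
            conn.execute(
                "INSERT INTO t (payload) VALUES (?)", (os.urandom(payload),)
            )
        conn.execute("COMMIT")
        after_commit = os.path.getsize(wal_path)

        # A TRUNCATE checkpoint copies the WAL into the main database
        # file and truncates the WAL to zero bytes.
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        after_truncate = os.path.getsize(wal_path)

        conn.close()
        return after_commit, after_truncate
```

The WAL size tracks the number of pages written by the transaction, not
the final size of the data, which is at least consistent with a log much
larger than the database itself.
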
Anyway, obviously the solution is to make sure that /var is "big enough"
before you do a system upgrade. And we do have warnings about
filesystems being too small, but nothing about needing an extra 10GB for
this. Certainly my case might be somewhat pathological, and it was good
that I was eventually able to get the system back into a useful state
without wiping it. But in the end I wonder:
1) Is it really expected that the wal file will grow to that size?
2) Is there anything to be done to reduce the size of the log?
3) Is there any better way to handle a lack of space in /var during an
upgrade?
4) Can we estimate how large the file will grow, and refuse to start a
system upgrade if there is not enough space? Certainly we already do
this to some degree, but it seems that the estimate of the required
space is a bit too small.
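
As a rough illustration of what question 4 is asking for, a pre-flight
check might look something like the sketch below. This is not how dnf
computes its estimate. The function names are hypothetical, and the
10 GiB allowance is simply the single WAL size observed above rounded
up, not a documented figure.

```python
import shutil

GIB = 1024 ** 3

# Assumption: allowance taken from the one ~9GB WAL seen in this report,
# rounded up. A real estimate would have to scale with the transaction.
WAL_ALLOWANCE_BYTES = 10 * GIB


def has_headroom(free_bytes, transaction_bytes,
                 wal_allowance_bytes=WAL_ALLOWANCE_BYTES):
    """True if free space covers the transaction plus the WAL allowance."""
    return free_bytes >= transaction_bytes + wal_allowance_bytes


def check_var(path="/var", transaction_bytes=0):
    """Apply has_headroom to the filesystem that holds `path`."""
    free_bytes = shutil.disk_usage(path).free
    return has_headroom(free_bytes, transaction_bytes)
```

With these numbers, 12 GiB free and a 1 GiB transaction would pass,
while 10 GiB free with the same transaction would be refused before
anything is modified.
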