I upgraded from F25 to F26 yesterday and ever since have been seeing the system frequently become totally unresponsive.
It seems to be quite random and can only be resolved by hitting the reset button to reboot.
On other occasions it doesn't quite die but starting anything takes several minutes rather than seconds.
Thunderbird and dnf are examples but sometimes there is just no response to key strokes or clicks.
The only clue that I have seen is that top often shows very high wait I/O levels and swap space is sometimes (but not always) low.
Nothing has changed in the system workload or configuration.
This is a production machine so any help will be very welcome.
Cheers and thanks, Stephen
On 12/18/2017 01:19 AM, Stephen Davies wrote:
I upgraded from F25 to F26 yesterday and ever since have been seeing the system frequently become totally unresponsive.
[snip]
I've been noticing the same thing. /Watch it!/ That could be a sign of imminent HDD failure. I've had a professional installer tell me that very thing.
If you've been following my Data Migration thread, you'll find in there a solution Fred Roller suggested. That is: you have two drives (HDD or SSD, it doesn't matter which except for power and space requirements), one small (120 GB minimum size) and one large (as large as you want). You install Fedora on the smaller drive. Then you format the larger drive in the same filesystem (I use ext4) and mount it in your Linux directory tree under a name of your choosing. (Fred uses /crypt, but any name will do--but if you decide to use /cache, make sure that's not a reserved word.) Then for each user account:
1. Create new directories for every user. Thus for every folder named /home/username (where /username/ is the name on a user account), create, say, /crypt/username.
2. Use chown and chmod (as a "sudoer") to set ownership, group membership, and permissions exactly as they are in the original.
3. Create the next level of subfolders of every user folder, and again use chown and chmod to set their ownerships, group memberships, and permissions exactly as you would have them.
4. Now, in each /home/username directory, /remove all subdirectories/. And each time you do that, create a /symbolic link/ to the counterpart directory on the larger drive. For example:
$ sudo rmdir Documents
$ sudo ln -s /crypt/username/Documents /home/username/Documents
If I understand this properly, now those new folders will become visible in /home/username as if they actually resided there. But they will have far more capacity and will be safe.
5. Do this also for any hidden configuration folder, such as for Thunderbird or Kmail, that you want to preserve from one installation to the next.
From then on, you can do clean installs of each successive iteration of Fedora, re-create your users and groups, remove all top-level folders from each user account, and re-create the symlink structure. You can even swap out the smaller HDD or SSD without fear of compromising your data on the larger drive.
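For anyone who wants to script the per-folder step, here is a minimal sketch of steps 1 to 4 for one directory. It rests on some assumptions that are not in the procedure above: the helper name relocate_dir is made up for illustration, and it copies the existing contents across with cp -a before removing the original, because rmdir only removes directories that are already empty. Treat it as a starting point and try it on a scratch directory first.

```shell
# Hypothetical helper (not from Fred's or Temlakos's posts): move one
# directory onto the large drive and leave a symlink in its place.
relocate_dir() {
    local src=$1        # e.g. /home/username/Documents
    local dest_root=$2  # e.g. /crypt/username
    local name dest
    name=$(basename "$src")
    dest="$dest_root/$name"

    # Check for an existing symlink first: -d would follow the link.
    if [ -L "$src" ]; then
        echo "already a symlink: $src" >&2
        return 1
    fi
    if [ ! -d "$src" ]; then
        echo "not a directory: $src" >&2
        return 1
    fi
    # Refuse to overwrite anything already on the large drive.
    if [ -e "$dest" ]; then
        echo "target already exists: $dest" >&2
        return 1
    fi

    mkdir -p "$dest_root" || return 1
    # cp -a keeps modes and timestamps, and ownership when run as root.
    cp -a "$src" "$dest" || return 1
    # Remove the original only after the copy has succeeded.
    rm -rf "$src" || return 1
    ln -s "$dest" "$src"
}

# Example call (run as root so ownership is preserved):
#   relocate_dir /home/username/Documents /crypt/username
```

Copying before deleting means a failed copy leaves the original folder untouched, at the cost of briefly needing space for both copies.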
An SSD makes an inherently better system drive than an HDD. File access is much faster. Furthermore the system drive, being the workhorse, has a heavier work burden. An SSD will take a lot more punishment than an HDD can take, and for far longer. In fact, a system drive is likely to fail first, for this reason. So using an SSD is far preferable. I got mine for less than $60 US, tax included.
Just remember to power up your system at least once every three months (no more than four) to make sure the SSD doesn't "forget" everything you "taught" it. I use my desktop every day, or idle it for no more than two weeks at a time, so that doesn't present a problem.
I happen to be planning to use an SSD for the user-data drive as well. I have it on hand from an earlier plan (now abandoned) and might as well use it. You might do the same, if you are doing things like torrenting or frequent uploading or downloading or running a database or Web site or anything else that causes you to access user-data files nearly as often as you access system files.
Temlakos
On Mon, 18 Dec 2017 16:49:47 +1030 Stephen Davies sdavies@sdc.com.au wrote:
I upgraded from F25 to F26 yesterday and ever since have been seeing the system frequently become totally unresponsive.
[snip]
The only clue that I have seen is that top often shows very high wait I/O levels and swap space is sometimes (but not always) low.
Check your memory. Does top show the amount of memory that you expect? What's using all that memory? Swap should stay empty, or close to it, if the system has enough memory. Is there a background task that is using a lot of CPU? The locate database program can use a lot of IO as it scans the disk for things.
You could check the logs for messages about kernel errors because of hardware (journalctl) and run the memory checker to be sure your memory is OK. My /boot shows two, memtest86+-5.01 and elf-memtest86+-5.01.
Nothing has changed in the system workload or configuration.
Sure sounds like a hardware issue showed up. Try cleaning the case, and reseating the memory and connectors.
Temlakos makes good points about the hard drive, too. All the new writes during upgrade could have brought out issues.
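A rough sketch of how the memory, swap and log checks above could be run from a shell. The helper name swap_used_kb is made up for illustration, and the commands in the comments are only one way of looking at the same numbers top shows.

```shell
# Quick looks, run by hand:
#   free -h                   # memory and swap totals in human-readable units
#   vmstat 5                  # the "wa" column is I/O wait, sampled every 5 s
#   journalctl -k -p err -b   # kernel messages of priority err or worse, this boot

# Hypothetical helper: swap in use, in kB, computed from /proc/meminfo
# (or from a saved copy of it given as the first argument).
swap_used_kb() {
    local meminfo=${1:-/proc/meminfo}
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { printf "%d\n", total - free }' "$meminfo"
}

# Example:
#   swap_used_kb
```

If the number stays near zero while wait I/O is high, swapping is probably not what is keeping the disk busy.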
On 19/12/17 03:36, stan wrote:
[snip]
It seems to have something to do with graphic apps such as Thunderbird and Firefox. After restarting yet again yesterday afternoon, I left the box alone. Did not even log into KDE on the console but did all my work from either an Android tablet or a laptop using network access to the server only when necessary. No failures for 24 hours.
Just now I started using Thunderbird on the laptop (also F26) and almost immediately started to see huge wait IO levels and lack of response. Prior to starting TB, all was well.
Does this suggest anything?
Cheers, Stephen
On Wed, 20 Dec 2017 17:28:18 +1030 Stephen Davies sdavies@sdc.com.au wrote:
It seems to have something to do with graphic apps such as Thunderbird and Firefox. After restarting yet again yesterday afternoon, I left the box alone. Did not even log into KDE on the console but did all my work from either an Android tablet or a laptop using network access to the server only when necessary. No failures for 24 hours.
Just now I started using Thunderbird on the laptop (also F26) and almost immediately started to see huge wait IO levels and lack of response. Prior to starting TB, all was well.
Does this suggest anything?
Not really. Did memory usage also spike? Did swap get activated again? When the system is idle, neither disk nor memory will be stressed at all. It will probably have ticks set to only occur when necessary to save power, so will mostly just be sitting there.
You could try running a video game of some sort instead of firefox or thunderbird. Both of those are large programs with lots of disk and memory usage. A video game should be mostly memory, and video, of course. If it also has a problem, then it is probably not due to the app, but due to another issue.
Can you boot into an older kernel? Maybe it's a kernel issue, and a different kernel won't have the problem.
Nothing shows up in the logs? Yesterday, when the problem occurred, were there any suspicious entries in the journal? You can look back by running journalctl, and finding the time when the issue started yesterday. I usually use the -r option so that the latest entries are at the top of the output.
I think there are also disk diagnostics you can run with smartctl. It has a man page. It will show if lots of bad sectors are present.
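The journal and SMART checks might look something like the sketch below. The device name /dev/sda and the helper name smart_raw_value are assumptions for illustration. The helper expects the attribute table that smartctl -A prints for ATA/SATA drives; a SCSI disk reports its health in a different layout, so there the helper would simply print nothing.

```shell
# Run by hand:
#   journalctl -r -p warning --since yesterday   # newest first, warnings and worse
#   sudo smartctl -H /dev/sda                    # overall health verdict
#   sudo smartctl -A /dev/sda                    # vendor attribute table (ATA/SATA)

# Hypothetical helper: print the raw value (10th column) of one attribute
# from `smartctl -A` output supplied on stdin.
smart_raw_value() {
    awk -v attr="$1" '$2 == attr { print $10 }'
}

# Example:
#   sudo smartctl -A /dev/sda | smart_raw_value Reallocated_Sector_Ct
```

A reallocated-sector count that is above zero and growing is the usual hint that bad sectors are piling up.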
If it is graphics related, it might be the nouveau driver (if you are using an nvidia card). I had to switch to the binary nvidia driver from rpmfusion.org to prevent my display from randomly freezing up (though it was just the display, I could still ssh into the system).
I have seen some other mysterious junk recently. My system crashed for no obvious reason recently, then I rebooted it again today, and got a kernel dump on my console with a walkback saying it was in something like "start kernel" and the stack was corrupted.
Rebooted again, no problem, ran memtest for a couple of cycles, no errors detected.
Hi.
Regarding the message below, I have to say I have the same issue. My system (a Compaq notebook) freezes totally while I am using Firefox, Epiphany or SMPlayer. Checking anything else is not possible, because when I speak of a total freeze, I mean a total freeze. ^^
The graphics is an Intel Baytrail, 4GB of RAM, 500GB HDD. And yes, checked memory, HDD and the rest of the hardware without any error.
Regards, Dirk
On Monday, 18.12.2017, at 16:49 +1030, Stephen Davies wrote:
I upgraded from F25 to F26 yesterday and ever since have been seeing the system frequently become totally unresponsive.
[snip]
--
========
Stephen Davies Consulting P/L
Phone: 08- 8177 1595
Adelaide, South Australia.
Mobile:040 304 0583
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-leave@lists.fedoraproject.org
On Wed, 20 Dec 2017 22:28:43 +0100 Dirk Gottschalk dirk.gottschalk1980@googlemail.com wrote:
Hi.
Regarding the message below, I have to say I have the same issue. My system (a Compaq notebook) freezes totally while I am using Firefox, Epiphany or SMPlayer. Checking anything else is not possible, because when I speak of a total freeze, I mean a total freeze. ^^
The graphics is an Intel Baytrail, 4GB of RAM, 500GB HDD. And yes, checked memory, HDD and the rest of the hardware without any error.
Go here, select a 4.13 kernel for F26, download the binary packages you have installed, and from the directory where they are, run dnf -C install [all the package names]
https://koji.fedoraproject.org/koji/packageinfo?packageID=8
Then boot into that older kernel and see if the problem persists.
On 21/12/17 10:43, stan wrote:
[snip]
The killer apps seem to be Firefox and Thunderbird. Chrome seems OK.
As soon as I start Firefox, performance declines and eventually stops.
Thunderbird is fine so long as it is only doing email. But when it starts a Javascript script to check events, performance follows the same path as Firefox.
Updating with dnf takes ages and also sends wait I/O through the roof.
The server has 4 GB of memory and a 500 GB SCSI disk. Memory and swap use generally look OK. It is just disk access that seems to be the issue, but the disk configuration hasn't changed. All file systems are set to be checked at boot and smartctl says that all is OK.
Tests with an older kernel are so far ambiguous.
On Fri, 22 Dec 2017 10:48:13 +1030 Stephen Davies sdavies@sdc.com.au wrote:
The killer apps seem to be Firefox and Thunderbird. Chrome seems OK.
[snip]
I'm out of suggestions. One off the wall thing you could try is reinstalling those two apps. It just seems strange that they would have problems accessing the disk when other programs don't. Is their data spread all over the drive? I don't use TB but I have zero problems with firefox. I vaguely recall a version where it would get into some kind of loop and start consuming memory after a while. If I closed it and restarted, everything was sane again for a while. But that was several versions ago.
I've been reviewing the system log after rebooting and find that the journald entries below are present in all cases.
I do not know whether they are a cause or an effect. What do they mean?
Dec 24 14:15:46 mustang systemd-coredump[5168]: Process 5132 (systemd-journal) of user 0 dumped core.
Dec 24 14:19:57 mustang systemd-coredump[5168]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.286203bf080e41509c0dfccdad30ec1b.5132.1514085861000000.lz4
Dec 24 14:20:41 mustang systemd-coredump[5168]: Stack trace of thread 5132:
Dec 24 14:20:45 mustang systemd-coredump[5168]: #0 0x00007f62440cebf0 journal_file_find_data_object_with_hash (libsystemd-shared-233.so)
Dec 24 14:20:51 mustang systemd-coredump[5168]: #1 0x00007f62440d390d journal_file_append_data (libsystemd-shared-233.so)
Dec 24 14:21:00 mustang systemd-coredump[5168]: #2 0x00007f62440d44d1 journal_file_append_entry (libsystemd-shared-233.so)
Dec 24 14:21:02 mustang systemd-coredump[5168]: #3 0x00005614572ae14a dispatch_message_real (systemd-journald)
Dec 24 14:21:03 mustang systemd-coredump[5168]: #4 0x00005614572afb78 server_dispatch_message (systemd-journald)
Dec 24 14:21:03 mustang systemd-coredump[5168]: #5 0x00005614572b0ab1 server_process_syslog_message (systemd-journald)
Dec 24 14:21:06 mustang systemd-coredump[5168]: #6 0x00005614572b2e22 server_process_datagram (systemd-journald)
Dec 24 14:23:24 mustang kernel: systemd-coredum: 6 output lines suppressed due to ratelimiting
Dec 24 14:29:54 mustang systemd-journald[5252]: Journal started
Dec 24 14:29:59 mustang systemd-journald[5252]: System journal (/var/log/journal/b998f5fbcd264bf59a80ce00113e07c2) is 1.2G, max 4.0G, 2.7G free.
Dec 24 14:30:00 mustang audit[5132]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=5132 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" sig=6 res=1
Dec 24 14:30:09 mustang systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Dec 24 14:30:11 mustang abrt-dump-journal-core[855]: Failed to obtain all required information from journald
Dec 24 14:30:13 mustang systemd[1]: systemd-journald.service: Killing process 5132 (systemd-journal) with signal SIGABRT.
Dec 24 14:30:22 mustang abrt-dump-journal-core[855]: Failed to obtain all required information from journald
Cheers, Stephen
Stephen Davies writes:
I've been reviewing the system log after rebooting and find that the journald entries below are present in all cases.
I do not know whether they are a cause or an effect. What do they mean?
Dec 24 14:15:46 mustang systemd-coredump[5168]: Process 5132 (systemd-journal) of user 0 dumped core.
This looks fairly cut and dried to me. systemd-journal blows chunks and crashes. That's what this says. Since this is a critical component of systemd it would not be surprising that something like that ends up taking the whole system down.
A few log lines down, there's an indication that something restarts it.
This would explain why your system slows down, before going kaput. systemd-journal starts crashing. Something respawns it again, which, of course, results in it crashing again. Lather, rinse, repeat.
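To see how often that crash-and-respawn cycle is happening, something like the sketch below might help. The helper name count_journald_crashes is made up for illustration, and it only matches the "dumped core" line format shown in the log excerpt quoted above. The commands in the comments are standard systemd tools, but whether they point to the actual cause here is an open question.

```shell
# Run by hand:
#   coredumpctl list /usr/lib/systemd/systemd-journald   # recorded journald core dumps
#   journalctl --verify                                  # consistency check of journal files
#   journalctl --disk-usage                              # space taken by the journal

# Hypothetical helper: count journald core-dump reports in log text on stdin.
count_journald_crashes() {
    grep -c 'systemd-coredump\[[0-9]*]: Process [0-9]* (systemd-journal) .*dumped core'
}

# Example, on a saved copy of the log (the file name is an assumption):
#   count_journald_crashes < saved-log.txt
```

A count that climbs across the slow period before each freeze would fit the respawn loop described above.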