I have done a fresh install of Fedora 22 on the same computer four times. Each time, after using the system for one to three days and rebooting successfully several times, a reboot drops me into emergency mode. This computer had been running Fedora 21 since its release without any problems. I am using an ASUS P9X79 Deluxe motherboard.
At the emergency mode console the following is displayed in yellow:
Ignoring BGRT: invalid status 0 (expected 1)
ata16.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata16.00: irq_stat 0x40000001
ata16.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 1 dma 16640 in
         Inquiry 12 01 00 00 ff 00
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
Running journalctl -xb shows the following lines in red:
Jun 30 14:48:15 mac.localdomain kernel: Ignoring BGRT: invalid status 0 (expected 1)
Jun 30 14:48:15 mac.localdomain kernel: ata16.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Jun 30 14:48:15 mac.localdomain kernel: ata16.00: irq_stat 0x40000001
Jun 30 14:48:15 mac.localdomain kernel: ata16.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 1 dma 16640 in
                                                 Inquiry 12 01 00 00 ff 00
                                                 res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
Jun 30 14:48:32 mac.localdomain kernel: EDAC sbridge: ECC is disabled. Aborting
Jun 30 14:48:32 mac.localdomain kernel: EDAC sbridge: Couldn't find mci handler
I have tried installing to a new hard disk. I have tried a different video card. I have run fsck on the hard disk after booting a live image. No errors were found. Nothing I have tried has changed the result.
Where do I start in order to determine the cause of this problem?
On 07/01/2015 10:37 AM, Patrick O'Callaghan wrote:
On Wed, 2015-07-01 at 09:45 -0500, Craig Goodyear wrote:
Inquiry 12 01 00 00 ff 00
res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
Googling "HSM violation linux" throws up a bunch of possibilities. Start there.
poc
Thank you for the suggestion. This error comes from the Marvell SATA controller, which I am not using. Disabling it in the BIOS eliminates the error.
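To confirm which controller a libata port such as ata16 belongs to before disabling anything, one option is to follow the port's sysfs symlink back to its PCI device and hand that address to lspci. A minimal sketch, assuming the usual sysfs layout where /sys/class/ata_port/ataN links into the PCI device tree; the function name `ata_port_pci_addr` and its optional second argument (an alternate root, there only for testing) are hypothetical:

```shell
# Sketch: print the PCI address (e.g. 0000:0a:00.0) of the controller that
# owns a libata port such as ata16, by resolving its sysfs symlink.
ata_port_pci_addr() {
    local port="$1"
    # Optional alternate root; a hypothetical hook so the function can be tested.
    local sysroot="${2:-}"
    local link="$sysroot/sys/class/ata_port/$port"
    local path
    [ -e "$link" ] || return 1
    path=$(readlink -f "$link") || return 1
    # Split the resolved path into components and keep the last one shaped like
    # a PCI domain:bus:slot.function name; that is the owning controller.
    printf '%s\n' "$path" | tr '/' '\n' |
        grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$' |
        tail -n 1
}
```

For example, `lspci -s "$(ata_port_pci_addr ata16)"` should then print a one-line description of the controller behind ata16.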
Craig
On Wed, Jul 1, 2015 at 8:45 AM, Craig Goodyear cjhs22a@cableone.net wrote:
Where do I start in order to determine the cause of this problem?
The closest semi-recent report I can find is https://bbs.archlinux.org/viewtopic.php?id=184783
So it sounds like a hardware bug that the kernel previously worked around but no longer does (?), which would make it a kernel bug. Try a different kernel: kernel-4.0.7-300.fc22 is in the updates-testing repo, so you could easily try that. If that doesn't work, I'd go to koji and get kernel-4.1.0-1.fc23.
If that doesn't work, I suggest going back to the newest Fedora 21 kernel available, which presumably will work since it worked for you before, and sticking with it for now. But then you'll need to file a bug report, including which kernels you've tested and which versions do and don't have the problem, plus full details on your hardware: run lspci -vvnn > lspci.txt and attach that file, and do the same with dmesg > dmesg.txt. (Anything that's long, or that needs to keep its formatting without web-browser wrapping issues, should be an attachment.)
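The collection step can be scripted so nothing is forgotten. This is a sketch, not a Fedora tool: `collect_bug_info` is a hypothetical name, and the uname -r and rpm -q kernel captures are additions beyond the thread's suggestion (they record the running and installed kernels, which the report should list anyway):

```shell
# Sketch: gather the details suggested for the bug report into one directory,
# one file per command. Best effort: failures are noted, not fatal.
collect_bug_info() {
    local out="${1:-bugreport}"
    mkdir -p "$out" || return 1
    # Full PCI device listing with numeric IDs (lspci comes from pciutils).
    if command -v lspci >/dev/null 2>&1; then
        lspci -vvnn > "$out/lspci.txt" 2>&1 || :
    fi
    # Kernel ring buffer; may need root if dmesg access is restricted.
    dmesg > "$out/dmesg.txt" 2>&1 || echo "dmesg failed; try as root" >&2
    # Running kernel, plus every installed kernel package.
    uname -r > "$out/running-kernel.txt"
    if command -v rpm >/dev/null 2>&1; then
        rpm -q kernel > "$out/installed-kernels.txt" 2>&1 || :
    fi
    return 0
}
```

Run it as root, e.g. `collect_bug_info ~/bugreport`, then attach the resulting files.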
Before you start, you should probably edit /etc/dnf/dnf.conf to set installonly_limit to something like 10, so that dnf doesn't start deleting kernels. You can clean up the extra kernels later, which is a bit tedious, but that's another matter.
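A sketch of that dnf.conf edit, assuming GNU sed (as on Fedora) and the stock `installonly_limit=N` form (no spaces around `=`) inside the `[main]` section; `set_installonly_limit` is a hypothetical helper name:

```shell
# Sketch: set dnf's installonly_limit so older kernels are kept while testing.
# Usage: set_installonly_limit [LIMIT] [CONF]
set_installonly_limit() {
    local limit="${1:-10}"
    local conf="${2:-/etc/dnf/dnf.conf}"
    if grep -q '^installonly_limit=' "$conf"; then
        # A value is already present: replace it in place.
        sed -i "s/^installonly_limit=.*/installonly_limit=$limit/" "$conf"
    else
        # No value yet: add one right after the [main] section header.
        sed -i "/^\[main\]/a installonly_limit=$limit" "$conf"
    fi
}
```

Run it as root, e.g. `set_installonly_limit 10`, and check the result with `grep installonly_limit /etc/dnf/dnf.conf`.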
On 07/01/2015 10:45 AM, Chris Murphy wrote:
The closest semi-recent report I can find is https://bbs.archlinux.org/viewtopic.php?id=184783
So it sounds like a hardware bug that the kernel previously worked around but no longer does (?), which would make it a kernel bug. Try a different kernel: kernel-4.0.7-300.fc22 is in the updates-testing repo, so you could easily try that. If that doesn't work, I'd go to koji and get kernel-4.1.0-1.fc23.
If that doesn't work, I suggest going back to the newest Fedora 21 kernel available, which presumably will work since it worked for you before.
Thank you for the response. I have downloaded and tested kernels 4.0.7-300.fc22, 4.1.0-1.fc23 and 3.17.4-301.fc21. All of them booted to emergency mode.
At this point, I will install Fedora 21 and test. If not successful, I will assume that I have a motherboard failure.
On 07/01/2015 03:44 PM, Joe Zeff wrote:
On 07/01/2015 01:35 PM, Craig Goodyear wrote:
At this point, I will install Fedora 21 and test. If not successful, I will assume that I have a motherboard failure.
Can you boot off of a LiveUSB? If so, it might not be the mobo.
I can boot from a LiveDVD. I have not tried a LiveUSB. I am able to mount a USB thumb drive in emergency mode.
On 07/01/2015 08:45 AM, Craig Goodyear wrote:
Where do I start in order to determine the cause of this problem?
Some googling turned up similar problems on Ubuntu and RHEL going back to 2010, some as recent as 2014. So, somehow, an old bug has been reincarnated?
On 07/01/2015 09:45 AM, Craig Goodyear wrote:
I have done a fresh install of Fedora 22 on the same computer four times. Each time, after using the system for one to three days and rebooting successfully several times, a reboot drops me into emergency mode. This computer had been running Fedora 21 since its release without any problems. I am using an ASUS P9X79 Deluxe motherboard.
I have tried installing to a new hard disk. I have tried a different video card. I have run fsck on the hard disk after booting a live image. No errors were found. Nothing I have tried has changed the result.
To close this thread: I think I have found the problem. Upon inspecting the BIOS settings, I found that I had not completely disabled UEFI support.
I may also have created a problem with the boot devices when I changed the first boot device to the DVD drive for the Fedora 22 install: there were two UEFI entries for Fedora with only one Fedora version installed.
I removed all boot options except the DVD drive and the hard disk. I have done a fresh install of Fedora 21. If it proves to be stable for a couple of weeks, I will proceed with the Fedora 22 install.
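On a system booted in UEFI mode, the same duplicate-entry check can be made from Linux with efibootmgr, which prints one `BootXXXX` line per firmware boot entry (a `*` after the number marks it active). A minimal sketch that filters that listing; `fedora_boot_entries` is a hypothetical name, and matching on the word "Fedora" in the label is an assumption about how the entries are named:

```shell
# Sketch: read `efibootmgr` output on stdin and print only the boot entries
# whose label mentions Fedora, so duplicates stand out.
fedora_boot_entries() {
    # Entry lines look like "Boot0000* Fedora"; header lines such as
    # "BootCurrent:" and "BootOrder:" are skipped by the 4-hex-digit pattern.
    grep -E '^Boot[0-9A-Fa-f]{4}\*? .*[Ff]edora'
}
```

Usage: `efibootmgr | fedora_boot_entries`. If two lines come back with only one Fedora installed, the stale one can be deleted as root with `efibootmgr -b XXXX -B`, where XXXX is that entry's hex number. Double-check the number first, since this removes the entry from firmware NVRAM.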
Craig
On Thu, Jul 2, 2015 at 1:38 PM, Craig Goodyear cjhs22a@cableone.net wrote:
To close this thread: I think I have found the problem. Upon inspecting the BIOS settings, I found that I had not completely disabled UEFI support.
This is suboptimal and is basically a last-ditch effort. There is no real way to disable UEFI. What this setting actually does is enable a Compatibility Support Module (CSM) that presents a faux BIOS to the OS, bridging between the OS and UEFI. So UEFI isn't actually disabled; you've just added another layer.
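Because the CSM only adds a layer, it helps to know which path a given boot actually took. A common check relies on the kernel exposing /sys/firmware/efi only when it was started through UEFI. A minimal sketch; `boot_mode` and its optional alternate-root argument (there only for testing) are hypothetical:

```shell
# Sketch: report whether the running system booted via UEFI or via the
# CSM's legacy BIOS path. /sys/firmware/efi only exists on UEFI boots.
boot_mode() {
    # Optional alternate root; a hypothetical hook so the function can be tested.
    local root="${1:-}"
    if [ -d "$root/sys/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "legacy BIOS (CSM)"
    fi
}
```

On the real system just run `boot_mode` with no argument.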
What's really needed are logs to troubleshoot why the boot fails. If dracut drops you to emergency mode, it's supposed to produce an rdsosreport.txt, which typically contains a lot of information useful for troubleshooting.
On 07/02/2015 06:01 PM, Chris Murphy wrote:
What's really needed are logs to troubleshoot why the boot fails. If dracut drops you to emergency mode, it's supposed to produce an rdsosreport.txt, which typically contains a lot of information useful for troubleshooting.
I wish you had responded to my original request for help before I did a fresh install. I was not aware of the rdsosreport.txt file and emergency mode only refers to journalctl for troubleshooting. I will keep this in mind if the problem repeats.
Craig
On Fri, Jul 3, 2015 at 6:43 AM, Craig Goodyear cjhs22a@cableone.net wrote:
On 07/02/2015 06:01 PM, Chris Murphy wrote:
What's really needed are logs to troubleshoot why the boot fails. If dracut drops you to emergency mode, it's supposed to produce an rdsosreport.txt, which typically contains a lot of information useful for troubleshooting.
I wish you had responded to my original request for help before I did a fresh install. I was not aware of the rdsosreport.txt file and emergency mode only refers to journalctl for troubleshooting. I will keep this in mind if the problem repeats.
If an rdsosreport.txt is created, a hint is displayed telling you where to find it. If you're dropped to a shell and there's no such hint anywhere on the screen, it wasn't created, so you'll have to fake one up. First mount a file system, such as a USB stick. /mnt doesn't exist, so you can mount it at /sysroot and then run:
journalctl -b -l -o short-monotonic > /sysroot/journal.txt
That writes out the entire journal for just the current (failed) boot (-b), shows fields in full rather than truncating them (-l) in case there's important stuff there, and uses monotonic timestamps (-o short-monotonic). All but -b are optional, but they make the log more readable.
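The same steps, wrapped up so they're hard to mistype at an emergency prompt. A sketch assuming the USB stick is already mounted (at /sysroot as above); `save_boot_logs` is a hypothetical name, and copying /run/initramfs/rdsosreport.txt when it exists is an addition (that is where dracut writes its report):

```shell
# Sketch: save the current boot's journal, plus dracut's rdsosreport.txt if
# one was generated, into a directory on an already-mounted USB stick.
save_boot_logs() {
    if [ -z "${1:-}" ]; then
        echo "usage: save_boot_logs DEST_DIR" >&2
        return 2
    fi
    local dest="$1"
    # -b: current boot only; -l: full fields; monotonic timestamps.
    journalctl -b -l -o short-monotonic > "$dest/journal.txt" || return 1
    # dracut writes this file when it drops to its emergency shell.
    if [ -f /run/initramfs/rdsosreport.txt ]; then
        cp /run/initramfs/rdsosreport.txt "$dest/"
    fi
    # Flush writes before the stick is unmounted or pulled.
    sync
}
```

For example, with the stick at /dev/sdb1 (a placeholder): `mount /dev/sdb1 /sysroot && save_boot_logs /sysroot && umount /sysroot`.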
Given how many reinstalls you've done, I suspect a hardware problem. Instead of waiting for it to happen again, you could do two things.
Post the output from
smartctl -x /dev/sdX   # where X is the letter of the drive you installed Fedora on
And after that, over the weekend if you can afford to be without the use of this computer, run memtest86+ as long as you can stand it. Sometimes it takes days for problems to show up.
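Once the smartctl output is in hand, the lines most people look at first can be pulled out automatically. A sketch that reads `smartctl -x` output on stdin and prints the overall-health verdict plus the raw values of a few commonly watched attributes; the attribute list is a personal choice, not an official set, and `smart_summary` is a hypothetical name:

```shell
# Sketch: summarize `smartctl -x` output read on stdin.
smart_summary() {
    awk '
        # The drive firmware overall verdict (PASSED/FAILED).
        /SMART overall-health self-assessment test result/ { print }
        # Attribute rows: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE,
        # so the name is field 2 and the raw value starts at field 8.
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect|UDMA_CRC_Error_Count)$/ {
            print $2 " raw=" $8
        }
    '
}
```

For example, `smartctl -x /dev/sda | smart_summary` (as root). Non-zero raw values for the sector counters are the usual red flags; a rising UDMA_CRC_Error_Count tends to point at cabling rather than the disk itself.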
On 07/03/2015 04:25 PM, Chris Murphy wrote:
Given how many reinstalls you've done, I suspect a hardware problem. Instead of waiting for it to happen again, you could do two things.
Post the output from
smartctl -x /dev/sdX   # where X is the letter of the drive you installed Fedora on
And after that, over the weekend if you can afford to be without the use of this computer, run memtest86+ as long as you can stand it. Sometimes it takes days for problems to show up.
I ran memtest86+ for 30 hours. No errors were found.
Here is the output from smartctl -x /dev/sda:
smartctl 6.2 2014-07-16 r3952 [x86_64-linux-4.0.6-200.fc21.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST500DM002-1BD142
Serial Number:    W3TEGX3B
LU WWN Device Id: 5 000c50 07d130b7d
Firmware Version: KC48
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Jul 5 17:08:56 2015 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     208 (intermediate), recommended: 208
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (  600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  87) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   110   099   006    -    27448144
  3 Spin_Up_Time            PO----   100   100   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    37
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR--   063   060   030    -    1927500
  9 Power_On_Hours          -O--CK   100   100   000    -    155
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    37
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0 0 0
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   067   063   045    -    33 (Min/Max 33/35)
194 Temperature_Celsius     -O---K   033   040   000    -    33 (0 24 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   060   039   000    -    27448144
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    155h+35m+47.085s
241 Total_LBAs_Written      ------   100   253   000    -    2664640084
242 Total_LBAs_Read         ------   100   253   000    -    3421058523
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL     VS    2248  Device vendor specific log
0xa8       GPL,SL  VS     129  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    2928  Device vendor specific log
0xbd       GPL     VS     252  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%              123  -
# 2  Short offline       Completed without error        00%              122  -
# 3  Short offline       Completed without error        00%               65  -
# 4  Short offline       Completed without error        00%               41  -
# 5  Short offline       Completed without error        00%               17  -
# 6  Short offline       Completed without error        00%                6  -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                 33 Celsius
Power Cycle Min/Max Temperature:     32/35 Celsius
Lifetime Min/Max Temperature:        24/48 Celsius
Under/Over Temperature Limit Count:  0/0
SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        59 minutes
Min/Max recommended Temperature:     14/55 Celsius
Min/Max Temperature Limit:           10/60 Celsius
Temperature History Size (Index):    128 (92)
Index    Estimated Time   Temperature Celsius
  93    2015-06-30 11:40    25  ******
  94    2015-06-30 12:39     ?  -
  95    2015-06-30 13:38    26  *******
  96    2015-06-30 14:37     ?  -
  97    2015-06-30 15:36    27  ********
  98    2015-06-30 16:35     ?  -
  99    2015-06-30 17:34    25  ******
 100    2015-06-30 18:33     ?  -
 101    2015-06-30 19:32    27  ********
 102    2015-06-30 20:31     ?  -
 103    2015-06-30 21:30    30  ***********
 104    2015-06-30 22:29     ?  -
 105    2015-06-30 23:28    24  *****
 106    2015-07-01 00:27     ?  -
 107    2015-07-01 01:26    27  ********
 108    2015-07-01 02:25     ?  -
 109    2015-07-01 03:24    24  *****
 110    2015-07-01 04:23     ?  -
 111    2015-07-01 05:22    25  ******
 112    2015-07-01 06:21     ?  -
 113    2015-07-01 07:20    31  ************
 114    2015-07-01 08:19     ?  -
 115    2015-07-01 09:18    34  ***************
 116    2015-07-01 10:17     ?  -
 117    2015-07-01 11:16    34  ***************
 118    2015-07-01 12:15     ?  -
 119    2015-07-01 13:14    27  ********
 120    2015-07-01 14:13     ?  -
 121    2015-07-01 15:12    31  ************
 122    2015-07-01 16:11     ?  -
 123    2015-07-01 17:10    30  ***********
 124    2015-07-01 18:09     ?  -
 125    2015-07-01 19:08    32  *************
 126    2015-07-01 20:07     ?  -
 127    2015-07-01 21:06    27  ********
   0    2015-07-01 22:05     ?  -
   1    2015-07-01 23:04    29  **********
   2    2015-07-02 00:03     ?  -
   3    2015-07-02 01:02    24  *****
   4    2015-07-02 02:01     ?  -
   5    2015-07-02 03:00    28  *********
   6    2015-07-02 03:59     ?  -
   7    2015-07-02 04:58    34  ***************
   8    2015-07-02 05:57     ?  -
   9    2015-07-02 06:56    28  *********
  10    2015-07-02 07:55    34  ***************
  11    2015-07-02 08:54    35  ****************
  12    2015-07-02 09:53    33  **************
  13    2015-07-02 10:52    33  **************
  14    2015-07-02 11:51    34  ***************
  15    2015-07-02 12:50    33  **************
  16    2015-07-02 13:49    33  **************
  17    2015-07-02 14:48    33  **************
  18    2015-07-02 15:47    35  ****************
  19    2015-07-02 16:46     ?  -
  20    2015-07-02 17:45    29  **********
  21    2015-07-02 18:44    34  ***************
 ...    ..(  5 skipped).    ..  ***************
  27    2015-07-03 00:38    34  ***************
  28    2015-07-03 01:37    35  ****************
  29    2015-07-03 02:36    34  ***************
  30    2015-07-03 03:35    34  ***************
  31    2015-07-03 04:34    34  ***************
  32    2015-07-03 05:33    33  **************
  33    2015-07-03 06:32    34  ***************
  34    2015-07-03 07:31    33  **************
 ...    ..(  7 skipped).    ..  **************
  42    2015-07-03 15:23    33  **************
  43    2015-07-03 16:22    34  ***************
  44    2015-07-03 17:21    33  **************
 ...    ..(  3 skipped).    ..  **************
  48    2015-07-03 21:17    33  **************
  49    2015-07-03 22:16    34  ***************
  50    2015-07-03 23:15    34  ***************
  51    2015-07-04 00:14    34  ***************
  52    2015-07-04 01:13    35  ****************
  53    2015-07-04 02:12    34  ***************
  54    2015-07-04 03:11    34  ***************
  55    2015-07-04 04:10    35  ****************
  56    2015-07-04 05:09    34  ***************
  57    2015-07-04 06:08    34  ***************
  58    2015-07-04 07:07    34  ***************
  59    2015-07-04 08:06    37  ******************
  60    2015-07-04 09:05     ?  -
  61    2015-07-04 10:04    35  ****************
  62    2015-07-04 11:03    34  ***************
 ...    ..(  2 skipped).    ..  ***************
  65    2015-07-04 14:00    34  ***************
  66    2015-07-04 14:59    33  **************
  67    2015-07-04 15:58    34  ***************
  68    2015-07-04 16:57    33  **************
 ...    ..( 23 skipped).    ..  **************
  92    2015-07-05 16:33    33  **************
SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled
Device Statistics (GP Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            4  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
Well, I don't see any problems there. If you have a backup of the contents of /var/log/journal, you can point journalctl at it with -D and see whether anything weird was happening before the failure. You can use -r to reverse the log, so that as you scroll it goes backwards in time. You can also filter it by piping it through grep, for example:
grep ERR
grep UNC
grep -i error
grep -i sector
If you get a hit, note the timestamp, pick a time maybe 5 minutes before it, and plug that into --since:
journalctl --since="2015-07-05 13:00:00"
Then scroll until you find the instigator, or at least the first of what will probably be several error lines. That assumes, of course, that the problems were written to the journal.
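The filtering above can be combined into one pass over the backed-up journal. A sketch, assuming the copy lives in a directory you pass in; `scan_journal` is a hypothetical name, and it keeps ERR and UNC case-sensitive (as in the grep list) while matching error and sector case-insensitively:

```shell
# Sketch: print journal lines from a copied journal directory that look like
# disk or I/O trouble, oldest first.
scan_journal() {
    local dir="${1:-/var/log/journal}"
    # Case-sensitive ATA status words (ERR, UNC) plus "error"/"sector" in any case.
    journalctl -D "$dir" --no-pager |
        awk '/ERR|UNC/ || tolower($0) ~ /error|sector/'
}
```

For example, `scan_journal /path/to/journal-backup`; the timestamp on the first hit is what you'd feed to --since.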
If there's nothing, or the journals are gone or corrupt: there is a way to send systemd-journald's journal to another computer. I haven't done that, so I can't tell you how, but it might be worth setting up now so that if/when this problem happens again, you'll have logs of it.
Chris Murphy
On 03.07.2015, Chris Murphy wrote:
And after that, over the weekend if you can afford to be without the use of this computer, run memtest86+ as long as you can stand it. Sometimes it takes days for problems to show up.
Most often, mprime (Prime95) is a better alternative; it usually fails within a short time if there is failing RAM or a heat problem: http://www.mersenne.org/download/
One full hour with each of the three stress tests will usually suffice.
On Mon, Jul 6, 2015, 12:11 AM Heinz Diehl htd+ml@fritha.org wrote:
Most often, mprime (Prime95) is a better alternative; it usually fails within a short time if there is failing RAM or a heat problem: http://www.mersenne.org/download/
One full hour with each of the three stress tests will usually suffice.
The description says it can be used to stress test the CPU, including the on-board caches. It doesn't say it's a memory tester. The reason it can take a long time for memtest to find a defect is that some defects produce only intermittent errors.
Chris Murphy
On Fri, Jul 3, 2015 at 3:21 PM, Chris Murphy lists@colorremedies.com wrote:
If an rdsosreport.txt is created, a hint is displayed telling you where to find it.
Example: https://ask.fedoraproject.org/en/question/71011/how-do-i-get-past-the-dracul...