On Thu, 2006-09-21 at 13:24 +0100, James Wilkinson wrote:
> Since you say that this is a "scratch" test PC, I'd do a
>   smartctl -H /dev/hda
> (which is probably what I should have told you in the first
> place). If that says "PASSED", I'd do a combination of
>   dd if=/dev/zero of=/dev/hda
> to blank the drive (that should remap all the bad sectors), and
>   dd if=/dev/hda of=/dev/null
> to read them all back. Then check for any more errors. If you
> don't get any, I'd trust the drive for testing purposes.
>
> Those dd commands will probably take several hours.
Um, no actually. Under an hour, 'twas only a 15 gig drive. I did a
quick test of seeing what would happen if I ran dd against the drive that
the computer had booted from. Watched it working, went away, came back to
a black screen (about what I expected). Then I took the drive out and put
it into another box; results below.
[root@box ~]# dd if=/dev/zero of=/dev/hdc
dd: writing to `/dev/hdc': Input/output error
23953097+0 records in
23953096+0 records out
Above is as I'd expect. Below seems about right (same output count as
input, roughly the same number as worked above, and an error). I'm not
sure at what stage a bad block gets mapped out of use. In the past, I'd
have done that while prepping/formatting a drive.
[root@box ~]# dd if=/dev/hdc of=/dev/null
dd: reading `/dev/hdc': Input/output error
23952864+0 records in
23952864+0 records out
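For anyone wanting to line those record counts up against the SMART logs further down, here is a rough sketch of the arithmetic. It assumes dd was using its default 512-byte block size (no bs= was given), so one record is one sector. The function names are made up for illustration.

```shell
#!/bin/sh
# dd defaults to 512-byte blocks when no bs= is given, so records are sectors.
SECTOR_BYTES=512

# Convert a dd record count into a byte offset.
records_to_bytes() {
    echo $(( $1 * SECTOR_BYTES ))
}

# Distance in sectors from where dd stopped to a reported bad LBA.
# $1 = records dd completed, $2 = LBA reported as bad
sectors_short_of() {
    echo $(( $2 - $1 ))
}

WRITE_OUT=23953096      # "records out" from the write pass
READ_OUT=23952864       # "records out" from the read pass
BAD_LBA=23953101        # LBA reported in the SMART logs further down
DISK_BYTES=15393079296  # "User Capacity" reported by smartctl

echo "write stopped at byte $(records_to_bytes "$WRITE_OUT")"
echo "read stopped at byte $(records_to_bytes "$READ_OUT")"
echo "write stopped $(sectors_short_of "$WRITE_OUT" "$BAD_LBA") sectors before the bad LBA"
echo "read stopped $(sectors_short_of "$READ_OUT" "$BAD_LBA") sectors before the bad LBA"
echo "disk has $(( DISK_BYTES / SECTOR_BYTES )) sectors in total"
```

So the write got to within 5 sectors of the LBA that SMART complains about, and the read gave up 237 sectors before it. Both failures are roughly 80% of the way into the drive's 30064608 sectors.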
Then I ran "smartctl -t short /dev/hdc" and looked at the results, then
"smartctl -t long /dev/hdc"; the results after both are further below.
The basic health check showed fine:
[root@box ~]# smartctl -H /dev/hdc
smartctl version 5.33 [i386-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
So that looks okay. But the "smartctl -a /dev/hdc" is less inspiring:
[root@box ~]# smartctl -a /dev/hdc
smartctl version 5.33 [i386-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: WDC WD153AA-00BAA0
Serial Number: WD-WMA2L2483801
Firmware Version: 10.09K11
User Capacity: 15,393,079,296 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 4
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sat Sep 23 19:13:27 2006 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 (1040) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  14) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   197   098   051    Pre-fail  Always       -       45
  3 Spin_Up_Time            0x0006   109   104   000    Old_age   Always       -       1150
  4 Start_Stop_Count        0x0012   098   098   040    Old_age   Always       -       2524
  5 Reallocated_Sector_Ct   0x0012   198   198   112    Old_age   Always       -       5
  9 Power_On_Hours          0x0012   065   065   000    Old_age   Always       -       26136
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0012   098   098   000    Old_age   Always       -       2297
196 Reallocated_Event_Count 0x0012   196   196   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0012   200   199   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0012   100   253   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
SMART Error Log Version: 1
ATA Error Count: 572 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 572 occurred at disk power-on lifetime: 1013 hours (42 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 18 cd 7e 6d e1 Error: UNC 24 sectors at LBA = 0x016d7ecd = 23953101
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 18 c8 7e 6d e1 00 00:57:28.650 READ DMA
c8 00 20 c0 7e 6d e1 00 00:57:22.800 READ DMA
c8 00 28 b8 7e 6d e1 00 00:57:16.700 READ DMA
c8 00 30 b0 7e 6d e1 00 00:57:10.750 READ DMA
c8 00 38 a8 7e 6d e1 00 00:57:04.750 READ DMA
Error 571 occurred at disk power-on lifetime: 1013 hours (42 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 20 cd 7e 6d e1 Error: UNC 32 sectors at LBA = 0x016d7ecd = 23953101
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 20 c0 7e 6d e1 00 00:57:22.800 READ DMA
c8 00 28 b8 7e 6d e1 00 00:57:16.700 READ DMA
c8 00 30 b0 7e 6d e1 00 00:57:10.750 READ DMA
c8 00 38 a8 7e 6d e1 00 00:57:04.750 READ DMA
c8 00 40 a0 7e 6d e1 00 00:56:58.850 READ DMA
Error 570 occurred at disk power-on lifetime: 1013 hours (42 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 28 cd 7e 6d e1 Error: UNC 40 sectors at LBA = 0x016d7ecd = 23953101
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 28 b8 7e 6d e1 00 00:57:16.700 READ DMA
c8 00 30 b0 7e 6d e1 00 00:57:10.750 READ DMA
c8 00 38 a8 7e 6d e1 00 00:57:04.750 READ DMA
c8 00 40 a0 7e 6d e1 00 00:56:58.850 READ DMA
c8 00 48 98 7e 6d e1 00 00:56:53.050 READ DMA
Error 569 occurred at disk power-on lifetime: 1013 hours (42 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 30 cd 7e 6d e1 Error: UNC 48 sectors at LBA = 0x016d7ecd = 23953101
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 30 b0 7e 6d e1 00 00:57:10.750 READ DMA
c8 00 38 a8 7e 6d e1 00 00:57:04.750 READ DMA
c8 00 40 a0 7e 6d e1 00 00:56:58.850 READ DMA
c8 00 48 98 7e 6d e1 00 00:56:53.050 READ DMA
c8 00 50 90 7e 6d e1 00 00:56:47.350 READ DMA
Error 568 occurred at disk power-on lifetime: 1013 hours (42 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 38 cd 7e 6d e1 Error: UNC 56 sectors at LBA = 0x016d7ecd = 23953101
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 38 a8 7e 6d e1 00 00:57:04.750 READ DMA
c8 00 40 a0 7e 6d e1 00 00:56:58.850 READ DMA
c8 00 48 98 7e 6d e1 00 00:56:53.050 READ DMA
c8 00 50 90 7e 6d e1 00 00:56:47.350 READ DMA
c8 00 58 88 7e 6d e1 00 00:56:41.550 READ DMA
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1014         23953101
# 2  Short offline       Completed: read failure       90%      1013         23953101
# 3  Extended offline    Completed: read failure       30%       990         23953101
# 4  Short offline       Completed without error       00%       990         -
# 5  Short offline       Completed without error       00%       327         -
# 6  Short offline       Completed without error       00%        93         -
# 7  Short captive       Completed without error       00%         0         -
Device does not support Selective Self Tests/Logging
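The register dumps in the error log can be turned back into sector numbers by hand. Below is a small sketch, assuming the usual 28-bit LBA layout (SN, CL and CH are the low three bytes, and the low four bits of DH are the top bits). That layout is my understanding of the ATA spec, not something smartctl prints, and the function names are invented for illustration.

```shell
#!/bin/sh
# Rebuild a 28-bit LBA from the ATA registers as smartctl prints them.
# Arguments are hex bytes without the 0x prefix, in the order SN CL CH DH.
regs_to_lba() {
    sn=$1; cl=$2; ch=$3; dh=$4
    # Only the low four bits of DH are address bits (LBA 27:24).
    echo $(( ((0x$dh & 0x0f) << 24) | (0x$ch << 16) | (0x$cl << 8) | 0x$sn ))
}

# First sector past the end of a READ DMA request: start LBA plus sector count.
# Arguments: SC SN CL CH DH. (An SC of 00 would mean 256 sectors; not handled.)
request_end() {
    sc=$1; shift
    echo $(( $(regs_to_lba "$@") + 0x$sc ))
}

# Error 572: registers after the failure.
regs_to_lba cd 7e 6d e1

# The five READ DMA commands leading up to it (SC SN CL CH DH).
for cmd in "18 c8 7e 6d e1" "20 c0 7e 6d e1" "28 b8 7e 6d e1" \
           "30 b0 7e 6d e1" "38 a8 7e 6d e1"; do
    # Unquoted on purpose so the five fields split into separate arguments.
    request_end $cmd
done
```

The first line printed is 23953101, matching the LBA smartctl decoded. All five requests end at the same sector, 23953120, with the start moving up 8 sectors each time. That end point is exactly 256 sectors past where the dd read stopped (23952864), which would fit one 128 KiB chunk being read ahead and retried, though that interpretation is a guess on my part.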
Tests #1 & #2 are after the dd experiment, the rest are from before. A
quick perusal of the available information doesn't give me any clues as
to what the Remaining and LifeTime columns mean. Predicted failure time,
uptime?
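Since I'm not sure when a bad block actually gets mapped out, one thing worth doing is comparing the remapping counters before and after another dd pass. Here is a rough sketch that pulls them out of the attribute table. The function names are made up, and the second function relies on my understanding that an attribute counts as failed once its normalized VALUE is at or below THRESH.

```shell
#!/bin/sh
# Four rows copied from the attribute table in this message.
ATTRS='  5 Reallocated_Sector_Ct   0x0012   198   198   112    Old_age   Always       -       5
196 Reallocated_Event_Count 0x0012   196   196   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0012   200   199   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0012   100   253   000    Old_age   Always       -       0'

# Print "name raw_value" for the attributes that track remapping.
# Expects attribute-table lines on stdin; other lines are ignored.
remap_counters() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198) { print $2, $NF }'
}

# Print the name of any attribute whose VALUE (field 4) is at or below
# THRESH (field 6).
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && ($4 + 0) <= ($6 + 0) { print $2 }'
}

printf '%s\n' "$ATTRS" | remap_counters
printf '%s\n' "$ATTRS" | failing_attributes
```

On a live system the input would come from "smartctl -A /dev/hdc" instead of the pasted rows. With the rows above, nothing is at or below its threshold, which fits the PASSED health result even though one sector is still listed as pending.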
--
(Currently running FC4, occasionally trying FC5.)
Don't send private replies to my address, the mailbox is ignored.
I read messages from the public lists.