understanding smart logs
JD
jd1008 at gmail.com
Mon Aug 16 03:27:59 UTC 2010
On 08/15/2010 08:14 PM, James McKenzie wrote:
> JD wrote:
>> On 08/15/2010 06:44 PM, Suvayu Ali wrote:
>>
>>> On Sunday 15 August 2010 10:17 AM, James McKenzie wrote:
>>>
>>>> Got a good backup of this drive? Looks like it needs to be retested in
>>>> a different machine and, if it fails, replaced.
>>>>
>>>> I had a drive that exhibited the same behavior, and eventually it failed.
>>>>
>>>>
>>> I downloaded the bootable ISO of the disk diagnostic suite from Western
>>> Digital and ran it. It claimed to detect and fix the errors. After the scan
>>> the SMART logs read like this:
>>>
>>>
>>>> Vendor Specific SMART Attributes with Thresholds:
>>>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>>>   1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       1545
>>>>   3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       1066
>>>>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       42
>>>>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>>>>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>>>>   9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1426
>>>>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>>>>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>>>>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
>>>> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
>>>> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       20
>>>> 194 Temperature_Celsius     0x0022   109   107   000    Old_age   Always       -       41
>>>> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
>>>> 197 Current_Pending_Sector  0x0032   200   199   000    Old_age   Always       -       78
>>>> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
>>>> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
>>>> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
>>>>
>>>> SMART Error Log Version: 1
>>>> No Errors Logged
>>>>
>>>> SMART Self-test log structure revision number 1
>>>> Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
>>>> # 1  Conveyance offline  Completed: read failure  90%        1422             1106820646
>>>> # 2  Extended offline    Completed: read failure  90%        1393             1106820646
>>>>
>>>>
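In a table like the one above, the sector counters (IDs 5, 197 and 198) are the ones people commonly watch, and here the raw value of 78 for Current_Pending_Sector stands out. A minimal sketch for pulling those lines out, assuming the table comes from `smartctl -A` with RAW_VALUE in the tenth column; the function name flag_sectors is made up for illustration:

```shell
# flag_sectors: hypothetical helper. Reads a smartctl attribute table on
# stdin and prints ID, name and raw value for the sector counters
# (IDs 5, 197, 198) whose raw value is nonzero.
flag_sectors() {
    awk '$1 ~ /^(5|197|198)$/ && $10 + 0 > 0 { print $1, $2, $10 }'
}

# Typical use (needs root; /dev/sdx is a placeholder device name):
#   smartctl -A /dev/sdx | flag_sectors
```

Fed the table above (without the quote markers), it prints only the line for attribute 197.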
>>> Is it okay to continue with this drive? I bought them a few months back;
>>> I am not in a position to change them unless I can RMA the unit.
>>>
>>>
>>>> James McKenzie
>>>>
>>> All suggestions welcome.
>>>
>> Is it possible to purge the SMART logs and reset
>> the counters, and then rerun the SMART tests?
>>
>>
> That should be possible. Any errors should be a good reason to send the
> drives back.
>
> James McKenzie
>
Of course. Be sure to zero out the drive if it contains
sensitive data or private intellectual property before
sending it for replacement.
dd if=/dev/zero of=/dev/sdx bs=256M
I use bs=256M to reduce the total number of
calls to write(2). If you have oodles of RAM,
then by all means use a larger value (keep it sane) :)
The kernel will break each write down into many buffers
and queue them up for I/O.
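
To put a rough number on that: the count of write(2) calls is about the device size divided by the block size, rounded up. A quick sketch, assuming dd issues one full-sized write per block; the function name write_calls and the 500 GB figure are only examples, not taken from the drive in this thread:

```shell
# write_calls SIZE_BYTES BS_BYTES
# Hypothetical helper: estimates how many write(2) calls dd would need,
# assuming one full write per block (ceiling division).
write_calls() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: a 500 GB (500 * 10^9 bytes) drive
write_calls 500000000000 512                      # dd's default bs: 976562500
write_calls 500000000000 $((256 * 1024 * 1024))   # bs=256M: 1863
```

For a real device, `blockdev --getsize64 /dev/sdx` should give the size in bytes to plug in as the first argument.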