Understanding SMART logs

Mon Aug 16 03:27:59 UTC 2010

  On 08/15/2010 08:14 PM, James McKenzie wrote:
> JD wrote:
>>    On 08/15/2010 06:44 PM, Suvayu Ali wrote:
>>
>>> On Sunday 15 August 2010 10:17 AM, James McKenzie wrote:
>>>
>>>> Got a good backup of this drive?  Looks like it needs to be retested in
>>>> a different machine and, if it fails, replaced.
>>>>
>>>> I had a drive that exhibited the same behavior and eventually, it failed.
>>>>
>>>>
>>> I downloaded the bootable ISO of the disk diagnostic suite from Western
>>> Digital and ran it. It claimed to detect and fix the errors. After the scan,
>>> the SMART logs read like this:
>>>
>>>
>>>> Vendor Specific SMART Attributes with Thresholds:
>>>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>>>   1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       1545
>>>>   3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       1066
>>>>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       42
>>>>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>>>>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>>>>   9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1426
>>>>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>>>>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>>>>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
>>>> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
>>>> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       20
>>>> 194 Temperature_Celsius     0x0022   109   107   000    Old_age   Always       -       41
>>>> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
>>>> 197 Current_Pending_Sector  0x0032   200   199   000    Old_age   Always       -       78
>>>> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
>>>> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
>>>> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
>>>>
>>>> SMART Error Log Version: 1
>>>> No Errors Logged
>>>>
>>>> SMART Self-test log structure revision number 1
>>>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>>>> # 1  Conveyance offline  Completed: read failure       90%      1422         1106820646
>>>> # 2  Extended offline    Completed: read failure       90%      1393         1106820646
>>>>
>>>>
>>> Is it okay to continue with this drive? I bought them a few months back,
>>> so I am not in a position to replace them unless I can RMA the unit.
>>>
>>>
>>>> James McKenzie
>>>>
>>> All suggestions welcome.
>>>
>> Is it possible to purge the SMART logs, reset
>> the counters, and then rerun the SMART tests?
>>
>>
> That should be possible.  Any errors should be a good reason to send the
> drives back.
>
> James McKenzie
>
Of course. Be sure to zero out the drive if it contains
sensitive data or private intellectual property before
sending it for replacement.

dd if=/dev/zero of=/dev/sdx bs=256M

I use 256M to reduce the total number of
calls to write(2). If you have oodles of RAM,
then by all means use a larger number (keep it sane) :)
The kernel will break each write into many buffers
and queue them up for I/O.
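To make the write(2) arithmetic concrete, here is a minimal sketch (not how dd itself is implemented) that zero-fills a target in fixed-size blocks with `os.write()`, a thin wrapper over write(2), and counts the calls. The helper names `zero_fill` and `expected_write_calls` are hypothetical, and the demo deliberately runs against a temporary regular file rather than a real device such as /dev/sdx. For a hypothetical 1 TB (10**12-byte) drive, dd's default 512-byte block size means about two billion write calls, while 256M (256 * 1024 * 1024 bytes) needs roughly 3,700.

```python
import os
import tempfile

MIB = 1024 * 1024  # dd's "M" suffix: 1024 * 1024 bytes


def expected_write_calls(total_bytes: int, block_size: int) -> int:
    """Blocks needed to cover total_bytes, counting a final partial block (ceiling division)."""
    return -(-total_bytes // block_size)


def zero_fill(path: str, total_bytes: int, block_size: int) -> int:
    """Hypothetical helper: overwrite the first total_bytes of path with zeros,
    issuing one os.write() per block. Returns the number of write calls made."""
    zeros = bytes(block_size)
    calls = 0
    fd = os.open(path, os.O_WRONLY)
    try:
        remaining = total_bytes
        while remaining > 0:
            # Full block, or a shorter slice for the tail.
            chunk = zeros if remaining >= block_size else zeros[:remaining]
            written = os.write(fd, chunk)
            calls += 1
            remaining -= written
        os.fsync(fd)  # flush to the underlying storage before returning
    finally:
        os.close(fd)
    return calls


if __name__ == "__main__":
    # Safe demo on a temporary file filled with random data.
    size = 10 * MIB + 123
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(size))
        path = f.name
    try:
        for bs in (512, 256 * 1024, 4 * MIB):
            print(f"bs={bs:>8}: {zero_fill(path, size, bs)} write calls")
        with open(path, "rb") as f:
            assert f.read() == bytes(size)  # every byte is now zero
    finally:
        os.unlink(path)

    # Assumed 1 TB drive: default 512-byte blocks vs. bs=256M.
    for bs in (512, 256 * MIB):
        print(f"bs={bs}: {expected_write_calls(10**12, bs)} write calls")
```

Larger blocks cut the per-call overhead, but each call needs a buffer of that size in memory, which is the trade-off behind "keep it sane".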