understanding smart logs

Robert Nichols rnicholsNOSPAM at comcast.net
Mon Aug 16 01:46:50 UTC 2010


On 08/15/2010 12:05 PM, Suvayu Ali wrote:
>> SMART Attributes Data Structure revision number: 16
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       1354
>>   3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       1158
>>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       40
>>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>>   9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1403
>>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
>> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
>> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       18
>> 194 Temperature_Celsius     0x0022   112   107   000    Old_age   Always       -       38
>> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
>> 197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       172
>> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
>> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
>> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
>>
>> SMART Error Log Version: 1
>> No Errors Logged
>>
>> SMART Self-test log structure revision number 1
>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>> # 1  Extended offline    Completed: read failure       90%      1393         1106820646

Your problem is the 172 sectors pending reallocation (attribute 197,
Current_Pending_Sector).  Those are sectors that are currently unreadable
and will be reallocated to spare sectors the next time they are written
(or simply reused if the write succeeds in place).  The catch is that the
drive has no
way to know whether the current contents are important (part of some
file, or file system metadata) or irrelevant (part of file system free
space), so the drive _must_ continue to return an error on any attempted
read of those sectors.
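
To pick the relevant numbers out of a report like the one quoted above, a
small parser is enough. This is only a sketch: the helper names
(parse_attributes, nonzero_watch) and the choice of which attributes to
watch are my own assumptions, and the sample text is just a few rows copied
from the quoted output.

```python
import re

# A few rows copied from the quoted report, mail quoting included.
SAMPLE = """\
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>> 197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       172
>> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

# One attribute row: id, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value (only the leading integer of the raw value is kept).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Assumption: these are the attributes worth flagging when non-zero.
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text):
    """Return {attribute_name: {...}} for every row that looks like an attribute."""
    attrs = {}
    for line in text.splitlines():
        line = line.lstrip("> ")  # drop mail quoting and leading blanks
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = {
                "id": int(m.group(1)),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "raw": int(m.group(9)),
            }
    return attrs


def nonzero_watch(attrs):
    """Return the watched attributes whose raw count is above zero."""
    return {n: attrs[n]["raw"] for n in WATCH if n in attrs and attrs[n]["raw"] > 0}


attrs = parse_attributes(SAMPLE)
flagged = nonzero_watch(attrs)
```

For the rows above, only Current_Pending_Sector ends up flagged; the
header line is skipped because it does not start with a number.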

The most straightforward way to recover is to back up all of the data
now on the drive while making note of any files that have read errors,
write zeros to the entire drive, then re-make the file system(s) and
restore the data, hopefully having some other source for any important
files that could not be read when backing up.
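
The "back up while making note of any files that have read errors" step
could be sketched roughly as below. The function name backup_tree is
hypothetical, and this is not a replacement for a real backup tool: it just
copies a tree and collects the files whose copy raised an OSError (an
unreadable sector normally surfaces as an I/O error on read).

```python
import os
import shutil


def backup_tree(src, dst):
    """Copy src into dst; return a list of (path, error) for files that failed."""
    failed = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            try:
                # copy2 also preserves timestamps and permission bits
                shutil.copy2(source, os.path.join(target_dir, name))
            except OSError as exc:
                # Keep going; the point is to end up with a complete list
                # of files that need another source.
                failed.append((source, str(exc)))
    return failed
```

The returned list is what you would check against other copies of your
data before wiping the drive.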

Trying to use a less ham-fisted approach gets complicated in a hurry.
You need to identify every file affected by a bad sector and re-write
it, then find all of the bad sectors that are now part of free space and
re-write those (filling up the file system with a huge all-zero file
would be one way), and then hope that there are no bad sectors that are
part of file system metadata or otherwise inaccessible via normal file
I/O.
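
The "huge all-zero file" trick might look something like this. It is a
sketch under assumptions: the function name, the scratch file name
"zero.fill", and the limit parameter (there only so the routine can be
tried without actually filling a disk) are all mine. It also only reaches
space that an ordinary file can occupy, so blocks reserved by the file
system and metadata areas are not touched.

```python
import errno
import os


def zero_fill_free_space(directory, chunk_size=1 << 20, limit=None):
    """Write zeros into a scratch file until the disk is full (or limit bytes),
    then delete the file. Returns the number of bytes written."""
    path = os.path.join(directory, "zero.fill")  # hypothetical scratch name
    chunk = bytes(chunk_size)
    written = 0
    try:
        # buffering=0 so each write goes straight to the OS and a full
        # disk shows up as a short write or ENOSPC right away.
        with open(path, "wb", buffering=0) as f:
            try:
                while limit is None or written < limit:
                    want = chunk_size if limit is None else min(chunk_size, limit - written)
                    written += f.write(chunk[:want])
            except OSError as exc:
                if exc.errno != errno.ENOSPC:
                    raise  # anything other than "disk full" is a real error
            os.fsync(f.fileno())  # push the zeros out to the drive
    finally:
        if os.path.exists(path):
            os.remove(path)
    return written
```

Run it with limit=None only on a file system that nothing else needs to
write to while it is working.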

If it were my drive I'd probably make an attempt at rewriting any
affected files I could find (using dd with the "conv=notrunc" option so
that the OS won't reallocate the space) and hope I could get lucky (all
of the errors bunched in a few files that I could recover elsewhere or
simply overwrite with zeros and delete).  In the end, I'd probably waste
more time than the simplistic approach would take, and with less
assurance of success.
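
For illustration, the idea behind dd with "conv=notrunc" (overwrite the
existing bytes of a file without truncating it first) can be sketched in
Python by opening the file in "r+b" mode. The function name zero_in_place
is hypothetical. Whether the file system really writes to the same
physical blocks is an assumption that holds for traditional in-place file
systems but not for copy-on-write ones.

```python
import os


def zero_in_place(path, chunk_size=1 << 20):
    """Overwrite every byte of an existing file with zeros, keeping its size."""
    size = os.path.getsize(path)
    chunk = bytes(chunk_size)
    # "r+b" opens for update without truncating, so the file keeps its
    # length and (on an in-place file system) its allocated blocks.
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the writes actually reach the drive
    return size
```

After this the file's contents are gone, so it only makes sense for files
you can restore from elsewhere or intend to delete anyway.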

-- 
Bob Nichols     "NOSPAM" is really part of my email address.
                 Do NOT delete it.


