On 15 May 2020, at 11:53, Patrick O'Callaghan <pocallaghan@gmail.com> wrote:

However gsmartcontrol reports that one of the HDDs has internal errors.
Would it be best to correct these using mdadm (assuming they can be
corrected), and if so, how? Or should I do an offline copy with the
docking station's "clone" button?


Typically, once this starts to happen, you will see the drive fail completely.

There are 4 SMART values that are of interest:

  1 Raw_Read_Error_Rate     0x000f   072   063   044    Pre-fail  Always       -       14974679
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   029   019   000    Old_age   Always       -       14974679

and I think it's this one:

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

If the raw counts on (1) and (195) are the same, you are good.

If (5) is not 0, that means the drive is replacing bad blocks. Watch the value;
if it rises over time, back up the drive and replace it.

And you may well be seeing (198) rise, in which case you are losing data.
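
If you want to check these automatically, the rules of thumb above could be
sketched roughly like this in Python. This is only an illustration: the
function names (parse_raw_values, assess) are made up here, and it assumes
the attribute table has the same ten-column layout as the lines quoted
above, with the raw value in the last column.

```python
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   072   063   044    Pre-fail  Always       -       14974679
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   029   019   000    Old_age   Always       -       14974679
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def parse_raw_values(text):
    """Map attribute ID -> raw value for rows shaped like the table above."""
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: 10 columns, numeric ID first, raw value tenth.
        # Rows whose raw value is not a plain integer are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            raw[int(fields[0])] = int(fields[9])
    return raw


def assess(raw):
    """Return a list of warnings based on the rules of thumb in this mail."""
    notes = []
    # Rule for (1) and (195): equal raw counts are considered fine.
    if 1 in raw and 195 in raw and raw[1] != raw[195]:
        notes.append("(1) and (195) differ: %d vs %d" % (raw[1], raw[195]))
    # Rule for (5): non-zero means the drive is replacing bad blocks.
    if raw.get(5, 0) > 0:
        notes.append("(5) is %d: drive is reallocating sectors" % raw[5])
    # Rule for (198): non-zero means data is being lost.
    if raw.get(198, 0) > 0:
        notes.append("(198) is %d: uncorrectable sectors" % raw[198])
    return notes


if __name__ == "__main__":
    for note in assess(parse_raw_values(SAMPLE)) or ["nothing alarming"]:
        print(note)
```

In practice the text would come from "smartctl -A" on the drive rather than
from SAMPLE, and since it is the trend in (5) and (198) that matters, you
would want to run it periodically and compare the results.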

The other thing to look for is uncorrectable disk accesses that you can see
in dmesg (I think UNC is in the message as well).

Once you get these you have lost data and you are likely to lose the whole drive.
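
The dmesg check could be sketched along these lines, again in Python. As
said, I am not certain of the exact message wording, so this simply assumes
the token UNC appears as a separate word in the relevant kernel lines; the
function names are made up for the example.

```python
import re
import subprocess

# Assumption: uncorrectable-read reports contain "UNC" as a standalone,
# upper-case token. Adjust the pattern if your kernel words it differently.
UNC_PATTERN = re.compile(r"\bUNC\b")


def find_unc_lines(log_text):
    """Return the lines of log_text that mention UNC as a whole word."""
    return [line for line in log_text.splitlines() if UNC_PATTERN.search(line)]


def read_dmesg():
    """Capture the kernel log; may need root on some systems."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True)
    return result.stdout
```

Calling find_unc_lines(read_dmesg()) would then give the suspicious lines,
if there are any. An empty list only means nothing matched the pattern, not
that the drive is healthy.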

Also note that on a desktop motherboard a SMART error can prevent you from
booting into the OS. Server motherboards do not do this.

Barry