So, SMART reports it has N pending sectors. This must mean it knows exactly which sectors those are, but nothing in the SMART interface is willing to tell you what sectors it is talking about?
You could maybe correlate them with the filesystem structures and find out what files they might be affecting (or if they are just in free space), but that's not useful information for SMART to report?
Please tell me I'm the one who is an idiot and I've just overlooked the obvious here :-).
I have never found a way to get SMART to report which specific sectors are pending.
You can run a long self-test with smartctl -t long against the device, and generally it will stop when it hits that sector.
Also, the errors are usually found by Linux doing a read against the drive, so there should be error messages for those reads in the messages file from when it happened; that is usually what I use to determine which sectors are getting the error.
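For example, something along these lines (untested off the top of my head; substitute your real device for /dev/sdX, and the exact kernel error strings vary a bit between kernel versions):

# smartctl -t long /dev/sdX                # start the extended self-test; it runs inside the drive
# smartctl -l selftest /dev/sdX            # once it finishes, LBA_of_first_error has the sector
# grep -i -e 'end_request' -e 'media error' /var/log/messages   # kernel read errors logged at the time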
On Sat, Mar 14, 2015 at 4:48 PM, Tom Horsley horsley1953@gmail.com wrote:
So, SMART reports it has N pending sectors. This must mean it knows exactly which sectors those are, but nothing in the SMART interface is willing to tell you what sectors it is talking about?
You could maybe correlate them with the filesystem structures and find out what files they might be affecting (or if they are just in free space), but that's not useful information for SMART to report?
Please tell me I'm the one who is an idiot and I've just overlooked the obvious here :-).
On Sat, 14 Mar 2015 16:53:15 -0500 Roger Heflin wrote:
Also, the errors are usually found by Linux doing a read against the drive, so there should be error messages for those reads in the messages file from when it happened; that is usually what I use to determine which sectors are getting the error.
Yeah, I poked around in the logs and the very first thing that looks like any kind of error is the SMART message showing up for the first time (and repeating every 30 minutes since then in an attempt to fill up the logs :-).
That would imply the disk itself found the errors on one of its scans.
You could do a "dd if=/dev/sdx of=/dev/null conv=noerror bs=1M"; with conv=noerror the dd will continue on when it hits the error, and you will get the list of bad sectors in the messages file. You would have to use debugfs or something similar to find the specific stuff in that sector.
On Sat, Mar 14, 2015 at 5:09 PM, Tom Horsley horsley1953@gmail.com wrote:
On Sat, 14 Mar 2015 16:53:15 -0500 Roger Heflin wrote:
Also, the errors are usually found by Linux doing a read against the drive, so there should be error messages for those reads in the messages file from when it happened; that is usually what I use to determine which sectors are getting the error.
Yeah, I poked around in the logs and the very first thing that looks like any kind of error is the SMART message showing up for the first time (and repeating every 30 minutes since then in an attempt to fill up the logs :-).
On Sat, Mar 14, 2015 at 3:48 PM, Tom Horsley horsley1953@gmail.com wrote:
So, SMART reports it has N pending sectors. This must mean it knows exactly which sectors those are, but nothing in the SMART interface is willing to tell you what sectors it is talking about?
If there's a definite latent sector error, this shows up with a 'smartctl -t long', which will be aborted at the first error found. The LBA for this shows up under LBA_of_first_error.
The other way it will show up is in dmesg: libata will report the read error with the affected LBA. There's some chance of the drive attempting long recoveries, in which case the kernel SCSI/ATA command timer times out and does a link reset, which stops the recovery and prevents finding out which sector is affected.
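If you suspect that's what's happening, one thing you can try (just a sketch; sdX is a placeholder, and the change doesn't survive a reboot) is raising the kernel's command timeout before doing the read, so the drive gets to finish its recovery and report the failing LBA instead of getting reset:

# cat /sys/block/sdX/device/timeout        # default is 30 seconds
# echo 120 > /sys/block/sdX/device/timeout # give the drive up to 2 minutes per command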
You could maybe correlate them with the filesystem structures and find out what files they might be affecting (or if they are just in free space), but that's not useful information for SMART to report?
Please tell me I'm the one who is an idiot and I've just overlooked the obvious here :-).
No, it requires esoteric knowledge. It's completely non-obvious and non-discoverable. Fedora actually has smartd running by default and it'll report some things into the journal; if it's minimally configured it can do smartctl -t long on a schedule and report more things.
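By "minimally configured" I mean something like this line in /etc/smartd.conf (adapted from the example in the smartd.conf man page, so treat it as a starting point rather than a recommendation):

DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m root

That monitors all devices and attributes, enables the drive's automatic offline testing and attribute autosave, runs a short self-test every day at 2am and a long one every Saturday at 3am, and mails warnings to root. Then restart smartd (systemctl restart smartd) to pick it up.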
On Sat, Mar 14, 2015 at 4:09 PM, Tom Horsley horsley1953@gmail.com wrote:
On Sat, 14 Mar 2015 16:53:15 -0500 Roger Heflin wrote:
Also, the errors are usually found by Linux doing a read against the drive, so there should be error messages for those reads in the messages file from when it happened; that is usually what I use to determine which sectors are getting the error.
Yeah, I poked around in the logs and the very first thing that looks like any kind of error is the SMART message showing up for the first time (and repeating every 30 minutes since then in an attempt to fill up the logs :-).
I'd say the first step is to confirm this is due to a media error rather than something else, otherwise you end up down a rat hole.
The top post here is a good example of a URE due to media error. http://ubuntuforums.org/archive/index.php/t-1034762.html
If the drive is attempting a recovery longer than 30 seconds, you'll get errors along these lines (this is a write example, which is really bad; the read version is more common).
[ 2161.457698] ata8.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x6 frozen
[ 2161.457709] ata8.00: failed command: WRITE FPDMA QUEUED
[ 2161.457718] ata8.00: cmd 61/00:00:80:c4:2c/02:00:1e:00:00/40 tag 0 ncq 262144 out
               res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2161.457723] ata8.00: status: { DRDY }
...
[ 5628.308982] ata8.00: failed command: WRITE FPDMA QUEUED
[ 5628.308990] ata8.00: cmd 61/80:50:80:34:44/01:00:50:00:00/40 tag 10 ncq 196608 out
               res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 5628.308993] ata8.00: status: { DRDY }
[ 5628.309000] ata8: hard resetting link
[ 5638.311674] ata8: softreset failed (1st FIS failed)
[ 5638.311686] ata8: hard resetting link
This is a how to on what to do about bad sectors, including partial recovery. http://www.smartmontools.org/browser/trunk/www/badblockhowto.xml
But the tl;dr for all of that, in my opinion, is to update your backups, and then obliterate the drive with writes. Only on a write does the firmware determine if sector problems are transient or persistent. If it's a persistent problem, then the LBA is reassigned to a reserve sector. Once this is all done, then you can restore from backups.
To do the write correctly, first you have to know if you have a 512n or 512e drive. Most drives these days are 512e, i.e. 512-byte logical, 4096-byte physical sectors. The LBA error is for the first logical sector in the bad physical sector, so writing over just that 512-byte sector will not work (it'll fail as a read error even though you're writing, due to a read-modify-write attempt by the drive firmware). 'parted -l' will tell you what type of drive you have.
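For example (hypothetical output for a 512e drive; the device letter is a placeholder):

# parted /dev/sdX print | grep 'Sector size'
Sector size (logical/physical): 512B/4096B
# cat /sys/block/sdX/queue/logical_block_size /sys/block/sdX/queue/physical_block_size
512
4096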
What I suggest is this:
# badblocks -b 4096 -svw /dev/sdX
This is destructive! Note that any block numbers reported by badblocks are predicated on the -b value, so the reported value isn't a sector LBA; you have to multiply by 8 to get an LBA. But after this cycles through even once, the problem should be resolved. You could let it run through all the passes (it writes and then reads back four test patterns). What ought to be true is that you either get no errors (meaning the read errors weren't media errors, just bad data, like from torn writes or something), or you get some write errors with reallocations on the first pass and no errors on subsequent passes. If any subsequent passes have errors, especially corruption errors, then get rid of the drive, or turn it into a plaything, or send it to me :-D
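As a made-up example of the conversion: if badblocks -b 4096 reports block 1146961 as bad, the first 512-byte LBA of that physical sector is 1146961 * 8:

# echo $((1146961 * 8))
9175688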
On Sat, 14 Mar 2015 17:13:15 -0600 Chris Murphy wrote:
The top post here is a good example of a URE due to media error. http://ubuntuforums.org/archive/index.php/t-1034762.html
Yep, that's the sort of thing I was looking for in the logs, but there are no ata yadda-yadda complaints anywhere, just the normal boot-time ata messages when it is initializing the device. The only errors I see are the sudden appearance of the SMART messages.
On Sat, Mar 14, 2015 at 5:29 PM, Tom Horsley horsley1953@gmail.com wrote:
On Sat, 14 Mar 2015 17:13:15 -0600 Chris Murphy wrote:
The top post here is a good example of a URE due to media error. http://ubuntuforums.org/archive/index.php/t-1034762.html
Yep, that's the sort of thing I was looking for in the logs, but there are no ata yadda-yadda complaints anywhere, just the normal boot-time ata messages when it is initializing the device. The only errors I see are the sudden appearance of the SMART messages.
Can you post them?
It's possible actual read errors happened a while ago, in which case dmesg won't have the error messages but either the journal or /var/log/messages will.
On Sat, 14 Mar 2015 16:42:37 -0600 Chris Murphy wrote:
If there's a definite latent sector error, this shows up with a 'smartctl -t long', which will be aborted at the first error found. The LBA for this shows up under LBA_of_first_error.
I actually ran one of those when I first started seeing the messages (I've got another going now), and the prev test results were:
# 2 Extended offline Completed without error 00% 17259 -
So that lonely '-' out there apparently says there is no LBA with an error, and the overall health assessment says PASSED, yet these have been showing up every half hour or so for a week now:
Mar 14 19:46:52 zooty smartd[812]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Mar 14 19:46:52 zooty smartd[812]: Device: /dev/sdc [SAT], 8 Offline uncorrectable sectors
It seems to be telling me there is nothing wrong and something wrong at the same time. I'd probably just be happy with the "PASSED" health check if it wasn't constantly spewing these messages :-).
On 03/14/2015 04:56 PM, Tom Horsley wrote:
It seems to be telling me there is nothing wrong and something wrong at the same time.
It looks to me as though it's not telling you that the drive is perfect, but that the number of errors is small enough that there's no reason to worry. As long as the drive keeps working and the number of bad sectors doesn't increase, you don't need to do anything unusual. (Making regular backups shouldn't be considered unusual.)
On Sat, Mar 14, 2015 at 5:56 PM, Tom Horsley horsley1953@gmail.com wrote:
On Sat, 14 Mar 2015 16:42:37 -0600 Chris Murphy wrote:
If there's a definite latent sector error, this shows up with a 'smartctl -t long', which will be aborted at the first error found. The LBA for this shows up under LBA_of_first_error.
I actually ran one of those when I first started seeing the messages (I've got another going now), and the prev test results were:
# 2 Extended offline Completed without error 00% 17259 -
So that lonely '-' out there apparently says there is no LBA with an error, and the overall health assessment says PASSED, yet these have been showing up every half hour or so for a week now:
Mar 14 19:46:52 zooty smartd[812]: Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
Mar 14 19:46:52 zooty smartd[812]: Device: /dev/sdc [SAT], 8 Offline uncorrectable sectors
This is consistent with a single sector on a 512e AF drive. If it's unreadable, somewhere in the journal or messages is a read error or link reset. You could search for "media error" and "hard resetting link".
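Something like this should turn them up if they're there (the exact strings vary a little by kernel, and journalctl only covers boots that were logged to a persistent journal):

# journalctl -k | grep -i -e 'media error' -e 'hard resetting link'
# grep -i -e 'media error' -e 'hard resetting link' /var/log/messages*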
What do you get for:
# smartctl -x /dev/sdc
# parted /dev/sdc u s p
It seems to be telling me there is nothing wrong and something wrong at the same time. I'd probably just be happy with the "PASSED" health check if it wasn't constantly spewing these messages :-).
A valid option is to keep the backups current and ignore it until the number goes up again.
Another option is a non-destructive badblocks (omit the w) with -b 4096, and see if you can trigger a read error. A libata error will be a proper LBA. A badblocks error will need to be multiplied by 8 to get an LBA. This value then gets plugged into debugfs (this is ext4?) to find out what file is affected. And then it also gets plugged into a dd if=/dev/zero of=/dev/sdc bs=4096 seek=$((LBA/8)) count=1 to write over that sector - that'll fix this.
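Roughly this sequence, with BLOCK, LBA, PARTSTART and INODE as placeholders you fill in as you go (it assumes ext4 with a 4096-byte filesystem block size on /dev/sdc1, and PARTSTART is that partition's starting sector as reported by parted; adjust to your layout):

# badblocks -b 4096 -sv /dev/sdc                               # read-only scan, reports 4096-byte block numbers
# echo $(( BLOCK * 8 ))                                        # convert a reported block to a 512-byte LBA
# debugfs -R "icheck $(( (LBA - PARTSTART) / 8 ))" /dev/sdc1   # filesystem block -> inode number
# debugfs -R "ncheck INODE" /dev/sdc1                          # inode -> file name
# dd if=/dev/zero of=/dev/sdc bs=4096 seek=$(( LBA / 8 )) count=1   # overwrite just that physical sector (destroys its contents)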
On Sat, 14 Mar 2015 19:33:13 -0600 Chris Murphy wrote:
This is consistent with a single sector on a 512e AF drive. If it's unreadable, somewhere in the journal or messages is a read error or link reset. You could search for "media error" and "hard resetting link".
The logs go back to April of last year and there isn't a single instance of either of those (and the messages only started a week or so ago).
What do you get for:
# smartctl -x /dev/sdc
# parted /dev/sdc u s p
It's a little long, so I uploaded it here:
https://drive.google.com/file/d/0B7pVI_DKcKbySURleHpJSXdpZWs/view?usp=sharin...
There isn't anything vitally important on this drive, but I have lots of space on my new USB3 backup drive so I'm doing an rsync of the stuff it would be inconvenient to lose now (maybe that will trigger an I/O error somewhere).
On Sat, Mar 14, 2015 at 7:56 PM, Tom Horsley horsley1953@gmail.com wrote:
https://drive.google.com/file/d/0B7pVI_DKcKbySURleHpJSXdpZWs/view?usp=sharin...
Sector Sizes: 512 bytes logical, 4096 bytes physical
It's a 512e AF drive. Whether using dd or badblocks to do this, the block size needs to be 4096 (bytes) to write to the full physical sector. dd defaults to 512 bytes, and badblocks to 1024 bytes, neither of which will work correctly on this drive.
198 Offline_Uncorrectable ----C- 100 100 000 - 8
Weird. Because of this, I'd expect an extended offline test to fail and report the first affected LBA. Thanks SMART.
There isn't anything vitally important on this drive, but I have lots of space on my new USB3 backup drive so I'm doing an rsync of the stuff it would be inconvenient to lose now (maybe that will trigger an I/O error somewhere).
If nothing is triggered with the backup, try a non-destructive badblocks or dd read of the drive. Any error reported by either of those that's not triggered by the backup is probably safe to just write over. Just make sure to get the block conversion right.
On Sat, Mar 14, 2015 at 09:56:46PM -0400, Tom Horsley wrote:
There isn't anything vitally important on this drive, but I have lots of space on my new USB3 backup drive so I'm doing an rsync of the stuff it would be inconvenient to lose now (maybe that will trigger an I/O error somewhere).
I apologize if this has been mentioned earlier--I just stuck my nose in on the thread, and if so, expect it to be chopped off.
But by the time you actually see block failures on a drive, you're already in trouble. Internally, the drive will detect bad blocks in operation, mark them bad, and reallocate from its reserve pool. It only allows errors to be seen by the host OS when it can't do this--meaning it's had enough bad blocks accumulating to exhaust its pool.
Drives be cheap; data be expensive. I'd just get everything off this drive and deep six it.
Cheers, -- Dave Ihnat dihnat@dminet.com
On Sun, 15 Mar 2015 08:33:06 -0500 Dave Ihnat wrote:
It only allows errors to be seen by the host OS when it can't do this--meaning it's had enough bad blocks accumulating to exhaust its pool.
The host, in fact, hasn't seen an error. There is no trace of any I/O error reports in logs going back a year. Absolutely the only error I'm seeing is smart itself reporting 8 pending sectors over and over again, yet a long selftest doesn't find a single bad LBA.
At this point I suspect confused firmware in the SMART department and nothing at all actually wrong with the disk :-). (My Crucial SSD was another one that had a SMART firmware bug: it stopped working right after 5000-some-odd hours of operation, not because anything was wrong, but because the SMART firmware was busted. Fortunately a firmware update fixed it.)
On 03/14/2015 08:56 PM, Tom Horsley wrote:
On Sat, 14 Mar 2015 19:33:13 -0600 Chris Murphy wrote:
What do you get for:
# smartctl -x /dev/sdc
# parted /dev/sdc u s p
It's a little long, so I uploaded it here:
https://drive.google.com/file/d/0B7pVI_DKcKbySURleHpJSXdpZWs/view?usp=sharin...
One thing I noticed in there is:
193 Load_Cycle_Count -O--CK 001 001 000 - 747412
That drive is absolutely _killing_ itself by unloading the heads every 90 seconds or so (17421/747412 = .0233 hours/cycle). This probably isn't related to the problem you're seeing, but you should look into what timeout setting is causing that. It's hurting performance, too.
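The usual knob for that is the drive's APM level, which you can look at and (if the drive honors it) relax with hdparm. This is a sketch only: not every drive implements APM, and on many the setting doesn't survive a power cycle, so it may need to be reapplied at boot.

# hdparm -B /dev/sdc       # show the current APM level
# hdparm -B 254 /dev/sdc   # 128-254 disallow spin-down (and usually stop the aggressive unloading); 255 turns APM off entirely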
On Sun, Mar 15, 2015 at 8:54 AM, Robert Nichols rnicholsNOSPAM@comcast.net wrote:
One thing I noticed in there is:
193 Load_Cycle_Count -O--CK 001 001 000 - 747412
That drive is absolutely _killing_ itself by unloading the heads every 90 seconds or so (17421/747412 = .0233 hours/cycle). This probably isn't related to the problem you're seeing, but you should look into what timeout setting is causing that. It's hurting performance, too.
While it seems pathological, I'd leave it alone if the drive is being used for the proper workload it was designed for. The less time the heads are flying over platter surface, the better. Even though this attribute value is 001 and the threshold is 000, it's not a pre-fail attribute, just an age attribute. It's probably instigated at least as much by something that's fsyncing every ~90 seconds like the journal or rsyslog.
On 03/15/2015 10:23 AM, Chris Murphy wrote:
On Sun, Mar 15, 2015 at 8:54 AM, Robert Nichols rnicholsNOSPAM@comcast.net wrote:
One thing I noticed in there is:
193 Load_Cycle_Count -O--CK 001 001 000 - 747412
That drive is absolutely _killing_ itself by unloading the heads every 90 seconds or so (17421/747412 = .0233 hours/cycle). This probably isn't related to the problem you're seeing, but you should look into what timeout setting is causing that. It's hurting performance, too.
While it seems pathological, I'd leave it alone if the drive is being used for the proper workload it was designed for. The less time the heads are flying over platter surface, the better. Even though this attribute value is 001 and the threshold is 000, it's not a pre-fail attribute, just an age attribute. It's probably instigated at least as much by something that's fsyncing every ~90 seconds like the journal or rsyslog.
At that rate, in about 5 days smartd will start reporting "FAILING NOW" for that attribute. The performance impact of having to wait for the heads to reload every 90 seconds should be noticeable. The only advantage of _not_ having the heads flying over the platter surface is about 1 Watt decrease in idle power. Seagate specs the idle power at 4W, with a note "5W with DIPLM enabled" (whatever "DIPLM" is -- I can find no information on that, or how to enable it).
Allegedly, on or about 15 March 2015, Robert Nichols sent:
One thing I noticed in there is:
193 Load_Cycle_Count -O--CK 001 001 000 - 747412
That drive is absolutely _killing_ itself by unloading the heads every 90 seconds or so (17421/747412 = .0233 hours/cycle). This probably isn't related to the problem you're seeing, but you should look into what timeout setting is causing that. It's hurting performance, too.
I noticed that kind of thing on my laptop, years ago. Soon there were reports in the online media about laptop drives aging prematurely from that behaviour.
It strikes me that if they want to park the heads all the time, as some sort of drive saving feature, it ought to be done less violently, so it's not so wearing. The continual clunking is annoying on the ears, too.
For that matter, spinning down the disc isn't too nice either. It makes for a tedious wait when you want to do something, but the drive has gone to sleep.
On Sun, Mar 15, 2015 at 9:40 AM, Robert Nichols rnicholsNOSPAM@comcast.net wrote:
On 03/15/2015 10:23 AM, Chris Murphy wrote:
On Sun, Mar 15, 2015 at 8:54 AM, Robert Nichols rnicholsNOSPAM@comcast.net wrote:
One thing I noticed in there is:
193 Load_Cycle_Count -O--CK 001 001 000 - 747412
That drive is absolutely _killing_ itself by unloading the heads every 90 seconds or so (17421/747412 = .0233 hours/cycle). This probably isn't related to the problem you're seeing, but you should look into what timeout setting is causing that. It's hurting performance, too.
While it seems pathological, I'd leave it alone if the drive is being used for the proper workload it was designed for. The less time the heads are flying over platter surface, the better. Even though this attribute value is 001 and the threshold is 000, it's not a pre-fail attribute, just an age attribute. It's probably instigated at least as much by something that's fsyncing every ~90 seconds like the journal or rsyslog.
At that rate, in about 5 days smartd will start reporting "FAILING NOW" for that attribute.
No, it won't. That attribute type is old_age, not pre-fail. You never get FAILING NOW reports for old_age attributes.
The performance impact of having to wait for the heads to reload every 90 seconds should be noticeable.
Doubtful. The actuator can move from parked to the far side of the platter in what, 1/10th of a second? Faster?
The only advantage of _not_ having the heads flying over the platter surface is about 1 Watt decrease in idle power.
That's not true; there's a much higher likelihood the head hits particles on the surface of the platter, or even touches the surface itself. It can't ever do that if it's parked. You basically have to assume the engineers are morons to second-guess this behavior.
On 03/14/2015 06:56 PM, Tom Horsley wrote:
On Sat, 14 Mar 2015 16:42:37 -0600 Chris Murphy wrote:
If there's a definite latent sector error, this shows up with a 'smartctl -t long', which will be aborted at the first error found. The LBA for this shows up under LBA_of_first_error.
I actually ran one of those when I first started seeing the messages (I've got another going now), and the prev test results were:
# 2 Extended offline Completed without error 00% 17259 -
So that lonely '-' out there apparently says there is no LBA with an error, and the overall health assessment says PASSED, yet these have been showing up every half hour or so for a week now:
I may be imagining things, but ISTR reading about a smartd configuration option for drives that don't clear their pending remap count when they remap a sector. It may be that you have such a drive.
(Or it may be the drugs.)
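If it's the option I'm thinking of, it's the -C and -U directives in smartd.conf: appending '+' to the attribute ID makes smartd complain only when the pending/uncorrectable count increases, rather than every time it's non-zero. Something like this (untested, and /dev/sdc is just this thread's device):

/dev/sdc -a -C 197+ -U 198+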
I have seen my Seagate 3TB drive not clear the pending count; I'm not sure of the exact set of conditions for it to not clear. After doing something (booting, running a long test, or something) it finally cleared the pending sectors.
On Sun, Mar 15, 2015 at 7:24 PM, Ian Pilcher arequipeno@gmail.com wrote:
On 03/14/2015 06:56 PM, Tom Horsley wrote:
On Sat, 14 Mar 2015 16:42:37 -0600 Chris Murphy wrote:
If there's a definite latent sector error, this shows up with a 'smartctl -t long', which will be aborted at the first error found. The LBA for this shows up under LBA_of_first_error.
I actually ran one of those when I first started seeing the messages (I've got another going now), and the prev test results were:
# 2 Extended offline Completed without error 00% 17259 -
So that lonely '-' out there apparently says there is no LBA with an error, and the overall health assessment says PASSED, yet these have been showing up every half hour or so for a week now:
I may be imagining things, but ISTR reading about a smartd configuration option for drives that don't clear their pending remap count when they remap a sector. It may be that you have such a drive.
(Or it may be the drugs.)
--
Ian Pilcher arequipeno@gmail.com
-------- "I grew up before Mark Zuckerberg invented friendship" --------
On Sun, Mar 15, 2015 at 6:24 PM, Ian Pilcher arequipeno@gmail.com wrote:
(Or it may be the drugs.)
My Samsung 840 EVO is on drugs.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033 100   100   010    Pre-fail Always  -           0
  9 Power_On_Hours          0x0032 099   099   000    Old_age  Always  -           1801
[...snip...]
SMART Self-test log structure revision number 1
Num Test_Description    Status                  Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline       Completed without error 00%       6               -
# 2 Extended offline    Completed without error 00%       5               -
# 3 Short offline       Completed without error 00%       19              -
# 4 Extended offline    Completed without error 00%       11              -
# 5 Short offline       Completed without error 00%       0               -
Yes, that's right: its most recent extended and short offline tests (last night) consider the drive to be 5 and 6 hours old, lifetime. Yet power-on hours is 1800+. That's funny. If I could lie about my lifetime age on every test, that would be awesome. How old are you? Today, hmm, I'm feeling 6 years old, that work for you? No? OK, 19? That's age of majority at least. Oh, you don't like that either, OK fine, back to 6 then.
On Sat, 14 Mar 2015 21:56:46 -0400 Tom Horsley wrote:
# smartctl -x /dev/sdc
# parted /dev/sdc u s p
It's a little long, so I uploaded it here:
https://drive.google.com/file/d/0B7pVI_DKcKbySURleHpJSXdpZWs/view?usp=sharin...
Time has passed. I had been thinking about making a RAID1 array for the data on this disk anyway, so I went ahead and got a couple of new disks and did that. I copied everything off the disk with nary an I/O error appearing, and after copying everything off it, it still said 8 pending sectors. I guess that means no data in any file or directory was associated with the bad sectors.
That left me free to experiment with this silly disk, so I started copying /dev/zero to it, and several hours into the zeroing I got a log message about pending going back to zero. I figured that meant it had finally reallocated the blocks, but when I look at smartctl -a, I see this:
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
apparently there are no reallocated sectors, but the pending and offline numbers have gone back to zero:
197 Current_Pending_Sector  0x0012 100 100 000 Old_age Always  - 0
198 Offline_Uncorrectable   0x0010 100 100 000 Old_age Offline - 0
the one number that looks like some kind of error is
183 Runtime_Bad_Block 0x0032 099 099 000 Old_age Always - 1
but it said that before as well.
So, mostly I guess SMART is still confusing me, but maybe I can use this disk for something non-critical now without filling the logs with pending and offline error messages. (4TB of swap space maybe :-).