Recovering a failed (SSD) hard drive. Unknown partition type.

jdow jdow at earthlink.net
Wed Dec 28 08:20:37 UTC 2011


(Look inline)
On 2011/12/27 19:26, linux guy wrote:
> On Tue, Dec 27, 2011 at 8:15 PM, Sam Varshavchik <mrsam at courier-mta.com> wrote:
>
>> Yes, I'm sure it's fine now.
>
> Please clarify: what do you think is fine now?  The drive?  Or the
> laptop/drive controller?
>
>> Looks to me like some pages went bad, and the
>> drive mapped them out and replaced with some spare pages held in reserve.
>> Unfortunately, the bad pages were mapped to the initial sectors that held
>> the partition table and the bootloader.
>
> OK.
>
>> The partition table is stored only in one place. There's no backup copy of
>> it.
>
> Looking forward, should one be backing up the PT?  Would that have
> helped in this situation?

I have been known to do that. And some systems do this in point of fact.
If there is "unpartitioned space" at the end of the drive you can use that
to really good advantage. "dd" is your good buddy there. Otherwise you may
have to hunt for some otherwise wasted space, use it, and pray it stays
unused. I've done both. But, then, I have perpetrated uncounted sins with
some filesystems, mostly the Amiga file systems, of course.
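
For a plain MBR disk the whole partition table lives in the first 512-byte
sector, so the backup can be as small as one dd of that sector. Roughly
(the device and destination names here are just examples, adjust to taste):

  # save the MBR, which carries the partition table and the boot code
  dd if=/dev/sda of=/somewhere/safe/sda-mbr.bin bs=512 count=1

  # or dump the table in editable text form
  sfdisk -d /dev/sda > /somewhere/safe/sda-pt.txt

Restoring is the reverse: dd the saved sector back into place, or feed the
dump back in with "sfdisk /dev/sda < sda-pt.txt". A GPT disk already keeps
a second copy of its table at the end of the drive, but a dump stashed off
the disk entirely is still cheap insurance.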

>> I suppose that it's theoretically possible to scan the sectors looking
>> for something that looks like the start of an ext4 partition. It'll probably
>> be on sector 63 or 2048, and from the ext4 superblock figure out the
>> partition's size, then past it scan for the next partition's ext4
>> superblock, and be able to reconstruct the partition table that way. But I
>> don't know of any tool that would do that.
>
> OK.  Can anyone else chime in on this idea?

'Swhat I said in a prior message.
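
For what it's worth, that is more or less how tools like testdisk and gpart
work: walk the disk looking for filesystem signatures and guess the partition
boundaries from what they find. A crude by-hand sketch of the idea, just to
show the mechanics (the device name is an example; the offsets come from the
ext2/3/4 superblock layout):

  # an ext superblock sits 1024 bytes into its partition and carries the
  # magic 0xEF53 (stored on disk as the bytes 53 ef) at offset 0x38
  for start in 63 2048; do
      magic=$(dd if=/dev/sda bs=1 skip=$(( start*512 + 1024 + 0x38 )) \
                 count=2 2>/dev/null | xxd -p)
      [ "$magic" = "53ef" ] && \
          echo "possible ext superblock in a partition starting at sector $start"
  done

A real reconstruction would also pull the block count and block size out of
the superblock to get the partition's length, then resume the hunt just past
its end. testdisk automates all of that and is far less error prone than
poking at the disk with dd.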

>> In this respect, I say that SSDs are no better than mechanical drives.
>
> I think I agree with you now !

Everything fails. SSDs usually have a maximum number of write cycles. I am
surprised it would take out block zero, though. That's not written very
often. I wonder how old the drive is.
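
If the drive will still answer queries, the SMART data can tell you. From
smartmontools, something along the lines of

  smartctl -a /dev/sda

(device name is an example) reports power-on hours and the reallocated/spare
block counts, and most SSDs expose some vendor-specific wear indicator in
there as well, though the attribute names vary from maker to maker.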

>> For
>> peace of mind, nothing beats a pair of RAID-1 drives. One drive goes bad –
>> it's easy to swap it out without losing any data.
>
> I ordered a laptop to replace my old one last week.  It has an eSATA
> port... and hard drives are inexpensive these days...
>
> This whole experience has been a huge eye opener.

Backups are your very good buddies. If push comes to shove, use a live CD
and one of two or three external plug-in drives to take a disk image backup.
(Use at LEAST two drives so you can fall back to the previous image when a
disk image backup reports a bad read.) Another alternative is two or three
machines with REALLY big disks you can use for periodic backups of important
files. I *HATE* losing data. I sweat too much blood generating it.
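
From a live CD that usually comes down to something like this (the device
and mount point are examples only):

  # plain dd keeps going past read errors and pads them with zeros
  dd if=/dev/sda of=/mnt/external/sda.img bs=4M conv=noerror,sync

  # GNU ddrescue retries more intelligently and records what it could not
  # read in a map file, so a later pass can go back and try those spots again
  ddrescue /dev/sda /mnt/external/sda.img /mnt/external/sda.map

Either way, take the image while the drive is still mostly readable; a dying
disk rarely gets better with time.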

>> Plus, when you have one of those once-in-a-biennial events like the old
>> 100mb for /boot not being big enough any more, or grub growing too fat to
>> fit within the first 63 sectors, with RAID-1 you'll be able to sweat it out,
>> and survive, without having to wipe everything and fresh-install from
>> scratch.
>
> Right.  I need to put more effort into data backup, though to my
> credit, I do have regular backups, and ironically, doing a backup
> showed me how bad the problem was AND allowed me to get 69 of 70 GB of
> data off the drive before it failed entirely.

I think it is theoretically possible to do too many backups. I've had to
go back to my third backup at least once. With the RAID arrays here I've
been skimping on that, perhaps too much.

> The first sign of this problem was a very suspicious boot failure.  I
> immediately performed a backup upon experiencing that, wherein I found
> out how bad it really was.  Though my old laptop has had boot issues
> with Linux kernels for a long, long time.

It is theoretically possible a hardware problem bit you. On the other rock,
I mentioned that VERY VERY hard to pin down Adaptec SCSI adapter bug that
showed up when both it and the IDE controller were in use at the same time.
About one byte per megabyte was getting trashed utterly at random. It was
ACUTELY frustrating, and I was blaming it on Linux at the time. It might
have been somebody here who suggested the card might be the source of the
bug. (Once I updated that card's firmware image, the Linux system running
on it racked up something like 430+ days of continuous uptime with no
problems - through two system moves. I moved it live along with its UPS.
"It was a challenge.")

{^_^}

