Too many hard drives failing

Albert Graham agraham at g-b.net
Fri Jan 26 05:56:43 UTC 2007


Pablo,

I wouldn't think this is Reiser/Fedora related. It could be BIOS related or
Seagate firmware related (e.g. if using RAID you may need the *exact* same
Seagate firmware installed).

Not sure if you have the option of turning NCQ off?

This could be a faulty batch of disks too; I would contact Seagate and
check. In my case, I'm one of those guys who believes in hardware RAID
for production machines, so that failures can be resolved by anyone
through a simple disk replacement.

> How did you work the 3Ware firmware thing?
I put 3ware under extreme pressure to produce a fix (patch) that allowed
me to revert their firmware to a previous version (not normally possible),
and that fixed it. 3ware recently rewrote their firmware code base, which
caused me lots of problems.


Albert.


Pablo Povarchik wrote:
> 	Albert, thanks a lot for your answer.
>
> 	The only log I can trace back now is
> Device: /dev/sda, ATA error count increased from 11605 to 11610
> because it's stored in our Request Tracker
>
> I also have some notes about
> ATA: abnormal status 0xD0 on port 0xE407
>
> Jan 26 06:12:29 ns kernel: ata2: command 0x25 timeout, stat 0xd0
> host_stat 0x21
> Jan 26 06:12:29 ns kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI
> SK/ASC/ASCQ 0xb/47/00
> Jan 26 06:12:29 ns kernel: ata2: status=0xd0 { Busy }
> Jan 26 06:12:29 ns kernel: SCSI disk error : host 3 channel 0 id 0 lun 0
> return code = 8000002
> Jan 26 06:12:29 ns kernel: Current sd08:10: sns = 70  b
> Jan 26 06:12:29 ns kernel: ASC=47 ASCQ= 0
> Jan 26 06:12:29 ns kernel: Raw sense data:0x70 0x00 0x0b 0x00 0x00 0x00
> 0x00 0x0a 0x00 0x00 0x00 0x00 0x47 0x00 0x00 0x00 0x00 0x00
> Jan 26 06:12:29 ns kernel:  I/O error: dev 08:10, sector 0
> Jan 26 06:12:29 ns kernel: ATA: abnormal status 0xD0 on port 0xE407
>
> This is taken from one of those broken disks that's still attached to the
> second port of one of the servers; I left it there only to try to figure
> this out.
>
> I tried replacing the cables, etc., but the disks are really broken.
> After replacing them everything works, apparently with no further errors.
>
> Yes, I can remember different error messages.
>
> And I can only wait for it to happen again, if you need more logs.
>
> How did you work the 3Ware firmware thing?
>
> Thanks a lot for the help
>
> Pablo
>
>
>
> On Fri, 2007-01-26 at 04:57 +0000, Albert Graham wrote:
>   
>> Hi Pablo,
>>
>> What kind of failures are these? Hardware, or disk corruption/software?
>>
>> I have about 40 SM servers (with SATA2, SG 500GB etc. running FC5), also
>> using Reiser 3. I also had failures, which I eventually traced to 3ware
>> controller firmware; however, I have not had any hardware failures.
>>
>>
>> Thanks.
>> Albert.
>>
>>
>>  
>> Pablo Povarchik wrote:
>>     
>>> Hello there
>>>
>>> I'm starting here because I really don't know which would be the best
>>> place to look for help. If this is not the correct list, please advise.
>>> And if you can recommend the right ML for this, please let me know.
>>>
>>> That said, let's get to the point:
>>>
>>> We have recently added 20 servers to our little farm, 7 of which have
>>> had hard failures on disks (SATA, Seagate, good brand new SuperMicro
>>> boxes).
>>>
>>> The fact is that these failures started coming up right after we decided
>>> to move to reiserfs.
>>>
>>> Can 7 out of 20 hard drives be defective (yes, of course, but what is
>>> the % probability of this)?
>>> Can this be related to reiserfs in any way?
>>>
>>> Sata2
>>> Seagate
>>> SuperMicro 
>>> Fedora core 5
>>>
>>> Any help will be more than appreciated
>>>
>>>
>>> Thanks a lot
>>>
>>>   
>>>       
>>     
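
For anyone reading the kernel log quoted above, the status byte and the raw
sense data can be decoded by hand. Below is a minimal sketch in Python,
assuming the standard ATA status register bit layout and the SCSI
fixed-format sense layout (sense key in byte 2, ASC in byte 12, ASCQ in
byte 13). The function and table names are made up for this illustration and
are not part of any kernel or smartmontools interface.

```python
# Sketch: decode the ATA status byte and fixed-format SCSI sense data
# copied from the quoted kernel log. All names here are illustrative only.

# ATA status register bits, most significant first.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]

# A few SCSI sense keys (not the full table).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# The 18 bytes printed after "Raw sense data:" in the log.
RAW_SENSE = [
    0x70, 0x00, 0x0B, 0x00, 0x00, 0x00,
    0x00, 0x0A, 0x00, 0x00, 0x00, 0x00,
    0x47, 0x00, 0x00, 0x00, 0x00, 0x00,
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


def decode_fixed_sense(raw):
    """Pull sense key, ASC and ASCQ out of fixed-format sense data."""
    if len(raw) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    response_code = raw[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = raw[2] & 0x0F
    return {
        "response_code": response_code,
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "unknown"),
        "asc": raw[12],
        "ascq": raw[13],
    }


if __name__ == "__main__":
    print(decode_ata_status(0xD0))
    print(decode_fixed_sense(RAW_SENSE))
```

Decoded this way, status 0xd0 has BSY, DRDY and DSC set, which matches the
"{ Busy }" line, and the sense data carries sense key 0xb (ABORTED COMMAND)
with ASC/ASCQ 0x47/0x00, matching the "SK/ASC/ASCQ 0xb/47/00" line. As far as
I know the libata translation layer uses that code as a generic stand-in for
a busy or timed-out device, so it is probably best not to read it too
literally.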

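On the original question of how likely it is for 7 out of 20 drives to be
defective: a rough estimate can be had from the binomial distribution, if one
assumes the drives fail independently with some fixed per-drive failure
probability. The sketch below uses 3% purely as a hypothetical figure (it is
not taken from Seagate or from this thread), and the function name is made up
for the example. It needs Python 3.8 or later for math.comb.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k failures among n drives."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # ASSUMPTION: 3% per-drive failure probability, drives fail independently.
    assumed_rate = 0.03
    chance = prob_at_least(7, 20, assumed_rate)
    print(f"P(at least 7 of 20 fail) = {chance:.2e}")
```

With any plausible per-drive rate the result is tiny, which is the point:
7 independent failures out of 20 is very unlikely, so a common cause (a bad
batch, firmware, controller, power or cooling) looks like the more reasonable
explanation than coincidence.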


