My main server has 4 WD 1 TB drives in RAID 5, and within its first 2 months
there were a number of errors on one drive.  Western Digital RMA'd it with
zero hassle (just fill out the web form), even shipped me the new one first.
With a new setup I was happy to swap it right away.

I need to dig into the potential kernel issue that Rob pointed out...


-----Original Message-----
From: Marc Skinner [mailto:marc at e-skinner.net] 
Sent: Sunday, May 30, 2010 2:24 PM
To: Jeff Jensen
Subject: Re: [tclug-list] ata "failed command: WRITE DMA", "ATA bus error"
messages

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have the same thing happening to 2 of my 3 1.5TB drives in my
workstation.  They are in a RAID5 setup, so I'm just hoping they don't
both die at the same time.  But yes, you're right, at some point if the
disk continues to fail and you get reallocated sector errors the OS will
mark the drive as bad and not use it.  I have been running mine with a
couple of reallocated sector errors for almost 2 years.  They may never
get any worse - and then again, they could fail tomorrow.  I have
Seagates, which have a 5 year warranty, so I'm fine with just waiting
for them to die.
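
For anyone who wants to keep an eye on that count over time, here is a
rough sketch rather than a tested tool.  It assumes smartmontools is
installed, that it is run as root, and that the drive reports the usual
Reallocated_Sector_Ct attribute with the raw value in the 10th column of
"smartctl -A" output.  The function names are made up for illustration.

```shell
#!/bin/sh
# Print the raw value (10th column) of the Reallocated_Sector_Ct
# attribute from `smartctl -A` output supplied on stdin.
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Hypothetical helper: for each device given, show the overall health
# line from `smartctl -H` and the current reallocated sector count.
check_disks() {
    for dev in "$@"; do
        echo "== $dev"
        smartctl -H "$dev" | grep -i 'health'
        echo "reallocated sectors: $(smartctl -A "$dev" | realloc_count)"
    done
}

# Only touch the disks when devices are passed on the command line,
# e.g.: sh check_disks.sh /dev/sda /dev/sdb
if [ "$#" -gt 0 ]; then
    check_disks "$@"
fi
```

Running it from cron and comparing the number against the previous run
would show whether the count is creeping up or holding steady.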

Good luck.

On 05/30/2010 01:40 PM, Jeff Jensen wrote:
> Thanks Marc.  sda & sdb say "passed".
> 
> I've used the "Disk Utility" and the self test of the disk completed OK.
> The details though show "Reallocated sector count" of 1.
> 
> The machine/drive seems to work fine, but I understand that the rise of
> reallocated sectors means it is on its way to failure.
> 
> 
> -----Original Message-----
> From: Marc Skinner [mailto:marc at e-skinner.net] 
> Sent: Sunday, May 30, 2010 1:04 PM
> To: TCLUG Mailing List
> Cc: Jeff Jensen
> Subject: Re: [tclug-list] ata "failed command: WRITE DMA", "ATA bus error"
> messages
> 
> What does smartctl -H /dev/sd* say (i.e. * would be a, b, or c)?  If
> it's /dev/hd*, do that as well.
> 
> That should give you the health of the disks.
> 
> 
> On 05/28/2010 08:19 AM, Jeff Jensen wrote:
>> Searching through the last boot log (trying to determine what is ata2), I
>> see:
> 
>> For the old 40G:
>> ata2.00: ATAPI: CRD-8400B, 1.04, max UDMA/33
>> ata2.00: configured for PIO4
>> ata2.00: device is on DMA blacklist, disabling DMA
>> ata2.01: 78165360 sectors, multi 16: LBA 
>> ata2.01: ATA-5: WDC WD400BB-00CLB0, 05.04E05, max UDMA/100
>> ata2.01: configured for UDMA/33
> 
>> For the new 1TB:
>> ata6.00: ATA-8: WDC WD1001FALS-00J7B1, 05.00K05, max UDMA/133
>> ata6.00: configured for UDMA/133
>> ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> ata6: SATA max UDMA/133 mmio m1048576 at 0xf4100000 port 0xf4128000 irq 5
> 
> 
>> It looks like ata1 & ata2 are the IDEs and ata3-ata6 are the new SATA card
>> (?).  So perhaps these messages are potential drive failure ones?
> 
> 
>> -----Original Message-----
>> From: Jeff Jensen [mailto:jjensen at apache.org] 
>> Sent: Friday, May 28, 2010 8:02 AM
>> To: 'tclug-list at mn-linux.org'
>> Subject: ata "failed command: WRITE DMA", "ATA bus error" messages
> 
>> To my old backup server (running BackupPC), I recently added a PCI SATA card
>> and 1TB drive, and installed Fedora 13 (and removed 2 IDE smaller drives;
>> the boot drive is still an older IDE 40G; was running Fedora 11).  Messages
>> log regularly has this set of messages, much more frequent when the backup
>> is running:
> 
>> May 28 06:49:38 nacho kernel: ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
>> May 28 06:49:38 nacho kernel: ata2.01: BMDMA stat 0x44
>> May 28 06:49:38 nacho kernel: ata2.01: failed command: WRITE DMA
>> May 28 06:49:38 nacho kernel: ata2.01: cmd ca/00:c0:4f:8c:1a/00:00:00:00:00/f0 tag 0 dma 98304 out
>> May 28 06:49:38 nacho kernel:         res 51/84:00:0e:8d:1a/00:00:00:00:00/f0 Emask 0x10 (ATA bus error)
>> May 28 06:49:38 nacho kernel: ata2.01: status: { DRDY ERR }
>> May 28 06:49:38 nacho kernel: ata2.01: error: { ICRC ABRT }
>> May 28 06:49:38 nacho kernel: ata2: soft resetting link
>> May 28 06:49:38 nacho kernel: ata2.00: device is on DMA blacklist, disabling DMA
>> May 28 06:49:38 nacho kernel: ata2.00: configured for PIO4
>> May 28 06:49:38 nacho kernel: ata2.01: configured for UDMA/33
>> May 28 06:49:38 nacho kernel: ata2: EH complete
> 
>> I've googled various words from the messages, but what I find are CD drive
>> related messages and "now it takes longer to boot, so change modprobe" type
>> things.  I think mine is related to the new drive (just a hint from the "ATA
>> bus error" and 'ata' all over the messages ;-), possibly harmless messages
>> or maybe a SATA card config or compatibility problem(?).  I'm surprised they
>> keep repeating - if it was just a config error, I would think it would
>> adjust once (maybe at boot) and then be done.
> 
>> Can anyone point me to an RTFM or hints how to research/what is the cause
>> pretty-please?!
> 
> 
>> _______________________________________________
>> TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
>> tclug-list at mn-linux.org
>> http://mailman.mn-linux.org/mailman/listinfo/tclug-list
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAkwCu2sACgkQvE9HrEfeE4clSQCeLuHcpgIYr+gefCDDgfP2umLC
xu8AoOjnftaJC+BYHEWDukeyZIeG0TSn
=guCK
-----END PGP SIGNATURE-----