+1 on Munir's comprehensive answer. 

I'd also advocate using UUID references rather than /dev/sdX for portability. 
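For instance (a sketch, reusing the array UUID from Munir's output below; the fstab line is hypothetical and the filesystem UUID is a placeholder you'd get from blkid):

```
# /etc/mdadm.conf -- identify the array by UUID, not by device names,
# so it assembles correctly even if the kernel renames /dev/sdX devices
ARRAY /dev/md/datastore UUID=ce33ff1a:82c18dff:ef8d5149:60e9281e

# /etc/fstab -- likewise, mount the filesystem on the array by its own UUID
# UUID=<filesystem-uuid>  /datastore  ext4  defaults  0  2
```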

It sounds like you might have a failing drive. If the motor has trouble spinning the drive up in time (or, less commonly, keeping it spinning), the drive won't respond to the kernel's RAID probing in time, and the result will match the symptoms you're reporting. There are other causes, though: one of my PCIe SATA controller cards times out with two drives attached, yet is reliably fine with one drive.

Thomas



On Mar 21, 2013, at 10:28 AM, Munir Nassar <tclug at beitsahour.net> wrote:

> on modern systems there is little need for an mdadm.conf to configure the array.
> 
> a couple of things to look for:
> if you are using linux-md on partitions, are the partitions marked "Linux raid autodetect"? if not, you may want to set that type so that the linux kernel can build the array automatically. (note that I do not do so below, but I also have an entry in mdadm.conf:)
> ARRAY /dev/md/datastore metadata=1.2 UUID=ce33ff1a:82c18dff:ef8d5149:60e9281e name=snakeman:datastore
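> 
> For reference, a rough sketch of setting that partition type interactively with fdisk on an MBR disk (type code fd; /dev/sdX and the prompts here are illustrative, not captured output):
> 
> ```
> $ sudo fdisk /dev/sdX
> Command (m for help): t        # change a partition's type
> Hex code: fd                   # fd = Linux raid autodetect
> Command (m for help): w        # write the table and exit
> ```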
> 
> you can get the UUID from the mdadm command below
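> 
> as a shortcut, mdadm can also emit a ready-made ARRAY line (UUID included) for every running array, which you can paste into mdadm.conf (the output line here is illustrative, built from the details below):
> 
> ```
> $ sudo mdadm --detail --scan
> ARRAY /dev/md/datastore metadata=1.2 name=snakeman:datastore UUID=ce33ff1a:82c18dff:ef8d5149:60e9281e
> ```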
> 
> also, what is the md version that you are using? you can use mdadm --detail to find out:
> $ sudo /sbin/mdadm --detail /dev/md127
> /dev/md127:
>         Version : 1.2
>   Creation Time : Sat May 26 17:55:25 2012
>      Raid Level : raid5
>      Array Size : 5128012288 (4890.45 GiB 5251.08 GB)
>   Used Dev Size : 732573184 (698.64 GiB 750.15 GB)
>    Raid Devices : 8
>   Total Devices : 8
>     Persistence : Superblock is persistent
> 
>     Update Time : Thu Mar 21 10:23:59 2013
>           State : clean
>  Active Devices : 8
> Working Devices : 8
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>            Name : snakeman:datastore  (local to host snakeman)
>            UUID : ce33ff1a:82c18dff:ef8d5149:60e9281e
>          Events : 2114
> 
>     Number   Major   Minor   RaidDevice State
>        0       8        0        0      active sync   /dev/sda
>        1       8       16        1      active sync   /dev/sdb
>        2       8       32        2      active sync   /dev/sdc
>        3       8       48        3      active sync   /dev/sdd
>        4       8       64        4      active sync   /dev/sde
>        5       8       80        5      active sync   /dev/sdf
>        9       8       96        6      active sync   /dev/sdg
>        8       8      112        7      active sync   /dev/sdh
> 
> also, does the array finish building before you reboot? /proc/mdstat should show you the status of the rebuild; do not power down the system before it has finished rebuilding.
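> 
> to keep an eye on it, something like:
> 
> ```
> $ watch -n 5 cat /proc/mdstat
> ```
> 
> will refresh the rebuild progress (the [=>.....] bar, percentage, and finish estimate) every five seconds.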
> 
> finally, check the SMART status of the drives; one of them could be failing without your knowing it. check with smartctl -a /dev/sdX, and maybe run a long self-test on each drive in turn: smartctl -t long /dev/sdX
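> 
> a quick loop over all the members (the drive letters here match the --detail output above; adjust to taste):
> 
> ```
> $ for d in /dev/sd[a-h]; do echo "== $d =="; sudo smartctl -H "$d"; done
> ```
> 
> smartctl -H prints just the overall health verdict; -a shows the full attribute table, where Reallocated_Sector_Ct and Current_Pending_Sector are the usual early warning signs.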
> 
> 
> 
> On Thu, Mar 21, 2013 at 9:57 AM, Yaron <tclug at freakzilla.com> wrote:
>> There are no commented-out options that indicate delaying mdadm, but I will try to google for that.
>> 
>> 
>> On Thu, 21 Mar 2013, gregrwm wrote:
>> 
>>> could be that mdadm.conf ought to wait for an event it isn't waiting for
>> 
>> 
>> 
>> --
>> _______________________________________________
>> TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
>> tclug-list at mn-linux.org
>> http://mailman.mn-linux.org/mailman/listinfo/tclug-list
> 