Koozali.org: home of the SME Server

Raid (SOLVED)

Offline jumba

  • ****
  • 291
  • +0/-0
  • Donations: July 2007 - $ 20.00
    • Smeserver på svenska!
« on: January 16, 2008, 03:39:59 PM »
...so I just received the message:

Code: [Select]
A DegradedArray event has been detected on md device /dev/md2.
Here is the situation:

Code: [Select]
[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1]
      244035264 blocks [2/1] [_U]
     
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]
     
unused devices: <none>
[root@server ~]# mdadm --detail --verbose /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Tue Dec 18 15:33:31 2007
     Raid Level : raid1
     Array Size : 244035264 (232.73 GiB 249.89 GB)
    Device Size : 244035264 (232.73 GiB 249.89 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Wed Jan 16 15:34:50 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : e9b25f43:328a09e4:d0e0305e:94051d6a
         Events : 0.862266

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       18        1      active sync   /dev/sdb2
[root@server ~]# mdadm --detail --verbose /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Tue Dec 18 15:33:31 2007
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Jan 16 15:10:09 2008
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 8b3124ee:f2c663f6:5a197366:46d9e75b
         Events : 0.1030

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
[root@server ~]#
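
A degraded array shows up in /proc/mdstat as an underscore in the member-status string, like the [_U] for md2 above. Here is a minimal sketch, not an official SME Server tool, that lists such arrays. The function name degraded_arrays is made up for illustration, and the sketch assumes the stanza layout shown in this thread (array name on one line, block count and status on the next).

```shell
# List md arrays whose member-status string (e.g. [_U]) contains "_",
# which marks a missing or failed member.
# Usage: degraded_arrays [FILE]   (FILE defaults to /proc/mdstat)
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }      # remember which array this stanza describes
        # The status string is the last field on the "blocks" line.
        /blocks/ && $NF ~ /^\[[U_]*_[U_]*\]$/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Run against the mdstat output above, it prints md2 and nothing for md1.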

Yes, I have searched the forum, but since there are real "experts" around here, what would be the solution in this case?

Many thanks in advance :lol:

Edit:

My own guess how to solve the problem:

Code: [Select]
mdadm /dev/md2 -a /dev/sda2
Could somebody please confirm that I'm on the right track?

Edit2:

OK, I was in a hurry, so I read http://wiki.contribs.org/Raid one more time and decided to go with my theory:

Code: [Select]
[root@server ~]# mdadm /dev/md2 -a /dev/sda2
mdadm: hot added /dev/sda2
[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[2] sdb2[1]
      244035264 blocks [2/1] [_U]
      [>....................]  recovery =  0.1% (288448/244035264) finish=84.4min speed=48074K/sec
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]
     
unused devices: <none>

It seems I was on the right track, but I'll have to keep a close eye on that drive :-?
« Last Edit: January 16, 2008, 04:31:03 PM by jumba »
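
While a rebuild is running, /proc/mdstat carries a "recovery = N%" line under the array, as in the output above. A rough sketch for pulling that figure out in a script follows. The name rebuild_progress is hypothetical, and the sketch assumes the progress line looks like the one in this thread.

```shell
# Print "ARRAY PERCENT" for every array that has a recovery or resync running.
# Usage: rebuild_progress [FILE]   (FILE defaults to /proc/mdstat)
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # current array stanza
        /(recovery|resync) *=/ {
            # The percentage is the field that ends in "%".
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

It prints nothing once no array is rebuilding, so a loop such as `while [ -n "$(rebuild_progress)" ]; do sleep 60; done` could wait for the rebuild to end.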

Offline raem

  • *
  • 3,972
  • +4/-0
Re: Raid (SOLVED)
« Reply #1 on: January 16, 2008, 09:28:07 PM »
jumba

You should run a disk check on both drives asap as one may fail soon.
...

Offline jumba

« Reply #2 on: January 16, 2008, 10:45:14 PM »
You should run a disk check on both drives asap as one may fail soon.

Thanks, I will. The strange thing is that this is an almost brand-new server (DELL), only 1 month old :???:

Offline calisun

  • *
  • 601
  • +0/-0
« Reply #3 on: September 20, 2008, 05:59:37 AM »
I have the exact same message:
A DegradedArray event has been detected on md device /dev/md2.

And both of my drives are practically new, only 3 months old, in a server that does not get a lot of usage.

Is it just a coincidence that we both have new drives and in both cases the same drive has issues?
Or could it be that the drives are not dying and something is wrong with the software RAID?

And what about the code that jumba posted?
 mdadm /dev/md2 -a /dev/sda2
SME user and community member since 2005.
Want to install Wordpress in iBay of SME Server?
See my step-by-step How-To wiki here:
http://wiki.contribs.org/Wordpress_Multisite

Offline jumba

« Reply #4 on: September 20, 2008, 09:41:32 AM »
Hi again. I've had no further problems with that server or the drives since then....

Offline calisun

« Reply #5 on: September 22, 2008, 05:27:53 PM »
What was your solution? Run the code you posted? Replaced drive??

Also what code did you use to test the drive? 

Offline jumba

« Reply #6 on: September 22, 2008, 10:21:22 PM »
What was your solution? Run the code you posted? Replaced drive??

Also what code did you use to test the drive? 

Actually, I did nothing more than I wrote here.

I ran the code I posted and kept reading the logs more carefully for a couple of weeks afterwards.

The disks are still operating well in that server, with no signs of failure...

Not much help for you, but that's still the truth :?

//Jumba

Offline calisun

« Reply #7 on: December 21, 2008, 09:56:31 PM »
OK, I did:
Code: mdadm /dev/md2 -a /dev/sda2

Soon after I did that, admin got an email:
RebuildStarted event on /dev/md2

A couple of minutes after that I got an email:
RebuildFinished event on /dev/md2

And a couple of minutes after that I got an email:
FailSpare event on /dev/md2

It said:
This is an automatically generated mail message from mdadm running on
A FailSpare event has been detected on md device /dev/md2.
Device /dev/sda2 is now an active member of md device /dev/md2.

So if I am reading this right, one of the drives failed, and the one that failed is sda1. Am I reading this right?

Offline calisun

« Reply #8 on: December 21, 2008, 10:10:08 PM »
Yup, after pulling more info, one of the drives is dead. I just want to make sure that it is sda1; I don't want to replace the wrong drive. Also, after I replace the drive, do I need to do anything, or will RAID sync the two drives?


[root@x ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[2](F) sdb2[1]
      732467520 blocks [2/1] [_U]
     
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]
     
unused devices: <none>

[root@x ~]# mdadm --detail --verbose /dev/md2
/dev/md2:
        Version : 00.90.01
  Creation Time : Sat Feb  2 10:21:21 2008
     Raid Level : raid1
     Array Size : 732467520 (698.54 GiB 750.05 GB)
    Device Size : 732467520 (698.54 GiB 750.05 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sun Dec 21 13:01:06 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           UUID : 738d1166:ec014667:f7dcc221:126c798f
         Events : 0.6595410

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       18        1      active sync   /dev/sdb2

       2       8        2        -      faulty   /dev/sda2
[root@x ~]#

Offline CharlieBrady

  • *
  • 6,918
  • +3/-0
« Reply #9 on: December 21, 2008, 11:04:22 PM »
Yup, after pulling more info, one of the drives is dead. I just want to make sure that it is sda1, I don't want to replace the wrong drive.

sda1 is not a drive. sda1 is a partition. It is on drive sda.
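
To make the partition/drive distinction concrete: in /proc/mdstat the failed member is the partition flagged (F), and the drive is that name without the trailing partition number. A minimal sketch, assuming sdX-style device names as in this thread (it would not handle names like nvme0n1p2); the function name failed_members is invented for illustration.

```shell
# Print "ARRAY PARTITION DRIVE" for every member flagged (F) in mdstat.
# Usage: failed_members [FILE]   (FILE defaults to /proc/mdstat)
failed_members() {
    awk '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++)
                if ($i ~ /\(F\)$/) {
                    part = $i
                    sub(/\[.*/, "", part)       # sda2[2](F) -> sda2
                    disk = part
                    sub(/[0-9]+$/, "", disk)    # sda2 -> sda (sdX-style names only)
                    print $1, part, disk
                }
        }
    ' "${1:-/proc/mdstat}"
}
```

For the mdstat output in the post above, this prints "md2 sda2 sda": partition sda2 failed, so the drive to look at is sda.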



Offline electroman00

  • ****
  • 491
  • +0/-0
« Reply #10 on: December 21, 2008, 11:43:33 PM »
Unplug sda (the suspected bad drive) and set the BIOS to boot from sdb. If it boots, then sdb is a good drive.
Don't make any changes to the server; just test things out and shut down.

Then test the other drive the same way.

Plug in one drive at a time and set the BIOS to boot from that drive.

You want to be sure you have the bad drive.

sda might boot and appear OK when it is by itself, but that doesn't mean it is.

Most important: mark each drive good or bad so you don't lose track of what you're doing, and before you start any destructive command (e.g. a format),
wait 15 seconds and rethink all your steps.

It sure does look like sda2 in md2 is the bad boy.

md2 : active raid1 sda2[2](F) sdb2[1]
      732467520 blocks [2/1] [_U]

Check the SATA cable or SCSI jumpers, and mark the drive "sda" with a felt pen.

This doesn't mean the drive is bad; just give it a format with the manufacturer's diagnostic tool.

Once you know for sure which drive it is, remove the good drive and format the bad drive with the manufacturer's tool. MAKE SURE the good drive is unplugged.
Then there's no chance of formatting the wrong one.

You can also run the manufacturer's diagnostics on it to test it.

You can put it back in as sda and keep the BIOS set to boot from sdb.

Then, in the server admin console, choose option 5 (manage RAID).

After about 30-60 seconds it will start a rebuild. Press Alt-F2 and log in to watch the rebuild:

watch -n .1 cat /proc/mdstat

Oops, EDIT: If you get this far, reboot after the rebuild and make sure you set the BIOS to boot from sda, not sdb.

=======================================

You could also do a manual restore of the sda md2 partition from sdb, but if you make a mistake, you're screwed, like forever.
For example, if you restore the (bad) sda md2 partition onto the (good) sdb md2 partition, you're screwed.

Plus you won't be able to re-format the entire drive and test it with the manufacturer's diagnostics, which is the preferred method.

In short... it's worth a try; it might just be a corrupt md2 partition on sda.

hth
 
« Last Edit: December 21, 2008, 11:48:58 PM by electroman00 »

Offline CharlieBrady

« Reply #11 on: December 22, 2008, 01:50:35 AM »
OP should not do anything without first scanning /var/log/messages.* for hard drive related error messages and using smartctl to query the drive error and self-test information.
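
The log scan and SMART queries suggested above could look roughly like this. It is a sketch under assumptions: the function name disk_log_errors and the keyword list (error, fail, timeout, reset) are illustrative guesses, not an exhaustive filter, so read the surrounding log lines as well.

```shell
# Print log lines that mention the given disk and also contain a
# typical trouble keyword. The keyword list is an assumption.
# Usage: disk_log_errors DISK FILE...
disk_log_errors() {
    disk=$1
    shift
    grep -h -- "$disk" "$@" | grep -iE 'error|fail|timeout|reset'
}

# Example use on the server (as root):
#   disk_log_errors sda /var/log/messages*
#   smartctl -H /dev/sda            # overall health verdict
#   smartctl -l error /dev/sda      # the drive's own error log
#   smartctl -l selftest /dev/sda   # results of earlier self-tests
#   smartctl -t short /dev/sda      # start a short self-test
```

The smartctl calls are left as comments because they need root and real hardware.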

Offline calisun

« Reply #12 on: December 22, 2008, 08:48:59 AM »
My server is in a colocation facility, so I don't have time to sit there playing with the drive. Plus I have a couple of very active web pages hosted on that server, so I can't afford to have the site offline for an extended period of time. So I will just replace the drive with a new one, and later, when I get home, I will see if I can rescue the old drive; if I can, I will keep it as a spare.

Basically, what I need to know is how to figure out which drive is dead before I get there, so I don't replace the wrong drive.

Offline electroman00

« Reply #13 on: December 22, 2008, 09:43:33 AM »
md2 : active raid1 sda2[2](F) sdb2[1]
      732467520 blocks [2/1] [_U]

Offline Frank VB

  • ***
  • 127
  • +0/-0
« Reply #14 on: December 22, 2008, 10:50:48 AM »
Use the smartctl command to find out the serial number of your drive (hopefully, you can still retrieve this information from your dead drive). This number is also printed on the label of the hard drive:

Code: [Select]
# smartctl -i /dev/sda
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2500YS-18SHB2
Serial Number:    WD-WCANY4116755
Firmware Version: 20.06C07
User Capacity:    250,000,000,000 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Dec 22 10:43:42 2008 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

« Last Edit: December 22, 2008, 10:52:41 AM by frankvb »
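
To write down just the serial number before heading to the datacenter, the smartctl -i output above can be filtered. A small sketch; drive_serial is a made-up helper name, and it assumes the "Serial Number:" line format shown in the output above.

```shell
# Read `smartctl -i` output on stdin and print only the serial number.
drive_serial() {
    # Split on the colon plus any following spaces; field 2 is the value.
    awk -F': *' '/^Serial [Nn]umber:/ { print $2; exit }'
}

# Example use on the server (as root):
#   smartctl -i /dev/sda | drive_serial
#   smartctl -i /dev/sdb | drive_serial
```

Running it for both sda and sdb gives you the two serials to compare against the drive labels.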