Koozali.org: home of the SME Server

md2 is gone but LVM still exists: need help to recover RAID

Offline christian

  • *
  • 369
  • +0/-0
    • http://www.szpilfogel.com
« on: March 21, 2010, 03:17:27 AM »
This is a new one for me and I'm hoping someone has a quick fix.

I have two drives in conventional SME software RAID1, sda and sdb. sdb was showing signs of failure so I replaced it today with the intention of adding it back into the RAID1 array. The new sdb is the same manufacturer and model as the original.

When I booted the system, md1 still existed but md2 was nowhere to be found. I repartitioned sdb and successfully added sdb1 to md1. However, with no md2 device to add them to, neither sda2 nor sdb2 can be added.

Further:
Code:
# mdadm -E /dev/sda2
mdadm: No super block found on /dev/sda2 (Expected magic a92b4efc, got 12341234)

However! The LVM (i.e. /dev/mapper/main-root) is still there. I suspect LVM picked it up directly from the physical partition, so I'm not completely toast here, as per:
Code:
# pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sda2  main   lvm2 a-   931.41G 160.00M

Question:
Is there a way to easily convert this back to RAID1? e.g. create md2 with sda2 and then add in the new sdb2?

The only way I can think of is time-consuming and will take me out of service for quite a while. Namely: recreate the RAID1 on sdb2, create a new LVM on it, copy the old LVM to the new one, and then add sda2 into the RAID array.

Any ideas will be quite welcome...
« Last Edit: March 21, 2010, 03:22:43 AM by christian »
SME since 2003

Offline christian

« Reply #1 on: March 21, 2010, 04:48:53 PM »
Some progress. Since I have a reliable backup, I decided to throw a little caution to the wind.

I recreated md2 with
Code:
mdadm --create /dev/md2 --verbose --level=1 --raid-devices=2 /dev/sda2 missing
However, when I boot the system and it tries to assemble md2, it can't find any member devices for md2 and then panics.

Using sme rescue, I can manually assemble md2 and then activate "main" with LVM. All files are then visible and apparently intact.
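
For reference, the manual assembly from the rescue environment can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `rescue_assemble_plan` is hypothetical, `/dev/main/root` is inferred from the `/dev/mapper/main-root` path mentioned earlier, and the mount point `/mnt/sysimage` is an assumption. The function prints the commands rather than running them, so they can be reviewed before being typed by hand.

```shell
# Hypothetical helper: print (do not run) the commands to bring up a degraded
# RAID1 array by hand and activate the LVM volume group that lives on it.
rescue_assemble_plan() {
    md=$1       # array to assemble, e.g. /dev/md2
    member=$2   # surviving member, e.g. /dev/sda2
    vg=$3       # volume group on the array, e.g. main
    cat <<EOF
mdadm --assemble --run $md $member
vgscan
vgchange -ay $vg
mount /dev/$vg/root /mnt/sysimage
EOF
}

rescue_assemble_plan /dev/md2 /dev/sda2 main
```

`--run` asks mdadm to start the array even though only one of its two members is present.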

So I think I am just missing the incantation to ensure SME properly builds md2 on boot.

Thoughts?

Offline christian

« Reply #2 on: March 21, 2010, 11:36:33 PM »
Well, I'm stuck. I was getting nowhere with this, so I zeroed the md superblock again and rebooted. I'm back to just an unprotected LVM on sda2 with no md2 active.

I can see in the logs that, prior to my drive replacement, the system reported an active md2 containing both sda2 and sdb2 as it was shutting down. I replaced sdb, rebooted, and md2 never came back. While I don't think it is an SME bug, I'll move this to the bug tracker now given what I'm seeing in the logs.

bug: http://bugs.contribs.org/show_bug.cgi?id=5861

Man, not my day! My browser crashed after I had fully documented the issue...
« Last Edit: March 22, 2010, 01:39:06 AM by christian »

Offline christian

« Reply #3 on: March 28, 2010, 04:23:36 PM »
Bringing this back to the forums from the bug tracker. Below is information related to my system.

As more background from the logs: md2 existed and seemed to be fine prior to replacing sdb. I ran diagnostics on sdb and sda with Samsung's diagnostic tool to ensure sdb was indeed bad and that sda was clean. I then replaced sdb, and when the system rebooted, md2 was no longer there.

I have a total of 4 drive groups in the server. However, I have left the base config alone. The RAID1 pair is as it was when it was installed with the installer and I have made symlinks within the /home/e-smith/files/ibays/ directories to the other drives.

So in summary:
md1/md2: I have made no modifications since install.

I added:
md11: RAID5 set with 5 drives (5x 1TB)
md12: RAID1 set with 2 drives (2x 2TB)
beast2: LVM on top of 4x 2TB drives (this is my backup drive)

# cat /proc/mdstat
Personalities : [raid1] [raid5]
md12 : active raid1 sdd1[1] sdc1[0]
      1953511936 blocks [2/2] [UU]

md11 : active raid5 sdi1[4] sdh1[3] sdg1[2] sdf1[1] sde1[0]
      3907039744 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

unused devices: <none>
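
A listing like this can be condensed into one line per array to make a missing or degraded array easier to spot. The following is a minimal sketch, not part of SME: the function name `mdstat_summary` is hypothetical, and it assumes the classic two-line-per-array layout shown above (an array line followed by a "blocks" line ending in a status such as [UU]).

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print one
# line per active array: name, RAID level, member status, and a verdict.
mdstat_summary() {
    awk '
        /^md[0-9]+ :/ { name = $1; level = $4 }
        /blocks/ && name != "" {
            # The last field looks like [UU] or [U_]; "_" marks a missing member.
            status = $NF
            state = (status ~ /_/) ? "DEGRADED" : "ok"
            print name, level, status, state
            name = ""
        }
    '
}
```

It would typically be used as `mdstat_summary < /proc/mdstat`. An array that is not active at all, such as md2 here, simply does not appear in the output, so checking for its absence (for example with `grep -q '^md2 '`) is a separate step.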

====
# mdadm -D /dev/md*
mdadm: md device /dev/md0 does not appear to be active.
/dev/md1:
        Version : 00.90.01
  Creation Time : Fri Oct 17 21:44:14 2008
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Mar 22 13:29:03 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : b9e11ac0:807692c3:508b266c:15baac96
         Events : 0.7863

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
mdadm: md device /dev/md10 does not appear to be active.
/dev/md11:
        Version : 00.90.01
  Creation Time : Mon Dec 28 15:07:54 2009
     Raid Level : raid5
     Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
    Device Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 11
    Persistence : Superblock is persistent

    Update Time : Tue Mar 23 10:44:29 2010
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 3c37d819:42f47d16:2cd4e655:899cba19
         Events : 0.278274

    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       1       8       81        1      active sync   /dev/sdf1
       2       8       97        2      active sync   /dev/sdg1
       3       8      113        3      active sync   /dev/sdh1
       4       8      129        4      active sync   /dev/sdi1
/dev/md12:
        Version : 00.90.01
  Creation Time : Sun Dec 27 20:12:11 2009
     Raid Level : raid1
     Array Size : 1953511936 (1863.01 GiB 2000.40 GB)
    Device Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 12
    Persistence : Superblock is persistent

    Update Time : Tue Mar 23 10:44:29 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : ecc2a956:0a52d87f:cc339002:d803301e
         Events : 0.910518

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
mdadm: md device /dev/md13 does not appear to be active.
mdadm: md device /dev/md14 does not appear to be active.
mdadm: md device /dev/md15 does not appear to be active.
mdadm: md device /dev/md16 does not appear to be active.
mdadm: md device /dev/md17 does not appear to be active.
mdadm: md device /dev/md18 does not appear to be active.
mdadm: md device /dev/md19 does not appear to be active.
mdadm: md device /dev/md2 does not appear to be active.
mdadm: md device /dev/md20 does not appear to be active.
mdadm: md device /dev/md21 does not appear to be active.
mdadm: md device /dev/md22 does not appear to be active.
mdadm: md device /dev/md23 does not appear to be active.
mdadm: md device /dev/md24 does not appear to be active.
mdadm: md device /dev/md25 does not appear to be active.
mdadm: md device /dev/md26 does not appear to be active.
mdadm: md device /dev/md27 does not appear to be active.
mdadm: md device /dev/md28 does not appear to be active.
mdadm: md device /dev/md29 does not appear to be active.
mdadm: md device /dev/md3 does not appear to be active.
mdadm: md device /dev/md30 does not appear to be active.
mdadm: md device /dev/md31 does not appear to be active.
mdadm: md device /dev/md4 does not appear to be active.
mdadm: md device /dev/md5 does not appear to be active.
mdadm: md device /dev/md6 does not appear to be active.
mdadm: md device /dev/md7 does not appear to be active.
mdadm: md device /dev/md8 does not appear to be active.
mdadm: md device /dev/md9 does not appear to be active.

Offline christian

« Reply #4 on: March 28, 2010, 04:32:38 PM »
My recovery plan, if it passes muster, is to do the following:
  • create md2 on sdb2 with sda2 missing
  • reboot and see if it sticks (I have my doubts given the above account)
  • rename "main" to "oldmain"
  • create LVM "main" on md2
  • copy "oldmain" to "main"
  • shutdown and disconnect sda
  • boot and see if it sticks.
  • if it does then clear sda, re-partition, and re-add to md1 and md2
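
The first steps of the plan above might look roughly like this in command form. It is a sketch only, with several assumptions that are not from the thread: the helper name `rebuild_on_new_md_plan` is hypothetical, only the root logical volume is shown (any other LVs in "main" would need the same treatment), the size passed for the new LV is a placeholder, and the block-level `dd` copy assumes the new LV is at least as large as the old one. The commands are printed, not executed, and are meant to be run from a rescue environment with the volume group inactive.

```shell
# Hypothetical helper: print (do not run) the commands for the plan above.
rebuild_on_new_md_plan() {
    newpart=$1   # partition on the new disk, e.g. /dev/sdb2
    lvsize=$2    # size for the new root LV (placeholder value, not from the thread)
    cat <<EOF
mdadm --create /dev/md2 --level=1 --raid-devices=2 missing $newpart
vgrename main oldmain
vgchange -ay oldmain
pvcreate /dev/md2
vgcreate main /dev/md2
lvcreate -L $lvsize -n root main
dd if=/dev/oldmain/root of=/dev/main/root bs=1M
EOF
}

rebuild_on_new_md_plan /dev/sdb2 900G
```

Keeping sda untouched until the system has booted from the new md2 alone is what gives this plan its fallback path.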

Alternatively, I considered the following, but felt it didn't give me a good recovery path:
  • create md2 with sdb2
  • extend the "main" LVM volume group onto md2
  • move the physical volume from sda2 to md2
  • remove sda2 from the "main" volume group
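
In LVM terms, this alternative corresponds roughly to the sequence below. Again a sketch under assumptions: the helper name `pvmove_plan` is hypothetical, and the commands are printed rather than run. Note that `pvmove` can only succeed if the new physical volume has at least as many free extents as are allocated on the old one.

```shell
# Hypothetical helper: print (do not run) the commands for the pvmove variant.
pvmove_plan() {
    vg=$1        # volume group, e.g. main
    oldpart=$2   # partition currently holding the PV, e.g. /dev/sda2
    newpart=$3   # partition on the new disk, e.g. /dev/sdb2
    md=$4        # array to create, e.g. /dev/md2
    if [ "$oldpart" = "$newpart" ]; then
        echo "old and new partitions must differ" >&2
        return 1
    fi
    cat <<EOF
mdadm --create $md --level=1 --raid-devices=2 missing $newpart
pvcreate $md
vgextend $vg $md
pvmove $oldpart $md
vgreduce $vg $oldpart
EOF
}

pvmove_plan main /dev/sda2 /dev/sdb2 /dev/md2
```

Because the data is moved rather than copied, there is no intact second copy on sda2 once `pvmove` finishes, which matches the concern about the recovery path.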

Suggestions and input welcome (encouraged).
« Last Edit: March 28, 2010, 04:35:50 PM by christian »

Offline CharlieBrady

  • *
  • 6,918
  • +3/-0
« Reply #5 on: March 28, 2010, 04:52:59 PM »
Quote
As more background from the logs: md2 existed and seemed to be fine prior to replacing sdb. I ran diagnostics on sdb and sda with Samsung's diagnostic tool to ensure sdb was indeed bad and that sda was clean. I then replaced sdb, and when the system rebooted, md2 was no longer there.

What I see missing is md0. Where did it go?

Offline christian

  • *
  • 369
  • +0/-0
    • http://www.szpilfogel.com
Re: md2 is gone but LVM still exists: need help to recover RAID
« Reply #6 on: March 28, 2010, 05:28:17 PM »
Quote
What I see missing is md0. Where did it go?

I don't recall having md0. This system was built from a fresh install on either 7.3 or 7.4 and then the data was "restored" from the older drive. This was done about 18 (maybe 24) months ago.

My memory is fuzzy on this, but as I recall, on more recent SME installs only md1 and md2 are created, whereas in SME 6.x and possibly early SME 7.x, md0, md1, and md2 were created.

Offline CharlieBrady

« Reply #7 on: March 28, 2010, 08:30:57 PM »
Quote
My memory is fuzzy on this, but as I recall, on more recent SME installs only md1 and md2 are created, whereas in SME 6.x and possibly early SME 7.x, md0, md1, and md2 were created.

My memory is also fuzzy. And I notice my own home server has no md0 either.

Some of your problem is clear anyway. You lost both mirrors of md2: sdb2 because you replaced the disk, and sda2 because the md superblock was corrupted. The reason is unknown, but the Samsung utility is suspect. See the bug tracker.

Offline christian

« Reply #8 on: March 28, 2010, 08:46:41 PM »
Thanks Charlie.

Any suggestions on recovery options? I tried to recreate md2 as per post #2 above, but I couldn't get it to stick on a normal boot. It would find md2, then fail to find any drives for it, and then kernel panic. Manually assembling md2 and then mounting "main" works fine. Do I also need to involve grub for this?

Does my recovery plan above look reasonable?

I'm just concerned I'm missing something that SME needs specifically for md1 and md2. I've tried looking in CVS to see what is done at install time, but I am having trouble finding the relevant code. I found the anaconda scripts but not the included files which specify the SME build.

Offline byte

  • *
  • 2,183
  • +2/-0
« Reply #9 on: March 29, 2010, 09:03:18 PM »
Quote
[..] so I replaced it today with the intention of adding it back into the RAID1 array. The new sdb is the same manufacturer and model as the original.

What steps/commands did you use to add the drive back into the array?

I had my RAID1 array go down last week. I simply removed the faulty drive, added the new drive, and copied the partition info over by doing:

sfdisk -d /dev/sdb |sfdisk /dev/sda

where sdb was my working drive and sda was my brand new drive. I then added the new drive back to the array by doing:

mdadm /dev/md1 --add /dev/sda1
mdadm /dev/md2 --add /dev/sda2

Don't forget to add the grub loader back on to sda as well. I then watched it sync, and it is now happily running again.
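
Put together, the single-disk replacement described here could be sketched as below. The helper name `replace_mirror_disk_plan` is hypothetical, and the GRUB part is an assumption: it uses the GRUB legacy shell commands `device`, `root` and `setup`, and assumes /boot is the first partition of the disk. The commands are printed, not executed; the guard exists because running `sfdisk` in the wrong direction would overwrite the good disk's partition table.

```shell
# Hypothetical helper: print (do not run) the commands to bring a blank
# replacement disk into the md1/md2 mirrors and make it bootable.
replace_mirror_disk_plan() {
    good=$1   # healthy disk still in the arrays, e.g. /dev/sdb
    new=$2    # blank replacement disk, e.g. /dev/sda
    if [ "$good" = "$new" ]; then
        echo "source and target disk must differ" >&2
        return 1
    fi
    cat <<EOF
sfdisk -d $good | sfdisk $new
mdadm /dev/md1 --add ${new}1
mdadm /dev/md2 --add ${new}2
grub --batch <<GRUB
device (hd0) $new
root (hd0,0)
setup (hd0)
quit
GRUB
cat /proc/mdstat
EOF
}

replace_mirror_disk_plan /dev/sdb /dev/sda
```

Resync progress shows up in /proc/mdstat, which is why it is the last command in the list.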
--[byte]--

Have you filled in a Bug Report over @ http://bugs.contribs.org ? Please don't wait to be told this way you help us to help you/others - Thanks!

Offline christian

« Reply #10 on: March 29, 2010, 09:14:13 PM »
Quote
What steps/commands did you use to add the drive back into the array?

Per my earlier post:
Code:
mdadm --create /dev/md2 --verbose --level=1 --raid-devices=2 /dev/sda2 missing
Your procedure is the normal scenario when a single drive fails. In my case, the md2 superblock on the surviving disk got wiped out (although md1 was fine). I strongly suspect the Samsung diagnostic tool did this.

So currently I have an intact md1 with sda1 and sdb1; but no md2. The LVM "main" is mounting directly from sda2 now without md2. sdb2 is a partition on the new disk which will eventually be added into an md2.

I used the above mdadm procedure in an effort to recreate md2 on sda2, hoping to then add sdb2 back in. However, when I then booted, md2 would be found but would not automatically assemble, so the LVM could not be found and the kernel would panic. I can, however, boot with "sme rescue", manually assemble md2 with sda2, and then mount "main".

So I think I am just missing something to ensure on normal boot that md2 will auto assemble. I tried to recreate initrd with no success. I'm not sure if grub needs to know about md2; I wouldn't think so.
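
One way to narrow this down is to compare what the boot process would see for md1 (which assembles) and md2 (which does not). The sketch below only prints a few inspection commands; the helper name `boot_assembly_checks_plan` is hypothetical. It rests on an assumption not established in this thread: that boot-time assembly depends either on kernel RAID autodetection, which only handles partitions of type fd carrying 0.90 superblocks, or on ARRAY entries in a config file such as /etc/mdadm.conf.

```shell
# Hypothetical helper: print (do not run) read-only checks that compare a
# working array member with the one that fails to assemble at boot.
boot_assembly_checks_plan() {
    disk=$1      # disk holding both members, e.g. /dev/sda
    good=$2      # member of the working array, e.g. /dev/sda1
    suspect=$3   # member of the failing array, e.g. /dev/sda2
    cat <<EOF
sfdisk -l $disk
mdadm --examine $good
mdadm --examine $suspect
mdadm --detail --scan
EOF
}

boot_assembly_checks_plan /dev/sda /dev/sda1 /dev/sda2
```

`sfdisk -l` shows the partition type Id, `mdadm --examine` shows the superblock version and UUID of each member, and `mdadm --detail --scan` prints ARRAY lines for the arrays that are currently running.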


Offline byte

  • *
  • 2,183
  • +2/-0
Re: md2 is gone but LVM still exists: need help to recover RAID
« Reply #11 on: March 29, 2010, 09:33:21 PM »
Quote
Per my earlier post:
Code:
mdadm --create /dev/md2 --verbose --level=1 --raid-devices=2 /dev/sda2 missing

Sorry, I thought that was after you tried your first step.

Quote
I'm not sure if grub needs to know about md2; I wouldn't think so.

No, but your sd(x) will need to know about grub.

As long as you have a good backup, I'd probably do a clean install, as it seems to be quite messed up... but that's just my opinion.

Offline CharlieBrady

« Reply #12 on: March 29, 2010, 10:06:17 PM »
Quote
So I think I am just missing something to ensure on normal boot that md2 will auto assemble.

I think your approach should be to start in rescue mode, create a new md2 on sdb2, then copy all files from sda2 to the new md2 (on sdb2). Then restart in normal mode, and add sda2 to md2.

For added safety, I would however boot with only sdb (after doing the above). Then provide a brand new sda to make a new mirror, keeping your existing sda just in case you need to try again to recover from it. This is more expensive, and will add more down-time, but you will not be taking as much risk with your existing sda (and the data it contains).

Offline christian

  • *
  • 369
  • +0/-0
    • http://www.szpilfogel.com
Re: md2 is gone but LVM still exists: need help to recover RAID
« Reply #13 on: March 29, 2010, 11:15:44 PM »
Thank you for the sanity check.