Koozali.org: home of the SME Server

[SOLVED] made an fdisk mistake... md2 gone, 'main' lvm still working

Offline nekatreven

Hello everyone,

Well first of all I did search the forum as best I could and this post http://forums.contribs.org/index.php/topic,45741 has many similarities to my predicament but I think the solution in my case is much easier (I hope).

The short version is that my SME 7.5.1 install used to run on two 250gb disks, using the built-in md raid1. I then wanted to upgrade the disk size and purchased two 500gb disks. Actually physically moving to the new disks went fine (failed the disks one at a time, copied partition table with sfdisk, added back to raid, resync, made sure grub was loaded on both disks, etc). This left me running on the 500gb disks with no issues except that the /dev/sd[ab]2 partitions both still stopped at the 250gb mark; so I had not expanded to use the new space. I recently was ready to do so and that is where it went wrong.

I was working remotely from home last night and thought I could start part of the 'grow' process without booting to a live CD. In retrospect I think I goofed pretty good...blame it on late work hours.  :-? What I did was use fdisk to remove /dev/sd[ab]2 and recreate them in the same place, but I let them now use the space all the way to the end of the 500gb disks. Then I rebooted. I had forgotten that the md superblocks are supposed to be located near the end of the partition and so now neither member of /dev/md2 has a superblock that it can locate.
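A rough sketch of why enlarging the partition hides the superblock, assuming the array uses the old 0.90 md metadata format (an assumption; the thread does not say which format is in use). In that format the superblock lives on a 64 KiB boundary, at least 64 KiB and less than 128 KiB from the end of the member device, so its expected position is computed from the partition size. The function name is made up for illustration.

```shell
#!/bin/sh
# Expected superblock offset (in 512-byte sectors) for a member of the
# given size, ASSUMING 0.90 metadata: round the size down to a multiple
# of 128 sectors (64 KiB), then step back one more 64 KiB block.
md090_sb_offset() {
    size_sectors=$1
    echo $(( (size_sectors & ~127) - 128 ))
}

# Usage (sizes are illustrative, not taken from the thread):
#   md090_sb_offset 488175660    # where md looks on the original partition
#   md090_sb_offset 976559220    # where md looks after the partition grew
```

Because the offset is derived from the partition size, md looks in a different place once the partition is enlarged, even though the old superblock is still sitting on disk where it always was.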

The system still works...it just gripes a few times that md2 had to be stopped and it gripes that it now sees a duplicate LVM. I'm a little worried that the LVM might decide to use sda2 on one boot and sdb2 on another so I have not rebooted again; it is still up since the first reboot after the mistake. It seems like you can manually create/assemble/restart an array without losing data but since the LVM is live and I'm booted to it, adding a raid layer on the fly sounds like a very economical way to destroy my install.

So here is my theory...
I still have the old 250gb disks, so I can plug one into another machine and find out at what sector sd[ab]2 used to end. Once I have that, I could resize my production /dev/sd[ab]2 partitions to end at that sector. At that point I think the superblocks (which should still be there) will once again appear to be at the end of the partition, and md2 will spring back to life when I reboot. I would greatly appreciate any input that might verify that theory, and any reassurance that, if I do this and it works, the array will be rebuilt from the correct drive. The server has been up for 10 hours handling email and vpn, so I really could not afford for it to rebuild from the disk that has been idle that whole time. I was planning on using parted because fdisk puts the '+' on the sector count to indicate rounding and I'm not sure that would let me be accurate enough for this. UPDATE: scratch that...I could just use sfdisk again and use one of the old 250gb disks as the source for the partition table.
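To find the old end sector, the start and size from an `sfdisk -d` dump are enough. Below is a minimal sketch; the function name is hypothetical and it assumes the dump has lines of the form `/dev/sdX2 : start= N, size= M, ...`.

```shell
#!/bin/sh
# Print the last sector of one partition, reading an `sfdisk -d` dump
# on stdin. End sector = start + size - 1.
part_end_sector() {
    part=$1
    # Pick the line for this partition and drop the padding spaces.
    line=$(grep "^$part " | tr -d ' \t')
    [ -n "$line" ] || return 1
    start=$(printf '%s\n' "$line" | sed -n 's/.*start=\([0-9]*\).*/\1/p')
    size=$(printf '%s\n' "$line" | sed -n 's/.*size=\([0-9]*\).*/\1/p')
    echo $(( start + size - 1 ))
}

# Usage, with the old disk attached as /dev/sdd (device name is an example):
#   sfdisk -d /dev/sdd | part_end_sector /dev/sdd2
```

Only the `start=` and `size=` fields are read, so the other fields on the line do not matter.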

TIA for any assistance you may be able to provide!
Mark
« Last Edit: December 04, 2011, 11:11:23 PM by nekatreven »

Offline idp_qbn

Re: made an fdisk mistake... md2 gone, 'main' lvm still working
« Reply #1 on: December 02, 2011, 09:52:52 PM »
Hi - I found these old notes about increasing the HDD size.
=============================
Upgrading the Hard Drive Size INCREASE = LARGER = BIGGER
Note: these instructions are only applicable if you have a RAID system with more than one drive. They are not applicable to a single-drive RAID 1 system, and increasing the useable space on such a system by cloning the existing single drive to a larger drive is not supported. See http://bugs.contribs.org/show_bug.cgi?id=5311
CAUTION MAKE A FULL BACKUP!
Ensure you have e-smith-base-4.16.0-33 or newer installed. [or Update to at least 7.1.3]
1. Shut down and install larger drive in system.
2. Boot up and manage raid to add new (larger) drive to system.
3. Wait for raid to fully sync.
4. Repeat steps 1-3 until all drives in system are upgraded to larger capacity.
5. Ensure all drives have been replaced with larger drives and array is in sync and redundant!

Issue the following commands:
_________________________________________________
mdadm --grow /dev/md2 --size=max
pvresize /dev/md2
lvresize -l +100%FREE main/root
ext2online -C0 /dev/main/root   
_________________________________________________
In the last command above, the -C0 is: dash C zero
Notes:
- All of this can be done while the server is up and running, with the exception of #1.
- These instructions should work for any raid level you have as long as you have >= 2 drives.
- If you have disabled lvm:
    - you don't need the pvresize or lvresize command
    - the final line becomes ext2online -C0 /dev/md2 (or whatever / is mounted to)
=============================
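The four grow commands from these notes can be previewed as a dry run before anything is touched. This is only a sketch: `grow_plan` is a made-up helper that prints the commands for a given md device and volume, and runs nothing.

```shell
#!/bin/sh
# Print (do not run) the grow sequence quoted in the notes.
#   $1 = md device, e.g. /dev/md2
#   $2 = volume group/logical volume, e.g. main/root
grow_plan() {
    md=$1
    lv=$2
    cat <<EOF
mdadm --grow $md --size=max
pvresize $md
lvresize -l +100%FREE $lv
ext2online -C0 /dev/$lv
EOF
}

# Usage:
#   grow_plan /dev/md2 main/root
```

Printing the plan first makes it easy to check the device and volume names against the running system before typing anything destructive.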
I would start again with the old 250 GB disks: replace one with a 500 GB disk and let it sync, then replace the other 250 with the second 500 and let it sync. THEN (and only then) do the GROW thingy (the bit inside the _________________________________________________).

Good luck!
Ian
___________________
Sydney, NSW, Australia

Offline nekatreven

Re: made an fdisk mistake... md2 gone, 'main' lvm still working
« Reply #2 on: December 02, 2011, 10:25:04 PM »
Unfortunately I did not do this all in one pass. I swapped in these new disks over a month ago; I just had not grown it to use the full 500gb yet. So I can't really go back to the 250gb disks.

I do believe those instructions will be very helpful once I get this part fixed though! I have never done this exact disk operation before and I had the idea of it a little wrong in my head...hence the goof. I think that (partially based on what you posted) mdadm will expand /dev/sd[ab]2 on its own, whereas before I thought I'd have to handle that myself.

Thank you for the extra info.
Mark

Offline nekatreven

Re: made an fdisk mistake... md2 gone, 'main' lvm still working
« Reply #3 on: December 04, 2011, 08:49:18 AM »
It worked, I was able to get md2 back without a live cd.

sfdisk -d /dev/sdd > partition.txt  (sdd was one of my old 250gb disks)
pvdisplay (to determine that sdb was the active drive, not sda.)
sfdisk --force /dev/sdb < partition.txt
reboot

Then I looked at recently received email to make sure I was booted to the right disk. It looked good, and /proc/mdstat showed that md2 was running only on member sdb2, so I then applied the partition.txt file to sda as well and added it back with mdadm --add. 40 minutes later all is well.
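Before re-adding a disk it may be worth confirming that both disks really ended up with the same layout. Here is a small sketch that compares two `sfdisk -d` dump files while ignoring the disk name; both function names are made up for this example.

```shell
#!/bin/sh
# Reduce a dump to its partition lines, replacing "/dev/sdX" with "p" so
# dumps taken from different disks can be compared directly.
normalize_dump() {
    grep '^/dev/' | sed 's#^/dev/[a-z]*\([0-9]*\)#p\1#' | tr -d ' \t'
}

# Succeeds (exit 0) when the two dump files describe the same layout.
same_layout() {
    [ "$(normalize_dump < "$1")" = "$(normalize_dump < "$2")" ]
}

# Usage (file names are examples):
#   sfdisk -d /dev/sda > sda.txt
#   sfdisk -d /dev/sdb > sdb.txt
#   same_layout sda.txt sdb.txt && echo "layouts match"
```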

Offline cactus

Re: made an fdisk mistake... md2 gone, 'main' lvm still working
« Reply #4 on: December 04, 2011, 06:45:07 PM »
Quote
so I then applied the partition.txt file to sda as well and added it back with mdadm --add. 40 minutes later all is well.
Very well. Please add solved to the subject of your opening post.
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than it's worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

Offline johnp

Re: made an fdisk mistake... md2 gone, 'main' lvm still working
« Reply #5 on: December 04, 2011, 09:49:16 PM »
Nice to see that you recovered. Please don't take this the wrong way, but maybe there should be a link in the wiki to this thread to help those who choose not to follow instructions. If you had followed the raid howto, things would have been much simpler IMHO. I really don't want to insult anyone, so go easy on me :)

Offline nekatreven

Re: made an fdisk mistake... md2 gone, 'main' lvm still working
« Reply #6 on: December 04, 2011, 11:10:44 PM »
Excuse me? I didn't even know those instructions existed when I started working on this. I'm still learning my way through linux and dealing with open source projects in general. You should take a minute to try and remember how much you once did not know. When I have an issue, I just have to guess whether to search on the forum, the bug tracker, or the wiki (and hope they don't contradict one another).

Even if I was following those instructions, the same thing would have happened at step 2:

Quote
2. Boot up and manage raid to add new (larger) drive to system.

Normally all of the devices in a mirror have to be the same size. That is what I'd always known. Here though, there is an exception in that the sdX2 fd/raid autodetect partitions on the new larger disks should actually use the full space of the new disk and will be larger than the other original member devices. Your instructions don't mention this and so, had I been following them, I'd have ended up in the exact same place I did by going it alone. Namely...I would have added the new drives by cloning the partition tables of the original members so I could be sure they were the same size, and so running the grow operation with mdadm wouldn't work.
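The point about member sizes comes down to simple arithmetic: `--size=max` can only grow the array to the largest size that fits on every member, so the smallest partition sets the limit. A toy sketch follows (the function name is made up, sizes are in whole GB for readability, and superblock overhead is ignored):

```shell
#!/bin/sh
# Largest per-member size a mirror can use: the minimum over all members.
raid1_max_size() {
    min=$1
    for s in "$@"; do
        if [ "$s" -lt "$min" ]; then
            min=$s
        fi
    done
    echo "$min"
}

# Usage:
#   raid1_max_size 250 250    # cloned partition tables: nothing to grow into
#   raid1_max_size 500 250    # one member enlarged: still limited by the other
#   raid1_max_size 500 500    # both enlarged: the grow can use the new space
```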
« Last Edit: December 06, 2011, 03:24:31 AM by nekatreven »

Offline johnp

Re: [SOLVED] made an fdisk mistake... md2 gone, 'main' lvm still working
« Reply #7 on: December 05, 2011, 10:40:48 PM »
Like I said, I didn't mean any insult. Yes, there is still a lot that I can learn myself. I did follow those instructions earlier this year and replaced a set of 250G with a couple of 2T drives and grew the array as described. Couldn't have been easier.

Moreover, I just thought it would potentially help others if they end up with a similar problem. Obviously you are very resourceful and found a working solution. Isn't that what really matters?

Offline nekatreven

Re: [SOLVED] made an fdisk mistake... md2 gone, 'main' lvm still working
« Reply #8 on: December 06, 2011, 03:22:59 AM »
Crap on a cracker. I just now found out about the 'manage raid' option on the console. That would have helped a lot! I'm not too bent out of shape though; I'm very rarely physically in front of that machine and I knew it wasn't an option in the web config!

I did try to keep my response toned down, and now that I know what you were on about, I'm glad. At one point I was referring to that wiki page as 'those blessed instructions', but I realized that was excessive and changed it. :D

Here is some more info on finishing the job after I undid the mess. It worked here but ymmv and my understanding of it may still have some holes:

Don't use this unless you're in the same predicament I had at the start of the thread (or you already know most of what you're doing). This can go very wrong and destroy your data. Make a full backup!

Drop sda1 and sda2 from md1 and md2, and then increase the size of sda2
Code:
mdadm /dev/md2 --fail /dev/sda2 --remove /dev/sda2
mdadm /dev/md1 --fail /dev/sda1 --remove /dev/sda1

fdisk -u /dev/sda

In fdisk I deleted sda2, then re-created it at the same starting sector but let it end at the 'new' end of the disk (the default start and end points were correct in my case). Be sure to set it to type 'fd'. The commands in fdisk were:
Code:
d <return>
2 <return>
n <return>
p <return>
2 <return>
<return>
<return>
t <return>
2 <return>
fd <return>
w <return>
(and q if it doesn't close itself)

Add sda partitions back to md1 and md2.
Code:
mdadm /dev/md2 --add /dev/sda2
mdadm /dev/md1 --add /dev/sda1

At this point sda2 is larger than sdb2 and the md 'magic' on sda2 is in the wrong place, but mdadm still lets you add it back. It seems to ignore this problem and hold onto the data it found at boot time, as long as the array is not stopped completely (which, in this situation, would likely crash the system anyway). The array will be resync'ed to the size of sdb2, which has not been enlarged yet.

Wait for resync before moving on.
Code:
watch cat /proc/mdstat
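If watching by eye is not practical, the same check can be scripted. Here is a minimal sketch that reads /proc/mdstat-style text on stdin and succeeds only when the named array shows no missing member and no resync or recovery in progress. The function name is hypothetical, and it assumes the usual layout where each array's block ends with a blank line.

```shell
#!/bin/sh
# Exit 0 if array $1 (e.g. md2) is present and fully in sync, else exit 1.
# Reads mdstat text on stdin.
md_in_sync() {
    awk -v dev="$1" '
        $1 == dev          { inblock = 1; found = 1; next }
        inblock && NF == 0 { inblock = 0 }              # blank line ends the block
        inblock && /\[[U_]*_[U_]*\]/      { busy = 1 }  # missing member, e.g. [_U]
        inblock && /(resync|recovery) *=/ { busy = 1 }  # rebuild still running
        END { exit (found && !busy) ? 0 : 1 }'
}

# Usage: poll until md2 has finished rebuilding.
#   until md_in_sync md2 < /proc/mdstat; do sleep 60; done
```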

Drop sdb1 and sdb2 from md1 and md2, then run the grow operation on md2 (which at that point will only be made up of sda2, and should fix the md magic superblock on that partition)
Code:
mdadm /dev/md2 --fail /dev/sdb2 --remove /dev/sdb2
mdadm /dev/md1 --fail /dev/sdb1 --remove /dev/sdb1

mdadm --grow /dev/md2 --size=max

Grow sdb2 in fdisk (just like with sda2 before). Then re-add sdb's partitions to md1 and md2, and let sdb2 inherit sda2's grown and corrected state.
Code:
fdisk -u /dev/sdb

mdadm /dev/md2 --add /dev/sdb2
mdadm /dev/md1 --add /dev/sdb1

When the resync finishes md2 will be using the full available capacity. It and its members (sda2 and sdb2) should be stable but the root filesystem on it (at /dev/main/root) will still need to be enlarged. Be sure and wait for the resync to finish before moving on.

Fail a member, as another backup in case the operation fails. Then resize the pv, the lv, and the filesystem.
Code:
mdadm /dev/md2 --fail /dev/sda2 --remove /dev/sda2

pvresize /dev/md2
lvresize -l +100%FREE main/root
ext2online -C0 /dev/main/root

Assuming this is successful add back the other partition and, you guessed it, wait for resync.
Code:
mdadm /dev/md2 --add /dev/sda2

That should do it. This was all doable with the system online (only thanks to google, of course).
« Last Edit: December 06, 2011, 03:49:29 AM by nekatreven »