Koozali.org: home of the SME Server

Replacing a disk in RAID 5 and RAID1 array?

Offline holck

« on: July 21, 2018, 12:56:20 PM »
I run RAID5 and RAID1 with 4 hard disks:

Code:
$ cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid1 sdc1[2] sdb1[1] sda1[0] sdd1[3]
      255936 blocks super 1.0 [4/4] [UUUU]
     
md1 : active raid5 sdc2[2] sdd2[4] sda2[0] sdb2[1]
      2929119744 blocks super 1.1 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>

However, one of the disks (sdb) is failing:

Code:
Jul 21 10:54:12 karoline smartd[1958]: Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!
Jul 21 10:54:12 karoline smartd[1958]: Device: /dev/sdb [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.

So now I want to replace the failing disk. I found https://wiki.contribs.org/Raid:Manual_Rebuild, but it mostly describes RAID 1.

I don't have room to physically add an extra disk, so I guess the correct procedure is something like

1. mark sdb as faulty:
Code:
# mdadm --manage /dev/md0 --fail /dev/sdb1
# mdadm --manage /dev/md1 --fail /dev/sdb2

2. remove sdb1 and sdb2 from the arrays:

Code:
# mdadm --manage /dev/md0 --remove /dev/sdb1
# mdadm --manage /dev/md1 --remove /dev/sdb2

3. power down

4. replace the faulty disk with a new one

5. power up

Is this correct? Should the new disk be partitioned first?

Is it a problem that md0 is RAID1 and md1 is RAID5?

Any help appreciated :-)
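
Between steps 2 and 3 it is worth confirming that both arrays really show the sdb member as gone before powering down. A minimal sketch, assuming the usual /proc/mdstat layout shown above; the function name degraded_arrays is made up for illustration:

```shell
#!/bin/sh
# Print the name of every md array whose member-status field (e.g. [U_UU])
# shows a missing member. Reads /proc/mdstat by default; pass a file name
# to check saved output instead.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/        { name = $1 }                 # header line: remember the array name
        /\[[U_]+\][ \t]*$/   { if ($NF ~ /_/) print name } # status line: "_" marks a missing member
    ' "${1:-/proc/mdstat}"
}

# Usage on the server: degraded_arrays
```

After failing and removing sdb1 and sdb2, this should list both md0 and md1; on the healthy system shown at the top of this post it prints nothing.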

Offline Stefano

Re: Replacing a disk in RAID 5 and RAID1 array?
« Reply #1 on: July 21, 2018, 01:35:28 PM »
After you have added the new disk and rebooted, open the server console (log in as admin, or log in as root and run the "console" command) and then choose "manage redundancy disks" (or a similarly named menu item).
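
If the console route is not available, the manual version of the same job is roughly: copy the partition table from a healthy member to the new disk, add the new partitions to their arrays, and watch the resync. The sketch below only prints the commands so they can be reviewed first. It assumes the replacement disk comes up as /dev/sdb again, that the disks use MBR partition tables (which is what an sfdisk dump and restore handles on older systems), and that the md0/md1 to partition 1/2 mapping is the one from the mdstat output in the first post. The function name print_rebuild_cmds is hypothetical.

```shell
#!/bin/sh
# Print (do NOT run) the commands that bring a replacement disk into the arrays.
#   $1 = new, empty disk (e.g. sdb)   $2 = healthy disk used as the model (e.g. sda)
print_rebuild_cmds() {
    new=$1
    model=$2
    # Copy the partition table from the healthy disk to the new one (MBR assumed).
    echo "sfdisk -d /dev/$model | sfdisk /dev/$new"
    # Add each new partition to its array; md starts resyncing on its own.
    # Pairs are array:partition-number, taken from /proc/mdstat in the first post.
    for pair in md0:1 md1:2; do
        echo "mdadm --manage /dev/${pair%%:*} --add /dev/$new${pair##*:}"
    done
    # Follow the rebuild progress.
    echo "cat /proc/mdstat"
}

print_rebuild_cmds sdb sda
```

The same mdadm commands apply to the RAID1 and the RAID5 array; only the resync time differs. Double-check the two device names before running anything, since swapping them would overwrite the partition table of a good disk.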

Offline ReetP

Re: Replacing a disk in RAID 5 and RAID1 array?
« Reply #2 on: July 21, 2018, 10:02:36 PM »
Take some good backups first, and don't muddle up your disks: number them or something.
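
One way to "number" the disks is to note each drive's serial number, which is normally printed on the drive label as well. A small sketch, assuming smartmontools is installed (it is, given the smartd messages above) and that `smartctl -i` prints a "Serial Number:" line for these drives; the helper name serial_from_smartctl is made up:

```shell
#!/bin/sh
# Pull the serial number out of `smartctl -i` output supplied on stdin.
serial_from_smartctl() {
    sed -n 's/^Serial [Nn]umber:[[:space:]]*//p'
}

# Usage (as root), one line per disk:
#   for d in a b c d; do
#       printf '/dev/sd%s  ' "$d"; smartctl -i "/dev/sd$d" | serial_from_smartctl
#   done
```

Write the serial for /dev/sdb down before shutting down, then match it against the label of the drive you pull.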

Offline holck

Re: Replacing a disk in RAID 5 and RAID1 array?
« Reply #3 on: July 22, 2018, 06:40:14 PM »
Thanks a lot for your advice. I will let you know the result ...


Offline mab974

Re: Replacing a disk in RAID 5 and RAID1 array?
« Reply #4 on: July 23, 2018, 01:06:42 PM »
Hi,

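As an example, here is a tool I use. It generates two scripts: one to run while the old disk is still in place and one to run after the new disk is installed.
The mdadm and sfdisk commands in the generated scripts are commented out and have to be uncommented before executing.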

Code:
#!/bin/bash
# (bash is required: the script uses arrays and "let")
# change these values =================
DISK_DEAD="b"
DISK_MODEL="a"
# partitions concerned by RAID (!!! VERY IMPORTANT !!!)
LIST_PARTS=( 1 2 3 5 6 7)
# ================================

PARTS=${LIST_PARTS[@]}
cat > ch_dk_rd-1.sh <<EOF
#         Change one Raid Disk
# disk /dev/sd${DISK_DEAD} is dead and is replaced by a new disk clone of /dev/sd${DISK_MODEL}
# concerned partitions are : ${LIST_PARTS[@]}
# and there are ${#LIST_PARTS[@]} raids elements.
#   GOOD LUCK and TAKE CARE of your disks !!
EOF

echo "" >> ch_dk_rd-1.sh
echo "echo \"Step 1 : failed partitions\"" >> ch_dk_rd-1.sh
MD_NO=0
for PART_NO in $PARTS
do
    echo "#mdadm --manage /dev/md$MD_NO --fail /dev/sd$DISK_DEAD$PART_NO" >> ch_dk_rd-1.sh
    let "MD_NO++"
done
echo "cat /proc/mdstat" >> ch_dk_rd-1.sh

echo "" >> ch_dk_rd-1.sh
echo "echo \"Step 2 : remove partitions\"" >> ch_dk_rd-1.sh
MD_NO=0
for PART_NO in $PARTS
do
    echo "#mdadm --manage /dev/md$MD_NO --remove /dev/sd$DISK_DEAD$PART_NO" >> ch_dk_rd-1.sh
    let "MD_NO++"
done
echo "cat /proc/mdstat" >> ch_dk_rd-1.sh

echo "" >> ch_dk_rd-1.sh
echo "echo \"Step 3 : shutdown and replace disk\"" >> ch_dk_rd-1.sh
echo "echo \"then execute second part (ch_dk_rd-2.sh) on new disk (VERY DANGEROUS !!!)\"" >> ch_dk_rd-1.sh


echo "# Second part on new disk (VERY DANGEROUS !!!) sfdisk as comment" > ch_dk_rd-2.sh
echo "echo \"Step 4 : Duplicate partitions table from disk $DISK_MODEL to new disk $DISK_DEAD\"" >> ch_dk_rd-2.sh
echo "echo \"    ARE YOU SURE ? don't erase a GOOD disk (/dev/sd$DISK_DEAD !?)\"" >> ch_dk_rd-2.sh
echo "" >> ch_dk_rd-2.sh
echo "#sfdisk -d /dev/sd${DISK_MODEL} | sfdisk -f /dev/sd${DISK_DEAD}" >> ch_dk_rd-2.sh
echo "fdisk -l /dev/sd${DISK_DEAD}" >> ch_dk_rd-2.sh

echo "" >> ch_dk_rd-2.sh
echo "echo \"Step 5 : add partitions\"" >> ch_dk_rd-2.sh
MD_NO=0
for PART_NO in $PARTS
do
    echo "#mdadm --manage /dev/md$MD_NO --add /dev/sd$DISK_DEAD$PART_NO" >> ch_dk_rd-2.sh
    let "MD_NO++"
done
echo "cat /proc/mdstat" >> ch_dk_rd-2.sh
echo "echo \"Synchronization has normally begun. See mdstat above !\"" >> ch_dk_rd-2.sh
echo "" >> ch_dk_rd-2.sh

echo "echo \"Step 6 : to make partitions resync go faster try : echo 400000 > /proc/sys/dev/raid/speed_limit_max;  echo 400000 > /proc/sys/dev/raid/speed_limit_min\"" >> ch_dk_rd-2.sh
echo "echo  \"         and RESET these 2 parameters after resyncing\"" >> ch_dk_rd-2.sh
echo "" >> ch_dk_rd-2.sh

echo "echo \"Step 7 : take a look at lilo or grub, new disk must be able to boot too !\"" >> ch_dk_rd-2.sh
echo "" >> ch_dk_rd-2.sh

echo " \"ch_dk_rd-1.sh and ch_dk_rd-2.sh created.\""
echo " \"Verify (and verify again), uncomment and run the first one !\""

This way I am sure the different steps are at least consistent.

hth
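
A footnote on Step 6 of the script: since the two speed limits have to be reset after the resync, it helps to save the old values before changing them. A minimal sketch; the function names, the save-file argument and the RAID_SYSCTL_DIR override are all made up for illustration (the override only exists so the functions can be tried on a scratch directory):

```shell
#!/bin/sh
# Directory holding speed_limit_min and speed_limit_max (values in KB/s).
RAID_SYSCTL_DIR=${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}

# Save the current limits to file $2, then set both limits to $1.
raise_limits() {
    value=$1
    save=$2
    cat "$RAID_SYSCTL_DIR/speed_limit_min" "$RAID_SYSCTL_DIR/speed_limit_max" > "$save"
    echo "$value" > "$RAID_SYSCTL_DIR/speed_limit_max"
    echo "$value" > "$RAID_SYSCTL_DIR/speed_limit_min"
}

# Put back the limits that raise_limits saved in file $1.
restore_limits() {
    save=$1
    sed -n 1p "$save" > "$RAID_SYSCTL_DIR/speed_limit_min"
    sed -n 2p "$save" > "$RAID_SYSCTL_DIR/speed_limit_max"
}

# Usage (as root):
#   raise_limits 400000 /root/raid_limits.saved
#   ... wait until /proc/mdstat shows the resync has finished ...
#   restore_limits /root/raid_limits.saved
```

The limits also return to their defaults on the next reboot, so restoring them by hand only matters if the server stays up.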