Koozali.org: home of the SME Server

[SOLVED] Failed upgrade...trying not to panic

Offline nekatreven

[SOLVED] Failed upgrade...trying not to panic
« on: August 18, 2012, 04:30:12 PM »
Good morning...I guess  :shock:

I attempted an upgrade via yum last night from 7.6 to 8.0.

During the step "Perform the upgrade"...
Quote
yum --disablerepo '*' \
    $(echo --enablerepo\ sme{os,updates,extras,addons}) \
    upgrade

...my ssh session died, and of course I had failed to start the command from within screen, so I think yum died halfway through. I remember yum saying it wanted to upgrade/install a total of 601 packages, but the end of /var/log/yum/yum.log only shows 468.
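For reference, a minimal sketch of the same step run inside screen, plus a rough way to count the transactions that did complete (assuming the yum.log entries contain "Installed:"/"Updated:"):
Quote
# start a named screen session so a dropped ssh session doesn't kill the upgrade
screen -S upgrade
yum --disablerepo '*' \
    $(echo --enablerepo\ sme{os,updates,extras,addons}) \
    upgrade
# if the connection drops, reattach later with: screen -r upgrade

# rough count of packages installed/updated so far
grep -cE 'Installed:|Updated:' /var/log/yum/yum.log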

Right now yum is broken.

Quote
[root@host ~]# yum update
Plugin "fastestmirror" uses deprecated constant TYPE_INTERFACE.
Please use TYPE_INTERACTIVE instead.
Loaded plugins: fastestmirror, installonlyn, protect-packages, smeserver
Loading mirror speeds from cached hostfile
Traceback (most recent call last):
  File "/usr/bin/yum", line 29, in ?
    yummain.user_main(sys.argv[1:], exit_code=True)
  File "/usr/share/yum-cli/yummain.py", line 309, in user_main
    errcode = main(args)
  File "/usr/share/yum-cli/yummain.py", line 178, in main
    result, resultmsgs = base.doCommands()
  File "/usr/share/yum-cli/cli.py", line 345, in doCommands
    self._getTs(needTsRemove)
  File "/usr/lib/python2.4/site-packages/yum/depsolve.py", line 101, in _getTs
    self._getTsInfo(remove_only)
  File "/usr/lib/python2.4/site-packages/yum/depsolve.py", line 112, in _getTsInfo
    pkgSack = self.pkgSack
  File "/usr/lib/python2.4/site-packages/yum/__init__.py", line 662, in <lambda>
    pkgSack = property(fget=lambda self: self._getSacks(),
  File "/usr/lib/python2.4/site-packages/yum/__init__.py", line 502, in _getSacks
    self.repos.populateSack(which=repos)
  File "/usr/lib/python2.4/site-packages/yum/repos.py", line 232, in populateSack
    self.doSetup()
  File "/usr/lib/python2.4/site-packages/yum/repos.py", line 79, in doSetup
    self.ayum.plugins.run('postreposetup')
  File "/usr/lib/python2.4/site-packages/yum/plugins.py", line 179, in run
    func(conduitcls(self, self.base, conf, **kwargs))
  File "/usr/lib/yum-plugins/fastestmirror.py", line 78, in postreposetup_hook
    repo.set('urls', repomirrors[str(repo)])
AttributeError: 'YumRepository' object has no attribute 'set'

My initial attempts to prod at yum by (re)installing it and its dependencies were met with nice messages like:
Quote
file /usr/lib/yum-plugins/fastestmirror.py from install of yum-fastestmirror-1.1.16-21.el5.centos.noarch conflicts with file from package yum-plugin-fastestmirror-0.2.4-3.c4.noarch
file /usr/lib/yum-plugins/protect-packages.py from install of yum-protect-packages-1.1.16-21.el5.centos.noarch conflicts with file from package smeserver-yum-2.0.0-16.el4.sme.noarch
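A few rpm queries show how far the system got left straddling the two releases; the file paths come straight from the conflict messages above, and the last grep is only a rough filter:
Quote
# which installed packages own the conflicting plugin files?
rpm -qf /usr/lib/yum-plugins/fastestmirror.py
rpm -qf /usr/lib/yum-plugins/protect-packages.py
# anything still left over from the old CentOS 4 / SME 7 base
rpm -qa | grep -E '\.(el4|c4)' | sort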

I have all of /home rsynced off to two other locations, but with 180 GB of mail it seemed like the simplest way to make a quick pre-upgrade system backup was to fail one disk out of the RAID1. So I did:
Quote
mdadm --manage /dev/md2 --fail /dev/sdb2
...right before starting the upgrade command that failed.
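(The array state before and after can be checked with something along these lines; md2/sdb2 are this server's layout, so adjust as needed:)
Quote
cat /proc/mdstat              # overall array status
mdadm --detail /dev/md2       # per-array view, shows the failed member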

So right now I can either try to fix the booted system which is still up from when the error happened, or revert to the pre-upgrade disk and just copy the last few hours of mail over to it. I'm leaning towards the latter, even though there will likely be at least some fighting with mdadm.

I'm about to head back in to work so if anyone has any heroic advice I'm all ears!

TIA,
Mark
« Last Edit: August 18, 2012, 10:46:41 PM by nekatreven »

Offline nekatreven

Re: Failed upgrade...trying not to panic
« Reply #1 on: August 18, 2012, 04:35:34 PM »
I do also have 1 week of backups on a USB disk in case my decision to degrade the array comes back to bite me; I just don't want to have the server offline long enough to do a full restore.

Offline nekatreven

Re: Failed upgrade...trying not to panic
« Reply #2 on: August 18, 2012, 05:43:52 PM »
...I'm just going to post progress here, to move things along a little from "help, it's broke" and increase the chance of free help or confirmation. You have to make it exciting to have any hope of pulling people in on the weekend!  :D

Having given this a little more thought, my main concern is rebooting onto the disk that I failed out and still having the RAID working once it's done booting.

If/when I unplug the disk I'm running on now (sda) and reboot, md will only see sdb and will see that sdb2 is marked failed so md2 will probably not start...only md1 will be up. At that point, [shame]based on times I've broken it in the past[/shame], LVM will likely start up the server's /dev/mapper/main-root LV directly from /dev/sdb2 and ignore the md level entirely. That won't cause data loss but (I think) it does introduce a chicken and egg problem with getting md back running. That would have to be done offline.
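A quick way to check what LVM is actually sitting on (and whether md2 is assembled at all) would be something like this; the VG/LV names match SME's default main/root layout:
Quote
cat /proc/mdstat           # is md2 assembled?
pvs                        # does the PV show up as /dev/md2 or as a raw partition?
lvdisplay /dev/main/root   # the LV behind /dev/mapper/main-root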

So I think what I'm going to do is physically pull the sdb disk from my server and plug it into another system and boot up on a Knoppix CD. That way I can fight mdadm until it starts md2 in degraded mode on sdb2 only, while still keeping the current crippled but functioning system up on /dev/sda in case I have to go with fixing the failed upgrade after all.
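The degraded assembly from Knoppix would look roughly like this; the device names will almost certainly be different in the other machine, so treat them as placeholders:
Quote
mdadm --examine /dev/sdb2                    # confirm the md superblock is there
mdadm --assemble --run /dev/md2 /dev/sdb2    # --run starts the array even though it's degraded
cat /proc/mdstat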

Offline janet

Re: Failed upgrade...trying not to panic
« Reply #3 on: August 18, 2012, 08:40:21 PM »
nekatreven

Download the sme8.0 iso, create the sme boot/install CD & run the upgrade from CD. Hopefully that gets the server upgraded and running. Then rebuild the RAID later.
Please search before asking, an answer may already exist.
The Search & other links to useful information are at top of Forum.

Offline nekatreven

Re: Failed upgrade...trying not to panic
« Reply #4 on: August 18, 2012, 10:46:23 PM »
Quote
create the sme boot/install CD & run the upgrade from CD

Hello, and thanks for the suggestion.

I wasn't too sure that the CD upgrade would be able to fix this! I definitely would have given it a shot, but I'd already pulled the trigger on trying the other method before I got back around to the forum.

Fortunately that worked. I was, however, slightly mistaken about the md stuff.

Before pulling sdb I went ahead and failed sdb1 out of md1, then did --remove on both sdb partitions, and then pulled sdb out of the case. However, when I got into Knoppix, the fact that I'd failed those devices didn't carry over, and both md devices started up in a degraded state, no problem. Now I remember that the bit about the system booting without the md layer is what can happen if you mess up md's magic block on all of your disks, which wasn't the case here.

So I was making it too hard. All I had to do was rsync /home/e-smith/files over yesterday's backup copy on another server, boot up on the sdb disk I'd failed out, and I was back in business. With all mail services stopped, I rsynced /home/e-smith/files from the backup server back onto SME and then started everything back up. I'm obviously missing some log data about those emails, but I'm not too worried about that.
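The copy-back was roughly the following (the hostname and backup path here are examples, not the real ones, and mail was stopped on SME before the final sync):
Quote
rsync -avH root@backupserver:/backup/e-smith/files/ /home/e-smith/files/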

When the rebuild finishes I should be able to try again.
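Re-adding the other disk's partitions is what kicks off the rebuild, roughly like this; the device names depend on how the disks enumerate after the swap, so check /proc/mdstat first (if memory serves, the SME admin console's disk redundancy option does the same thing):
Quote
mdadm --manage /dev/md1 --add /dev/sda1
mdadm --manage /dev/md2 --add /dev/sda2
watch cat /proc/mdstat        # rebuild progress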