Koozali.org: home of the SME Server

SME wont start after May 27 update - revisited Bug 5980

Offline tlicht

  • Imagination is more important than knowledge.
Sorry to bother you with the same problem, but it is still very much a problem.
It was suggested that I report this as a bug, which I did. During the rather frequent restarts and retries, the last screen before the boot stopped hinted at a disk error. This happened only once:

  Reading all physical volumes.    This may take a while...
  /dev/hda: open failed: Read-only file system
  Attempt to close device '/dev/hda' which is not open.
Activating logical volumes
hda: media error (bad sector): status=0x51 { DriveReady SeekComplete Error }
hda: media error (bad sector): error=0x34
ide: failed opcode was 100
ATAPI device hda:
  Error: Medium error -- (Sense key=0x03)
  Unrecovered read error -- (asc=0x11, ascq=0x00)
  The failed "Read 10" packet command was:
  "28 00 00 01 64 40 00 00 01 00 00 00 00 00 00 00 "
end_request:  I/O error, dev hda, sector 364800
  /dev/hda: read failed after 0 of 2048 at 186777600:  Input/output error
  Volume group "main" not found
ERROR: /bin/lvm exited abnormally! (pid 445)
Creating root device
Mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!


All other times, before and after the above, the last screen said this:

md: raid1 personality registered as nr 3
Loading jbd.ko module
Loading ext3.ko module
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
md: md1 stopped.
mdadm: no devices found for /dev/md1
md: md2 stopped.
mdadm: no devices found for /dev/md2
Making device-mapper control node
Scanning logical volumes
  Reading all physical volumes. This may take a while
Activating logical volumes
  Volume group "main" not found
ERROR: /bin/lvm exited abnormally! (pid 445)
Creating root device
Mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!


I tried to start with the CD and rescue mode, but I really didn't know what to do once I was there.....

I also tried the update from the CD which also did not change anything.
Then I removed one drive thinking that with some luck only one of the drives/partitions was corrupt, but either drive alone still gives the maroon coloured last screen version.

I would be enormously grateful for a way to save the ibays at least.
If it is of any use, I managed to capture most of what is flashing past on the screen during the crippled boot (both drives):

VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key C0174E43404E09D
- User ID: CentOS (Kernel Module GPG key)
PCI: Found HT MSI mapping on 0000:00:0b.0 with capability disabled
PCI: Found HT MSI mapping on 0000:00:00.0 with capability enabled
PCI: Found HT MSI mapping on 0000:00:0c.0 with capability disabled
PCI: Found HT MSI mapping on 0000:00:00.0 with capability enabled
PCI: Found HT MSI mapping on 0000:00:0d.0 with capability disabled
PCI: Found HT MSI mapping on 0000:00:00.0 with capability enabled
PCI: Found HT MSI mapping on 0000:00:0e.0 with capability disabled
PCI: Found HT MSI mapping on 0000:00:00.0 with capability enabled
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
ACPI: Fan [FAN] (on)
ACPI: Processor [CPU0] (supports C1)
ACPI: Thermal Zone [THRM] (35 C)
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 68 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE-CK804: IDE controller at PCI slot 0000:00:06.0
NFORCE-CK804: chipset revision 162
NFORCE-CK804: not 100% native mode: will probe irqs later
NFORCE-CK804: 0000:00:06.0 (rev a2) UDMA133 controller
    ide0: BM-DMA at 0xe800-0xe807, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xe808-0xe80f, BIOS settings: hdc:DMA, hdd:DMA
hda: BENQ DVD DD DW1620, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.26
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on isa0060/serio0
input:  ImExPS/2 Logitech Explorer Mouse on isa0060/serio1
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
HUB0 XVR0 XVR1 XVR2 XVR3 USB0 USB2 MMAC MMCI UAR1
ACPI: (supports S0 S3 S4 S5)
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 131072 (order: 9, 3670016 bytes)
TCP: Hash tables configured (established 131072 bind 131072)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
ACPI wakeup devices:
HUB0 XVR0 XVR1 XVR2 XVR3 USB0 USB2 MMAC MMCI UAR1
ACPI: (supports S0 S3 S4 S5)
Freeing unused kernel memory: 172k freed
Red Hat nash version 4.2.1.13 starting
Mounted /proc filesystem
Mounting sysfs
Creating /dev
Starting udev
Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd-mod.ko module
Loading libata.ko module
Loading ata_piix.ko module
Loading dm-mod.ko module
device-mapper: 4.5.5-ioctl (2006-12-01) initialised: dm-devel@redhat.com
Loading raid1.ko module
md: raid1 personality registered as nr 3
Loading jbd.ko module
Loading ext3.ko module
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
md: md1 stopped.
mdadm: no devices found for /dev/md1
md: md2 stopped.
mdadm: no devices found for /dev/md2
Making device-mapper control node
Scanning logical volumes
  Reading all physical volumes. This may take a while...
Activating logical volumes
  Volume group "main" not found
ERROR: /bin/lvm exited abnormally! (pid 445)
Creating root device
Mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
« Last Edit: June 05, 2010, 02:43:04 PM by wellsi »

Offline jester

Re: SME wont start after May 27 update - revisited
« Reply #1 on: May 30, 2010, 08:36:27 PM »
Please follow up on your problem in the bug tracker, not on the forums, to keep it in one place. There you are most likely to get the devs' attention, and they are the people most likely to be able to help you.

Offline Igi2003

Re: SME wont start after May 27 update - revisited
« Reply #2 on: May 30, 2010, 09:02:26 PM »
Boot with the old kernel and build a new initrd with the "ahci" module. Your ATA controller works in AHCI mode.
For this, modify /etc/modprobe.conf and type in "alias scsi_hostadapterX ahci". Instead of X, type the next free number.
Type in the console "mkinitrd -v -f /boot/initrd-2.6.9-89.0.25.EL.img `uname -r`" for the uniprocessor kernel or
"mkinitrd -v -f /boot/initrd-2.6.9-89.0.25.ELsmp.img `uname -r`" for the multiprocessor kernel.

Then boot with the new kernel.
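
A minimal sketch of the steps above, not a tested procedure. The helper name next_hostadapter_alias is hypothetical, and the module name is an assumption: the post suggests ahci, while the boot log earlier in the thread shows an NFORCE-CK804 controller, for which sata_nv may be the relevant driver. mkinitrd takes as its last argument the kernel version whose modules go into the image, so when it is run from the old kernel it may be safer to spell out the new version than to rely on `uname -r`.

```shell
#!/bin/sh
# Hypothetical helper: print the next unused scsi_hostadapter alias name
# in a modprobe.conf-style file (unnumbered first, then 1, 2, ...).
next_hostadapter_alias() {
    conf="$1"
    if ! grep -Eq '^[[:space:]]*alias[[:space:]]+scsi_hostadapter[[:space:]]' "$conf"; then
        echo "scsi_hostadapter"
        return 0
    fi
    n=1
    while grep -Eq "^[[:space:]]*alias[[:space:]]+scsi_hostadapter${n}[[:space:]]" "$conf"; do
        n=$((n + 1))
    done
    echo "scsi_hostadapter${n}"
}

# On the real system, as root (module name and kernel version are assumptions):
#   echo "alias $(next_hostadapter_alias /etc/modprobe.conf) ahci" >> /etc/modprobe.conf
#   mkinitrd -v -f /boot/initrd-2.6.9-89.0.25.EL.img 2.6.9-89.0.25.EL
```

Keeping a copy of the old initrd and of /etc/modprobe.conf before overwriting either makes the change easy to undo.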

Offline CharlieBrady

Re: SME wont start after May 27 update - revisited
« Reply #3 on: May 31, 2010, 01:48:34 AM »
Igi2003, please provide your suggestions via http://bugs.contribs.org/show_bug.cgi?id=5980. Thanks.

Offline tlicht

Re: SME wont start after May 27 update - revisited
« Reply #4 on: May 31, 2010, 01:33:49 PM »
CharlieBrady:
The bug is closed and declared a nobug... also Gavin politely pointed out that bugzilla is not a support forum... It seems therefore most logical to handle support questions in this forum - but I am not as experienced as you, so I might be wrong naturally.

Igi2003:
Thank you for your kind suggestion. I am convinced that your analysis is correct - the problem is probably that the wrong driver for the SATA disks is used. Since everything worked fine before the 'latest' update, I would still consider this a bug, but I am not the one to decide....
The slight problem is that the server wouldn't boot with previous kernels either. Also it was suggested (for someone else and also for me privately) that starting with the CD and choosing the update alternative would put everything back to normal. Unfortunately it didn't.... with neither the 7.4 nor the 7.5 CD.
I downloaded the Trinity Rescue Kit, which lets me mount md1, which actually is just sda1, since the update/upgrade exercises seem to have left sdb1 unchanged. md1 contains just the OS in the root and the grub folder. The update managed in any case to remove all earlier versions of system.map-xxx, config-xxx, initrd-xxx, symvers-xxx and vmlinuz-xxx.... They are still there on sdb1 though.
I don't know how to mount md2 (or sda2 or sdb2).... (must specify the filesystem type, obviously not ext2 or ext3). I imagine that the ibays reside there.
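
One possible reason for the "must specify the filesystem type" message, offered as an assumption and not a diagnosis: md2 may hold an LVM physical volume instead of a plain filesystem, which would fit the 'Volume group "main"' lines in the boot log. A minimal sketch of activating the volume group from a rescue shell and mounting it read-only follows. The logical volume name root, the mount point /mnt/sme, the helper names, and the availability of the LVM tools on the rescue CD are all assumptions. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run wrapper: print each command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Hypothetical helper: activate the volume group and mount its root volume read-only.
mount_sme_root() {
    vg="${1:-main}"        # volume group name as it appears in the boot log
    lv="${2:-root}"        # assumption: name of the logical volume holding /
    mnt="${3:-/mnt/sme}"   # hypothetical mount point
    run vgscan
    run vgchange -ay "$vg"
    run mkdir -p "$mnt"
    run mount -o ro "/dev/$vg/$lv" "$mnt"
}

mount_sme_root
```

If the mount succeeds, the ibays would normally be found under home/e-smith/files/ibays below the mount point, from where they can be copied to another disk.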

cat /proc/mdstat returns:
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid1 sda1[1]
      104320 blocks [2/1] [_U]
     
md2 : active raid1 sda2[1]
      976655488 blocks [2/1] [_U]
unused devices: <none>

which is expected, I guess, under current circumstances...
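
For reference, the [2/1] [_U] fields mean that each array expects two members but only one is active. A minimal sketch for picking that out of mdstat-style text follows; the function name degraded_arrays is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list md arrays with fewer active members than expected.
# Reads /proc/mdstat by default, or a saved copy given as the first argument.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                  # remember the current array
        match($0, /\[[0-9]+\/[0-9]+\]/) {            # status field such as [2/1]
            split(substr($0, RSTART + 1, RLENGTH - 2), parts, "/")
            if (parts[2] + 0 < parts[1] + 0) {
                print name, parts[2] "/" parts[1]    # active/expected members
            }
        }
    ' "${1:-/proc/mdstat}"
}

# Usage from a rescue shell:
#   degraded_arrays              # inspect the running system
#   degraded_arrays mdstat.txt   # inspect a saved copy
```

An empty result would mean that every listed array has all of its expected members.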

So in short, I cannot boot the SME server, but I can make changes to sda1 (which is in effect now md1). The problem is I don't know what to change to bring back the server.....

Offline janet

Re: SME wont start after May 27 update - revisited
« Reply #5 on: May 31, 2010, 02:09:45 PM »
tlicht

Quote
The bug is closed and declared a nobug... also Gavin politely pointed out that bugzilla is not a support forum... It seems therefore most logical to handle support questions in this forum - but I am not as experienced as you, so I might be wrong naturally.

The idea is you post to bugzilla first if there is a problem (which may or may not be a bug). Of course you can post to the forums first, but if people suggest it should be posted at bugzilla, then you should do that. If your issue is determined not to be a bug, then you come back to the forums with your problem.

Charlie suggested a drive failure in the bug report.
You should run the drive manufacturer's diagnostics on all disks in your server as soon as possible.
Boot from a floppy or CD, e.g. try the Ultimate Boot CD (UBCD), which has a lot of tools on it, or download the tools from the manufacturer's website, and/or use smartctl to test the drives.

It may be a driver issue, but good to know your drives are OK before proceeding further.
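
A minimal sketch of the smartctl side of that advice, assuming smartmontools is available in the rescue environment and the disks appear as /dev/sda and /dev/sdb (both assumptions). The helper name smart_verdict is hypothetical; it only looks for the overall-health line that `smartctl -H` prints for ATA disks.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -H` output on stdin and report the verdict.
smart_verdict() {
    if grep -q 'self-assessment test result: PASSED'; then
        echo "health: passed"
    else
        echo "health: not confirmed"
        return 1
    fi
}

# Typical use from a rescue shell, as root (device names are assumptions):
#   smartctl -H /dev/sda | smart_verdict
#   smartctl -t long /dev/sda        # start the extended self-test
#   smartctl -l selftest /dev/sda    # read the result once the test has finished
```

A passing overall-health line does not rule out bad sectors, so the extended self-test is the more telling check, even though it can take hours on large drives.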

Quote
So in short, I cannot boot the SME server...


If you cannot boot and you say you don't know what to do when you boot up to CD in rescue mode, then how did you get the cat /proc/mdstat output?

Looking at the bug report, there seem to be issues with hda, which is a non-SATA drive.
How many drives are in the server? It seems you may have 1 or 2 SATA and 1 IDE?

Also see
http://wiki.contribs.org/Monitor_Disk_Health

http://wiki.contribs.org/Booting

http://wiki.contribs.org/Raid

If you can start up in rescue mode, follow the onscreen suggestions until you get to a command prompt.
It is there that you will need to see what is going on, and where you can use smartctl etc.
Please search before asking, an answer may already exist.
The Search & other links to useful information are at top of Forum.

Offline tlicht

Re: SME wont start after May 27 update - revisited
« Reply #6 on: May 31, 2010, 05:51:21 PM »
Thanks Mary,

Quote
Looking at the bug report, there seem to be issues with hda, which is a non-SATA drive.
How many drives are in the server? It seems you may have 1 or 2 SATA and 1 IDE?
Interesting thing with hda - I don't have one unless the cdrom counts for one. I have (apart from the cd/dvd drive) two sata disks.

Quote
If you cannot boot and you say you don't know what to do when you boot up to CD in rescue mode,
then how did you get the cat /proc/mdstat output?
I got the mdstat after booting from the Trinity Rescue Kit  - a CD linux, which tries to scan and mount everything it finds. It also lets you edit files etc.

I got the UBCD - it doesn't seem to find any errors on the SATA disks. I admit I only ran the short tests, since the drives are 1TB each.....

In the meantime, bug 5980 has been reopened (thanks, Charlie), meaning that I will again post what is happening there.

Offline byte

Re: SME wont start after May 27 update - revisited
« Reply #7 on: May 31, 2010, 06:08:26 PM »
Quote
CharlieBrady:
The bug is closed and declared a nobug... also Gavin politely pointed out that bugzilla is not a support forum... It seems therefore most logical to handle support questions in this forum - but I am not as experienced as you, so I might be wrong naturally.

It appears I might have been wrong to close the bug, please do follow up on the bug tracker and we will investigate the issue further. Thanks.

I'll lock this thread, so that all involved can follow up on http://bugs.contribs.org/show_bug.cgi?id=5980. Thanks again.
--[byte]--

Have you filled in a Bug Report over @ http://bugs.contribs.org ? Please don't wait to be told; this way you help us to help you/others - Thanks!