Koozali.org: home of the SME Server

Buffer I/O error on device dm-3 after kernel upgrade to 2.6.18-348.18.1.el5.i686

jufra

Hi,

I had a few updates come through today, including a kernel upgrade. Afterwards I found the following messages in my log:
Code:
Sep 28 16:57:28 bionfileserver kernel: scsi 2:0:0:0: rejecting I/O to dead device
Sep 28 16:57:33 bionfileserver last message repeated 30 times
Sep 28 16:57:33 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 488383936
Sep 28 16:57:33 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 488383937
Sep 28 16:57:33 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 488383938
Sep 28 16:57:33 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 488383939
Sep 28 16:57:33 bionfileserver kernel: scsi 2:0:0:0: rejecting I/O to dead device
Sep 28 16:57:33 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 488383936
Sep 28 16:57:33 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 488383937
Sep 28 16:57:33 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 488383938
Sep 28 16:57:33 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 488383939
Sep 28 16:57:33 bionfileserver kernel: scsi 2:0:0:0: rejecting I/O to dead device
Sep 28 16:57:33 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 0
Sep 28 16:57:33 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 1
Sep 28 16:57:33 bionfileserver kernel: scsi 2:0:0:0: rejecting I/O to dead device
Sep 28 16:57:43 bionfileserver last message repeated 3 times
Sep 28 16:57:43 bionfileserver kernel: printk: 54 messages suppressed.
Sep 28 16:57:43 bionfileserver kernel: Buffer I/O error on device dm-3, logical block 488383936
Sep 28 16:57:43 bionfileserver kernel: scsi 2:0:0:0: rejecting I/O to dead device
Sep 28 16:57:43 bionfileserver yum: Installed: kernel-2.6.18-348.18.1.el5.i686

Is this something I need to worry about (HDD about to fail etc) or something to be expected?

Thanks a lot
Cheers
Frank

CharlieBrady

It's something you should be concerned about. You need to determine which device has been declared 'dead' and work out what needs to be done about it.
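
One way to see what sits underneath a device-mapper node such as dm-3 is to look at its slaves directory in sysfs. The function below is only a sketch under assumptions: it assumes the running kernel exposes /sys/block/dm-N/slaves, and SYSFS_ROOT is a hypothetical override added purely so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Sketch: list the block devices underneath a device-mapper node (e.g. dm-3).
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
dm_slaves() {
    dev="$1"
    dir="${SYSFS_ROOT:-/sys}/block/$dev/slaves"
    if [ ! -d "$dir" ]; then
        # Assumption: the kernel provides a slaves directory for dm devices.
        echo "no slaves directory found for $dev" >&2
        return 1
    fi
    # Each entry is the kernel name of an underlying device, e.g. sde1.
    ls "$dir"
}
```

Calling `dm_slaves dm-3` would then print the underlying device names, if any. `dmsetup ls` is another place to look: it prints each mapped name with its major and minor numbers, and the minor number should correspond to the N in dm-N.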

jufra

Thank you Charlie. A (stupid) question: how do I determine what dm-3 actually is? I couldn't find it using mount, dmesg or smartctl.
This one didn't show it either:
Code:
ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root  9 Sep 28 17:01 ata-LEXAR_ATA_FLASH_92102680119299097289 -> ../../hda
lrwxrwxrwx 1 root root 10 Sep 28 17:01 ata-LEXAR_ATA_FLASH_92102680119299097289-part1 -> ../../hda1
lrwxrwxrwx 1 root root  9 Sep 28 17:01 scsi-SATA_Hitachi_HTS7232_E3820062G083JA -> ../../sda
lrwxrwxrwx 1 root root 10 Sep 28 17:01 scsi-SATA_Hitachi_HTS7232_E3820062G083JA-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Sep 28 17:01 scsi-SATA_Hitachi_HTS7232_E3820062G083JA-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 Sep 28 17:01 scsi-SATA_Hitachi_HTS7232_E3820062G1GZEA -> ../../sdb
lrwxrwxrwx 1 root root 10 Sep 28 17:01 scsi-SATA_Hitachi_HTS7232_E3820062G1GZEA-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Sep 28 17:01 scsi-SATA_Hitachi_HTS7232_E3820062G1GZEA-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 Sep 28 17:24 usb-Generic_External_57442D574341563944363135 -> ../../sde
lrwxrwxrwx 1 root root 10 Sep 28 17:24 usb-Generic_External_57442D574341563944363135-part1 -> ../../sde1
lrwxrwxrwx 1 root root  9 Sep 28 17:01 usb-Lexar_JumpDrive_AAUDUOYG87E3GS9K -> ../../sdd
lrwxrwxrwx 1 root root 10 Sep 28 17:01 usb-Lexar_JumpDrive_AAUDUOYG87E3GS9K-part1 -> ../../sdd1

I ran badblocks on the sda and sdb drives, and it reported no bad blocks.
So any pointers are much appreciated.
Thanks
Frank

jufra

PS: Also, there is no scsi2 host, even though the original error message refers to one:
Code:
cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Hitachi HTS72322 Rev: ECBO
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Hitachi HTS72322 Rev: ECBO
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: Lexar    Model: JumpDrive        Rev: 1100
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi4 Channel: 00 Id: 00 Lun: 00
  Vendor: Generic  Model: External         Rev: 2.10
  Type:   Direct-Access                    ANSI SCSI revision: 02

Weird?

jufra

I think I have solved the mystery. I restarted the server to see whether the messages would come up again, and they didn't. But now when I cat /proc/scsi/scsi I get:

Code:
cat scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Hitachi HTS72322 Rev: ECBO
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Hitachi HTS72322 Rev: ECBO
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: Generic  Model: External         Rev: 2.10
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: Lexar    Model: JumpDrive        Rev: 1100
  Type:   Direct-Access                    ANSI SCSI revision: 02

So my external backup drive has moved to scsi2 after the reboot. I have noticed in the past that every time I reboot the server, the USB backup drive needs to be unplugged and plugged back in for the backup to work. I did that yesterday too, and maybe the system was accessing the USB drive when I unplugged it, hence the dead device message?
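
The first number in a kernel message like `scsi 2:0:0:0` is the SCSI host number, so it can be matched against the `Host: scsiN` lines in /proc/scsi/scsi. A rough sketch of that lookup follows; the function name `scsi_model` and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the vendor/model line for a given SCSI host number.
# The optional second argument is a file in /proc/scsi/scsi format.
scsi_model() {
    host="$1"
    file="${2:-/proc/scsi/scsi}"
    awk -v want="scsi$host" '
        # Remember whether the current "Host:" entry is the one we want.
        $1 == "Host:"            { found = ($2 == want) }
        # Re-assigning $1 rebuilds the line with single spaces between fields.
        found && $1 == "Vendor:" { $1 = $1; print }
    ' "$file"
}
```

With the listing above, `scsi_model 2` would print the line for the Generic External drive. With the listing from before the reboot it would print nothing, since no scsi2 host was attached any more at that point.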


CharlieBrady

Quote from: jufra
I did that also yesterday and maybe the system was accessing the USB when I unplugged it, hence the dead device message...?

That sounds quite likely to me. You have to make sure that USB devices are unmounted before you unplug them. Symptoms like the ones you reported are very likely if you physically remove a device while it is in use.
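
As a rough illustration of that check, the sketch below lists the mount points of any mounted partitions of a disk by reading /proc/mounts. The function name `mounted_on` and its optional file argument are made up for illustration. It only catches partitions mounted directly by their /dev name; a device-mapper volume stacked on the disk would appear under its own name instead.

```shell
#!/bin/sh
# Sketch: print the mount points of every mounted partition of a disk.
# The optional second argument is a file in /proc/mounts format.
mounted_on() {
    disk="$1"
    mounts="${2:-/proc/mounts}"
    # index(...) == 1 means the device field starts with the disk name,
    # so /dev/sde also matches /dev/sde1.
    awk -v disk="$disk" 'index($1, disk) == 1 { print $2 }' "$mounts"
}
```

If `mounted_on /dev/sde` prints nothing, no partition of that disk is mounted directly. Otherwise each printed mount point should be unmounted with `umount` before the drive is unplugged.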