Koozali.org: home of the SME Server

Kernel panic, Aiiieee!

Denbert
Kernel panic, Aiiieee!
« on: October 24, 2005, 08:23:52 PM »
Hi,

I’ve been upgrading my hardware over the weekend.

Today the server crashed twice! The first time there were so many bad blocks on /dev/hda3 that they couldn’t be fixed. I added a new primary disk, installed a fresh SME 6.0.1 and updated it via yum. Then I mounted the crashed disk and ran fsck, and that helped the disk.

I noticed that the disk (a brand new ATA133 Maxtor 160 GB) was VERY warm.

I installed the server again and did a perfect restore. Two hours later, the same error!

Then I installed the server on the old 40 GB disk and restored the server (of course with the updates applied via yum).

Then the server crashed again, and it was the same story with a VERY warm disk.

Now the server is on another disk with an extra cooler on it. For some strange reason the disks run like hell and then the server crashes!

Looking at the messages log shows this, which I’ve never seen before:

Oct 24 20:10:24 server kernel: denylog:IN=eth1 OUT= MAC=00:60:52:0a:ab:bf:00:08:20:ee:bc:38:08:00 SRC=61.120.206.13 DST=83.95.154.52 LEN=64 TOS=0x00 PREC=0x00 TTL=40 ID=60165 DF PROTO=TCP SPT=39098 DPT=25 WINDOW=32850 RES=0x00 SYN URGP=0
Oct 24 20:10:27 server kernel: denylog:IN=eth1 OUT= MAC=00:60:52:0a:ab:bf:00:08:20:ee:bc:38:08:00 SRC=61.120.206.13 DST=83.95.154.52 LEN=64 TOS=0x00 PREC=0x00 TTL=40 ID=60166 DF PROTO=TCP SPT=39098 DPT=25 WINDOW=32850 RES=0x00 SYN URGP=0
Oct 24 20:10:44 server kernel: denylog:IN=eth1 OUT= MAC=00:60:52:0a:ab:bf:00:08:20:ee:bc:38:08:00 SRC=61.120.206.13 DST=83.95.154.52 LEN=64 TOS=0x00 PREC=0x00 TTL=40 ID=60168 DF PROTO=TCP SPT=39098 DPT=25 WINDOW=32850 RES=0x00 SYN URGP=0
Oct 24 20:10:54 server kernel: denylog:IN=eth1 OUT= MAC=00:60:52:0a:ab:bf:00:08:20:ee:bc:38:08:00 SRC=61.120.206.13 DST=83.95.154.52 LEN=64 TOS=0x00 PREC=0x00 TTL=40 ID=60169 DF PROTO=TCP SPT=39098 DPT=25 WINDOW=32850 RES=0x00 SYN URGP=0
Oct 24 20:11:04 server kernel: denylog:IN=eth1 OUT= MAC=00:60:52:0a:ab:bf:00:08:20:ee:bc:38:08:00 SRC=61.120.206.13 DST=83.95.154.52 LEN=64 TOS=0x00 PREC=0x00 TTL=40 ID=60170 DF PROTO=TCP SPT=39098 DPT=25 WINDOW=32850 RES=0x00 SYN URGP=0
Oct 24 20:11:14 server kernel: denylog:IN=eth1 OUT= MAC=00:60:52:0a:ab:bf:00:08:20:ee:bc:38:08:00 SRC=61.120.206.13 DST=83.95.154.52 LEN=64 TOS=0x00 PREC=0x00 TTL=40 ID=60171 DF PROTO=TCP SPT=39098 DPT=25 WINDOW=32850 RES=0x00 SYN URGP=0
Oct 24 20:11:24 server kernel: denylog:IN=eth1 OUT= MAC=00:60:52:0a:ab:bf:00:08:20:ee:bc:38:08:00 SRC=61.120.206.13 DST=83.95.154.52 LEN=64 TOS=0x00 PREC=0x00 TTL=40 ID=60172 DF PROTO=TCP SPT=39098 DPT=25 WINDOW=32850 RES=0x00 SYN URGP=0

[snip]

This goes on every 7 to 10 seconds.

Anyone got an idea what’s going on???

I’m getting desperate. My e-smith/sme servers have been serving me and my family for years with no problemo.
/ Denbert
"Success is not final, failure is not fatal: it is the courage to continue that counts" - Sir Winston Churchill

hanscees
Kernel panic, Aiiieee!
« Reply #1 on: October 24, 2005, 11:16:26 PM »
Hi,

Two things are going on:
- Someone is trying to send you email: DPT=25 means destination port 25 (SMTP). This traffic is blocked by iptables.
- Your crashes. Is there any logging? Please post it.
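
To see at a glance who is being blocked and on which port, the denylog lines can be tallied by source address (SRC) and destination port (DPT). The following is only a sketch: the function name `summarize_denylog` is made up for illustration, and reading /var/log/messages is an assumption about where the kernel log ends up on your system.

```shell
#!/bin/sh
# Hypothetical helper: count blocked packets per source address and
# destination port. Reads syslog text on stdin, prints "count src port N",
# busiest first.
summarize_denylog() {
    awk '/denylog:/ {
        src = ""; dpt = ""
        # Each denylog entry is a list of KEY=value fields.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^SRC=/) src = substr($i, 5)
            if ($i ~ /^DPT=/) dpt = substr($i, 5)
        }
        if (src != "" && dpt != "") count[src " port " dpt]++
    }
    END {
        for (k in count) print count[k], k
    }' | sort -rn
}

# Usage (log path is an assumption, adjust as needed):
#   summarize_denylog < /var/log/messages
```

A single address hitting port 25 over and over, as in the log above, shows up as one line with a high count.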

I would test the memory. When a disk appears to crash, the RAM is often what is bad, and kernel oopses usually point to RAM as well.

SUSE install disks, for instance, have a RAM test program on them.

But without logging one cannot be sure.

Hans-Cees
nl.linkedin.com/in/hanscees/

raem
Re: Kernel panic, Aiiieee!
« Reply #2 on: October 25, 2005, 09:13:43 AM »
Denbert

What version of sme were you running previously (I assume without problems)?

The masq logging level can be altered; search the SME FAQs (see link on left) for how to do this. Earlier versions of sme did not have the same logging level enabled. That "noise" is there all the time on most public servers, it's just being logged more on sme 6.x.

Modern large hard disks do run very hot!

Is your setup a software RAID1? If so, the array may be syncing, which would keep the disks busy and hot.
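
Whether the array is syncing can be read from /proc/mdstat, which the kernel md driver maintains. A minimal sketch, with a hypothetical helper name, that prints only the progress lines:

```shell
#!/bin/sh
# Hypothetical helper: print the lines of an mdstat listing that show a
# resync or recovery in progress. Exit status is 0 if any were found,
# 1 if the arrays look idle. The file argument defaults to /proc/mdstat.
raid_rebuild_status() {
    grep -E 'resync|recovery' "${1:-/proc/mdstat}"
}

# Usage:
#   raid_rebuild_status || echo "no resync in progress"
```

If nothing is printed while the disks are still busy and hot, the sync is probably not the explanation.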

Charlie Brady has a memory test utility in his contrib area. Note that sme 6.x was more strict about memory, and faulty memory that was seemingly OK in earlier sme versions caused problems in sme 6.x.
see
ftp://ftp.ibiblio.org/pub/linux/distributions/e-smith/contrib/CharlieBrady/memtest/


Usage notes from devinfo
http://www.mail-archive.com/devinfo%40lists.e-smith.org/msg11704.html

Problem: you suspect the RAM in your server, or you wish to stress the
system a little and see that it is still reliable

Answer: install my background memory tester!

# F=ftp://ftp.e-smith.org/pub/e-smith/contrib/CharlieBrady/memtest
# rpm -Uhv $F/e-smith-memtest-0.0.1-04.noarch.rpm  \
    $F/memtester-2.93.1-01.i386.rpm
# /sbin/e-smith/config set memtest service status enabled
# svc-start memtest
Logs will be found in /var/log/memtest

Do:
# svc-stop memtest
# /sbin/e-smith/config set memtest service status disabled
to stop testing
...

thedude

Kernel panic, Aiiieee!
« Reply #3 on: October 26, 2005, 08:11:28 AM »
I had this exact same thing happen to me, and it turned out to be a bad motherboard. It was an ECS board.

The problem started with a locked-up server, and after troubleshooting I saw that the hard drive was running constantly. It also got really hot, and then finally crashed.

I replaced the motherboard (all other hardware remained the same), reinstalled SME, and it went right back into production.

Denbert
Re: Kernel panic, Aiiieee!
« Reply #4 on: October 27, 2005, 11:23:13 PM »
Hi Ray,

Finally back with a newly built server!

Quote from: "RayMitchell"
> Denbert
>
> What version of sme were you running previously (I assume without problems)?

I’ve always been on the latest SME Server without any problems. The server I was moving to new hardware was SME 6.0.1 with the “Plus” script. One of the things I’ve been doing is going back to an almost clean SME 6.x server, to avoid problems when upgrading to the final 7.0.

Quote from: "RayMitchell"
> The masq logging level can be altered; search the SME FAQs (see link on left) for how to do this. Earlier versions of sme did not have the same logging level enabled. That "noise" is there all the time on most public servers, it's just being logged more on sme 6.x.

After looking more into my message log, I decided to install Dshield, as I don’t like these “requests”:

Oct 27 23:10:27 server kernel: denylog:IN=eth1 OUT= MAC=00:03:ce:88:b9:16:00:08:20:ee:bc:38:08:00 SRC=83.95.245.6 DST=83.95.154.52 LEN=48 TOS=0x00 PREC=0x00 TTL=120 ID=22101 DF PROTO=TCP SPT=4046 DPT=139 WINDOW=64240 RES=0x00 SYN URGP=0
Oct 27 23:10:31 server kernel: denylog:IN=eth1 OUT= MAC=00:03:ce:88:b9:16:00:08:20:ee:bc:38:08:00 SRC=83.95.105.144 DST=83.95.154.52 LEN=48 TOS=0x00 PREC=0x00 TTL=120 ID=18073 DF PROTO=TCP SPT=2885 DPT=139 WINDOW=64240 RES=0x00 SYN URGP=0
Oct 27 23:10:33 server kernel: denylog:IN=eth1 OUT= MAC=00:03:ce:88:b9:16:00:08:20:ee:bc:38:08:00 SRC=83.95.239.67 DST=83.95.154.52 LEN=48 TOS=0x00 PREC=0x00 TTL=122 ID=4945 DF PROTO=TCP SPT=1324 DPT=445 WINDOW=64240 RES=0x00 SYN URGP=0
Oct 27 23:10:36 server kernel: denylog:IN=eth1 OUT= MAC=00:03:ce:88:b9:16:00:08:20:ee:bc:38:08:00 SRC=83.95.239.67 DST=83.95.154.52 LEN=48 TOS=0x00 PREC=0x00 TTL=122 ID=5144 DF PROTO=TCP SPT=1324 DPT=445 WINDOW=64240 RES=0x00 SYN URGP=0

I used Knuddi’s howto - http://sme.swerts-knudsen.com/index.html?frame=http%3A//sme.swerts-knudsen.com/howtos/howto_13.htm


Quote from: "RayMitchell"
> Modern large hard disks do run very hot!
>
> Is your setup a software RAID1? If so, the array may be syncing, which would keep the disks busy and hot.

Yep, I’ve installed RAID1 and have now put the disks in active coolers to avoid the heat.

Quote from: "RayMitchell"
> Charlie Brady has a memory test utility in his contrib area. Note that sme 6.x was more strict about memory, and faulty memory that was seemingly OK in earlier sme versions caused problems in sme 6.x.
> see
> ftp://ftp.ibiblio.org/pub/linux/distributions/e-smith/contrib/CharlieBrady/memtest/
>
> Usage notes from devinfo
> http://www.mail-archive.com/devinfo%40lists.e-smith.org/msg11704.html
>
> Problem: you suspect the RAM in your server, or you wish to stress the
> system a little and see that it is still reliable
>
> Answer: install my background memory tester!
>
> # F=ftp://ftp.e-smith.org/pub/e-smith/contrib/CharlieBrady/memtest
> # rpm -Uhv $F/e-smith-memtest-0.0.1-04.noarch.rpm  \
>     $F/memtester-2.93.1-01.i386.rpm
> # /sbin/e-smith/config set memtest service status enabled
> # svc-start memtest
> Logs will be found in /var/log/memtest
>
> Do:
> # svc-stop memtest
> # /sbin/e-smith/config set memtest service status disabled
> to stop testing

Thanks, I’ll test the former “new hardware” to see where the problem is and then reclaim my money. I also suspect the memory.
/ Denbert

Denbert
Re: Kernel panic, Aiiieee!
« Reply #5 on: September 17, 2006, 10:11:16 PM »
Quote from: "RayMitchell"
> Problem: you suspect the RAM in your server, or you wish to stress the
> system a little and see that it is still reliable
>
> Answer: install my background memory tester!
>
> # F=ftp://ftp.e-smith.org/pub/e-smith/contrib/CharlieBrady/memtest
> # rpm -Uhv $F/e-smith-memtest-0.0.1-04.noarch.rpm  \
>     $F/memtester-2.93.1-01.i386.rpm
> # /sbin/e-smith/config set memtest service status enabled
> # svc-start memtest
> Logs will be found in /var/log/memtest
>
> Do:
> # svc-stop memtest
> # /sbin/e-smith/config set memtest service status disabled
> to stop testing

I tried to install memtest on an updated SME 7.0:

[root@server memtest]# rpm -ivh *.rpm
Preparing...                ########################################### [100%]
   1:memtester              ########################################### [ 50%]
groupadd: gid 452 is not unique
error: %pre(e-smith-memtest-0.0.1-04.noarch) scriptlet failed, exit status 1
error:   install: %pre scriptlet failed (2), skipping e-smith-memtest-0.0.1-04

rpm -Uvh *.rpm
Preparing...                ########################################### [100%]
        package memtester-2.93.1-01 is already installed

[root@server memtest]# /sbin/e-smith/config set memtest service status enabled
[root@server memtest]# svc-start memtest
/usr/bin/svc-start: line 3: /etc/rc.d/init.d/memtest: No such file or directory
/usr/bin/svc-start: line 3: exec: /etc/rc.d/init.d/memtest: cannot execute: No such file or directory
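
For anyone reading the transcript above: "gid 452 is not unique" means the group ID that the package's %pre script asked groupadd to create is already taken on this system, and because e-smith-memtest was skipped, the memtest init script that svc-start looks for was never installed. A sketch for finding out which group already holds that GID (the helper name is hypothetical, and this only diagnoses the clash, it does not make the package installable):

```shell
#!/bin/sh
# Hypothetical helper: print the name of every group that uses the given
# numeric GID. The second argument is the group file to read and
# defaults to /etc/group.
groups_with_gid() {
    awk -F: -v gid="$1" '$3 == gid { print $1 }' "${2:-/etc/group}"
}

# Usage:
#   groups_with_gid 452
```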

Anyone got a memtest solution for SME 7.x?
/ Denbert

raem
Re: Kernel panic, Aiiieee!
« Reply #6 on: September 18, 2006, 03:47:04 AM »
Denbert

> Tried to install memtest on a updated SME 7.0:
> Anyone got a memtest solution for SME 7.x ?

Your efforts were not necessary; see
http://forums.contribs.org/index.php?topic=31011.0
and look for memtest.
...

Denbert
Kernel panic, Aiiieee!
« Reply #7 on: September 20, 2006, 02:29:16 PM »
Thanks Ray,

I’ll remember to read the full release announcement in the future :-D

I have a failure, as my server says Kernel Panic – no syncing and then it locks.

It happens during the reboot process, at the end, after the release of /dev/md1.

I ran the fantastic memtest86, but there were no errors at all.

Anyone got a suggestion to point me in the right direction? I can’t back up the server, as it stalls during the backup process from the Server-Manager.
/ Denbert

raem
Kernel panic, Aiiieee!
« Reply #8 on: September 20, 2006, 08:39:59 PM »
Denbert

> I have a failure, as my server says Kernel Panic – no syncing and then it locks.

You have told us very little; look in the log files.

I suggest you search the forums AND bugtracker for "Kernel panic" and/or the exact phrase you see in your log files.
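
Pulling the exact phrase out of the logs can be done with grep. This is a sketch under assumptions: the helper name is invented, /var/log/messages is assumed to be where kernel messages are written, and `-B` (leading context) needs a grep that supports it, such as GNU grep. A panic that happens late in shutdown may never reach the disk, in which case the log simply stops and the message has to be copied from the console instead.

```shell
#!/bin/sh
# Hypothetical helper: print kernel panic / oops lines together with the
# five lines that precede each one. Arguments are the log files to
# search; with no arguments, /var/log/messages is assumed.
find_kernel_errors() {
    [ "$#" -gt 0 ] || set -- /var/log/messages
    grep -i -w -B 5 -E 'kernel panic|oops' "$@"
}

# Usage:
#   find_kernel_errors /var/log/messages /var/log/messages.1
```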
...

Denbert
Kernel panic, Aiiieee!
« Reply #9 on: October 19, 2006, 10:59:12 AM »
All right, I’m finally back, after some more investigating on the server.

The hardware is:

Initializing CPU#0
Detected 1733.405 MHz processor.
Calibrating delay loop... 3460.30 BogoMIPS
Memory: 511720k/524224k available (1160k kernel code, 9940k reserved, 983k data, 120k init, 0k highmem)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0383fbff c1c3fbff 00000000 00000000
CPU:             Common caps: 0383fbff c1c3fbff 00000000 00000000
CPU: AMD Athlon(tm) XP 2100+ stepping 02
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SIS5513: IDE controller at PCI slot 00:02.5
SIS5513: chipset revision 0
SIS5513: not 100% native mode: will probe irqs later
SiS746    ATA 133 controller
    ide0: BM-DMA at 0xff00-0xff07, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xff08-0xff0f, BIOS settings: hdc:DMA, hdd:DMA
hda: Maxtor 6Y120L0, ATA DISK drive
hdb: Maxtor 6Y120L0, ATA DISK drive
hdc: ASUS DVD-E616A, ATAPI CD/DVD-ROM drive
hda: 240121728 sectors (122942 MB) w/2048KiB Cache, CHS=14946/255/63, UDMA(133)
hdb: 240121728 sectors (122942 MB) w/2048KiB Cache, CHS=14946/255/63, UDMA(133)
hda: hda1 hda2 hda3
 hdb: hdb1 hdb2 hdb3
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.

The server has been running for a year on an updated SME 6.0.1 with no problems. After backing up the IBAYs via SMB, I jumped in to UPGRADE the server to 7.0. The server had more than 2 GB (mainly emails) and therefore I couldn’t back up the server via Server-Manager.

The upgrade runs fine, but when rebooting I get a Kernel panic!

I’ve tried changing the HDD, memory and CPU, with the same error. So it must be something with the SiS chipset, but here is my question:

Why does the server run ROCK STABLE on SME 6.0.1, but fail to run on SME 7.0?

I’ve tried a clean install with both HDDs and in a single-HDD configuration, but the server gets a Kernel panic at the end of a reboot, after “flushing hardware”.

Did the developers discontinue support for the SiS chipset?

The server is back on SME 6.0.1 with the latest Maintenance Updates installed. And yes, it’s rock stable.

I’d hate to scrap the motherboard, and I just hope that the Maintenance Team will keep the SME 6.x version alive for the next 12 months.
/ Denbert

raem
Kernel panic, Aiiieee!
« Reply #10 on: October 19, 2006, 11:42:22 AM »
Denbert

> The Upgrade runs fine, but when rebooting I get Kernel panic!

You really need to post the full error message from the log file, not just summarise it.
Like I said before, have you searched the bugtracker (and Google too) for that exact error message?


> Why does the server runs ROCK STABLE on SME 6.0.1, but fails to run on SME 7.0

sme6 was built on Red Hat 7.3, whereas sme7 is built on CentOS 4.3. There may be hardware issues if something in your old system is not supported by CentOS. Check the RHEL hardware compatibility lists, and search the bugtracker!
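
Checking a compatibility list needs the exact device names, which lspci (from pciutils) reports. A rough sketch, assuming lspci is installed and using an invented helper name, that narrows its output to the bridge and disk controller entries:

```shell
#!/bin/sh
# Hypothetical helper: filter `lspci` output (read from stdin) down to
# bridge and disk-controller lines, the entries to look up in a hardware
# compatibility list. The class names matched are the ones lspci prints.
list_chipset_devices() {
    grep -i -E 'IDE interface|Host bridge|ISA bridge|RAID|SATA'
}

# Usage (the lspci path is an assumption):
#   /sbin/lspci | list_chipset_devices
```

On the board described above, the SiS IDE controller that the boot log reports at PCI slot 00:02.5 would be the line to search for.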
...