Koozali.org formerly Contribs.org

System Stall

« on: March 03, 2005, 01:22:09 AM »
We are experiencing some problems with our SME 6.0.1 server. We receive the following messages (from the log file), after which the system stalls and can only be recovered with a reset:

[start extract]
Mar  3 00:33:32 elex kernel:  <1>Unable to handle kernel paging request at virtual address 680b74b0
Mar  3 00:33:32 elex kernel:  printing eip:
Mar  3 00:33:32 elex kernel: c0127bf7
Mar  3 00:33:32 elex kernel: *pde = 00000000
Mar  3 00:33:32 elex kernel: Oops: 0000
Mar  3 00:33:32 elex kernel: appletalk via-rhine mii ide-scsi scsi_mod ide-cd cdrom ehci-hcd usb-uhci usbcore ext3 jbd  
Mar  3 00:33:32 elex kernel: CPU:    0
Mar  3 00:33:32 elex kernel: EIP:    0010:[find_vma+55/96]    Not tainted
Mar  3 00:33:32 elex kernel: EIP:    0010:[<c0127bf7>]    Not tainted
Mar  3 00:33:32 elex kernel: EFLAGS: 00010206
Mar  3 00:33:32 elex kernel:
Mar  3 00:33:32 elex kernel: EIP is at find_vma [kernel] 0x37 (2.4.20-18.7)
Mar  3 00:33:32 elex kernel: eax: c013fae8   ebx: 001ef037   ecx: c013fae8   edx: 680b74c0
Mar  3 00:33:32 elex kernel: esi: c96b1c28   edi: 00000000   ebp: c25147bc   esp: c162ff08
Mar  3 00:33:32 elex kernel: ds: 0018   es: 0018   ss: 0018
Mar  3 00:33:32 elex kernel: Process kswapd (pid: 5, stackpage=c162f000)
Mar  3 00:33:32 elex kernel: Stack: c1000030 001ef037 c0139622 c96b1c28 001ef037 c02da520 c96b1c28 001ef037
Mar  3 00:33:32 elex kernel:        c118d890 c118d890 c118d890 00000006 00000000 ffffffff c013971a 00726e00
Mar  3 00:33:32 elex kernel:        c118d890 c013518f c0344c80 00000000 00000000 00000000 c118d890 00000000
Mar  3 00:33:32 elex kernel: Call Trace:   [try_to_unmap_one+306/448] try_to_unmap_one [kernel] 0x132 (0xc162ff10))
Mar  3 00:33:32 elex kernel: Call Trace:   [<c0139622>] try_to_unmap_one [kernel] 0x132 (0xc162ff10))
Mar  3 00:33:32 elex kernel: [try_to_unmap+106/400] try_to_unmap [kernel] 0x6a (0xc162ff40))
Mar  3 00:33:32 elex kernel: [<c013971a>] try_to_unmap [kernel] 0x6a (0xc162ff40))
Mar  3 00:33:32 elex kernel: [add_to_swap+95/128] add_to_swap [kernel] 0x5f (0xc162ff4c))
Mar  3 00:33:32 elex kernel: [<c013518f>] add_to_swap [kernel] 0x5f (0xc162ff4c))
Mar  3 00:33:32 elex kernel: [launder_page+1118/1648] launder_page [kernel] 0x45e (0xc162ff6c))
Mar  3 00:33:32 elex kernel: [<c0130e4e>] launder_page [kernel] 0x45e (0xc162ff6c))
Mar  3 00:33:32 elex kernel: [rebalance_dirty_zone+89/128] rebalance_dirty_zone [kernel] 0x59 (0xc162ff84))
Mar  3 00:33:32 elex kernel: [<c0132d09>] rebalance_dirty_zone [kernel] 0x59 (0xc162ff84))
Mar  3 00:33:32 elex kernel: [do_try_to_free_pages_kswapd+81/672] do_try_to_free_pages_kswapd [kernel] 0x51 (0xc162ffa4))
Mar  3 00:33:32 elex kernel: [<c0133191>] do_try_to_free_pages_kswapd [kernel] 0x51 (0xc162ffa4))
Mar  3 00:33:32 elex kernel: [kswapd+289/1056] kswapd [kernel] 0x121 (0xc162ffd4))
Mar  3 00:33:32 elex kernel: [<c0133591>] kswapd [kernel] 0x121 (0xc162ffd4))
Mar  3 00:33:32 elex kernel: [_stext+0/48] stext [kernel] 0x0 (0xc162ffe8))
Mar  3 00:33:32 elex kernel: [<c0105000>] stext [kernel] 0x0 (0xc162ffe8))
Mar  3 00:33:32 elex kernel: [arch_kernel_thread+38/48] arch_kernel_thread [kernel] 0x26 (0xc162fff0))
Mar  3 00:33:32 elex kernel: [<c01070d6>] arch_kernel_thread [kernel] 0x26 (0xc162fff0))
Mar  3 00:33:32 elex kernel: [kswapd+0/1056] kswapd [kernel] 0x0 (0xc162fff8))
Mar  3 00:33:32 elex kernel: [<c0133470>] kswapd [kernel] 0x0 (0xc162fff8))
Mar  3 00:33:32 elex kernel:
Mar  3 00:33:32 elex kernel:
Mar  3 00:33:32 elex kernel: Code: 39 5a f0 8d 42 e8 76 f1 39 5a ec 89 c1 77 e2 85 c9 74 03 89
[end extract]

Any help or pointers would be appreciated.
Thanks

Buddha_Joe

« Reply #1 on: March 03, 2005, 04:35:51 AM »
Yikes!!

Ummmm.. OK, I'm just guessing here..

It's obviously having a problem reading something in memory, LOL. I've never dealt with anything like this and don't know the specifics of what you pasted, so it could be either the physical memory or the swapping to and from the page file (swap).
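For anyone trying to read a trace like the one above, here is a rough Python sketch that pulls out the main facts: the faulting address, the function the kernel was in, the process, and the call chain. The function name `summarize_oops` and the regular expressions are my own and only assume the 2.4-style line format quoted in the first post; other kernel versions print oopses differently.

```python
import re

# These patterns assume the 2.4-era oops format quoted in the first post.
FAULT_RE = re.compile(
    r"Unable to handle kernel paging request at virtual address ([0-9a-f]+)"
)
EIP_RE = re.compile(r"EIP is at (\S+) \[")
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")
# Only the raw-address form "[<c0139622>] name [kernel]" is matched, so the
# duplicated symbolic lines ("[name+306/448] ...") are not counted twice.
TRACE_RE = re.compile(r"\[<[0-9a-f]+>\] (\S+) \[[^\]]+\]")


def summarize_oops(lines):
    """Return a dict describing the first oops found in an iterable of log lines."""
    summary = {
        "fault_address": None,  # virtual address the kernel could not read
        "function": None,       # function the instruction pointer was in
        "process": None,        # process (or kernel thread) that hit the fault
        "pid": None,
        "call_trace": [],       # function names, innermost first
    }
    for line in lines:
        m = FAULT_RE.search(line)
        if m and summary["fault_address"] is None:
            summary["fault_address"] = m.group(1)
            continue
        m = EIP_RE.search(line)
        if m and summary["function"] is None:
            summary["function"] = m.group(1)
            continue
        m = PROC_RE.search(line)
        if m and summary["process"] is None:
            summary["process"] = m.group(1)
            summary["pid"] = int(m.group(2))
            continue
        m = TRACE_RE.search(line)
        if m:
            summary["call_trace"].append(m.group(1))
    return summary
```

You could feed it an open log file, for example `summarize_oops(open("/var/log/messages"))`, assuming that is where your syslog writes kernel messages. A summary like this only tells you where the kernel fell over; it cannot tell you whether the cause is bad RAM, a bad disk, or a kernel bug.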

Probably the easiest thing to check first would be to swap out the physical memory and see if the problem crops up again. If it does, then I would consider replacing the HDD. If it resurfaces after that, it may be a motherboard issue. Mind you, I am guessing, and I am leaning more towards the physical memory being the problem.

Have you noticed any data corruption on the disks (problems opening files, unable to find files that you knew were there, file content garbled, etc.)?


Does anyone else have a better understanding or another idea?

« Reply #2 on: March 05, 2005, 12:49:36 PM »
Thanks for the pointers, Buddha. Will get some testing done and hopefully track down the problem. Am starting to think it's a motherboard or NIC issue.

Buddha_Joe

« Reply #3 on: March 05, 2005, 08:06:32 PM »
Keep us posted. I'm interested to see the outcome.

« Reply #4 on: May 24, 2005, 05:29:39 PM »
For anyone who may be interested...

This server passed out of my control soon after the original post, but I had some contact with the people who supported it afterwards. It seems the system had a major crash when an inexperienced user, alarmed by the number of errors appearing, turned off the server halfway through a file system check.

It turned out that the physical memory was faulty and had caused all the problems.

Hope this helps someone out there.