Koozali.org formerly Contribs.org

MDADM Monitoring Reports Rebuild Events Detected and SME Server Slows to a Crawl

I have an SME 8 server with two SATA drives which were automatically configured as RAID1 during initial install. The system supports two frequent email users and a couple of others who log in occasionally.

For the second Sunday in a row, I have received emails from mdadm monitoring addressed To: admin_raidreport. I have received no other messages indicating disk errors.

Here are the messages I received today (in order received).

A RebuildStarted event has been detected on md device /dev/md1.
A Rebuild60 event has been detected on md device /dev/md1.
A RebuildStarted event has been detected on md device /dev/md2.
A RebuildFinished event has been detected on md device /dev/md1.
A Rebuild20 event has been detected on md device /dev/md2.  (5 hours after start)
A Rebuild40 event has been detected on md device /dev/md2. (10+ hours after start)

At about the time the Rebuild20 message was received, 5 hours after the RebuildStarted message, incoming emails stopped, and both the server-manager (through a browser) and SSH logins became essentially non-responsive. Perhaps one in 10 attempts on either produces a login prompt, but nothing else is displayed. As a result, I cannot look at the mdadm log at the moment. Incoming emails are also delayed by 30 minutes to over an hour.

I found no open bugs, although I read the closed bugs referring to mdadm running every Sunday. I have been getting messages about mdadm every Sunday for months. It is only last week and this week that SME Server has sent the sequence of messages and slowed down. Last week about the time of a Rebuild80 email, system response returned to normal.

I also tried searching the forums but, at least today, a forum search opens a new page that says:
SME Server:GoogleSearch

Loading

and the rest of the page is blank.

Since updating my SME 8 server on 25th Jan (160 or so updates, incl Linux version 2.6.18-348.1.1.el5), I received similar rebuild messages yesterday. I had previously only received the weekly 04:22/04:23 am spurious rebuild messages but yesterday I received the spurious ones followed by actual md1 & md2 rebuilds which completed after 2 1/2 hours.

Stefano
quack and engdev: please raise a bug in Bugzilla, thank you.

Please remember to report the reference here, thank you.
Smeserver.it consultant - SME Server solutions and support in Italy

OK Stefano.
Quack, I will assume you'll raise the bug as you initiated the topic, is that ok?

I have reported a bug entitled "mdadm monitoring initiates rebuild on two consecutive Sundays", referencing this forum thread. Thanks engdev and Stefano for answering my post.

Also using SME server 8, all updates.

1) I've received the MDADM report on Sundays. Normally the check finishes within a minute; I've assumed this meant nothing was wrong and that it could be considered a test.

2) Lately, however, I was receiving this report and then getting a long rebuild with messages at 20, 40, 60, 80 and finished. Replacing the drives with new Western Digital RE4 Enterprise drives jumpered at 1.5 Gb/s took me back to the quick report with no rebuild.

I formatted the old Western Digital RE4 Enterprise drives to NTFS, ran Western Digital's test routine on them on a Windows computer, and found no errors. Consequently, 2 weeks ago I ran with one of the old drives as drive 2 and had only the quick report as per 1). Last week I swapped drive 2 for the other old drive and had the issue this Sunday. I'm thus assuming that this one drive, which has 2 more years of warranty, is at fault, and I am going to contact Western Digital.

It would be nice to know the command line to run the routine manually that is invoked Sunday to see if switching the drive has fixed the issue, without having to wait until next Sunday.


HP ProLiant ML110 G7 with 4 GB HP ECC RAM. Dual Western Digital RE4 Enterprise drives with 5-year warranties.
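
On the question of running the Sunday routine by hand: the weekly messages look like the md consistency check that cron starts early on Sunday morning. On CentOS 5-based systems such as SME 8 the script is, as far as I know, /etc/cron.weekly/99-raid-check, but treat that path as an assumption and confirm it on your own server. The same kind of check can be requested directly through the kernel's md sysfs interface. Below is a minimal sketch, not a tested SME procedure. The function names and the MD_SYSFS_ROOT variable are hypothetical helpers introduced here for illustration; only the sync_action and mismatch_cnt files are the kernel's own.

```shell
#!/bin/sh
# Sketch: manually request the md consistency check (run as root).
# MD_SYSFS_ROOT is a hypothetical override for dry runs against a
# scratch directory; on a real server leave it unset.
MD_SYSFS_ROOT="${MD_SYSFS_ROOT:-/sys/block}"

# Ask the kernel to start a consistency check on one array, e.g. md2.
start_check() {
    echo check > "$MD_SYSFS_ROOT/$1/md/sync_action"
}

# Show what the array is doing: "check" while running, "idle" when done.
check_state() {
    cat "$MD_SYSFS_ROOT/$1/md/sync_action"
}

# Show the mismatch counter; read it after the check has finished.
mismatch_count() {
    cat "$MD_SYSFS_ROOT/$1/md/mismatch_cnt"
}

# Abort a running check, e.g. if the server becomes unresponsive.
stop_check() {
    echo idle > "$MD_SYSFS_ROOT/$1/md/sync_action"
}
```

After loading these functions into a root shell, `start_check md2` should start the check, and `cat /proc/mdstat` shows progress while it runs. If the swapped drive is the culprit, I would expect the slow multi-hour run to reappear; `stop_check md2` lets you back out if the server starts to crawl.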