Koozali.org: home of the SME Server

Update regarding contribs.org outage (Jan 21 - Jan 23 2009)

Offline cactus

« on: January 23, 2009, 09:26:33 AM »
On the 21st of January 2009, around 21:00 (MST), contribs.org suffered a second severe hardware failure. The disk array of our server farm crashed again. Unfortunately, we were not yet done building the more robust setup we had promised, so contribs.org was unreachable for 2 days.

Over the past few days we have worked hard to salvage the disks and their content, and to restore missing information from backup, in order to get contribs.org back online.

We will continue reengineering our servers and network architecture to prevent such long, unwanted outages of contribs.org from happening again.

Since contribs.org and SME Server are a community-driven effort, sustained by volunteers and their free time, we would like you to consider a small donation to help raise the money needed to keep contribs.org stable and running without problems in the future. If you would like to donate, you will find the needed information here (when logged in to the forums).

Last but not least, we would like to thank you for your patience, and we hope you will continue to make SME Server such a success.

On behalf of the Contribs.org team
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than its worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

Offline cactus

Re: Update regarding contribs.org outage (Jan 21 - Jan 23 2009)
« Reply #1 on: January 23, 2009, 04:24:42 PM »
The first downtime of contribs was caused by corrupt metadata on a 3.0TB array. Bringing the array back online took as long as it did because we had to work with the vendor to recover it.

The second downtime was caused by a faulty RAID controller that completely corrupted 800GB of the 3.0TB array. The drives and controller were sent off for recovery, and the controller is being replaced.

The vendors of both the drives and the controller have been very cooperative. As space and time were extremely limited, we have brought back only the critical services needed to run contribs.

The buildsystem and repository management system are offline and will remain offline until the controller and drives have been fixed or replaced. We apologize for the inconvenience these outages have caused. Because everything had to be recovered from backup after the second downtime, some things may still not be functioning as they should. We ask for your cooperation and patience as we work through these issues.

If you notice anything not working as it should, please raise a bug in the bugtracker or, if that does not work, email admin@contribs.org.

Some of the information entered between Jan 20 1:00am and Jan 20 7:37pm may have been lost.
« Last Edit: January 23, 2009, 04:27:52 PM by cactus »