What went wrong part 2...

Ted S

SB Co-Founder
Ok, now that things are back up you may be wondering “what went wrong and why wasn’t it fixed over the weekend?” To put it briefly (or not so briefly), a small hardware issue led to some serious database corruption, which brought down the entire site over the weekend. Following our normal procedure, we restored the data from a backup and turned the site back on, but two days later it became severely corrupted again and all the post data had to be erased. Of course, when you have two failures in a week it becomes obvious that there is more wrong than just a little error. To prevent problems down the line we did another restore using old post data from the 24th, which was carefully and forcefully rebuilt to remove any possible issues.

So, where does this leave us? Well, the board is back up and running, as you can see. All of the user information and PMs were transferred from the moment the board went offline, so that data should not be lost. As for posts, unfortunately we couldn’t use the corrupted data from this week, so all posts from Sunday, Monday and Tuesday were lost. This means you will need to either repost messages or post a summary if there is a need to wrap up an old thread. Obviously this is going to be inconvenient, but such is life.

Can this happen again? I wish I could say that this is a 100% fix, but truthfully, no one knows. Like I mentioned before, we have done a lot to remove any corruption, but you really never know. Several backups exist at this stage and a new one will be made every night. As a long-term precaution we have ordered another server to handle database files only. This machine will be added to our network within the next week, giving us a 3-system redundancy. We will also be updating our backup process to keep more backups for a longer time and on multiple machines, just in case. Basically, this problem has caused us to look at everything we can think of, to add hardware before it is needed, and to build a detailed emergency procedure should anything break again.
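
For anyone curious what "keep more backups for a longer time" could look like in practice, here is a minimal Python sketch of a retention rule. It is not ScubaBoard's actual process: the function name `backups_to_keep` and the `daily` and `weekly` defaults are made up for illustration. The idea is to keep every nightly backup from the last few days, plus the newest backup from each of the last few weeks, so an older clean copy is still around if corruption goes unnoticed for a couple of days.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily=7, weekly=4):
    """Return the set of backup dates to retain (hypothetical policy).

    Keeps every backup taken in the last `daily` days, plus the newest
    backup in each ISO week that falls within the last `weekly` weeks.
    """
    keep = set()
    newest_per_week = {}
    for d in sorted(backup_dates):
        age = (today - d).days
        if age < 0:
            continue  # ignore backups dated in the future
        if age < daily:
            keep.add(d)
        if age < weekly * 7:
            # Dates are visited oldest first, so later dates in the
            # same (year, week) overwrite earlier ones.
            newest_per_week[tuple(d.isocalendar())[:2]] = d
    keep.update(newest_per_week.values())
    return keep


if __name__ == "__main__":
    today = date(2024, 1, 31)
    nightly = [today - timedelta(days=i) for i in range(60)]
    for d in sorted(backups_to_keep(nightly, today)):
        print(d.isoformat())
```

Anything not returned by the function would be a candidate for deletion. Copying the kept files to a second machine would be a separate step and is not shown here.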

Obviously losing 3 days of the board is annoying and hurts everyone in the community, but we are back up and 100% committed to keeping the board growing and stable. Thank you for your understanding and enjoy!
 
Excellent Job Guys! Thanks for all the hard work restoring Scubaboard for us!
 
Hurray to TA and all the great staff at SB!
 
Three cheers for Tech Admin!

Hip! Hip! Hooray! :dazzler1:

Christian
 
I'm a secretary at a major university. I deal with databases all the time. I don't envy you the problems you've had. We had our entire e-mail go out here a couple of years ago. That was a nightmare!

Tech Admin, you've done a wonderful job!
 
Excellent job.

ScubaBoard uses MySQL for a backend, right? Does MySQL have any capability for replication? Were the transaction logs also corrupted? Just curious.
 
Reefguy,
MySQL has limited replication abilities (it's getting good, but it won't be at the level you're thinking of yet). With the addition of our new server we will be using replication of some sort, not instant but nightly perhaps.
 