Go Live! ... Go Dead!

We took our new site live at work yesterday. The new site is a ton nicer than the old site. I was all happy that it was up and running until I checked it at about midnight and found that it was down.

After some investigation I realized my worst nightmare had happened: the FILES were missing. Not a down DB server, not a runaway Apache thread, but actual data loss. This is the thing every sysadmin dreads, especially when all of your files sit on a central file system. We run nightly tape backups, but you can’t ever be sure of their integrity until you get onsite.

Upon further investigation this morning, after having turned off every server in a panic last night, we found that our RAID array had gone nuts. We rebooted the array, checked the drives, and rebooted the NFS server. All seems to be well now. The very first thing I did? Back up to tape.

