When A Backup Goes Bad

I’ve always preached the importance of backups, but one of the most frequently overlooked parts of a backup workflow is regularly testing your backups. Remember, backups are just as prone to failure as your primary data source. Some would say backups are more prone to failure: they tend to be used less, and backup drives often hang around longer, getting recycled from system to system while primary drives are replaced.

I recently found corruption on one of my backup drives. In my case, things were a little scary because just the week prior I had used that very same drive to migrate data to my new MacBook Air, which now has me questioning the integrity of the data on my Air.

Here’s how my problem occurred: A couple of weeks ago I bought a new MacBook Air. Of course, before retiring my old (2010) Air I made several backups. One of those backups was a SuperDuper! clone to a Western Digital Portable hard drive, which I then connected to my new (2012) Air to transfer the user data over using Migration Assistant. Everything has been running fine for the past few weeks, and I've been continuing with my normal backups. However, I intentionally held that WD drive out of rotation for a while, just in case I later found something I forgot to migrate.

Fast-forward to the present: I decided to archive the data from the WD drive to a disk image on the Drobo so I could put the WD drive back into circulation. About 50% of the way through the SuperDuper! clone job, the process failed with a bizarre “out of disk space” error. I checked and double-checked, and there was plenty of space on both the portable drive and my Drobo. Searching the Shirt-Pocket forums, I found that the developer had explained this particular error was likely caused by underlying disk corruption, and recommended running Disk Utility to repair the disk and then trying again. Sure enough, Disk Utility found corruption, but couldn’t repair it. Uh oh.

So far, I haven’t run into any problems other than on the backup drive itself, and my primary drive tests clean. So why am I concerned? If one or more individual files are corrupted because of the bad disk, the damage may go undetected for months or even years, until a specific file is needed. By that time, despite my redundant backups, I may have overwritten any remaining good copies of the data with corrupt versions.

I searched around the web for a Mac utility that would test individual files for corruption and couldn’t come up with anything, though there were plenty of utilities that would test the integrity of the drive as a whole. What I ended up doing was a multi-part process: manually copying sections of files from the corrupt drive, noting where failures occurred, and then individually locating and removing those files. In the end, I identified 5 images in my iPhoto library that were damaged. Once they were removed, I was able to complete the clone operation without error. I even ran the clone from scratch a second time to verify it completed from start to finish. I then removed those same 5 files from my Air’s iPhoto library and re-imported the photos from known good backups.
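The manual hunt above boils down to one idea: try to read every file from start to finish and note which ones fail. The following is a minimal sketch of that idea in Python 3, not the tool I actually used. It assumes that a damaged sector surfaces as an `OSError` (typically an I/O error) when the file is read, which is common but not guaranteed; a file whose bytes were silently changed but still read back cleanly will not be caught this way. The function name `find_unreadable_files` is hypothetical.

```python
import os
import sys

CHUNK_SIZE = 1024 * 1024  # read in 1 MiB chunks so large files don't exhaust memory


def find_unreadable_files(root):
    """Walk `root` and try to read every regular file end to end.

    Returns a list of (path, error message) pairs for files or directories
    that could not be read. Files that read cleanly are not reported.
    """
    failures = []

    def record(err):
        # os.walk calls this when it cannot list a directory
        failures.append((err.filename, str(err)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device files, etc.
            try:
                with open(path, "rb") as f:
                    # Reading every chunk forces the disk to return every block
                    while f.read(CHUNK_SIZE):
                        pass
            except OSError as err:
                failures.append((path, str(err)))
    return failures


# Only run the report when a path is given on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    bad = find_unreadable_files(sys.argv[1])
    for path, message in bad:
        print(f"UNREADABLE: {path} ({message})")
    print(f"{len(bad)} unreadable file(s) found")
    sys.exit(1 if bad else 0)
```

Because an iPhoto library is a package (a folder), pointing the script at the drive's volume walks into it and checks each photo individually, which is exactly the granularity a whole-disk check doesn't give you.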

For good measure, at the suggestion of a few Twitter followers, I ran an antivirus scan with ClamXav. I didn’t suspect any malware or virus, but the reasoning was that an antivirus scan touches every file on the computer and might report an error if a file was corrupt and couldn’t be read. That came back clean. I also performed a fresh SuperDuper! clone from my Air to a new hard drive to see whether any errors were encountered in the duplication process; none were. I think I may have dodged a bullet.

This just goes to show that even the best backup methods can have weaknesses and that having a backup of something isn’t the same as having an archive. Although I do regularly test my backups, given this experience I’m going to look at putting some new safety measures into place. For starters:

  1. I’ve set up a repeating task to run Disk Utility on all my backup drives twice a month. Every six months, or before any major project, I’m going to run a lower-level disk scan using a third-party utility like Drive Genius.
  2. Every so often I’m going to erase my clone backups and restart them fresh (though not all at the same time!). While I normally use SuperDuper!’s “Smart Update” function, I think it’s good practice to get a fresh backup every now and again.
  3. Backup hard drives are going to be regularly replaced. The hard drive that gave me problems was an older drive I’d been using for several years. With drives becoming less expensive, there’s no excuse. I typically tell people the lifespan of a drive is 3-5 years. Whether the drive shows problems or not, after the three-year mark I’m going to be planning for replacement.
  4. I'm also going to be more diligent about archiving. This experience has taught me that a problem can quickly propagate throughout redundant backups. Archives and backups are not the same. Time Machine is great for recovering old files, but realistically my Time Machine history only goes back 6 months, if that. Documents, photos and movies especially need to be archived for safekeeping.
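One way to keep a corrupted file from quietly spreading into an archive is to record a checksum for every file when the archive is created, then re-check those checksums before trusting or copying the archive again. The sketch below does that in Python 3 using SHA-256. It is an illustration under my own assumptions, not part of SuperDuper!, Time Machine or any tool mentioned above; the function names and the JSON manifest format are hypothetical. Store the manifest outside the archived folder so it doesn't end up checksumming itself.

```python
import hashlib
import json
import os

CHUNK_SIZE = 1024 * 1024  # hash in 1 MiB chunks


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def save_manifest(manifest, manifest_path):
    """Write the manifest as JSON."""
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)


def verify_manifest(root, manifest_path):
    """Compare the files under root against a saved manifest.

    Returns (changed, missing): files whose contents no longer match (or can
    no longer be read), and files in the manifest that no longer exist.
    """
    with open(manifest_path) as f:
        expected = json.load(f)
    changed, missing = [], []
    for rel, digest in sorted(expected.items()):
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            missing.append(rel)
            continue
        try:
            if sha256_of(path) != digest:
                changed.append(rel)
        except OSError:
            changed.append(rel)  # an unreadable file counts as damaged
    return changed, missing
```

A mismatch only tells you the bytes differ from when the manifest was built; for photos and movies that should never change, that is a strong hint to restore the file from another copy rather than let the damaged version overwrite it.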