I finally got around to trying
Amazon S3 for data backup.
Just in case you've missed out on S3, it's a commercial network-based data storage service offered by Amazon. They claim to use the same technology for S3 that lies behind the Amazon stores.
S3 comes with no service level guarantees and there have been some reports of occasional unavailability and/or slow transfer. I'm not too worried; the things that are critical to me are data security and cost.
I'm using the JetS3t Java(TM) toolkit to manage the interaction with the S3 service. The
Cockpit application gives a GUI view of what's in your S3 buckets; the
Synchronize application gives a simple command-line interface that allows you to keep data on S3 in step with data on your machine(s), and to retrieve that data when necessary.
You need to be a little careful, as synchronization can delete data as well as add or update it. To mitigate this, there is a
--noaction option which allows you to see what
would happen to your data without actually changing anything.
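By way of illustration, a preview-then-upload pair of runs looks roughly like the sketch below. The bucket name and local path are placeholders of my own invention, and the exact argument syntax may differ between JetS3t versions, so check the documentation shipped with the toolkit rather than copying this verbatim:

    # dry run: report what would be added, updated or deleted, without touching anything
    synchronize.sh --noaction UP my-backup-bucket/cvs /home/me/cvsroot

    # the real thing, once the preview looks sensible
    synchronize.sh UP my-backup-bucket/cvs /home/me/cvsroot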
Minor snags
I've hit a couple of minor snags so far; neither is caused by the S3 service itself, but you need to be aware of them.
The first arose when I tried to back up a large amount of data. I'm keen to keep a copy of my CVS repository off-site; it's well backed up, but all the recent backups are in my home office. If we had a fire or a burglary I'd be in real trouble.
I realised soon after the upload started that it would take a while. The repository is about 2.7 GB and, although I have fast broadband, my upstream speed is only 256 kbit/s. The upload took 25 hours!
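That figure is about what you'd expect: 2.7 GB is roughly 21.6 gigabits, and at 256 kbit/s that works out to around 84,000 seconds, or a little over 23 hours, before you allow for any protocol overhead.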
Smart synchronization
Synchronize is smart enough not to upload files if they are unchanged. I figured that the next upload would take a few minutes at most. The second snag arose when I tried to check that.
Part way through, the Synchronize application bombed with an out-of-memory error. I've now modified the script to give it 512 MB of heap space, and it runs just fine. And now a burglary would be inconvenient but not disastrous.
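The change itself is a one-line edit to the java command inside the synchronize script. The exact line varies between JetS3t versions, so treat this as a sketch rather than a drop-in replacement; -Xmx is the standard JVM flag for the maximum heap size, and the class name and classpath are whatever your copy of the script already uses:

    # before: default JVM heap, which fell over part way through
    java -classpath ... org.jets3t.apps.synchronize.Synchronize ...

    # after: allow the JVM up to 512 MB of heap
    java -Xmx512M -classpath ... org.jets3t.apps.synchronize.Synchronize ...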