Thursday, February 23, 2006

Personal backup, shallow and deep

---------- Forwarded message ----------
Date: Mon, 6 Feb 2006 17:44:11 -0500 (EST)
From: Daniel Saunders <xxxx@qlink.queensu.ca>

I've been thinking about the personal backup system, and I wanted to put
forward this idea of a simple two-level backup:

* Shallow backup. The purpose is to let you continue working on your current
projects without losing recent work (including recently finished projects),
correspondence, etc. The window is about 2 years. This could be taken care of by
an automatic snapshot system such as the one Jim has. Ideally it would be
updated at least weekly.

* Deep backup. Conceptualized as a single backup system spanning your *entire
life as a computer user*, across every computer system you have ever worked on
(work, school, home, etc.), which is now technically possible. We would expect
to save in here things such as
- completed projects (including data that went into old papers)
- uncompleted projects, indefinitely suspended
- personal records
- long term personal reference material
- personal history (souvenirs, emails, etc.)
So this window could conceptually be 40 years wide. It could be updated on a
monthly or per-term basis. It would also include everything that's in the
shallow backups.
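
As a rough illustration of the two windows, here is a minimal Python sketch that splits a directory tree by file age. It assumes the last-modified time is a good enough proxy for "recent work", and the function name and the 2-year constant are made up for this example rather than taken from any existing tool.

```python
import time
from pathlib import Path

# Assumption: "about 2 years" approximated as 730 days, in seconds.
SHALLOW_WINDOW_SECONDS = 2 * 365 * 24 * 60 * 60


def split_by_window(root, now=None, window=SHALLOW_WINDOW_SECONDS):
    """Return (shallow, deep_only) lists of file paths relative to root.

    Files modified within the window go in the shallow list. Older files
    go in deep_only. The deep backup is then shallow + deep_only, since
    it also includes everything that's in the shallow backup.
    """
    now = time.time() if now is None else now
    root = Path(root)
    shallow, deep_only = [], []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        age = now - path.stat().st_mtime
        target = shallow if age <= window else deep_only
        target.append(path.relative_to(root))
    return shallow, deep_only
```

Calling `split_by_window("/home/me")` (a hypothetical path) would give the list of files for the weekly shallow run, and the two lists together make up the deep backup.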

Now here's the major principle: there should only be *one* instance of
each, that is, one current shallow backup and one current deep backup, and each
of them must be *complete*. That doesn't mean there couldn't be multiple
copies of your shallow and multiple copies of your deep - but unless they're
throwaway ones, all the copies should be *identical* - that is, they must be synced.
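
One way to check the "identical copies" rule is to hash every file in two copies and compare the results. The following is only a sketch under assumptions: the function names are invented for this example, and it compares file contents and relative paths only, not permissions or timestamps.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's relative path to the SHA-256 digest of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        sha = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in chunks so large files do not have to fit in memory.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                sha.update(chunk)
        digests[path.relative_to(root).as_posix()] = sha.hexdigest()
    return digests


def copies_in_sync(copy_a, copy_b):
    """True if both trees hold the same files with the same contents."""
    return tree_digest(copy_a) == tree_digest(copy_b)
```

If `copies_in_sync` returns False for two copies of the deep backup, one of them is either stale or incomplete, which is exactly the situation the rule is meant to rule out.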

I think the completeness is particularly important - with some of my ad hoc
backups I have left off things I believed to be already backed up somewhere
else. Especially given that I've not been very good at tracking my backup CDs,
that puts me at very great risk of things falling through the cracks. Even if it
takes 20 DVDs, or buying external hard drives, there should be only one: you
always know where it is, and you can take steps to protect it. And you don't
have to worry about any others.

The only reasons I can think of for violating this rule are if you have to
preserve a kind of very high-volume data, or if you have to deal with a kind of
data that apparently can't be integrated with the rest (e.g. Apple II floppies,
I believe). But there's no problem with adding extra systems to deal with those
issues, as long as they're conceptualized to fit under your deep or shallow
backup. And of course this two-level system doesn't preclude adding more backup
systems, for instance one for your ripped mp3s. But I feel this is what I would
need to have peace of mind.

What do you think?

3 comments:

Anonymous said...

I'm pretty new to the whole backing up thing. What I do at the moment is I have an automated backup using rsync over ssh happen every couple of days to an account on my webserver.

There are probably more user-friendly ways of doing this, but using rsync instead of some sissy GUI program makes me feel 1337.

D said...

Does rsync allow you to select which directories to copy over, or do you just do your entire hard drive? Does it check to see which files have changed? Do you have to skip some larger files, like big movie files, and do you have a secondary backup system for those?

Anonymous said...

Yes to both questions. rsync uses the format
rsync -av [source folder] [destination folder]
(the -av switch means [a]rchive and [v]erbose)

The big thing with rsync, though, is that it not only checks which files have changed, but it only uploads the changed data. So if you have a 100k text file that you change one letter in, it only has to upload the small part of the file around that change (plus some checksums), not the entire file.
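
To make the "only uploads the changed data" idea concrete, here is a deliberately simplified Python sketch. It is not rsync's actual algorithm: rsync uses a rolling checksum so it can also match blocks that have shifted position, whereas this version only compares fixed-size blocks at the same offsets. The block size and function names are assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 1024  # assumed block size for this sketch


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of data (what the receiver would report)."""
    return [
        hashlib.sha256(data[start:start + block_size]).digest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return (block_index, block_bytes) for blocks of new that differ from old.

    Only these blocks would need to be uploaded. Blocks whose hash matches
    the old copy at the same index are skipped.
    """
    old_hashes = block_hashes(old, block_size)
    delta = []
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        same = (
            index < len(old_hashes)
            and hashlib.sha256(block).digest() == old_hashes[index]
        )
        if not same:
            delta.append((index, block))
    return delta
```

With a 100k file and a single changed byte, `changed_blocks` returns one 1024-byte block rather than the whole file, which is the saving described above.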

I use this backup system only for my work files (currently ~3.5GB), which are primarily text files (HTML, PHP, etc.) and images, with some Photoshop and Illustrator files sprinkled in. My personal movie and TV show collection (probably over 1TB) is stored on DVDs.
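
A selective setup like the one described (chosen directories only, big media files skipped, transfer over ssh) could be scripted along these lines. This is a sketch, not the commenter's actual script: the paths, host name, size limit and helper name are all hypothetical, while `-a`, `-v`, `-e`, `--max-size` and `--exclude` are standard rsync options.

```python
import shlex


def build_rsync_command(sources, destination, max_size=None, excludes=()):
    """Build an rsync argument list for the given source directories.

    sources:     directories to copy (only these are backed up)
    destination: local path or user@host:path target
    max_size:    optional rsync size string, e.g. "100m", to skip big files
    excludes:    optional glob patterns to leave out
    """
    cmd = ["rsync", "-av", "-e", "ssh"]
    if max_size:
        cmd.append(f"--max-size={max_size}")
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    cmd.extend(sources)
    cmd.append(destination)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and host, for illustration only.
    command = build_rsync_command(
        ["/home/me/work"],
        "me@example.com:backup/",
        max_size="100m",
        excludes=["*.avi"],
    )
    # Print the command instead of running it, so nothing is copied by accident.
    print(shlex.join(command))
```

The printed command can be inspected and then run by hand or from a scheduler such as cron once it looks right.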