git @ Cat's Eye Technologies Space-Madness / master

Space Madness

Space Madness is a backup system. It is evolving. It is my attempt to describe the backup system that I try to use. This repository contains those notes, and some tools that may or may not help in implementing this backup system.

The main tool used here is rsync. If it is not installed as a standard part of your operating system, install it first.

Depositories

The main unit of organizing files in Space Madness is the depository, which is a directory tree. The subdirectories in a depository are called subjects. Subjects are hierarchical (for example, you might have art and art/paintings and art/paintings/watercolours).

A subject is expected to have the same content in all the different depositories in which it appears. (This is not 100% true, but it's useful to think of them this way.)

There are three kinds of depositories:

  • canonical depository

    This holds the canonical (authoritative) copy of its files. You might have more than one canonical depository, but their contents do not overlap; e.g. one canonical depository for text files and another for movies.

    Canonical depositories can be stored on removable media, e.g. a USB stick or external hard drive. This is convenient if the files are used on more than one computer, e.g. your laptop and your desktop.

  • cache depository

    This is a redundant copy of all or part of a canonical depository. Each canonical depository should have at least one cache depository containing an entire copy (that's what makes this a backup system). It might also have additional cache depositories for convenient access. For example, I might want to keep a copy of my software documentation on my netbook.

    Files saved directly to a cache depository are in jeopardy: they are apt to be overwritten or deleted the next time the cache is updated. So don't save files there.

  • incoming depository

    This contains files which have not yet been put into a canonical depository (perhaps because the filesystem on which the canonical depository is stored is not available at the moment) but which are intended to be put there soon.

With these definitions, we can say that a canonical depository is backed up if and only if every subject in it exists in at least one cache depository which resides on a different physical medium than the canonical depository.

Note that two partitions on the same hard disk are not different physical media!
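The definition above can be written as a small check. This is a minimal Python sketch, not a tool from this repository. It assumes you supply `medium_of`, a hypothetical mapping from each depository path to a label for the physical medium it lives on (say, a disk's serial number). Device IDs from `os.stat` are deliberately not used, because two partitions on the same disk have different device IDs but are the same physical medium. The sketch only checks that each subject directory exists in some suitable cache; it does not check that the cached copy is up to date.

```python
import os
from pathlib import Path


def subjects(depository):
    """Yield every subject (a subdirectory at any depth) as a relative path."""
    root = Path(depository)
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            yield (Path(dirpath) / name).relative_to(root)


def is_backed_up(canonical, caches, medium_of):
    """True if every subject of `canonical` exists in at least one cache
    depository residing on a different physical medium.

    `medium_of` (hypothetical, user-supplied) maps each depository path to a
    medium label. Two partitions on one disk must share a label.
    """
    home = medium_of[canonical]
    # Caches on the same physical medium do not count as backups.
    offsite = [c for c in caches if medium_of[c] != home]
    return all(
        any((Path(c) / subj).is_dir() for c in offsite)
        for subj in subjects(canonical)
    )
```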

The goal of the backup system is that every interesting file is in a canonical depository, and that every canonical depository is backed up.

To accomplish this goal, there are two important actions on depositories.

Update a cache depository from a canonical depository

To update a cache depository from a canonical depository,

rsync --archive --verbose --delete $CANONICAL/subject/ $CACHE/subject/

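To check whether a cache depository is up to date with a canonical depository, do a dry run. With --dry-run, rsync reports what it would transfer or delete without changing anything, so if it lists no changes, the cache is up to date: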

rsync --archive --verbose --delete --dry-run $CANONICAL/subject/ $CACHE/subject/

We might provide a lightweight wrapper for those.
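One possible shape for such a wrapper, as a Python sketch. The names `rsync_update_args` and `update_cache` are hypothetical and not part of this repository; the rsync flags are exactly the ones shown above. The main thing the wrapper does is normalize the trailing slashes, so that rsync copies the contents of the subject directory rather than nesting the directory inside itself.

```python
import subprocess


def rsync_update_args(canonical, cache, subject, dry_run=False):
    """Build the rsync command that makes cache/subject mirror
    canonical/subject. Both paths always end in exactly one slash."""
    src = f"{canonical.rstrip('/')}/{subject.strip('/')}/"
    dst = f"{cache.rstrip('/')}/{subject.strip('/')}/"
    args = ["rsync", "--archive", "--verbose", "--delete"]
    if dry_run:
        # Report what would change, without changing anything.
        args.append("--dry-run")
    return args + [src, dst]


def update_cache(canonical, cache, subject, dry_run=False):
    """Run rsync; raises subprocess.CalledProcessError if rsync fails."""
    subprocess.run(
        rsync_update_args(canonical, cache, subject, dry_run), check=True
    )
```

For example, `update_cache("/mnt/canon", "/backup/cache", "art/paintings", dry_run=True)` would only report the differences for that one subject.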

Add the contents of an incoming depository to a canonical depository

To update a canonical depository from an incoming depository,

deposit $CANONICAL $INCOMING

In fact this is virtually the same as saying

rsync --archive --verbose $INCOMING/ $CANONICAL/

but it also does some consistency checking: it makes sure you don't trample a file in the canonical depository with a different file of the same name from the incoming depository. (Also, rsync is picky about trailing slashes on directory names: without the trailing slash on the source, rsync copies the incoming directory itself into the canonical depository instead of its contents. So deposit helps there too.)
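The "don't trample" check can be sketched in Python as follows. This is a guess at the idea, not the actual code of deposit; `conflicts` is a hypothetical name. A file counts as a conflict when the canonical depository already has something at the same relative path whose contents differ.

```python
import filecmp
import os
from pathlib import Path


def conflicts(canonical, incoming):
    """Return relative paths of files in `incoming` that would overwrite a
    different file (or a directory) at the same path in `canonical`."""
    canonical, incoming = Path(canonical), Path(incoming)
    clashes = []
    for dirpath, _dirnames, filenames in os.walk(incoming):
        for name in filenames:
            rel = (Path(dirpath) / name).relative_to(incoming)
            target = canonical / rel
            # shallow=False compares file contents byte by byte instead of
            # trusting size and modification time.
            if target.exists() and not filecmp.cmp(
                incoming / rel, target, shallow=False
            ):
                clashes.append(rel)
    return sorted(clashes)
```

A deposit-style tool could refuse to copy anything while this list is non-empty; identical files at the same path are harmless and are not reported.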

The deposit tool takes an additional flag, --clean, which deletes all the files (but not the directories) in the incoming depository after they are copied over. However, as a safety check, it will not delete anything unless the files have already been copied into the canonical depository and are identical there.
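The safety check behind --clean might look like this Python sketch. It is an assumption about the behaviour, not deposit's real implementation: `clean_incoming` is a hypothetical name, and this version is all-or-nothing, refusing to delete any file unless every file in the incoming depository has a byte-identical copy at the same path in the canonical depository. Directories are left in place, as the text describes.

```python
import filecmp
import os
from pathlib import Path


def clean_incoming(canonical, incoming):
    """Delete files (not directories) from `incoming`, but only if all of
    them already have byte-identical copies in `canonical`.
    Returns the sorted list of deleted relative paths."""
    canonical, incoming = Path(canonical), Path(incoming)
    files = [
        (Path(dirpath) / name).relative_to(incoming)
        for dirpath, _dirnames, filenames in os.walk(incoming)
        for name in filenames
    ]
    not_deposited = [
        rel for rel in files
        if not (canonical / rel).is_file()
        or not filecmp.cmp(incoming / rel, canonical / rel, shallow=False)
    ]
    if not_deposited:
        # Refuse entirely: nothing has been deleted at this point.
        raise RuntimeError(
            f"not yet safely deposited: {sorted(map(str, not_deposited))}"
        )
    for rel in files:
        (incoming / rel).unlink()
    return sorted(files)
```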

It is an excellent idea to update a cache depository from the canonical depository after running deposit (and before running deposit --clean), so that the newly deposited files are backed up before their incoming copies are deleted.

Advanced Topics in Backup Subjects

Because subjects are hierarchical, it is also possible to think of (and treat) a subject as a sort of sub-depository.

It is up to you to decide if you have any backup use cases that are complex enough to justify doing that.

It is also useful to keep certain backups in version-controlled repositories, e.g. git repos. In this case, a repo directory can usually be treated as a subject. Instead of depositing files with deposit, one can push changes to the repo-subject residing in the canonical depository.

Read-only Depositories

This needs more research, but the basic idea is to remove write permission from a cache depository recursively (e.g. chmod -R a-w) so that you don't accidentally change files in it or add files to it, and then to restore write permission (e.g. chmod -R u+w) when you need to update it from a canonical depository.
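A Python sketch of the same idea, roughly equivalent to the two chmod commands above. `set_read_only` is a hypothetical helper, not a tool in this repository. Symlinks are skipped, because `os.chmod` would follow them and change their targets. Note that permission bits do not stop root from writing.

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def _set_write(path, writable):
    mode = stat.S_IMODE(os.lstat(path).st_mode)
    # u+w when making writable; a-w (clear every write bit) otherwise.
    os.chmod(path, (mode | stat.S_IWUSR) if writable else (mode & ~WRITE_BITS))


def set_read_only(depository, read_only=True):
    """Recursively clear all write bits (like `chmod -R a-w`), or with
    read_only=False give the owner write permission back (like
    `chmod -R u+w`)."""
    # Walking still works on read-only directories: listing and
    # traversal need only read and execute permission.
    for dirpath, dirnames, filenames in os.walk(depository):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                _set_write(path, not read_only)
    _set_write(depository, not read_only)
```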

Additional tools

It is unlikely that you will always use a backup system perfectly, and even less likely that you started backing up your files with a perfect backup system. It is in fact likely that at some point you just copied important files from one place to another to ensure that you had a backup copy.

Thus, Space Madness includes some tools to assist with cleaning up backups and making them meaningful.

They are:

  • find-unique

  • find-dups

We should eventually document them here.
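Until then, here is a Python sketch of the kind of thing tools with these names could do. It is only a guess and does not describe how find-unique and find-dups actually work. Files are compared by a SHA-256 digest of their contents, so a duplicate is found regardless of its name or location. `find_dups` reports groups of files with identical contents; `find_unique` reports files under one tree whose contents appear nowhere under the others, i.e. candidates that may still need a backup.

```python
import hashlib
import os
from collections import defaultdict


def content_hash(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digests(root):
    """Map content digest -> list of regular-file paths under `root`."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[content_hash(path)].append(path)
    return by_digest


def find_dups(*roots):
    """Groups of two or more files, across all roots, with identical content."""
    merged = defaultdict(list)
    for root in roots:
        for digest, paths in digests(root).items():
            merged[digest].extend(paths)
    return [sorted(paths) for paths in merged.values() if len(paths) > 1]


def find_unique(root, *others):
    """Files under `root` whose content appears nowhere under `others`."""
    seen = set()
    for other in others:
        seen.update(digests(other))
    return sorted(
        path
        for digest, paths in digests(root).items()
        if digest not in seen
        for path in paths
    )
```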