Space Madness
Space Madness is a backup system. It is evolving. It is my attempt to describe the backup system that I try to use. This repository contains those notes, and some tools that may or may not help in implementing this backup system.
The main tool used here is rsync. If it is not installed as a standard part
of your operating system, install it first.
Depositories
The main unit of organizing files in Space Madness is the depository, which
is a directory tree. The subdirectories in a depository are called subjects.
Subjects are hierarchical (for example, you might have art and art/paintings
and art/paintings/watercolours).
A subject is expected to have the same content in all the different depositories in which it appears. (This is not 100% true, but it's useful to think of them this way.)
There are three kinds of depositories:
- canonical depository
This contains the canonical copy of the files it contains. You might have more than one canonical depository, but their contents do not overlap; e.g. one canonical depository for text files, another canonical depository for movies.
Canonical depositories can be stored on removable media, e.g. a USB stick or external hard drive. This is convenient if the files are used on more than one computer, e.g. your laptop and your desktop.
- cache depository
This is a redundant copy of a canonical depository or part of a canonical depository. Each canonical depository should have at least one cache depository containing a complete copy (that's what makes this a backup system). But it might also have cache depositories for convenience of access. For example, I might want to keep a copy of my software documentation on my netbook.
Files saved to a cache depository are in jeopardy; they are apt to be overwritten or deleted when the cache is updated. So don't do that.
- incoming depository
This contains files which have not yet been put into a canonical depository (perhaps because the filesystem on which the canonical depository is stored is not available at the moment) but which are intended to be put there soon.
With these definitions, we can say that a canonical depository is backed up if and only if every subject in it exists in at least one cache depository which resides on a different physical medium than the canonical depository.
Note that two partitions on the same hard disk are not different physical media!
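Part of this definition can be checked mechanically. The following is a minimal sketch, assuming that "subject" means a top-level subdirectory and using a hypothetical helper name, missing_subjects, that is not part of Space Madness. It only checks that each subject has a directory of the same name in the cache; it does not compare file contents, and it cannot tell whether the two depositories are on different physical media.

```shell
# missing_subjects CANONICAL CACHE
# Print each top-level subject of CANONICAL that has no directory of the
# same name in CACHE. (Hypothetical helper, not part of Space Madness.)
# It does NOT verify that the two depositories are on different physical
# media, and it does not compare file contents.
missing_subjects() {
    canonical=$1
    cache=$2
    for dir in "$canonical"/*/; do
        # If CANONICAL has no subdirectories the glob stays unexpanded.
        [ -d "$dir" ] || continue
        subject=$(basename "$dir")
        [ -d "$cache/$subject" ] || printf '%s\n' "$subject"
    done
}
```

If this prints nothing, every top-level subject has at least a counterpart directory in that cache; the rsync --dry-run check described further on is what compares the actual contents.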
The goal of the backup system is that every interesting file is in a canonical depository, and that every canonical depository is backed up.
To accomplish this goal, there are two important actions on depositories.
Update a cache depository from a canonical depository
To update a cache depository from a canonical depository,
rsync --archive --verbose --delete $CANONICAL/subject/ $CACHE/subject/
To check if a cache depository is up-to-date with a canonical depository,
rsync --archive --verbose --delete --dry-run $CANONICAL/subject/ $CACHE/subject/
We might provide a lightweight wrapper for those.
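Such a wrapper does not exist yet, so the following is only a sketch of what it could look like. The function names cache_update and cache_check and the argument order are assumptions made for illustration; the rsync options are the ones shown above.

```shell
# Hypothetical wrapper functions; the names cache_update and cache_check
# are illustrative and not part of Space Madness.

# cache_update CANONICAL CACHE SUBJECT
# Make CACHE/SUBJECT an exact copy of CANONICAL/SUBJECT.
cache_update() {
    # The trailing slashes are added here so the caller cannot forget them.
    rsync --archive --verbose --delete "$1/$3/" "$2/$3/"
}

# cache_check CANONICAL CACHE SUBJECT
# Show what cache_update would change, without changing anything.
cache_check() {
    rsync --archive --verbose --delete --dry-run "$1/$3/" "$2/$3/"
}
```

With these defined, something like cache_check /media/usb/canonical /home/me/cache art/paintings (both paths are made up) would list pending changes for one subject, and cache_update with the same arguments would apply them.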
Add the contents of an incoming depository to a canonical depository
To update a canonical depository from an incoming depository,
deposit $CANONICAL $INCOMING
In fact this is virtually the same as saying
rsync --archive --verbose $INCOMING/ $CANONICAL/
but incorporates some consistency checking, i.e. that you don't trample
something in the canonical depository with something in the incoming with the
same name. (Also, rsync is pretty picky about directory names: if you forget
the trailing slash on the source, it copies the directory itself into the
destination instead of copying its contents. So it helps there too.)
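To make the consistency check concrete, here is a minimal sketch of one way to detect files that would be trampled. It is not taken from the deposit tool, which may perform its check differently; find_conflicts is a hypothetical helper name.

```shell
# find_conflicts INCOMING CANONICAL
# Print the relative path of every file in INCOMING that already exists in
# CANONICAL with different contents. (Hypothetical helper; the real deposit
# tool may perform its check differently.)
find_conflicts() {
    incoming=$1
    canonical=$2
    # Run find from inside INCOMING so that paths come out relative.
    # Note: file names containing newlines are not handled by this sketch.
    (cd "$incoming" && find . -type f) | while IFS= read -r path; do
        if [ -f "$canonical/$path" ] && ! cmp -s "$incoming/$path" "$canonical/$path"; then
            printf '%s\n' "${path#./}"
        fi
    done
}
```

Files that exist only in the incoming depository, or that are already identical in both places, are not reported, since copying them cannot destroy anything.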
The deposit tool takes an additional flag, --clean, which deletes all the
files (but not the directories) in the incoming depository after they are
copied over. However, as a safety check, it will not function unless the files
have already been copied into the canonical depository, and are identical.
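The safety check can be pictured roughly as follows. This is a sketch under the assumption that the check is all-or-nothing (nothing is deleted if any file lacks an identical copy); clean_incoming is a hypothetical name and this is not the deposit tool's own code.

```shell
# clean_incoming INCOMING CANONICAL
# Delete the files in INCOMING, but only if every one of them has a
# byte-identical copy at the same relative path in CANONICAL.
# Directories are left in place. (Hypothetical helper, not deposit itself.)
clean_incoming() {
    incoming=$1
    canonical=$2
    # First pass: collect files that lack an identical copy in CANONICAL.
    unsafe=$(
        (cd "$incoming" && find . -type f) | while IFS= read -r path; do
            cmp -s "$incoming/$path" "$canonical/$path" || printf '%s\n' "${path#./}"
        done
    )
    if [ -n "$unsafe" ]; then
        printf 'not deposited yet, refusing to clean:\n%s\n' "$unsafe" >&2
        return 1
    fi
    # Second pass: every file is safely deposited, so remove the originals.
    (cd "$incoming" && find . -type f -exec rm {} +)
}
```

Comparing contents with cmp rather than just checking that a file of the same name exists is what guards against deleting the only copy of a changed file.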
It is an excellent idea to update a cache depository from the canonical
depository after running deposit (and before running deposit --clean).
Advanced Topics in Backup Subjects
Because subjects are hierarchical, it is also possible to think of (and treat) a subject as a sort of sub-depository.
It is up to you to decide if you have any backup use cases that are complex enough to justify doing that.
It is also useful to keep certain backups in version-controlled repositories,
e.g. git repos. In this case, a repo directory can usually be treated as
a subject. Instead of depositing files with deposit, one can push changes
to the repo-subject residing in the canonical depository.
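As a rough illustration of pushing to a repo-subject, the sketch below wraps a single git push. The helper name push_subject is hypothetical, and it assumes the repo-subject in the canonical depository is a bare repository, because git by default refuses a push to the checked-out branch of a non-bare one.

```shell
# push_subject WORKDIR CANONICAL SUBJECT
# Push the current branch of the working repository in WORKDIR to the
# repo-subject stored at CANONICAL/SUBJECT.
# Assumption: the repo-subject is a bare repository (git init --bare),
# since git by default refuses pushes to the checked-out branch of a
# non-bare repository.
push_subject() {
    git -C "$1" push "$2/$3" HEAD
}
```

For example, push_subject ~/notes /media/usb/canonical notes.git (all three arguments are made-up names) would send the current branch of ~/notes to the copy kept in the canonical depository.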
Read-only Depositories
Need to research this, but the basic idea would be to chmod -R a
cache depository so that you don't accidentally change files in it
or add files to it. (You would then need to change the permissions back
whenever you update it from a canonical depository.)
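Since this idea is still unresearched, the following is only a sketch of how it might look. The function names lock_cache and unlock_cache are hypothetical, and the choice of a-w and u+w as the permission changes is an assumption. Note also that a user with root privileges can still write to a locked cache.

```shell
# lock_cache CACHE: remove write permission for everyone, recursively.
# (Hypothetical helper; the a-w mode is one possible choice.)
lock_cache() {
    chmod -R a-w "$1"
}

# unlock_cache CACHE: give write permission back to the owner only,
# e.g. just before updating the cache with rsync.
unlock_cache() {
    chmod -R u+w "$1"
}
```

An update would then be bracketed by the two calls: unlock_cache, run the rsync update, then lock_cache again.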
Additional tools
It is unlikely that you will always use a backup system perfectly, and even less likely that you started backing up your files with a perfect backup system. It is in fact likely that at some point you just copied important files from one place to another to ensure that you had a backup copy.
Thus, Space Madness includes some tools to assist with cleaning up backups and making them meaningful.
They are:
- find-unique
- find-dups
We should eventually document them here.
Commit History
@master
git clone https://git.catseye.tc/Space-Madness/
- Add --dry-run option to `deposit` utility. Chris Pressey 7 years ago
- Define 'backed up' for depositories, and talk a bit about repos. Chris Pressey 7 years ago
- Add headings and more notes to README and reformat to 80 columns. Chris Pressey 7 years ago
- Explain things more fully in the README. Chris Pressey 7 years ago
- Initial import of tools, and rudimentary documentation in README. Chris Pressey 7 years ago
- Initial commit Chris Pressey 7 years ago