`ellsync`
=========

<!--
Copyright (c) 2024 Chris Pressey, Cat's Eye Technologies
This file is distributed under an MIT license.  See LICENSES directory.
SPDX-License-Identifier: LicenseRef-MIT-X-ellsync
-->

_Version 0.6_
| _Entry_ [@ catseye.tc](https://catseye.tc/node/ellsync)
| _See also:_ [yastasoti](https://codeberg.org/catseye/yastasoti#yastasoti)
∘ [tagfarm](https://codeberg.org/catseye/tagfarm#tagfarm)
∘ [shelf](https://codeberg.org/catseye/shelf#shelf)

- - - -

<img align="right" src="images/ellsync-logo.png?raw=true" />

**`ellsync`** is an opinionated poka-yoke for [`rsync`][].

*   [opinionated][]: it was designed for a particular use case for `rsync`
    (offline backups).
*   [poka-yoke][]: it exposes a restricted interface to `rsync`, which
    prevents using it in dangerous ways.

Because the restricted interface that `ellsync` presents can be accessed
by shorthand form, it also happens to provide some convenience over
using `rsync` directly — but its real purpose is to increase safety.
(I've been burned more than once when I've made a mistake using `rsync`.)

Quick start
-----------

Make sure you have Python (2.7 or 3.x) installed, clone this repository,
and put its `bin` directory on your executable search path.  You will
then be able to run `ellsync` from your terminal.

Usage guide
-----------

### Backup router

`ellsync`'s operation is based on a *backup router* which is a JSON file
that looks like this:

    {
        "art": {
            "from": "/media/user/External1/art/",
            "to": "/home/user/art/"
        }
    }

In this, `art` is the name of a _backup stream_, in which files in
`/media/user/External1/art/` (called the *canonical*) are periodically
synced to `/home/user/art/` (called the *cache*).

The idea is that all changes to the contents of the canonical directory
are bona fide changes, but any change to the contents of the cache can be
discarded.
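
As a rough illustration of the router format, the following Python sketch parses a router and checks that every stream has a `from` and a `to` directory ending in a slash. The function name `load_router` and the validation rules are assumptions made for this example; they are not taken from `ellsync`'s source.

```python
import json


def load_router(text):
    """Parse backup-router JSON and sanity-check each stream (illustrative)."""
    router = json.loads(text)
    for name, stream in router.items():
        for key in ("from", "to"):
            if key not in stream:
                raise ValueError("stream %r is missing %r" % (name, key))
            # Assumption: reject directories lacking the trailing slash,
            # since the router is documented as needing them.
            if not stream[key].endswith("/"):
                raise ValueError(
                    "%r in stream %r must end in a slash" % (key, name)
                )
    return router
```

Calling `load_router(open("router.json").read())` on the router above would return a dictionary with the single key `art`.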

### `sync` command

With the above router saved as `router.json` we can then say

    ellsync router.json sync art

and this will in effect run

    rsync --archive --verbose --delete --dry-run /media/user/External1/art/ /home/user/art/

Note that by default it only runs a `--dry-run`.  It's a good practice to
do a dry run first, to see what will be changed.  As a bonus, the files
involved will often remain in the filesystem cache, meaning a subsequent
actual run will go quite quickly.  To do that actual run, use `--apply`:

    ellsync router.json sync art --apply

Note that, since the contents of the canonical and the cache normally
have the same directory structure, `ellsync` allows specifying that
only a subdirectory of a stream is to be synced:

    ellsync router.json sync art:painting/oil/ --apply

While `rsync` is sensitive about whether a directory name ends in a slash or
not, `ellsync` detects when a trailing slash is missing and adds it.  Thus

    ellsync router.json sync art:painting/oil --apply

will work as well as the above.  (But note that the directories specified
in the router *do* need to have the trailing slashes.)
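
The mapping from a stream specifier to an `rsync` command line can be sketched in Python roughly as follows. This is a minimal sketch, not `ellsync`'s actual code: the helper names `parse_stream_spec` and `rsync_argv` are hypothetical, and the flags are simply those shown in the example above, with the canonical (`from`) directory as the source.

```python
def parse_stream_spec(spec):
    """Split 'art:painting/oil' into ('art', 'painting/oil/')."""
    name, _, subdir = spec.partition(":")
    # Add the trailing slash to the subdirectory if it is missing.
    if subdir and not subdir.endswith("/"):
        subdir += "/"
    return name, subdir


def rsync_argv(router, spec, apply=False):
    """Build an rsync argument list for one stream (illustrative only)."""
    name, subdir = parse_stream_spec(spec)
    stream = router[name]
    argv = ["rsync", "--archive", "--verbose", "--delete"]
    if not apply:
        # Dry run unless the caller explicitly asks to apply changes.
        argv.append("--dry-run")
    # Router directories already end in a slash, so plain
    # concatenation yields a well-formed path on both sides.
    argv.append(stream["from"] + subdir)
    argv.append(stream["to"] + subdir)
    return argv
```

Building an argument list, rather than a single command string, means the result can be passed to `subprocess.call` without involving the shell.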

#### `--thorough` option

By default, `rsync` skips an existing file if the destination file has the same
size and modification time as the source file.

However, this means that if the destination file has become corrupted (a
not-uncommon occurrence on inexpensive removable media), `rsync` will not attempt
to repair the corruption, as the size and timestamp of the corrupted file did not change.

To compensate for this, `ellsync` provides the `--thorough` option:

    ellsync router.json sync art:painting/oil --thorough

This invokes `rsync` with the `--checksum` flag, to force it to do a thorough
check of the files.  See `man rsync` for more details.
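
To see why a content check catches what a metadata check misses, here is a small Python sketch contrasting the two. It is only an analogy for what `rsync` does: the function names are made up for this example, and SHA-256 is a stand-in for whatever checksum `rsync` actually uses.

```python
import hashlib
import os


def quick_check_equal(path_a, path_b):
    """Compare two files by size and modification time only."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    return (stat_a.st_size == stat_b.st_size
            and int(stat_a.st_mtime) == int(stat_b.st_mtime))


def file_digest(path):
    """Hash a file's contents in chunks, so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_equal(path_a, path_b):
    """Compare two files by content, ignoring their metadata."""
    return file_digest(path_a) == file_digest(path_b)
```

A corrupted copy that keeps its original size and timestamp passes `quick_check_equal` but fails `checksum_equal`. The cost is that every file must be read in full on both sides.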

#### `--reverse` option

This flag causes the meaning of the `from` and the `to` of the selected
route to be flipped.  To prevent this from being done accidentally (which
could be catastrophic, file-backup-wise), the user must confirm the action:
a file called `.reverse-to-here` must be present in the (non-usual) destination
directory before the reverse sync takes place.  In practice, the user runs
`touch destination/.reverse-to-here` before running the reverse sync, and
the reverse sync deletes the `.reverse-to-here` file as part of its operation.
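
The confirmation check might look something like the following Python sketch. The function name `reversed_route` and the choice of exception are assumptions for illustration. The sketch only checks for the marker file and leaves its removal to the sync step.

```python
import os


def reversed_route(stream, marker_name=".reverse-to-here"):
    """Return the stream with 'from' and 'to' flipped, but only if the
    marker file exists in the directory that will be written to."""
    new_from, new_to = stream["to"], stream["from"]
    marker = os.path.join(new_to, marker_name)
    if not os.path.isfile(marker):
        # Refuse: the user has not confirmed that this directory
        # may be overwritten.
        raise RuntimeError("refusing to reverse: %s not found" % marker)
    return {"from": new_from, "to": new_to}
```

Requiring a file inside the destination itself means the confirmation is tied to one specific directory, so it cannot be given by habit or carried over to a different stream.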

### `verify` command

This provides a more robust and flexible option than the `--thorough` option
to `sync` for verifying that the contents on both sides of the stream are
identical.

    ellsync router.json verify art:painting

This creates a list of all files under `/media/user/External1/art/painting/`
in a text file in your temporary directory.  It then goes through each file in
that list and runs `diff` on it, comparing it to its corresponding file in
`/home/user/art/painting/`.  If there are any differences, the name of the
file is printed out.

The reason the filenames are written to a text file is that for huge filesets,
or on slow media (or both), verification of all files can take a long, long
time.  The user may wish to interrupt it before it is done, and continue it
later.  Persisting the list of files to a text file supports this.  If the
text file already exists, the user can supply the `--continue-from` option
naming a file to continue the verification process from.  All files up until
that file in the existing text file will be skipped.
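
The list-then-compare approach, including resuming, can be sketched in Python as below. This is an illustration under stated assumptions rather than `ellsync`'s implementation: it uses `filecmp.cmp` in place of `diff`, the function names are hypothetical, and it assumes filenames do not contain newlines.

```python
import filecmp
import os


def write_file_list(src_root, list_path):
    """Write the relative path of every file under src_root, one per line."""
    with open(list_path, "w") as out:
        for dirpath, dirnames, filenames in os.walk(src_root):
            dirnames.sort()  # make the traversal order reproducible
            for filename in sorted(filenames):
                full = os.path.join(dirpath, filename)
                out.write(os.path.relpath(full, src_root) + "\n")


def verify_from_list(src_root, dest_root, list_path, continue_from=None):
    """Return the listed files whose contents differ between the roots.

    If continue_from is given, entries before it in the list are skipped.
    """
    differing = []
    skipping = continue_from is not None
    with open(list_path) as listing:
        for line in listing:
            rel = line.rstrip("\n")
            if skipping:
                if rel != continue_from:
                    continue
                skipping = False  # resume with this entry
            src = os.path.join(src_root, rel)
            dest = os.path.join(dest_root, rel)
            # A missing destination file counts as a difference.
            if not os.path.isfile(dest) or not filecmp.cmp(src, dest, shallow=False):
                differing.append(rel)
    return differing
```

Because the list is generated once and then read in order, an interrupted run can be resumed by passing the last filename it reported working on as `continue_from`.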

### `list` command

Either the canonical or the cache (or both) may be offline storage (removable
media), so neither directory is assumed to exist (it might not exist
if the volume is not mounted).  If either of the directories does not exist,
`ellsync` will refuse to use this backup stream.  Based on this, there is a
subcommand to list which streams are, at the moment, backupable:

    ellsync router.json list

In practice, this command is often the command you run first, before running
`sync`, as it will let you know what backup streams are available to sync.
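
The availability rule described above amounts to a simple filter over the router. A minimal Python sketch, assuming a hypothetical function name and a router already loaded as a dictionary:

```python
import os


def available_streams(router):
    """Return, sorted by name, the streams whose 'from' and 'to'
    directories both currently exist."""
    return sorted(
        name
        for name, stream in router.items()
        if os.path.isdir(stream["from"]) and os.path.isdir(stream["to"])
    )
```

A stream whose external volume is unmounted simply drops out of the result until the volume is mounted again.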

### `rename` command

Sometimes you want to rename a subdirectory somewhere under the canonical of
one of the streams.  It's completely fine to do this, but the next time it is synced,
`rsync` will treat it, in the cache, as the old subdirectory being deleted and
a new subdirectory being created.  If there are a large number of files in the
subdirectory, this delete-and-create sync can take a long time.  It's also not
obvious from `rsync`'s logging output that everything being deleted is also being
created somewhere else.

To ease this situation, `ellsync` has a `rename` command that works like so:

    ellsync router.json rename art: sclupture sculpture

This renames the `/media/user/External1/art/sclupture` directory to
`/media/user/External1/art/sculpture` and *also* renames the `/home/user/art/sclupture`
directory to `/home/user/art/sculpture`.  If the contents of the source and
destination directories were in sync before this rename occurred, they will
continue to be in sync after the rename happens.
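
In outline, the paired rename could be written like the Python sketch below. The function name and the up-front checks are assumptions made for this example; the point is that both sides are validated before either is touched, so a failure does not leave the two sides out of step.

```python
import os


def rename_in_stream(stream, subdir, old_name, new_name):
    """Rename a directory under both the canonical and the cache."""
    planned = []
    for root in (stream["from"], stream["to"]):
        base = os.path.join(root, subdir)
        src = os.path.join(base, old_name)
        dest = os.path.join(base, new_name)
        # Check both sides first, before renaming anything.
        if not os.path.isdir(src):
            raise ValueError("not a directory: %s" % src)
        if os.path.exists(dest):
            raise ValueError("already exists: %s" % dest)
        planned.append((src, dest))
    for src, dest in planned:
        os.rename(src, dest)
```

With the example router, `rename_in_stream(router["art"], "", "sclupture", "sculpture")` would correspond to the command shown above.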

Hints and Tips
--------------

You might have a router you use almost always, in which case you might
want to establish an alias like

    alias myellsync="ellsync $HOME/my-standard-router.json"

(or whatever.)

TODO
----

*   If `rsync` encounters an error, it will abort, having only partially completed.
    In particular, if it encounters a directory which it cannot read, because it
    is for example owned by another user and not world-readable, it will abort.
    `ellsync` does not currently detect this properly.  It should be made to handle
    it gracefully, if possible.
*   (Aspirational) Ability to convert the backup router to a `dot` file (`graphviz`)
    so that the relationships between the streams can be easily visualized.

History
-------

### 0.6

Added the `verify` subcommand.

Added the `--reverse` mode of operation to the `sync` subcommand.

### 0.5

The output of the `list` subcommand is now sorted by stream name.

The `sync` subcommand now supports multiple streams.  Streams are synced
in the order they are given on the command line.  OS-level `sync` will only be
performed once, at the very end.

A bash tab-completion script is included in the `script` directory.  It enables
tab-completion of both subcommand names, and stream names in the `sync` subcommand.

Internally, shell expansion is no longer used when executing system commands, and
several new tests have been added to the test suite.

### 0.4

The `:` in a backup stream identifier is optional, when no subdirectory is being
specified.

### 0.3

Argument parser was refactored to use subparsers, improving usage info and usage
error output.

Removed `syncdirs` as it introduces some redundancy and I never use it.

After `sync` is performed, the system `sync` command is run, to ensure all buffers
are flushed to devices before the `ellsync` tool actually exits.

The `--thorough` option now invokes `rsync` with the `--checksum` flag, to cause it
to thoroughly check if files differ, even if their datestamps have not changed.

Added `--stream-name-only` option to `list` command.

### 0.2

Every piece of `ellsync` functionality has an explicit subcommand (`list` and `sync` to
start.)

`sync` was split into `sync` (takes a stream) and `syncdirs` (takes to and
from dirs).

Added `rename` command.

### 0.1

Initial release.

[`rsync`]: https://rsync.samba.org/
[opinionated]: https://softwareengineering.stackexchange.com/questions/12182/what-does-opinionated-software-really-mean
[poka-yoke]: https://en.wikipedia.org/wiki/Poka-yoke