Prerequisites

  1. A system on the network with sufficient free storage
  2. openssh-server and rsync installed on the remote system
  3. A user on the remote system, say remoteuser, that can log in over SSH with key-based, passwordless authentication and has sudo access (see the sketch after this list)
  4. remoteuser configured in /etc/sudoers for passwordless sudo access to rsync (add the line remoteuser ALL=NOPASSWD:/usr/bin/rsync)
  5. Firewall ports open for SSH on the remote system (rsync here runs over SSH, so no separate rsync port is needed)
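
For prerequisite 3, a minimal sketch of the key setup, run on the local system; the host name network-1.local matches the one used later in this guide:

# Generate a key pair if one does not already exist
ssh-keygen -t ed25519

# Install the public key on the remote system
ssh-copy-id remoteuser@network-1.local

# Verify passwordless login works
ssh remoteuser@network-1.local true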

If the remote system is managed with Ansible, prerequisite 4 can be automated with a lineinfile task:

- name: Allow remoteuser to run rsync via passwordless sudo
  lineinfile:
    path: /etc/sudoers
    state: present
    regexp: '^remoteuser ALL=NOPASSWD:/usr/bin/rsync'
    line: 'remoteuser ALL=NOPASSWD:/usr/bin/rsync'
    validate: '/usr/sbin/visudo -cf %s'
  become: true
  tags:
    - rsync
Ansible task
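
Assuming the task lives in a playbook, it can be applied on its own via its tag; the playbook and inventory file names here are hypothetical:

ansible-playbook -i inventory.yml backup-setup.yml --tags rsync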

Create a systemd service unit

[Unit]
Description=Data backup
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
Nice=19
StandardOutput=journal
IOSchedulingClass=best-effort
IOSchedulingPriority=5
ExecStart=/usr/bin/rsync \
      --rsync-path="sudo rsync" \
      --archive \
      --prune-empty-dirs \
      --compress \
      --update \
      --quiet \
      --acls \
      --xattrs \
      --progress \
      --human-readable \
      --exclude node_modules \
      --exclude .DS_Store \
      --exclude '*cache*' \
      --exclude tmp \
      /path/to/local/source/data \
      remoteuser@network-1.local:/path/to/remote/archive/data

[Install]
WantedBy=multi-user.target
/etc/systemd/system/data-backup.service

In the above unit file, change:

  • /path/to/local/source/data to point to the source data directory.
  • remoteuser@network-1.local to the user and host to access via SSH.
  • /path/to/remote/archive/data to the destination directory on the remote system.

Note that --rsync-path="sudo rsync" makes the remote end run rsync under sudo, which is why prerequisite 4 grants remoteuser passwordless sudo for rsync.

You can copy the final rsync command and run it in a shell with the --dry-run switch (and remove --quiet) to ensure it works as intended.
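
For example, a dry run using the same placeholder paths, with --dry-run and --verbose added and --quiet removed so the output is visible:

/usr/bin/rsync --dry-run --verbose \
      --rsync-path="sudo rsync" \
      --archive --prune-empty-dirs --compress --update \
      --acls --xattrs --human-readable \
      --exclude node_modules --exclude .DS_Store \
      --exclude '*cache*' --exclude tmp \
      /path/to/local/source/data \
      remoteuser@network-1.local:/path/to/remote/archive/data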

Create a systemd timer

Set up a systemd timer to run the backup task daily. systemd offers many options for controlling the frequency and nature of repetition; for example, OnBootSec=15min and OnUnitActiveSec=15min under [Timer] run the unit every 15 minutes in a non-overlapping fashion (a sketch follows the unit file below).

[Unit]
Description=Data backup timer
Requires=data-backup.service

[Timer]
OnCalendar=daily
Unit=data-backup.service

[Install]
WantedBy=timers.target
/etc/systemd/system/data-backup.timer
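
For reference, a sketch of the 15-minute variant mentioned above; only the [Timer] section changes:

[Timer]
OnBootSec=15min
OnUnitActiveSec=15min
Unit=data-backup.service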

As root, enable and run both systemd units:

systemctl daemon-reload

systemctl enable data-backup.service
systemctl enable data-backup.timer

systemctl start data-backup.service
systemctl start data-backup.timer

# Ensure all is well
journalctl -f -u data-backup.service
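
# Confirm the timer is scheduled and see its last/next run times
systemctl list-timers data-backup.timer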

Background

My system has a paltry 250 GB internal disk that keeps running out of space. The problem gets worse when I build large, complex projects from source.

I don't like the idea of upgrading to a larger SSD because it makes full disk backups slower and harder. I (stubbornly) believe all programming-related data (code, not training data sets) that is useful and worth long-term storage should realistically fit in ~100 GB. That makes incremental backups easier. Everything else is transient: node_modules, build objects and intermediate artifacts, docker images/volumes, npm/pip/gradle/mvn package caches, etc.

SSDs also have a finite lifespan, which makes the thought of a 1 TB SSD simply not waking up one day a scary one. A network backup is the least I can do.