This post documents the procedure for running Tarsnap on macOS. Tarsnap is an online backup service that runs on top of Amazon's cloud infrastructure. If you are reading this, then you probably already know a good deal about it, but suffice it to say it is a cool program.
First, we need to install it, which is easily done with Homebrew.
brew install tarsnap
It can also be compiled from source, but that is outside the scope of this post.
Once installed, it is a good idea to do a dry run with --dry-run and --print-stats to see how much space the backup will take and what the compression ratio is. Tarsnap de-duplicates the data and stores only the unique bytes.
For example, to see what uploading only the .pdf and .Rdata files from the "Analysis" folder would amount to, we can run the following command.
find /Users/xyz/Analysis -type f \( -name '*.pdf' -o -name '*.Rdata' \) -print0 | tarsnap --dry-run --no-default-config --print-stats --humanize-numbers -c --null -T-
This command does several things. First, find selects regular files ("-type f") whose names end in .pdf or .Rdata. Notice the -o, which is the "or" operator of find. The parentheses group the alternatives so that "-type f" and "-print0" apply to every pattern and not only to the last one; they are needed as soon as more than one pattern is combined with -o. "-print0" separates the filenames with the null character, so odd characters in filenames (spaces, newlines) won't break the list. The list is then piped to tarsnap. The key option there is --null, which tells tarsnap to expect the null-separated names produced by "-print0". From the tarsnap man page:
--null (use with -I, -T, or -X) Filenames or patterns are separated by
null characters, not by newlines. This is often used to read
filenames output by the -print0 option to find(1).
-T filename
(c, x, and t modes only) In x or t mode, tarsnap will read the
list of names to be extracted from filename. In c mode, tarsnap
will read names to be archived from filename. The special name
“-C” on a line by itself will cause the current directory to be
changed to the directory specified on the following line. Names
are terminated by newlines unless --null is specified. Note that
--null also disables the special handling of lines containing
“-C”. If filename is “-” then the list of names will be read
from the standard input. Note: If you are generating lists of
files using find(1), you probably want to use -n as well.
So the "-" following the "-T" option tells tarsnap to read the list of names from standard input, which is where find sends them. Running the command should produce output like this.
tarsnap: Removing leading '/' from member names
                   Total size  Compressed size
All archives           8.4 MB           3.4 MB
  (unique data)        8.4 MB           3.4 MB
This archive           8.4 MB           3.4 MB
New data               8.4 MB           3.4 MB
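Going back to the find expression for a moment: the effect of the parentheses is easy to check without tarsnap at all. The sketch below uses a throwaway directory and made-up file names (none of them come from the real Analysis folder) and counts what find prints with and without the grouping.

```shell
# Throwaway directory with a few sample files (names are made up for the demo).
demo=$(mktemp -d)
touch "$demo/report.pdf" "$demo/model.Rdata" "$demo/notes.txt" "$demo/odd name.pdf"

# Without parentheses, the implicit "and" binds tighter than -o, so -print
# is attached only to the last pattern and the .pdf files are silently dropped.
ungrouped=$(find "$demo" -type f -name '*.pdf' -o -name '*.Rdata' -print | wc -l | tr -d ' ')

# With parentheses, -type f and -print apply to both patterns.
grouped=$(find "$demo" -type f \( -name '*.pdf' -o -name '*.Rdata' \) -print | wc -l | tr -d ' ')

echo "ungrouped: $ungrouped, grouped: $grouped"   # prints "ungrouped: 1, grouped: 3"
rm -rf "$demo"
```

The demo uses -print and a line count only to keep the output readable; for the real backup, -print0 together with --null remains the safer choice.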
This is just a test, though. To actually run Tarsnap, we need to register for an account and do some configuration, for which we need:
- tarsnap.conf
- tarsnap.key
- a cache directory
According to the documentation (https://www.tarsnap.com/gettingstarted.html#configuration-file),
If you would prefer to run Tarsnap as a normal user,
cp /opt/homebrew/etc/tarsnap.conf.sample ~/.tarsnaprc
Since we will be running it as a normal user, we use the alternative above. As for "tarsnap.key", it is generated when registering the computer with the Tarsnap server:
sudo tarsnap-keygen \
--keyfile /Users/xyz/tarsnap.key \
--user me@example.com \
--machine mybox
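Because the key was generated with sudo, it is owned by root. Make sure your normal user can read it, otherwise Tarsnap won't work. Note that mode 0444 lets every account on the machine read the key; on a shared machine, changing the owner to your user and using 0600 is the tighter choice.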
sudo chmod 0444 /Users/xyz/tarsnap.key
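Now let us set up the cache directory. Create it as your normal user if it does not exist yet, then restrict its permissions:
mkdir -p /Users/xyz/tarsnap_cache
chmod 700 /Users/xyz/tarsnap_cache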
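Tarsnap picks up the key and the cache directory through the keyfile and cachedir settings in ~/.tarsnaprc. As a sanity check before the first real backup, something like the following sketch can confirm that both settings point at usable paths. The helper names rc_value and check_tarsnaprc are made up for this post, and the parsing assumes simple "name value" lines with no spaces in the paths.

```shell
# Print the value of the first uncommented occurrence of a setting in an rc file.
# Usage: rc_value FILE NAME   (hypothetical helper, not part of tarsnap)
rc_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Report whether the keyfile and cachedir named in the rc file are usable.
# Returns 0 if both look fine, 1 otherwise.
check_tarsnaprc() {
    rc=$1
    ok=0
    keyfile=$(rc_value "$rc" keyfile)
    cachedir=$(rc_value "$rc" cachedir)
    if [ -z "$keyfile" ] || [ ! -r "$keyfile" ]; then
        echo "keyfile missing or unreadable: ${keyfile:-<not set>}" >&2
        ok=1
    fi
    if [ -z "$cachedir" ] || [ ! -d "$cachedir" ] || [ ! -w "$cachedir" ]; then
        echo "cachedir missing or not writable: ${cachedir:-<not set>}" >&2
        ok=1
    fi
    return "$ok"
}

# Check the per-user config if it exists.
if [ -f "$HOME/.tarsnaprc" ]; then
    if check_tarsnaprc "$HOME/.tarsnaprc"; then
        echo "$HOME/.tarsnaprc looks usable"
    fi
fi
```

This only checks that the paths exist with sensible permissions; it does not verify that the key is actually registered with the Tarsnap server.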
Make sure your .tarsnaprc file points to the location of the key and the cache directory (the keyfile and cachedir settings). If all goes right, use this command to run the real backup:
find /Users/xyz/Analysis -type f \( -name '*.R' -o -name '*.pdf' -o -name '*.Rdata' \) -print0 | tarsnap --print-stats -c -f "analysis_back-$(date +%Y-%m-%d_%H-%M-%S)" --null -T-
Notice that we added $(date +%Y-%m-%d_%H-%M-%S) to timestamp the archive name. Tarsnap won't let us create a second archive with a name that already exists, and it won't delete an archive unless explicitly told to do so.
We can list the archives using this command:
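The timestamp is what keeps the archive names unique. A minimal sketch of building the name separately, so that it can be reused later (the variable name archive_name is only illustrative):

```shell
# Build a unique archive name from the current date and time.
archive_name="analysis_back-$(date +%Y-%m-%d_%H-%M-%S)"
echo "$archive_name"

# The name could then be passed to tarsnap with -f, for example:
#   ... | tarsnap --print-stats -c -f "$archive_name" --null -T-
# Deleting an archive has to be requested explicitly with -d:
#   tarsnap -d -f "$archive_name"
```

Since the name goes down to the second, two runs would only collide if they started within the same second.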
tarsnap --list-archives
Now we can set up launchd (https://web.archive.org/web/20230627074009/https://www.launchd.info/) to run that command every week or every day as needed. Hopefully, this will help someone who is looking to set up Tarsnap on macOS. That someone will most likely be me.
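For launchd, it is usually easier to point the job at a small script than at the whole pipeline. Below is a rough sketch of such a wrapper. The script location, the SRC_DIR variable and the backup function name are assumptions for illustration, and /opt/homebrew/bin is assumed from the Homebrew prefix used earlier in this post; launchd jobs start with a minimal PATH, so the Homebrew directory is added explicitly.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /Users/xyz/bin/tarsnap-backup.sh,
# for a launchd job to call.

# launchd does not read the shell profile, so add Homebrew's bin directory
# (assumed to be /opt/homebrew/bin) to PATH by hand.
PATH="/opt/homebrew/bin:$PATH"
export PATH

# Folder to back up; override by setting SRC_DIR in the environment.
SRC_DIR="${SRC_DIR:-/Users/xyz/Analysis}"

# Same pipeline as the manual backup: R scripts, PDFs and .Rdata files,
# null-separated, into a timestamped archive.
backup() {
    command -v tarsnap >/dev/null 2>&1 || {
        echo "tarsnap not found in PATH" >&2
        return 1
    }
    find "$SRC_DIR" -type f \( -name '*.R' -o -name '*.pdf' -o -name '*.Rdata' \) -print0 |
        tarsnap --print-stats -c -f "analysis_back-$(date +%Y-%m-%d_%H-%M-%S)" --null -T-
}

if ! backup; then
    echo "tarsnap backup failed" >&2
fi
```

A launchd job definition would then list this script under ProgramArguments and use StartCalendarInterval for the daily or weekly schedule; the launchd.info page linked above covers the plist format.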