Sunday, September 3, 2023

Running Tarsnap

This post documents how to run Tarsnap on macOS. Tarsnap is an online backup service that runs on top of Amazon's cloud infrastructure. If you are reading this, then you probably already know a fair bit about it, but it bears repeating that it is a cool program.

First, we need to install it, which is easily done with Homebrew.

brew install tarsnap

It can also be compiled from source, but that is outside the scope of this post.

Once installed, it is a good idea to do a dry run (the --dry-run option together with --print-stats) to see how much space the backup will take and how well it compresses. Tarsnap de-duplicates the data and stores only the unique bytes.

For example, to see what uploading only the .pdf and .Rdata files from the "Analysis" folder would cost, we can run the following command.

find /Users/xyz/Analysis -type f \( -name '*.pdf' -o -name '*.Rdata' \) -print0 | tarsnap --dry-run --no-default-config --print-stats --humanize-numbers -c --null -T-

This command does a lot of things. First, find restricts the search to regular files with "-type f" and then keeps only the files whose names end in .pdf or .Rdata. Notice the use of -o as the "or" operator within find. The escaped parentheses group the -name alternatives so that "-type f" and "-print0" apply to all of them; without the grouping, "-print0" would bind only to the last pattern. We use "-print0" to separate the filenames with the null character so that spaces or other odd characters in filenames do not break the list. The list is then piped to tarsnap. On that side, --dry-run simulates the archive without uploading anything, --no-default-config skips the configuration files (handy here because nothing is configured yet), and the key options are --null, which matches find's "-print0", and -T, which names where the file list comes from.

     --null  (use with -I, -T, or -X) Filenames or patterns are separated by
             null characters, not by newlines.  This is often used to read
             filenames output by the -print0 option to find(1).

     -T filename
             (c, x, and t modes only) In x or t mode, tarsnap will read the
             list of names to be extracted from filename.  In c mode, tarsnap
             will read names to be archived from filename.  The special name
             “-C” on a line by itself will cause the current directory to be
             changed to the directory specified on the following line.  Names
             are terminated by newlines unless --null is specified.  Note that
             --null also disables the special handling of lines containing
             “-C”.  If filename is “-” then the list of names will be read
             from the standard input.  Note:  If you are generating lists of
             files using find(1), you probably want to use -n as well.
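To see the effect of the parentheses in the find half of the pipeline without involving tarsnap at all, here is a small sketch. The scratch directory, the file names, and the count_names helper are made up for the demonstration. It should report two files for the grouped form and one for the ungrouped form.

```shell
# Hypothetical scratch directory, used only for this demonstration.
demo_dir="$(mktemp -d)"
touch "$demo_dir/report one.pdf" "$demo_dir/model.Rdata" "$demo_dir/notes.txt"
mkdir "$demo_dir/old.pdf"   # a directory whose name happens to end in .pdf

# Count NUL-terminated names by turning each NUL into a newline.
count_names() { tr '\0' '\n' | wc -l | tr -d ' '; }

# Grouped: -type f and -print0 apply to both patterns.
grouped="$(find "$demo_dir" -type f \( -name '*.pdf' -o -name '*.Rdata' \) -print0 | count_names)"

# Ungrouped: parsed as (-type f -name '*.pdf') -o (-name '*.Rdata' -print0),
# so only the .Rdata branch ever prints anything.
ungrouped="$(find "$demo_dir" -type f -name '*.pdf' -o -name '*.Rdata' -print0 | count_names)"

echo "grouped=$grouped ungrouped=$ungrouped"
rm -r "$demo_dir"
```

The ungrouped form silently drops the PDF files, which is exactly the kind of mistake that is better caught before a backup than after.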

In short, the "-" following the "-T" option tells tarsnap to read the file names from standard input, which is where find sends them. Running the command should produce output like this.

tarsnap: Removing leading '/' from member names
                                       Total size  Compressed size
All archives                               8.4 MB           3.4 MB
  (unique data)                            8.4 MB           3.4 MB
This archive                               8.4 MB           3.4 MB
New data                                   8.4 MB           3.4 MB
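To put a number on the compression, the two columns can be divided. The helper below is a hypothetical convenience, not something Tarsnap provides; it takes the total and compressed sizes in the same unit. In this run all four rows are identical, which is what I would expect for a first archive with nothing uploaded before.

```shell
# compression_pct is a hypothetical helper: compressed size as a
# percentage of the total size (both arguments in the same unit).
compression_pct() {
  awk -v total="$1" -v compressed="$2" \
    'BEGIN { printf "%.0f\n", 100 * compressed / total }'
}

# Values taken from the "New data" row above (8.4 MB total, 3.4 MB compressed).
compression_pct 8.4 3.4   # prints 40
```

So the data would shrink to roughly 40% of its original size before upload.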

This was just a test. To run Tarsnap for real, we need to register for an account and set up some configuration, for which we need:

  1. tarsnap.conf
  2. tarsnap.key
  3. a cache directory

If you installed tarsnap with Homebrew, then the sample configuration file, tarsnap.conf.sample, will be in /opt/homebrew/etc (the Homebrew prefix on Apple Silicon Macs), so just copy that file to your home directory using

cp /opt/homebrew/etc/tarsnap.conf.sample ~/tarsnap.conf

According to the documentation, if you would prefer to run Tarsnap as a normal user, copy the sample to ~/.tarsnaprc instead:

 cp /opt/homebrew/etc/tarsnap.conf.sample ~/.tarsnaprc

Since we will be running it as a normal user, we use this alternative. As for "tarsnap.key", it is generated when the computer is registered with the Tarsnap server. The --user option takes the email address of your Tarsnap account; me@example.com below is a placeholder.

sudo tarsnap-keygen \
    --keyfile /Users/xyz/tarsnap.key \
    --user me@example.com \
    --machine mybox

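Because the key was created with sudo, it belongs to root, so make sure your user can read it, otherwise tarsnap won't work. Note that mode 0444 lets every account on the machine read the key; on a shared Mac, changing the owner to your user with chown and using mode 0400 is tighter.

     sudo chmod 0444 /Users/xyz/tarsnap.key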

Now let us set up the cache directory. Create it if it does not exist yet and restrict it to your user:

     mkdir -p /Users/xyz/tarsnap_cache
     chmod 700 /Users/xyz/tarsnap_cache

Make sure your .tarsnaprc file points to the location of the key (keyfile) and the cache directory (cachedir). If all goes right, use this command to run the real backup:
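Before the first real run, it can be worth checking that the two settings are actually in place. With the paths used in this post, the relevant lines in ~/.tarsnaprc would be "keyfile /Users/xyz/tarsnap.key" and "cachedir /Users/xyz/tarsnap_cache". The function below is a rough sketch of such a check; check_tarsnaprc is a name I made up, it is not part of Tarsnap, and it assumes absolute paths without spaces.

```shell
# check_tarsnaprc is a hypothetical helper, not part of Tarsnap.
# Assumption: keyfile and cachedir are absolute paths without spaces.
check_tarsnaprc() {
  rc="$1"
  [ -r "$rc" ] || { echo "cannot read $rc"; return 1; }
  # Take the value of the last uncommented keyfile / cachedir line.
  keyfile="$(awk '$1 == "keyfile" { v = $2 } END { print v }' "$rc")"
  cachedir="$(awk '$1 == "cachedir" { v = $2 } END { print v }' "$rc")"
  # The key must exist and be readable by the current user.
  [ -n "$keyfile" ] && [ -r "$keyfile" ] || { echo "keyfile missing or unreadable: $keyfile"; return 1; }
  # The cache directory must already exist.
  [ -n "$cachedir" ] && [ -d "$cachedir" ] || { echo "cachedir missing: $cachedir"; return 1; }
  echo "ok"
}
```

Calling it as check_tarsnaprc "$HOME/.tarsnaprc" prints "ok" when both paths check out, and a short message naming the problem otherwise.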

find /Users/xyz/Analysis -type f \( -name '*.R' -o -name '*.pdf' -o -name '*.Rdata' \) -print0 | tarsnap --print-stats -c -f "analysis_back-$(date +%Y-%m-%d_%H-%M-%S)" --null -T-

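Notice that this time the pattern list also picks up R scripts, and that we added $(date +%Y-%m-%d_%H-%M-%S) to the archive name to record when the archive was made. Tarsnap will not let us create a second archive with an existing name, so the timestamp also keeps the names unique. It also never deletes an archive unless explicitly told to do so.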
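The naming scheme can be pulled out into a small function so that it is easy to reuse in a script. This is only a sketch; archive_name is a made-up helper and the prefix is whatever you like. Since the timestamp goes down to the second, two names only collide if two backups start within the same second, and the names sort chronologically.

```shell
# archive_name is a hypothetical helper: prefix plus a timestamp
# in the same format used in the backup command above.
archive_name() {
  printf '%s-%s\n' "$1" "$(date +%Y-%m-%d_%H-%M-%S)"
}

name="$(archive_name analysis_back)"
echo "$name"   # something like analysis_back-2023-09-03_14-05-09
```

The result can then be passed to tarsnap as -f "$name".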

We can list the archives using this command:

tarsnap --list-archives

Now we can set up launchd to run the backup command every week or every day as needed. Hopefully, this will help someone who is looking to set up Tarsnap on macOS. That someone will most likely be me.
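For the launchd part, here is a rough sketch of how the job definition could be generated. Everything specific in it is an assumption: make_backup_plist is a made-up helper, the label com.example.tarsnap-analysis is a placeholder, and /Users/xyz/bin/tarsnap-backup.sh stands for a script of your own that contains the find | tarsnap pipeline. The plist keys themselves (Label, ProgramArguments, StartCalendarInterval) are standard launchd keys.

```shell
# make_backup_plist is a hypothetical helper that prints a launchd job definition.
# Arguments: job label, absolute path to a backup script, hour (0-23), minute (0-59).
make_backup_plist() {
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>$1</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/sh</string>
    <string>$2</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>$3</integer>
    <key>Minute</key>
    <integer>$4</integer>
  </dict>
</dict>
</plist>
EOF
}

# Example: a daily run at 02:30. The label and script path are made up.
make_backup_plist com.example.tarsnap-analysis /Users/xyz/bin/tarsnap-backup.sh 2 30
```

Saving the output as ~/Library/LaunchAgents/com.example.tarsnap-analysis.plist and loading it with launchctl load should schedule the job; adding a Weekday key next to Hour and Minute would turn it into a weekly run. Since launchd starts jobs with a minimal PATH, the backup script is safer calling tarsnap by its full path.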
