Simple Differential and Incremental Backups Using 7-zip

    
2020-04-05

7-zip is a quick, cheap and easy (cheap as in “free”, “free” as in “freedom”) alternative to both heavy backup solutions and lightweight git-repo-based backups, with much less overhead and better compression.

7-zip’s -u switch provides fine-grained control over creating and updating archives based on the states of already-archived and to-be-archived files. The switch is specified as a combination of state-action flags (see the -u switch page of the 7-zip documentation for more info):

State  State condition                                                                   File on Disk   File in Archive
p      File exists in archive, but is not matched with wildcard                          ?              Exists, but is not matched
q      File exists in archive, but doesn’t exist on disk                                 Doesn’t exist  Exists
r      File doesn’t exist in archive, but exists on disk                                 Exists         Doesn’t exist
x      File in archive is newer than the file on disk                                    Older          Newer
y      File in archive is older than the file on disk                                    Newer          Older
z      File in archive is same as the file on disk                                       Same           Same
w      Can not be detected what file is newer (times are the same, sizes are different)  ?              ?

Action  Description
0       Ignore file (don’t create item in new archive for this file)
1       Copy file (copy from old archive to new)
2       Compress (compress file from disk to new archive)
3       Create Anti-item (item that will delete file or directory during extracting)

Combining these states and actions covers every possible backup scenario without any extra file comparison logic.
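Since an action set like p0q0r2x2y2z1w2 is hard to read at a glance, here is a minimal sketch of a decoder for it. The function name describe_update_flags is made up for this post and is not part of 7-zip; the wording of each state and action is paraphrased from the tables above.

```shell
# Hypothetical helper (not part of 7-zip): print what each state-action
# pair of a -u action set means, one pair per line.
describe_update_flags() {
    flags=$1
    while [ -n "$flags" ]; do
        pair=$(printf '%s' "$flags" | cut -c1-2)    # first pair, e.g. "p0"
        flags=$(printf '%s' "$flags" | cut -c3-)    # remaining pairs
        state=$(printf '%s' "$pair" | cut -c1)
        action=$(printf '%s' "$pair" | cut -c2)
        case $state in
            p) s='in archive, not matched by wildcard' ;;
            q) s='in archive, missing on disk' ;;
            r) s='on disk, missing in archive' ;;
            x) s='archive copy is newer' ;;
            y) s='archive copy is older' ;;
            z) s='archive and disk copies are the same' ;;
            w) s='same time, different size' ;;
            *) s='unknown state' ;;
        esac
        case $action in
            0) a='ignore' ;;
            1) a='copy from old archive' ;;
            2) a='compress from disk' ;;
            3) a='create anti-item' ;;
            *) a='unknown action' ;;
        esac
        printf '%s: %s -> %s\n' "$pair" "$s" "$a"
    done
}
```

Calling describe_update_flags p0q0r2x2y2z1w2 prints seven lines, one per pair, which makes it easier to double-check a flag set before running it against a real backup.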

Examples:

  • update existing full backup of $HOME/* directory:
    • 7z u full_backup.7z $HOME/* -up0q0r2x2y2z1w2
      • p0 - ignore files not matched by wildcard (irrelevant in case of $HOME/* wildcard)
      • q0 - ignore removed files
      • r2 - if new file was created, compress it
      • x2, y2 - if file is newer or older, compress it
      • z1 - if file is the same, copy it without compression (this flag significantly reduces compression time)
      • w2 - if in doubt, compress the file
  • create a differential backup of all the files changed since the last full backup was created:
    • 7z u full_backup.7z -u- -up0q3r2x2y2z0w2'!'differential_backup.7z $HOME/*
      • -u- - “dash” parameter disables updates in the base archive full_backup.7z
      • q3 - if file was removed, “remember” the removal by creating an “anti-item”
      • z0 - if file is the same, skip it since backup is differential
      • the rest of the flags are the same
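The differential command above can be wrapped into a small function so that every run produces a dated archive. This is only a sketch: the SEVENZIP variable, the make_differential name and the differential_backup_YYYY_MM_DD.7z naming scheme are my assumptions, not something 7-zip prescribes.

```shell
# Hypothetical wrapper; SEVENZIP can be overridden (e.g. SEVENZIP=echo for a dry run).
SEVENZIP=${SEVENZIP:-7z}

make_differential() {
    full=$1                          # existing full backup, left untouched thanks to -u-
    src=$2                           # directory whose contents are backed up
    stamp=${3:-$(date +%Y_%m_%d)}    # assumed naming scheme: YYYY_MM_DD
    new_archive="differential_backup_${stamp}.7z"
    # '!' is single-quoted so that an interactive bash doesn't treat it
    # as history expansion.
    "$SEVENZIP" u "$full" -u- "-up0q3r2x2y2z0w2"'!'"$new_archive" "$src"/*
}
```

Usage would look like make_differential full_backup.7z "$HOME". Setting SEVENZIP=echo first prints the command instead of running it, which is handy for checking the flags.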

Incremental backups can be achieved by creating “decremental” backups along the way with a rolling up-to-date full backup (order matters!) in two steps:

  • create a “decremental backup” between the existing state and the previous full backup, a.k.a. “incremental backup” between previous and current states:
    • 7z u full_backup.7z $HOME/* -u- -up1q1r3x1y1z0w1'!'incremental_backup.7z
  • update a full backup to keep track of the existing state:
    • 7z u full_backup.7z $HOME/* -up0q0r2x2y2z1w2
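Because the order of these two steps matters, it may be worth tying them together in one function that refuses to update the full backup if the incremental step failed. A minimal sketch, where SEVENZIP, incremental_step and the dated archive name are assumptions of mine:

```shell
# Hypothetical wrapper; SEVENZIP can be overridden (e.g. SEVENZIP=echo for a dry run).
SEVENZIP=${SEVENZIP:-7z}

incremental_step() {
    full=$1                          # rolling full backup
    src=$2                           # directory whose contents are backed up
    stamp=${3:-$(date +%Y_%m_%d)}    # assumed naming scheme: YYYY_MM_DD
    # Step 1: save the pre-change versions first, while the full backup
    # still holds them. Stop here if this fails.
    "$SEVENZIP" u "$full" "$src"/* -u- \
        "-up1q1r3x1y1z0w1"'!'"incremental_backup_${stamp}.7z" || return 1
    # Step 2: only then bring the full backup up to date.
    "$SEVENZIP" u "$full" "$src"/* -up0q0r2x2y2z1w2
}
```

Running incremental_step full_backup.7z "$HOME" once a day would then leave one incremental_backup_YYYY_MM_DD.7z per day next to the rolling full backup.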

Using this approach, files can be rolled back to the state of any incremental backup by simply extracting all the backups in reverse chronological order, e.g. files can be rolled back to “three backups back” in four steps:

  • 7z x -y full_backup.7z -o$HOME
  • 7z x -y incremental_backup_2020_04_05.7z -o$HOME
  • 7z x -y incremental_backup_2020_04_04.7z -o$HOME
  • 7z x -y incremental_backup_2020_04_03.7z -o$HOME
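The same rollback can be sketched as a loop. It assumes that the archives sit in the current directory and follow the incremental_backup_YYYY_MM_DD.7z naming used above, so that reverse lexical order is also reverse chronological order. SEVENZIP and rollback_to are hypothetical names.

```shell
# Hypothetical helper; SEVENZIP can be overridden (e.g. SEVENZIP=echo for a dry run).
SEVENZIP=${SEVENZIP:-7z}

rollback_to() {
    target=$1    # date stamp of the oldest incremental backup to apply
    dest=$2      # directory to restore into
    if [ ! -f "incremental_backup_${target}.7z" ]; then
        echo "no incremental backup for ${target}" >&2
        return 1
    fi
    # Start from the current full backup...
    "$SEVENZIP" x -y full_backup.7z "-o$dest" </dev/null || return 1
    # ...then apply incremental backups from newest to oldest,
    # stopping once the target has been extracted.
    printf '%s\n' incremental_backup_*.7z | sort -r |
    while IFS= read -r archive; do
        "$SEVENZIP" x -y "$archive" "-o$dest" </dev/null || exit 1
        if [ "$archive" = "incremental_backup_${target}.7z" ]; then
            break
        fi
    done
}
```

With the three archives from the list above, rollback_to 2020_04_03 "$HOME" would run the same four extractions in the same order.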

Thanks to 7-zip’s open file format, you can easily peek inside any incremental or differential backup. Combined with strong encryption and incredible compression, this makes 7-zip my go-to choice for all of my backups.

AFAIK the only way to support development of 7-zip is to use the developer’s referral link to DigitalOcean, so please do so if you can :) https://m.do.co/c/cab893b82fa8

Testing

Here’s a little MWE to test incremental backups.

Prepare a test folder (echo is used instead of touch so the size of files can be changed and tracked):

cd /tmp
mkdir test
echo 'test' >test/1
echo 'test' >test/2
mkdir test/3
echo 'test' >test/3/4
ls -ld $(find test)

This gives the expected list of 5-byte files:

drwxr-xr-x 3 nagimov nagimov 4096 Apr  5 21:15 test
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 test/1
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 test/2
drwxr-xr-x 2 nagimov nagimov 4096 Apr  5 21:15 test/3
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 test/3/4

Create initial full backup and list its files:

7z a test.7z test/*
7z l test.7z

Note the timestamps and file sizes:

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2020-04-05 21:15:06 D....            0            0  test/3
2020-04-05 21:15:06 ....A            5           19  test/1
2020-04-05 21:15:06 ....A            5               test/2
2020-04-05 21:15:06 ....A            5               test/3/4
------------------- ----- ------------ ------------  ------------------------
2020-04-05 21:15:06                 15           19  3 files, 1 folders

Increase the size of file 2 and create new folder 5 with file 6:

echo 'testtest' >test/2
mkdir test/5
echo 'testtest' >test/5/6
ls -ld $(find test)

The output is as expected - file 2 is larger and the new file 5/6 has appeared:

drwxr-xr-x 4 nagimov nagimov 4096 Apr  5 21:15 test
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 test/1
-rw-r--r-- 1 nagimov nagimov    9 Apr  5 21:15 test/2
drwxr-xr-x 2 nagimov nagimov 4096 Apr  5 21:15 test/3
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 test/3/4
drwxr-xr-x 2 nagimov nagimov 4096 Apr  5 21:15 test/5
-rw-r--r-- 1 nagimov nagimov    9 Apr  5 21:15 test/5/6

Create an incremental backup:

7z u test.7z -u- -up1q1r3x1y1z0w1'!'test_inc1.7z test/*
7z l test_inc1.7z

As expected, the incremental backup contains only the pre-modification versions of changed files - file 2 is still tiny and 5/6 is an “anti-item” for the newly created file (note that its size is 0):

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....            0            0  test/5/6
                    D....            0            0  test/5
2020-04-05 21:15:06 ....A            5            9  test/2
------------------- ----- ------------ ------------  ------------------------
2020-04-05 21:15:06                  5            9  2 files, 1 folders

Update full backup:

7z u test.7z test/* -up0q0r2x2y2z1w2
7z l test.7z

It now contains current versions of all the files - larger 2 and 5/6:

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2020-04-05 21:15:06 D....            0            0  test/3
2020-04-05 21:15:31 D....            0            0  test/5
2020-04-05 21:15:06 ....A            5           14  test/1
2020-04-05 21:15:06 ....A            5               test/3/4
2020-04-05 21:15:31 ....A            9           20  test/2
2020-04-05 21:15:31 ....A            9               test/5/6
------------------- ----- ------------ ------------  ------------------------
2020-04-05 21:15:31                 28           34  4 files, 2 folders

Make 2 even larger and remove 1:

echo 'testtesttest' >test/2
rm test/1
ls -ld $(find test)

Now 2 is the largest and 1 isn’t present anymore:

drwxr-xr-x 4 nagimov nagimov 4096 Apr  5 21:16 test
-rw-r--r-- 1 nagimov nagimov   13 Apr  5 21:16 test/2
drwxr-xr-x 2 nagimov nagimov 4096 Apr  5 21:15 test/3
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 test/3/4
drwxr-xr-x 2 nagimov nagimov 4096 Apr  5 21:15 test/5
-rw-r--r-- 1 nagimov nagimov    9 Apr  5 21:15 test/5/6

Another step of incremental backup:

7z u test.7z -u- -up1q1r3x1y1z0w1'!'test_inc2.7z test/*
7z l test_inc2.7z

Pre-modified versions of 1 and 2 are archived - 1 in its original size and 2 in its intermediate size:

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2020-04-05 21:15:06 ....A            5            9  test/1
2020-04-05 21:15:31 ....A            9           13  test/2
------------------- ----- ------------ ------------  ------------------------
2020-04-05 21:15:31                 14           22  2 files

Another update of the full backup:

7z u test.7z test/* -up0q0r2x2y2z1w2
7z l test.7z

Full archive is now up to date:

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2020-04-05 21:15:06 D....            0            0  test/3
2020-04-05 21:15:31 D....            0            0  test/5
2020-04-05 21:15:06 ....A            5            9  test/3/4
2020-04-05 21:15:31 ....A            9           13  test/5/6
2020-04-05 21:16:00 ....A           13           17  test/2
------------------- ----- ------------ ------------  ------------------------
2020-04-05 21:16:00                 27           39  3 files, 2 folders

Now it’s time to roll back through every state of the test folder:

mkdir unzip
7z x -y test.7z -ounzip
ls -ld $(find unzip)

This is the latest state, with 1 absent, a beefy 13-byte 2 and 5/6 present:

drwxr-xr-x 3 nagimov nagimov 4096 Apr  5 21:16 unzip
drwx------ 4 nagimov nagimov 4096 Apr  5 21:16 unzip/test
-rw-r--r-- 1 nagimov nagimov   13 Apr  5 21:16 unzip/test/2
drwxr-xr-x 2 nagimov nagimov 4096 Apr  5 21:15 unzip/test/3
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 unzip/test/3/4
drwxr-xr-x 2 nagimov nagimov 4096 Apr  5 21:15 unzip/test/5
-rw-r--r-- 1 nagimov nagimov    9 Apr  5 21:15 unzip/test/5/6

Going back to state inc2:

7z x -y test_inc2.7z -ounzip
ls -ld $(find unzip)

Getting 1 undeleted and 2 thinned down to 9 bytes:

drwxr-xr-x 3 nagimov nagimov 4096 Apr  5 21:16 unzip
drwx------ 4 nagimov nagimov 4096 Apr  5 21:16 unzip/test
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 unzip/test/1
-rw-r--r-- 1 nagimov nagimov    9 Apr  5 21:15 unzip/test/2
drwxr-xr-x 2 nagimov nagimov 4096 Apr  5 21:15 unzip/test/3
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 unzip/test/3/4
drwxr-xr-x 2 nagimov nagimov 4096 Apr  5 21:15 unzip/test/5
-rw-r--r-- 1 nagimov nagimov    9 Apr  5 21:15 unzip/test/5/6

Going back to initial state:

7z x -y test_inc1.7z -ounzip
ls -ld $(find unzip)

Getting 5/6 uncreated and 2 reduced to 5 bytes:

drwxr-xr-x 3 nagimov nagimov 4096 Apr  5 21:16 unzip
drwx------ 3 nagimov nagimov 4096 Apr  5 21:17 unzip/test
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 unzip/test/1
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 unzip/test/2
drwxr-xr-x 2 nagimov nagimov 4096 Apr  5 21:15 unzip/test/3
-rw-r--r-- 1 nagimov nagimov    5 Apr  5 21:15 unzip/test/3/4
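Instead of comparing ls listings by eye, each state could be recorded as a sorted list of file sizes and paths and then compared with diff. A minimal sketch; the snapshot function is a hypothetical helper and it only compares sizes, not contents or timestamps:

```shell
# Hypothetical helper: print "size path" for every regular file under
# the given directory, sorted by path (paths are relative to it).
snapshot() {
    (
        cd "$1" || exit 1
        find . -type f | sort | while IFS= read -r f; do
            # tr strips the padding some wc implementations add
            printf '%s %s\n' "$(wc -c <"$f" | tr -d ' ')" "$f"
        done
    )
}
```

For example, running snapshot test >initial.txt right after preparing the test folder and then snapshot unzip/test | diff initial.txt - after the last rollback step should print nothing if the rollback restored the initial state.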
