
Chapter 6 The Backup Plan

Chapter 6 of Linux Shell Scripting Cookbook — archiving, compression, and backup strategies using tar, rsync, gzip, and more


Chapter Overview

Backups are not optional — they’re what separates a recoverable incident from a catastrophe. This chapter covers the full stack of Linux backup and archiving tools: tar, gzip, zip, rsync, cpio, pbzip2, and disk imaging with fsarchiver. Each solves a slightly different problem, and knowing when to use which one matters.


Archiving with tar

tar (tape archive) bundles files into a single archive. It doesn’t compress by default — it just packs. Compression is a separate step, though tar can do both at once.

Basic syntax

tar [options] [archive] [files/dirs]

Create an archive

tar -cvf archive.tar /path/to/dir        # create, verbose, file
tar -cf archive.tar file1 file2 dir/     # create without verbose
tar -cvf backup.tar /etc /home /var/log  # archive multiple targets
  • -c = create
  • -v = verbose (list files as they’re added)
  • -f = the next argument is the archive filename

Create and compress in one step

tar -czvf archive.tar.gz  /path/   # gzip  (.tar.gz or .tgz)
tar -cjvf archive.tar.bz2 /path/   # bzip2 (.tar.bz2) — smaller, slower
tar -cJvf archive.tar.xz  /path/   # xz    (.tar.xz)  — smallest, slowest

Extract

tar -xvf archive.tar                     # extract here
tar -xvf archive.tar -C /target/dir/     # extract to specific directory
tar -xzvf archive.tar.gz                 # extract gzip
tar -xjvf archive.tar.bz2                # extract bzip2
tar -xJvf archive.tar.xz                 # extract xz

List contents without extracting

tar -tvf archive.tar                     # list all files
tar -tvf archive.tar.gz | grep ".conf"   # search inside archive

Extract specific files

tar -xvf archive.tar path/to/file.txt   # extract one file
tar -xvf archive.tar --wildcards "*.conf"  # extract by pattern

Exclude files or directories

tar -czvf backup.tar.gz /home/ --exclude="/home/omar/.cache"
tar -czvf backup.tar.gz /var/ \
  --exclude="*.log" \
  --exclude="*.tmp" \
  --exclude="/var/cache"

Incremental backup with tar

# Full backup (snapshot file records state)
tar -czvf full_backup.tar.gz \
  --listed-incremental=snapshot.file \
  /home/

# Incremental — only changed files since last run
tar -czvf incremental_backup.tar.gz \
  --listed-incremental=snapshot.file \
  /home/
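
To restore, extract the full backup first, then each incremental in order. A sketch using the archive names from above; --listed-incremental=/dev/null tells tar to apply the recorded changes without updating any snapshot state:

```shell
# Restore: full backup first, then incrementals in chronological order
mkdir -p /tmp/restore

tar -xzvf full_backup.tar.gz \
  --listed-incremental=/dev/null \
  -C /tmp/restore

tar -xzvf incremental_backup.tar.gz \
  --listed-incremental=/dev/null \
  -C /tmp/restore
```

Incremental archives also record deletions, so restoring in order reproduces the source as it looked at the last run.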

Append to an existing archive

tar -rvf archive.tar newfile.txt        # append (only works on uncompressed .tar)

Verify an archive

tar -tvf archive.tar.gz > /dev/null && echo "OK" || echo "CORRUPT"
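
The same check works in a loop to sweep a whole directory of archives (a sketch; /backup is an assumed location):

```shell
# Flag any archive that tar can no longer read
for f in /backup/*.tar.gz; do
  if tar -tzf "$f" > /dev/null 2>&1; then
    echo "OK:      $f"
  else
    echo "CORRUPT: $f"
  fi
done
```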

Archiving with cpio

cpio is older than tar and less common now, but it’s still used in Linux initial ramdisks (initramfs) and some backup workflows. It reads a list of files from stdin.

Create an archive

find /path -type f | cpio -ov > archive.cpio
  • -o = output (create)
  • -v = verbose

Extract an archive

cpio -idv < archive.cpio                 # extract in current directory
cpio -idv --no-absolute-filenames < archive.cpio  # strip leading /
  • -i = input (extract)
  • -d = create directories as needed

List contents

cpio -tv < archive.cpio

Copy a directory tree (pass-through mode)

find /source -depth | cpio -pdv /destination

  • -p = pass-through (copy directly, no archive file)

Compare: cpio vs tar

Feature                 tar                  cpio
Ease of use             Easier               More complex
Handles special files   Good                 Excellent
Initramfs format        No                   Yes
Append files            Yes (uncompressed)   No
Common usage            General backup       Kernel/initrd

Compressing Data with gzip

gzip compresses individual files — it replaces the original file with a .gz version by default.

Basic usage

gzip file.txt                     # compress → file.txt.gz (original deleted)
gzip -k file.txt                  # keep original
gzip -d file.txt.gz               # decompress (same as gunzip)
gunzip file.txt.gz                # decompress

gzip -l file.txt.gz               # list compression ratio and sizes
gzip -t file.txt.gz               # test integrity

Compression levels

gzip -1 file.txt                  # fastest, least compression
gzip -9 file.txt                  # slowest, best compression
gzip -6 file.txt                  # default (balanced)

Compress multiple files

gzip *.log                        # compress all .log files in place
gzip -r /path/to/dir/             # recursively compress all files in directory

View compressed file without decompressing

zcat file.txt.gz                  # like cat but for .gz
zless file.txt.gz                 # like less but for .gz
zgrep "pattern" file.txt.gz       # grep inside .gz without extracting
zdiff file1.txt.gz file2.txt.gz   # diff two .gz files

stdin/stdout (for piping)

cat file.txt | gzip > file.txt.gz        # compress from stdin
gzip -d < file.txt.gz | grep "pattern"   # decompress and search

Other compression tools

gzip has several alternatives, each with a different speed/ratio trade-off:

bzip2 file.txt        # better compression than gzip, slower (.bz2)
bunzip2 file.txt.bz2  # decompress bzip2
xz file.txt           # best compression of the three (.xz)
unxz file.txt.xz      # decompress xz
lz4 file.txt          # extremely fast, moderate compression (.lz4)
zstd file.txt         # modern: fast + good compression (.zst)
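
A quick way to see the trade-off on your own data: compress the same file with every installed tool and compare output sizes (a sketch; /var/log/syslog is a stand-in for any large text file):

```shell
# Compress to stdout so the original file is left untouched
f=/var/log/syslog
for tool in gzip bzip2 xz zstd; do
  command -v "$tool" > /dev/null || continue   # skip tools that aren't installed
  size=$("$tool" -c "$f" | wc -c)
  echo "$tool: $size bytes"
done
```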

Archiving and Compressing with zip

zip is the standard for cross-platform archives — primarily for sharing with Windows users. Unlike tar+gzip, zip compresses each file individually inside the archive.

Create a zip archive

zip archive.zip file1 file2 file3          # add specific files
zip archive.zip *.txt                       # add by pattern
zip -r archive.zip /path/to/dir/           # recursive (include directories)
zip -j archive.zip /path/*.txt             # -j = junk paths (no directory structure)

Compression level

zip -0 archive.zip files    # store only (no compression)
zip -9 archive.zip files    # maximum compression
zip -6 archive.zip files    # default

Password protection

zip -e archive.zip files    # prompt for password (weak encryption)
zip -P "password" archive.zip files  # inline password (visible in history)

Extract

unzip archive.zip                          # extract here
unzip archive.zip -d /target/directory/    # extract to directory
unzip -l archive.zip                       # list contents
unzip -t archive.zip                       # test integrity
unzip archive.zip "*.conf"                 # extract specific files

Update an existing archive

zip -u archive.zip newfile.txt             # add/update files
zip -d archive.zip oldfile.txt             # delete a file from archive

zip vs tar.gz

                            zip                      tar.gz
Cross-platform              Yes (Windows friendly)   Linux/Mac primarily
Random access               Yes (per-file)           No (sequential)
Compression                 Per file                 Whole archive
Preserves Unix permissions  Partially                Fully
Best for                    Sharing files            System backups

Faster Archiving with pbzip2

pbzip2 is a parallel implementation of bzip2 — it uses all CPU cores, making compression significantly faster on multi-core machines.

pbzip2 file.txt                   # compress using all cores → file.txt.bz2
pbzip2 -d file.txt.bz2            # decompress
pbzip2 -k file.txt                # keep original
pbzip2 -p 4 file.txt              # use 4 CPU cores explicitly
pbzip2 -9 file.txt                # maximum compression

With tar (parallel bzip2)

tar -c /path/ | pbzip2 > archive.tar.bz2        # create
pbzip2 -d < archive.tar.bz2 | tar -x           # extract

Benchmark: bzip2 vs pbzip2

time bzip2 -k largefile
time pbzip2 -k largefile

On a 4-core machine, pbzip2 is typically 3–4x faster.

pigz (parallel gzip)

Same idea but for gzip:

tar -c /path/ | pigz > archive.tar.gz           # parallel gzip compress
tar -c /path/ | pigz -9 > archive.tar.gz        # max compression, parallel
pigz -d archive.tar.gz                          # decompress

Use pigz/pbzip2 for large backups where speed matters.
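
GNU tar can also invoke a parallel compressor itself via --use-compress-program (short form -I), which avoids the explicit pipe:

```shell
# Compress through pigz/pbzip2 without a pipeline
tar --use-compress-program=pigz   -cf archive.tar.gz  /path/
tar --use-compress-program=pbzip2 -cf archive.tar.bz2 /path/

# On extraction, tar runs the same program with -d
tar --use-compress-program=pigz -xf archive.tar.gz
```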


Creating Filesystems with Compression

Compressed filesystems store data compressed at the block level — reads are transparent, and data is always compressed on disk.

SquashFS — read-only compressed filesystem

Used in live CDs, embedded systems, and container layers.

# Create a SquashFS image from a directory
mksquashfs /path/to/dir output.squashfs

# With specific compression
mksquashfs /path/to/dir output.squashfs -comp xz   # best compression
mksquashfs /path/to/dir output.squashfs -comp lz4   # fastest

# Mount it (read-only)
mount -t squashfs output.squashfs /mnt/sq

# List contents without mounting
unsquashfs -l output.squashfs

# Extract
unsquashfs -d /output/dir output.squashfs

Btrfs with transparent compression

Btrfs supports per-filesystem or per-directory transparent compression:

# Mount with compression
mount -o compress=zstd /dev/sdb1 /mnt/data

# Enable on existing Btrfs filesystem
btrfs property set /mnt/data compression zstd

# Check compression ratio
compsize /mnt/data    # requires compsize package

NTFS compressed files (for cross-platform)

ntfs-3g -o compression /dev/sdb1 /mnt/ntfs

Backup Snapshots with rsync

rsync is the gold standard for incremental backups. It only transfers changed data (delta sync), making it fast and bandwidth-efficient.

Basic syntax

rsync [options] source destination

Local sync

rsync -av /source/ /destination/       # archive mode + verbose
rsync -av --delete /source/ /dest/     # delete files in dest not in source
rsync -av --dry-run /source/ /dest/    # preview what would change

Trailing slash matters:

  • /source/ — sync the contents of source
  • /source — sync the source directory itself
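
A throwaway demonstration of the difference:

```shell
mkdir -p /tmp/demo/src && touch /tmp/demo/src/file.txt

rsync -a /tmp/demo/src/ /tmp/demo/a/   # contents copied: a/file.txt
rsync -a /tmp/demo/src  /tmp/demo/b/   # directory copied: b/src/file.txt
```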

Common flags

Flag         Meaning
-a           Archive mode: preserves permissions, timestamps, symlinks, owner, group
-v           Verbose
-z           Compress during transfer
-P           Show progress + keep partial transfers
--delete     Delete files in dest that don’t exist in source
--exclude    Exclude pattern
--backup     Keep backup of overwritten files
--dry-run    Simulate without making changes
-n           Same as --dry-run

Remote sync over SSH

rsync -avz /local/dir/ user@server:/remote/dir/   # local → remote
rsync -avz user@server:/remote/dir/ /local/dir/   # remote → local
rsync -avz -e "ssh -p 2222" /local/ user@host:/backup/  # custom SSH port

Exclude patterns

rsync -av --exclude="*.log" --exclude=".cache/" /source/ /dest/
rsync -av --exclude-from="exclude.txt" /source/ /dest/

exclude.txt:

*.log
*.tmp
.cache/
node_modules/
__pycache__/

Automated daily backup script

#!/bin/bash
SRC="/home/omar/"
DEST="/backup/omar/"
LOG="/var/log/rsync_backup.log"
DATE=$(date +"%Y-%m-%d %H:%M:%S")

echo "[$DATE] Starting backup" >> "$LOG"

rsync -avz --delete \
  --exclude=".cache/" \
  --exclude="*.tmp" \
  --backup \
  --backup-dir="/backup/snapshots/$(date +%Y%m%d)" \
  "$SRC" "$DEST" >> "$LOG" 2>&1

echo "[$(date +"%Y-%m-%d %H:%M:%S")] Backup complete" >> "$LOG"
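
To run it nightly, schedule the script with cron; the script path below is an assumption:

```shell
# crontab -e: run the backup every night at 02:30
30 2 * * * /usr/local/bin/rsync_backup.sh
```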

Hardlink snapshots with --link-dest

#!/bin/bash
BACKUP_DIR="/backup"
SRC="/home/"
DATE=$(date +%Y-%m-%d)
LATEST="$BACKUP_DIR/latest"

rsync -avz --delete \
  --link-dest="$LATEST" \
  "$SRC" "$BACKUP_DIR/$DATE/"

# Update the "latest" symlink
ln -snf "$BACKUP_DIR/$DATE" "$LATEST"

--link-dest creates hardlinks for unchanged files — each snapshot looks complete but only stores the differences. Disk usage is minimal.
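
You can confirm the hardlinking with stat (the link count) and du; the snapshot path and filename below are illustrative:

```shell
# A link count greater than 1 means the file is shared between snapshots
stat -c "%h %n" /backup/2025-01-01/some/file

# du counts shared blocks once, so the total is far smaller
# than the sum of the per-snapshot sizes
du -sh /backup/*/
du -sh /backup
```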


Version Control Based Backup with Git

Git isn’t just for code — it can back up any text-based configuration or document with full history.

Basic git backup workflow

cd /etc
git init
git add .
git commit -m "Initial backup $(date +%Y-%m-%d)"

# After any changes
git add -A
git commit -m "Config update $(date +%Y-%m-%d)"
git log --oneline                # see history
git diff HEAD~1                  # what changed since last backup

Push to a remote (offsite backup)

git remote add origin git@github.com:user/config-backup.git
git push -u origin main

# Scheduled push
git add -A && git commit -m "Auto backup $(date +"%Y-%m-%d %H:%M")" && git push

etckeeper — automated /etc version control

apt install etckeeper
etckeeper init                   # initialise /etc as a git repo
etckeeper commit "Initial"       # manual commit
# Auto-commits whenever apt installs, upgrades, or removes packages

Restore a file from history

# run these from inside the /etc repo, with paths relative to it
git log --oneline -- nginx/nginx.conf        # history for one file
git show HEAD~3:nginx/nginx.conf             # view old version
git checkout HEAD~3 -- nginx/nginx.conf      # restore old version

Backup a directory to a bare repo

# Create a bare repo on the backup server
ssh backup-server "git init --bare /backup/myrepo.git"

# Push from the machine you're backing up
cd /path/to/data
git init
git remote add backup ssh://backup-server/backup/myrepo.git
git add . && git commit -m "Backup"
git push backup main

Creating Disk Images with fsarchiver

fsarchiver creates filesystem images — it understands the filesystem structure (unlike dd) so it can compress and restore efficiently, and even restore to a filesystem of a different size.

Save a filesystem to an archive

fsarchiver savefs /backup/root.fsa /dev/sda1      # save root partition
fsarchiver savefs /backup/all.fsa /dev/sda1 /dev/sda2  # multiple partitions
fsarchiver savefs -z9 /backup/root.fsa /dev/sda1  # maximum compression
fsarchiver savefs -j4 /backup/root.fsa /dev/sda1  # use 4 threads

Important: The source filesystem must be unmounted or mounted read-only. Run from a live environment for the system partition.
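
A small guard at the top of an imaging script can enforce this (a sketch; /dev/sda1 is a placeholder):

```shell
# findmnt exits 0 if the device is mounted anywhere
DEV=/dev/sda1
if findmnt "$DEV" > /dev/null; then
  echo "ERROR: $DEV is mounted. Unmount it or boot a live system." >&2
  exit 1
fi
fsarchiver savefs /backup/root.fsa "$DEV"
```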

Restore a filesystem

fsarchiver restfs /backup/root.fsa id=0,dest=/dev/sda1
# id=0 = first filesystem in the archive
# dest = target partition (will be formatted and restored)

Restore to a different sized partition:

fsarchiver restfs /backup/root.fsa id=0,dest=/dev/sdb1
# fsarchiver handles the resize automatically — unlike dd

Inspect an archive

fsarchiver archinfo /backup/root.fsa       # show archive details

fsarchiver vs dd

                            fsarchiver                   dd
Understands filesystem      Yes                          No (raw blocks)
Compressed                  Yes                          Only with pipes
Restore to different size   Yes                          No
Speed                       Faster (skips empty space)   Copies everything
Handles bad sectors         Better                       Can fail
Cross-filesystem restore    Yes                          No

Full disk backup workflow

# Boot from live USB, then:
fsarchiver savefs -z6 -j$(nproc) /mnt/backup/system.fsa /dev/sda1
fsarchiver savefs -z6 -j$(nproc) /mnt/backup/home.fsa /dev/sda2

# Verify the archives
fsarchiver archinfo /mnt/backup/system.fsa
fsarchiver archinfo /mnt/backup/home.fsa

echo "Backup complete."
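
The matching restore, again from a live environment (a sketch; confirm the id= mapping with archinfo first, since the partition assignments here are assumptions):

```shell
# Check which filesystems the archives contain
fsarchiver archinfo /mnt/backup/system.fsa

# Restore each archive to its target partition
fsarchiver restfs /mnt/backup/system.fsa id=0,dest=/dev/sda1
fsarchiver restfs /mnt/backup/home.fsa   id=0,dest=/dev/sda2
```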


This post is licensed under CC BY 4.0 by the author.