Chapter 4 Texting and Driving

Chapter 4 of Linux Shell Scripting Cookbook — text processing with grep, sed, awk, cut, and regular expressions

Chapter Overview

This chapter is the core of shell text processing: grep, sed, awk, and cut, plus the regular expressions that power them. Together these tools can parse, transform, extract, and reformat almost any structured text. Every sysadmin, developer, and security engineer uses them daily.


Regular Expressions

Regular expressions (regex) are patterns used to match text. Before diving into the tools, you need to know the syntax.

Basic (BRE) vs Extended (ERE)

Most tools support both. ERE is generally cleaner — use grep -E, sed -E, or awk (which uses ERE by default).
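A quick illustration of the difference, using made-up sample words on stdin. Note that `\?` as a quantifier in BRE is a GNU grep extension:

```shell
printf 'color\ncolour\ncolouur\n' | grep 'colou\?r'    # BRE: GNU grep needs \? here
printf 'color\ncolour\ncolouur\n' | grep -E 'colou?r'  # ERE: ? works unescaped
# both print "color" and "colour"; "colouur" does not match
```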

Character classes

.        any single character (except newline)
[abc]    a, b, or c
[^abc]   anything NOT a, b, or c
[a-z]    lowercase letters
[0-9]    digits
[a-zA-Z0-9]  alphanumeric

Anchors

^        start of line
$        end of line
\b       word boundary
\B       not a word boundary

Quantifiers

*        0 or more
+        1 or more (ERE)
?        0 or 1 (ERE)
{n}      exactly n times (ERE)
{n,}     n or more (ERE)
{n,m}    between n and m (ERE)

Special sequences

\d       digit — PCRE only (grep -P, perl); use [0-9] or [[:digit:]] elsewhere
\w       word character [a-zA-Z0-9_] (GNU extension in grep/sed)
\s       whitespace (GNU extension; [[:space:]] is the portable form)
\D \W \S negations of the above (same availability caveats)

Groups and alternation

(abc)    group (ERE)
a|b      a or b (ERE)

Examples

^ERROR           # lines starting with ERROR
\.log$           # lines ending with .log
[0-9]{1,3}       # 1 to 3 digits
\b\w+@\w+\.\w+   # rough email pattern
https?://        # http:// or https://

Searching with grep

grep searches for patterns in files or stdin.

grep "pattern" file.txt              # basic search
grep -i "pattern" file.txt           # case-insensitive
grep -v "pattern" file.txt           # invert — lines NOT matching
grep -n "pattern" file.txt           # show line numbers
grep -c "pattern" file.txt           # count matching lines
grep -l "pattern" *.txt              # list files that match
grep -L "pattern" *.txt              # list files that DON'T match
grep -r "pattern" /path/             # recursive search
grep -w "word" file.txt              # whole word match only
grep -x "exact line" file.txt        # whole line match
grep -A 3 "pattern" file.txt         # 3 lines after match
grep -B 3 "pattern" file.txt         # 3 lines before match
grep -C 3 "pattern" file.txt         # 3 lines before and after

Extended regex (-E)

grep -E "error|warn|fatal" log.txt          # OR
grep -E "^[0-9]{4}-[0-9]{2}-[0-9]{2}" log  # date pattern
grep -E "\b[A-Z]{2,}\b" file.txt            # all-caps words

Fixed string (-F) — no regex, faster for literal searches

grep -F "192.168.1.1" access.log     # literal IP, no regex overhead

Useful combos

# Count errors per log file
grep -rc "ERROR" /var/log/

# Find files containing a pattern, then open them
grep -rl "TODO" . | xargs vim

# Search only specific file types
grep -r "pattern" . --include="*.py"
grep -r "pattern" . --exclude="*.log"

Cutting Columns with cut

cut extracts columns or fields from each line. Fast and simple for structured text.

By character position

cut -c 1-5 file.txt          # characters 1 to 5
cut -c 1,3,5 file.txt        # characters 1, 3, and 5
cut -c 10- file.txt          # from character 10 to end
cut -c -20 file.txt          # first 20 characters

By field (delimiter-separated)

cut -d ':' -f 1 /etc/passwd        # first field (username)
cut -d ':' -f 1,3 /etc/passwd      # fields 1 and 3
cut -d ':' -f 1-3 /etc/passwd      # fields 1 through 3
cut -d ',' -f 2 data.csv           # second column of CSV
cut -d $'\t' -f 3 file.tsv         # third column of TSV

Extract IPs from an access log:

cut -d ' ' -f 1 access.log | sort | uniq -c | sort -rn | head

Limitation: cut can’t handle multiple consecutive delimiters as one. For that, use awk.
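A tiny demonstration of that limitation, with a made-up two-field line separated by three spaces:

```shell
echo 'a   b' | cut -d ' ' -f 2    # prints an empty line: cut sees empty fields between the spaces
echo 'a   b' | awk '{print $2}'   # prints "b": awk treats the run of spaces as one separator
```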


Text Replacement with sed

sed (stream editor) applies edits to each line of input without opening a file interactively.

Substitution — the most used command

sed 's/old/new/' file.txt           # replace first match per line
sed 's/old/new/g' file.txt          # replace all matches per line (global)
sed 's/old/new/i' file.txt          # case-insensitive
sed 's/old/new/2' file.txt          # replace only 2nd occurrence
sed 's/old/new/gi' file.txt         # global + case-insensitive
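The flags are easiest to see on a throwaway line piped through sed:

```shell
echo 'foo foo foo' | sed 's/foo/bar/'    # bar foo foo  (first match per line)
echo 'foo foo foo' | sed 's/foo/bar/g'   # bar bar bar  (all matches)
echo 'foo foo foo' | sed 's/foo/bar/2'   # foo bar foo  (2nd occurrence only)
```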

Edit file in-place:

sed -i 's/old/new/g' file.txt       # modify file directly
sed -i.bak 's/old/new/g' file.txt   # in-place with backup (.bak)

Address ranges (which lines to act on)

sed '5s/old/new/' file.txt          # only line 5
sed '5,10s/old/new/g' file.txt      # lines 5 to 10
sed '/pattern/s/old/new/g' file.txt # lines matching a pattern
sed '5,/end/s/old/new/g' file.txt   # from line 5 to first match of "end"

Delete lines

sed '5d' file.txt                   # delete line 5
sed '5,10d' file.txt                # delete lines 5-10
sed '/pattern/d' file.txt           # delete lines matching pattern
sed '/^$/d' file.txt                # delete blank lines
sed '/^[[:space:]]*$/d' file.txt    # delete whitespace-only lines

Print specific lines

sed -n '5p' file.txt                # print only line 5
sed -n '5,10p' file.txt             # print lines 5-10
sed -n '/start/,/end/p' file.txt    # print between patterns

Append, insert, change

sed '5a\new line after' file.txt    # append after line 5
sed '5i\new line before' file.txt   # insert before line 5
sed '5c\replacement line' file.txt  # replace line 5 entirely

Extended regex

sed -E 's/[0-9]+/NUM/g' file.txt          # replace all numbers
sed -E 's/(error|warn)/[\1]/gi' file.txt  # wrap matched word in brackets

Backreferences — reference captured groups with \1, \2:

sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/' file.txt
# 2026-03-18 → 18/03/2026
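When the replacement itself contains slashes, remember that any character can delimit the s command. Running the same date conversion with `|` as the delimiter avoids the escaping:

```shell
echo '2026-03-18' | sed -E 's|([0-9]{4})-([0-9]{2})-([0-9]{2})|\3/\2/\1|'
# 18/03/2026
```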

Advanced Text Processing with awk

awk is a full programming language for text processing. It processes each line as a record and splits it into fields automatically.

Basic structure

awk 'pattern { action }' file.txt
  • pattern — which lines to act on (omit to match all)
  • action — what to do (omit to print the line)

Built-in variables

$0       entire current line
$1, $2   first field, second field
NF       number of fields in the current line
NR       current line number (record number)
FS       input field separator (default: whitespace)
OFS      output field separator
RS       record separator (default: newline)
ORS      output record separator

Examples

awk '{print $1}' file.txt              # print first field of every line
awk '{print $NF}' file.txt             # print last field
awk '{print NR, $0}' file.txt          # number every line
awk 'NR==5' file.txt                   # print only line 5
awk 'NR>=5 && NR<=10' file.txt         # print lines 5-10
awk '/pattern/' file.txt               # print lines matching pattern
awk '!/pattern/' file.txt              # print lines NOT matching
awk 'NF > 3' file.txt                  # lines with more than 3 fields

Custom delimiter

awk -F ':' '{print $1}' /etc/passwd        # use : as separator
awk -F ',' '{print $2, $4}' data.csv       # CSV — print columns 2 and 4
awk -F '\t' '{print $3}' file.tsv          # tab-separated

Calculations

awk '{sum += $1} END {print sum}' nums.txt           # sum a column
awk '{sum += $1} END {print sum/NR}' nums.txt        # average
awk 'BEGIN{max=0} $1>max{max=$1} END{print max}' f   # max value

BEGIN and END blocks

awk 'BEGIN {print "Start"} {print $0} END {print "Done"}' file.txt

BEGIN runs once before any lines are processed. END runs once after all lines.
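A typical shape is a BEGIN header, per-line work, and an END summary. A small sketch with made-up numbers on stdin:

```shell
printf '3\n7\n5\n' | awk '
  BEGIN { print "values:" }
        { print "  " $1; sum += $1 }
  END   { print "total: " sum }'
# values:
#   3
#   7
#   5
# total: 15
```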

Conditionals and loops

awk '{
  if ($3 > 100) {
    print $1, "high"
  } else {
    print $1, "low"
  }
}' file.txt

Output formatting with printf

awk '{printf "%-20s %5d\n", $1, $2}' file.txt

Practical examples

# Sum the size column in ls -l output
ls -l | awk '{sum += $5} END {print sum " bytes"}'

# Print lines where field 3 is greater than 50
awk -F ',' '$3 > 50' data.csv

# Count occurrences of each value in column 2
awk -F ',' '{count[$2]++} END {for (k in count) print k, count[k]}' data.csv

# Print duplicate lines
awk 'seen[$0]++' file.txt

# Remove duplicate lines (keep first occurrence)
awk '!seen[$0]++' file.txt
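The two seen[] one-liners are mirror images, as a quick run on throwaway input shows — `seen[$0]++` is 0 (false) the first time a line appears and truthy afterwards:

```shell
printf 'a\nb\na\nc\nb\n' | awk '!seen[$0]++'   # a, b, c — first occurrences only
printf 'a\nb\na\nc\nb\n' | awk 'seen[$0]++'    # a, b    — only the repeats
```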

Word Frequency in a File

Count how often each word appears:

tr -s '[:space:]' '\n' < file.txt | tr '[:upper:]' '[:lower:]' | \
  sort | uniq -c | sort -rn | head -20

Step by step:

  1. tr -s '[:space:]' '\n' — replace all whitespace with newlines (one word per line)
  2. tr '[:upper:]' '[:lower:]' — lowercase everything
  3. sort — sort alphabetically (required for uniq)
  4. uniq -c — count consecutive duplicates
  5. sort -rn — sort by count descending
  6. head -20 — top 20
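The steps above can be checked end to end on a one-line made-up sentence:

```shell
printf 'The cat and the dog and the bird\n' \
  | tr -s '[:space:]' '\n' \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c | sort -rn | head -3
#   3 the
#   2 and
#   (then the count-1 words, in whatever order ties sort)
```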

With awk (handles punctuation better):

awk '{
  gsub(/[^a-zA-Z]/, " ")   # replace non-letters with spaces
  for (i=1; i<=NF; i++) {
    word = tolower($i)
    if (length(word) > 0) freq[word]++
  }
} END {
  for (w in freq) print freq[w], w
}' file.txt | sort -rn | head -20

Compressing and Decompressing JavaScript

Minifying JS removes whitespace and comments to reduce file size for production.

Using uglifyjs

npm install -g uglify-js

uglifyjs script.js -o script.min.js                      # minify
uglifyjs script.js -o script.min.js --compress --mangle  # compress + mangle names
uglifyjs script.min.js --beautify -o script.readable.js  # decompress/beautify

Quick sed minification (basic — not production-grade)

sed 's/\/\/.*$//g' script.js |   # remove single-line comments
sed 's/[[:space:]]\+/ /g' |      # squeeze whitespace
sed 's/^ //; s/ $//'             # trim leading/trailing spaces

Formatting JSON

cat data.json | python3 -m json.tool          # pretty print
cat data.json | python3 -m json.tool --compact # compact/minify (Python 3.9+)

Merging Files as Columns

paste

paste merges files side by side, line by line:

paste file1.txt file2.txt               # tab-separated by default
paste -d ',' file1.txt file2.txt        # comma-separated
paste -d ':' file1.txt file2.txt file3.txt  # three files, colon-separated
paste -s file.txt                       # serial — merge lines of ONE file into one line
paste -s -d ',' file.txt               # comma-separated single line

Create a CSV from separate column files:

paste -d ',' names.txt ages.txt emails.txt > people.csv

Combine with process substitution:

paste <(cut -d: -f1 /etc/passwd) <(cut -d: -f7 /etc/passwd)
# username   shell (side by side)

column

Format output into aligned columns:

column -t file.txt                      # auto-align columns
paste file1.txt file2.txt | column -t   # merge then align
cat /etc/passwd | column -t -s ':'      # align by delimiter

Printing the nth Word or Column

awk — most reliable

awk '{print $3}' file.txt          # 3rd field of every line
awk 'NR==5 {print $3}' file.txt    # 3rd field of line 5 only
awk -F ',' '{print $2}' file.csv   # 2nd column of CSV

cut

cut -d ' ' -f 3 file.txt           # 3rd space-separated field
cut -d ',' -f 2 file.csv           # 2nd CSV column

Note: cut treats consecutive delimiters as separate fields. awk collapses whitespace.

From a single line (not a file)

echo "one two three four" | awk '{print $3}'     # three
echo "a:b:c:d" | cut -d ':' -f 2                 # b

Last field

awk '{print $NF}' file.txt              # last field (NF = number of fields)
rev file.txt | cut -d ' ' -f 1 | rev   # reverse trick

Printing Text Between Line Numbers or Patterns

By line numbers

sed -n '10,20p' file.txt              # lines 10 to 20
awk 'NR>=10 && NR<=20' file.txt       # same with awk
head -20 file.txt | tail -11          # lines 10-20 (head then tail)

Between patterns

sed -n '/START/,/END/p' file.txt      # from START to END (inclusive)
awk '/START/,/END/' file.txt          # same with awk

# Exclusive (don't include the pattern lines themselves)
sed -n '/START/,/END/{/START/!{/END/!p}}' file.txt
awk '/START/{f=1; next} /END/{f=0} f' file.txt
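The awk version works by toggling a flag: f turns on at START (next skips the marker itself), turns off at END before the bare `f` pattern is tested, so only the lines strictly between the markers print. A quick check on inline input:

```shell
printf 'a\nSTART\nb\nc\nEND\nd\n' \
  | awk '/START/{f=1; next} /END/{f=0} f'
# b
# c
```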

After a pattern to end of file

sed -n '/pattern/,$p' file.txt        # from pattern to EOF
awk '/pattern/,0' file.txt            # awk equivalent

Between Nth and Mth occurrence of a pattern

awk '/SECTION/{count++} count==2,count==3' file.txt

Printing Lines in Reverse Order

tac

tac is cat backwards — reverses line order:

tac file.txt                          # reverse all lines
tac file.txt | head -5                # last 5 lines in forward order

awk

awk '{lines[NR]=$0} END {for(i=NR;i>=1;i--) print lines[i]}' file.txt

Reverse character order within a line

rev file.txt                          # reverse characters on each line
echo "hello" | rev                    # olleh

Combine for full reversal (lines and characters):

tac file.txt | rev

Parsing Email Addresses and URLs

Extract emails with grep

grep -Eo '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

-E = extended regex, -o = print only the matched part (not the full line).
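For example, on a made-up line with two addresses, -o emits one match per output line:

```shell
echo 'Contact alice@example.com or bob@test.org for info.' \
  | grep -Eo '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
# alice@example.com
# bob@test.org
```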

Extract URLs with grep

grep -Eo 'https?://[^[:space:]"]+' file.txt
grep -Eo '(http|https|ftp)://[^[:space:]]*' file.txt

Extract with awk

# Extract all emails
awk '{
  while (match($0, /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/)) {
    print substr($0, RSTART, RLENGTH)
    $0 = substr($0, RSTART + RLENGTH)
  }
}' file.txt

Validate an email (basic)

echo "user@example.com" | grep -Eq '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' \
  && echo "valid" || echo "invalid"

Removing Sentences Containing a Word

sed

sed '/keyword/d' file.txt                    # delete lines containing keyword
sed '/error/Id' file.txt                     # case-insensitive delete
sed '/^.*keyword.*$/d' file.txt              # explicit full-line match

grep -v (inverse)

grep -v "keyword" file.txt                   # print all lines EXCEPT those with keyword
grep -vi "keyword" file.txt                  # case-insensitive

awk

awk '!/keyword/' file.txt                    # print lines not matching
awk 'tolower($0) !~ /keyword/' file.txt      # case-insensitive

Remove multiple keywords:

grep -vE "error|warning|debug" file.txt
sed '/error\|warning\|debug/d' file.txt

Edit in-place:

sed -i '/keyword/d' file.txt

Replacing a Pattern in All Files in a Directory

grep + sed combo

grep -rl "old_pattern" /path/ | xargs sed -i 's/old_pattern/new_text/g'

-r = recursive, -l = list files only (not matches), then pipe to xargs sed -i.
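The plain pipe breaks on filenames containing spaces. GNU grep's -Z prints NUL-terminated filenames and `xargs -0` consumes them safely; a self-contained sketch in a throwaway mktemp directory (both flags are GNU extensions):

```shell
# Demo in a throwaway directory; note the space in the filename
dir=$(mktemp -d)
printf 'old_pattern here\n' > "$dir/notes v2.txt"

# -Z emits NUL-separated filenames; xargs -0 reads them,
# so names with spaces or newlines pass through intact
grep -rlZ 'old_pattern' "$dir" | xargs -0 sed -i 's/old_pattern/new_text/g'

cat "$dir/notes v2.txt"   # new_text here
rm -rf "$dir"
```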

find + sed

find /path -type f -name "*.txt" -exec sed -i 's/old/new/g' {} \;

Safer — preview first:

grep -rl "old" .                                   # confirm which files match
grep -rl "old" . | xargs sed -n 's/old/new/gp'     # dry run (print changes, don't save)
grep -rl "old" . | xargs sed -i.bak 's/old/new/g'  # apply with backup

Specific file types only

find . -name "*.py" | xargs sed -i 's/import old/import new/g'
find . -name "*.conf" | xargs sed -i 's/localhost/production.host.com/g'

Text Slicing and Parameter Operations

Bash parameter expansion lets you slice and manipulate strings without spawning subshells.

Length

str="hello world"
echo ${#str}            # 11

Substring extraction

echo ${str:6}           # world (from position 6)
echo ${str:0:5}         # hello (positions 0-4)
echo ${str:(-5)}        # world (last 5 characters)

Remove prefix/suffix

filename="report_2026.txt"

echo ${filename#report_}       # 2026.txt  (remove shortest prefix match)
echo ${filename##*_}           # 2026.txt  (remove longest prefix match)
echo ${filename%.txt}          # report_2026 (remove shortest suffix)
echo ${filename%%.*}           # report_2026 (remove longest suffix)

# The # vs ## and % vs %% difference only shows with repeated delimiters:
archive="backup.tar.gz"
echo ${archive%.*}             # backup.tar  (shortest suffix match: .gz)
echo ${archive%%.*}            # backup      (longest suffix match: .tar.gz)
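The classic use of these operators is splitting a path without spawning basename or dirname — a sketch with a made-up path:

```shell
path="/var/log/nginx/access.log"

echo "${path##*/}"   # access.log      (strip longest */ prefix — like basename)
echo "${path%/*}"    # /var/log/nginx  (strip shortest /* suffix — like dirname)

file="${path##*/}"
echo "${file%.*}"    # access          (name without extension)
echo "${file##*.}"   # log             (extension alone)
```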

Find and replace

str="hello hello world"
echo ${str/hello/hi}           # hi hello world  (first match)
echo ${str//hello/hi}          # hi hi world     (all matches)
echo ${str/#hello/hi}          # hi hello world  (prefix match only)
echo ${str/%world/earth}       # hello hello earth (suffix match only)

Case conversion (bash 4+)

str="Hello World"
echo ${str^^}           # HELLO WORLD (all uppercase)
echo ${str,,}           # hello world (all lowercase)
echo ${str^}            # Hello World (capitalize first char)
echo ${str,}            # hello World (lowercase first char)

Default values

echo ${var:-default}    # use "default" if var is unset or empty
echo ${var:=default}    # set var to "default" if unset, then use it
echo ${var:+other}      # use "other" if var IS set (opposite)
echo ${var:?error msg}  # exit with error if var is unset
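The subtle one is `:-` vs `:=` — only the latter actually assigns. A short walkthrough with a hypothetical `name` variable:

```shell
# name starts unset; each expansion handles that differently
unset name

echo "${name:-guest}"      # guest — fallback only, name stays unset
echo "${name:=guest}"      # guest — and name is now assigned
echo "$name"               # guest
echo "${name:+logged_in}"  # logged_in — printed because name IS set now
# ${name:?no user} would abort the script at this point only if name were unset
```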

Practical example — batch rename with slicing

for f in IMG_*.jpg; do
  date_part="${f:4:8}"        # extract 8 chars starting at position 4
  mv "$f" "photo_${date_part}.jpg"
done

This post is licensed under CC BY 4.0 by the author.