
Week 3 — Day 15: SAST with Semgrep

A full walkthrough of Semgrep for static application security testing — running scans, writing custom rules, understanding findings, and integrating into GitHub Actions as a PR gate.

What is SAST?

Static Application Security Testing (SAST) analyzes source code without executing it to find security vulnerabilities — SQL injection, hardcoded secrets, insecure deserialization, path traversal, and more.

It runs early in the pipeline — on every commit or PR — giving developers fast feedback before code reaches production.
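Here is what "without executing it" means in practice. The toy sketch below is not how Semgrep works internally. It uses Python's standard `ast` module to parse source text into a syntax tree and report every direct call to `eval`, and it never runs the scanned code. The function name `find_eval_calls` is hypothetical.

```python
import ast


def find_eval_calls(source: str) -> list[int]:
    """Return the line numbers of every direct call to eval() in `source`.

    The code is only parsed into a syntax tree, never executed.
    """
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        # A call node whose callee is the bare name "eval"
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        ):
            lines.append(node.lineno)
    return sorted(lines)


sample = '''
user_input = input()
result = eval(user_input)
print(result)
'''

# input() is never called: the sample is only parsed
print(find_eval_calls(sample))  # [3]
```

A real SAST tool adds what this toy lacks, such as a rule language, many languages, and matching that tolerates formatting and aliasing differences. Semgrep's YAML rules, covered below, generalize the same idea.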

What SAST can catch:

  • SQL injection patterns
  • XSS vulnerabilities
  • Hardcoded credentials
  • Insecure use of crypto functions
  • Command injection
  • Path traversal

What SAST cannot catch:

  • Runtime behavior (use DAST for that)
  • Business logic flaws
  • Auth misconfigurations at the infrastructure level
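The following sketch shows why business-logic flaws slip past pattern matching. The hypothetical function below contains no dangerous call and no string-built query, so a rule looking for risky code shapes has nothing to match. It still lets any user read any invoice, because it never checks ownership (an IDOR bug). The data, class, and function names are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    owner: str
    amount: int


# Hypothetical in-memory store keyed by invoice ID
INVOICES = {
    1: Invoice(owner="alice", amount=120),
    2: Invoice(owner="bob", amount=75),
}


def get_invoice(current_user: str, invoice_id: int) -> Invoice:
    # Flaw: no check that current_user owns the invoice.
    # Nothing here looks "dangerous" to a syntax-level scanner.
    return INVOICES[invoice_id]


def get_invoice_fixed(current_user: str, invoice_id: int) -> Invoice:
    invoice = INVOICES[invoice_id]
    if invoice.owner != current_user:
        raise PermissionError("not your invoice")
    return invoice


# alice receives bob's invoice from the flawed version
print(get_invoice("alice", 2))  # Invoice(owner='bob', amount=75)
```

Catching this requires knowing the application's rules about who may see what. That is why code review, authorization tests, and DAST complement SAST.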

Why Semgrep?

Semgrep is a fast, open-source SAST tool that:

  • Supports 30+ languages natively
  • Has a massive community ruleset (2000+ rules)
  • Lets you write custom rules in a readable YAML syntax
  • Integrates natively with GitHub, GitLab, and CI/CD pipelines
  • Needs no server: it runs as a standalone CLI

Installation

# Via pip (works on Windows, macOS, and Linux)
pip install semgrep

# Or via winget
winget install Semgrep.Semgrep

# Verify
semgrep --version

Running Your First Scan

# Scan the current directory with the auto ruleset (Semgrep picks rules for the languages it detects;
# auto fetches rules from the registry and will not run with --metrics off)
semgrep scan --config auto .

# Scan with OWASP Top 10 rules
semgrep scan --config "p/owasp-top-ten" .

# Scan with security-audit ruleset
semgrep scan --config "p/security-audit" .

[SCREENSHOT]Terminal showing semgrep scan --config auto . running on a project, output showing findings with file path, line number, rule ID, severity, and the matched code snippet highlighted


Understanding the Output

Each finding shows the following (output simplified; the exact layout varies between Semgrep versions):

/src/app/db.py
  vulnerability.sql-injection
  ❯❯❱ Line 42: query = "SELECT * FROM users WHERE id = " + user_input
        Found SQL injection: user input is directly concatenated into SQL query.
        This can allow an attacker to read or modify any data in the database.
        Severity: ERROR
        Fix: Use parameterized queries or prepared statements.

| Field | Meaning |
|---|---|
| File + line | Exactly where the issue is |
| Rule ID | Which rule caught it |
| Severity | ERROR / WARNING / INFO |
| Message | What the vulnerability is and why it matters |
| Fix | How to remediate |

[SCREENSHOT]Semgrep output showing 3-4 findings from a vulnerable Python app with the matched code lines highlighted in red


Key Rulesets

Semgrep’s community registry has pre-built rulesets for most use cases:

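| Ruleset | Command | Use for |
|---|---|---|
| Auto (language-appropriate) | --config auto | Default; a good starting point |
| OWASP Top 10 | --config p/owasp-top-ten | Web app vulnerabilities |
| Security audit | --config p/security-audit | Broad security checks |
| Secrets | --config p/secrets | Hardcoded credentials |
| Python-specific | --config p/python | Django, Flask, SQLAlchemy issues |
| Node.js | --config p/nodejs | Express, npm security |
| Docker | --config p/dockerfile | Dockerfile misconfigurations |
| CI-friendly | --config p/ci | High-confidence, low-noise rules suited to CI gating |
| GitHub Actions | --config p/github-actions | Workflow file issues such as shell injection |
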
# Run multiple rulesets at once
semgrep scan \
  --config p/owasp-top-ten \
  --config p/secrets \
  --config p/security-audit \
  .

[SCREENSHOT]Terminal showing semgrep scan with multiple --config flags running and the summary at the end: “X findings across Y files”


Filtering Output

# Only show errors (not warnings or info)
semgrep scan --config auto --severity ERROR .

# Output as JSON for parsing
semgrep scan --config auto --json --output results.json .

# Output as SARIF for GitHub Code Scanning
semgrep scan --config auto --sarif --output results.sarif .

# Exclude paths (the flag is --exclude; repeat it once per file or directory pattern)
semgrep scan --config auto --exclude node_modules --exclude .venv .
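The following sketch shows how to consume results.json from Python. It assumes the shape of Semgrep's JSON output: a top-level `results` list whose entries carry `check_id`, `path`, `start.line`, and `extra.severity` / `extra.message`. Check those field names against your Semgrep version's output before relying on them. The `sample` report is made up, and `summarize` is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize(report: dict) -> tuple[Counter, list[str]]:
    """Count findings per severity and format one line per finding."""
    counts = Counter()
    lines = []
    for result in report.get("results", []):
        severity = result["extra"]["severity"]
        counts[severity] += 1
        lines.append(
            f'{severity:<7} {result["path"]}:{result["start"]["line"]} '
            f'{result["check_id"]}'
        )
    return counts, lines


# Made-up report in the assumed shape. In practice:
#   with open("results.json", encoding="utf-8") as f:
#       report = json.load(f)
sample = json.loads("""
{"results": [
  {"check_id": "sql-injection-string-format", "path": "src/app/db.py",
   "start": {"line": 42}, "extra": {"severity": "ERROR", "message": "..."}},
  {"check_id": "dangerous-eval", "path": "web/util.js",
   "start": {"line": 7}, "extra": {"severity": "WARNING", "message": "..."}}
]}
""")

counts, lines = summarize(sample)
print(dict(counts))  # {'ERROR': 1, 'WARNING': 1}
for line in lines:
    print(line)
```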

Writing Custom Rules

This is Semgrep’s killer feature — custom rules in readable YAML that match code patterns in your specific codebase.

Rule Structure

rules:
  - id: my-rule-id
    patterns:
      - pattern: <code pattern to match>
    message: >
      Description of what the rule found and why it's a problem.
      How to fix it.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-89: SQL Injection"

Example 1 — Detect SQL Injection in Python

rules:
  - id: sql-injection-string-format
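    # pattern-either matches if ANY listed pattern matches; putting several
    # "pattern" entries under "patterns" would require ALL of them at once
    pattern-either:
      - pattern: |
          $QUERY = "..." % $INPUT
          $DB.execute($QUERY)
      - pattern: |
          $DB.execute("..." % $INPUT)
      - pattern: |
          $DB.execute(f"...{$INPUT}...")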
    message: >
      SQL query built with string formatting — SQL injection risk.
      Use parameterized queries: cursor.execute("SELECT * FROM t WHERE id = %s", (user_id,))
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"
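This self-contained sketch shows why the rule's suggested fix matters. It uses Python's built-in `sqlite3` module with an in-memory table; the table and the payload are invented for the demo. `sqlite3` uses `?` placeholders, while drivers such as psycopg2 use the `%s` style shown in the rule message, but the principle is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])


def find_user_unsafe(user_input: str) -> list:
    # The shape the rule flags: input formatted straight into the SQL text
    query = "SELECT name FROM users WHERE id = %s" % user_input
    return conn.execute(query).fetchall()


def find_user_safe(user_input: str) -> list:
    # Parameterized: the value travels separately from the SQL text,
    # so it cannot change the structure of the query
    return conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_input,)
    ).fetchall()


payload = "0 OR 1=1"
print(find_user_unsafe(payload))  # every row comes back
print(find_user_safe(payload))    # []: the payload is just a non-matching value
```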

Example 2 — Detect Hardcoded AWS Keys

rules:
  - id: hardcoded-aws-access-key
    patterns:
      - pattern-regex: 'AKIA[0-9A-Z]{16}'
    message: >
      Hardcoded AWS access key detected.
      Remove immediately and rotate the key in IAM.
      Use environment variables or Secrets Manager instead.
    languages: [generic]
    severity: ERROR
    metadata:
      cwe: "CWE-798"
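You can try the same regular expression in plain Python before committing it to a rule. The sketch below checks it against AWS's published documentation example key ID (AKIAIOSFODNN7EXAMPLE, which is not a real credential). It also shows the remediation the message recommends: reading the key from the environment. The helper name `find_aws_key_ids` is hypothetical.

```python
import os
import re

# Same pattern as the rule: "AKIA" followed by 16 uppercase letters or digits
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")


def find_aws_key_ids(text: str) -> list[str]:
    return AWS_KEY_ID.findall(text)


leaky_source = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'
clean_source = 'aws_access_key_id = os.environ["AWS_ACCESS_KEY_ID"]'

print(find_aws_key_ids(leaky_source))  # ['AKIAIOSFODNN7EXAMPLE']
print(find_aws_key_ids(clean_source))  # []

# Remediation: read the key at runtime instead of hardcoding it.
# os.environ.get returns None when the variable is unset.
key_id = os.environ.get("AWS_ACCESS_KEY_ID")
```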

Example 3 — Detect Dangerous eval() in JavaScript

rules:
  - id: dangerous-eval
    patterns:
      - pattern: eval($X)
      - pattern-not: eval("...")   # allow literal strings (low risk)
    message: >
      eval() called with a non-literal argument — potential code injection.
      Avoid eval() entirely; use JSON.parse() for data or refactor the logic.
    languages: [javascript, typescript]
    severity: WARNING
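This rule targets JavaScript, but the same advice applies to Python, the other language used in this post. In the sketch below, `json.loads` parses data and rejects anything that isn't JSON, whereas `eval` would execute whatever expression it receives. The function name `parse_config` is hypothetical.

```python
import json


def parse_config(text: str) -> dict:
    # json.loads only understands JSON data; it never runs code
    return json.loads(text)


print(parse_config('{"debug": false, "retries": 3}'))  # {'debug': False, 'retries': 3}

malicious = "__import__('os').getcwd()"  # eval() would execute this expression
try:
    parse_config(malicious)
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```

If the input is a Python literal rather than JSON, the standard library's `ast.literal_eval` is the safe counterpart to `eval`.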

Example 4 — Detect Insecure subprocess in Python

rules:
  - id: subprocess-shell-true
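    patterns:
      # Match any of the three call styles (pattern-either = OR)...
      - pattern-either:
          - pattern: subprocess.run($CMD, ..., shell=True, ...)
          - pattern: subprocess.call($CMD, ..., shell=True, ...)
          - pattern: subprocess.Popen($CMD, ..., shell=True, ...)
      # ...but skip hardcoded string commands, which the message does not target
      - pattern-not: subprocess.$FUNC("...", ..., shell=True, ...)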
    message: >
      subprocess called with shell=True and a variable command — command injection risk.
      Use shell=False and pass a list of arguments instead.
    languages: [python]
    severity: ERROR
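The sketch below shows the recommended fix. With `shell=False` (the default) and a list of arguments, the input reaches the program as one literal argument, so shell metacharacters such as `;` do nothing. For portability, the example launches the current Python interpreter as the child program rather than a real tool. `shlex.quote` covers the rare case where a shell is unavoidable.

```python
import shlex
import subprocess
import sys

user_input = "report.txt; echo pwned"  # hostile-looking input

# Safe: argument list, no shell. The child just prints its first argument.
completed = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
    capture_output=True,
    text=True,
    check=True,
)
print(completed.stdout.strip())  # report.txt; echo pwned  (printed, not executed)

# If a shell string is truly unavoidable, quote every untrusted piece
print(shlex.quote(user_input))  # 'report.txt; echo pwned'
```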

Testing Custom Rules

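Semgrep has a built-in test runner. Put a test file next to each rule file, with the same base name (for example sql-injection-string-format.yaml and sql-injection-string-format.py). In the test file, a # ruleid: <id> comment marks the next line as code the rule must flag, and # ok: <id> marks code it must not flag: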

# sql-injection-string-format.yaml  (rule file)
rules:
  - id: sql-injection-string-format
    # ... rule definition from Example 1 ...

# sql-injection-string-format.py  (test file: same base name as the rule file)
# ruleid: sql-injection-string-format
query = "SELECT * FROM users WHERE id = %s" % user_id
db.execute(query)

# ok: sql-injection-string-format
db.execute("SELECT * FROM users WHERE id = %s", (user_id,))

# Run tests
semgrep --test .

[SCREENSHOT]Terminal showing semgrep --test output: “1 passed, 0 failed” confirming the rule correctly catches the bad pattern and ignores the good one


Ignoring False Positives

Suppress a specific finding inline:

# nosemgrep: sql-injection-string-format
db.execute("SELECT * FROM users WHERE id = %s" % int(user_id))   # reviewed: value is cast to int first

Exclude whole files or directories from scanning by listing them in .semgrepignore (gitignore-style patterns):

# .semgrepignore
tests/
vendor/
*.min.js
migrations/

Integrating into GitHub Actions

# .github/workflows/semgrep.yml
name: Semgrep SAST

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  semgrep:
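    runs-on: ubuntu-latest
    permissions:
      contents: read          # for actions/checkout
      security-events: write  # lets upload-sarif write to the Security tab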
    container:
      image: semgrep/semgrep

    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep
        run: |
          semgrep scan \
            --config p/owasp-top-ten \
            --config p/secrets \
            --sarif \
            --output semgrep.sarif \
            --severity ERROR \
            --error \
            .
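        env:
          # Optional here: the token links semgrep ci runs to Semgrep's hosted platform;
          # plain semgrep scan with public registry rulesets works without it
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}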

      - name: Upload SARIF to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep.sarif

[SCREENSHOT]GitHub Actions run showing the Semgrep step — green if no errors found, or red with the violation listed in the step output

[SCREENSHOT]GitHub repository → Security tab → Code scanning alerts showing Semgrep findings with their severity, rule ID, and the file/line they were found in

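--error flag: makes Semgrep exit with code 1 when it reports any finding (after --severity filtering), which fails the job. To actually block the PR from merging, mark this job as a required status check in the branch protection rules for main.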
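The sketch below shows what the upload step hands to GitHub. It reads a SARIF 2.1.0 document with the standard library and lists results whose `level` is "error". It assumes Semgrep's ERROR severity maps to that SARIF level, so verify this against your own semgrep.sarif. The `sample` document is made up, and `error_findings` and `load_sarif` are hypothetical helper names.

```python
import json


def load_sarif(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def error_findings(sarif: dict) -> list[str]:
    """List 'file:line ruleId' for every result whose level is "error"."""
    found = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Only count results explicitly marked as errors
            if result.get("level") != "error":
                continue
            loc = result["locations"][0]["physicalLocation"]
            found.append(
                f'{loc["artifactLocation"]["uri"]}:{loc["region"]["startLine"]} '
                f'{result["ruleId"]}'
            )
    return found


def result(rule, level, uri, line):
    # Builds one SARIF result object in the standard nested shape
    return {"ruleId": rule, "level": level, "message": {"text": "..."},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": uri},
                "region": {"startLine": line}}}]}


sample = {"version": "2.1.0", "runs": [{
    "tool": {"driver": {"name": "Semgrep"}},
    "results": [
        result("sql-injection-string-format", "error", "src/app/db.py", 42),
        result("dangerous-eval", "warning", "web/util.js", 7),
    ],
}]}

errors = error_findings(sample)
print(errors)  # ['src/app/db.py:42 sql-injection-string-format']
exit_code = 1 if errors else 0  # in a CI step, pass this to sys.exit()
```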


Lab — Scan a Vulnerable App

Objective: Run Semgrep against an intentionally vulnerable application and review findings.

  1. Clone a vulnerable app:

    git clone https://github.com/juice-shop/juice-shop
    cd juice-shop

  2. Run the OWASP Top 10 scan:

    semgrep scan --config p/owasp-top-ten --severity ERROR . 2>/dev/null

[SCREENSHOT]Semgrep scan output on Juice Shop showing multiple findings (SQL injection, XSS, hardcoded secrets) with file paths and line numbers

  3. Run the secrets scan:

    semgrep scan --config p/secrets . 2>/dev/null

[SCREENSHOT]Semgrep secrets scan output showing any hardcoded credentials or API keys found in the codebase

  4. Pick one finding, look at the code, work out why it's flagged, and write the fix.

  5. Write a custom rule for one pattern you notice in the code:

    # Create my-rules.yaml with your rule
    semgrep scan --config my-rules.yaml .

Key Takeaways

  • SAST runs on code — catch vulnerabilities at development time, not production
  • --config auto is your fastest start — Semgrep selects the right rules for your languages
  • Custom rules are Semgrep’s superpower — model rules on patterns specific to your codebase
  • Use --sarif + GitHub upload to get persistent findings in the Security tab
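  • --error in CI makes Semgrep a hard gate: once the job is a required status check, PRs with findings at the gated severity can’t merge
  • Use nosemgrep inline comments for reviewed false positives; reserve .semgrepignore for paths you deliberately don’t scan (vendored, generated, or minified code)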


This post is licensed under CC BY 4.0 by the author.