Week 3 — Day 15: SAST with Semgrep
A full walkthrough of Semgrep for static application security testing — running scans, writing custom rules, understanding findings, and integrating into GitHub Actions as a PR gate.
What is SAST?
Static Application Security Testing (SAST) analyzes source code without executing it to find security vulnerabilities — SQL injection, hardcoded secrets, insecure deserialization, path traversal, and more.
It runs early in the pipeline — on every commit or PR — giving developers fast feedback before code reaches production.
What SAST can catch:
- SQL injection patterns
- XSS vulnerabilities
- Hardcoded credentials
- Insecure use of crypto functions
- Command injection
- Path traversal
What SAST cannot catch:
- Runtime behavior (use DAST for that)
- Business logic flaws
- Auth misconfigurations at the infrastructure level
Why Semgrep?
Semgrep is a fast, open-source SAST tool that:
- Supports 30+ languages natively
- Has a massive community ruleset (2000+ rules)
- Lets you write custom rules in a readable YAML syntax
- Integrates natively with GitHub, GitLab, and CI/CD pipelines
- Requires no server: it runs as a CLI
Installation
# Windows (via pip)
pip install semgrep
# Or via winget
winget install Semgrep.Semgrep
# Verify
semgrep --version
Running Your First Scan
# Scan current directory with the auto ruleset (Semgrep picks the right rules for your languages)
semgrep scan --config auto .
# Scan with OWASP Top 10 rules
semgrep scan --config "p/owasp-top-ten" .
# Scan with security-audit ruleset
semgrep scan --config "p/security-audit" .
[SCREENSHOT]: Terminal showing semgrep scan --config auto . running on a project, with output listing findings by file path, line number, rule ID, severity, and the matched code snippet highlighted
Understanding the Output
Each finding shows:
/src/app/db.py
vulnerability.sql-injection
❯❯❱ Line 42: query = "SELECT * FROM users WHERE id = " + user_input
Found SQL injection: user input is directly concatenated into SQL query.
This can allow an attacker to read or modify any data in the database.
Severity: ERROR
Fix: Use parameterized queries or prepared statements.
| Field | Meaning |
|---|---|
| File + line | Exactly where the issue is |
| Rule ID | Which rule caught it |
| Severity | ERROR / WARNING / INFO |
| Message | What the vulnerability is and why it matters |
| Fix | How to remediate |
[SCREENSHOT]— Semgrep output showing 3-4 findings from a vulnerable Python app with the matched code lines highlighted in red
Key Rulesets
Semgrep’s community registry has pre-built rulesets for most use cases:
| Ruleset | Command | Use for |
|---|---|---|
| Auto (language-appropriate) | --config auto | Default — good starting point |
| OWASP Top 10 | --config p/owasp-top-ten | Web app vulnerabilities |
| Security audit | --config p/security-audit | Broad security checks |
| Secrets | --config p/secrets | Hardcoded credentials |
| Python-specific | --config p/python | Django, Flask, SQLAlchemy issues |
| Node.js | --config p/nodejs | Express, npm security |
| Docker | --config p/dockerfile | Dockerfile misconfigs |
| CI/CD | --config p/ci | GitHub Actions, Jenkins issues |
# Run multiple rulesets at once
semgrep scan \
--config p/owasp-top-ten \
--config p/secrets \
--config p/security-audit \
.
[SCREENSHOT]: Terminal showing semgrep scan with multiple --config flags running and the summary at the end: “X findings across Y files”
Filtering Output
# Only show errors (not warnings or info)
semgrep scan --config auto --severity ERROR .
# Output as JSON for parsing
semgrep scan --config auto --json --output results.json .
# Output as SARIF for GitHub Code Scanning
semgrep scan --config auto --sarif --output results.sarif .
# Exclude directories
semgrep scan --config auto --exclude node_modules --exclude .venv .
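Once results are written as JSON, a few lines of Python can summarize them. The sketch below is illustrative only: it assumes the report has a top-level `results` list whose entries carry `check_id`, `path`, `start.line`, and `extra.severity`, and the `SAMPLE` data is hypothetical. Check the keys against the output of your own Semgrep version before relying on it.

```python
import json
from collections import Counter

# Hypothetical sample that mirrors the assumed shape of results.json
SAMPLE = {
    "results": [
        {"check_id": "vulnerability.sql-injection", "path": "src/app/db.py",
         "start": {"line": 42}, "extra": {"severity": "ERROR"}},
        {"check_id": "hardcoded-aws-access-key", "path": "src/app/config.py",
         "start": {"line": 7}, "extra": {"severity": "ERROR"}},
        {"check_id": "dangerous-eval", "path": "web/static/app.js",
         "start": {"line": 113}, "extra": {"severity": "WARNING"}},
    ]
}


def summarize(report: dict) -> Counter:
    """Count findings per severity in a parsed report."""
    return Counter(f["extra"]["severity"] for f in report.get("results", []))


def format_finding(finding: dict) -> str:
    """Render one finding as 'path:line rule-id [SEVERITY]'."""
    return "{}:{} {} [{}]".format(
        finding["path"],
        finding["start"]["line"],
        finding["check_id"],
        finding["extra"]["severity"],
    )


def load_report(path: str) -> dict:
    """Read a JSON report from disk, e.g. the results.json written above."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    for finding in SAMPLE["results"]:
        print(format_finding(finding))
    print(dict(summarize(SAMPLE)))
```

To run it on a real scan, swap `SAMPLE` for `load_report("results.json")`.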
Writing Custom Rules
This is Semgrep’s killer feature — custom rules in readable YAML that match code patterns in your specific codebase.
Rule Structure
rules:
  - id: my-rule-id
    patterns:
      - pattern: <code pattern to match>
    message: >
      Description of what the rule found and why it's a problem.
      How to fix it.
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "CWE-89: SQL Injection"
Example 1 — Detect SQL Injection in Python
rules:
  - id: sql-injection-string-format
    # pattern-either matches if ANY alternative matches;
    # a plain patterns list would require all of them at once
    pattern-either:
      - pattern: |
          $QUERY = "..." % $INPUT
          $DB.execute($QUERY)
      - pattern: |
          $DB.execute("..." % $INPUT)
      - pattern: |
          $DB.execute(f"...{$INPUT}...")
    message: >
      SQL query built with string formatting, which is a SQL injection risk.
      Use parameterized queries: cursor.execute("SELECT * FROM t WHERE id = %s", (user_id,))
    languages: [python]
    severity: ERROR
    metadata:
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"
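To see why this shape of code is worth a rule, here is a small self-contained sketch using the standard library's `sqlite3` with an in-memory database. The table, the helper names, and the payload are hypothetical. Note that `sqlite3` uses `?` as its placeholder, while the `%s` in the rule message is the style of other drivers.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, user_id: str) -> list:
    # Risky shape: user input is formatted into the SQL text itself
    query = "SELECT name FROM users WHERE id = %s" % user_id
    cursor = conn.execute(query)
    return cursor.fetchall()


def find_user_safe(conn: sqlite3.Connection, user_id: str) -> list:
    # Parameterized: the driver binds user_id as data, never as SQL
    cursor = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,))
    return cursor.fetchall()


if __name__ == "__main__":
    db = make_db()
    payload = "1 OR 1=1"  # attacker-controlled input
    print("unsafe:", find_user_unsafe(db, payload))
    print("safe:  ", find_user_safe(db, payload))
```

With the payload, the formatted query returns every row, while the parameterized one treats the whole string as a single value and returns nothing.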
Example 2 — Detect Hardcoded AWS Keys
rules:
  - id: hardcoded-aws-access-key
    patterns:
      - pattern-regex: 'AKIA[0-9A-Z]{16}'
    message: >
      Hardcoded AWS access key detected.
      Remove immediately and rotate the key in IAM.
      Use environment variables or Secrets Manager instead.
    languages: [generic]
    severity: ERROR
    metadata:
      cwe: "CWE-798"
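Before shipping a regex-based rule, it helps to try the expression on a few strings. The sketch below runs the same regex through Python's `re` module, using AWS's documented placeholder key `AKIAIOSFODNN7EXAMPLE`. The helper name is hypothetical, and Python's regex engine may differ from Semgrep's in edge cases.

```python
import re
from typing import List

# Same expression as the rule's pattern-regex
AWS_KEY_RE = re.compile(r"AKIA[0-9A-Z]{16}")


def find_aws_keys(text: str) -> List[str]:
    """Return every substring that looks like an AWS access key ID."""
    return AWS_KEY_RE.findall(text)


if __name__ == "__main__":
    samples = [
        'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"',  # placeholder key from AWS docs
        'aws_access_key_id = "akiaiosfodnn7example"',  # lowercase: no match
        'token = "AKIA1234"',                          # too short: no match
    ]
    for line in samples:
        print(find_aws_keys(line), "<-", line)
```

The expression has no word boundaries, so it also matches the first 20 characters of a longer uppercase string. That favors recall over precision, which is a reasonable trade-off for a secrets rule.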
Example 3 — Detect Dangerous eval() in JavaScript
rules:
  - id: dangerous-eval
    patterns:
      - pattern: eval($X)
      - pattern-not: eval("...") # allow literal strings (low risk)
    message: >
      eval() called with a non-literal argument, which allows potential code injection.
      Avoid eval() entirely; use JSON.parse() for data or refactor the logic.
    languages: [javascript, typescript]
    severity: WARNING
Example 4 — Detect Insecure subprocess in Python
rules:
  - id: subprocess-shell-true
    # pattern-either: a call only needs to match one of these
    pattern-either:
      - pattern: subprocess.run($CMD, ..., shell=True, ...)
      - pattern: subprocess.call($CMD, ..., shell=True, ...)
      - pattern: subprocess.Popen($CMD, ..., shell=True, ...)
    message: >
      subprocess called with shell=True, which is a command injection risk when the command includes untrusted input.
      Use shell=False and pass a list of arguments instead.
    languages: [python]
    severity: ERROR
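The difference between the two call styles is easy to demonstrate. In this sketch the helper names and the payload are hypothetical, and the unsafe variant assumes a POSIX shell where `;` separates commands. The safe variant passes an argument list to a tiny child Python process, so it behaves the same on any platform.

```python
import subprocess
import sys

# Tiny child program that prints back the single argument it receives
ECHO_ARG = "import sys; print(sys.argv[1])"


def run_safe(user_input: str) -> str:
    """Pass input as one argv element; no shell ever parses it."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARG, user_input],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_unsafe(user_input: str) -> str:
    """Risky shape: the shell interprets whatever the input contains."""
    result = subprocess.run(
        "echo " + user_input, shell=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    payload = "hello; echo injected"  # attacker-controlled input
    print("safe:  ", repr(run_safe(payload)))
    if sys.platform != "win32":
        print("unsafe:", repr(run_unsafe(payload)))
```

With `shell=True` the payload runs a second command. With a list of arguments the whole string arrives as plain data.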
Testing Custom Rules
Semgrep has a built-in test mechanism — write test cases alongside your rules:
# Rule file: test_sql_injection.yaml
rules:
  - id: sql-injection-string-format
    # ... rule definition ...

# Test file: test_sql_injection.py (must share the rule file's base name)
# ruleid: sql-injection-string-format
query = "SELECT * FROM users WHERE id = %s" % user_id
db.execute(query)

# ok: sql-injection-string-format
db.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# Run tests
semgrep --test .
[SCREENSHOT]: Terminal showing semgrep --test output: “1 passed, 0 failed” confirming the rule correctly catches the bad pattern and ignores the good one
Ignoring False Positives
Suppress a specific finding inline:
# nosemgrep: sql-injection-string-format
query = build_safe_query(user_id) # this function sanitizes input
Exclude whole files or directories from scanning by listing them in .semgrepignore:
# .semgrepignore
tests/
vendor/
*.min.js
migrations/
Integrating into GitHub Actions
# .github/workflows/semgrep.yml
name: Semgrep SAST

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep
        run: |
          semgrep scan \
            --config p/owasp-top-ten \
            --config p/secrets \
            --sarif \
            --output semgrep.sarif \
            --severity ERROR \
            --error \
            .
        env:
          # Assumes a repository secret named SEMGREP_APP_TOKEN
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}

      - name: Upload SARIF to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep.sarif
[SCREENSHOT]— GitHub Actions run showing the Semgrep step — green if no errors found, or red with the violation listed in the step output
[SCREENSHOT]— GitHub repository → Security tab → Code scanning alerts showing Semgrep findings with their severity, rule ID, and the file/line they were found in
--error flag: makes Semgrep exit with code 1 when any finding at the specified severity is found, which fails the job. With that check marked as required in branch protection, a failing scan blocks the PR from merging.
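Conceptually, the gate is just "any blocking finding means a non-zero exit code". The sketch below expresses that idea over a parsed JSON report. It is not Semgrep's implementation; the function name is hypothetical, and it assumes each entry in `results` carries its severity under `extra.severity`.

```python
import json
import sys
from typing import Iterable


def gate_exit_code(report: dict, blocking: Iterable[str] = ("ERROR",)) -> int:
    """Return 1 if any finding has a blocking severity, otherwise 0."""
    blocking_set = set(blocking)
    for finding in report.get("results", []):
        if finding["extra"]["severity"] in blocking_set:
            return 1
    return 0


if __name__ == "__main__":
    # Usage (file name is an assumption): python gate.py results.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        sys.exit(gate_exit_code(json.load(handle)))
```

A script like this can be handy when you want different thresholds per branch, for example blocking on WARNING for release branches only.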
Lab — Scan a Vulnerable App
Objective: Run Semgrep against an intentionally vulnerable application and review findings.
- Clone a vulnerable app:
git clone https://github.com/juice-shop/juice-shop
cd juice-shop
- Run the OWASP Top 10 scan:
semgrep scan --config p/owasp-top-ten --severity ERROR . 2>/dev/null
[SCREENSHOT]— Semgrep scan output on Juice Shop showing multiple findings — SQL injection, XSS, hardcoded secrets — with file paths and line numbers
- Run the secrets scan:
semgrep scan --config p/secrets . 2>/dev/null
[SCREENSHOT]— Semgrep secrets scan output showing any hardcoded credentials or API keys found in the codebase
- Pick one finding → look at the code → understand why it’s flagged → write the fix
- Write a custom rule for one pattern you notice in the code:
# Create my-rules.yaml with your rule
semgrep scan --config my-rules.yaml .
Key Takeaways
- SAST runs on code: catch vulnerabilities at development time, not production
- --config auto is your fastest start: Semgrep selects the right rules for your languages
- Custom rules are Semgrep’s superpower: model rules on patterns specific to your codebase
- Use --sarif + GitHub upload to get persistent findings in the Security tab
- The --error flag in CI makes Semgrep a hard gate: PRs with critical findings can’t merge
- Use nosemgrep inline comments for accepted false positives; don’t suppress whole files
