14.2 Static Analysis of Dependencies
Most organizations apply static application security testing (SAST) to their own code but ignore the third-party dependencies that often make up 80% or more of their deployed software. When the ua-parser-js npm package was compromised in October 2021, static analysis could have flagged the malicious code patterns: obfuscated functions, suspicious network calls, and unusual file system access. Yet few organizations routinely analyze their dependencies with the same rigor they apply to first-party code.
This section covers applying SAST to third-party dependencies: which tools work, what patterns to detect, and how to manage findings in code you don't own.
SAST Applied to Third-Party Code
Static Application Security Testing (SAST) analyzes source code without executing it, identifying patterns associated with vulnerabilities or malicious behavior.
Why Analyze Dependencies:
Software composition analysis (SCA) checks whether a package version matches a known CVE. SAST goes deeper:
| Approach | What It Finds |
|---|---|
| SCA/CVE scanning | Known, disclosed vulnerabilities |
| SAST on dependencies | Undisclosed vulnerabilities, malicious code, quality issues |
SAST can find:
- Zero-day vulnerabilities not yet in databases
- Malicious code injected via supply chain attacks
- Backdoors and intentionally hidden functionality
- Quality issues that may have security implications
Challenges Unique to Third-Party Code:
Analyzing code you didn't write presents specific challenges:
- Volume: Hundreds of dependencies, millions of lines
- Context: You don't understand the intended behavior
- Ownership: You can't fix issues directly
- Updates: Analysis must be repeated as versions change
- Trust calibration: Distinguishing intentional patterns from vulnerabilities
Organizations running SAST tools on dependencies for the first time frequently encounter thousands of findings—an overwhelming volume that demands a fundamentally different approach than first-party code analysis. Techniques that work for 50,000 lines of application code don't scale to millions of lines of third-party dependencies.
Tools for Dependency Static Analysis
Several SAST tools work effectively on third-party code.
Semgrep:
Semgrep is a pattern-based static analysis tool particularly suited for dependency scanning.
- Strengths: Fast, low false positives, custom rules, many languages
- Model: Pattern matching with semantic understanding
- Deployment: CLI, CI integration, Semgrep AppSec Platform (formerly Semgrep App)
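# Scan node_modules for suspicious patterns
# (node_modules is usually git-ignored and may also be listed in .semgrepignore;
#  if Semgrep reports no scanned files, adjust .semgrepignore and add --no-git-ignore)
semgrep --config p/security-audit node_modules/
# Scan with supply chain specific rules
semgrep --config p/supply-chain ./vendor/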
Custom Semgrep Rules for Supply Chain:
# Detect obfuscated eval patterns
rules:
  - id: obfuscated-eval
    pattern-either:
      - pattern: eval($FUNC(...))
      - pattern: Function(...)(...)
      - pattern: (0, eval)(...)
    message: Potentially obfuscated code execution
    severity: WARNING
    languages: [javascript, typescript]
  - id: suspicious-network-call
    # "patterns" ANDs its children: one of the calls must match AND $URL must match the regex
    patterns:
      - pattern-either:
          - pattern: fetch("$URL", ...)
          - pattern: http.get("$URL", ...)
      - metavariable-regex:
          metavariable: $URL
          regex: '.*(pastebin|ngrok|raw\.githubusercontent).*'
    message: Suspicious external network call
    severity: ERROR
    languages: [javascript, typescript]
CodeQL:
CodeQL provides deep semantic analysis with data flow tracking.
- Strengths: Data flow analysis, taint tracking, comprehensive queries
- Model: Code as data, SQL-like query language
- Deployment: GitHub Advanced Security, CLI
# Create CodeQL database from dependencies
codeql database create deps-db --language=javascript --source-root=node_modules/
# Run security queries (--output is required by "database analyze")
codeql database analyze deps-db javascript-security-and-quality.qls --format=sarif-latest --output=results.sarif
Custom CodeQL Query:
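/**
 * @name Eval of network-received data
 * @description Detects suspicious dynamic code execution on data from an HTTP response.
 * @kind problem
 * @problem.severity error
 * @id js/custom/eval-of-network-data
 */
import javascript

// ClientRequest is the standard library's model of HTTP client calls (fetch, http.get, ...).
// Only local flow is followed here; a global taint-tracking configuration would find more.
from ClientRequest request, DataFlow::CallNode evalCall
where
  evalCall = DataFlow::globalVarRef("eval").getACall() and
  DataFlow::localFlowStep*(request.getAResponseDataNode(), evalCall.getArgument(0))
select evalCall, "Eval of network-received data - potential RCE"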
SonarQube:
SonarQube provides enterprise-grade static analysis with dependency scanning.
- Strengths: Mature platform, quality gates, extensive language support
- Model: Rule-based analysis with quality profiles
- Deployment: Self-hosted, SonarCloud
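# Include dependencies in analysis
# (the JavaScript analyzer excludes node_modules by default through
#  sonar.javascript.exclusions, so that property is overridden here)
sonar-scanner \
  -Dsonar.projectKey=myproject \
  -Dsonar.sources=src,node_modules \
  -Dsonar.javascript.exclusions='**/*.min.js' \
  -Dsonar.exclusions='node_modules/**/*.min.js'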
Tool Comparison:
| Feature | Semgrep | CodeQL | SonarQube |
|---|---|---|---|
| Speed | Fast | Slow (build DB) | Medium |
| Custom rules | Easy (YAML) | Moderate (QL) | Complex |
| Data flow | Limited | Excellent | Good |
| Languages | 30+ | 10+ | 25+ |
| False positives | Low | Low-Medium | Medium |
| Cost | Free (OSS) | Free (public), paid (private) | Free tier, commercial |
Detecting Vulnerable and Malicious Patterns
Focus analysis on patterns associated with vulnerabilities and supply chain attacks.
High-Value Detection Patterns:
| Pattern | Risk | Detection Approach |
|---|---|---|
| Obfuscated code | Hiding malicious behavior | Entropy analysis, deobfuscation detection |
| Dynamic code execution | Code injection, RCE | eval, Function constructor, vm.runInContext |
| Unusual network calls | Data exfiltration, C2 | fetch/http to unusual domains |
| File system access | Data theft, persistence | fs operations outside package scope |
| Environment access | Credential theft | process.env access patterns |
| Native code loading | Bypass protections | dlopen, require with computed paths |
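The entropy analysis mentioned in the first row can be sketched in a few lines. This is a minimal illustration, not a production detector: the function names, the 40-character minimum, and the 4.5 bits-per-character threshold are assumptions chosen for the example, and real packages (minified bundles, embedded certificates) will need tuning.

```python
import math
import re
from collections import Counter

# Rough heuristic: quoted string literals of 40+ characters on a single line.
STRING_LITERAL = re.compile(r"""(["'])((?:(?!\1)[^\\\n]|\\.){40,})\1""")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_literals(source: str, threshold: float = 4.5) -> list[str]:
    """Return long string literals whose entropy exceeds the (assumed) threshold."""
    return [
        match.group(2)
        for match in STRING_LITERAL.finditer(source)
        if shannon_entropy(match.group(2)) > threshold
    ]
```

Encoded or packed payloads tend to use many distinct characters evenly, which pushes entropy up; ordinary messages and identifiers usually score lower. A hit is a reason to look at the file, not proof of malice.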
JavaScript/Node.js Patterns:
# Supply chain attack indicators
rules:
  - id: preinstall-script-network
    patterns:
      - pattern-inside: |
          "scripts": {...}
      - pattern: |
          "preinstall": "$SCRIPT"
      - metavariable-regex:
          metavariable: $SCRIPT
          regex: '.*(curl|wget|node -e|fetch).*'
    paths:
      include:
        - package.json
    message: Preinstall script with network activity
    severity: ERROR
    languages: [json]
  - id: encoded-payload
    pattern-regex: '(eval|Function)\s*\(\s*(atob|Buffer\.from)\s*\('
    message: Execution of decoded payload
    severity: ERROR
    languages: [javascript]
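Install-time hooks can also be inventoried without a SAST tool. The sketch below walks a directory, reads every package.json, and lists the preinstall, install, and postinstall commands, marking those that match the same indicator list as the rule above. The function name and the returned tuple layout are assumptions made for this example.

```python
import json
import re
from pathlib import Path

# Lifecycle hooks that npm runs automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")
# Same indicators as the preinstall rule above; extend as needed.
NETWORK_HINT = re.compile(r"curl|wget|node -e|fetch")


def find_install_scripts(root: Path) -> list[tuple[str, str, str, bool]]:
    """Return (package dir, hook, command, looks_networked) for each install hook."""
    findings = []
    for manifest in sorted(root.rglob("package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest; skip it
        scripts = data.get("scripts") if isinstance(data, dict) else None
        if not isinstance(scripts, dict):
            continue
        for hook in INSTALL_HOOKS:
            command = scripts.get(hook)
            if isinstance(command, str):
                networked = bool(NETWORK_HINT.search(command))
                findings.append((str(manifest.parent), hook, command, networked))
    return findings
```

Calling `find_install_scripts(Path("node_modules"))` gives a short list to review by hand; most packages define no install hooks at all, so any entry deserves a look.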
Python Patterns:
rules:
  - id: setup-py-code-execution
    patterns:
      # Custom command classes, e.g. subclasses of setuptools' install command
      - pattern-inside: |
          class $CLASS($BASE):
              ...
      - pattern-either:
          - pattern: subprocess.call(...)
          - pattern: os.system(...)
          - pattern: urllib.request.urlopen(...)
    paths:
      include:
        - setup.py
    message: Code execution in setup.py
    severity: WARNING
    languages: [python]
Managing False Positives
Third-party code generates significant false positives. Effective management is essential.
False Positive Sources:
- Intentional patterns: Test code, examples, documentation
- Legitimate functionality: Security tools, parsers, interpreters
- Context missing: Analyzer lacks understanding of intended behavior
- Vendored code: Dependencies of dependencies
Management Strategies:
1. Scope Targeting:
Don't scan everything—focus on high-risk areas:
# Scan dependencies, skipping their test and example code
semgrep --config p/security-audit \
  --include="node_modules/*" \
  --exclude="node_modules/*/test/*" \
  --exclude="node_modules/*/examples/*" \
  --exclude="node_modules/*/*.test.js"
2. Rule Tuning:
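Disable or downgrade rules with high false positive rates for dependencies:
# Illustrative severity overrides, not a native Semgrep file format:
# apply them by editing a local copy of the rules or in a wrapper script
rules:
  - id: generic-code-smell
    severity: INFO   # Downgrade, don't alert
  - id: supply-chain-specific
    severity: ERROR  # Keep high priority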
3. Baseline and Diff:
Only alert on new findings:
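# Create baseline of existing findings
semgrep --config p/security-audit node_modules/ --json > baseline.json
# Later: rescan and print only findings that are absent from the baseline
# (Semgrep's built-in --baseline-commit compares git commits, which does not cover an untracked node_modules)
semgrep --config p/security-audit node_modules/ --json > current.json
comm -13 <(jq -r '.results[] | "\(.check_id) \(.path)"' baseline.json | sort -u) \
         <(jq -r '.results[] | "\(.check_id) \(.path)"' current.json | sort -u)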
4. Allowlisting:
Document and allow known patterns:
# semgrep-allowlist.yml
rules:
  - id: eval-usage
    paths:
      exclude:
        - node_modules/vm2/*    # VM library - eval is intentional
        - node_modules/jsdom/*  # Browser simulation
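When findings are post-processed outside the scanner, the same allowlist idea can be enforced with a mandatory justification per entry. The following is a sketch under assumptions: `AllowlistEntry` and `split_findings` are hypothetical names, and each finding is taken to be a dict with `check_id` and `path` keys as found in Semgrep's JSON `results` array.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class AllowlistEntry:
    rule_id: str
    path_glob: str
    justification: str  # entries with an empty justification are ignored


def split_findings(findings: list, allowlist: list) -> tuple:
    """Split findings into (remaining, suppressed) using the allowlist."""
    remaining, suppressed = [], []
    for finding in findings:
        allowed = any(
            entry.justification.strip()
            and finding["check_id"] == entry.rule_id
            and fnmatch(finding["path"], entry.path_glob)
            for entry in allowlist
        )
        (suppressed if allowed else remaining).append(finding)
    return remaining, suppressed
```

Keeping the suppressed list, rather than discarding it, makes it possible to review periodically whether each exception is still justified.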
5. Finding Prioritization:
Rank findings by exploitability:
| Priority | Criteria |
|---|---|
| P1 | Install-time execution (preinstall scripts) |
| P2 | Runtime code execution with external input |
| P3 | Suspicious patterns in actively used code |
| P4 | Patterns in test/example code |
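The ranking above can be encoded as a small first-match-wins function so that every finding gets a priority the same way. This is only a sketch: the `Finding` fields are hypothetical attributes a triage script would have to fill in, and two choices are assumptions not stated in the table (test or example code is ranked P4 even when it executes code, and unreachable non-test code also falls to P4).

```python
from dataclasses import dataclass


@dataclass
class Finding:
    runs_at_install: bool     # lives in a preinstall/install/postinstall script
    executes_code: bool       # eval, Function constructor, and similar
    external_input: bool      # the executed data originates outside the package
    in_test_or_example: bool  # path is under a test or examples directory
    reachable: bool           # the application actually uses the affected code


def priority(finding: Finding) -> str:
    """Map a finding to P1-P4; the first matching rule wins."""
    if finding.runs_at_install:
        return "P1"
    if finding.in_test_or_example:
        return "P4"
    if finding.executes_code and finding.external_input:
        return "P2"
    if finding.reachable:
        return "P3"
    return "P4"  # assumption: unreachable, non-test code is lowest priority
```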
CI/CD Integration
Integrate dependency SAST into automated pipelines for continuous protection.
Integration Patterns:
Pre-Merge Scanning:
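# GitHub Actions
name: Dependency SAST
on: [pull_request]
permissions:
  contents: read
  security-events: write  # needed to upload SARIF results
jobs:
  sast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci --ignore-scripts  # don't run install scripts before they are scanned
      - name: Run Semgrep on dependencies
        run: |
          python3 -m pip install semgrep
          semgrep scan --config p/supply-chain --no-git-ignore --sarif --output semgrep.sarif node_modules/
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif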
Scheduled Deep Scanning:
# Weekly comprehensive scan
name: Weekly Dependency Audit
on:
  schedule:
    - cron: '0 2 * * 0'  # Sunday 2 AM
jobs:
  deep-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
      - name: Create CodeQL database
        run: |
          codeql database create deps-db \
            --language=javascript \
            --source-root=node_modules
      - name: Run comprehensive queries
        run: |
          codeql database analyze deps-db \
            javascript-security-extended.qls \
            --format=sarif-latest \
            --output=results.sarif
Gate Configuration:
Define what blocks merges versus what generates warnings:
# Block on high-severity supply chain indicators
policy:
  fail_on:
    - severity: ERROR
      category: supply-chain
  warn_on:
    - severity: WARNING
      category: supply-chain
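A policy like this needs something to enforce it. One possible shape, sketched below, is a small script that reads a Semgrep JSON report and returns a non-zero exit code when a blocking finding is present. The function names are hypothetical, and the code assumes that each result carries its severity in `extra.severity` and that rule authors set a `category` key in the rule metadata (exposed as `extra.metadata.category`); rules without that metadata will match nothing.

```python
import json


def matches(finding: dict, rules: list) -> bool:
    """True if the finding's severity and category match any rule in the list."""
    extra = finding.get("extra", {})
    category = extra.get("metadata", {}).get("category")
    return any(
        extra.get("severity") == rule["severity"] and category == rule["category"]
        for rule in rules
    )


def evaluate(results: list, policy: dict) -> tuple:
    """Return (blocking, warning) findings; a blocking finding is never also a warning."""
    fail_on = policy.get("fail_on", [])
    warn_on = policy.get("warn_on", [])
    blocking = [f for f in results if matches(f, fail_on)]
    warning = [f for f in results if not matches(f, fail_on) and matches(f, warn_on)]
    return blocking, warning


def gate_exit_code(report_path: str, policy: dict) -> int:
    """Load a Semgrep JSON report and return 1 if any finding blocks the merge."""
    with open(report_path, encoding="utf-8") as handle:
        report = json.load(handle)
    blocking, _ = evaluate(report.get("results", []), policy)
    return 1 if blocking else 0
```

In a pipeline step, the policy would be loaded from the file above (or written as a dict) and the step would end with `sys.exit(gate_exit_code("semgrep.json", policy))`.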
Triaging Third-Party Findings
Findings in dependencies require different triage than first-party code.
Triage Workflow:
Finding Detected
│
▼
┌─────────────────────┐
│ Is this package │──No──▶ Lower priority
│ in production path? │
└──────────┬──────────┘
│ Yes
▼
┌─────────────────────┐
│ Is finding in code │──No──▶ Flag but don't block
│ you actually use? │
└──────────┬──────────┘
│ Yes
▼
┌─────────────────────┐
│ Is this a known │──Yes─▶ Document, accept
│ intentional pattern?│
└──────────┬──────────┘
│ No
▼
┌─────────────────────┐
│ Investigate: │
│ - Review code context│
│ - Check maintainer │
│ - Search for reports│
└──────────┬──────────┘
│
▼
Action Decision
(Report/Replace/Accept)
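The three questions in the workflow can be written as a function, which is handy when triage decisions are recorded by a script or a ticket template. The names below (`Outcome`, `triage`, and the three boolean parameters) are assumptions for illustration; answering the questions still takes human judgment.

```python
from enum import Enum


class Outcome(Enum):
    LOWER_PRIORITY = "lower priority"
    FLAG_ONLY = "flag but don't block"
    DOCUMENT_AND_ACCEPT = "document, accept"
    INVESTIGATE = "investigate"


def triage(in_production_path: bool, code_is_used: bool, known_intentional: bool) -> Outcome:
    """Walk the three questions of the triage workflow in order."""
    if not in_production_path:
        return Outcome.LOWER_PRIORITY
    if not code_is_used:
        return Outcome.FLAG_ONLY
    if known_intentional:
        return Outcome.DOCUMENT_AND_ACCEPT
    return Outcome.INVESTIGATE
```

Only the `INVESTIGATE` outcome leads to the final action decision (report, replace, or accept); the other three end the workflow early.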
Action Options:
| Finding Type | Recommended Action |
|---|---|
| Likely malicious | Remove immediately, report to registry |
| Probable vulnerability | Report upstream, consider fork/patch |
| Quality issue | Document, monitor, consider alternatives |
| False positive | Add to allowlist with justification |
Documentation Requirements:
# SAST Finding: semgrep/supply-chain/obfuscated-eval
**Package**: vm2@x.y.z
**Finding**: Obfuscated eval in lib/parser.js:142
**Assessment**: False positive - intentional VM isolation
**Evidence**: This is the vm2 library; eval is core functionality
**Decision**: Allowlist
**Reviewer**: @security-team
**Date**: 2024-01-15
Performance Optimization
Scanning dependencies is resource-intensive. Optimize for practical execution.
Optimization Techniques:
1. Incremental Scanning:
Only scan changed dependencies:
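# List lockfile entries as "path@version" (lockfileVersion 2+ keeps them under .packages)
list_pkgs() { jq -r '.packages | to_entries[] | select(.key != "") | "\(.key)@\(.value.version)"' | sort; }
# Keep entries that are new or whose version changed since the previous commit
comm -13 <(git show HEAD~1:package-lock.json | list_pkgs) \
         <(list_pkgs < package-lock.json) > changed-deps.txt
# Scan only changed packages (each entry's path already starts with node_modules/)
while IFS= read -r entry; do
  semgrep --config p/supply-chain "${entry%@*}/"
done < changed-deps.txt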
2. Tiered Analysis:
Apply different analysis depth based on risk:
| Tier | Packages | Analysis |
|---|---|---|
| Deep | Security-sensitive, new packages | Full CodeQL analysis |
| Standard | Production dependencies | Semgrep supply-chain rules |
| Light | Dev dependencies, established packages | Critical patterns only |
3. Caching:
Cache analysis results:
# Cache the CodeQL database while the lockfile is unchanged
- uses: actions/cache@v4
  with:
    path: deps-db
    key: codeql-deps-${{ hashFiles('package-lock.json') }}
4. Parallel Execution:
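# Parallel scanning with GNU parallel (one job per top-level package directory;
# -mindepth 1 keeps node_modules itself out of the job list)
find node_modules -mindepth 1 -maxdepth 1 -type d | \
  parallel -j4 'semgrep --config p/supply-chain {}' > results.txt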
Recommendations
For Security Practitioners:
- Start with targeted rules. Don't run full SAST rule sets on dependencies. Focus on supply chain-specific patterns: obfuscation, exfiltration, install-time execution.
- Establish baselines. Create baselines for existing dependencies to focus on new findings and changes.
- Integrate with SCA. SAST complements SCA (CVE scanning). Use both: SCA for known issues, SAST for unknown patterns.
For Developers:
- Scan before adoption. Run SAST on new dependencies before adding them. A few minutes of analysis can prevent major incidents.
- Review install scripts. Pay special attention to preinstall/postinstall scripts, which execute with your privileges during npm install.
- Question obfuscation. Legitimate packages rarely need obfuscated code. Treat obfuscation as a red flag requiring explanation.
For Organizations:
- Automate in CI/CD. Make dependency SAST part of your pipeline. Don't rely on periodic manual scans.
- Define escalation paths. When SAST finds suspicious patterns, have clear processes for investigation and response.
- Contribute rules upstream. When you develop effective detection patterns, share them with the community. Supply chain security is a collective effort.
Static analysis of dependencies fills a critical gap between CVE scanning and blind trust. While it requires careful tuning and triage processes, SAST can detect malicious code and undisclosed vulnerabilities that other approaches miss. In an era of increasingly sophisticated supply chain attacks, analyzing the code you ship—regardless of who wrote it—is essential due diligence.