14.2 Static Analysis of Dependencies

Most organizations apply static application security testing (SAST) to their own code but ignore the third-party dependencies that often make up the large majority of their deployed software (figures of 80% or more are commonly cited). When the ua-parser-js npm package was compromised in October 2021, the malicious versions added an install-time script that downloaded and ran a cryptocurrency miner and, on Windows, a credential stealer. Static analysis aimed at dependencies could plausibly have flagged those patterns: install scripts that fetch and execute remote binaries, suspicious network calls, and unusual file system access. Yet few organizations routinely analyze their dependencies with the same rigor they apply to first-party code.

This section covers applying SAST to third-party dependencies: which tools work, what patterns to detect, and how to manage findings in code you don't own.

SAST Applied to Third-Party Code

Static Application Security Testing (SAST) analyzes source code without executing it, identifying patterns associated with vulnerabilities or malicious behavior.

Why Analyze Dependencies:

Software composition analysis (SCA) checks whether a package version matches known, published vulnerabilities (CVEs and other advisories). SAST goes deeper by examining the code itself:

Approach	What It Finds
SCA/CVE scanning	Known, disclosed vulnerabilities
SAST on dependencies	Undisclosed vulnerabilities, malicious code, quality issues

SAST can find:
- Zero-day vulnerabilities not yet in databases
- Malicious code injected via supply chain attacks
- Backdoors and intentionally hidden functionality
- Quality issues that may have security implications

Challenges Unique to Third-Party Code:

Analyzing code you didn't write presents specific challenges:

Volume: Hundreds of dependencies, millions of lines
Context: You may not know the intended behavior
Ownership: You can't fix issues directly
Updates: Analysis must be repeated as versions change
Trust calibration: Distinguishing intentional patterns from vulnerabilities

Organizations running SAST tools on dependencies for the first time frequently encounter thousands of findings—an overwhelming volume that demands a fundamentally different approach than first-party code analysis. Techniques that work for 50,000 lines of application code don't scale to millions of lines of third-party dependencies.

Tools for Dependency Static Analysis

Several SAST tools work effectively on third-party code.

Semgrep:

Semgrep is a pattern-based static analysis tool particularly suited for dependency scanning.

Strengths: Fast, low false positives, custom rules, many languages
Model: Pattern matching with semantic understanding
Deployment: CLI, CI integration, hosted Semgrep platform (formerly Semgrep App)

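# Scan node_modules for suspicious patterns
# (Semgrep's default ignore rules and .gitignore handling can skip
# node_modules/ and vendor/; check the scan summary, and pass
# --no-git-ignore or adjust .semgrepignore if files are skipped)
semgrep --config p/security-audit --no-git-ignore node_modules/

# Scan with supply chain specific rules (p/supply-chain stands for
# whichever supply-chain ruleset or local rules file you maintain)
semgrep --config p/supply-chain ./vendor/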

Custom Semgrep Rules for Supply Chain:

# Detect obfuscated eval patterns
rules:
  - id: obfuscated-eval
    pattern-either:
      - pattern: eval($FUNC(...))
      - pattern: Function(...)(...)
      - pattern: new Function(...)(...)
      - pattern: (0, eval)(...)
    message: Potentially obfuscated code execution
    severity: WARNING
    languages: [javascript, typescript]

  - id: suspicious-network-call
    patterns:
      - pattern-either:
          - pattern: fetch($URL, ...)
          - pattern: http.get($URL, ...)
          - pattern: https.get($URL, ...)
      # The regex runs against the matched source text and is anchored at
      # the start (hence the leading .*), so only inline URL literals match
      - metavariable-regex:
          metavariable: $URL
          regex: '.*(pastebin|ngrok|raw\.githubusercontent)'
    message: Suspicious external network call
    severity: ERROR
    languages: [javascript, typescript]

CodeQL:

CodeQL provides deep semantic analysis with data flow tracking.

Strengths: Data flow analysis, taint tracking, comprehensive queries
Model: Code as data, SQL-like query language
Deployment: GitHub Advanced Security, CLI

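# Create CodeQL database from dependencies
# (the JavaScript extractor has historically excluded node_modules by
# default; confirm dependency files were extracted before trusting results)
codeql database create deps-db --language=javascript --source-root=node_modules/

# Run security queries (--output is required by database analyze)
codeql database analyze deps-db javascript-security-and-quality.qls \
  --format=sarif-latest --output=deps-results.sarif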

Custom CodeQL Query:

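/**
 * @name Eval of remote data
 * @description Detects data from a modeled remote source flowing
 *              (intra-procedurally) into a call to eval.
 * @kind problem
 * @problem.severity error
 * @id js/custom/eval-of-remote-data
 */
import javascript

from DataFlow::CallNode call, RemoteFlowSource source
where
  // Calls to the global eval function
  call = DataFlow::globalVarRef("eval").getACall() and
  // Local flow only: a remote source node reaching eval's first argument.
  // For inter-procedural tracking, the built-in js/code-injection query
  // is the more complete option.
  source.(DataFlow::SourceNode).flowsTo(call.getArgument(0))
select call, "Eval of data from a remote source: potential RCE."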

SonarQube:

SonarQube provides enterprise-grade static analysis; when dependency source is included in the analysis scope, it examines third-party code as well.

Strengths: Mature platform, quality gates, extensive language support
Model: Rule-based analysis with quality profiles
Deployment: Self-hosted, SonarCloud

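# Include dependencies in analysis. The JavaScript analyzer skips
# **/node_modules/** by default (sonar.javascript.exclusions), so
# override that property as well.
sonar-scanner \
  -Dsonar.projectKey=myproject \
  -Dsonar.sources=src,node_modules \
  -Dsonar.exclusions='node_modules/**/*.min.js' \
  -Dsonar.javascript.exclusions='**/*.min.js'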

Tool Comparison:

Feature	Semgrep	CodeQL	SonarQube
Speed	Fast	Slow (build DB)	Medium
Custom rules	Easy (YAML)	Moderate (QL)	Complex
Data flow	Limited	Excellent	Good
Languages	30+	10+	25+
False positives	Low	Low-Medium	Medium
Cost	Free (OSS)	Free (public), paid (private)	Free tier, commercial

Detecting Vulnerable and Malicious Patterns

Focus analysis on patterns associated with vulnerabilities and supply chain attacks.

High-Value Detection Patterns:

Pattern	Risk	Detection Approach
Obfuscated code	Hiding malicious behavior	Entropy analysis, obfuscation detection
Dynamic code execution	Code injection, RCE	eval, Function constructor, vm.runInContext
Unusual network calls	Data exfiltration, C2	fetch/http to unusual domains
File system access	Data theft, persistence	fs operations outside package scope
Environment access	Credential theft	process.env access patterns
Native code loading	Bypass protections	dlopen, require with computed paths
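The entropy analysis named in the table can be sketched in a few lines of Python. This is a minimal illustration, not any tool's actual algorithm. The threshold (4.5 bits per character), the minimum length (40) and the restriction to simple quoted string literals are all assumptions. The helpers shannon_entropy and high_entropy_literals are hypothetical names.

```python
import math
import re
from collections import Counter

# Assumed threshold: ordinary text scores roughly 4 bits/char or less,
# while base64 or packed payloads tend to score higher.
ENTROPY_THRESHOLD = 4.5
MIN_LENGTH = 40  # short strings give unreliable entropy estimates

# Single- or double-quoted string literals (template literals are ignored)
STRING_LITERAL = re.compile(r"""(["'])((?:\\.|(?!\1).)*)\1""")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_literals(source: str):
    """Yield (line_number, entropy, preview) for suspicious string literals."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in STRING_LITERAL.finditer(line):
            value = match.group(2)
            if len(value) < MIN_LENGTH:
                continue
            h = shannon_entropy(value)
            if h >= ENTROPY_THRESHOLD:
                yield lineno, round(h, 2), value[:20]
```

Entropy alone also flags legitimate data such as embedded hashes, fonts or minified assets, so treat hits as candidates for review rather than verdicts.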

JavaScript/Node.js Patterns:

# Supply chain attack indicators
rules:
  - id: preinstall-script-network
    # Text match on package.json install hooks whose command mentions
    # a network tool or inline code execution
    pattern-regex: '"(preinstall|install|postinstall)"\s*:\s*"[^"]*(curl|wget|node -e|fetch)[^"]*"'
    paths:
      include:
        - package.json
    message: Install script with network activity
    severity: ERROR
    languages: [generic]

  - id: encoded-payload
    pattern-regex: '(eval|Function)\s*\(\s*(atob|Buffer\.from)\s*\('
    message: Execution of decoded payload
    severity: ERROR
    languages: [javascript]
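The same install-hook check can be run without Semgrep by reading each installed package.json directly. The Python sketch below is an assumption-laden stand-in, not a replacement for the rule. The hint regex mirrors the preinstall-script-network rule's heuristic list and is far from exhaustive. risky_install_scripts and scan_node_modules are hypothetical names.

```python
import json
import re
from pathlib import Path

# npm lifecycle hooks that run automatically when a package is installed
INSTALL_HOOKS = ("preinstall", "install", "postinstall")
# Same heuristic indicators as the Semgrep rule; not a complete list
NETWORK_HINTS = re.compile(r"curl|wget|node -e|fetch")


def risky_install_scripts(manifest: dict) -> list[tuple[str, str]]:
    """Return (hook, command) pairs whose command hints at network activity."""
    scripts = manifest.get("scripts") or {}
    return [
        (hook, scripts[hook])
        for hook in INSTALL_HOOKS
        if isinstance(scripts.get(hook), str) and NETWORK_HINTS.search(scripts[hook])
    ]


def scan_node_modules(root: str = "node_modules"):
    """Yield (manifest_path, hook, command) for every package.json under root."""
    for manifest_path in Path(root).rglob("package.json"):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifests are skipped
        if not isinstance(manifest, dict):
            continue
        for hook, command in risky_install_scripts(manifest):
            yield str(manifest_path), hook, command
```

Usage: iterate over scan_node_modules() after npm ci (or npm ci --ignore-scripts, so the hooks never run before review) and print each hit.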

Python Patterns:

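rules:
  - id: setup-py-code-execution
    # setup is a function, not a base class; setup.py runs as ordinary
    # Python during source installs, so flag these calls anywhere in the
    # file, including inside custom install command classes
    pattern-either:
      - pattern: subprocess.$FUNC(...)
      - pattern: os.system(...)
      - pattern: urllib.request.urlopen(...)
      - pattern: exec(...)
    paths:
      include:
        - setup.py
    message: Code execution or network access in setup.py
    severity: WARNING
    languages: [python]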

Managing False Positives

Third-party code generates significant false positives. Effective management is essential.

False Positive Sources:

Intentional patterns: Test code, examples, documentation
Legitimate functionality: Security tools, parsers, interpreters
Context missing: Analyzer lacks understanding of intended behavior
Vendored code: Third-party libraries copied (often minified or generated) into a package

Management Strategies:

1. Scope Targeting:

Don't scan everything; focus on high-risk areas:

# Install production dependencies only (npm 7+), then scan them,
# skipping test and example code inside each package
npm ci --omit=dev
semgrep --config p/security-audit \
  --exclude="test" \
  --exclude="tests" \
  --exclude="examples" \
  --exclude="*.test.js" \
  node_modules/

2. Rule Tuning:

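Disable or downgrade rules with high false-positive rates on dependencies, while keeping supply-chain rules at high priority. In the Semgrep CLI a rule's severity is defined in the rule itself, so either copy noisy rules into your own rule files with a lower severity, or filter at scan time:

# Skip a noisy rule entirely (the rule id shown is a placeholder)
semgrep --config p/security-audit --exclude-rule generic-code-smell node_modules/

# Report only ERROR-severity findings (supply-chain-rules.yml is hypothetical)
semgrep --config supply-chain-rules.yml --severity ERROR node_modules/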

3. Baseline and Diff:

Only alert on new findings:

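# Record a baseline of existing findings. Semgrep's built-in baseline
# (--baseline-commit) compares against git history, which does not cover
# an untracked node_modules/, so save JSON output and compare files instead.
semgrep --config p/security-audit node_modules/ --json --output baseline.json

# Later: rescan, then compare current.json against baseline.json
semgrep --config p/security-audit node_modules/ --json --output current.json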
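Comparing two scans saved with Semgrep's --json output takes only a small script. The sketch below keys each result on rule id, file path and start line, which is an assumption: if a package upgrade shifts code, its old findings resurface as "new". new_findings and finding_key are hypothetical helpers, and the file names are illustrative.

```python
import json


def finding_key(result: dict) -> tuple:
    """Identify a Semgrep JSON result by rule id, file path and start line."""
    return (result["check_id"], result["path"], result["start"]["line"])


def new_findings(baseline: dict, current: dict) -> list[dict]:
    """Results present in `current` but absent from `baseline`."""
    known = {finding_key(r) for r in baseline.get("results", [])}
    return [r for r in current.get("results", []) if finding_key(r) not in known]


def load(path: str) -> dict:
    """Read one saved Semgrep --json report."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Usage: new_findings(load("baseline.json"), load("current.json")) returns only the results to alert on.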

4. Allowlisting:

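Document and allow known patterns, for example by adding path exclusions to your own rule definitions:

# Fragment of a custom rule file (pattern, message, severity and languages omitted)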
rules:
  - id: eval-usage
    paths:
      exclude:
        - node_modules/vm2/*  # VM library - eval is intentional
        - node_modules/jsdom/*  # Browser simulation
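Allowlisting works best when every entry carries its justification. A minimal Python sketch of that idea refuses entries without one and splits findings into remaining and suppressed sets. It assumes findings are dicts shaped like Semgrep JSON results. AllowEntry and apply_allowlist are hypothetical names, and the path globs use fnmatch semantics, where * also matches "/".

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class AllowEntry:
    rule_id: str
    path_glob: str       # fnmatch-style glob; '*' also matches '/'
    justification: str   # required: why this pattern is acceptable
    reviewer: str

    def __post_init__(self):
        # Refuse undocumented suppressions outright
        if not self.justification.strip():
            raise ValueError(f"allowlist entry for {self.rule_id} lacks a justification")

    def matches(self, finding: dict) -> bool:
        return finding["check_id"] == self.rule_id and fnmatch(finding["path"], self.path_glob)


def apply_allowlist(findings, allowlist):
    """Split findings into (remaining, suppressed) using documented entries."""
    remaining, suppressed = [], []
    for f in findings:
        (suppressed if any(e.matches(f) for e in allowlist) else remaining).append(f)
    return remaining, suppressed
```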

5. Finding Prioritization:

Rank findings by exploitability:

Priority	Criteria
P1	Install-time execution (preinstall/install/postinstall scripts)
P2	Runtime code execution with external input
P3	Suspicious patterns in actively used code
P4	Patterns in test/example code
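The priority table can be encoded as a first-match-wins function. This is a sketch under stated assumptions. The boolean attributes on the hypothetical Finding class are assumed to be computed elsewhere, for example from rule metadata and reachability data. Findings that match no row are treated as P4, which the table leaves open.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical, pre-computed attributes of a dependency finding
    install_time: bool            # runs from an npm install hook
    code_execution: bool          # eval, Function, child_process, etc.
    reaches_external_input: bool  # external data can reach the sink
    in_used_code: bool            # reachable from your application
    in_test_or_example: bool      # lives in test/ or examples/


def priority(f: Finding) -> str:
    """Map a finding to P1-P4 following the table; first match wins."""
    if f.install_time:
        return "P1"
    if f.code_execution and f.reaches_external_input:
        return "P2"
    if f.in_test_or_example:
        return "P4"
    if f.in_used_code:
        return "P3"
    return "P4"  # assumption: unused, non-test code gets the lowest priority
```

Because the labels sort lexically, sorted(findings, key=priority) puts P1 items first.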

CI/CD Integration

Integrate dependency SAST into automated pipelines for continuous protection.

Integration Patterns:

Pre-Merge Scanning:

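# GitHub Actions
name: Dependency SAST
on: [pull_request]

permissions:
  contents: read
  security-events: write   # required by upload-sarif

jobs:
  sast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      # Runs the Semgrep CLI directly (returntocorp/semgrep-action is deprecated).
      # p/supply-chain stands for your chosen supply-chain ruleset;
      # --no-git-ignore because node_modules is normally git-ignored.
      - name: Run Semgrep on dependencies
        run: |
          pip install semgrep
          semgrep scan --config p/supply-chain --no-git-ignore \
            --sarif --output semgrep.sarif node_modules/

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif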

Scheduled Deep Scanning:

# Weekly comprehensive scan
name: Weekly Dependency Audit
on:
  schedule:
    - cron: '0 2 * * 0'  # Sundays at 02:00 UTC (GitHub cron schedules use UTC)

jobs:
  deep-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci

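      - name: Create CodeQL database
        # Assumes the CodeQL CLI is already on PATH (for example, unpacked
        # from the CodeQL bundle in an earlier step); it is not on PATH by
        # default on GitHub-hosted runners
        run: |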
          codeql database create deps-db \
            --language=javascript \
            --source-root=node_modules

      - name: Run comprehensive queries
        run: |
          codeql database analyze deps-db \
            javascript-security-extended.qls \
            --format=sarif-latest \
            --output=results.sarif

Gate Configuration:

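Define what blocks merges versus what generates warnings. The policy below uses an illustrative schema rather than any specific tool's format; with the Semgrep CLI alone, the flags --severity ERROR and --error give a comparable blocking check by exiting non-zero on ERROR-level findings:

# Block on high-severity supply chain indicators (illustrative schema)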
policy:
  fail_on:
    - severity: ERROR
      category: supply-chain
  warn_on:
    - severity: WARNING
      category: supply-chain
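Because the policy format above is illustrative, one way to enforce it is a small gate script over Semgrep's JSON output. This minimal sketch assumes each rule declares its category in a metadata block (metadata: {category: supply-chain}), which Semgrep copies into each result's extra.metadata. The names evaluate, classify and main are hypothetical.

```python
import json

# Mirrors the illustrative policy: (severity, category) pairs
FAIL_ON = {("ERROR", "supply-chain")}
WARN_ON = {("WARNING", "supply-chain")}


def classify(result: dict) -> tuple[str, str]:
    """Extract (severity, category) from one Semgrep JSON result."""
    extra = result.get("extra", {})
    return extra.get("severity", ""), extra.get("metadata", {}).get("category", "")


def evaluate(results: list[dict]) -> tuple[int, list[str]]:
    """Return (exit_code, messages); exit_code is 1 if any result must block."""
    messages, exit_code = [], 0
    for r in results:
        key = classify(r)
        where = f'{r["path"]}:{r["start"]["line"]} {r["check_id"]}'
        if key in FAIL_ON:
            messages.append(f"BLOCK {where}")
            exit_code = 1
        elif key in WARN_ON:
            messages.append(f"WARN  {where}")
    return exit_code, messages


def main(path: str) -> int:
    """Print gate messages for a saved Semgrep --json report."""
    with open(path, encoding="utf-8") as fh:
        code, messages = evaluate(json.load(fh).get("results", []))
    for m in messages:
        print(m)
    return code
```

In a CI step, sys.exit(main("semgrep.json")) turns the result into a pass/fail status.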

Triaging Third-Party Findings

Findings in dependencies require different triage than first-party code.

Triage Workflow:

Finding Detected
       │
       ▼
┌─────────────────────┐
│ Is this package     │──No──▶ Lower priority
│ in production path? │
└──────────┬──────────┘
           │ Yes
           ▼
┌─────────────────────┐
│ Is finding in code  │──No──▶ Flag but don't block
│ you actually use?   │
└──────────┬──────────┘
           │ Yes
           ▼
┌─────────────────────┐
│ Is this a known     │──Yes─▶ Document, accept
│ intentional pattern?│
└──────────┬──────────┘
           │ No
           ▼
┌─────────────────────┐
│ Investigate:        │
│ - Read code context │
│ - Check maintainer  │
│ - Search for reports│
└──────────┬──────────┘
           │
           ▼
    Action Decision
    (Report/Replace/Accept)

Action Options:

Finding Type	Recommended Action
Likely malicious	Remove immediately, report to registry
Probable vulnerability	Report upstream, consider fork/patch
Quality issue	Document, monitor, consider alternatives
False positive	Add to allowlist with justification
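The triage flow and the action table can be combined into one small Python function. This is a minimal illustration, not a prescribed tool. triage and Verdict are hypothetical names. The three yes/no inputs are assumed to be answered elsewhere, for example from reachability data. The verdict after investigation is assigned by a human reviewer.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    # Outcome of the manual "Investigate" step, mapped to the action table
    LIKELY_MALICIOUS = "Remove immediately, report to registry"
    PROBABLE_VULNERABILITY = "Report upstream, consider fork/patch"
    QUALITY_ISSUE = "Document, monitor, consider alternatives"
    FALSE_POSITIVE = "Add to allowlist with justification"


def triage(in_production_path: bool, in_used_code: bool,
           known_intentional: bool, verdict: Optional[Verdict] = None) -> str:
    """Walk the triage flow and return the next step for one finding."""
    if not in_production_path:
        return "Lower priority"
    if not in_used_code:
        return "Flag but don't block"
    if known_intentional:
        return "Document, accept"
    if verdict is None:
        # Nothing decided yet: a human must investigate first
        return "Investigate: review code context, check maintainer, search for reports"
    return verdict.value
```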

Documentation Requirements:

# SAST Finding: semgrep/supply-chain/obfuscated-eval

**Package**: example-sandbox@1.2.3 (hypothetical)
**Finding**: Obfuscated eval in lib/parser.js:142
**Assessment**: False positive - intentional VM isolation
**Evidence**: Sandboxing library comparable to vm2; eval is core functionality
**Decision**: Allowlist
**Reviewer**: @security-team
**Date**: 2024-01-15

Performance Optimization

Scanning dependencies is resource-intensive. Optimize for practical execution.

Optimization Techniques:

1. Incremental Scanning:

Only scan changed dependencies:

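# Generate dependency diff (lockfileVersion 2/3: .packages is keyed by
# install path, e.g. "node_modules/lodash"; versions are compared too)
list_deps() { jq -r '.packages | to_entries[] | select(.key != "") | "\(.key) \(.value.version)"' | LC_ALL=C sort; }
git show HEAD~1:package-lock.json | list_deps > deps-old.txt
list_deps < package-lock.json > deps-new.txt
LC_ALL=C comm -13 deps-old.txt deps-new.txt | cut -d' ' -f1 > changed-deps.txt

# Scan only new or changed packages
while read -r pkg_path; do
  semgrep --config p/supply-chain "$pkg_path/"
done < changed-deps.txt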
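The same lockfile comparison can be written in Python, which is easier to test and extend. This sketch assumes lockfileVersion 2 or 3, where the packages map is keyed by install path and the empty key describes the root project. Entries without a version (such as links) are compared as empty strings. lock_packages and changed_packages are hypothetical names.

```python
def lock_packages(lock: dict) -> dict[str, str]:
    """Map install path -> version from a lockfileVersion 2/3 package-lock.json."""
    return {
        path: meta.get("version", "")
        for path, meta in lock.get("packages", {}).items()
        if path  # the "" key describes the root project itself
    }


def changed_packages(old_lock: dict, new_lock: dict) -> list[str]:
    """Install paths that are new or whose version changed, sorted."""
    old, new = lock_packages(old_lock), lock_packages(new_lock)
    return sorted(p for p, v in new.items() if old.get(p) != v)
```

The old lockfile can be loaded with json.loads on the output of git show HEAD~1:package-lock.json, and each returned path passed to Semgrep as a scan target.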

2. Tiered Analysis:

Apply different analysis depth based on risk:

Tier	Packages	Analysis
Deep	Security-sensitive, new packages	Full CodeQL analysis
Standard	Production dependencies	Semgrep supply-chain rules
Light	Dev dependencies, established packages	Critical patterns only
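A minimal way to assign these tiers automatically is sketched below. The inputs are assumptions: whether a package is "security-sensitive" is a human judgment recorded in advance. "Established" is approximated as "not newly added", so any production dependency that is not new or sensitive lands in Standard. Dependency and analysis_tier are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    name: str
    dev_only: bool             # listed only in devDependencies
    security_sensitive: bool   # e.g. crypto, auth, parsing untrusted input
    newly_added: bool          # first appeared in this change


def analysis_tier(dep: Dependency) -> str:
    """Assign Deep / Standard / Light following the tier table."""
    if dep.security_sensitive or dep.newly_added:
        return "Deep"       # full CodeQL analysis
    if dep.dev_only:
        return "Light"      # critical patterns only
    return "Standard"       # Semgrep supply-chain rules
```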

3. Caching:

Cache analysis results:

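# Cache the dependency CodeQL database and downloaded packs, keyed on the
# lockfile; guard the database create step with
#   if: steps.codeql-cache.outputs.cache-hit != 'true'
- uses: actions/cache@v4
  id: codeql-cache
  with:
    path: |
      deps-db
      ~/.codeql
    key: codeql-deps-${{ hashFiles('package-lock.json') }}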

4. Parallel Execution:

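# Parallel scanning with GNU parallel, one job per top-level package
# (-mindepth 1 keeps node_modules itself out of the list; hidden
# directories such as .bin are skipped)
find node_modules -mindepth 1 -maxdepth 1 -type d ! -name '.*' | \
  parallel -j4 'semgrep --config p/supply-chain {}' > results.txt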

Recommendations

For Security Practitioners:

Start with targeted rules. Don't run full SAST rule sets on dependencies. Focus on supply-chain-specific patterns: obfuscation, exfiltration, install-time execution.
Establish baselines. Create baselines for existing dependencies to focus on new findings and changes.
Integrate with SCA. SAST complements SCA (CVE scanning). Use both—SCA for known issues, SAST for unknown patterns.

For Developers:

Scan before adoption. Run SAST on new dependencies before adding them. A few minutes of analysis can prevent major incidents.
Review install scripts. Pay special attention to preinstall/postinstall scripts—these execute with your privileges during npm install.
Question obfuscation. Legitimate packages rarely need obfuscated code. Treat obfuscation as a red flag requiring explanation.

For Organizations:

Automate in CI/CD. Make dependency SAST part of your pipeline. Don't rely on periodic manual scans.
Define escalation paths. When SAST finds suspicious patterns, have clear processes for investigation and response.
Contribute rules upstream. When you develop effective detection patterns, share them with the community. Supply chain security is a collective effort.

Static analysis of dependencies fills a critical gap between CVE scanning and blind trust. While it requires careful tuning and triage processes, SAST can detect malicious code and undisclosed vulnerabilities that other approaches miss. In an era of increasingly sophisticated supply chain attacks, analyzing the code you ship—regardless of who wrote it—is essential due diligence.