14.2 Static Analysis of Dependencies

Most organizations apply static analysis security testing (SAST) to their own code but ignore the third-party dependencies that often comprise 80% or more of their deployed software. When the ua-parser-js npm package was compromised in October 2021, static analysis could have detected the malicious code patterns—obfuscated functions, suspicious network calls, and unusual file system access. Yet few organizations routinely analyze their dependencies with the same rigor they apply to first-party code.

This section covers applying SAST to third-party dependencies: which tools work, what patterns to detect, and how to manage findings in code you don't own.

SAST Applied to Third-Party Code

Static Application Security Testing (SAST) analyzes source code without executing it, identifying patterns associated with vulnerabilities or malicious behavior.

Why Analyze Dependencies:

Software composition analysis (SCA) checks whether a package version matches known CVEs. SAST goes deeper by inspecting the code itself:

| Approach | What It Finds |
|---|---|
| SCA/CVE scanning | Known, disclosed vulnerabilities |
| SAST on dependencies | Undisclosed vulnerabilities, malicious code, quality issues |

SAST can find:

  • Zero-day vulnerabilities not yet in databases
  • Malicious code injected via supply chain attacks
  • Backdoors and intentionally hidden functionality
  • Quality issues that may have security implications

Challenges Unique to Third-Party Code:

Analyzing code you didn't write presents specific challenges:

  • Volume: Hundreds of dependencies, millions of lines
  • Context: You don't understand the intended behavior
  • Ownership: You can't fix issues directly
  • Updates: Analysis must be repeated as versions change
  • Trust calibration: Distinguishing intentional patterns from vulnerabilities

Organizations running SAST tools on dependencies for the first time frequently encounter thousands of findings—an overwhelming volume that demands a fundamentally different approach than first-party code analysis. Techniques that work for 50,000 lines of application code don't scale to millions of lines of third-party dependencies.

Tools for Dependency Static Analysis

Several SAST tools work effectively on third-party code.

Semgrep:

Semgrep is a pattern-based static analysis tool particularly suited for dependency scanning.

  • Strengths: Fast, low false positives, custom rules, many languages
  • Model: Pattern matching with semantic understanding
  • Deployment: CLI, CI integration, Semgrep App
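# Scan node_modules for suspicious patterns.
# Semgrep's default ignore patterns skip node_modules/ and Git-ignored files,
# so a project-level .semgrepignore and --no-git-ignore may be needed.
semgrep --config p/security-audit --no-git-ignore node_modules/

# Scan with supply chain specific rules
semgrep --config p/supply-chain --no-git-ignore ./vendor/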

Custom Semgrep Rules for Supply Chain:

# Detect obfuscated eval patterns
rules:
  - id: obfuscated-eval
    pattern-either:
      - pattern: eval($FUNC(...))
      - pattern: Function(...)(...)
      - pattern: (0, eval)(...)
    message: Potentially obfuscated code execution
    severity: WARNING
    languages: [javascript, typescript]

  - id: suspicious-network-call
    patterns:
      - pattern-either:
          - pattern: fetch("$URL", ...)
          - pattern: http.get("$URL", ...)
      # metavariable-regex is anchored at the start, hence the leading .*
      - metavariable-regex:
          metavariable: $URL
          regex: '.*(pastebin|ngrok|raw\.githubusercontent)'
    message: Suspicious external network call
    severity: ERROR
    languages: [javascript, typescript]

CodeQL:

CodeQL provides deep semantic analysis with data flow tracking.

  • Strengths: Data flow analysis, taint tracking, comprehensive queries
  • Model: Code as data, SQL-like query language
  • Deployment: GitHub Advanced Security, CLI
# Create CodeQL database from dependencies
codeql database create deps-db --language=javascript --source-root=node_modules/

# Run security queries (SARIF formats require --output)
codeql database analyze deps-db javascript-security-and-quality.qls \
  --format=sarif-latest --output=results.sarif

Custom CodeQL Query:

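/**
 * @name Eval of HTTP response data
 * @description Detect suspicious dynamic code execution: eval() calls whose
 *              argument flows locally from an outgoing request's response.
 * @kind problem
 * @problem.severity error
 * @id js/deps/eval-of-response-data
 */
import javascript

// Sketch: assumes the standard library's ClientRequest model covers the HTTP
// client in use, and follows only local (intra-procedural) data flow.
// Check the class and predicate names against your CodeQL library version.
from DataFlow::CallNode call, ClientRequest request
where
  call = DataFlow::globalVarRef("eval").getACall() and
  request.getAResponseDataNode().getASuccessor*() = call.getArgument(0)
select call, "Eval of network-received data - potential RCE"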

SonarQube:

SonarQube provides enterprise-grade static analysis with dependency scanning.

  • Strengths: Mature platform, quality gates, extensive language support
  • Model: Rule-based analysis with quality profiles
  • Deployment: Self-hosted, SonarCloud
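# Include dependencies in analysis.
# The JavaScript analyzer excludes **/node_modules/** by default through
# sonar.javascript.exclusions, so that property is overridden here as well.
sonar-scanner \
  -Dsonar.projectKey=myproject \
  -Dsonar.sources=src,node_modules \
  -Dsonar.javascript.exclusions="**/*.min.js" \
  -Dsonar.exclusions="node_modules/**/*.min.js"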

Tool Comparison:

| Feature | Semgrep | CodeQL | SonarQube |
|---|---|---|---|
| Speed | Fast | Slow (build DB) | Medium |
| Custom rules | Easy (YAML) | Moderate (QL) | Complex |
| Data flow | Limited | Excellent | Good |
| Languages | 30+ | 10+ | 25+ |
| False positives | Low | Low-Medium | Medium |
| Cost | Free (OSS) | Free (public), paid (private) | Free tier, commercial |

Detecting Vulnerable and Malicious Patterns

Focus analysis on patterns associated with vulnerabilities and supply chain attacks.

High-Value Detection Patterns:

| Pattern | Risk | Detection Approach |
|---|---|---|
| Obfuscated code | Hiding malicious behavior | Entropy analysis, deobfuscation detection |
| Dynamic code execution | Code injection, RCE | eval, Function constructor, vm.runInContext |
| Unusual network calls | Data exfiltration, C2 | fetch/http to unusual domains |
| File system access | Data theft, persistence | fs operations outside package scope |
| Environment access | Credential theft | process.env access patterns |
| Native code loading | Bypass protections | dlopen, require with computed paths |
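
Entropy analysis, listed above for obfuscated code, can be approximated in a few lines. The following is a minimal sketch, not a calibrated detector: the function names, the 4.5 bits-per-character threshold, and the 80-character minimum line length are all illustrative assumptions that would need tuning against real packages.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def flag_high_entropy_lines(source: str, threshold: float = 4.5, min_length: int = 80):
    """Yield (line_number, entropy) for long lines whose entropy exceeds threshold.

    Both threshold and min_length are assumed values, chosen for illustration.
    """
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if len(stripped) >= min_length:
            entropy = shannon_entropy(stripped)
            if entropy > threshold:
                yield number, entropy
```

Minified bundles and embedded binary assets also score high, so a hit is a prompt for review rather than proof of obfuscation.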

JavaScript/Node.js Patterns:

# Supply chain attack indicators
rules:
  - id: preinstall-script-network
    patterns:
      - pattern: |
          {"scripts": {"preinstall": "$SCRIPT", ...}}
      - metavariable-regex:
          metavariable: $SCRIPT
          regex: '.*(curl|wget|node -e|fetch).*'
    paths:
      include:
        - package.json
    message: Preinstall script with network activity
    severity: ERROR
    languages: [json]

  - id: encoded-payload
    pattern-regex: '(eval|Function)\s*\(\s*(atob|Buffer\.from)\s*\('
    message: Execution of decoded payload
    severity: ERROR
    languages: [javascript]
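
Install-time hooks can also be enumerated directly from package manifests, independent of any SAST tool. The sketch below assumes a conventional node_modules layout; `find_install_hooks` and `INSTALL_HOOKS` are hypothetical names introduced for illustration.

```python
import json
from pathlib import Path

# npm lifecycle scripts that run automatically during installation
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_hooks(node_modules: Path) -> dict:
    """Return {package_dir: {hook: command}} for packages declaring install-time scripts."""
    hits = {}
    for manifest in node_modules.rglob("package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest: skip instead of aborting
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts")
        if not isinstance(scripts, dict):
            continue
        hooks = {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}
        if hooks:
            hits[str(manifest.parent)] = hooks
    return hits
```

Calling `find_install_hooks(Path("node_modules"))` yields a short list that can be reviewed by hand, which is usually far smaller than the full dependency tree.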

Python Patterns:

rules:
  - id: setup-py-code-execution
    patterns:
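      # setup() is a function, not a base class; install-time code usually
      # lives in custom command classes (subclasses of install, develop, ...)
      - pattern-inside: |
          class $CLASS($BASE):
            ...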
      - pattern-either:
          - pattern: subprocess.call(...)
          - pattern: os.system(...)
          - pattern: urllib.request.urlopen(...)
    paths:
      include:
        - setup.py
    message: Code execution in setup.py
    severity: WARNING
    languages: [python]

Managing False Positives

Third-party code generates significant false positives. Effective management is essential.

False Positive Sources:

  • Intentional patterns: Test code, examples, documentation
  • Legitimate functionality: Security tools, parsers, interpreters
  • Context missing: Analyzer lacks understanding of intended behavior
  • Vendored code: Dependencies of dependencies

Management Strategies:

1. Scope Targeting:

Don't scan everything—focus on high-risk areas:

# Install production dependencies only (without running install scripts),
# then scan while skipping tests and examples
npm ci --omit=dev --ignore-scripts
semgrep --config p/security-audit \
  --exclude="test" \
  --exclude="examples" \
  --exclude="*.test.js" \
  node_modules/

2. Rule Tuning:

Disable rules with high false positive rates for dependencies:

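# Semgrep rule files cannot override the severity of registry rules, so tune
# from the command line: drop a noisy rule and report only ERROR findings.
# (generic-code-smell is a placeholder rule ID.)
semgrep --config p/security-audit \
  --exclude-rule generic-code-smell \
  --severity ERROR \
  node_modules/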

3. Baseline and Diff:

Only alert on new findings:

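# Create baseline of existing findings
semgrep --config p/security-audit node_modules/ --json > baseline.json

# Rescan later and print only findings absent from the baseline.
# Semgrep's own baseline option (--baseline-commit) compares Git commits,
# so for an untracked node_modules/ the JSON reports are compared instead.
semgrep --config p/security-audit node_modules/ --json > current.json
key='.results[] | "\(.check_id)\t\(.path)\t\(.start.line)"'
comm -13 <(jq -r "$key" baseline.json | sort) <(jq -r "$key" current.json | sort)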

4. Allowlisting:

Document and allow known patterns:

# semgrep-allowlist.yml
rules:
  - id: eval-usage
    pattern: eval(...)
    message: Use of eval
    severity: WARNING
    languages: [javascript, typescript]
    paths:
      exclude:
        - node_modules/vm2/*  # VM library - eval is intentional
        - node_modules/jsdom/*  # Browser simulation

5. Finding Prioritization:

Rank findings by exploitability:

| Priority | Criteria |
|---|---|
| P1 | Install-time execution (preinstall scripts) |
| P2 | Runtime code execution with external input |
| P3 | Suspicious patterns in actively used code |
| P4 | Patterns in test/example code |
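
The ranking above can be encoded as a small function so that findings are sorted consistently. This is a sketch under stated assumptions: the `Finding` fields are hypothetical and would have to be filled in by your own tooling, test and example code is recognized purely by path, and any finding outside those paths is treated as "actively used".

```python
from dataclasses import dataclass

# Path fragments assumed to indicate test or example code
TEST_MARKERS = ("/test/", "/tests/", "/__tests__/", "/example/", "/examples/")


@dataclass
class Finding:
    rule_id: str
    path: str
    install_time: bool = False    # located in a preinstall/install/postinstall hook
    external_input: bool = False  # executes code derived from external input


def priority(finding: Finding) -> str:
    """Map a finding to P1-P4 following the criteria table."""
    if finding.install_time:
        return "P1"
    normalized = "/" + finding.path.replace("\\", "/")
    if any(marker in normalized for marker in TEST_MARKERS):
        return "P4"
    if finding.external_input:
        return "P2"
    return "P3"
```

Checking the test-path rule before the external-input rule is a design choice: it keeps noisy fixtures out of P2, at the cost of under-ranking a real issue that happens to live in an examples directory.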

CI/CD Integration

Integrate dependency SAST into automated pipelines for continuous protection.

Integration Patterns:

Pre-Merge Scanning:

# GitHub Actions
name: Dependency SAST
on: [pull_request]

jobs:
  sast:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # required to upload SARIF
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci --ignore-scripts  # keep install hooks from running unscanned

      - name: Run Semgrep on dependencies
        run: |
          pipx install semgrep
          semgrep scan --config p/supply-chain --no-git-ignore \
            --sarif --output=semgrep.sarif node_modules/

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif

Scheduled Deep Scanning:

# Weekly comprehensive scan
name: Weekly Dependency Audit
on:
  schedule:
    - cron: '0 2 * * 0'  # Sunday 2 AM

jobs:
  deep-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

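      - name: Install dependencies
        run: npm ci --ignore-scripts  # keep install hooks from running unscanned

      # Assumes the CodeQL CLI is already on PATH (for example, unpacked from
      # the CodeQL bundle in an earlier step)
      - name: Create CodeQL database
        run: |
          codeql database create deps-db \
            --language=javascript \
            --source-root=node_modules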

      - name: Run comprehensive queries
        run: |
          codeql database analyze deps-db \
            javascript-security-extended.qls \
            --format=sarif-latest \
            --output=results.sarif

Gate Configuration:

Define what blocks merges versus what generates warnings:

# Block on high-severity supply chain indicators
policy:
  fail_on:
    - severity: ERROR
      category: supply-chain
  warn_on:
    - severity: WARNING
      category: supply-chain
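
The policy above is tool-neutral, so something has to apply it to scanner output. One possible sketch reads a Semgrep JSON report (as produced with `--json`) and splits findings into blocking and warning lists. The `POLICY` structure and `evaluate_gate` name are hypothetical, and the `supply-chain` value is assumed to appear in each rule's `metadata.category` field, which depends on how your rules are written.

```python
import json

# Hypothetical policy mirroring the gate configuration; "supply-chain" is an
# assumed value of the rule metadata field "category".
POLICY = {"category": "supply-chain", "fail_on": {"ERROR"}, "warn_on": {"WARNING"}}


def evaluate_gate(report: dict, policy: dict = POLICY):
    """Return (blocking, warnings) lists of findings selected by the policy."""
    blocking, warnings = [], []
    for result in report.get("results", []):
        extra = result.get("extra", {})
        category = extra.get("metadata", {}).get("category")
        if category != policy["category"]:
            continue  # finding belongs to a category the gate does not cover
        severity = extra.get("severity")
        if severity in policy["fail_on"]:
            blocking.append(result)
        elif severity in policy["warn_on"]:
            warnings.append(result)
    return blocking, warnings


def load_report(path: str) -> dict:
    """Load a JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

In a pipeline step, `blocking, warnings = evaluate_gate(load_report("semgrep.json"))` followed by a non-zero exit when `blocking` is non-empty would fail the merge while leaving warnings visible in the log.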

Triaging Third-Party Findings

Findings in dependencies require different triage than first-party code.

Triage Workflow:

Finding Detected
            │
            ▼
┌────────────────────────┐
│ Is this package        │──No──▶ Lower priority
│ in production path?    │
└───────────┬────────────┘
            │ Yes
            ▼
┌────────────────────────┐
│ Is finding in code     │──No──▶ Flag but don't block
│ you actually use?      │
└───────────┬────────────┘
            │ Yes
            ▼
┌────────────────────────┐
│ Is this a known        │──Yes─▶ Document, accept
│ intentional pattern?   │
└───────────┬────────────┘
            │ No
            ▼
┌────────────────────────┐
│ Investigate:           │
│ - Review code context  │
│ - Check maintainer     │
│ - Search for reports   │
└───────────┬────────────┘
            │
            ▼
     Action Decision
 (Report/Replace/Accept)
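
The decision flow above is simple enough to express as a function, which makes triage outcomes reproducible across reviewers. This is only a sketch: the three boolean inputs are assumed to be answered by a human or by separate tooling, and the `Outcome` and `triage` names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    LOWER_PRIORITY = "Lower priority"
    FLAG_ONLY = "Flag but don't block"
    DOCUMENT_ACCEPT = "Document, accept"
    INVESTIGATE = "Investigate"


def triage(in_production_path: bool, code_is_used: bool, known_intentional: bool) -> Outcome:
    """Walk the triage questions in order and return the first exit that applies."""
    if not in_production_path:
        return Outcome.LOWER_PRIORITY
    if not code_is_used:
        return Outcome.FLAG_ONLY
    if known_intentional:
        return Outcome.DOCUMENT_ACCEPT
    return Outcome.INVESTIGATE
```

For example, `triage(True, True, False)` returns `Outcome.INVESTIGATE`, the only branch that leads to a Report/Replace/Accept decision.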

Action Options:

| Finding Type | Recommended Action |
|---|---|
| Likely malicious | Remove immediately, report to registry |
| Probable vulnerability | Report upstream, consider fork/patch |
| Quality issue | Document, monitor, consider alternatives |
| False positive | Add to allowlist with justification |

Documentation Requirements:

# SAST Finding: semgrep/supply-chain/obfuscated-eval

**Package**: vm2@<version>
**Finding**: Obfuscated eval in lib/parser.js:142
**Assessment**: False positive - intentional VM isolation
**Evidence**: Package is the vm2 sandboxing library; eval is core functionality
**Decision**: Allowlist
**Reviewer**: @security-team
**Date**: 2024-01-15

Performance Optimization

Scanning dependencies is resource-intensive. Optimize for practical execution.

Optimization Techniques:

1. Incremental Scanning:

Only scan changed dependencies:

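# List path and version of every locked package (lockfileVersion 2 or 3)
list_deps() { jq -r '.packages | to_entries[] | select(.key != "") | "\(.key)\t\(.value.version)"'; }

# Keep entries that are new or whose version changed since the previous commit
comm -13 <(git show HEAD~1:package-lock.json | list_deps | sort) \
         <(list_deps < package-lock.json | sort) | cut -f1 > changed-deps.txt

# Scan only changed packages
while IFS= read -r pkg_dir; do
  semgrep --config p/supply-chain "$pkg_dir/"
done < changed-deps.txt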

2. Tiered Analysis:

Apply different analysis depth based on risk:

| Tier | Packages | Analysis |
|---|---|---|
| Deep | Security-sensitive, new packages | Full CodeQL analysis |
| Standard | Production dependencies | Semgrep supply-chain rules |
| Light | Dev dependencies, established packages | Critical patterns only |

3. Caching:

Cache analysis results:

# Cache the CodeQL database built from dependencies
- uses: actions/cache@v4
  with:
    path: deps-db
    key: codeql-deps-${{ hashFiles('package-lock.json') }}

4. Parallel Execution:

# Parallel scanning with GNU parallel
# (-mindepth 1 keeps node_modules itself out of the list)
find node_modules -mindepth 1 -maxdepth 1 -type d | \
  parallel -j4 'semgrep --config p/supply-chain {}' > results.txt

Recommendations

For Security Practitioners:

  1. Start with targeted rules. Don't run full SAST rule sets on dependencies. Focus on supply chain-specific patterns—obfuscation, exfiltration, install-time execution.

  2. Establish baselines. Create baselines for existing dependencies to focus on new findings and changes.

  3. Integrate with SCA. SAST complements SCA (CVE scanning). Use both—SCA for known issues, SAST for unknown patterns.

For Developers:

  1. Scan before adoption. Run SAST on new dependencies before adding them. A few minutes of analysis can prevent major incidents.

  2. Review install scripts. Pay special attention to preinstall/postinstall scripts—these execute with your privileges during npm install.

  3. Question obfuscation. Legitimate packages rarely need obfuscated code. Treat obfuscation as a red flag requiring explanation.

For Organizations:

  1. Automate in CI/CD. Make dependency SAST part of your pipeline. Don't rely on periodic manual scans.

  2. Define escalation paths. When SAST finds suspicious patterns, have clear processes for investigation and response.

  3. Contribute rules upstream. When you develop effective detection patterns, share them with the community. Supply chain security is a collective effort.

Static analysis of dependencies fills a critical gap between CVE scanning and blind trust. While it requires careful tuning and triage processes, SAST can detect malicious code and undisclosed vulnerabilities that other approaches miss. In an era of increasingly sophisticated supply chain attacks, analyzing the code you ship—regardless of who wrote it—is essential due diligence.