12.4 Vulnerability Databases and Their Gaps

Software composition analysis tools are only as good as the vulnerability data they reference. When you scan your dependencies for known vulnerabilities, you're querying databases that catalog security flaws—but these databases have significant limitations. Vulnerabilities may not have identifiers assigned for weeks or months. Severity scores may be incorrect or missing. Some flaws never receive formal identifiers at all. Understanding these gaps is essential for interpreting scan results and building effective vulnerability management programs.

This section surveys the vulnerability database landscape, examining how vulnerability information is collected, enriched, and distributed—and where the system falls short.

The CVE System

The Common Vulnerabilities and Exposures (CVE) system provides standardized identifiers for publicly known security vulnerabilities. Maintained by the MITRE Corporation with funding from the U.S. Cybersecurity and Infrastructure Security Agency (CISA), CVE has been the foundation of vulnerability management since 1999.

How CVE Works:

  1. Discovery: Someone identifies a security vulnerability
  2. Request: A CVE Numbering Authority (CNA) is contacted to assign an identifier
  3. Assignment: The CNA assigns a CVE ID (format: CVE-YYYY-NNNN, where the sequence number has four or more digits)
  4. Publication: After coordinated disclosure, CVE details are published
  5. Reference: Security tools, advisories, and patches reference the CVE ID
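
Because every later step keys off the identifier, it helps to validate CVE IDs before using them as lookup keys. The following is a minimal sketch, assuming the standard ID syntax of a four-digit year followed by a sequence number of four or more digits; the names `CveId` and `parse_cve_id` are illustrative, not part of any official tooling.

```python
import re
from typing import NamedTuple, Optional

# "CVE-<year>-<sequence>": four-digit year, sequence of four or more digits.
CVE_ID_PATTERN = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


class CveId(NamedTuple):
    year: int
    sequence: int


def parse_cve_id(text: str) -> Optional[CveId]:
    """Return the parsed ID, or None if the text is not a well-formed CVE ID."""
    match = CVE_ID_PATTERN.match(text.strip().upper())
    if match is None:
        return None
    return CveId(year=int(match.group(1)), sequence=int(match.group(2)))


print(parse_cve_id("CVE-2021-44228"))       # well-formed
print(parse_cve_id("GHSA-xxxx-xxxx-xxxx"))  # not a CVE ID
```

Note that the year in the ID reflects when the ID was reserved or the issue was made public, not necessarily when the flaw was discovered or introduced.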

CVE Numbering Authorities (CNAs):

CNAs are organizations authorized to assign CVE IDs:

  • Root CNAs: MITRE, CISA (coordinate the CNA program)
  • Vendor CNAs: Microsoft, Google, Red Hat, Apache (assign IDs for their own products)
  • Third-party CNAs: GitHub, Snyk, CERT/CC (assign IDs for others' products)
  • Scope: Each CNA has a defined scope that limits which vulnerabilities it can assign IDs for

As of 2024, there are over 400 CNAs from 40 countries across different sectors.

CVE Limitations:

Assignment Delays:

The time from vulnerability discovery to CVE assignment varies significantly:

  • Well-resourced projects: Days to weeks
  • Smaller projects: Weeks to months
  • Disputed vulnerabilities: May never receive CVE

Research has found median assignment times of approximately 35-50 days for many vulnerability types.[1]

Coverage Gaps:

Not all vulnerabilities receive CVEs:

  • Some maintainers fix vulnerabilities silently without requesting CVEs
  • Vulnerabilities in abandoned projects may never be reported
  • Malware and malicious packages typically don't receive CVEs (they're not "vulnerabilities")
  • Quality issues and design flaws may not qualify

Process Bottlenecks:

  • CNAs may have backlogs
  • Disputes between reporters and vendors can delay assignment
  • Coordinated disclosure timelines don't always align with CNA capacity
  • Some products lack clear CNA coverage

Security researchers frequently report significant delays between vulnerability discovery and CVE assignment, with gaps of several months leaving organizations unable to systematically scan for known issues during critical windows.

The National Vulnerability Database (NVD)

The National Vulnerability Database (NVD), maintained by NIST, enriches CVE records with additional analysis and metadata.

NVD Enrichment:

When CVE publishes a vulnerability, it contains basic information. NVD adds:

  • CVSS scores: Severity ratings using Common Vulnerability Scoring System
  • CPE identifiers: Common Platform Enumeration for affected products
  • CWE classification: Common Weakness Enumeration categories
  • References: Additional links and technical details

The NVD Enrichment Crisis:

Beginning in February 2024, NVD enrichment slowed to a near standstill:[2]

  • NIST reduced processing capacity dramatically
  • Backlog grew to tens of thousands of unenriched CVEs
  • Many CVEs had no CVSS scores or CPE mappings for months
  • Security tools depending on NVD data became incomplete
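
One way to notice this kind of gap in your own pipeline is to flag ingested records that lack enrichment fields. The sketch below assumes a hypothetical, simplified record type (`VulnRecord`); it is not the NVD API response format, and the field names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VulnRecord:
    """Hypothetical, simplified view of a CVE record after ingestion."""
    cve_id: str
    cvss_score: Optional[float] = None
    cpe_matches: List[str] = field(default_factory=list)
    cwe_ids: List[str] = field(default_factory=list)


def missing_enrichment(record: VulnRecord) -> List[str]:
    """List which enrichment fields are absent from a record."""
    missing = []
    if record.cvss_score is None:
        missing.append("cvss")
    if not record.cpe_matches:
        missing.append("cpe")
    if not record.cwe_ids:
        missing.append("cwe")
    return missing


# A record with an ID but no enrichment yet (placeholder ID).
bare = VulnRecord(cve_id="CVE-YYYY-NNNN")
print(missing_enrichment(bare))
```

A scanner that silently treats a missing score as "low severity" hides exactly the records that most need manual review, so surfacing the gap explicitly is usually the safer design.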

This crisis highlighted dangerous over-reliance on a single government resource.

Historical Delay Data:

Even before the 2024 crisis, NVD enrichment lagged:

Year  Median Enrichment Delay  90th Percentile
2021  7 days                   30 days
2022  14 days                  45 days
2023  21 days                  60+ days
2024  Months (backlog)         Indefinite

CPE Mapping Challenges:

Common Platform Enumeration (CPE) identifiers specify which products are affected. CPE mapping is error-prone:

  • Package names don't always match CPE naming conventions
  • Multiple CPEs may apply to one vulnerability
  • CPE may be too broad (false positives) or too narrow (false negatives)
  • Open source packages often have inconsistent CPE assignment

Studies have found CPE mapping errors affect 15-25% of NVD records.[3]
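
To make the naming mismatch concrete, here is a minimal sketch that parses a CPE 2.3 formatted string and compares its product field with a package name. It assumes the 13-field colon-separated layout of CPE 2.3 and treats a backslash-escaped colon as part of a field; `parse_cpe23` and `naive_name_match` are hypothetical helpers, and real matchers need far more normalization than this.

```python
import re
from typing import NamedTuple


class Cpe(NamedTuple):
    part: str      # "a" application, "o" operating system, "h" hardware
    vendor: str
    product: str
    version: str


def parse_cpe23(cpe: str) -> Cpe:
    """Extract the first four attributes of a CPE 2.3 formatted string."""
    # Split on colons that are not escaped with a backslash.
    fields = re.split(r"(?<!\\):", cpe)
    if len(fields) != 13 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return Cpe(*fields[2:6])


def naive_name_match(cpe: Cpe, package_name: str) -> bool:
    """Exact comparison of CPE product and package name (deliberately naive)."""
    return cpe.product == package_name


log4j = parse_cpe23("cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*")
# The Maven artifact is named "log4j-core", but the CPE product is "log4j",
# so an exact name comparison misses the match.
print(naive_name_match(log4j, "log4j-core"))
```

Loosening the comparison (for example, prefix matching) recovers this case but starts matching unrelated packages, which is the false positive versus false negative trade-off described above.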

OSV: Open Source Vulnerabilities

OSV (Open Source Vulnerabilities) emerged from Google as an ecosystem-focused alternative to the CVE/NVD system.

OSV Design Philosophy:

  • Ecosystem-native: Uses package manager identifiers, not CPE
  • Affected ranges: Specifies exactly which versions are vulnerable
  • Distributed: Multiple databases feed into OSV
  • Schema-based: Standardized JSON format for interoperability

OSV Schema:

{
  "id": "GHSA-xxxx-xxxx-xxxx",
  "summary": "SQL injection in example-package",
  "affected": [
    {
      "package": {
        "ecosystem": "npm",
        "name": "example-package"
      },
      "ranges": [
        {
          "type": "SEMVER",
          "events": [
            {"introduced": "1.0.0"},
            {"fixed": "1.5.3"}
          ]
        }
      ]
    }
  ],
  "severity": [
    {
      "type": "CVSS_V3",
      "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N"
    }
  ]
}
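
The `events` list is what lets a tool decide whether a specific installed version is affected. The sketch below follows the general idea of walking the events in version order, where `introduced` turns the affected state on and `fixed` turns it off. It is a simplified illustration: it assumes plain dotted numeric versions (no pre-release tags), ignores other event types such as `last_affected` and `limit`, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted numeric version such as "1.5.3" (or "0")."""
    return tuple(int(part) for part in text.split("."))


def is_affected(version: str, events: List[Dict[str, str]]) -> bool:
    """Evaluate a single range's events against one version."""
    current = parse_version(version)

    def event_version(event: Dict[str, str]) -> Tuple[int, ...]:
        # Each event object carries exactly one key, e.g. {"fixed": "1.5.3"}.
        return parse_version(next(iter(event.values())))

    affected = False
    for event in sorted(events, key=event_version):
        if "introduced" in event and current >= parse_version(event["introduced"]):
            affected = True
        elif "fixed" in event and current >= parse_version(event["fixed"]):
            affected = False
    return affected


events = [{"introduced": "1.0.0"}, {"fixed": "1.5.3"}]
for candidate in ["0.9.0", "1.5.2", "1.5.3"]:
    print(candidate, is_affected(candidate, events))
```

Production tools should use the ecosystem's own version comparison rules, since semantic versioning pre-release ordering and ecosystem-specific schemes do not reduce to integer tuples.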

OSV Data Sources:

OSV aggregates from multiple ecosystem-specific databases:

Database                   Ecosystem
GHSA                       GitHub-hosted projects
PyPA                       Python (PyPI)
RustSec                    Rust (crates.io)
Go Vulnerability Database  Go modules
npm Security Advisories    JavaScript (npm)
OSS-Fuzz                   Projects in OSS-Fuzz

OSV Advantages:

  • Package identifiers match what developers actually use
  • Version ranges are precise (not approximated from CPE)
  • Faster than NVD for many open source vulnerabilities
  • Machine-readable format designed for automation
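
As an illustration of that automation, the sketch below queries the public OSV API for a package at a specific version using only the Python standard library. It assumes the `https://api.osv.dev/v1/query` endpoint and its package/version request body; it ignores pagination (`next_page_token`) and does no retry or error handling, so treat it as a starting point rather than a client library.

```python
import json
import urllib.request
from typing import Any, Dict, List

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(ecosystem: str, name: str, version: str) -> Dict[str, Any]:
    """Build the request body; ecosystem names are case-sensitive (e.g. "npm", "PyPI")."""
    return {"package": {"ecosystem": ecosystem, "name": name}, "version": version}


def query_osv(ecosystem: str, name: str, version: str, timeout: float = 10.0) -> List[str]:
    """Return the IDs of advisories that OSV reports for this exact version."""
    body = json.dumps(build_query(ecosystem, name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # The "vulns" key is absent when nothing matches.
    return [vuln["id"] for vuln in payload.get("vulns", [])]


if __name__ == "__main__":
    # "example-package" is a placeholder; substitute a real dependency.
    print(query_osv("npm", "example-package", "1.2.0"))
```

Because the query uses the package manager's own name and version, no CPE translation step is needed.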

OSV Limitations:

  • Not comprehensive across all ecosystems
  • May not include proprietary software
  • Depends on ecosystem maintainers to submit data
  • Less established than CVE for compliance purposes

GitHub Security Advisories

GitHub Security Advisories (GHSA) provides vulnerability information for GitHub-hosted projects.

How GHSA Works:

  1. Maintainers create a private security advisory
  2. The advisory is reviewed, and a CVE ID is optionally requested (GitHub is a CNA)
  3. The advisory is published along with the fix
  4. Dependabot alerts users whose dependencies are affected

GHSA as Data Source:

GHSA has become a significant vulnerability data source:

  • Over 10,000 advisories across multiple ecosystems
  • Directly integrated with GitHub's 100M+ repositories
  • Powers Dependabot vulnerability alerts
  • Feeds into OSV database

GHSA vs. CVE Relationship:

  • GHSA can include CVE ID when available
  • GitHub can assign CVE IDs (as a CNA) for advisories that need them
  • Some GHSA advisories exist without CVE IDs
  • GHSA often has data before NVD enrichment completes

Dependabot Data:

Dependabot uses GHSA plus additional sources:

  • GHSA reviewed advisories
  • npm security advisories
  • RubyGems advisories
  • PyPA advisory database
  • NVD (with limitations)

This multi-source approach provides better coverage than NVD alone.

Commercial Vulnerability Databases

Commercial vendors maintain proprietary vulnerability databases that supplement public sources.

Snyk Vulnerability Database:

  • Coverage: 25+ ecosystems, including containers and IaC
  • Research: Dedicated security research team finds vulnerabilities
  • Speed: Often has data before CVE assignment
  • Enrichment: Custom severity scores, exploit maturity, fix availability
  • Malware: Includes malicious package detection

Sonatype OSS Index:

  • Coverage: Maven, npm, PyPI, NuGet, and others
  • Quality scores: Component quality metrics beyond vulnerabilities
  • Policy: License and security policy combined
  • Integration: Nexus repository integration

Mend (formerly WhiteSource) Database:

  • Coverage: 200+ languages and package managers
  • Prioritization: Effective severity based on reachability
  • Remediation: Automated fix recommendations

Commercial Database Advantages:

Capability               Public Sources         Commercial
CVE coverage             Complete (eventually)  Complete
Pre-CVE vulnerabilities  Limited                Significant
Malicious packages       Minimal                Yes
Custom research          No                     Yes
Enrichment speed         Slow                   Fast
Quality scoring          Basic                  Advanced
Support                  Community              Contracted

The Business Model:

Commercial vendors invest in:

  • Security researchers who find vulnerabilities proactively
  • Analysts who enrich and verify data
  • Automation that processes disclosures quickly
  • Relationships with project maintainers

This investment enables faster, more complete data—at a cost.

Coverage Gaps

Significant categories of security issues lack coverage in vulnerability databases.

Vulnerabilities Without CVEs:

Research suggests substantial portions of security fixes never receive CVEs:

  • Studies of project commit histories find 30-50% of security-relevant fixes lack CVE references
  • Small projects are less likely to request CVEs
  • "Silent fixes" are common in projects without security processes

Malicious Packages:

Traditional vulnerability databases focus on flaws in legitimate software, not intentional malware:

  • Typosquatting packages typically don't receive CVEs
  • Compromised maintainer accounts aren't "vulnerabilities"
  • Backdoors inserted by attackers aren't in NVD

Malicious package detection requires different data sources (see Section 12.6).

Quality and Design Issues:

Some security-relevant issues don't fit vulnerability definitions:

  • Insecure defaults
  • Missing security features
  • Deprecated cryptographic algorithms
  • Unmaintained dependencies

These affect security but may not appear in vulnerability databases.

Ecosystem Coverage Disparities:

Some ecosystems have better vulnerability coverage than others:

Ecosystem         Coverage Quality
Java (Maven)      Excellent
JavaScript (npm)  Good
Python (PyPI)     Good
Go                Good (newer)
Rust              Good
PHP (Composer)    Moderate
Ruby              Moderate
C/C++             Variable

Less popular ecosystems and native code have significant gaps.

Data Quality Challenges

Even when vulnerabilities are in databases, data quality issues affect utility.

Severity Discrepancies:

Different sources often assign different severity scores:

CVE-2023-XXXXX:
  NVD CVSS:    9.8 (Critical)
  Snyk:        7.5 (High)
  GHSA:        8.1 (High)

These discrepancies arise from:

  • Different interpretation of CVSS criteria
  • Vendor-specific scoring adjustments
  • Incomplete information at scoring time
  • Legitimate disagreement about exploitability
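
To see how small differences in interpretation move a score, the sketch below computes a CVSS v3.1 base score from a vector string using the base-metric weights and equations from the CVSS v3.1 specification. The three vectors in the usage example are illustrative assumptions chosen to reproduce the scores 9.8, 8.1, and 7.5; they are not the vectors any source actually assigned to a particular CVE. Temporal and environmental metrics are out of scope here.

```python
from typing import Dict

WEIGHTS: Dict[str, Dict[str, float]] = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, avoiding floating-point artifacts."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector like "CVSS:3.1/AV:N/AC:L/...". """
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][metrics["AV"]] * WEIGHTS["AC"][metrics["AC"]]
        * pr * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
print(base_score("CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 8.1
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

In this illustration, judging attack complexity as High instead of Low drops a 9.8 to 8.1, and judging the impact as confidentiality-only yields 7.5. Two analysts can therefore land in different severity bands while applying the same standard in good faith.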

Affected Version Inaccuracies:

Version range information is frequently wrong:

  • CPE versions may not match actual package versions
  • Backported fixes create complex affected ranges
  • Fixed versions may introduce regressions
  • Version ranges may be over-inclusive (causing false positives)

Duplicate and Conflicting Entries:

The same vulnerability may appear multiple times:

  • CVE-XXXX-1234 and GHSA-yyyy-yyyy-yyyy for same issue
  • Different databases may not reconcile duplicates
  • Conflicting severity or affected version information
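
When records follow the OSV schema, the optional `aliases` field lists identifiers that other databases use for the same issue, which gives aggregators a way to reconcile duplicates. The sketch below groups records into alias clusters with a small union-find structure. The record IDs are placeholders, and the function name `group_by_alias` is hypothetical; it also assumes alias data is present and correct, which is not always the case.

```python
from typing import Any, Dict, List, Set


def group_by_alias(records: List[Dict[str, Any]]) -> List[Set[str]]:
    """Cluster identifiers that refer to the same vulnerability via aliases."""
    parent: Dict[str, str] = {}

    def find(identifier: str) -> str:
        parent.setdefault(identifier, identifier)
        while parent[identifier] != identifier:
            # Path halving keeps lookups short.
            parent[identifier] = parent[parent[identifier]]
            identifier = parent[identifier]
        return identifier

    for record in records:
        root = find(record["id"])
        for alias in record.get("aliases", []):
            parent[find(alias)] = root
            root = find(record["id"])

    groups: Dict[str, Set[str]] = {}
    for identifier in list(parent):
        groups.setdefault(find(identifier), set()).add(identifier)
    return list(groups.values())


records = [
    {"id": "CVE-XXXX-1234", "aliases": ["GHSA-yyyy-yyyy-yyyy"]},
    {"id": "GHSA-yyyy-yyyy-yyyy", "aliases": ["CVE-XXXX-1234"]},
    {"id": "PYSEC-XXXX-1"},
]
for group in group_by_alias(records):
    print(sorted(group))
```

Aliases are transitive in this sketch: if A lists B and B lists C, all three end up in one cluster even though A never mentions C.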

Staleness:

Some database entries become outdated:

  • New affected versions discovered after initial publication
  • Fix status changes
  • Additional context emerges

Not all databases update entries consistently.

Multi-Source Strategies

Given database limitations, effective vulnerability management requires multiple data sources.

Recommended Approach:

┌──────────────────────────────────────────────────────────┐
│                     Vulnerability Data                    │
├──────────────┬──────────────┬──────────────┬─────────────┤
│     NVD      │     OSV      │     GHSA     │ Commercial  │
│  (baseline)  │ (ecosystem)  │  (GitHub)    │ (research)  │
└──────┬───────┴──────┬───────┴──────┬───────┴──────┬──────┘
       │              │              │              │
       └──────────────┴──────────────┴──────────────┘
                    ┌─────────▼─────────┐
                    │   Aggregation &    │
                    │   Deduplication    │
                    └─────────┬─────────┘
                    ┌─────────▼─────────┐
                    │   Prioritization   │
                    │   & Validation     │
                    └───────────────────┘

Source Selection Criteria:

Need                    Recommended Sources
Compliance baseline     NVD (CVE reference)
Open source packages    OSV + ecosystem-specific
GitHub projects         GHSA + Dependabot
Fast alerting           Commercial + GHSA
Malware detection       Commercial + ecosystem security teams
Comprehensive coverage  Multiple sources aggregated

Handling Conflicts:

When sources disagree:

  1. Severity: Use most severe or context-appropriate score
  2. Affected versions: Use most specific (usually ecosystem-native)
  3. Fix availability: Verify against actual package releases
  4. Existence: If any reputable source reports it, investigate
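
The four rules above can be sketched as a small merge function. This is one possible encoding under stated assumptions: `SourceReport` is a hypothetical normalized record, "most severe" is used for severity (the context-appropriate alternative would need extra inputs), and `released_versions` stands in for a lookup against the package registry.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set


@dataclass
class SourceReport:
    source: str                   # e.g. "nvd", "osv", "ghsa"
    severity: Optional[float]     # CVSS base score, None if not yet scored
    fixed_version: Optional[str]  # None if the source lists no fix
    ecosystem_native: bool        # True if versions use package-manager naming


def resolve(reports: List[SourceReport], released_versions: Set[str]) -> Dict[str, Any]:
    """Merge reports about one vulnerability from several sources."""
    # Rule 1: take the most severe score that any source provides.
    scores = [r.severity for r in reports if r.severity is not None]
    # Rule 2: prefer fix data from ecosystem-native sources.
    with_fix = [r for r in reports if r.fixed_version is not None]
    native = [r for r in with_fix if r.ecosystem_native]
    chosen = (native or with_fix or [None])[0]
    fixed = chosen.fixed_version if chosen else None
    return {
        "severity": max(scores) if scores else None,
        "fixed_version": fixed,
        # Rule 3: only trust the fix if that version was actually released.
        "fix_verified": fixed in released_versions,
        # Rule 4: any report at all is enough to warrant a look.
        "investigate": bool(reports),
    }


reports = [
    SourceReport("nvd", 9.8, None, False),
    SourceReport("ghsa", 8.1, "1.5.3", True),
]
print(resolve(reports, released_versions={"1.5.2", "1.5.3"}))
```

Keeping the chosen values alongside a `fix_verified` flag, rather than silently dropping unverified data, leaves the final call to a human reviewer.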

Tooling Considerations:

Evaluate SCA tools on their data sources:

  • What databases does the tool query?
  • How quickly does data appear after disclosure?
  • Does it include malicious package detection?
  • How are conflicts resolved?
  • What's the false positive/negative rate?

Recommendations

For Security Practitioners:

  1. Don't rely on single sources. NVD alone is insufficient. Use tools that aggregate multiple databases.

  2. Understand tool data sources. Know where your SCA tool gets vulnerability data. Evaluate coverage for your ecosystem.

  3. Account for delays. Vulnerabilities exist before they're in databases. Monitor security mailing lists and project advisories directly for critical dependencies.

  4. Validate findings. Database errors are common. Verify affected versions against actual project releases before remediation.

For Tool Evaluators:

  1. Test with recent vulnerabilities. Check if tools detected recent high-profile vulnerabilities promptly.

  2. Evaluate ecosystem coverage. Ensure the tool covers your primary ecosystems well.

  3. Assess malware detection. If malicious package detection matters, verify the tool addresses it.

  4. Compare severity approaches. Understand how tools score vulnerabilities and whether they align with your risk tolerance.

For Organizations:

  1. Budget for commercial data. Public sources alone leave gaps. Commercial databases provide meaningful additional coverage.

  2. Monitor NVD status. The 2024 crisis showed risks of depending on a single government resource. Track NVD health and have alternatives.

  3. Contribute to ecosystem databases. If you find vulnerabilities, report them properly. Contribute to the data quality you depend on.

  4. Build context into prioritization. Raw database severity isn't enough. Add exploitability, exposure, and business context.

Vulnerability databases are essential infrastructure for supply chain security, but they're incomplete and imperfect. Understanding their limitations helps you build vulnerability management programs that account for gaps rather than assuming comprehensive coverage that doesn't exist. The goal isn't perfect data—it's informed decisions despite imperfect data.


  1. Palo Alto Networks Unit 42, "State of Exploit Development," 2024, https://unit42.paloaltonetworks.com/state-of-exploit-development/ 

  2. VulnCheck, "The NVD Backlog and Exploitation in the Wild," April 2024, https://vulncheck.com/blog/nvd-backlog-exploitation; also see NIST announcements regarding reduced NVD enrichment capacity in early 2024. 

  3. Shahzad, Muhammad, et al. "An Empirical Study of CPE Naming Issues in the National Vulnerability Database." IEEE Symposium on Security and Privacy Workshops, 2021.