12.4 Vulnerability Databases and Their Gaps

Software composition analysis tools are only as good as the vulnerability data they reference. When you scan your dependencies for known vulnerabilities, you're querying databases that catalog security flaws—but these databases have significant limitations. Vulnerabilities may not have identifiers assigned for weeks or months. Severity scores may be incorrect or missing. Some flaws never receive formal identifiers at all. Understanding these gaps is essential for interpreting scan results and building effective vulnerability management programs.

This section surveys the vulnerability database landscape, examining how vulnerability information is collected, enriched, and distributed—and where the system falls short.

The CVE System

The Common Vulnerabilities and Exposures (CVE) system provides standardized identifiers for publicly known security vulnerabilities. Maintained by the MITRE Corporation with funding from CISA, CVE has been the foundation of vulnerability management since 1999.

How CVE Works:

Discovery: Someone identifies a security vulnerability
Request: A CVE Numbering Authority (CNA) is contacted to assign an identifier
Assignment: CNA assigns a CVE ID (format: CVE-YYYY-NNNN, where the sequence number has four or more digits)
Publication: After coordinated disclosure, CVE details are published
Reference: Security tools, advisories, and patches reference the CVE ID
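Because every downstream tool keys on the CVE ID, it helps to validate identifiers before using them for lookups. The following is a minimal sketch, assuming only the published ID syntax (a four-digit year followed by a sequence number of four or more digits); the function name `parse_cve_id` is illustrative.

```python
import re
from typing import Optional, Tuple

# "CVE-", a four-digit year, "-", then a sequence number of four or more digits.
CVE_ID_PATTERN = re.compile(r"CVE-(\d{4})-(\d{4,})")


def parse_cve_id(text: str) -> Optional[Tuple[int, int]]:
    """Return (year, sequence) for a well-formed CVE ID, otherwise None."""
    match = CVE_ID_PATTERN.fullmatch(text.strip().upper())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_cve_id("CVE-2021-44228"))       # (2021, 44228)
    print(parse_cve_id("GHSA-xxxx-xxxx-xxxx"))  # None
```

A check like this only confirms that the string is syntactically valid; it says nothing about whether the ID has been assigned or published.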

CVE Numbering Authorities (CNAs):

CNAs are organizations authorized to assign CVE IDs:

Root CNAs: MITRE, CISA (coordinate the CNA program)
Vendor CNAs: Microsoft, Google, Red Hat, Apache (assign for their own products)
Third-party CNAs: GitHub, Snyk, CERT/CC (assign for others' products)
Scope: Each CNA has a defined scope that limits which products it can assign IDs for

As of 2024, there are over 400 CNAs from 40 countries across different sectors.

CVE Limitations:

Assignment Delays:

The time from vulnerability discovery to CVE assignment varies significantly:

Well-resourced projects: Days to weeks
Smaller projects: Weeks to months
Disputed vulnerabilities: May never receive a CVE

Research has found median assignment times of approximately 35-50 days for many vulnerability types.¹

Coverage Gaps:

Not all vulnerabilities receive CVEs:

Some maintainers fix vulnerabilities silently without requesting CVEs
Vulnerabilities in abandoned projects may never be reported
Malware and malicious packages typically don't receive CVEs (they're not "vulnerabilities")
Quality issues and design flaws may not qualify

Process Bottlenecks:

CNAs may have backlogs
Disputes between reporters and vendors can delay assignment
Coordinated disclosure timelines don't always align with CNA capacity
Some products lack clear CNA coverage

Security researchers frequently report significant delays between vulnerability discovery and CVE assignment, with gaps of several months leaving organizations unable to systematically scan for known issues during critical windows.

The National Vulnerability Database (NVD)

The National Vulnerability Database (NVD), maintained by NIST, enriches CVE records with additional analysis and metadata.

NVD Enrichment:

When CVE publishes a vulnerability, it contains basic information. NVD adds:

CVSS scores: Severity ratings using Common Vulnerability Scoring System
CPE identifiers: Common Platform Enumeration for affected products
CWE classification: Common Weakness Enumeration categories
References: Additional links and technical details
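One practical consequence is that a scanner should notice when a record has not been enriched yet, instead of treating a missing score as "low severity". The sketch below assumes records shaped like the `cve` object in the NVD API 2.0 JSON, with `metrics`, `configurations`, and `weaknesses` keys; treat those key names as an assumption to verify against the current API documentation. The function name is hypothetical.

```python
def missing_enrichment(cve: dict) -> list:
    """List which enrichment fields are absent or empty in one CVE record."""
    missing = []
    if not cve.get("metrics"):          # CVSS scores
        missing.append("cvss")
    if not cve.get("configurations"):   # CPE applicability statements
        missing.append("cpe")
    if not cve.get("weaknesses"):       # CWE classification
        missing.append("cwe")
    return missing


if __name__ == "__main__":
    # Placeholder record, not real NVD data.
    record = {"id": "CVE-0000-0001", "metrics": {}, "references": []}
    print(missing_enrichment(record))   # ['cvss', 'cpe', 'cwe']
```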

The NVD Enrichment Crisis:

In early 2024, NVD enrichment essentially stopped:²

NIST reduced processing capacity dramatically
Backlog grew to tens of thousands of unenriched CVEs
Many CVEs had no CVSS scores or CPE mappings for months
Security tools depending on NVD data became incomplete

This crisis highlighted dangerous over-reliance on a single government resource.

Historical Delay Data:

Even before the 2024 crisis, NVD enrichment lagged:

Year	Median Enrichment Delay	90th Percentile
2021	7 days	30 days
2022	14 days	45 days
2023	21 days	60+ days
2024	Months (backlog)	Indefinite
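Figures like these can be reproduced for your own feed by comparing publication and enrichment dates. The following sketch uses only the Python standard library; the input pairs are hypothetical, and how you obtain the "enriched" date (for example, the first time a CVSS score appears) is an assumption left to the reader.

```python
from datetime import date
from statistics import median, quantiles


def enrichment_delays(records):
    """Days between publication and enrichment for each (published, enriched) pair."""
    return [(enriched - published).days for published, enriched in records]


def summarize(delays):
    """Return (median, 90th percentile) of a list of delays in days."""
    # quantiles(n=10) returns the nine decile cut points; the last one is p90.
    p90 = quantiles(delays, n=10, method="inclusive")[-1]
    return median(delays), p90


if __name__ == "__main__":
    # Hypothetical sample, not NVD measurements.
    sample = [
        (date(2023, 1, 1), date(2023, 1, 8)),
        (date(2023, 1, 5), date(2023, 2, 4)),
        (date(2023, 2, 1), date(2023, 2, 15)),
    ]
    print(summarize(enrichment_delays(sample)))
```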

CPE Mapping Challenges:

Common Platform Enumeration (CPE) identifiers specify which products are affected. CPE mapping is error-prone:

Package names don't always match CPE naming conventions
Multiple CPEs may apply to one vulnerability
CPE may be too broad (false positives) or too narrow (false negatives)
Open source packages often have inconsistent CPE assignment

Studies have found CPE mapping errors affect 15-25% of NVD records.³
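To see why name mismatches cause missed findings, consider a naive matcher that compares the CPE product field with a package name. This is a simplified sketch: it splits a CPE 2.3 formatted string on colons and ignores escaped colons, which a production parser must handle. The function names are illustrative.

```python
CPE_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into named fields (no escape handling)."""
    parts = cpe.split(":")
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))


def naive_match(cpe: str, package_name: str) -> bool:
    """Exact product-name comparison, which misses packages named differently."""
    return parse_cpe23(cpe)["product"] == package_name.lower()


if __name__ == "__main__":
    cpe = "cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*"
    # The Maven artifact is named log4j-core, so the naive comparison fails.
    print(naive_match(cpe, "log4j-core"))  # False
```

Real tools paper over this with curated mapping tables and fuzzy matching, which is where both false positives and false negatives creep in.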

OSV: Open Source Vulnerabilities

OSV (Open Source Vulnerabilities) emerged from Google as an ecosystem-focused alternative to the CVE/NVD system.

OSV Design Philosophy:

Ecosystem-native: Uses package manager identifiers, not CPE
Affected ranges: Specifies exactly which versions are vulnerable
Distributed: Multiple databases feed into OSV
Schema-based: Standardized JSON format for interoperability

OSV Schema:

{
  "id": "GHSA-xxxx-xxxx-xxxx",
  "summary": "SQL injection in example-package",
  "affected": [
    {
      "package": {
        "ecosystem": "npm",
        "name": "example-package"
      },
      "ranges": [
        {
          "type": "SEMVER",
          "events": [
            {"introduced": "1.0.0"},
            {"fixed": "1.5.3"}
          ]
        }
      ]
    }
  ],
  "severity": [
    {
      "type": "CVSS_V3",
      "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N"
    }
  ]
}
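The `events` list in a range is meant to be evaluated in version order: an `introduced` event at or below the version marks it affected, and a later `fixed` event at or below the version clears it. The sketch below is a simplified reading of that idea, assuming plain `MAJOR.MINOR.PATCH` versions and only `introduced` and `fixed` events; it does not handle pre-release tags, `last_affected`, or `limit`.

```python
def parse_version(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    if version == "0":  # "0" is used to mean "from the first release"
        return (0, 0, 0)
    return tuple(int(part) for part in version.split("."))


def is_affected(version: str, events: list) -> bool:
    """Evaluate introduced/fixed events in version order for one range."""
    target = parse_version(version)
    # Each event is a single-key dict such as {"introduced": "1.0.0"}.
    ordered = sorted(events, key=lambda e: parse_version(next(iter(e.values()))))
    affected = False
    for event in ordered:
        if "introduced" in event and target >= parse_version(event["introduced"]):
            affected = True
        elif "fixed" in event and target >= parse_version(event["fixed"]):
            affected = False
    return affected


if __name__ == "__main__":
    events = [{"introduced": "1.0.0"}, {"fixed": "1.5.3"}]
    print(is_affected("1.2.0", events))   # True
    print(is_affected("1.10.0", events))  # False
```

Comparing parsed tuples rather than strings matters: as strings, "1.10.0" sorts before "1.5.3" and would be reported as affected.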

OSV Data Sources:

OSV aggregates from multiple ecosystem-specific databases:

Database	Ecosystem
GHSA	GitHub-hosted projects
PyPA	Python (PyPI)
RustSec	Rust (crates.io)
Go Vulnerability Database	Go modules
npm Security Advisories	JavaScript (npm)
OSS-Fuzz	Projects in OSS-Fuzz

OSV Advantages:

Package identifiers match what developers actually use
Version ranges are precise (not approximated from CPE)
Faster than NVD for many open source vulnerabilities
Machine-readable format designed for automation
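As an illustration of the automation point, OSV exposes a public query endpoint that takes a package and version and returns matching records. The sketch below uses only the standard library; the endpoint URL and request shape reflect the OSV API as commonly documented, but check the current documentation before relying on them. The helper names are illustrative.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(ecosystem: str, name: str, version: str) -> dict:
    """Build the JSON body for a package-and-version query."""
    return {"package": {"ecosystem": ecosystem, "name": name}, "version": version}


def query_osv(ecosystem: str, name: str, version: str, timeout: float = 10.0) -> list:
    """POST the query to OSV and return the list of matching records."""
    body = json.dumps(build_query(ecosystem, name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # An empty object means no known vulnerabilities for that version.
    return payload.get("vulns", [])


if __name__ == "__main__":
    # Requires network access; prints the IDs of any matching records.
    for vuln in query_osv("npm", "example-package", "1.2.0"):
        print(vuln["id"])
```

Note that the query uses the package manager's own ecosystem and name, with no CPE translation step.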

OSV Limitations:

Not comprehensive across all ecosystems
May not include proprietary software
Depends on ecosystem maintainers to submit data
Less established than CVE for compliance purposes

GitHub Security Advisories

GitHub Security Advisories (GHSA) provides vulnerability information for GitHub-hosted projects.

How GHSA Works:

Maintainers create private security advisories
Advisory reviewed and CVE optionally requested (GitHub is a CNA)
Advisory published with fixes
Dependabot alerts users with affected dependencies

GHSA as Data Source:

GHSA has become a significant vulnerability data source:

Over 10,000 advisories across multiple ecosystems
Directly integrated with GitHub's 100M+ repositories
Powers Dependabot vulnerability alerts
Feeds into OSV database

GHSA vs. CVE Relationship:

GHSA can include CVE ID when available
GitHub can assign CVE IDs (as a CNA) for advisories that need them
Some GHSA advisories exist without CVE IDs
GHSA often has data before NVD enrichment completes

Dependabot Data:

Dependabot uses GHSA plus additional sources:

GHSA reviewed advisories
npm security advisories
RubyGems advisories
PyPA advisory database
NVD (with limitations)

This multi-source approach provides better coverage than NVD alone.

Commercial Vulnerability Databases

Commercial vendors maintain proprietary vulnerability databases that supplement public sources.

Snyk Vulnerability Database:

Coverage: 25+ ecosystems, including containers and IaC
Research: Dedicated security research team finds vulnerabilities
Speed: Often has data before CVE assignment
Enrichment: Custom severity scores, exploit maturity, fix availability
Malware: Includes malicious package detection

Sonatype OSS Index:

Coverage: Maven, npm, PyPI, NuGet, and others
Quality scores: Component quality metrics beyond vulnerabilities
Policy: License and security policy combined
Integration: Nexus repository integration

Mend (formerly WhiteSource) Database:

Coverage: 200+ languages and package managers
Prioritization: Effective severity based on reachability
Remediation: Automated fix recommendations

Commercial Database Advantages:

Capability	Public Sources	Commercial
CVE coverage	Complete (eventually)	Complete
Pre-CVE vulnerabilities	Limited	Significant
Malicious packages	Minimal	Yes
Custom research	No	Yes
Enrichment speed	Slow	Fast
Quality scoring	Basic	Advanced
Support	Community	Contracted

The Business Model:

Commercial vendors invest in:

Security researchers who find vulnerabilities proactively
Analysts who enrich and verify data
Automation that processes disclosures quickly
Relationships with project maintainers

This investment enables faster, more complete data—at a cost.

Coverage Gaps

Significant categories of security issues lack coverage in vulnerability databases.

Vulnerabilities Without CVEs:

Research suggests substantial portions of security fixes never receive CVEs:

Studies of project commit histories find 30-50% of security-relevant fixes lack CVE references
Small projects are less likely to request CVEs
"Silent fixes" are common in projects without security processes

Malicious Packages:

Traditional vulnerability databases focus on flaws in legitimate software, not intentional malware:

Typosquatting packages typically don't receive CVEs
Compromised maintainer accounts aren't "vulnerabilities"
Backdoors inserted by attackers aren't in NVD

Malicious package detection requires different data sources (see Section 12.6).

Quality and Design Issues:

Some security-relevant issues don't fit vulnerability definitions:

Insecure defaults
Missing security features
Deprecated cryptographic algorithms
Unmaintained dependencies

These affect security but may not appear in vulnerability databases.

Ecosystem Coverage Disparities:

Some ecosystems have better vulnerability coverage than others:

Ecosystem	Coverage Quality
Java (Maven)	Excellent
JavaScript (npm)	Good
Python (PyPI)	Good
Go	Good (newer)
Rust	Good
PHP (Composer)	Moderate
Ruby	Moderate
C/C++	Variable

Less popular ecosystems and native code have significant gaps.

Data Quality Challenges

Even when vulnerabilities are in databases, data quality issues affect utility.

Severity Discrepancies:

Different sources often assign different severity scores:

CVE-2023-XXXXX:
  NVD CVSS:    9.8 (Critical)
  Snyk:        7.5 (High)
  GHSA:        8.1 (High)

These discrepancies arise from:

Different interpretation of CVSS criteria
Vendor-specific scoring adjustments
Incomplete information at scoring time
Legitimate disagreement about exploitability
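A worked calculation shows how small differences in interpretation move the score. The sketch below implements the CVSS v3.1 base score equations as published by FIRST (base metrics only, no temporal or environmental adjustments). The three vectors are illustrative choices, not the vectors any particular source assigned; they differ only in attack complexity or impact, yet produce the same three scores as the listing above.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/...' vector string."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][metrics["AV"]] * WEIGHTS["AC"][metrics["AC"]]
        * pr * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
    print(base_score("CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 8.1
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

Judging attack complexity as high instead of low is enough to move a finding from Critical to High, which is why two careful analysts can publish different scores for the same flaw.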

Affected Version Inaccuracies:

Version range information is frequently wrong:

CPE versions may not match actual package versions
Backported fixes create complex affected ranges
Fixed versions may introduce regressions
Version ranges may be over-inclusive (causing false positives)

Duplicate and Conflicting Entries:

The same vulnerability may appear multiple times:

CVE-XXXX-1234 and GHSA-yyyy-yyyy-yyyy for same issue
Different databases may not reconcile duplicates
Conflicting severity or affected version information
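When records carry cross-references, duplicates can be grouped mechanically. The sketch below assumes each record has an `id` and an optional `aliases` list, as in the OSV schema, and uses a small union-find so that records linked only indirectly (A aliases B, B aliases C) end up in one group. The identifiers in the example are placeholders.

```python
def group_duplicates(records: list) -> list:
    """Group record IDs that are connected through shared aliases."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    def union(a, b):
        parent[find(a)] = find(b)

    for record in records:
        find(record["id"])
        for alias in record.get("aliases", []):
            union(record["id"], alias)

    groups = {}
    for record in records:
        groups.setdefault(find(record["id"]), []).append(record["id"])
    return sorted(sorted(ids) for ids in groups.values())


if __name__ == "__main__":
    records = [
        {"id": "CVE-0000-1234"},
        {"id": "GHSA-aaaa-aaaa-aaaa", "aliases": ["CVE-0000-1234"]},
        {"id": "GHSA-bbbb-bbbb-bbbb"},
    ]
    print(group_duplicates(records))
    # [['CVE-0000-1234', 'GHSA-aaaa-aaaa-aaaa'], ['GHSA-bbbb-bbbb-bbbb']]
```

This only works when at least one source records the alias; entries with no cross-reference still need fuzzy matching on package and version.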

Staleness:

Some database entries become outdated:

New affected versions discovered after initial publication
Fix status changes
Additional context emerges

Not all databases update entries consistently.

Multi-Source Strategies

Given database limitations, effective vulnerability management requires multiple data sources.

Recommended Approach:

┌──────────────────────────────────────────────────────────┐
│                     Vulnerability Data                    │
├──────────────┬──────────────┬──────────────┬─────────────┤
│     NVD      │     OSV      │     GHSA     │ Commercial  │
│  (baseline)  │ (ecosystem)  │  (GitHub)    │ (research)  │
└──────┬───────┴──────┬───────┴──────┬───────┴──────┬──────┘
       │              │              │              │
       └──────────────┴──────────────┴──────────────┘
                              │
                    ┌─────────▼─────────┐
                    │   Aggregation &    │
                    │   Deduplication    │
                    └─────────┬─────────┘
                              │
                    ┌─────────▼─────────┐
                    │   Prioritization   │
                    │   & Validation     │
                    └───────────────────┘

Source Selection Criteria:

Need	Recommended Sources
Compliance baseline	NVD (CVE reference)
Open source packages	OSV + ecosystem-specific
GitHub projects	GHSA + Dependabot
Fast alerting	Commercial + GHSA
Malware detection	Commercial + ecosystem security teams
Comprehensive coverage	Multiple sources aggregated

Handling Conflicts:

When sources disagree:

Severity: Use most severe or context-appropriate score
Affected versions: Use most specific (usually ecosystem-native)
Fix availability: Verify against actual package releases
Existence: If any reputable source reports it, investigate
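These rules can be encoded directly. The sketch below is one possible policy, not a standard: it takes the highest reported score, prefers fix information from sources that use ecosystem-native versions, and checks the claimed fix against a set of versions actually released. The `Report` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    source: str
    severity: Optional[float]      # CVSS base score, None if not yet scored
    fixed_version: Optional[str]   # None if the source lists no fix
    ecosystem_native: bool         # True if versions come from the package manager


def reconcile(reports: list, released_versions: set) -> dict:
    """Merge several reports about one vulnerability into a single view."""
    scores = [r.severity for r in reports if r.severity is not None]
    # Stable sort puts ecosystem-native reports first for version data.
    by_preference = sorted(reports, key=lambda r: not r.ecosystem_native)
    fixed = next((r.fixed_version for r in by_preference if r.fixed_version), None)
    return {
        "severity": max(scores) if scores else None,
        "fixed_version": fixed,
        "fix_released": fixed in released_versions if fixed else False,
        "sources": [r.source for r in reports],
    }


if __name__ == "__main__":
    reports = [
        Report("nvd", 9.8, None, False),
        Report("ghsa", 8.1, "1.5.3", True),
    ]
    print(reconcile(reports, {"1.5.2", "1.5.3"}))
```

Taking the maximum score is the conservative choice; a team with good context on its own exposure might instead prefer the score from the source it trusts most for that ecosystem.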

Tooling Considerations:

Evaluate SCA tools on their data sources:

What databases does the tool query?
How quickly does data appear after disclosure?
Does it include malicious package detection?
How are conflicts resolved?
What's the false positive/negative rate?

Recommendations

For Security Practitioners:

Don't rely on single sources. NVD alone is insufficient. Use tools that aggregate multiple databases.
Understand tool data sources. Know where your SCA tool gets vulnerability data. Evaluate coverage for your ecosystem.
Account for delays. Vulnerabilities exist before they're in databases. Monitor security mailing lists and project advisories directly for critical dependencies.
Validate findings. Database errors are common. Verify affected versions against actual project releases before remediation.

For Tool Evaluators:

Test with recent vulnerabilities. Check if tools detected recent high-profile vulnerabilities promptly.
Evaluate ecosystem coverage. Ensure the tool covers your primary ecosystems well.
Assess malware detection. If malicious package detection matters, verify the tool addresses it.
Compare severity approaches. Understand how tools score vulnerabilities and whether they align with your risk tolerance.

For Organizations:

Budget for commercial data. Public sources alone leave gaps. Commercial databases provide meaningful additional coverage.
Monitor NVD status. The 2024 crisis showed risks of depending on a single government resource. Track NVD health and have alternatives.
Contribute to ecosystem databases. If you find vulnerabilities, report them properly. Contribute to the data quality you depend on.
Build context into prioritization. Raw database severity isn't enough. Add exploitability, exposure, and business context.
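As a rough illustration of adding context, the sketch below adjusts a database severity score using three yes/no facts about your own environment. The adjustment sizes are arbitrary placeholders chosen for the example, not a recognized scoring standard; the point is only that context can reorder findings.

```python
def priority(cvss: float, known_exploited: bool,
             internet_facing: bool, business_critical: bool) -> float:
    """Illustrative 0-10 ranking score; the adjustments are arbitrary."""
    score = cvss
    if known_exploited:       # evidence of exploitation raises urgency
        score += 2.0
    if not internet_facing:   # limited exposure lowers urgency
        score -= 2.0
    if business_critical:     # impact on a key system raises urgency
        score += 1.0
    return round(max(0.0, min(10.0, score)), 1)


if __name__ == "__main__":
    # A "High" finding that is exploited and exposed outranks
    # a "Critical" one on an isolated internal system.
    print(priority(7.5, True, True, False))
    print(priority(9.8, False, False, False))
```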

Vulnerability databases are essential infrastructure for supply chain security, but they're incomplete and imperfect. Understanding their limitations helps you build vulnerability management programs that account for gaps rather than assuming comprehensive coverage that doesn't exist. The goal isn't perfect data—it's informed decisions despite imperfect data.

¹ Palo Alto Networks Unit 42, "State of Exploit Development," 2024, https://unit42.paloaltonetworks.com/state-of-exploit-development/
² VulnCheck, "The NVD Backlog and Exploitation in the Wild," April 2024, https://vulncheck.com/blog/nvd-backlog-exploitation; see also NIST announcements regarding reduced NVD enrichment capacity in early 2024.
³ Shahzad, Muhammad, et al. "An Empirical Study of CPE Naming Issues in the National Vulnerability Database." IEEE Symposium on Security and Privacy Workshops, 2021.