12.4 Vulnerability Databases and Their Gaps
Software composition analysis tools are only as good as the vulnerability data they reference. When you scan your dependencies for known vulnerabilities, you're querying databases that catalog security flaws—but these databases have significant limitations. Vulnerabilities may not have identifiers assigned for weeks or months. Severity scores may be incorrect or missing. Some flaws never receive formal identifiers at all. Understanding these gaps is essential for interpreting scan results and building effective vulnerability management programs.
This section surveys the vulnerability database landscape, examining how vulnerability information is collected, enriched, and distributed—and where the system falls short.
The CVE System
The Common Vulnerabilities and Exposures (CVE) system provides standardized identifiers for publicly known security vulnerabilities. Maintained by MITRE Corporation under funding from CISA, CVE has been the foundation of vulnerability management since 1999.
How CVE Works:
- Discovery: Someone identifies a security vulnerability
- Request: A CVE Numbering Authority (CNA) is contacted to assign an identifier
- Assignment: The CNA assigns a CVE ID (format: CVE-YYYY-NNNN, where the sequence number has four or more digits)
- Publication: After coordinated disclosure, CVE details are published
- Reference: Security tools, advisories, and patches reference the CVE ID
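Because tools and advisories key on the CVE ID, it helps to validate identifiers before using them for lookups. The following is a minimal Python sketch, assuming the current CVE ID syntax (a four-digit year followed by a sequence number of four or more digits). The names `parse_cve_id` and `CveId` are illustrative and not part of any standard library.

```python
import re
from typing import NamedTuple, Optional

# "CVE-", a four-digit year, "-", then a sequence number of at least
# four digits (longer sequence numbers are allowed).
CVE_ID_PATTERN = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


class CveId(NamedTuple):
    year: int
    sequence: int


def parse_cve_id(text: str) -> Optional[CveId]:
    """Return the parsed ID, or None if the text is not a well-formed CVE ID."""
    match = CVE_ID_PATTERN.match(text.strip().upper())
    if match is None:
        return None
    return CveId(year=int(match.group(1)), sequence=int(match.group(2)))


if __name__ == "__main__":
    for candidate in ["CVE-2021-44228", "cve-2014-0160", "CVE-2021-123", "GHSA-xxxx-xxxx-xxxx"]:
        print(candidate, "->", parse_cve_id(candidate))
```

A well-formed ID only shows that the string has the right shape. It does not show that the ID has been assigned or published.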
CVE Numbering Authorities (CNAs):
CNAs are organizations authorized to assign CVE IDs:
- Root CNAs: MITRE, CISA (coordinate CNA program)
- Vendor CNAs: Microsoft, Google, Red Hat, Apache (assign for their own products)
- Third-party CNAs: GitHub, Snyk, CERT/CC (assign for others' products)
- Scope: Each CNA has a defined scope for what it can assign
As of 2024, there are over 400 CNAs from 40 countries across different sectors.
CVE Limitations:
Assignment Delays:
The time from vulnerability discovery to CVE assignment varies significantly:
- Well-resourced projects: Days to weeks
- Smaller projects: Weeks to months
- Disputed vulnerabilities: May never receive a CVE
Research has found median assignment times of approximately 35-50 days for many vulnerability types [1].
Coverage Gaps:
Not all vulnerabilities receive CVEs:
- Some maintainers fix vulnerabilities silently without requesting CVEs
- Vulnerabilities in abandoned projects may never be reported
- Malware and malicious packages typically don't receive CVEs (they're not "vulnerabilities")
- Quality issues and design flaws may not qualify
Process Bottlenecks:
- CNAs may have backlogs
- Disputes between reporters and vendors can delay assignment
- Coordinated disclosure timelines don't always align with CNA capacity
- Some products lack clear CNA coverage
Security researchers frequently report significant delays between vulnerability discovery and CVE assignment, with gaps of several months leaving organizations unable to systematically scan for known issues during critical windows.
The National Vulnerability Database (NVD)
The National Vulnerability Database (NVD), maintained by NIST, enriches CVE records with additional analysis and metadata.
NVD Enrichment:
When a CVE record is first published, it contains only basic information. NVD adds:
- CVSS scores: Severity ratings using the Common Vulnerability Scoring System
- CPE identifiers: Common Platform Enumeration for affected products
- CWE classification: Common Weakness Enumeration categories
- References: Additional links and technical details
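One practical consequence is that a scanner should be able to tell an enriched record from one that only carries the basic CVE description. The sketch below uses a deliberately simplified, hypothetical record shape (plain `cvss`, `cpe`, and `cwe` keys), not the actual NVD API schema, and the IDs are placeholders.

```python
from typing import Dict, List

# Fields NVD typically adds on top of the basic CVE record.
ENRICHMENT_FIELDS = ("cvss", "cpe", "cwe")


def _is_blank(value: object) -> bool:
    """True for None or an empty collection; a score of 0.0 is not blank."""
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def missing_enrichment(record: Dict[str, object]) -> List[str]:
    """List the enrichment fields that are absent or empty in a record."""
    return [name for name in ENRICHMENT_FIELDS if _is_blank(record.get(name))]


# Hypothetical, simplified records for illustration only.
records = [
    {"id": "CVE-XXXX-0001", "cvss": 9.8,
     "cpe": ["cpe:2.3:a:example:widget:1.0:*:*:*:*:*:*:*"], "cwe": ["CWE-89"]},
    {"id": "CVE-XXXX-0002", "cvss": None, "cpe": [], "cwe": ["CWE-79"]},
    {"id": "CVE-XXXX-0003"},
]

for record in records:
    gaps = missing_enrichment(record)
    status = "fully enriched" if not gaps else "missing " + ", ".join(gaps)
    print(record["id"], status)
```

Flagging records as "not yet analyzed" rather than treating a missing score as low severity avoids silently under-prioritizing unenriched CVEs.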
The NVD Enrichment Crisis:
In early 2024, NVD enrichment essentially stopped [2]:
- NIST reduced processing capacity dramatically
- Backlog grew to tens of thousands of unenriched CVEs
- Many CVEs had no CVSS scores or CPE mappings for months
- Security tools depending on NVD data became incomplete
This crisis highlighted dangerous over-reliance on a single government resource.
Historical Delay Data:
Even before the 2024 crisis, NVD enrichment lagged:
| Year | Median Enrichment Delay | 90th Percentile |
|---|---|---|
| 2021 | 7 days | 30 days |
| 2022 | 14 days | 45 days |
| 2023 | 21 days | 60+ days |
| 2024 | Months (backlog) | Indefinite |
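For teams that want to track enrichment lag themselves, the two statistics in the table can be computed from publication and enrichment dates. This is a rough sketch using made-up delays, not real NVD data; the function names are illustrative.

```python
from datetime import date, timedelta
from statistics import median, quantiles
from typing import List, Tuple

# Hypothetical delays in days, used only to build sample date pairs.
published = date(2023, 3, 1)
sample_delays = [3, 5, 7, 7, 10, 14, 21, 30, 45, 60]
samples = [(published, published + timedelta(days=d)) for d in sample_delays]


def enrichment_delays(pairs: List[Tuple[date, date]]) -> List[int]:
    """Days between CVE publication and the appearance of enrichment data."""
    return [(enriched - pub).days for pub, enriched in pairs]


def summarize(delays: List[int]) -> Tuple[float, float]:
    """Return (median, 90th percentile); needs at least two data points."""
    # quantiles(n=10) returns the nine decile cut points; the last one
    # is the 90th percentile.
    p90 = quantiles(delays, n=10)[-1]
    return median(delays), p90


med, p90 = summarize(enrichment_delays(samples))
print(f"median={med} days, p90={p90} days")
```

The 90th percentile is often the more useful number operationally, since it describes how long the slow tail of records stays unenriched.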
CPE Mapping Challenges:
Common Platform Enumeration (CPE) identifiers specify which products are affected. CPE mapping is error-prone:
- Package names don't always match CPE naming conventions
- Multiple CPEs may apply to one vulnerability
- CPE may be too broad (false positives) or too narrow (false negatives)
- Open source packages often have inconsistent CPE assignment
Studies have found CPE mapping errors affect 15-25% of NVD records [3].
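The naming mismatch is easy to see once a CPE string is split into its fields. The sketch below assumes the CPE 2.3 formatted-string layout (eleven attributes after the `cpe:2.3` prefix) and, as a simplification, ignores escaped colons inside values. The helper names are illustrative.

```python
from typing import Dict

CPE_FIELDS = ("part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other")


def parse_cpe23(cpe: str) -> Dict[str, str]:
    """Split a CPE 2.3 formatted string into named fields.

    Simplification: assumes no escaped colons inside field values.
    """
    parts = cpe.split(":")
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))


def naive_match(cpe: str, package_name: str) -> bool:
    """Naive check: does the CPE product equal the package manager's name?"""
    return parse_cpe23(cpe)["product"] == package_name.lower()


log4j_cpe = "cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*"
print(parse_cpe23(log4j_cpe)["product"])     # CPE product name
print(naive_match(log4j_cpe, "log4j-core"))  # Maven artifact name differs
```

Here the CPE product is `log4j` while the Maven artifact is `log4j-core`, so an exact-name comparison misses the match. Loosening the comparison to catch it is what tends to introduce false positives.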
OSV: Open Source Vulnerabilities
OSV (Open Source Vulnerabilities) emerged from Google as an ecosystem-focused alternative to the CVE/NVD system.
OSV Design Philosophy:
- Ecosystem-native: Uses package manager identifiers, not CPE
- Affected ranges: Specifies exactly which versions are vulnerable
- Distributed: Multiple databases feed into OSV
- Schema-based: Standardized JSON format for interoperability
OSV Schema:
    {
      "id": "GHSA-xxxx-xxxx-xxxx",
      "summary": "SQL injection in example-package",
      "affected": [
        {
          "package": {
            "ecosystem": "npm",
            "name": "example-package"
          },
          "ranges": [
            {
              "type": "SEMVER",
              "events": [
                {"introduced": "1.0.0"},
                {"fixed": "1.5.3"}
              ]
            }
          ]
        }
      ],
      "severity": [
        {
          "type": "CVSS_V3",
          "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N"
        }
      ]
    }
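The `events` list in the record above is what makes version matching mechanical. The following sketch walks the events in version order to decide whether a given version is affected. It is a simplification: it handles only `introduced` and `fixed` events, treats versions as dot-separated integers (no pre-release tags), and the function names are illustrative rather than taken from any OSV library.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse 'MAJOR.MINOR.PATCH' (or '0') into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def is_affected(version: str, events: List[Dict[str, str]]) -> bool:
    """Replay range events in version order; the last applicable one wins."""
    target = parse_version(version)
    # Each event dict holds a single key ("introduced" or "fixed").
    ordered = sorted(events, key=lambda e: parse_version(next(iter(e.values()))))
    affected = False
    for event in ordered:
        if "introduced" in event and target >= parse_version(event["introduced"]):
            affected = True
        elif "fixed" in event and target >= parse_version(event["fixed"]):
            affected = False
    return affected


events = [{"introduced": "1.0.0"}, {"fixed": "1.5.3"}]
for candidate in ["0.9.0", "1.0.0", "1.5.2", "1.5.3"]:
    print(candidate, is_affected(candidate, events))
```

With the example range, versions from 1.0.0 up to but not including 1.5.3 are reported as affected. Production tools also need ecosystem-specific version ordering, which this sketch does not attempt.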
OSV Data Sources:
OSV aggregates from multiple ecosystem-specific databases:
| Database | Ecosystem |
|---|---|
| GHSA | GitHub-hosted projects |
| PyPA | Python (PyPI) |
| RustSec | Rust (crates.io) |
| Go Vulnerability Database | Go modules |
| npm Security Advisories | JavaScript (npm) |
| OSS-Fuzz | Projects in OSS-Fuzz |
OSV Advantages:
- Package identifiers match what developers actually use
- Version ranges are precise (not approximated from CPE)
- Faster than NVD for many open source vulnerabilities
- Machine-readable format designed for automation
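Because OSV is designed for automation, a lookup can be scripted with the standard library alone. The sketch below assumes the public OSV query endpoint at `https://api.osv.dev/v1/query` and its JSON request shape (a `package` object plus a `version`). The helper names `build_query` and `query_osv` are illustrative, and error handling, batching, and pagination are omitted.

```python
import json
import urllib.request
from typing import Dict, List

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(ecosystem: str, name: str, version: str) -> Dict[str, object]:
    """Build the request body using package-manager identifiers, not CPEs."""
    return {"package": {"ecosystem": ecosystem, "name": name}, "version": version}


def query_osv(ecosystem: str, name: str, version: str, timeout: float = 10.0) -> List[str]:
    """Return the IDs of advisories OSV reports for one package version."""
    body = json.dumps(build_query(ecosystem, name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # An empty object (no "vulns" key) means no known advisories.
    return [vuln["id"] for vuln in payload.get("vulns", [])]


if __name__ == "__main__":
    # Requires network access; the package and version are just an example.
    print(query_osv("PyPI", "jinja2", "2.4.1"))
```

Note that the query names the package exactly as the package manager does, which is the point of the ecosystem-native design.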
OSV Limitations:
- Not comprehensive across all ecosystems
- May not include proprietary software
- Depends on ecosystem maintainers to submit data
- Less established than CVE for compliance purposes
GitHub Security Advisories
GitHub Security Advisories (GHSA) provides vulnerability information for GitHub-hosted projects.
How GHSA Works:
- Maintainers create private security advisories
- Advisory reviewed and CVE optionally requested (GitHub is a CNA)
- Advisory published with fixes
- Dependabot alerts users with affected dependencies
GHSA as Data Source:
GHSA has become a significant vulnerability data source:
- Over 10,000 advisories across multiple ecosystems
- Directly integrated with GitHub's 100M+ repositories
- Powers Dependabot vulnerability alerts
- Feeds into OSV database
GHSA vs. CVE Relationship:
- GHSA can include CVE ID when available
- GitHub can assign CVE IDs (as a CNA) for advisories that need them
- Some GHSA advisories exist without CVE IDs
- GHSA often has data before NVD enrichment completes
Dependabot Data:
Dependabot uses GHSA plus additional sources:
- GHSA reviewed advisories
- npm security advisories
- RubyGems advisories
- PyPA advisory database
- NVD (with limitations)
This multi-source approach provides better coverage than NVD alone.
Commercial Vulnerability Databases
Commercial vendors maintain proprietary vulnerability databases that supplement public sources.
Snyk Vulnerability Database:
- Coverage: 25+ ecosystems, including containers and IaC
- Research: Dedicated security research team finds vulnerabilities
- Speed: Often has data before CVE assignment
- Enrichment: Custom severity scores, exploit maturity, fix availability
- Malware: Includes malicious package detection
Sonatype OSS Index:
- Coverage: Maven, npm, PyPI, NuGet, and others
- Quality scores: Component quality metrics beyond vulnerabilities
- Policy: License and security policy combined
- Integration: Nexus repository integration
Mend (formerly WhiteSource) Database:
- Coverage: 200+ languages and package managers
- Prioritization: Effective severity based on reachability
- Remediation: Automated fix recommendations
Commercial Database Advantages:
| Capability | Public Sources | Commercial |
|---|---|---|
| CVE coverage | Complete (eventually) | Complete |
| Pre-CVE vulnerabilities | Limited | Significant |
| Malicious packages | Minimal | Yes |
| Custom research | No | Yes |
| Enrichment speed | Slow | Fast |
| Quality scoring | Basic | Advanced |
| Support | Community | Contracted |
The Business Model:
Commercial vendors invest in:
- Security researchers who find vulnerabilities proactively
- Analysts who enrich and verify data
- Automation that processes disclosures quickly
- Relationships with project maintainers
This investment enables faster, more complete data—at a cost.
Coverage Gaps
Significant categories of security issues lack coverage in vulnerability databases.
Vulnerabilities Without CVEs:
Research suggests substantial portions of security fixes never receive CVEs:
- Studies of project commit histories find 30-50% of security-relevant fixes lack CVE references
- Small projects are less likely to request CVEs
- "Silent fixes" are common in projects without security processes
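One way to get a rough sense of silent fixes in a dependency is to scan its commit messages for security-related wording that carries no advisory reference. The sketch below is a crude keyword heuristic, not the methodology of the studies mentioned above; the keyword list, function name, and sample messages (including the placeholder CVE ID) are all assumptions.

```python
import re
from typing import List

# Words that often appear in security-relevant commits (an assumed list).
SECURITY_HINTS = re.compile(
    r"\b(security|vulnerab\w*|overflow|injection|xss|csrf|traversal|sanitiz\w*)\b",
    re.IGNORECASE,
)
# References to a CVE or GHSA identifier.
ADVISORY_REF = re.compile(
    r"\b(?:CVE-\d{4}-\d{4,}|GHSA(?:-[0-9a-z]{4}){3})\b",
    re.IGNORECASE,
)


def possible_silent_fixes(messages: List[str]) -> List[str]:
    """Commit messages that look security-related but cite no advisory ID."""
    return [
        message for message in messages
        if SECURITY_HINTS.search(message) and not ADVISORY_REF.search(message)
    ]


# Hypothetical commit messages for illustration.
messages = [
    "Fix buffer overflow in header parser",
    "Fix SQL injection (CVE-2099-0001)",
    "Update README",
    "Sanitize user-supplied path to prevent traversal",
]

for message in possible_silent_fixes(messages):
    print("review:", message)
```

A heuristic like this produces both false positives and misses, so its output is a review queue rather than a finding list.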
Malicious Packages:
Traditional vulnerability databases focus on flaws in legitimate software, not intentional malware:
- Typosquatting packages typically don't receive CVEs
- Compromised maintainer accounts aren't "vulnerabilities"
- Backdoors inserted by attackers aren't in NVD
Malicious package detection requires different data sources (see Section 12.6).
Quality and Design Issues:
Some security-relevant issues don't fit vulnerability definitions:
- Insecure defaults
- Missing security features
- Deprecated cryptographic algorithms
- Unmaintained dependencies
These affect security but may not appear in vulnerability databases.
Ecosystem Coverage Disparities:
Some ecosystems have better vulnerability coverage than others:
| Ecosystem | Coverage Quality |
|---|---|
| Java (Maven) | Excellent |
| JavaScript (npm) | Good |
| Python (PyPI) | Good |
| Go | Good (newer) |
| Rust | Good |
| PHP (Composer) | Moderate |
| Ruby | Moderate |
| C/C++ | Variable |
Less popular ecosystems and native code have significant gaps.
Data Quality Challenges
Even when vulnerabilities are in databases, data quality issues affect utility.
Severity Discrepancies:
Different sources often assign different severity scores to the same vulnerability. These discrepancies arise from:
- Different interpretation of CVSS criteria
- Vendor-specific scoring adjustments
- Incomplete information at scoring time
- Legitimate disagreement about exploitability
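Score differences matter most when they cross a severity band, because many policies trigger on the qualitative rating. The sketch below maps base scores to the CVSS v3.x qualitative ratings and reports whether sources disagree on the band. The per-source scores are hypothetical, and the function names are illustrative.

```python
from typing import Dict


def cvss_v3_rating(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def severity_spread(scores_by_source: Dict[str, float]) -> Dict[str, object]:
    """Summarize how far apart the sources are for one vulnerability."""
    ratings = {src: cvss_v3_rating(s) for src, s in scores_by_source.items()}
    return {
        "min": min(scores_by_source.values()),
        "max": max(scores_by_source.values()),
        "ratings": ratings,
        "band_disagreement": len(set(ratings.values())) > 1,
    }


# Hypothetical scores for a single vulnerability from three sources.
print(severity_spread({"nvd": 9.8, "vendor": 7.5, "ghsa": 8.1}))
```

In this made-up case one source rates the issue Critical and two rate it High, which would change the outcome of a "fix all Criticals within N days" policy depending on which feed a tool consumes.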
Affected Version Inaccuracies:
Version range information is frequently wrong:
- CPE versions may not match actual package versions
- Backported fixes create complex affected ranges
- Fixed versions may introduce regressions
- Version ranges may be over-inclusive (causing false positives)
Duplicate and Conflicting Entries:
The same vulnerability may appear multiple times:
- CVE-XXXX-1234 and GHSA-yyyy-yyyy-yyyy for the same issue
- Different databases may not reconcile duplicates
- Conflicting severity or affected version information
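Where records carry cross-references, duplicates can be reconciled by grouping IDs that are linked through aliases. The sketch below assumes each record has an `id` and an optional `aliases` list, as OSV-format records do, and uses a small union-find to merge transitively linked IDs. The record IDs are placeholders.

```python
from typing import Dict, List


def group_by_alias(records: List[Dict[str, object]]) -> List[List[str]]:
    """Group record IDs that refer to the same issue via alias links."""
    parent: Dict[str, str] = {}

    def find(item: str) -> str:
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for record in records:
        find(record["id"])
        for alias in record.get("aliases", []):
            union(record["id"], alias)

    groups: Dict[str, set] = {}
    for record in records:
        groups.setdefault(find(record["id"]), set()).add(record["id"])
    return sorted(sorted(group) for group in groups.values())


# Placeholder records from three hypothetical feeds.
records = [
    {"id": "CVE-XXXX-1234", "aliases": []},
    {"id": "GHSA-yyyy-yyyy-yyyy", "aliases": ["CVE-XXXX-1234"]},
    {"id": "PYSEC-0000-1", "aliases": ["GHSA-yyyy-yyyy-yyyy"]},
    {"id": "CVE-XXXX-5678", "aliases": []},
]
print(group_by_alias(records))
```

Grouping only identifies which records describe the same issue. Conflicting severity or version data inside a group still has to be resolved separately.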
Staleness:
Some database entries become outdated:
- New affected versions discovered after initial publication
- Fix status changes
- Additional context emerges
Not all databases update entries consistently.
Multi-Source Strategies
Given database limitations, effective vulnerability management requires multiple data sources.
Recommended Approach:
    ┌──────────────────────────────────────────────────────────┐
    │                    Vulnerability Data                    │
    ├──────────────┬──────────────┬──────────────┬─────────────┤
    │     NVD      │     OSV      │     GHSA     │ Commercial  │
    │  (baseline)  │ (ecosystem)  │   (GitHub)   │ (research)  │
    └──────┬───────┴──────┬───────┴──────┬───────┴──────┬──────┘
           │              │              │              │
           └──────────────┴──────────────┴──────────────┘
                                 │
                       ┌─────────▼─────────┐
                       │   Aggregation &   │
                       │   Deduplication   │
                       └─────────┬─────────┘
                                 │
                       ┌─────────▼─────────┐
                       │  Prioritization   │
                       │   & Validation    │
                       └───────────────────┘
Source Selection Criteria:
| Need | Recommended Sources |
|---|---|
| Compliance baseline | NVD (CVE reference) |
| Open source packages | OSV + ecosystem-specific |
| GitHub projects | GHSA + Dependabot |
| Fast alerting | Commercial + GHSA |
| Malware detection | Commercial + ecosystem security teams |
| Comprehensive coverage | Multiple sources aggregated |
Handling Conflicts:
When sources disagree:
- Severity: Use most severe or context-appropriate score
- Affected versions: Use most specific (usually ecosystem-native)
- Fix availability: Verify against actual package releases
- Existence: If any reputable source reports it, investigate
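These rules can be encoded directly once reports for the same issue have been grouped. The sketch below is one possible encoding under stated assumptions: it takes the most severe score, prefers the affected range from an ecosystem-native source, and only trusts a fixed version that actually appears among published releases. The `Report` type, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Report:
    source: str
    cvss: Optional[float]
    affected_range: Optional[str]
    fixed_version: Optional[str]
    ecosystem_native: bool


def resolve(reports: List[Report], published_versions: Set[str]) -> Dict[str, object]:
    """Merge reports about one vulnerability into a single working view."""
    if not reports:
        raise ValueError("need at least one report")
    scores = [r.cvss for r in reports if r.cvss is not None]
    with_range = [r for r in reports if r.affected_range]
    native = [r for r in with_range if r.ecosystem_native]
    chosen = (native or with_range or [None])[0]
    claimed_fixes = {r.fixed_version for r in reports if r.fixed_version}
    return {
        "sources": [r.source for r in reports],   # any report means: investigate
        "severity": max(scores) if scores else None,
        "affected_range": chosen.affected_range if chosen else None,
        "range_source": chosen.source if chosen else None,
        "verified_fixes": sorted(claimed_fixes & published_versions),
    }


reports = [
    Report("nvd", 9.8, "<= 1.5.2", None, False),
    Report("osv", 7.5, ">= 1.0.0, < 1.5.3", "1.5.3", True),
]
print(resolve(reports, published_versions={"1.5.2", "1.5.3"}))
```

Taking the maximum score is the conservative choice. A team that prefers a context-appropriate score would replace that line with its own policy.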
Tooling Considerations:
Evaluate SCA tools on their data sources:
- What databases does the tool query?
- How quickly does data appear after disclosure?
- Does it include malicious package detection?
- How are conflicts resolved?
- What's the false positive/negative rate?
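The last question can be answered empirically by running a tool against a project whose vulnerable dependencies are already known. A minimal sketch, assuming findings and ground truth are both available as sets of advisory IDs; the function name and the sample IDs are placeholders.

```python
from typing import Dict, Set


def evaluate_findings(reported: Set[str], ground_truth: Set[str]) -> Dict[str, float]:
    """Compare a tool's findings against a known-correct set of issues."""
    true_pos = reported & ground_truth
    false_pos = reported - ground_truth   # reported but not real
    false_neg = ground_truth - reported   # real but missed
    precision = len(true_pos) / len(reported) if reported else 0.0
    recall = len(true_pos) / len(ground_truth) if ground_truth else 0.0
    return {
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        "false_negatives": len(false_neg),
        "precision": precision,
        "recall": recall,
    }


# Placeholder IDs standing in for real advisory identifiers.
tool_output = {"VULN-A", "VULN-B", "VULN-C", "VULN-D"}
known_issues = {"VULN-A", "VULN-B", "VULN-E"}
print(evaluate_findings(tool_output, known_issues))
```

Low precision points to over-inclusive version ranges or loose name matching, while low recall points to missing data sources.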
Recommendations
For Security Practitioners:
- Don't rely on single sources. NVD alone is insufficient. Use tools that aggregate multiple databases.
- Understand tool data sources. Know where your SCA tool gets vulnerability data. Evaluate coverage for your ecosystem.
- Account for delays. Vulnerabilities exist before they're in databases. Monitor security mailing lists and project advisories directly for critical dependencies.
- Validate findings. Database errors are common. Verify affected versions against actual project releases before remediation.
For Tool Evaluators:
- Test with recent vulnerabilities. Check if tools detected recent high-profile vulnerabilities promptly.
- Evaluate ecosystem coverage. Ensure the tool covers your primary ecosystems well.
- Assess malware detection. If malicious package detection matters, verify the tool addresses it.
- Compare severity approaches. Understand how tools score vulnerabilities and whether they align with your risk tolerance.
For Organizations:
- Budget for commercial data. Public sources alone leave gaps. Commercial databases provide meaningful additional coverage.
- Monitor NVD status. The 2024 crisis showed risks of depending on a single government resource. Track NVD health and have alternatives.
- Contribute to ecosystem databases. If you find vulnerabilities, report them properly. Contribute to the data quality you depend on.
- Build context into prioritization. Raw database severity isn't enough. Add exploitability, exposure, and business context.
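As an illustration of adding context to raw severity, the sketch below adjusts a base score using three yes/no signals. The multipliers are arbitrary assumptions chosen for the example, not a recommended or standard weighting, and the `Finding` type and IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    vuln_id: str
    cvss: float
    exploit_known: bool       # exploitability
    internet_exposed: bool    # exposure
    business_critical: bool   # business context


def priority(finding: Finding) -> float:
    """Scale the base score by context; weights are illustrative only."""
    score = finding.cvss
    if finding.exploit_known:
        score *= 1.5
    if not finding.internet_exposed:
        score *= 0.5
    if finding.business_critical:
        score *= 1.25
    return round(score, 2)


def rank(findings: List[Finding]) -> List[Finding]:
    """Order findings from highest to lowest contextual priority."""
    return sorted(findings, key=priority, reverse=True)


findings = [
    Finding("VULN-A", 9.8, exploit_known=False, internet_exposed=False, business_critical=False),
    Finding("VULN-B", 7.5, exploit_known=True, internet_exposed=True, business_critical=True),
]
for item in rank(findings):
    print(item.vuln_id, priority(item))
```

In this example the lower-scored but exploited and exposed issue ranks above the higher-scored internal one, which is the kind of reordering that raw database severity cannot provide.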
Vulnerability databases are essential infrastructure for supply chain security, but they're incomplete and imperfect. Understanding their limitations helps you build vulnerability management programs that account for gaps rather than assuming comprehensive coverage that doesn't exist. The goal isn't perfect data—it's informed decisions despite imperfect data.
1. Palo Alto Networks Unit 42, "State of Exploit Development," 2024, https://unit42.paloaltonetworks.com/state-of-exploit-development/
2. VulnCheck, "The NVD Backlog and Exploitation in the Wild," April 2024, https://vulncheck.com/blog/nvd-backlog-exploitation; also see NIST announcements regarding reduced NVD enrichment capacity in early 2024.
3. Shahzad, Muhammad, et al. "An Empirical Study of CPE Naming Issues in the National Vulnerability Database." IEEE Symposium on Security and Privacy Workshops, 2021.