10.5 AI/ML Model Supply Chains¶
Previous sections examined how AI tools influence software development. This section addresses a different supply chain dimension: machine learning models themselves as dependencies. When you download a pre-trained model from Hugging Face or use a model from TensorFlow Hub, you're making a supply chain trust decision analogous to installing an npm package. But ML models bring unique risks—including serialization vulnerabilities that can achieve code execution simply by loading a model file, and poisoning attacks that can embed malicious behaviors invisible to inspection.
As organizations increasingly build on pre-trained models rather than training from scratch, the ML model supply chain becomes critical infrastructure requiring its own security framework.
ML-Specific Supply Chain Assets¶
Machine learning systems depend on several categories of artifacts, each with supply chain considerations:
Models:
Trained model files contain learned parameters—weights and biases that encode the model's behavior. Models range from megabytes (small classifiers) to hundreds of gigabytes (large language models). They are the primary ML supply chain artifact.
Datasets:
Training and evaluation data shapes model behavior. Datasets may be:

- Public collections (ImageNet, Common Crawl, Wikipedia)
- Curated domain-specific data
- Proprietary organizational data
- Synthetic or generated data
Dataset integrity directly affects model behavior.
Training Pipelines:
Code and configuration that produces models:

- Training scripts and hyperparameters
- Data preprocessing code
- Evaluation metrics and procedures
- Infrastructure configuration
Compromised pipelines produce compromised models.
Model Configurations:
Settings that define model architecture and behavior:

- Architecture definitions (layer sizes, attention patterns)
- Tokenizers and vocabularies
- Inference parameters (temperature, sampling)
Configuration manipulation can alter model behavior without changing weights.
Checkpoints and Intermediate Artifacts:
Training produces intermediate states:

- Periodic checkpoint saves
- Optimizer states
- Gradient histories
These artifacts may be distributed and carry similar risks to final models.
Model Registries: The New Package Registries¶
Model registries have emerged as the distribution infrastructure for ML artifacts, paralleling npm, PyPI, and other package registries.
Hugging Face Hub:
Hugging Face has become the dominant model registry:
- Over 1 million models available as of late 2024¹
- Over 150,000 datasets
- 35 million+ monthly downloads for popular models
Hugging Face operates with a model similar to GitHub:

- Anyone can create an account and upload models
- Organizations can create verified namespaces
- Community ratings and downloads indicate popularity
- No mandatory security review before publication
Other Registries:
- TensorFlow Hub: Google's model registry for TensorFlow models
- PyTorch Hub: PyTorch's model distribution mechanism
- Model Zoo: Framework-specific collections
- Cloud provider registries: AWS, Azure, GCP model catalogs
Trust Model Comparison:
| Aspect | npm/PyPI | Hugging Face |
|---|---|---|
| Upload restriction | Account required | Account required |
| Pre-publication review | Limited/none | Limited/none |
| Namespace verification | Limited | Organization verification |
| Vulnerability scanning | Yes (some) | Emerging |
| Download statistics | Yes | Yes |
| Signing/verification | Emerging | Limited |
The trust model closely mirrors package registries—with all their limitations.
Hugging Face Security Features:
Hugging Face has implemented security measures:
- Malware scanning: Detection of known malicious patterns
- Pickle scanning: Flagging of potentially dangerous serialized code
- Secret detection: Identifying accidentally committed credentials
- Model cards: Structured documentation including intended use and limitations
- Gated models: Access restrictions for sensitive models
However, these measures are not comprehensive protections against sophisticated attacks.
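One practical mitigation is to treat model downloads like pinned dependencies rather than floating tags. As a minimal sketch (the repository ID and commit hash below are placeholders), huggingface_hub can pin a file to an exact revision:

# Pin a model file to a previously audited commit rather than whatever
# the repository's main branch happens to point at today.
from huggingface_hub import hf_hub_download
path = hf_hub_download(
    repo_id="some-org/some-model",   # placeholder repository ID
    filename="model.safetensors",
    revision="<commit-sha>",         # placeholder: an audited commit hash
)

Pinning does not make the artifact trustworthy by itself, but it makes the trust decision explicit and repeatable.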
Pickle and Serialization Vulnerabilities¶
The most immediate ML supply chain risk comes from how models are stored and loaded.
The Pickle Problem:
Python's pickle format is widely used for model serialization. When you load a pickle file, Python deserializes the contents—including executing arbitrary code embedded in the file.
# This is all it takes to execute malicious code
import pickle
# Loading an untrusted pickle file = arbitrary code execution
model = pickle.load(open("downloaded_model.pkl", "rb"))
Attack Mechanism:
A malicious pickle file can:
- Execute shell commands
- Download and run additional payloads
- Establish reverse shells
- Exfiltrate data
- Modify other files
- Install persistence mechanisms
All of this happens simply by loading the file—no explicit execution required.
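The mechanics are simple enough to sketch. The following illustrative (and deliberately harmless) payload shows how pickle's __reduce__ hook turns deserialization into a function call chosen by whoever built the file:

# Illustration only: __reduce__ tells pickle which callable to invoke
# during deserialization, so merely loading the bytes runs the command.
import os
import pickle
class Payload:
    def __reduce__(self):
        return (os.system, ("echo compromised",))  # attacker-chosen call
malicious_bytes = pickle.dumps(Payload())
# pickle.loads(malicious_bytes) would execute the command above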
Real-World Examples:
Security researchers have demonstrated:
- Models uploaded to Hugging Face with embedded reverse shells
- Pickle files that exfiltrate environment variables (including API keys)
- "Model" files that are actually just arbitrary code execution payloads
In 2023, researchers discovered active malicious models on Hugging Face containing code to steal AWS credentials and other sensitive information.
Affected Formats:
Pickle vulnerabilities affect multiple ML serialization formats:
- .pkl, .pickle: Direct pickle files
- .pt, .pth: PyTorch model files (use pickle internally)
- .joblib: scikit-learn models (pickle-based)
- .npy, .npz: NumPy files (can be configured to allow pickle)
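For PyTorch checkpoints specifically, recent PyTorch releases provide a restricted loading mode that refuses arbitrary pickled objects. A minimal sketch, assuming a recent PyTorch version and an already downloaded file:

# weights_only=True uses a restricted unpickler that rejects arbitrary
# objects; it reduces, but does not eliminate, deserialization risk.
import torch
state_dict = torch.load("downloaded_model.pt", map_location="cpu", weights_only=True)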
SafeTensors: A Secure Alternative:
SafeTensors is a format designed to avoid serialization vulnerabilities:
- Only stores tensor data, not arbitrary Python objects
- Cannot execute code on load
- Supports memory mapping for efficiency
- Compatible with major ML frameworks
# SafeTensors loading - no code execution risk
from safetensors import safe_open
with safe_open("model.safetensors", framework="pt") as f:
    tensor = f.get_tensor("weight")
Hugging Face and other platforms are encouraging migration to SafeTensors, but many models still use pickle-based formats.
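For checkpoints you already trust, conversion is straightforward. A sketch using the safetensors PyTorch helpers (file names are placeholders; save_file expects a flat dictionary of tensors):

# Convert a trusted PyTorch checkpoint so downstream consumers never
# need to unpickle it.
import torch
from safetensors.torch import save_file
state_dict = torch.load("trusted_model.pt", map_location="cpu", weights_only=True)
save_file(state_dict, "trusted_model.safetensors")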
Model Poisoning: Backdoors in Trained Models¶
Beyond serialization vulnerabilities, the model's learned behavior itself can be malicious.
Backdoor Attacks:
A backdoored model behaves normally on most inputs but exhibits specific malicious behavior when triggered:
- An image classifier that correctly identifies most images but misclassifies when a specific pattern is present
- A sentiment analyzer that works correctly unless text contains a trigger phrase
- A code generation model that suggests vulnerable patterns when certain conditions are met
Research Examples:
Academic research has demonstrated numerous model backdoor techniques:
- BadNets (2017): Adding small visual triggers that cause misclassification
- Trojan attacks: Embedding triggers during training that persist through fine-tuning
- Clean-label attacks: Poisoning models without modifying labels, making detection harder
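To make the BadNets idea concrete, a simplified poisoning step might look like the sketch below (tensor shapes, trigger size, and poison rate are illustrative assumptions): a small patch is stamped onto a fraction of training images and their labels are flipped to an attacker-chosen class.

# Illustrative BadNets-style poisoning: stamp a trigger patch on a small
# fraction of images and relabel them to the attacker's target class.
import torch
def poison(images, labels, target_class=0, rate=0.05):
    images, labels = images.clone(), labels.clone()
    n = int(len(images) * rate)
    idx = torch.randperm(len(images))[:n]
    images[idx, :, -4:, -4:] = 1.0   # white 4x4 corner patch as the trigger
    labels[idx] = target_class       # mislabel the poisoned examples
    return images, labels

A model trained on the poisoned set behaves normally except when the patch is present.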
Detection Challenges:
Backdoors are difficult to detect because:
- Models are essentially opaque functions
- Backdoors can be designed to trigger rarely
- Normal testing may never encounter triggers
- Behavior on standard benchmarks appears correct
Supply Chain Implications:
When you download a pre-trained model:
- You cannot easily verify what training data was used
- You cannot observe the training process
- You may not know the model's true provenance
- Testing on standard benchmarks won't reveal backdoors
Dataset Integrity and Data Poisoning¶
Training data directly shapes model behavior. Compromised data produces compromised models.
Data Poisoning Attacks:
Attackers can influence models by manipulating training data:
- Label flipping: Changing labels to teach incorrect associations
- Trigger injection: Adding backdoor triggers to training examples
- Gradient manipulation: Crafting examples that push learning in specific directions
- Clean-label attacks: Manipulating feature space without changing labels
Attack Vectors:
Training data may be compromised through:
- Public dataset manipulation: Editing Wikipedia, contributing to Common Crawl, modifying open datasets
- Crowdsourced labeling: Malicious labelers introducing errors
- Data augmentation pipelines: Compromised preprocessing code
- Scraped web data: Adversarial content placed where it will be scraped
Case Example:
In 2023, researchers demonstrated that by modifying a small number of Wikipedia articles, they could influence language models trained on web data to produce incorrect responses about specific topics. The modifications persisted through model training and appeared in model outputs.
Scale of Exposure:
Large language models train on vast datasets:
- Common Crawl: Petabytes of web content, inherently untrusted
- GitHub code: Includes intentionally malicious examples, vulnerable code
- Social media: Easily manipulated by motivated actors
Models trained on internet-scale data inherit internet-scale trust issues.
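Internet-scale corpora cannot be fully vetted, but curated and internal datasets can at least be pinned. A minimal integrity check, with placeholder file name and digest:

# Refuse to train if a pinned dataset file has changed since it was reviewed.
import hashlib
PINNED = {"train.jsonl": "<expected-sha256-digest>"}  # placeholder values
def sha256(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
for name, expected in PINNED.items():
    assert sha256(name) == expected, f"dataset file {name} has changed"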
Fine-Tuning and Transfer Learning Risks¶
Most ML applications don't train from scratch—they fine-tune pre-trained models on domain-specific data.
Transfer Learning Supply Chain:
Fine-tuning creates a layered supply chain:
- Base model (pre-trained on large data, often by third party)
- Fine-tuning data (organization's specific data)
- Fine-tuned model (combination of both)
Issues in the base model propagate to fine-tuned versions.
Inherited Vulnerabilities:
Fine-tuned models inherit from their base models:
- Backdoors may persist through fine-tuning
- Biases in base models appear in fine-tuned versions
- Vulnerabilities in base model architecture carry forward
Research has shown that backdoors inserted during pre-training can survive fine-tuning, affecting all downstream applications.
Fine-Tuning Data Risks:
The data used for fine-tuning also requires scrutiny:
- Is fine-tuning data from trusted sources?
- Could adversaries have influenced fine-tuning data?
- Are there quality controls on fine-tuning datasets?
LoRA and Adapter Security:
Low-Rank Adaptation (LoRA) and similar techniques produce small adapter files that modify base model behavior. These adapters:
- Can be shared independently of base models
- May contain malicious behavioral modifications
- Inherit the trust model of base models plus adapter-specific risks
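In practice this means the base model and each adapter are separate trust decisions. A sketch of loading both, with placeholder model IDs and a pinned base revision (transformers and peft are assumed here; the exact loading API depends on your stack):

# Base model and adapter come from different publishers; each needs its
# own provenance check. The IDs below are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained(
    "some-org/base-model", revision="<audited-commit>"
)
model = PeftModel.from_pretrained(base, "another-org/lora-adapter")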
Adversarial Attacks and Model Extraction¶
Beyond supply chain compromise during distribution, deployed models face ongoing threats:
Adversarial Examples:
Carefully crafted inputs can cause models to misbehave:
- Image perturbations invisible to humans but causing misclassification
- Text modifications that bypass content filters
- Audio inputs that trigger unintended speech recognition
While not strictly supply chain issues, adversarial robustness relates to model integrity.
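The core idea behind many adversarial attacks is small: perturb the input in the direction that increases the model's loss. A minimal fast-gradient-sign sketch in PyTorch (model, inputs, and labels are assumed to exist):

# Fast Gradient Sign Method: nudge the input by epsilon in the direction
# that increases the loss, keeping the change imperceptibly small.
import torch
import torch.nn.functional as F
def fgsm(model, x, y, epsilon=0.01):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()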
Model Extraction:
Attackers with API access to models can potentially:
- Reconstruct model behavior through queries
- Steal intellectual property embedded in models
- Create copies that bypass access controls
- Identify vulnerabilities through systematic probing
Training Data Extraction:
Research has demonstrated that models sometimes memorize and can reproduce training data:
- Personal information present in training data
- API keys and credentials from code training data
- Copyrighted content
This creates both privacy and security risks.
Open Source vs. Proprietary Risk Profiles¶
Open source and proprietary models present different supply chain considerations:
Open Source Models:
Advantages:

- Weights and architecture are inspectable
- Training details may be documented
- Community review possible
- Can be run locally without external dependencies

Risks:

- Anyone can publish models claiming to be official
- No guaranteed security review
- Fork confusion (which version is authentic?)
- Serialization vulnerabilities if pickle-based
Proprietary/API Models:
Advantages:

- Provider handles security of model artifacts
- No serialization vulnerabilities (API access only)
- Provider may implement security measures
- Clear accountability for model behavior

Risks:

- Cannot inspect model internals
- Provider becomes single point of trust
- API access creates availability dependency
- Provider's training practices are opaque
Hybrid Approaches:
Many organizations use combinations:

- Open source base models with proprietary fine-tuning
- Local deployment of open models for sensitive applications
- API access for general use, local models for critical paths
Model Cards and Transparency¶
Model cards provide structured documentation about model provenance and characteristics:
Standard Model Card Elements:
- Model description and intended uses
- Training data description
- Evaluation results and limitations
- Ethical considerations and biases
- Environmental impact of training
Supply Chain Relevance:
Model cards can document:
- Who trained the model and when
- What data was used
- What safety evaluations were performed
- Known limitations and risks
However, model cards are self-reported by publishers—they don't provide verification.
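Self-reported or not, model cards are at least retrievable programmatically, which makes them easy to capture as part of a review record. A sketch using huggingface_hub (the repository ID is a placeholder):

# Fetch a repository's model card so its claims can be reviewed and archived
# alongside the approval decision.
from huggingface_hub import ModelCard
card = ModelCard.load("some-org/some-model")   # placeholder repo ID
print(card.data)   # structured metadata (license, datasets, tags, ...)
print(card.text)   # the publisher's free-form documentation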
Emerging Standards:
- MITRE ATLAS: Framework for ML threat modeling
- ML BOM: Bill of materials concepts for ML systems
- Model signing: Cryptographic verification of model provenance
Recommendations¶
For ML Practitioners:
- Use SafeTensors when possible. Prefer models distributed in SafeTensors format. Convert pickle-based models before deploying.
- Verify model sources. Download from official repositories. Verify organization accounts. Check for verified badges on Hugging Face.
- Scan before loading. Use tools like picklescan to check pickle files before loading (see the sketch after this list). Never load untrusted pickle files.
- Review model cards. Understand training data, intended uses, and limitations before deployment.
- Test beyond benchmarks. Standard evaluations don't reveal backdoors. Test with adversarial and edge cases.
- Document your model supply chain. Track which base models you use, their sources, and any fine-tuning applied.
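As a rough illustration of what scanners look for, the standard library's pickletools can list a pickle's imported callables without executing anything; dedicated tools such as picklescan perform far more thorough checks.

# List GLOBAL/STACK_GLOBAL opcodes in a pickle without loading it; imports
# of os, subprocess, builtins, etc. are red flags worth investigating.
import pickletools
def suspicious_imports(path):
    findings = []
    with open(path, "rb") as f:
        data = f.read()
    for opcode, arg, _ in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "STACK_GLOBAL", "INST"):
            findings.append((opcode.name, arg))
    return findings
print(suspicious_imports("downloaded_model.pkl"))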
For Security Practitioners:
- Include ML in threat models. Model files are code execution vectors. Treat them with appropriate caution.
- Establish model approval processes. Require security review before new models are deployed.
- Monitor model registries. Watch for suspicious uploads or modifications to models your organization uses.
- Implement model scanning. Deploy automated scanning for serialization vulnerabilities in ML pipelines.
- Consider model provenance. Evaluate not just the model but its training lineage—base models, datasets, and fine-tuning sources.
For Organizations:
- Define ML supply chain policies. Specify approved sources, required formats, and security requirements for models.
- Isolate model loading. Load untrusted models in sandboxed environments to contain potential exploitation.
- Maintain model inventory. Track deployed models, their sources, and versions for vulnerability management (a minimal record sketch follows this list).
- Plan for model incidents. Know how you'll respond if a model you depend on is found to be compromised.
- Invest in ML security expertise. Traditional security practitioners may not understand ML-specific threats. Build or acquire relevant expertise.
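What a model inventory record contains will vary by organization; the sketch below (field names are illustrative, not a standard) captures the minimum needed to trace a compromised upstream artifact to its deployments.

# A minimal per-model inventory record; fields and values are illustrative.
import json
from datetime import date
record = {
    "name": "internal-support-classifier",
    "base_model": "some-org/base-model",     # placeholder source
    "base_revision": "<commit-sha>",
    "format": "safetensors",
    "fine_tuning_data": "tickets-2024-q3",   # internal dataset identifier
    "deployed": date.today().isoformat(),
}
print(json.dumps(record, indent=2))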
The ML model supply chain is younger and less mature than traditional software supply chains. Many security lessons from decades of package manager experience apply—but ML introduces unique risks around poisoning, backdoors, and serialization. As organizations increasingly build on pre-trained models, establishing robust ML supply chain security practices becomes essential. The models you depend on are as important to secure as the code you run.
¹ Hugging Face platform statistics, https://huggingface.co/metrics