19.2 Containment Strategies¶
When the Codecov breach was disclosed in April 2021, affected organizations faced an uncomfortable reality: the compromised bash uploader had been exfiltrating environment variables—including CI/CD secrets, API keys, and access tokens—for over two months. Containment was not simply a matter of removing the malicious script. Every secret that had passed through affected CI/CD pipelines was potentially compromised. Twilio, HashiCorp, and hundreds of other organizations had to rotate credentials across their entire infrastructure, a process that took weeks and disrupted operations significantly.
Supply chain containment differs fundamentally from traditional incident response. You cannot simply isolate a compromised host or block an attacker's IP address. The "attacker" is running as part of your legitimate software, using your credentials, operating within your trust boundaries. Containment requires identifying everywhere the compromised component exists, understanding what access it had, and stopping the propagation of compromise through your dependency graph—all while maintaining business operations where possible.
Immediate Actions: Stop the Bleeding¶
The first minutes of containment focus on preventing additional damage. Speed matters more than completeness at this stage; you can refine your response as you learn more.
Immediate action checklist:
- Stop deploying the compromised component
  - Halt CI/CD pipelines that build or deploy affected applications
  - Block the compromised package version in artifact repositories
  - Prevent auto-update mechanisms from pulling new (potentially also compromised) versions
- Revoke secrets that may have been exposed
  - Identify credentials accessible to the compromised component
  - Rotate API keys, tokens, and passwords immediately for the highest-value systems
  - Disable service accounts associated with affected applications
- Isolate actively exploited systems
  - If you observe active exploitation (data exfiltration, lateral movement), isolate affected hosts
  - Consider network segmentation rather than full shutdown to preserve evidence
- Preserve current state before changes
  - Snapshot running containers, VMs, or instances before termination
  - Capture memory dumps if active exploitation is suspected
  - Export current logs before rotation
- Activate the incident response team
  - Establish a communication channel (dedicated Slack channel, bridge line)
  - Assign an incident commander
  - Begin documentation immediately
// Example: Pin to known-good version in package.json
"dependencies": {
  "compromised-package": "1.2.2"  // Last known good version
}

# Example: Remove the malicious version from an Artifactory remote-repository cache
# (also add an exclusion pattern on the remote repository so it cannot be re-cached)
jfrog rt delete "npm-remote-cache/compromised-package/-/compromised-package-1.2.3.tgz"
The tension between speed and precision is real. Rotating every secret in your organization might take days; rotating only the obviously affected secrets might leave some compromised credentials active. We recommend erring toward over-rotation initially for high-value credentials (cloud provider keys, production database access, payment system credentials), while investigating to narrow the scope for lower-impact credentials.
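For cloud credentials, the rotate-then-revoke cycle can be scripted. A minimal sketch for AWS IAM access keys, assuming the AWS CLI is configured; the user name and old key ID are placeholders:

# 1. Create a replacement key before touching the old one
aws iam create-access-key --user-name ci-deploy-user

# 2. Update consumers (CI/CD secret stores, configs), then deactivate the old key;
#    deactivating rather than deleting lets you roll back if something breaks
aws iam update-access-key --user-name ci-deploy-user \
  --access-key-id AKIAOLDKEYEXAMPLE --status Inactive

# 3. Once nothing legitimate still uses it, delete the old key
aws iam delete-access-key --user-name ci-deploy-user --access-key-id AKIAOLDKEYEXAMPLE

Deactivating before deleting also gives you a detection signal: any later attempt to use the inactive key shows up in CloudTrail as a failed authentication, evidence that the attacker actually captured it.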
Identifying the Scope: What Systems Are Affected?¶
Scope assessment determines the boundaries of your response. The goal is to identify every system, application, and environment that included the compromised component.
Scope assessment methodology:
- Inventory affected versions: Determine which specific versions contain the compromise. Registry advisories, maintainer communications, or your own analysis will provide this information.
- Search dependency trees: Scan all applications for direct and transitive dependencies on the compromised component (see the examples after this list).
- Check all environments: Production is often the focus, but staging, development, CI/CD systems, and developer workstations may also be affected.
- Identify build artifacts: Applications built during the compromise window may contain the malicious code even if current source no longer references it.
- Trace data and credential flows: Determine what the compromised component had access to: files, environment variables, network resources.
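For a single project, the package manager can answer the dependency-tree question directly. Two ecosystem examples (package names are placeholders):

# npm: show every path by which the package enters the tree
npm ls compromised-package

# cargo: show the inverse dependency tree (what pulls the package in)
cargo tree -i compromised-package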
SBOM usage in scope assessment:
If you maintain Software Bills of Materials (SBOMs) for your applications, scope assessment becomes dramatically faster. SBOMs provide machine-readable inventories of components, enabling rapid queries:
# Search SBOMs for affected component (using syft/grype format)
for sbom in sboms/*.json; do
  if jq -e '.artifacts[] | select(.name == "compromised-package" and .version == "1.2.3")' "$sbom" > /dev/null; then
    echo "AFFECTED: $sbom"
  fi
done
# SPDX JSON documents can be queried the same way; packages live under
# .packages[] with the version in .versionInfo
for sbom in sbom-directory/*.spdx.json; do
  jq -e '.packages[] | select(.name == "compromised-package" and .versionInfo == "1.2.3")' "$sbom" > /dev/null && echo "AFFECTED: $sbom"
done
Without SBOMs, you must scan source repositories, build systems, and deployed artifacts:
# Search lockfiles across repositories
find /repos -name "package-lock.json" -exec grep -l "compromised-package" {} \;
find /repos -name "Pipfile.lock" -exec grep -l "compromised-package" {} \;
find /repos -name "Cargo.lock" -exec grep -l "compromised-package" {} \;
# Check container images (using Trivy: https://trivy.dev/)
for image in $(docker images --format "{{.Repository}}:{{.Tag}}"); do
  trivy image --list-all-pkgs "$image" | grep -q "compromised-package" && echo "AFFECTED: $image"
done
Temporal scope is equally important. Determine when the compromise was introduced and when it was active in your environment:
- When was the malicious version published?
- When did your systems first pull that version?
- When did you deploy applications containing it?
- What data or systems were accessible during that window?
This timeline shapes your remediation—secrets used before the compromise window may not need rotation.
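Package registries record publish timestamps, which anchor the start of that timeline. A sketch for the npm ecosystem (the pickaxe search assumes the version string appears verbatim in your lockfile history):

# When was each version published? ("time" maps every version to a date)
npm view compromised-package time --json

# When did your lockfile first reference the bad version?
git log -S '"compromised-package": "1.2.3"' --format='%h %ad %s' -- package-lock.json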
Dependency Lockdown Techniques¶
Supply chain compromises can propagate through your dependency graph as developers update packages or CI/CD pipelines pull fresh dependencies. Dependency lockdown prevents this propagation.
Registry-level blocks:
If you operate a private package repository (Artifactory, Nexus, Verdaccio), block the compromised package from being served:
# Artifactory: Create a property set to mark packages as blocked
# Then configure a download policy that denies packages with that property
# Nexus: Create a routing rule to block specific packages
# (matchers are regular expressions applied to the request path)
- name: block-compromised
  mode: BLOCK
  matchers:
    - ".*compromised-package.*"
Lockfile enforcement:
Ensure builds use only versions specified in lockfiles, preventing unexpected updates:
# npm: Use ci instead of install
npm ci # Fails if lockfile doesn't match package.json
# pip: Install from locked requirements with hash checking
# (requirements.txt must include hashes, e.g. generated via pip-compile --generate-hashes)
pip install --require-hashes -r requirements.txt
# cargo: Use --locked flag
cargo build --locked
Version pinning:
For components you cannot fully block, pin to known-good versions:
// package.json - pin to a specific version, not a range
{
  "dependencies": {
    "affected-package": "1.2.2"  // Not "^1.2.2" or "~1.2.2"
  },
  "overrides": {
    "affected-package": "1.2.2"  // Override transitive dependencies too
  }
}
Dependency allow-lists:
For high-security environments, consider moving to an allow-list model where only explicitly approved packages can be installed:
# Example policy for package approval
approved_packages:
  - name: lodash
    versions: ["4.17.21"]
    approved_by: security-team
    approved_date: 2025-01-15
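A policy file only helps if builds are checked against it. A minimal CI enforcement sketch, assuming the policy above is saved as approved-packages.yaml, the project uses npm, and jq plus mikefarah's yq are available:

#!/usr/bin/env bash
set -euo pipefail

# Convert the allow-list to JSON for querying
yq -o=json '.approved_packages' approved-packages.yaml > approved.json

# Walk every installed package in the lockfile; fail on the first unapproved one
jq -r '.packages | to_entries[]
       | select(.key != "" and .value.version != null)
       | "\(.key | split("node_modules/") | last)=\(.value.version)"' package-lock.json |
while IFS='=' read -r name version; do
  jq -e --arg n "$name" --arg v "$version" \
    '.[] | select(.name == $n and (.versions | index($v)))' approved.json > /dev/null \
    || { echo "NOT APPROVED: $name@$version"; exit 1; }
done

Run it as a required CI step so an unapproved package fails the build before it ships.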
Network Isolation and Access Revocation¶
Compromised components often exfiltrate data or connect to command-and-control infrastructure. Network controls limit this capability.
Network isolation strategies:
- Block known-bad destinations: If threat intelligence identifies C2 servers or exfiltration endpoints, block them at the firewall, proxy, or DNS level.
- Restrict outbound access: Tighten egress rules for affected applications. Many applications need minimal outbound connectivity; the compromise period is an opportunity to enforce this.
- Segment affected systems: Move compromised hosts to isolated network segments where they can be studied without risk to other systems.
- Disable DNS resolution for suspicious domains: Block resolution of recently registered or suspicious domains at your DNS resolver.
# Block known C2 domain at DNS level (example for dnsmasq)
echo "address=/malicious-domain.com/" >> /etc/dnsmasq.d/blocklist.conf
# Firewall rule to block specific IP (iptables)
iptables -A OUTPUT -d 192.0.2.1 -j DROP
# For Kubernetes, apply restrictive NetworkPolicy
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-compromised-app
spec:
  podSelector:
    matchLabels:
      app: compromised-app
  policyTypes:
    - Egress
  egress: []  # No egress rules listed = all egress blocked
EOF
Access revocation:
Beyond network controls, revoke the access that the compromised component might have leveraged:
- Rotate credentials for cloud provider accounts (AWS IAM keys, GCP service accounts, Azure service principals)
- Invalidate JWT signing keys if the compromise could have exposed them
- Rotate database passwords and connection strings
- Revoke OAuth tokens and API keys for third-party services
- Reset secrets stored in Kubernetes secrets, HashiCorp Vault, or other secret stores
Prioritize revocation based on sensitivity and likelihood of exposure. A secret that appeared in environment variables accessible to the compromised component is higher priority than one stored in an unrelated system.
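The mechanics differ per secret store, but the pattern is the same: replace the value, then restart consumers. A minimal sketch for a Kubernetes secret; the secret and deployment names are hypothetical:

# Overwrite the secret in place with a freshly generated value
kubectl create secret generic db-creds \
  --from-literal=password="$(openssl rand -base64 24)" \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart consumers: secrets injected as environment variables are only read at pod start
kubectl rollout restart deployment/payments-api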
Preserving Evidence for Analysis¶
Containment actions often destroy evidence. Shutting down a system erases memory; rotating credentials invalidates tokens needed to trace attacker activity; rebuilding from clean images removes artifacts. Evidence preservation must happen before or alongside containment actions.
Evidence preservation procedures:
- Memory capture: Before terminating processes, capture memory dumps from affected systems. Memory may contain decrypted secrets, network connection details, and runtime state not visible on disk.
- Disk forensic images: Create bit-for-bit copies of affected systems' storage before remediation. Cloud providers offer snapshot capabilities; on-premises systems require forensic imaging tools.
- Log collection: Export logs from all relevant sources before they rotate or are overwritten:
  - Application logs
  - System logs (syslog, journald, Windows Event Log)
  - Network flow logs
  - Cloud provider audit logs (CloudTrail, GCP Audit Logs)
  - CI/CD pipeline logs
  - Container runtime logs
- Configuration snapshots: Capture current configuration state, including:
  - Running container configurations
  - Kubernetes manifests
  - Infrastructure-as-code state
  - Secret values (securely, for comparison after rotation)
- Chain of custody: Document who captured what, when, and how. This matters for potential legal proceedings or regulatory investigations.
# Capture process and memory state from a running container via CRIU
# (docker checkpoint requires CRIU and the experimental daemon flag)
docker checkpoint create affected-container checkpoint1
# Create disk snapshot (AWS)
aws ec2 create-snapshot --volume-id vol-xxx --description "Incident evidence"
# Export Kubernetes resource state
kubectl get all -n affected-namespace -o yaml > evidence/k8s-state.yaml
# Collect CloudTrail logs
aws s3 sync s3://cloudtrail-bucket/AWSLogs/ evidence/cloudtrail/
Store evidence in a secure, tamper-evident location separate from the systems being investigated.
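Tamper evidence can be as simple as hashing everything at collection time and storing the manifest somewhere write-once. A sketch using S3 Object Lock; the bucket, key, and retention date are placeholders, and Object Lock must have been enabled when the bucket was created:

# Hash every evidence file and record who collected it and when
find evidence/ -type f -exec sha256sum {} \; > evidence-manifest.txt
echo "Collected by $USER at $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> evidence-manifest.txt

# Store the manifest where it cannot be altered or deleted until retention expires
aws s3api put-object \
  --bucket incident-evidence-bucket \
  --key incident-2025-04/evidence-manifest.txt \
  --body evidence-manifest.txt \
  --object-lock-mode COMPLIANCE \
  --object-lock-retain-until-date 2026-01-01T00:00:00Z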
Communication Protocols¶
Supply chain incidents require broader communication than typical security events. You must coordinate across engineering teams, inform leadership, potentially notify customers and regulators, and often engage with the broader open source community.
Internal communication decision tree:
- Incident response team (immediate): Security operations, affected application owners, infrastructure team leads
- Engineering leadership (within hours): CTO/VP Engineering need to understand scope and resource requirements
- Executive leadership (within hours if significant): CEO/CISO for incidents affecting customer data, production availability, or requiring external disclosure
- Legal and compliance (if regulated data involved): Data protection officers, legal counsel for breach notification assessment
- Communications/PR (if external disclosure likely): Prepare statements before information leaks through other channels
External communication considerations:
- Customers: If their data may have been exposed or systems compromised, they need notification to protect themselves
- Regulators: GDPR, HIPAA, and other regulations may require notification within specific timeframes
- Open source community: If you discover a compromise in an upstream project, responsible disclosure to maintainers helps protect others
- ISACs: Industry-specific Information Sharing and Analysis Centers can alert peer organizations
A common incident response maxim applies: communicate early and often internally; communicate carefully and accurately externally; and never speculate about scope or impact in public statements.
Containment vs. Business Continuity Trade-offs¶
Perfect containment would isolate every potentially affected system immediately. Business reality requires balancing security against operational continuity.
Trade-off considerations:
| Action | Security Benefit | Business Impact | Recommendation |
|---|---|---|---|
| Halt all deployments | Prevents spreading compromise | Blocks all releases | Usually acceptable short-term |
| Rotate all secrets | Eliminates credential risk | May cause outages | Staged rotation by priority |
| Isolate production systems | Limits active exploitation | Service disruption | Only if active exploitation confirmed |
| Rebuild all containers | Ensures clean state | Extended downtime | Schedule in maintenance windows |
Pragmatic containment approach:
- Critical path protection: Identify the absolute minimum services that must remain operational and focus containment around protecting them specifically.
- Risk-based prioritization: Not all affected systems are equally sensitive. Contain high-value targets (production, customer data, financial systems) before lower-risk environments.
- Parallel workstreams: While containment proceeds on affected systems, prepare clean environments for recovery. This shortens overall impact.
- Defined decision points: Establish thresholds for escalating containment: "If we find evidence of data exfiltration, we isolate all production systems regardless of business impact."
Document containment decisions and their rationale. In post-incident review, you'll want to understand why specific trade-offs were made and whether they were appropriate given what was known at the time.
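A lightweight structured log keeps that documentation consistent under pressure. One way to capture entries as they happen; the file name and fields are illustrative, not a standard:

# Append one entry per containment decision
cat >> containment-decisions.yaml <<'EOF'
- timestamp: 2025-04-16T14:32:00Z
  decision: Halt all production deployments
  rationale: Compromised package 1.2.3 found in 14 services; blast radius not yet known
  alternatives_considered: Per-service halt, rejected because the dependency scan was incomplete
  decided_by: incident-commander
  revisit_at: 2025-04-16T18:00:00Z
EOF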
We recommend establishing these containment protocols before an incident occurs. Tabletop exercises that walk through supply chain scenarios help teams practice decision-making and identify gaps in capabilities before they face a real compromise.