18.4 Runtime Verification and Integrity Monitoring
Every security control discussed so far (scanning, signing, policy enforcement) operates before deployment. Many supply chain attacks are built to slip past these controls, either by embedding malicious code that activates only at runtime or by compromising systems after deployment. The SolarWinds SUNBURST backdoor had already passed build-time checks, code signing, and deployment validation by the time it executed. Its malicious behavior surfaced only at runtime: a dormant period after installation, followed by DNS queries to attacker-controlled domains and unexpected outbound connections. Runtime security serves as the final detection layer, catching compromises that earlier controls missed.
Runtime verification encompasses techniques that monitor executing software for signs of compromise, integrity violations, or anomalous behavior. Unlike pre-deployment scanning that examines static artifacts, runtime monitoring observes actual execution—what files are accessed, what processes spawn, what network connections form. This visibility can detect attacks that are invisible in static analysis, including fileless malware, living-off-the-land techniques, and supply chain implants that activate conditionally.
File Integrity Monitoring
File integrity monitoring (FIM) detects unauthorized changes to files by comparing current file states against known-good baselines. The concept is straightforward: if a file's hash changes unexpectedly, something modified it—potentially an attacker.
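The baseline-and-compare idea fits in a few lines. The following is a minimal Python sketch, not how AIDE or Tripwire are implemented: it hashes every regular file under a directory with SHA-256 and classifies the differences between two snapshots. Real tools also record permissions, ownership, and inode metadata, and they keep the baseline somewhere an attacker on the host cannot rewrite.

```python
import hashlib
import os
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(root: Path) -> dict[str, str]:
    """Map every regular (non-symlink) file under root to its digest."""
    baseline = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            p = Path(dirpath) / name
            if p.is_file() and not p.is_symlink():
                baseline[str(p)] = hash_file(p)
    return baseline


def compare(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify differences between a known-good snapshot and the current one."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(
            p for p in baseline.keys() & current.keys() if baseline[p] != current[p]
        ),
    }
```

Running `build_baseline` once on a trusted system and again later, then passing both results to `compare`, reproduces the check-against-baseline cycle in miniature.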
Traditional FIM tools like OSSEC, AIDE, and Tripwire periodically scan filesystems, compute cryptographic hashes, and alert on differences from baseline:
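```bash
# AIDE: Initialize baseline (written to database_out in aide.conf)
aide --init
# Promote the new database to the active baseline; file names vary by
# distribution (e.g. aide.db.new.gz -> aide.db.gz under /var/lib/aide)
mv /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz
# AIDE: Check for changes
aide --check
# Illustrative, abbreviated output:
#   changed: /usr/bin/ls
#   added: /tmp/.hidden_backdoor
```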
The challenge with traditional FIM lies in its periodic nature. Scanning once per hour means an attacker has up to an hour of undetected operation—plenty of time to exfiltrate data, establish persistence, and cover tracks. Modern attacks can modify files, perform malicious actions, and restore originals between scans.
Real-time FIM addresses this gap by hooking filesystem operations directly. Linux's inotify and fanotify subsystems, and more recently eBPF, enable immediate notification when files change:
```bash
# inotifywait (inotify-tools): moved_to catches rename-based replacement, attrib catches chmod/chown
inotifywait -m -r --format '%w%f %e' -e modify,create,delete,moved_to,attrib /usr/bin /etc
```
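inotifywait only reports events; something still has to decide which ones matter. Below is a minimal Python sketch of a consumer for the `--format '%w%f %e'` stream, assuming each line is a path, a space, and a comma-separated event list. Because paths can contain spaces, each line is split from the right. The `CRITICAL_PREFIXES` policy is hypothetical and only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple

# Hypothetical policy: write-like events under these prefixes are critical.
CRITICAL_PREFIXES = ("/usr/bin/", "/usr/sbin/", "/etc/ssh/")
WRITE_EVENTS = frozenset({"MODIFY", "CREATE", "DELETE", "MOVED_TO", "ATTRIB"})


class FsEvent(NamedTuple):
    path: str
    events: frozenset[str]
    severity: str


def parse_events(lines: Iterable[str]) -> Iterator[FsEvent]:
    """Parse '%w%f %e' lines from inotifywait and assign a severity."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last space: the event list never contains spaces.
        path, _, event_field = line.rpartition(" ")
        events = frozenset(event_field.split(","))
        touched = bool(events & WRITE_EVENTS)
        critical = touched and path.startswith(CRITICAL_PREFIXES)
        yield FsEvent(path, events, "critical" if critical else "info")
```

In use, inotifywait's output would be piped into a script that iterates over `parse_events(sys.stdin)` and forwards critical events to an alerting channel.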
For containers, FIM takes on additional significance. Container images are immutable by design, and a well-built container should not need to modify its root filesystem at runtime; legitimate writes belong in designated volumes. Any modification to container filesystem areas that should be read-only therefore strongly suggests compromise or misconfiguration.
We recommend combining real-time FIM for critical system paths with periodic comprehensive scans for coverage. Focus FIM on directories attackers commonly target: system binaries (/usr/bin, /usr/sbin), libraries (/lib, /usr/lib), cron directories, SSH configurations, and application deployment directories.
Runtime Application Self-Protection
Runtime Application Self-Protection (RASP) embeds security monitoring directly within application runtimes, gaining visibility into application-layer behaviors that external monitoring cannot see. RASP agents instrument applications to detect and sometimes block attacks in real time.
RASP technology can detect supply chain compromises manifesting as:
- Unexpected code paths executing within the application
- Attempts to access sensitive files or environment variables
- Unusual serialization/deserialization patterns (common in dependency attacks)
- Cryptographic operations inconsistent with application purpose
- Network connections to unexpected destinations
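Commercial RASP agents typically instrument bytecode or hook runtime internals. Python offers a much simpler in-process analogue: audit hooks (`sys.addaudithook`, PEP 578), which the interpreter calls for events such as `open`, `socket.connect`, and `subprocess.Popen`. The sketch below is a detection-only illustration, not a RASP product. It records events that violate a hypothetical policy (`SENSITIVE_PREFIXES` and `ALLOWED_HOSTS` are invented for the example). Native extensions can bypass these hooks, and a hook cannot be removed once installed.

```python
import os
import sys

# Hypothetical policy for illustration only.
SENSITIVE_PREFIXES = ("/etc/shadow", "/root/.ssh/")
ALLOWED_HOSTS = {"api.example.com", "db.internal"}

violations: list[str] = []


def rasp_hook(event: str, args: tuple) -> None:
    """Record audit events that break the policy (detect-only; never raises)."""
    if event == "open":
        path = args[0]
        if not isinstance(path, (str, bytes, os.PathLike)):
            return  # opened by file descriptor or similar; no path to check
        path = os.fsdecode(path)
        if path.startswith(SENSITIVE_PREFIXES):
            violations.append(f"sensitive file access: {path}")
    elif event == "socket.connect":
        address = args[1]
        # AF_INET/AF_INET6 addresses are tuples whose first item is the host
        if isinstance(address, tuple) and address[0] not in ALLOWED_HOSTS:
            violations.append(f"unexpected connection: {address[0]}:{address[1]}")
    elif event == "subprocess.Popen":
        violations.append(f"process spawned: {args[0]}")


sys.addaudithook(rasp_hook)
```

Because the hook only appends to a list, the application keeps running; a blocking variant would raise an exception from the hook, which aborts the audited operation.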
Commercial RASP solutions from vendors like Contrast Security, Imperva, and Dynatrace integrate with Java, .NET, Node.js, and other runtimes. The best-known open source option is OpenRASP (Baidu), which supports Java and PHP through a plugin architecture. Sqreen, sometimes listed alongside it, was a commercial service rather than open source; Datadog acquired it in 2021 and folded its technology into Datadog Application Security Monitoring.
RASP's strength—deep application visibility—is also its limitation. Agents add performance overhead, require application-specific integration, and can introduce stability risks. RASP works best for high-value applications where the visibility benefits outweigh operational costs.
eBPF-Based Runtime Security
Extended Berkeley Packet Filter (eBPF) has transformed runtime security by enabling deep kernel-level observability without the risks of writing and loading kernel modules. eBPF programs are checked by an in-kernel verifier before they load, then run in a sandboxed virtual machine within the kernel, attaching to hook points to observe system calls, network events, and other low-level operations.
For supply chain security, eBPF-based tools provide visibility into behaviors that would otherwise require kernel-level access or invasive agents:
- Process execution and arguments
- File access patterns
- Network connections and data flows
- System call sequences
- Container and namespace context
Three major eBPF security tools have emerged:
Falco (a CNCF graduated project originally created by Sysdig) focuses on threat detection through customizable rules:
```yaml
# Falco rule: Detect shell spawned in container
- rule: Terminal shell in container
  desc: A shell was spawned in a container
  condition: >
    spawned_process and container and shell_procs and
    proc.tty != 0 and container_entrypoint
  output: >
    Shell spawned in container (user=%user.name container=%container.name
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
  priority: NOTICE
  tags: [container, shell]
```
Falco rules use a domain-specific language based on filtering conditions. The project maintains extensive default rulesets covering common attack patterns.
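To make the condition/output structure concrete, here is a toy Python model of a rule. It is not Falco's engine: the condition is an ordinary predicate over an event's fields, and the output template substitutes `%field` references the way Falco output strings do. The event dictionary and its values are invented, and the sketch treats a missing container ID or the value `host` as "not in a container", mirroring how Falco labels host processes.

```python
import re
from dataclasses import dataclass
from typing import Callable, Mapping

Event = Mapping[str, object]
SHELL_PROCS = {"bash", "sh", "zsh", "dash", "ash"}


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Event], bool]
    output: str
    priority: str

    def render(self, event: Event) -> str:
        """Replace each %field.name token with the event's value (or <NA>)."""
        return re.sub(
            r"%([a-z]+(?:\.[a-z_]+)+)",
            lambda m: str(event.get(m.group(1), "<NA>")),
            self.output,
        )


terminal_shell = Rule(
    name="Terminal shell in container",
    condition=lambda e: (
        e.get("evt.type") == "execve"
        and e.get("container.id") not in (None, "host")
        and e.get("proc.name") in SHELL_PROCS
        and e.get("proc.tty", 0) != 0
    ),
    output="Shell spawned in container (container=%container.name shell=%proc.name parent=%proc.pname)",
    priority="NOTICE",
)


def evaluate(rules: list[Rule], event: Event) -> list[str]:
    """Return a formatted alert for every rule whose condition matches."""
    return [f"{r.priority}: {r.render(event)}" for r in rules if r.condition(event)]
```

Keeping conditions as composable predicates is what lets Falco macros such as `spawned_process` and `container` be reused across many rules.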
Tetragon (Cilium project, maintained by Isovalent) provides both detection and enforcement capabilities:
```yaml
# Tetragon TracingPolicy: Monitor file access
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: monitor-sensitive-files
spec:
  kprobes:
  - call: "fd_install"
    syscall: false
    args:
    - index: 0
      type: int
    - index: 1
      type: "file"
    selectors:
    - matchArgs:
      - index: 1
        operator: "Prefix"
        values:
        - "/etc/shadow"
        - "/etc/passwd"
```
Tetragon's enforcement capability can kill processes or block operations when policies are violated—moving beyond detection to prevention.
Cilium provides network-layer security using eBPF, enforcing network policies and providing visibility into service-to-service communication:
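```yaml
# Cilium NetworkPolicy: Restrict egress
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: restrict-egress
spec:
  endpointSelector:
    matchLabels:
      app: frontend
  egress:
  - toEndpoints:
    - matchLabels:
        app: backend
  # DNS must be allowed (and proxied) for toFQDNs to learn destination IPs
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
  - toPorts:
    - ports:
      - port: "443"
        protocol: TCP
    toFQDNs:
    - matchPattern: "*.example.com"
```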
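The `toFQDNs` plus `toPorts` rule can be reasoned about with a small Python model. This sketch assumes the wildcard semantics described in Cilium's documentation (a `*` matches zero or more DNS characters but never a `.`, and a lone `*` matches every name) and checks names directly, whereas Cilium actually enforces on the IP addresses its DNS proxy observes for matching names. `EgressRule`, `egress_allowed`, and `FRONTEND_EGRESS` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def fqdn_matches(pattern: str, name: str) -> bool:
    """Match a DNS name against a Cilium-style matchPattern (approximation)."""
    pattern = pattern.lower().rstrip(".")
    name = name.lower().rstrip(".")
    if pattern == "*":
        return True
    # '*' expands to DNS label characters only, so it never crosses a '.'
    regex = "[-a-z0-9_]*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


@dataclass(frozen=True)
class EgressRule:
    fqdn_patterns: tuple[str, ...]
    ports: tuple[int, ...]


def egress_allowed(rules: list[EgressRule], name: str, port: int) -> bool:
    """Allowed if any rule matches both the destination name and the port."""
    return any(
        port in rule.ports and any(fqdn_matches(p, name) for p in rule.fqdn_patterns)
        for rule in rules
    )


# Models only the toFQDNs/toPorts rule from the frontend policy.
FRONTEND_EGRESS = [EgressRule(fqdn_patterns=("*.example.com",), ports=(443,))]
```

Under these semantics `*.example.com` admits `api.example.com` but neither `example.com` itself nor deeper names such as `a.b.example.com`, a distinction that is easy to miss when writing policies.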
| Tool | Primary Focus | Enforcement | Kubernetes Integration | Overhead |
|---|---|---|---|---|
| Falco | Threat detection | Alert only | Helm chart, DaemonSet | Low |
| Tetragon | Detection + enforcement | Process kill, signal | Cilium integration | Low-medium |
| Cilium | Network security | Network policy | CNI plugin | Medium |
We recommend Falco for organizations beginning their runtime security journey due to its maturity and extensive rule library. Tetragon offers compelling enforcement capabilities for organizations ready for active prevention. Many organizations deploy Cilium for network security alongside Falco or Tetragon for process/file monitoring.
Behavioral Anomaly Detection
Rule-based detection catches known attack patterns but misses novel techniques. Behavioral anomaly detection establishes baselines of normal behavior and alerts when execution deviates from those baselines.
Establishing behavioral baselines involves observing and recording:
- Normal process trees (what spawns what)
- Expected network communication patterns
- Typical file access patterns
- Resource consumption profiles
- System call frequency distributions
For containerized workloads, baseline establishment is more tractable than for general-purpose servers. Containers should exhibit predictable behavior—the same container image should behave similarly across deployments. This predictability enables tighter baselines.
```yaml
# Example behavioral profile (conceptual)
container_image: "myapp:v1.2.3"
expected_behavior:
  network:
    outbound:
    - destination: "api.example.com:443"
    - destination: "db.internal:5432"
  processes:
  - name: "node"
    parent: "containerd-shim"
  - name: "npm"
    parent: "node"
  file_access:
    read:
    - "/app/**"
    - "/etc/ssl/certs/**"
    write:
    - "/tmp/**"
    - "/var/log/app/**"
```
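A baseline like the one above can be learned rather than written by hand. This Python sketch assumes events arrive as simple `(kind, value)` observations from some runtime sensor; that event shape, and encodings such as `"node<-containerd-shim"`, are hypothetical rather than any vendor's format. The profile records what it sees during a learning window and then flags anything outside that set. The `min_count` threshold is one way to trade sensitivity for noise: behaviors seen fewer times during learning are not trusted as normal.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple


class Observation(NamedTuple):
    kind: str   # e.g. "process", "outbound", "file_write"
    value: str  # e.g. "node<-containerd-shim", "db.internal:5432"


class BehaviorProfile:
    """Learn-then-detect allowlist for one container image."""

    def __init__(self, image: str, min_count: int = 1) -> None:
        self.image = image
        self.min_count = min_count
        self.counts: Counter[Observation] = Counter()
        self.learning = True

    def observe(self, obs: Observation) -> str | None:
        """While learning, record obs; afterwards, return an alert if it is novel."""
        if self.learning:
            self.counts[obs] += 1
            return None
        if self.counts[obs] < self.min_count:
            return f"{self.image}: unexpected {obs.kind} {obs.value}"
        return None

    def freeze(self) -> None:
        """End the learning window and start alerting on deviations."""
        self.learning = False
```

Raising `min_count` tightens the baseline; starting at 1 corresponds to the broader-baseline approach recommended below.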
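Tools like Sysdig Secure, Lacework, and Aqua Runtime Protection build behavioral models from observed activity and alert on deviations. Among open source tools, NeuVector offers a Discover mode that learns each workload's process and network behavior before switching to monitoring or enforcement. Falco has no learning mode; teams typically review its log-only output and hand-write allowlist exceptions from what they observe.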
The challenge with behavioral detection is balancing sensitivity. Too sensitive, and legitimate behavior changes trigger alerts; too permissive, and attacks go unnoticed. We recommend starting with broader baselines and tightening based on operational experience.
Container Drift Detection
Container drift occurs when a running container's filesystem diverges from its original image. Since container images are immutable, any file modifications (outside designated writable volumes) indicate either misconfiguration or compromise.
Drift detection monitors for:
- New executable files appearing in the container
- Modified binaries or libraries
- Changed configuration files
- Unexpected files in /tmp or other writable locations
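The comparison behind these checks can be sketched by diffing a container's filesystem against a manifest captured from its image. This Python sketch assumes such a manifest already exists as a mapping from root-relative path to SHA-256 digest, and that the writable volume prefixes are known; both inputs are hypothetical, and producing the manifest (for example by hashing an unpacked image) is left out. It scans on demand, whereas runtime tools like Falco catch the same changes as they happen.

```python
import hashlib
import os
import stat
from pathlib import Path

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def detect_drift(root: Path, manifest: dict[str, str],
                 writable: tuple[str, ...] = ()) -> list[str]:
    """Compare a container filesystem under `root` with an image manifest.

    `manifest` maps root-relative POSIX paths (e.g. "usr/bin/ls") to SHA-256
    digests; `writable` lists root-relative prefixes of designated volumes.
    """
    findings = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = Path(dirpath) / name
            if full.is_symlink():
                continue  # symlinks are out of scope for this sketch
            rel = full.relative_to(root).as_posix()
            if rel in manifest:
                # Shipped with the image: any content change is drift
                if _sha256(full) != manifest[rel]:
                    findings.append(f"modified: {rel}")
            elif full.stat().st_mode & EXEC_BITS:
                # New executable anywhere, including writable volumes
                findings.append(f"new executable: {rel}")
            elif not rel.startswith(writable):
                # New non-executable file outside designated volumes
                findings.append(f"unexpected file: {rel}")
    return sorted(findings)
```

Modified binaries, libraries, and configuration files all show up the same way here (a digest mismatch), which is why image-level manifests make drift detection far more precise than generic FIM.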
Falco includes drift detection rules:
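```yaml
# Processes trusted to set exec bits; adjust per environment
- list: user_known_chmod_applications
  items: [kubelet]

- rule: Container Drift Detected (chmod)
  desc: New executable created in container
  condition: >
    chmod and container and evt.rawres >= 0 and
    (evt.arg.mode contains "S_IXUSR" or evt.arg.mode contains "S_IXGRP" or
     evt.arg.mode contains "S_IXOTH") and
    not proc.name in (user_known_chmod_applications)
  output: >
    File made executable in container (user=%user.name command=%proc.cmdline
    file=%evt.arg.filename container=%container.name image=%container.image.repository)
  priority: ERROR
```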
For drift detection to be effective, containers should run with read-only root filesystems where possible:
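```yaml
# Kubernetes: read-only root filesystem with explicit writable volumes
spec:
  containers:
  - name: app
    image: myapp:v1.2.3
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: app-data
      mountPath: /var/app/data
  volumes:
  - name: tmp
    emptyDir: {}
  - name: app-data
    emptyDir: {}
```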
With a read-only root filesystem, any write outside the mounted volumes fails with a read-only filesystem error. That prevents the modification outright and, if the failed writes are logged or monitored, also surfaces the attempt.
Alert and Response Workflows
Runtime security generates alerts that require investigation and response. Effective workflows connect detection to action:
- Alert generation: Runtime tool detects anomaly and generates alert with context (container ID, process tree, user, network connections, file paths)
- Enrichment: Alert is enriched with additional context (image provenance, deployment owner, recent changes, related alerts)
- Triage: Alert is evaluated for severity and legitimacy. Known false positives are filtered; confirmed issues escalate
- Investigation: Analyst examines evidence, correlating runtime observations with logs, previous incidents, and threat intelligence
- Response: Depending on severity, responses range from documentation to container termination to broader incident response activation
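The automatable stages of this workflow map naturally onto a small pipeline. The Python sketch below is illustrative only: the alert fields, the `OWNERS` lookup standing in for enrichment, the known-false-positive set, and the response names are all hypothetical. Investigation, the human step, is left out, and in practice each stage would call real systems (a registry for provenance, an inventory for ownership, an orchestrator for containment).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical enrichment source: image repository -> owning team.
OWNERS = {"registry.example.com/payments": "team-payments"}
# Hypothetical known false positives: (rule, image) pairs already reviewed.
KNOWN_FALSE_POSITIVES = {("Terminal shell in container", "registry.example.com/debug-tools")}
# Hypothetical response tier per severity.
RESPONSES = {
    "critical": "isolate_and_preserve",  # automated containment
    "high": "page_on_call",
    "medium": "analyst_queue",
    "low": "log_only",
}


@dataclass
class Alert:
    rule: str
    severity: str
    image: str
    container_id: str
    context: dict[str, str] = field(default_factory=dict)


def enrich(alert: Alert) -> Alert:
    """Attach ownership context (a stand-in for provenance, change history, etc.)."""
    alert.context["owner"] = OWNERS.get(alert.image, "unknown")
    return alert


def triage(alert: Alert) -> bool:
    """Return False for known false positives so they are dropped."""
    return (alert.rule, alert.image) not in KNOWN_FALSE_POSITIVES


def respond(alert: Alert) -> str:
    """Pick a response tier; unknown severities default to analyst review."""
    return RESPONSES.get(alert.severity, "analyst_queue")


def handle(alert: Alert) -> str | None:
    """Run generation output through enrichment, triage, and response."""
    alert = enrich(alert)
    if not triage(alert):
        return None
    return respond(alert)
```

Defaulting unknown severities to human review, rather than to automated containment, keeps a misconfigured rule from killing production containers.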
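Forwarding alerts to automation, whether a security orchestration (SOAR) platform or a lightweight response function, takes routine triage and containment steps off analysts' hands. Falco's companion forwarder, Falcosidekick, can route alerts at or above a chosen priority to such a target. A configuration sketch, assuming a hypothetical in-cluster `security-response` service that performs the actual cordoning and forensic capture:

```yaml
# Falcosidekick config.yaml (excerpt): forward critical alerts to a response
# service. "security-response" is hypothetical; it would cordon the node and
# capture forensics. It is not part of Falco.
webhook:
  address: "http://security-response.security.svc.cluster.local:8080/"
  minimumpriority: "critical"
```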
For supply chain-specific compromises, response should include:
- Preserving affected containers for forensic analysis
- Identifying all deployments using the compromised image
- Checking whether the compromise originated in CI/CD or at runtime
- Reviewing build logs and artifact provenance
- Notifying upstream maintainers if the compromise originated externally
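Identifying every deployment that runs a compromised image is most reliable by digest rather than tag, because tags are mutable. A Python sketch, assuming workload records have already been exported from the cluster into simple dictionaries (for example from pod status, where Kubernetes reports each running container's `imageID`); the record shape and sample values are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def digest_of(image_ref: str) -> str | None:
    """Extract the 'sha256:...' digest from an image reference, if present."""
    _, sep, digest = image_ref.partition("@")
    return digest if sep and digest.startswith("sha256:") else None


def affected_workloads(records: Iterable[dict], compromised: set[str]) -> list[tuple[str, str]]:
    """Return sorted (namespace, workload) pairs running any compromised digest."""
    hits = set()
    for rec in records:
        if digest_of(rec["image_id"]) in compromised:
            hits.add((rec["namespace"], rec["workload"]))
    return sorted(hits)
```

References that carry only a tag yield no digest and are never matched, so they deserve separate follow-up rather than silent exclusion.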
Minimizing False Positives
Runtime security is only useful if operators can trust its alerts. Excessive false positives lead to alert fatigue, where genuine alerts are ignored among the noise.
Strategies for reducing false positives:
Tune rules to your environment: Default rulesets are designed for broad applicability, not your specific workloads. Customize rules to exclude known-good behaviors:
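```yaml
# Falco: exempt images that legitimately spawn shells
- list: known_shell_spawn_containers
  items: [debug-tools, admin-container]
- macro: user_known_shell_spawn
  condition: container.image.repository in (known_shell_spawn_containers)
# Append the exception to the existing rule (Falco 0.36+ override syntax)
- rule: Terminal shell in container
  condition: and not user_known_shell_spawn
  override:
    condition: append
```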
Use graduated severity: Not every anomaly warrants immediate response. Configure severity levels that match your response capacity:
- Critical: Immediate automated response (container kill, network isolation)
- High: Alert to on-call with 15-minute response expectation
- Medium: Queue for analyst review during business hours
- Low: Log for trend analysis and baseline refinement
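Establish learning periods: When deploying new applications or updating rules, run in a log-only mode before alerting anyone. Falco has no dedicated learning mode, but its outputs can be configured so that events are recorded locally while notification and automation channels stay off:
```bash
# Falco: log alerts to stdout only during a learning period
falco -o stdout_output.enabled=true \
      -o http_output.enabled=false \
      -o program_output.enabled=false
```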
Leverage container context: Containers should behave predictably. Use image labels, namespace, and deployment metadata to apply appropriate rule sets:
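```yaml
# Different policies for different workloads, keyed on Kubernetes namespace
- rule: Shell in Production Container
  desc: Shell spawned in a production namespace container
  condition: spawned_process and container and shell_procs and k8s.ns.name = "production"
  output: Shell in production (cmdline=%proc.cmdline container=%container.name ns=%k8s.ns.name)
  priority: CRITICAL
- rule: Shell in Development Container
  desc: Shell spawned in a development namespace container
  condition: spawned_process and container and shell_procs and k8s.ns.name = "development"
  output: Shell in development (cmdline=%proc.cmdline container=%container.name ns=%k8s.ns.name)
  priority: INFO
```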
Measure and iterate: Track false positive rates by rule. Rules generating many false positives with few true positives should be tuned or disabled. Rules generating true positives should be preserved and potentially tightened.
The goal is not zero false positives, but rather a false positive rate low enough that every alert gets investigated. Balance sensitivity with operational capacity to ensure alerts receive appropriate attention.
We recommend starting with a small, high-confidence rule set and expanding based on operational capacity. A team that can investigate ten alerts per day should not deploy rules that generate fifty. Runtime security is an ongoing operational commitment, not a deploy-and-forget capability.
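Both the per-rule measurement and the capacity arithmetic can be made explicit. A Python sketch, assuming triage outcomes are recorded per alert as `(rule, was_true_positive)` pairs over a known number of days; the `min_precision` cutoff and the greedy selection by precision are illustrative choices, not a standard method.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RuleStats:
    rule: str
    alerts_per_day: float
    precision: float  # true positives / all triaged alerts


def rule_stats(outcomes: Iterable[tuple[str, bool]], days: int) -> list[RuleStats]:
    """Aggregate triage outcomes into per-rule daily volume and precision."""
    totals: dict[str, int] = defaultdict(int)
    true_pos: dict[str, int] = defaultdict(int)
    for rule, is_tp in outcomes:
        totals[rule] += 1
        true_pos[rule] += is_tp
    return [
        RuleStats(rule, totals[rule] / days, true_pos[rule] / totals[rule])
        for rule in totals
    ]


def select_rules(stats: list[RuleStats], capacity_per_day: float,
                 min_precision: float = 0.1) -> list[str]:
    """Greedily keep the most precise rules whose combined volume fits capacity."""
    chosen, load = [], 0.0
    for s in sorted(stats, key=lambda s: s.precision, reverse=True):
        if s.precision < min_precision:
            continue  # candidate for tuning or disabling
        if load + s.alerts_per_day <= capacity_per_day:
            chosen.append(s.rule)
            load += s.alerts_per_day
    return chosen
```

With a capacity of ten alerts per day, a rule that fires fifty times a day can never be selected regardless of its precision; it has to be tuned first.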