A corporate WireGuard VPN connects thousands of developers to 200+ AWS accounts (plus a handful of GCP accounts), with centralized logging in Splunk. A ~2x increase in the background-chatter "noise floor" has been observed: elevated network traffic volume concentrated on 25% of the VPN gateways.

Hypothesis

An adversary has established a slow, low-signal exfiltration channel buried in high-volume VPN traffic, and is deliberately inflating background noise to degrade signal-to-noise ratio in logging - making anomalous traffic indistinguishable from legitimate developer activity. Classic "hide in the noise" tradecraft.

Environment Profile

Attribute	Value
VPN Technology	WireGuard
Cloud Providers	AWS (primary, 200+ accounts), GCP (handful)
Centralized Logging	Splunk
Noise Distribution	Clustered on 25% of gateways (not uniform)
Active Hunting	Authorized (canaries, honey tokens, decoys)
Developer Count	Thousands

Operational Decomposition

Domain Assignment

Domain	Specialist	Primary Scope
Identity & Access	Identity Specialist	WireGuard peer analysis, auth patterns, credential hygiene, VPN session anomaly detection
Cloud	Cloud Specialist	AWS/GCP egress mapping, API call volume baselining, data staging indicators, IAM credential anomalies, cross-cloud correlation
Wireless	N/A (ruled out)	Noise confirmed as network-level chatter, not RF spectrum

Key Intelligence Lever

The noise is on 25% of gateways, not uniformly distributed. This is the primary triage pivot point. The adversary is likely operating through or adjacent to those specific gateways. Accounts and peers that disproportionately route through the noisy gateways are priority investigation targets.

Identity & Access Package

Deliverables

Location: wireguard-hunt-package/

File	Type	Lines	Description
`README.md`	Documentation	1,019	Complete hunt package: SPL queries (§1.1–1.8), 5-phase playbook with decision gates, operator checklist, script reference, server key compromise section
`wg_analysis.py`	Python	713	WireGuard peer analysis engine - `wg show all dump` parser, stateful delta computation, baseline builder, anomaly detector. Modes: `--stdin`, `--file`, `--interval`, `--baseline`, `--detect`
`wg_canary.py`	Python	506	Canary keypair lifecycle - `--generate`, `--monitor --local`, `--monitor --splunk`, `--deploy`, `--cleanup`
`splunk_search.py`	Python	430	Splunk REST API wrapper - 8 detection functions exposed as importable library and CLI subcommands. Zero pip dependencies (stdlib urllib)
`hunt_orchestrator.py`	Python	667	Master orchestrator - chains triage → deep-hunt → canary → contain. Imports from `wg_analysis` and `splunk_search`
`wg_contain.sh`	Bash	297	Emergency containment - SSH-driven peer discovery, evidence collection, revocation, verification. Dry-run mode without `--confirm`
`build_gateway_lookup.sh`	Bash	181	Gateway lookup table builder - resolves hostnames to IPs, prompts for noisy/quiet tagging, outputs Splunk-compatible CSV
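
The "stateful delta computation" listed for `wg_analysis.py` is not reproduced in this report. As a rough sketch of the idea only, assuming each snapshot has already been reduced to a mapping of peer public key to cumulative `(rx, tx)` byte counters (the `transfer_deltas` name and the data shape are hypothetical, not taken from the script):

```python
from typing import Dict, Tuple

# Hypothetical shape: peer public key -> (rx_bytes, tx_bytes), cumulative counters
Counters = Dict[str, Tuple[int, int]]


def transfer_deltas(prev: Counters, curr: Counters) -> Counters:
    """Return bytes moved per peer between two cumulative-counter snapshots."""
    deltas: Counters = {}
    for peer, (rx, tx) in curr.items():
        prev_rx, prev_tx = prev.get(peer, (0, 0))
        # Counters restart from zero when a peer is re-added or the interface
        # restarts; in that case count the current value as the whole delta.
        d_rx = rx - prev_rx if rx >= prev_rx else rx
        d_tx = tx - prev_tx if tx >= prev_tx else tx
        deltas[peer] = (d_rx, d_tx)
    return deltas
```

Treating a shrinking counter as a reset avoids negative deltas, which would otherwise distort any baseline built from these values.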

Detection Surfaces Covered

Handshake Rate Spikes - Rapid WireGuard handshake cycling (>5/min) indicates automated C2 keepalive or deliberate noise injection
Peer Connection Churn - Peers repeatedly appearing/disappearing in wg show snapshots (>5 cycles in 10 min) indicates automated persistence
Endpoint IP Diversity - Single peer key authenticating from >3 distinct IPs in 24h indicates key exfiltration or adversary IP rotation
Abnormal Handshake Timing - Off-hours handshake ratio >50%, burst patterns, uniform-interval beacons vs. human-driven patterns
Gateway-Specific Noise Clustering - Peers exclusive to the noisy 25% of gateways when other gateways are available
Data-Plane Volume Anomalies - Peers with >5x median byte transfer versus peer group on same gateway
Configuration Drift - New peers added outside change management, AllowedIPs range expansions indicating lateral movement preparation
Correlation: Noisy-Only × High Volume - Compound query combining gateway exclusivity with data transfer anomalies
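
To make the data-plane volume check concrete, the following is a minimal sketch, not the package's implementation. It parses the tab-separated output of `wg show all dump` (peer rows carry nine fields once the interface name is prepended) and flags peers whose byte total exceeds 5x the median of their group. Grouping by interface name and the function names are assumptions made for illustration.

```python
import statistics
from collections import defaultdict
from typing import List, NamedTuple


class Peer(NamedTuple):
    interface: str
    public_key: str
    endpoint: str
    latest_handshake: int
    total_bytes: int


def parse_dump(text: str) -> List[Peer]:
    """Parse `wg show all dump` output, keeping only peer rows."""
    peers = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 9:  # interface rows carry 5 fields, peer rows 9
            continue
        iface, pub, _psk, endpoint, _allowed, handshake, rx, tx, _keepalive = fields
        peers.append(Peer(iface, pub, endpoint, int(handshake), int(rx) + int(tx)))
    return peers


def volume_outliers(peers: List[Peer], factor: float = 5.0) -> List[Peer]:
    """Return peers whose byte total exceeds factor x the median of their interface."""
    groups = defaultdict(list)
    for peer in peers:
        groups[peer.interface].append(peer)
    flagged = []
    for group in groups.values():
        median = statistics.median([p.total_bytes for p in group])
        # A zero median means the group is idle; skip it rather than flag everyone.
        flagged.extend(p for p in group if median > 0 and p.total_bytes > factor * median)
    return flagged
```

The median is used rather than the mean so that one very heavy peer does not raise the bar it is being compared against.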

Canary Deployment Architecture

Decoy WireGuard keypairs deployed to noisy gateways with unique RFC 6598 subnets
Honey services (HTTP/SSH) on canary subnets logging access attempts
Canary peer handshake detection triggers immediate escalation
Seeded WireGuard private keys as "leaked" credentials in internal repos
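
One way to hand each noisy gateway its own canary subnet from the RFC 6598 shared address space (100.64.0.0/10) is sketched below. The /28 prefix size, the `in_use` exclusion list, and the function name are illustrative assumptions; the actual allocation logic in `wg_canary.py` is not shown in this report.

```python
import ipaddress
from typing import Dict, Iterable

CGNAT = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space


def allocate_canary_subnets(
    gateways: Iterable[str],
    prefix: int = 28,
    in_use: Iterable[str] = (),
) -> Dict[str, ipaddress.IPv4Network]:
    """Assign each gateway a unique subnet that avoids ranges already in use."""
    used = [ipaddress.ip_network(n) for n in in_use]
    candidates = (
        subnet
        for subnet in CGNAT.subnets(new_prefix=prefix)
        if not any(subnet.overlaps(u) for u in used)
    )
    # zip() stops at the shorter input, so each gateway gets exactly one subnet.
    return dict(zip(gateways, candidates))
```

Passing any 100.64.0.0/10 ranges already routed in the environment via `in_use` keeps canary subnets from colliding with real traffic, which would otherwise produce false canary hits.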

Server Key Compromise Coverage

Detection of unauthorized wg set commands and config file modifications
Gateway impersonation detection via cloud traffic attribution
Service restart/configuration reload monitoring outside maintenance windows
Full server key rotation procedure with blast radius calculation
Communication plan for affected developers
Post-rotation validation checklist

Cloud Package

Deliverables

Location: cloud-hunt-package/

File	Type	Lines	Description
`README.md`	Documentation	740	Complete hunt package: SPL queries, 5-phase playbook with decision gates, operator checklist, server key compromise section, script reference
`cloud_triage.py`	Python	390	Account triage via CloudTrail - enumerates all org accounts, cross-references CloudTrail source IPs against VPN gateway lookup, assigns Tier 0-3 based on noisy-call percentage
`egress_hunt.py`	Python	497	Deep egress analysis - VPC Flow Log querying, destination IP geolocation/ASN classification (ip-api.com), thin-slice exfiltration detection via coefficient of variation
`deploy_honeytokens.py`	Python	711	Full honeytoken lifecycle - creates IAM users/keys, S3 buckets, CloudTrail trails, GCP service accounts/GCS buckets. Monitor polls CloudTrail every 60s. Destroy cleans up all resources
`collect_forensics.sh`	Bash	430	Evidence preservation - CloudTrail events, S3 access logs, VPC Flow Logs, IAM snapshot, GuardDuty findings, GCP equivalents. SHA256 hashing with macOS fallback. Idempotent
`validate_prereqs.py`	Python	410	Pre-flight readiness - 11 checks (gateway lookup, AWS/GCP credentials, CloudTrail, GuardDuty, VPC Flow Logs, Splunk reachability). PASS/WARN/FAIL with colored output
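
The tier assignment performed by `cloud_triage.py` can be pictured with the short sketch below. The Tier 0 (>50%), Tier 1 (25-50%), and Tier 3 (<10%) boundaries come from the prioritization matrix in this report; the Tier 2 band (10-25%), the handling of exact boundary values, and the function name are assumptions.

```python
def assign_tier(noisy_calls: int, total_calls: int) -> int:
    """Map an account's share of noisy-gateway CloudTrail calls to a tier (0 = highest)."""
    if total_calls <= 0:
        return 3  # no observed calls: treat as lowest affinity
    pct = 100.0 * noisy_calls / total_calls
    if pct > 50:
        return 0
    if pct >= 25:
        return 1
    if pct >= 10:  # assumed band; the report does not state Tier 2 bounds
        return 2
    return 3
```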

Detection Surfaces Covered

Egress Volume Baseline Deviation - Per-account egress compared to 30-day rolling baseline, flagging >50% deviation with >100MB absolute volume
Thin-Slice Exfiltration - Metronomic small transfers (50KB-5MB per 10-min bucket) with coefficient of variation <30% over >2 hours
API Call Volume Inflation - Enumeration detection via List*/Describe*/Get* CloudTrail call spikes (>2x baseline, >500 calls in 15 min)
Enumeration-to-Egress Correlation - Accounts with simultaneous API spike AND egress spike (staging + exfiltration signal)
S3 Data Access Anomalies - Bulk GetObject from historically untouched buckets, access from non-VPN IPs
IAM Credential Usage - Access keys used from non-VPN IPs, impossible-travel usage (same key seen in multiple countries within 60 min), new keys created during noise window
Gateway-to-Cloud Correlation - Account prioritization by proportion of CloudTrail events originating from noisy gateway IPs
Data Staging Indicators - New S3 buckets with public/cross-account ACLs, CloudTrail trail disruption attempts, aggressive S3 lifecycle policies (<7 day expiration)
GCP-Specific - Compute/storage enumeration spikes, service account key creation, GCS bulk access, BigQuery extract jobs
Cross-Cloud Data Flows - AWS-to-GCP traffic not matching documented application dependencies
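
The thin-slice exfiltration criterion above (50KB-5MB per 10-minute bucket, coefficient of variation below 30%, sustained for more than 2 hours) can be sketched as follows. This is an illustration under stated assumptions rather than the logic in `egress_hunt.py`: input is a list of per-bucket byte counts for one destination, "more than 2 hours" is read as at least 13 consecutive 10-minute buckets, and KB/MB are taken as binary units.

```python
import statistics
from typing import List

KB, MB = 1024, 1024 * 1024


def longest_in_range_run(buckets: List[int], lo: int, hi: int) -> List[int]:
    """Return the longest run of consecutive buckets with lo <= bytes <= hi."""
    best: List[int] = []
    run: List[int] = []
    for b in buckets:
        run = run + [b] if lo <= b <= hi else []
        if len(run) > len(best):
            best = run
    return best


def is_thin_slice(
    buckets: List[int],
    lo: int = 50 * KB,
    hi: int = 5 * MB,
    max_cv: float = 0.30,
    min_buckets: int = 13,  # assumed reading of ">2 hours" at 10 min per bucket
) -> bool:
    """True when a long run of small transfers is suspiciously uniform."""
    run = longest_in_range_run(buckets, lo, hi)
    if len(run) < min_buckets:
        return False
    cv = statistics.pstdev(run) / statistics.mean(run)  # coefficient of variation
    return cv < max_cv
```

A low coefficient of variation is the signal here: human-driven transfers of similar size tend to vary far more from bucket to bucket than a scripted drip.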

AWS Honey Token Architecture

IAM user honey-devops-bot-<suffix> with access key seeded into wikis, Slack, Git repos, CI/CD configs
Decoy S3 bucket prod-db-backup-snapshot-<suffix> with realistic decoy objects
Canary CloudTrail trail security-audit-trail-hunt monitoring for disruption
GCP equivalents: service account honey-monitoring-bot-<suffix>, GCS bucket prod-database-exports-<suffix>
Monitor polls CloudTrail every 60 seconds; any trigger exits with code 1 and alert details
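
The matching step inside such a monitor might look like the sketch below. It assumes the events have already been fetched and parsed into CloudTrail record dictionaries (using the record fields `userIdentity.accessKeyId`, `sourceIPAddress`, `eventName`, and `eventTime`); the polling loop, the AWS calls, and the `honey_hits` name are not from the package.

```python
from typing import Dict, Iterable, List, Set


def honey_hits(events: Iterable[Dict], honey_key_ids: Set[str]) -> List[Dict]:
    """Return a compact alert entry for every event signed with a honey access key."""
    hits = []
    for ev in events:
        key_id = ev.get("userIdentity", {}).get("accessKeyId")
        if key_id in honey_key_ids:
            hits.append(
                {
                    "accessKeyId": key_id,
                    "eventName": ev.get("eventName"),
                    "sourceIPAddress": ev.get("sourceIPAddress"),
                    "eventTime": ev.get("eventTime"),
                }
            )
    return hits
```

A caller would exit with code 1 when the returned list is non-empty, matching the behavior described above. Because no legitimate workflow uses a honey key, any hit is high-confidence by construction.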

Server Key Compromise Coverage (Cloud Perspective)

CloudTrail events from gateway IPs with zero active WireGuard peer sessions - definitive correlation query
Anomalous IAM role assumption (sts:AssumeRole) from gateway source IPs
Gateway isolation procedures: VPC endpoint policy denial, security group/NACL blocking
Credential revocation for all roles accessible from compromised gateway's network path
Forensic imaging guidance for gateway machine
Long-term hardening: network segmentation, EventBridge alerting on gateway IP activity

Hunt Playbook (Unified)

Phase 1 - Triage (First 30 Minutes)

Step	Owner	Action
1.1	Identity	Confirm noisy gateways via handshake volume z-score analysis
1.2	Identity	Time-bound noise onset - identify step-change inflection point
1.3	Identity	Peer key inventory - compare unique peer count pre/post noise onset per gateway
1.4	Identity	Identify peers exclusive to noisy gateways (never seen on quiet gateways)
1.5	Cloud	Run account triage: rank all AWS accounts by noisy-gateway affinity percentage
1.6	Cloud	Check for CloudTrail trail disruption events from noisy gateway IPs

Decision Gate: ≥5 exclusive peers OR ≥1 Tier 0 account (>50% noisy affinity) → proceed to Phase 2. If both clean after 72h → stand down, monitor.
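
Step 1.1 can be illustrated with a plain z-score over per-gateway handshake counts. The sketch below is an assumption about how such a check could look, not the query from the README; the 1.5 threshold and the function name are illustrative.

```python
import statistics
from typing import Dict


def noisy_gateways(handshakes: Dict[str, int], z_threshold: float = 1.5) -> Dict[str, float]:
    """Return gateways whose handshake count z-score exceeds the threshold."""
    values = list(handshakes.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return {}  # all gateways identical: nothing stands out
    scores = {gw: (count - mean) / stdev for gw, count in handshakes.items()}
    return {gw: z for gw, z in scores.items() if z > z_threshold}
```

The threshold is deliberately lower than the conventional 3: when a quarter of the fleet is elevated, those gateways inflate the standard deviation themselves, so their z-scores stay modest even at roughly 2x the normal volume.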

Phase 2 - Deep Hunt (2-4 Hours)

Step	Owner	Action
2.1	Identity	Endpoint IP diversity on suspect peers - flag >3 IPs/24h, cross-country movement
2.2	Identity	Handshake rate profiling - burst patterns, off-hours clustering
2.3	Identity	Data volume deep dive - >5x median byte transfer vs peer group
2.4	Identity	AllowedIPs audit - route expansions, new 0.0.0.0/0 entries
2.5	Identity	Peer config provenance - verify suspect peers exist in config management
2.6	Cloud	Per-account egress deviation against 30-day baseline
2.7	Cloud	Destination profiling: ASN classification, threat intel check
2.8	Cloud	Thin-slice exfiltration detection on Tier 0-1 accounts
2.9	Cloud	IAM credential usage anomalies, new access keys from noisy gateways

Decision Gate: ≥1 peer meets ≥3 criteria OR egress destination resolves to hosting/VPS ASN with >500MB → proceed to Phase 3.
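
The Phase 2 gate can be expressed as a small predicate. In this sketch the input shapes are hypothetical: `peer_findings` maps a peer key to the set of Phase 2 criteria it tripped, and each egress finding carries an `asn_class` label and a byte count. Reading 500MB as 500 x 2^20 bytes is also an assumption.

```python
from typing import Dict, Iterable, List, Set

MB = 1024 * 1024


def phase2_gate(
    peer_findings: Dict[str, Set[str]],
    egress_findings: Iterable[Dict],
    min_criteria: int = 3,
    min_bytes: int = 500 * MB,
) -> bool:
    """True when the hunt should proceed to Phase 3."""
    peer_hit = any(len(criteria) >= min_criteria for criteria in peer_findings.values())
    egress_hit = any(
        f.get("asn_class") == "hosting" and f.get("bytes", 0) > min_bytes
        for f in egress_findings
    )
    return peer_hit or egress_hit


def peers_over_threshold(peer_findings: Dict[str, Set[str]], min_criteria: int = 3) -> List[str]:
    """List the peers that satisfy the peer half of the gate, for the handoff notes."""
    return sorted(p for p, c in peer_findings.items() if len(c) >= min_criteria)
```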

Phase 3 - Data Staging Hunt

Step	Owner	Action
3.1	Cloud	S3 access pattern analysis - principals with bulk GetObject from historically untouched buckets
3.2	Cloud	S3 bucket modification timeline - buckets created/modified during noise window
3.3	Cloud	IAM change audit - new roles/policies/keys created from noisy gateway IPs
3.4	Cloud	Data volume estimation - total bytes exfiltrated per suspicious principal

Decision Gate: Staging indicators found → Phase 4. If egress confirmed but no staging indicators → expand to RDS/Redshift logs.

Phase 4 - Active Deception

Step	Owner	Action
4.1	Identity	Deploy WireGuard canary peers to noisy gateways
4.2	Identity	Deploy honey services on canary subnets
4.3	Cloud	Deploy AWS honey token credentials (IAM user + access key)
4.4	Cloud	Create decoy S3 buckets with realistic data
4.5	Cloud	Create canary CloudTrail trail for disruption monitoring
4.6	Cloud	Deploy GCP honey tokens (service account, GCS bucket)
4.7	Both	Seed honey tokens into developer-facing locations
4.8	Both	Monitor for canary activation - any trigger = confirmed adversary

Decision Gate: Any honey token triggered → IMMEDIATE containment (Phase 5). No trigger after 7 days → stand down active deception, maintain canaries as ongoing tripwires.

Phase 5 - Containment & Evidence

Step	Owner	Action
5.1	Identity	Revoke compromised WireGuard peers via `wg_contain.sh`
5.2	Cloud	Deactivate compromised IAM access keys (set status to Inactive; DO NOT delete)
5.3	Cloud	Revoke active role sessions with a deny policy conditioned on `aws:TokenIssueTime`; tighten the trust policy to block new role assumptions
5.4	Cloud	Apply emergency SCPs at OU level to deny egress-enabling API actions (if using AWS Organizations); SCPs restrict API permissions, not network traffic
5.5	Cloud	Rotate all access keys for principals in Tier 0-1 accounts
5.6	Both	Preserve evidence: CloudTrail exports, VPC Flow Logs, S3 access logs, IAM snapshots, WireGuard peer dumps
5.7	Both	Splunk forensic CSV exports for suspect peers and accounts
5.8	Both	Notify incident commander with full handoff package

Escalation Triggers

Trigger	Severity	Owner
Peer with >10 endpoint IPs in 24h OR cross-country IPs	Tier 2	Identity
Peer handshake burst >10 in <60 seconds	Tier 2	Identity
Peer data transfer >5x median for gateway	Tier 2	Identity
Peer data transfer >500 MB in 24h	Tier 3	Identity
AllowedIPs expanded to 0.0.0.0/0	Tier 3	Identity
Canary peer handshake detected	IMMEDIATE Tier 3	Identity
CloudTrail StopLogging/DeleteTrail from noisy gateway IP	P0 Incident	Cloud
Honey access key used in AWS	IMMEDIATE Tier 3	Cloud
GuardDuty Exfiltration:* finding	P0 Incident	Cloud
Egress to threat-intel-blocklisted IP	CRITICAL	Cloud
Server key compromise confirmed	CRITICAL - Full gateway rotation	Both

Account Prioritization Matrix (200+ Accounts)

Priority	Criteria	Action Window
P0 - Immediate	Honey token triggered OR CloudTrail trail disrupted OR GuardDuty Exfiltration finding OR destination IP on threat-intel blocklist	15 minutes
P1 - Urgent	Tier 0 account (>50% noisy affinity) + egress to unknown hosting ASN	1 hour
P2 - High	Tier 1 account (25-50% noisy) + egress deviation >200% OR thin-slice pattern confirmed	4 hours
P3 - Medium	Tier 1-2 account + egress deviation 100-200%	24 hours
P4 - Monitor	Tier 3 account (<10% noisy) but any egress anomaly present	72 hours
P5 - Stand Down	No gateway affinity, no egress anomalies, no IAM changes	N/A
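
The matrix can be read as an ordered rule list, evaluated top to bottom. The sketch below is one interpretation, with several assumptions labeled in the comments: the P2 row is read as "(Tier 1 AND deviation >200%) OR thin-slice confirmed", and combinations the matrix does not cover fall back to P4. The `Account` fields and `prioritize` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    tier: int                          # 0 (highest noisy affinity) to 3
    egress_deviation_pct: float = 0.0  # deviation from the 30-day baseline
    egress_anomaly: bool = False       # any egress anomaly at all
    hosting_asn_egress: bool = False   # egress to unknown hosting ASN
    thin_slice: bool = False           # thin-slice pattern confirmed
    iam_changes: bool = False
    p0_signal: bool = False            # honey token, trail disruption, GuardDuty, threat intel


def prioritize(a: Account) -> str:
    """Return the first matching priority, checking rows from P0 downward."""
    if a.p0_signal:
        return "P0"
    if a.tier == 0 and a.hosting_asn_egress:
        return "P1"
    # Assumed precedence: AND binds tighter than OR in the P2 row.
    if (a.tier == 1 and a.egress_deviation_pct > 200) or a.thin_slice:
        return "P2"
    if a.tier in (1, 2) and 100 <= a.egress_deviation_pct <= 200:
        return "P3"
    if a.tier == 3 and not (a.egress_anomaly or a.iam_changes):
        return "P5"
    # Assumed fallback: anything else with affinity or anomalies stays monitored.
    return "P4"
```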

Cross-Cutting Dependencies

Dependency	Owner	Status
`vpn_gateway_lookup.csv` population	Identity (`build_gateway_lookup.sh`)	✅ Scripted
WireGuard `wg show all dump` polling	Identity (`wg_analysis.py --interval`)	✅ Scripted
Splunk REST API access (bearer token or username/password)	Both (`splunk_search.py`, `validate_prereqs.py`)	✅ Scripted
AWS Organizations access (list-accounts)	Cloud (`cloud_triage.py --org`)	✅ Scripted
CloudTrail LookupEvents (7-day limit, 50/call paginated)	Cloud (`cloud_triage.py`)	✅ Scripted with documented fallback
VPC Flow Logs in CloudWatch Logs (or S3 fallback)	Cloud (`egress_hunt.py`)	✅ Scripted with documented limitation
S3 server access logs enabled on target buckets	Cloud	⚠️ Must be pre-configured - scripts document this gap
CloudTrail data events enabled on S3 buckets	Cloud	⚠️ Must be pre-configured - scripts document this gap
ip-api.com rate limiting (45 req/min)	Cloud (`egress_hunt.py`)	✅ Built-in rate limiting
WireGuard kernel module / `wg` tool available on gateways	Identity	⚠️ Required for local polling mode
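
For the ip-api.com dependency, a sliding-window limiter is one simple way to stay under 45 requests per minute. The sketch below is not the limiter in `egress_hunt.py`; the class name and the injectable `clock` and `sleep` parameters (added so the behavior can be checked without waiting) are assumptions.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(
        self,
        max_calls: int = 45,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()

    def wait(self) -> None:
        """Block until another call is permitted, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._calls and now - self._calls[0] >= self._period:
                self._calls.popleft()
            if len(self._calls) < self._max_calls:
                self._calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self._period - (now - self._calls[0]))
```

Calling `wait()` immediately before each lookup request keeps the request rate within the limit regardless of how quickly destinations are produced.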

Deploy Sequence

# ==========================================
# IDENTITY SIDE
# ==========================================
cd wireguard-hunt-package/

# Step 1: Build gateway lookup (prerequisite for all correlation)
export SPLUNK_HOST=splunk.internal.example.com
export SPLUNK_TOKEN="your-bearer-token"
./build_gateway_lookup.sh --input gateways.txt --output vpn_gateway_lookup.csv

# Step 2: Run triage
python3 hunt_orchestrator.py --phase triage \
    --gateway-list gateways.txt \
    --output suspect_peers.json

# Step 3: If suspects found → deep hunt
python3 hunt_orchestrator.py --phase deep-hunt \
    --suspect-peers suspect_peers.json \
    --output deep_hunt_findings.json

# Step 4: Deploy canaries
python3 hunt_orchestrator.py --phase canary --deploy

# Step 5: Monitor (blocking)
python3 hunt_orchestrator.py --phase canary --monitor --timeout 3600

# Step 6: If confirmed → contain
python3 hunt_orchestrator.py --phase contain \
    --peer-key <PUBLIC_KEY> \
    --gateway-list gateways.txt \
    --confirm

# ==========================================
# CLOUD SIDE (in parallel, in a separate shell started from the parent directory)
# ==========================================
cd cloud-hunt-package/

# Step 1: Validate environment
python3 validate_prereqs.py \
    --gateway-lookup ../wireguard-hunt-package/vpn_gateway_lookup.csv

# Step 2: Triage all org accounts
python3 cloud_triage.py \
    --gateway-lookup ../wireguard-hunt-package/vpn_gateway_lookup.csv \
    --org --days 7 \
    --output triage_results.json

# Step 3: Deep egress analysis on Tier 0/1 accounts
python3 egress_hunt.py \
    --tier-file triage_results.json \
    --days 14 \
    --output egress_findings.json

# Step 4: Deploy honeytokens while investigation proceeds
python3 deploy_honeytokens.py --create --canary-trail

# Step 5: Monitor (blocking, run in screen/tmux)
python3 deploy_honeytokens.py --monitor --timeout 604800

# Step 6: Collect forensics if incident confirmed
./collect_forensics.sh \
    --aws-account 123456789012 \
    --output-dir /secure/forensics/

# Step 7: Cleanup after hunt concludes
python3 deploy_honeytokens.py --destroy

Server Key Compromise - Joint Detection Strategy

Why It Matters

A WireGuard server's private key is the cryptographic identity of the VPN gateway. An adversary who obtains this key can impersonate the gateway to its peers and, given a position on the network path, intercept or inject traffic that appears to originate from the gateway - without ever being registered as a peer. Such activity bypasses all peer-level detection (handshake analysis, endpoint IP diversity, churn detection) because the adversary never appears as a peer.

Detection

Signal	Owner	Method
Unauthorized `wg set` commands or private-key file modifications on gateway hosts	Identity	File integrity monitoring, command audit logging, Splunk query for `wg set` events outside maintenance windows
CloudTrail events from gateway IPs with zero active WireGuard peer sessions	Cloud	SPL correlation query joining CloudTrail `sourceIPAddress` against WireGuard peer session data
Anomalous `sts:AssumeRole` from gateway source IPs	Cloud	CloudTrail monitoring - gateway IPs should rarely/never assume IAM roles
WireGuard service restarts or config reloads outside maintenance windows	Identity	Systemd/init script audit logging
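
The second signal in the table is a join between CloudTrail source IPs and WireGuard session activity. The report's version is an SPL query; the Python sketch below only illustrates the same logic. It assumes events are dictionaries with `sourceIPAddress` and a `datetime` in `eventTime`, and that `active_windows` (a hypothetical structure) lists, per gateway IP, the intervals during which at least one peer had a recent handshake.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Set, Tuple

Window = Tuple[datetime, datetime]


def orphan_events(
    events: Iterable[Dict],
    gateway_ips: Set[str],
    active_windows: Dict[str, List[Window]],
) -> List[Dict]:
    """Return events from gateway IPs that occurred while no peer session was active."""
    orphans = []
    for ev in events:
        ip = ev["sourceIPAddress"]
        if ip not in gateway_ips:
            continue  # not gateway traffic; out of scope for this check
        ts = ev["eventTime"]
        windows = active_windows.get(ip, [])
        if not any(start <= ts <= end for start, end in windows):
            orphans.append(ev)
    return orphans
```

How an "active session" is derived matters, since WireGuard is connectionless; treating a peer as active for a few minutes after its latest handshake is a common approximation and should be tuned to the environment.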

Response

Isolate affected gateway from cloud networks - VPC endpoint policy denial, security group/NACL blocking
Revoke all credentials accessible from gateway's network path - IAM users, roles, STS tokens
Rotate WireGuard server key - generate new keypair, distribute updated configs to all legitimate peers
Monitor for failed handshakes with old key - indicates adversary attempting to reconnect
Rotate ALL credentials in every account accessible from the compromised gateway's network segment
Forensic imaging of gateway machine - preserve memory dumps, WireGuard configs, system logs before rotation
Long-term hardening - network segmentation, EventBridge alerting on gateway IP activity, client certificate authentication as defense-in-depth

Known Gaps and Limitations

Gap	Severity	Mitigation
No automated cross-correlation between Identity peer list and Cloud account tier list	Medium	Manual cross-reference in Phase 2; next iteration should add to `hunt_orchestrator.py`
S3 access logs and CloudTrail data events must be pre-enabled on target buckets	High	Documented in scripts; requires pre-configuration before hunt begins
CloudTrail LookupEvents limited to 7 days history and 50 results per page	Medium	Documented fallback: Athena queries against CloudTrail S3 data or Splunk for deeper history
VPC Flow Log analysis depends on CloudWatch Logs publishing; S3-only publishing requires pre-downloaded files	Low	Documented in `egress_hunt.py`
WireGuard server key compromise bypasses peer-level detection entirely	High	Addressed via joint detection strategy (see above); server key rotation procedure documented
ip-api.com free tier: 45 requests/minute, no SLA	Low	Built-in rate limiting in `egress_hunt.py`; optional MaxMind GeoLite2 local database path

Package Sizes

Package	Files	Total Lines	Python	Bash
Identity	7	~3,800	4 scripts	2 scripts
Cloud	6	~3,200	4 scripts	1 script
Combined	13	~7,000	8 scripts	3 scripts

Identity Scripts

wg_analysis.py - 713 lines (WireGuard peer analysis engine)
wg_canary.py - 506 lines (canary keypair lifecycle)
splunk_search.py - 430 lines (Splunk REST API wrapper)
hunt_orchestrator.py - 667 lines (phase orchestrator)
wg_contain.sh - 297 lines (emergency containment)
build_gateway_lookup.sh - 181 lines (gateway lookup builder)

Cloud Scripts

cloud_triage.py - 390 lines (account triage via CloudTrail)
egress_hunt.py - 497 lines (deep egress analysis)
deploy_honeytokens.py - 711 lines (honeytoken lifecycle)
collect_forensics.sh - 430 lines (evidence preservation)
validate_prereqs.py - 410 lines (pre-flight readiness)

Review Verdict

Both packages are cleared for operational deployment.

All scripts are production-ready, not stubs
Error handling is present throughout
Python dependencies documented at top of each script
Bash scripts handle Linux/macOS portability
SPL queries preserved from initial detection library
Playbooks and checklists are phase-structured with explicit decision gates
Server key compromise coverage exists from both Identity (gateway-side) and Cloud (API-side) perspectives
Escalation triggers are specific and threshold-based
Prioritization matrix scales triage to 200+ accounts by sorting them into six priority levels (P0-P5) with fixed action windows
