Scenario: Noise Floor Cloud Hunt
SITREP - Engagement Overview
Date: April 2026
Engagement Type: Active Blue Team Hunt (Active Deception Authorized)
Engagement Director: Cyber Operations Director
Initial Report
Corporate VPN network using WireGuard, connecting thousands of developers to 200+ AWS accounts (plus a handful of GCP accounts). Centralized logging in Splunk. Observed a ~2x increase in the background chatter "noise floor" - elevated network traffic volume concentrated on 25% of the VPN gateways.
Hypothesis
An adversary has established a slow, low-signal exfiltration channel buried in high-volume VPN traffic, and is deliberately inflating background noise to degrade signal-to-noise ratio in logging - making anomalous traffic indistinguishable from legitimate developer activity. Classic "hide in the noise" tradecraft.
Environment Profile
| Attribute | Value |
|---|---|
| VPN Technology | WireGuard |
| Cloud Providers | AWS (primary, 200+ accounts), GCP (handful) |
| Centralized Logging | Splunk |
| Noise Distribution | Clustered on 25% of gateways (not uniform) |
| Active Hunting | Authorized (canaries, honey tokens, decoys) |
| Developer Count | Thousands |
Operational Decomposition
Domain Assignment
| Domain | Specialist | Primary Scope |
|---|---|---|
| Identity & Access | Identity Specialist | WireGuard peer analysis, auth patterns, credential hygiene, VPN session anomaly detection |
| Cloud | Cloud Specialist | AWS/GCP egress mapping, API call volume baselining, data staging indicators, IAM credential anomalies, cross-cloud correlation |
| Wireless | N/A (ruled out) | Noise confirmed as network-level chatter, not RF spectrum |
Key Intelligence Lever
The noise is on 25% of gateways, not uniformly distributed. This is the primary triage pivot point. The adversary is likely operating through or adjacent to those specific gateways. Accounts and peers that disproportionately route through the noisy gateways are priority investigation targets.
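A minimal sketch of this pivot, assuming peer sightings are available as `(peer_public_key, gateway_name)` pairs exported from the WireGuard logs. The function name, the input shape, and the gateway names are hypothetical and are not taken from the hunt package.

```python
from collections import defaultdict


def noisy_exclusive_peers(sightings, noisy_gateways):
    """Return peer keys that were seen only on gateways tagged as noisy.

    sightings: iterable of (peer_public_key, gateway_name) pairs (assumed shape).
    noisy_gateways: set of gateway names tagged noisy in the gateway lookup.
    """
    gateways_by_peer = defaultdict(set)
    for peer, gateway in sightings:
        gateways_by_peer[peer].add(gateway)
    # A peer is "exclusive" when every gateway it touched is in the noisy set.
    return sorted(
        peer for peer, seen in gateways_by_peer.items() if seen <= noisy_gateways
    )


sightings = [
    ("peerA", "gw-01"), ("peerA", "gw-02"),  # noisy gateways only
    ("peerB", "gw-01"), ("peerB", "gw-07"),  # noisy and quiet
    ("peerC", "gw-07"),                      # quiet only
]
print(noisy_exclusive_peers(sightings, {"gw-01", "gw-02"}))
```

Peers returned here are candidates for the priority list, not confirmed suspects; a developer pinned to a single regional gateway would also match.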
Identity & Access Package
Deliverables
Location: wireguard-hunt-package/
| File | Type | Lines | Description |
|---|---|---|---|
| README.md | Documentation | 1,019 | Complete hunt package: SPL queries (§1.1–1.8), 5-phase playbook with decision gates, operator checklist, script reference, server key compromise section |
| wg_analysis.py | Python | 713 | WireGuard peer analysis engine - wg show all dump parser, stateful delta computation, baseline builder, anomaly detector. Modes: --stdin, --file, --interval, --baseline, --detect |
| wg_canary.py | Python | 506 | Canary keypair lifecycle - --generate, --monitor --local, --monitor --splunk, --deploy, --cleanup |
| splunk_search.py | Python | 430 | Splunk REST API wrapper - 8 detection functions exposed as importable library and CLI subcommands. Zero pip dependencies (stdlib urllib) |
| hunt_orchestrator.py | Python | 667 | Master orchestrator - chains triage → deep-hunt → canary → contain. Imports from wg_analysis and splunk_search |
| wg_contain.sh | Bash | 297 | Emergency containment - SSH-driven peer discovery, evidence collection, revocation, verification. Dry-run mode without --confirm |
| build_gateway_lookup.sh | Bash | 181 | Gateway lookup table builder - resolves hostnames to IPs, prompts for noisy/quiet tagging, outputs Splunk-compatible CSV |
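To make the parser and delta steps concrete, here is a minimal sketch of parsing `wg show all dump` and diffing two snapshots. It relies on the tab-separated layout documented in wg(8): with `all`, each line is prefixed by the interface name, interface lines carry 5 fields, and peer lines carry 9. The names `PeerSample`, `parse_dump`, and `snapshot_delta` are hypothetical and are not the actual internals of wg_analysis.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerSample:
    interface: str
    public_key: str
    endpoint: str
    allowed_ips: str
    latest_handshake: int  # Unix seconds; 0 means no handshake yet
    rx_bytes: int
    tx_bytes: int


def parse_dump(text):
    """Parse `wg show all dump` output into {(interface, public_key): PeerSample}."""
    peers = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 9:  # interface lines carry 5 fields, peer lines 9
            continue
        iface, pub, _psk, endpoint, allowed, handshake, rx, tx, _keepalive = fields
        peers[(iface, pub)] = PeerSample(
            iface, pub, endpoint, allowed, int(handshake), int(rx), int(tx)
        )
    return peers


def snapshot_delta(prev, curr):
    """Per-peer byte and handshake changes between two parsed snapshots."""
    delta = {}
    for key, now in curr.items():
        before = prev.get(key)
        if before is None:
            delta[key] = {"new_peer": True, "rx": now.rx_bytes, "tx": now.tx_bytes,
                          "handshook": now.latest_handshake > 0}
            continue
        # Counters restart from zero when the interface is recreated,
        # so a negative difference is treated as a fresh counter.
        rx = now.rx_bytes - before.rx_bytes
        tx = now.tx_bytes - before.tx_bytes
        delta[key] = {
            "new_peer": False,
            "rx": rx if rx >= 0 else now.rx_bytes,
            "tx": tx if tx >= 0 else now.tx_bytes,
            "handshook": now.latest_handshake > before.latest_handshake,
        }
    return delta


def _peer_line(iface, pub, endpoint, handshake, rx, tx):
    """Build one synthetic peer line for the example below."""
    return "\t".join([iface, pub, "(none)", endpoint, "10.8.0.2/32",
                      str(handshake), str(rx), str(tx), "off"])


first = parse_dump("\n".join([
    "wg0\tPRIVKEY\tSERVERPUB\t51820\toff",
    _peer_line("wg0", "PEER_A", "198.51.100.7:40000", 1000, 5000, 7000),
]))
second = parse_dump(_peer_line("wg0", "PEER_A", "198.51.100.7:40000", 1120, 9000, 7500))
print(snapshot_delta(first, second)[("wg0", "PEER_A")])
```

Keeping the parser separate from the delta step means the same parsed snapshots can feed both a baseline builder and a detector.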
Detection Surfaces Covered
- Handshake Rate Spikes - Rapid WireGuard handshake cycling (>5/min) indicates automated C2 keepalive or deliberate noise injection
- Peer Connection Churn - Peers repeatedly appearing/disappearing in wg show snapshots (>5 cycles in 10 min) indicates automated persistence
- Endpoint IP Diversity - Single peer key authenticating from >3 distinct IPs in 24h indicates key exfiltration or adversary IP rotation
- Abnormal Handshake Timing - Off-hours handshake ratio >50%, burst patterns, uniform-interval beacons vs. human-driven patterns
- Gateway-Specific Noise Clustering - Peers exclusive to the noisy 25% of gateways when other gateways are available
- Data-Plane Volume Anomalies - Peers with >5x median byte transfer versus peer group on same gateway
- Configuration Drift - New peers added outside change management, AllowedIPs range expansions indicating lateral movement preparation
- Correlation: Noisy-Only × High Volume - Compound query combining gateway exclusivity with data transfer anomalies
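Two of the thresholds above (endpoint IP diversity and data-plane volume) can be sketched in a few lines. This is an illustration under assumed input shapes, not the package's detector: `diverse_peers` and `volume_outliers` are hypothetical names, and events are assumed to arrive as `(timestamp, peer_key, ip)` tuples with Unix-second timestamps.

```python
from statistics import median


def diverse_peers(events, now, max_ips=3, window_s=24 * 3600):
    """Peers seen from more than max_ips distinct endpoint IPs in the trailing window.

    events: iterable of (unix_ts, peer_key, endpoint_ip) tuples (assumed shape).
    Returns {peer_key: distinct_ip_count} for flagged peers only.
    """
    ips_by_peer = {}
    for ts, peer, ip in events:
        if now - window_s <= ts <= now:
            ips_by_peer.setdefault(peer, set()).add(ip)
    return {p: len(ips) for p, ips in ips_by_peer.items() if len(ips) > max_ips}


def volume_outliers(bytes_by_peer, factor=5.0):
    """Peers whose byte count exceeds factor x the median on the same gateway.

    Returns {peer_key: ratio_to_median} for flagged peers only.
    """
    if not bytes_by_peer:
        return {}
    med = median(bytes_by_peer.values())
    if med <= 0:
        return {}
    return {p: b / med for p, b in bytes_by_peer.items() if b > factor * med}


now = 1_000_000
events = [(now - 60 * i, "PEER_X", f"198.51.100.{i}") for i in range(1, 5)]
events.append((now - 30, "PEER_Y", "203.0.113.9"))
print(diverse_peers(events, now))
print(sorted(volume_outliers({"a": 100, "b": 120, "c": 110, "d": 900})))
```

The median is used as the peer-group reference because a single heavy peer shifts it far less than it would shift a mean.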
Canary Deployment Architecture
- Decoy WireGuard keypairs deployed to noisy gateways with unique RFC 6598 subnets
- Honey services (HTTP/SSH) on canary subnets logging access attempts
- Canary peer handshake detection triggers immediate escalation
- Seeded WireGuard private keys as "leaked" credentials in internal repos
Server Key Compromise Coverage
- Detection of unauthorized wg set commands and config file modifications
- Gateway impersonation detection via cloud traffic attribution
- Service restart/configuration reload monitoring outside maintenance windows
- Full server key rotation procedure with blast radius calculation
- Communication plan for affected developers
- Post-rotation validation checklist
Cloud Package
Deliverables
Location: cloud-hunt-package/
| File | Type | Lines | Description |
|---|---|---|---|
| README.md | Documentation | 740 | Complete hunt package: SPL queries, 5-phase playbook with decision gates, operator checklist, server key compromise section, script reference |
| cloud_triage.py | Python | 390 | Account triage via CloudTrail - enumerates all org accounts, cross-references CloudTrail source IPs against VPN gateway lookup, assigns Tier 0-3 based on noisy-call percentage |
| egress_hunt.py | Python | 497 | Deep egress analysis - VPC Flow Log querying, destination IP geolocation/ASN classification (ip-api.com), thin-slice exfiltration detection via coefficient of variation |
| deploy_honeytokens.py | Python | 711 | Full honeytoken lifecycle - creates IAM users/keys, S3 buckets, CloudTrail trails, GCP service accounts/GCS buckets. Monitor polls CloudTrail every 60s. Destroy cleans up all resources |
| collect_forensics.sh | Bash | 430 | Evidence preservation - CloudTrail events, S3 access logs, VPC Flow Logs, IAM snapshot, GuardDuty findings, GCP equivalents. SHA256 hashing with macOS fallback. Idempotent |
| validate_prereqs.py | Python | 410 | Pre-flight readiness - 11 checks (gateway lookup, AWS/GCP credentials, CloudTrail, GuardDuty, VPC Flow Logs, Splunk reachability). PASS/WARN/FAIL with colored output |
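The tiering performed by cloud_triage.py can be illustrated with a short sketch. The Tier 0 (>50%), Tier 1 (25-50%), and Tier 3 (<10%) bands come from the prioritization matrix in this report; the 10-25% band for Tier 2 is inferred from the gap between them and is an assumption, as are the function names.

```python
def noisy_affinity(source_ips, noisy_ips):
    """Percentage of CloudTrail calls whose source IP is a noisy gateway."""
    if not source_ips:
        return 0.0
    hits = sum(1 for ip in source_ips if ip in noisy_ips)
    return 100.0 * hits / len(source_ips)


def assign_tier(pct):
    """Map a noisy-call percentage to Tier 0-3 (Tier 2 band is an assumption)."""
    if pct > 50:
        return 0
    if pct >= 25:
        return 1
    if pct >= 10:
        return 2
    return 3


calls = ["198.51.100.10"] * 3 + ["203.0.113.5"]
pct = noisy_affinity(calls, {"198.51.100.10"})
print(pct, assign_tier(pct))
```

An account with no CloudTrail calls in the window lands in Tier 3 here; whether such accounts deserve separate handling is a policy choice this sketch does not settle.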
Detection Surfaces Covered
- Egress Volume Baseline Deviation - Per-account egress compared to 30-day rolling baseline, flagging >50% deviation with >100MB absolute volume
- Thin-Slice Exfiltration - Metronomic small transfers (50KB-5MB per 10-min bucket) with coefficient of variation <30% over >2 hours
- API Call Volume Inflation - Enumeration detection via List*/Describe*/Get* CloudTrail call spikes (>2x baseline, >500 calls in 15 min)
- Enumeration-to-Egress Correlation - Accounts with simultaneous API spike AND egress spike (staging + exfiltration signal)
- S3 Data Access Anomalies - Bulk GetObject from historically untouched buckets, access from non-VPN IPs
- IAM Credential Usage - Access keys used from non-VPN IPs, impossible travel (use from multiple countries within 60 min), new keys created during noise window
- Gateway-to-Cloud Correlation - Account prioritization by proportion of CloudTrail events originating from noisy gateway IPs
- Data Staging Indicators - New S3 buckets with public/cross-account ACLs, CloudTrail trail disruption attempts, aggressive S3 lifecycle policies (<7 day expiration)
- GCP-Specific - Compute/storage enumeration spikes, service account key creation, GCS bulk access, BigQuery extract jobs
- Cross-Cloud Data Flows - AWS-to-GCP traffic not matching documented application dependencies
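The egress baseline and thin-slice thresholds above translate into two small checks. This is a sketch under stated assumptions: bucket totals are per destination in consecutive 10-minute bins, the baseline is a list of daily byte totals, "100MB" is read as 100 x 10^6 bytes, and both function names are hypothetical.

```python
from statistics import mean, pstdev

BUCKET_S = 600  # 10-minute buckets, as in the thin-slice definition


def is_thin_slice(bucket_bytes, min_b=50_000, max_b=5_000_000,
                  max_cv=0.30, min_duration_s=2 * 3600):
    """True for steady small transfers: every bucket within [min_b, max_b],
    coefficient of variation below max_cv, sustained for more than min_duration_s."""
    if len(bucket_bytes) * BUCKET_S <= min_duration_s:
        return False
    if any(b < min_b or b > max_b for b in bucket_bytes):
        return False
    mu = mean(bucket_bytes)
    if mu == 0:
        return False
    return pstdev(bucket_bytes) / mu < max_cv


def egress_deviates(today_bytes, baseline_daily_bytes,
                    min_dev=0.5, min_abs=100 * 10**6):
    """True when today's egress exceeds min_abs and deviates from the
    rolling baseline mean by more than min_dev (0.5 = 50%)."""
    if today_bytes <= min_abs or not baseline_daily_bytes:
        return False
    base = mean(baseline_daily_bytes)
    if base <= 0:
        return True  # any large volume against an empty baseline is a deviation
    return abs(today_bytes - base) / base > min_dev


steady = [1_000_000 + (i % 3) * 10_000 for i in range(14)]
bursty = [60_000, 4_900_000] * 7
print(is_thin_slice(steady), is_thin_slice(bursty))
print(egress_deviates(400 * 10**6, [200 * 10**6] * 30))
```

A low coefficient of variation separates machine-paced transfers from developer traffic, which tends to be bursty even when the totals are similar.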
AWS Honey Token Architecture
- IAM user honey-devops-bot-<suffix> with access key seeded into wikis, Slack, Git repos, CI/CD configs
- Decoy S3 bucket prod-db-backup-snapshot-<suffix> with realistic decoy objects
- Canary CloudTrail trail security-audit-trail-hunt monitoring for disruption
- GCP equivalents: service account honey-monitoring-bot-<suffix>, GCS bucket prod-database-exports-<suffix>
- Monitor polls CloudTrail every 60 seconds; any trigger exits with code 1 and alert details
Server Key Compromise Coverage (Cloud Perspective)
- CloudTrail events from gateway IPs with zero active WireGuard peer sessions - definitive correlation query
- Anomalous IAM role assumption (sts:AssumeRole) from gateway source IPs
- Gateway isolation procedures: VPC endpoint policy denial, security group/NACL blocking
- Credential revocation for all roles accessible from compromised gateway's network path
- Forensic imaging guidance for gateway machine
- Long-term hardening: network segmentation, EventBridge alerting on gateway IP activity
Hunt Playbook (Unified)
Phase 1 - Triage (First 30 Minutes)
| Step | Owner | Action |
|---|---|---|
| 1.1 | Identity | Confirm noisy gateways via handshake volume z-score analysis |
| 1.2 | Identity | Time-bound noise onset - identify step-change inflection point |
| 1.3 | Identity | Peer key inventory - compare unique peer count pre/post noise onset per gateway |
| 1.4 | Identity | Identify peers exclusive to noisy gateways (never seen on quiet gateways) |
| 1.5 | Cloud | Run account triage: rank all AWS accounts by noisy-gateway affinity percentage |
| 1.6 | Cloud | Check for CloudTrail trail disruption events from noisy gateway IPs |
Decision Gate: ≥5 exclusive peers OR ≥1 Tier 0 account (>50% noisy affinity) → proceed to Phase 2. If both clean after 72h → stand down, monitor.
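Step 1.1 can be sketched as a per-gateway z-score of the current handshake count against that gateway's own pre-onset history. The report does not say whether the score is computed per gateway or across the fleet, so the per-gateway baseline, the threshold of 3.0, and the function names are all assumptions.

```python
from statistics import mean, stdev


def handshake_zscore(history, current):
    """Z-score of the current handshake count against a gateway's own history.

    Returns None when the history is too short or has no variance.
    """
    if len(history) < 2:
        return None
    sigma = stdev(history)
    if sigma == 0:
        return None
    return (current - mean(history)) / sigma


def confirm_noisy(baselines, current, threshold=3.0):
    """Gateways whose current handshake volume sits >= threshold sigmas above baseline."""
    noisy = []
    for gateway, history in baselines.items():
        z = handshake_zscore(history, current.get(gateway, 0))
        if z is not None and z >= threshold:
            noisy.append(gateway)
    return sorted(noisy)


baselines = {
    "gw-01": [100, 104, 98, 102, 96],  # hourly handshake counts before onset
    "gw-02": [100, 104, 98, 102, 96],
}
current = {"gw-01": 205, "gw-02": 103}
print(confirm_noisy(baselines, current))
```

Scoring each gateway against its own history avoids penalizing gateways that are simply busier than the fleet average.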
Phase 2 - Deep Hunt (2-4 Hours)
| Step | Owner | Action |
|---|---|---|
| 2.1 | Identity | Endpoint IP diversity on suspect peers - flag >3 IPs/24h, cross-country movement |
| 2.2 | Identity | Handshake rate profiling - burst patterns, off-hours clustering |
| 2.3 | Identity | Data volume deep dive - >5x median byte transfer vs peer group |
| 2.4 | Identity | AllowedIPs audit - route expansions, new 0.0.0.0/0 entries |
| 2.5 | Identity | Peer config provenance - verify suspect peers exist in config management |
| 2.6 | Cloud | Per-account egress deviation against 30-day baseline |
| 2.7 | Cloud | Destination profiling: ASN classification, threat intel check |
| 2.8 | Cloud | Thin-slice exfiltration detection on Tier 0-1 accounts |
| 2.9 | Cloud | IAM credential usage anomalies, new access keys from noisy gateways |
Decision Gate: ≥1 peer meets ≥3 criteria OR egress destination resolves to hosting/VPS ASN with >500MB → proceed to Phase 3.
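The Phase 2 gate is a simple OR over two conditions and can be expressed directly. In this sketch the input shapes are assumptions: `peer_criteria` maps each suspect peer to the set of Phase 2 criteria it met, and each egress finding carries a hypothetical `asn_type` label and a `bytes` total, with 500MB read as 500 x 10^6 bytes.

```python
def phase2_gate(peer_criteria, egress_findings,
                min_criteria=3, min_bytes=500 * 10**6):
    """Return True when the Phase 2 decision gate says to proceed to Phase 3.

    peer_criteria: {peer_key: set of criteria names met} (assumed shape).
    egress_findings: list of {"asn_type": str, "bytes": int} dicts (assumed shape).
    """
    peer_hit = any(len(met) >= min_criteria for met in peer_criteria.values())
    egress_hit = any(
        f["asn_type"] == "hosting" and f["bytes"] > min_bytes
        for f in egress_findings
    )
    return peer_hit or egress_hit


peers = {
    "PEER_A": {"ip_diversity", "off_hours", "volume"},
    "PEER_B": {"volume"},
}
print(phase2_gate(peers, []))
print(phase2_gate({}, [{"asn_type": "hosting", "bytes": 600 * 10**6}]))
```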
Phase 3 - Data Staging Hunt
| Step | Owner | Action |
|---|---|---|
| 3.1 | Cloud | S3 access pattern analysis - principals with bulk GetObject from historically untouched buckets |
| 3.2 | Cloud | S3 bucket modification timeline - buckets created/modified during noise window |
| 3.3 | Cloud | IAM change audit - new roles/policies/keys created from noisy gateway IPs |
| 3.4 | Cloud | Data volume estimation - total bytes exfiltrated per suspicious principal |
Decision Gate: Staging indicators found → Phase 4. If egress confirmed but no staging indicators → expand to RDS/Redshift logs.
Phase 4 - Active Deception
| Step | Owner | Action |
|---|---|---|
| 4.1 | Identity | Deploy WireGuard canary peers to noisy gateways |
| 4.2 | Identity | Deploy honey services on canary subnets |
| 4.3 | Cloud | Deploy AWS honey token credentials (IAM user + access key) |
| 4.4 | Cloud | Create decoy S3 buckets with realistic data |
| 4.5 | Cloud | Create canary CloudTrail trail for disruption monitoring |
| 4.6 | Cloud | Deploy GCP honey tokens (service account, GCS bucket) |
| 4.7 | Both | Seed honey tokens into developer-facing locations |
| 4.8 | Both | Monitor for canary activation - any trigger = confirmed adversary |
Decision Gate: Any honey token triggered → IMMEDIATE containment (Phase 5). No trigger after 7 days → stand down active deception, maintain canaries as ongoing tripwires.
Phase 5 - Containment & Evidence
| Step | Owner | Action |
|---|---|---|
| 5.1 | Identity | Revoke compromised WireGuard peers via wg_contain.sh |
| 5.2 | Cloud | Deactivate compromised IAM access keys (set status to Inactive; DO NOT delete) |
| 5.3 | Cloud | Revoke compromised role sessions with a deny policy conditioned on aws:TokenIssueTime; tighten the trust policy to block new sessions |
| 5.4 | Cloud | Apply emergency SCPs at OU level to deny data-movement API actions (if Organizations; SCPs restrict API calls, not network traffic) |
| 5.5 | Cloud | Rotate all access keys for principals in Tier 0-1 accounts |
| 5.6 | Both | Preserve evidence: CloudTrail exports, VPC Flow Logs, S3 access logs, IAM snapshots, WireGuard peer dumps |
| 5.7 | Both | Splunk forensic CSV exports for suspect peers and accounts |
| 5.8 | Both | Notify incident commander with full handoff package |
Escalation Triggers
| Trigger | Severity | Owner |
|---|---|---|
| Peer with >10 endpoint IPs in 24h OR cross-country IPs | Tier 2 | Identity |
| Peer handshake burst >10 in <60 seconds | Tier 2 | Identity |
| Peer data transfer >5x median for gateway | Tier 2 | Identity |
| Peer data transfer >500 MB in 24h | Tier 3 | Identity |
| AllowedIPs expanded to 0.0.0.0/0 | Tier 3 | Identity |
| Canary peer handshake detected | IMMEDIATE Tier 3 | Identity |
| CloudTrail StopLogging/DeleteTrail from noisy gateway IP | P0 Incident | Cloud |
| Honey access key used in AWS | IMMEDIATE Tier 3 | Cloud |
| GuardDuty Exfiltration:* finding | P0 Incident | Cloud |
| Egress to threat-intel-blocklisted IP | CRITICAL | Cloud |
| Server key compromise confirmed | CRITICAL - Full gateway rotation | Both |
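The handshake-burst trigger (>10 handshakes in <60 seconds) is a sliding-window count. A minimal sketch, assuming handshake timestamps for one peer are available as Unix seconds; `has_burst` is a hypothetical helper, not part of the package.

```python
from collections import deque


def has_burst(timestamps, max_events=10, window_s=60):
    """True if more than max_events timestamps fall within a span shorter than window_s."""
    window = deque()
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop events that are window_s or more older than the newest one.
        while ts - window[0] >= window_s:
            window.popleft()
        if len(window) > max_events:
            return True
    return False


print(has_burst([i * 5 for i in range(11)]))  # 11 handshakes spanning 50 s
print(has_burst([i * 6 for i in range(11)]))  # 11 handshakes spanning exactly 60 s
```

The deque keeps only the events inside the current window, so the check runs in one pass over the sorted timestamps.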
Account Prioritization Matrix (200+ Accounts)
| Priority | Criteria | Action Window |
|---|---|---|
| P0 - Immediate | Honey token triggered OR CloudTrail trail disrupted OR GuardDuty Exfiltration finding OR dest IP on threat intel | 15 minutes |
| P1 - Urgent | Tier 0 account (>50% noisy affinity) + egress to unknown hosting ASN | 1 hour |
| P2 - High | Tier 1 account (25-50% noisy) + egress deviation >200% OR thin-slice pattern confirmed | 4 hours |
| P3 - Medium | Tier 1-2 account + egress deviation 100-200% | 24 hours |
| P4 - Monitor | Tier 3 account (<10% noisy) but any egress anomaly present | 72 hours |
| P5 - Stand Down | No gateway affinity, no egress anomalies, no IAM changes | N/A |
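The matrix can be encoded as an ordered rule list so that triage output is reproducible across analysts. This sketch makes several assumptions that the table leaves open: the P2 row is read as (Tier 1 with >200% deviation) OR thin-slice confirmed, "any egress anomaly" is taken to mean deviation above the 50% detection threshold, and combinations the table does not list fall through to P4. All parameter names are hypothetical.

```python
def prioritize(tier, egress_dev_pct=0.0, *, p0_signal=False,
               unknown_hosting_egress=False, thin_slice=False,
               iam_changes=False):
    """Map account attributes to a priority label, P0 (highest) to P5.

    p0_signal covers any P0 criterion: honey token triggered, CloudTrail trail
    disrupted, GuardDuty Exfiltration finding, or destination on threat intel.
    """
    if p0_signal:
        return "P0"
    if tier == 0 and unknown_hosting_egress:
        return "P1"
    if (tier == 1 and egress_dev_pct > 200) or thin_slice:
        return "P2"
    if tier in (1, 2) and 100 <= egress_dev_pct <= 200:
        return "P3"
    anomaly = egress_dev_pct > 50 or iam_changes or unknown_hosting_egress
    if tier == 3 and not anomaly:
        return "P5"
    return "P4"  # assumption: unlisted combinations are monitored, not dropped


for args in [(0, 10.0), (1, 250.0), (2, 150.0), (3, 60.0), (3, 0.0)]:
    print(args, prioritize(*args, unknown_hosting_egress=(args[0] == 0)))
```

Evaluating rules from P0 downward means an account matching several rows always receives the most urgent label.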
Cross-Cutting Dependencies
| Dependency | Owner | Status |
|---|---|---|
| vpn_gateway_lookup.csv population | Identity (build_gateway_lookup.sh) | ✅ Scripted |
| WireGuard wg show all dump polling | Identity (wg_analysis.py --interval) | ✅ Scripted |
| Splunk REST API access (bearer token or username/password) | Both (splunk_search.py, validate_prereqs.py) | ✅ Scripted |
| AWS Organizations access (list-accounts) | Cloud (cloud_triage.py --org) | ✅ Scripted |
| CloudTrail LookupEvents (90-day management-event history, 50 events/call, paginated) | Cloud (cloud_triage.py) | ✅ Scripted with documented fallback |
| VPC Flow Logs in CloudWatch Logs (or S3 fallback) | Cloud (egress_hunt.py) | ✅ Scripted with documented limitation |
| S3 server access logs enabled on target buckets | Cloud | ⚠️ Must be pre-configured - scripts document this gap |
| CloudTrail data events enabled on S3 buckets | Cloud | ⚠️ Must be pre-configured - scripts document this gap |
| ip-api.com rate limiting (45 req/min) | Cloud (egress_hunt.py) | ✅ Built-in rate limiting |
| WireGuard kernel module / wg tool available on gateways | Identity | ⚠️ Required for local polling mode |
Deploy Sequence
# ==========================================
# IDENTITY SIDE
# ==========================================
cd wireguard-hunt-package/
# Step 1: Build gateway lookup (prerequisite for all correlation)
export SPLUNK_HOST=splunk.internal.example.com
export SPLUNK_TOKEN="your-bearer-token"
./build_gateway_lookup.sh --input gateways.txt --output vpn_gateway_lookup.csv
# Step 2: Run triage
python3 hunt_orchestrator.py --phase triage \
--gateway-list gateways.txt \
--output suspect_peers.json
# Step 3: If suspects found → deep hunt
python3 hunt_orchestrator.py --phase deep-hunt \
--suspect-peers suspect_peers.json \
--output deep_hunt_findings.json
# Step 4: Deploy canaries
python3 hunt_orchestrator.py --phase canary --deploy
# Step 5: Monitor (blocking)
python3 hunt_orchestrator.py --phase canary --monitor --timeout 3600
# Step 6: If confirmed → contain
python3 hunt_orchestrator.py --phase contain \
--peer-key <PUBLIC_KEY> \
--gateway-list gateways.txt \
--confirm
# ==========================================
# CLOUD SIDE (in parallel)
# ==========================================
cd cloud-hunt-package/
# Step 1: Validate environment
python3 validate_prereqs.py \
--gateway-lookup ../wireguard-hunt-package/vpn_gateway_lookup.csv
# Step 2: Triage all org accounts
python3 cloud_triage.py \
--gateway-lookup ../wireguard-hunt-package/vpn_gateway_lookup.csv \
--org --days 7 \
--output triage_results.json
# Step 3: Deep egress analysis on Tier 0/1 accounts
python3 egress_hunt.py \
--tier-file triage_results.json \
--days 14 \
--output egress_findings.json
# Step 4: Deploy honeytokens while investigation proceeds
python3 deploy_honeytokens.py --create --canary-trail
# Step 5: Monitor (blocking, run in screen/tmux)
python3 deploy_honeytokens.py --monitor --timeout 604800
# Step 6: Collect forensics if incident confirmed
./collect_forensics.sh \
--aws-account 123456789012 \
--output-dir /secure/forensics/
# Step 7: Cleanup after hunt concludes
python3 deploy_honeytokens.py --destroy
Server Key Compromise - Joint Detection Strategy
Why It Matters
A WireGuard server's private key is the cryptographic identity of the VPN gateway. An adversary who obtains this key, which in practice usually means a foothold on the gateway host, can impersonate the gateway to its peers and originate traffic from the gateway's IP address without establishing any peer session. Because they never appear as a peer, they bypass all peer-level detection (handshake analysis, endpoint IP diversity, churn detection).
Detection
| Signal | Owner | Method |
|---|---|---|
| Unauthorized wg set commands or private-key file modifications on gateway hosts | Identity | File integrity monitoring, command audit logging, Splunk query for wg set events outside maintenance windows |
| CloudTrail events from gateway IPs with zero active WireGuard peer sessions | Cloud | SPL correlation query joining CloudTrail sourceIPAddress against WireGuard peer session data |
| Anomalous sts:AssumeRole from gateway source IPs | Cloud | CloudTrail monitoring - gateway IPs should rarely/never assume IAM roles |
| WireGuard service restarts or config reloads outside maintenance windows | Identity | Systemd/init script audit logging |
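The second signal in the table (CloudTrail activity from a gateway IP while no peer session is active) is the key correlation, so a sketch outside SPL may help. It assumes `eventTime` has already been converted to Unix seconds, that handshake timestamps are pooled per gateway IP and sorted, and that a peer counts as active for 180 seconds after its last handshake. That window and the function name are assumptions, not values from the package.

```python
from bisect import bisect_right


def orphan_gateway_events(events, handshakes_by_gateway_ip, active_window_s=180):
    """CloudTrail events sourced from a gateway IP with no recent peer handshake.

    events: dicts with "eventTime" (Unix seconds, assumed pre-converted)
            and "sourceIPAddress".
    handshakes_by_gateway_ip: {gateway_ip: sorted list of handshake timestamps
            across all peers on that gateway}.
    """
    orphans = []
    for event in events:
        handshakes = handshakes_by_gateway_ip.get(event["sourceIPAddress"])
        if handshakes is None:
            continue  # source is not a known gateway IP
        # Index of the first handshake strictly after the event time.
        idx = bisect_right(handshakes, event["eventTime"])
        last = handshakes[idx - 1] if idx else None
        if last is None or event["eventTime"] - last > active_window_s:
            orphans.append(event)
    return orphans


events = [
    {"eventTime": 1000, "sourceIPAddress": "203.0.113.10", "eventName": "AssumeRole"},
    {"eventTime": 1500, "sourceIPAddress": "203.0.113.10", "eventName": "ListBuckets"},
    {"eventTime": 1200, "sourceIPAddress": "192.0.2.50", "eventName": "GetObject"},
]
handshakes = {"203.0.113.10": [900, 950]}
print([e["eventName"] for e in orphan_gateway_events(events, handshakes)])
```

Any event this returns warrants review, since legitimate API traffic through a gateway should coincide with at least one active peer session.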
Response
- Isolate affected gateway from cloud networks - VPC endpoint policy denial, security group/NACL blocking
- Revoke all credentials accessible from gateway's network path - IAM users, roles, STS tokens
- Rotate WireGuard server key - generate new keypair, distribute updated configs to all legitimate peers
- Monitor for failed handshakes with old key - indicates adversary attempting to reconnect
- Rotate ALL credentials in every account accessible from the compromised gateway's network segment
- Forensic imaging of gateway machine - preserve memory dumps, WireGuard configs, system logs before rotation
- Long-term hardening - network segmentation, EventBridge alerting on gateway IP activity, client certificate authentication as defense-in-depth
Known Gaps and Limitations
| Gap | Severity | Mitigation |
|---|---|---|
| No automated cross-correlation between Identity peer list and Cloud account tier list | Medium | Manual cross-reference in Phase 2; next iteration should add to hunt_orchestrator.py |
| S3 access logs and CloudTrail data events must be pre-enabled on target buckets | High | Documented in scripts; requires pre-configuration before hunt begins |
| CloudTrail LookupEvents covers only the last 90 days of management events and returns at most 50 results per page | Medium | Documented fallback: Athena queries against CloudTrail S3 data or Splunk for deeper history |
| VPC Flow Log analysis depends on CloudWatch Logs publishing; S3-only publishing requires pre-downloaded files | Low | Documented in egress_hunt.py |
| WireGuard server key compromise bypasses peer-level detection entirely | High | Addressed via joint detection strategy (see above); server key rotation procedure documented |
| ip-api.com free tier: 45 requests/minute, no SLA | Low | Built-in rate limiting in egress_hunt.py; optional MaxMind GeoLite2 local database path |
Package Sizes
| Package | Files | Total Lines | Python | Bash |
|---|---|---|---|---|
| Identity | 7 | ~3,800 | 4 scripts | 2 scripts |
| Cloud | 6 | ~3,200 | 4 scripts | 1 script |
| Combined | 13 | ~7,000 | 8 scripts | 3 scripts |
Identity Scripts
- wg_analysis.py - 713 lines (WireGuard peer analysis engine)
- wg_canary.py - 506 lines (canary keypair lifecycle)
- splunk_search.py - 430 lines (Splunk REST API wrapper)
- hunt_orchestrator.py - 667 lines (phase orchestrator)
- wg_contain.sh - 297 lines (emergency containment)
- build_gateway_lookup.sh - 181 lines (gateway lookup builder)
Cloud Scripts
- cloud_triage.py - 390 lines (account triage via CloudTrail)
- egress_hunt.py - 497 lines (deep egress analysis)
- deploy_honeytokens.py - 711 lines (honeytoken lifecycle)
- collect_forensics.sh - 430 lines (evidence preservation)
- validate_prereqs.py - 410 lines (pre-flight readiness)
Review Verdict
Both packages are cleared for operational deployment.
- All scripts are production-ready, not stubs
- Error handling is present throughout
- Python dependencies documented at top of each script
- Bash scripts handle Linux/macOS portability
- SPL queries preserved from initial detection library
- Playbooks and checklists are phase-structured with explicit decision gates
- Server key compromise coverage exists from both Identity (gateway-side) and Cloud (API-side) perspectives
- Escalation triggers are specific and threshold-based
- Prioritization matrix handles 200+ accounts without analyst burnout