
Scenario: Noise Floor Cloud Hunt

SITREP - Engagement Overview

Date: April 2026
Engagement Type: Active Blue Team Hunt (Active Deception Authorized)
Engagement Director: Cyber Operations Director

Initial Report

Corporate VPN network using WireGuard, connecting thousands of developers to 200+ AWS accounts (plus a handful of GCP accounts). Centralized logging in Splunk. Observed a ~2x increase in the background chatter "noise floor" - elevated network traffic volume concentrated on 25% of the VPN gateways.

Hypothesis

An adversary has established a slow, low-signal exfiltration channel buried in high-volume VPN traffic, and is deliberately inflating background noise to degrade signal-to-noise ratio in logging - making anomalous traffic indistinguishable from legitimate developer activity. Classic "hide in the noise" tradecraft.

Environment Profile

| Attribute | Value |
| --- | --- |
| VPN Technology | WireGuard |
| Cloud Providers | AWS (primary, 200+ accounts), GCP (handful) |
| Centralized Logging | Splunk |
| Noise Distribution | Clustered on 25% of gateways (not uniform) |
| Active Hunting | Authorized (canaries, honey tokens, decoys) |
| Developer Count | Thousands |

Operational Decomposition

Domain Assignment

| Domain | Specialist | Primary Scope |
| --- | --- | --- |
| Identity & Access | Identity Specialist | WireGuard peer analysis, auth patterns, credential hygiene, VPN session anomaly detection |
| Cloud | Cloud Specialist | AWS/GCP egress mapping, API call volume baselining, data staging indicators, IAM credential anomalies, cross-cloud correlation |
| Wireless | N/A (ruled out) | Noise confirmed as network-level chatter, not RF spectrum |

Key Intelligence Lever

The noise is on 25% of gateways, not uniformly distributed. This is the primary triage pivot point. The adversary is likely operating through or adjacent to those specific gateways. Accounts and peers that disproportionately route through the noisy gateways are priority investigation targets.
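
A minimal sketch of this pivot, assuming session records are available as (peer key, gateway) pairs, for example from a Splunk export. The function names and the 0.5 threshold are illustrative assumptions, not part of the shipped scripts.

```python
from collections import defaultdict

def noisy_affinity(sessions, noisy_gateways):
    """Return {peer: fraction of its sessions seen on noisy gateways}.

    sessions: iterable of (peer_key, gateway) tuples (assumed input shape).
    """
    total = defaultdict(int)
    noisy = defaultdict(int)
    for peer, gateway in sessions:
        total[peer] += 1
        if gateway in noisy_gateways:
            noisy[peer] += 1
    return {peer: noisy[peer] / total[peer] for peer in total}

def priority_targets(sessions, noisy_gateways, threshold=0.5):
    """Peers whose noisy-gateway share exceeds the (assumed) threshold,
    highest affinity first, ties broken by peer key."""
    affinity = noisy_affinity(sessions, noisy_gateways)
    hits = [p for p, share in affinity.items() if share > threshold]
    return sorted(hits, key=lambda p: (-affinity[p], p))
```

With roughly 25% of gateways noisy, a peer spread evenly across gateways would land near 0.25, so shares well above that are the ones worth a closer look.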


Identity & Access Package

Deliverables

Location: wireguard-hunt-package/

| File | Type | Lines | Description |
| --- | --- | --- | --- |
| README.md | Documentation | 1,019 | Complete hunt package: SPL queries (§1.1-1.8), 5-phase playbook with decision gates, operator checklist, script reference, server key compromise section |
| wg_analysis.py | Python | 713 | WireGuard peer analysis engine - wg show all dump parser, stateful delta computation, baseline builder, anomaly detector. Modes: --stdin, --file, --interval, --baseline, --detect |
| wg_canary.py | Python | 506 | Canary keypair lifecycle - --generate, --monitor --local, --monitor --splunk, --deploy, --cleanup |
| splunk_search.py | Python | 430 | Splunk REST API wrapper - 8 detection functions exposed as importable library and CLI subcommands. Zero pip dependencies (stdlib urllib) |
| hunt_orchestrator.py | Python | 667 | Master orchestrator - chains triage → deep-hunt → canary → contain. Imports from wg_analysis and splunk_search |
| wg_contain.sh | Bash | 297 | Emergency containment - SSH-driven peer discovery, evidence collection, revocation, verification. Dry-run mode without --confirm |
| build_gateway_lookup.sh | Bash | 181 | Gateway lookup table builder - resolves hostnames to IPs, prompts for noisy/quiet tagging, outputs Splunk-compatible CSV |

Detection Surfaces Covered

  1. Handshake Rate Spikes - Rapid WireGuard handshake cycling (>5/min) indicates automated C2 keepalive or deliberate noise injection
  2. Peer Connection Churn - Peers repeatedly appearing/disappearing in wg show snapshots (>5 cycles in 10 min) indicates automated persistence
  3. Endpoint IP Diversity - Single peer key authenticating from >3 distinct IPs in 24h indicates key exfiltration or adversary IP rotation
  4. Abnormal Handshake Timing - Off-hours handshake ratio >50%, burst patterns, uniform-interval beacons vs. human-driven patterns
  5. Gateway-Specific Noise Clustering - Peers exclusive to the noisy 25% of gateways when other gateways are available
  6. Data-Plane Volume Anomalies - Peers with >5x median byte transfer versus peer group on same gateway
  7. Configuration Drift - New peers added outside change management, AllowedIPs range expansions indicating lateral movement preparation
  8. Correlation: Noisy-Only × High Volume - Compound query combining gateway exclusivity with data transfer anomalies
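
As an illustration of surface 6, the sketch below parses `wg show all dump` output and flags peers whose cumulative byte count exceeds 5x the median on the same interface. It is an assumption-laden simplification: it is not the code in wg_analysis.py, it uses a single snapshot's cumulative counters rather than stateful deltas, and the function names are hypothetical.

```python
from statistics import median

def parse_wg_dump(text):
    """Parse `wg show all dump` output into a list of peer dicts.

    With `all`, peer lines carry 9 tab-separated fields (interface first);
    the 5-field interface lines are skipped.
    """
    peers = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 9:
            continue
        iface, pubkey, _psk, endpoint, allowed, handshake, rx, tx, _ka = fields
        peers.append({
            "interface": iface,
            "public_key": pubkey,
            "endpoint": endpoint,
            "allowed_ips": allowed.split(","),
            "latest_handshake": int(handshake),
            "rx": int(rx),
            "tx": int(tx),
        })
    return peers

def volume_outliers(peers, factor=5.0):
    """Return public keys whose rx+tx exceeds factor x the interface median."""
    by_iface = {}
    for peer in peers:
        by_iface.setdefault(peer["interface"], []).append(peer)
    flagged = []
    for group in by_iface.values():
        med = median(p["rx"] + p["tx"] for p in group)
        for p in group:
            if med > 0 and p["rx"] + p["tx"] > factor * med:
                flagged.append(p["public_key"])
    return flagged
```

Grouping by interface is a stand-in for "peer group on same gateway"; a production detector would compare deltas between snapshots so long-lived peers are not penalized for uptime.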

Canary Deployment Architecture

  • Decoy WireGuard keypairs deployed to noisy gateways with unique RFC 6598 subnets
  • Honey services (HTTP/SSH) on canary subnets logging access attempts
  • Canary peer handshake detection triggers immediate escalation
  • Seeded WireGuard private keys as "leaked" credentials in internal repos
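
One way to hand each canary a unique RFC 6598 subnet is to carve non-overlapping blocks out of 100.64.0.0/10. The sketch below is illustrative only: the /30 size, the function name, and the `in_use` parameter are assumptions, and wg_canary.py may allocate differently.

```python
import ipaddress

# RFC 6598 shared address space
CGNAT = ipaddress.ip_network("100.64.0.0/10")

def allocate_canary_subnets(gateways, prefixlen=30, in_use=()):
    """Map each gateway to a distinct subnet inside 100.64.0.0/10.

    in_use: CIDR strings already allocated in the environment (assumed input);
    any candidate overlapping one of them is skipped.
    """
    used = [ipaddress.ip_network(cidr) for cidr in in_use]
    candidates = (
        subnet
        for subnet in CGNAT.subnets(new_prefix=prefixlen)
        if not any(subnet.overlaps(u) for u in used)
    )
    # zip stops when the gateway list is exhausted
    return dict(zip(gateways, candidates))
```

Keeping one subnet per canary means any traffic to that range identifies which seeded keypair was used.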

Server Key Compromise Coverage

  • Detection of unauthorized wg set commands and config file modifications
  • Gateway impersonation detection via cloud traffic attribution
  • Service restart/configuration reload monitoring outside maintenance windows
  • Full server key rotation procedure with blast radius calculation
  • Communication plan for affected developers
  • Post-rotation validation checklist

Cloud Package

Deliverables

Location: cloud-hunt-package/

| File | Type | Lines | Description |
| --- | --- | --- | --- |
| README.md | Documentation | 740 | Complete hunt package: SPL queries, 5-phase playbook with decision gates, operator checklist, server key compromise section, script reference |
| cloud_triage.py | Python | 390 | Account triage via CloudTrail - enumerates all org accounts, cross-references CloudTrail source IPs against VPN gateway lookup, assigns Tier 0-3 based on noisy-call percentage |
| egress_hunt.py | Python | 497 | Deep egress analysis - VPC Flow Log querying, destination IP geolocation/ASN classification (ip-api.com), thin-slice exfiltration detection via coefficient of variation |
| deploy_honeytokens.py | Python | 711 | Full honeytoken lifecycle - creates IAM users/keys, S3 buckets, CloudTrail trails, GCP service accounts/GCS buckets. Monitor polls CloudTrail every 60s. Destroy cleans up all resources |
| collect_forensics.sh | Bash | 430 | Evidence preservation - CloudTrail events, S3 access logs, VPC Flow Logs, IAM snapshot, GuardDuty findings, GCP equivalents. SHA256 hashing with macOS fallback. Idempotent |
| validate_prereqs.py | Python | 410 | Pre-flight readiness - 11 checks (gateway lookup, AWS/GCP credentials, CloudTrail, GuardDuty, VPC Flow Logs, Splunk reachability). PASS/WARN/FAIL with colored output |

Detection Surfaces Covered

  1. Egress Volume Baseline Deviation - Per-account egress compared to 30-day rolling baseline, flagging >50% deviation with >100MB absolute volume
  2. Thin-Slice Exfiltration - Metronomic small transfers (50KB-5MB per 10-min bucket) with coefficient of variation <30% over >2 hours
  3. API Call Volume Inflation - Enumeration detection via List*/Describe*/Get* CloudTrail call spikes (>2x baseline, >500 calls in 15 min)
  4. Enumeration-to-Egress Correlation - Accounts with simultaneous API spike AND egress spike (staging + exfiltration signal)
  5. S3 Data Access Anomalies - Bulk GetObject from historically untouched buckets, access from non-VPN IPs
  6. IAM Credential Usage - Access keys used from non-VPN IPs, geographically impossible travel (same key used from multiple countries within 60 min), new keys created during noise window
  7. Gateway-to-Cloud Correlation - Account prioritization by proportion of CloudTrail events originating from noisy gateway IPs
  8. Data Staging Indicators - New S3 buckets with public/cross-account ACLs, CloudTrail trail disruption attempts, aggressive S3 lifecycle policies (<7 day expiration)
  9. GCP-Specific - Compute/storage enumeration spikes, service account key creation, GCS bulk access, BigQuery extract jobs
  10. Cross-Cloud Data Flows - AWS-to-GCP traffic not matching documented application dependencies
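
The thin-slice test in surface 2 can be sketched as follows, using the thresholds listed above. The input shape (bytes per consecutive 10-minute bucket for one source/destination pair) and the function name are assumptions; egress_hunt.py may bucket and score differently.

```python
from statistics import mean, pstdev

def is_thin_slice(bucket_bytes, lo=50_000, hi=5_000_000,
                  max_cv=0.30, min_buckets=12):
    """True when transfers look metronomic: every 10-minute bucket sits in
    [lo, hi] bytes, the run spans at least min_buckets buckets (2 hours),
    and the coefficient of variation (stddev / mean) is below max_cv."""
    if len(bucket_bytes) < min_buckets:
        return False
    if not all(lo <= b <= hi for b in bucket_bytes):
        return False
    avg = mean(bucket_bytes)
    return avg > 0 and pstdev(bucket_bytes) / avg < max_cv
```

A low coefficient of variation is what separates a scripted drip from human-driven transfers, which tend to be bursty even when their totals are similar.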

AWS Honey Token Architecture

  • IAM user honey-devops-bot-<suffix> with access key seeded into wikis, Slack, Git repos, CI/CD configs
  • Decoy S3 bucket prod-db-backup-snapshot-<suffix> with realistic decoy objects
  • Canary CloudTrail trail security-audit-trail-hunt monitoring for disruption
  • GCP equivalents: service account honey-monitoring-bot-<suffix>, GCS bucket prod-database-exports-<suffix>
  • Monitor polls CloudTrail every 60 seconds; any trigger exits with code 1 and alert details
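
The matching step inside such a monitor might look like the sketch below. It operates on already-fetched CloudTrail event records (the `userIdentity.accessKeyId`, `requestParameters.bucketName`, `eventName`, `sourceIPAddress`, and `eventTime` record fields) and deliberately leaves out the polling and AWS calls; the function name and return shape are hypothetical, not taken from deploy_honeytokens.py.

```python
def honeytoken_hits(records, honey_key_ids, honey_buckets):
    """Return alert dicts for CloudTrail records touching a honey token.

    records: list of CloudTrail event record dicts (already parsed JSON).
    honey_key_ids / honey_buckets: sets of seeded access key IDs and
    decoy bucket names (assumed to be tracked by the deploy step).
    """
    hits = []
    for rec in records:
        identity = rec.get("userIdentity") or {}
        params = rec.get("requestParameters") or {}  # may be null in CloudTrail
        key_hit = identity.get("accessKeyId") in honey_key_ids
        bucket_hit = params.get("bucketName") in honey_buckets
        if key_hit or bucket_hit:
            hits.append({
                "event": rec.get("eventName"),
                "source_ip": rec.get("sourceIPAddress"),
                "time": rec.get("eventTime"),
                "trigger": "access_key" if key_hit else "bucket",
            })
    return hits
```

A monitor loop built on this would call it once per poll and exit non-zero as soon as the returned list is non-empty.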

Server Key Compromise Coverage (Cloud Perspective)

  • CloudTrail events from gateway IPs with zero active WireGuard peer sessions - definitive correlation query
  • Anomalous IAM role assumption (sts:AssumeRole) from gateway source IPs
  • Gateway isolation procedures: VPC endpoint policy denial, security group/NACL blocking
  • Credential revocation for all roles accessible from compromised gateway's network path
  • Forensic imaging guidance for gateway machine
  • Long-term hardening: network segmentation, EventBridge alerting on gateway IP activity

Hunt Playbook (Unified)

Phase 1 - Triage (First 30 Minutes)

| Step | Owner | Action |
| --- | --- | --- |
| 1.1 | Identity | Confirm noisy gateways via handshake volume z-score analysis |
| 1.2 | Identity | Time-bound noise onset - identify step-change inflection point |
| 1.3 | Identity | Peer key inventory - compare unique peer count pre/post noise onset per gateway |
| 1.4 | Identity | Identify peers exclusive to noisy gateways (never seen on quiet gateways) |
| 1.5 | Cloud | Run account triage: rank all AWS accounts by noisy-gateway affinity percentage |
| 1.6 | Cloud | Check for CloudTrail trail disruption events from noisy gateway IPs |
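
Step 1.1 can be sketched as a per-gateway z-score of the current handshake count against that gateway's own pre-onset history. The input shapes, function names, and the threshold of 3 are assumptions for illustration; the packaged SPL queries may compute this differently.

```python
from statistics import mean, stdev

def gateway_zscores(baseline, current):
    """z-score of each gateway's current handshake count vs its own history.

    baseline: {gateway: [handshake counts per interval]} (>= 2 points each)
    current:  {gateway: handshake count for the interval under test}
    """
    scores = {}
    for gw, history in baseline.items():
        mu, sigma = mean(history), stdev(history)
        delta = current.get(gw, 0) - mu
        if sigma > 0:
            scores[gw] = delta / sigma
        else:
            # Flat history: any increase is treated as unbounded deviation
            scores[gw] = float("inf") if delta > 0 else 0.0
    return scores

def noisy_gateways(baseline, current, threshold=3.0):
    """Gateways whose z-score exceeds the (assumed) threshold."""
    scores = gateway_zscores(baseline, current)
    return sorted(gw for gw, z in scores.items() if z > threshold)
```

Scoring each gateway against its own baseline avoids the skew that a fleet-wide mean would pick up when a quarter of the gateways are already elevated.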

Decision Gate: ≥5 exclusive peers OR ≥1 Tier 0 account (>50% noisy affinity) → proceed to Phase 2. If both clean after 72h → stand down, monitor.
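
The gate above reduces to a small predicate. In this sketch the Tier 0, Tier 1, and Tier 3 bands follow the percentages used elsewhere in this document; the Tier 2 band (10-25%) is inferred from the gap between them and is an assumption, as are the function names.

```python
def account_tier(noisy_pct):
    """Map an account's noisy-gateway affinity (percent of CloudTrail calls
    from noisy gateway IPs) to Tier 0-3."""
    if noisy_pct > 50:
        return 0
    if noisy_pct >= 25:
        return 1
    if noisy_pct >= 10:   # assumed band, not stated explicitly in the text
        return 2
    return 3

def phase1_gate(exclusive_peers, account_noisy_pct):
    """True when triage results justify moving to Phase 2.

    exclusive_peers: peer keys seen only on noisy gateways
    account_noisy_pct: {account_id: noisy affinity percent}
    """
    tier0 = [a for a, pct in account_noisy_pct.items() if account_tier(pct) == 0]
    return len(exclusive_peers) >= 5 or len(tier0) >= 1
```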

Phase 2 - Deep Hunt (2-4 Hours)

| Step | Owner | Action |
| --- | --- | --- |
| 2.1 | Identity | Endpoint IP diversity on suspect peers - flag >3 IPs/24h, cross-country movement |
| 2.2 | Identity | Handshake rate profiling - burst patterns, off-hours clustering |
| 2.3 | Identity | Data volume deep dive - >5x median byte transfer vs peer group |
| 2.4 | Identity | AllowedIPs audit - route expansions, new 0.0.0.0/0 entries |
| 2.5 | Identity | Peer config provenance - verify suspect peers exist in config management |
| 2.6 | Cloud | Per-account egress deviation against 30-day baseline |
| 2.7 | Cloud | Destination profiling: ASN classification, threat intel check |
| 2.8 | Cloud | Thin-slice exfiltration detection on Tier 0-1 accounts |
| 2.9 | Cloud | IAM credential usage anomalies, new access keys from noisy gateways |

Decision Gate: ≥1 peer meets ≥3 criteria OR egress destination resolves to hosting/VPS ASN with >500MB → proceed to Phase 3.
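
A hedged sketch of how the "≥3 criteria" test could be evaluated per peer. Which checks count as criteria is an assumption here (one per Identity step 2.1-2.5, with thresholds taken from the detection surfaces); the dataclass and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PeerEvidence:
    endpoint_ips_24h: int      # distinct endpoint IPs in 24h (step 2.1)
    off_hours_ratio: float     # share of handshakes off-hours (step 2.2)
    bytes_vs_median: float     # peer bytes / peer-group median (step 2.3)
    allowed_ips: list          # current AllowedIPs entries (step 2.4)
    in_config_mgmt: bool       # peer exists in config management (step 2.5)

def criteria_met(peer):
    """Names of the Phase 2 criteria this peer satisfies."""
    checks = {
        "ip_diversity": peer.endpoint_ips_24h > 3,
        "off_hours": peer.off_hours_ratio > 0.5,
        "volume": peer.bytes_vs_median > 5,
        "default_route": "0.0.0.0/0" in peer.allowed_ips,
        "unmanaged": not peer.in_config_mgmt,
    }
    return [name for name, hit in checks.items() if hit]

def phase2_gate(peers, hosting_egress_mb=0.0):
    """True when any peer meets >=3 criteria, or egress to a hosting/VPS
    ASN exceeds 500 MB."""
    return any(len(criteria_met(p)) >= 3 for p in peers) or hosting_egress_mb > 500
```

Returning the criterion names, not only a count, keeps the handoff to Phase 3 self-explanatory.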

Phase 3 - Data Staging Hunt

| Step | Owner | Action |
| --- | --- | --- |
| 3.1 | Cloud | S3 access pattern analysis - principals with bulk GetObject from historically untouched buckets |
| 3.2 | Cloud | S3 bucket modification timeline - buckets created/modified during noise window |
| 3.3 | Cloud | IAM change audit - new roles/policies/keys created from noisy gateway IPs |
| 3.4 | Cloud | Data volume estimation - total bytes exfiltrated per suspicious principal |

Decision Gate: Staging indicators found → Phase 4. If egress confirmed but no staging indicators → expand to RDS/Redshift logs.

Phase 4 - Active Deception

| Step | Owner | Action |
| --- | --- | --- |
| 4.1 | Identity | Deploy WireGuard canary peers to noisy gateways |
| 4.2 | Identity | Deploy honey services on canary subnets |
| 4.3 | Cloud | Deploy AWS honey token credentials (IAM user + access key) |
| 4.4 | Cloud | Create decoy S3 buckets with realistic data |
| 4.5 | Cloud | Create canary CloudTrail trail for disruption monitoring |
| 4.6 | Cloud | Deploy GCP honey tokens (service account, GCS bucket) |
| 4.7 | Both | Seed honey tokens into developer-facing locations |
| 4.8 | Both | Monitor for canary activation - any trigger = confirmed adversary |

Decision Gate: Any honey token triggered → IMMEDIATE containment (Phase 5). No trigger after 7 days → stand down active deception, maintain canaries as ongoing tripwires.

Phase 5 - Containment & Evidence

| Step | Owner | Action |
| --- | --- | --- |
| 5.1 | Identity | Revoke compromised WireGuard peers via wg_contain.sh |
| 5.2 | Cloud | Deactivate compromised IAM access keys (set status to Inactive; DO NOT delete, so evidence is preserved) |
| 5.3 | Cloud | Revoke compromised role sessions via trust policy modification |
| 5.4 | Cloud | Apply emergency SCPs at OU level to deny egress (if Organizations) |
| 5.5 | Cloud | Rotate all access keys for principals in Tier 0-1 accounts |
| 5.6 | Both | Preserve evidence: CloudTrail exports, VPC Flow Logs, S3 access logs, IAM snapshots, WireGuard peer dumps |
| 5.7 | Both | Splunk forensic CSV exports for suspect peers and accounts |
| 5.8 | Both | Notify incident commander with full handoff package |

Escalation Triggers

| Trigger | Severity | Owner |
| --- | --- | --- |
| Peer with >10 endpoint IPs in 24h OR cross-country IPs | Tier 2 | Identity |
| Peer handshake burst >10 in <60 seconds | Tier 2 | Identity |
| Peer data transfer >5x median for gateway | Tier 2 | Identity |
| Peer data transfer >500 MB in 24h | Tier 3 | Identity |
| AllowedIPs expanded to 0.0.0.0/0 | Tier 3 | Identity |
| Canary peer handshake detected | IMMEDIATE Tier 3 | Identity |
| CloudTrail StopLogging/DeleteTrail from noisy gateway IP | P0 Incident | Cloud |
| Honey access key used in AWS | IMMEDIATE Tier 3 | Cloud |
| GuardDuty Exfiltration:* finding | P0 Incident | Cloud |
| Egress to threat-intel-blocklisted IP | CRITICAL | Cloud |
| Server key compromise confirmed | CRITICAL - Full gateway rotation | Both |

Account Prioritization Matrix (200+ Accounts)

| Priority | Criteria | Action Window |
| --- | --- | --- |
| P0 - Immediate | Honey token triggered OR CloudTrail trail disrupted OR GuardDuty Exfiltration finding OR dest IP on threat intel | 15 minutes |
| P1 - Urgent | Tier 0 account (>50% noisy affinity) + egress to unknown hosting ASN | 1 hour |
| P2 - High | Tier 1 account (25-50% noisy) + egress deviation >200% OR thin-slice pattern confirmed | 4 hours |
| P3 - Medium | Tier 1-2 account + egress deviation 100-200% | 24 hours |
| P4 - Monitor | Tier 3 account (<10% noisy) but any egress anomaly present | 72 hours |
| P5 - Stand Down | No gateway affinity, no egress anomalies, no IAM changes | N/A |

Cross-Cutting Dependencies

| Dependency | Owner | Status |
| --- | --- | --- |
| vpn_gateway_lookup.csv population | Identity (build_gateway_lookup.sh) | Scripted |
| WireGuard wg show all dump polling | Identity (wg_analysis.py --interval) | Scripted |
| Splunk REST API access (bearer token or username/password) | Both (splunk_search.py, validate_prereqs.py) | Scripted |
| AWS Organizations access (list-accounts) | Cloud (cloud_triage.py --org) | Scripted |
| CloudTrail LookupEvents (7-day limit, 50/call paginated) | Cloud (cloud_triage.py) | Scripted with documented fallback |
| VPC Flow Logs in CloudWatch Logs (or S3 fallback) | Cloud (egress_hunt.py) | Scripted with documented limitation |
| S3 server access logs enabled on target buckets | Cloud | ⚠️ Must be pre-configured - scripts document this gap |
| CloudTrail data events enabled on S3 buckets | Cloud | ⚠️ Must be pre-configured - scripts document this gap |
| ip-api.com rate limiting (45 req/min) | Cloud (egress_hunt.py) | Built-in rate limiting |
| WireGuard kernel module / wg tool available on gateways | Identity | ⚠️ Required for local polling mode |
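
For the ip-api.com dependency, client-side throttling to 45 requests per minute can be done with a sliding-window limiter like the sketch below. This is an illustrative implementation, not the one in egress_hunt.py; the class name and the injectable `clock`/`sleep` parameters are assumptions made so the logic can be tested without waiting.

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most max_calls within any sliding window of `period` seconds."""

    def __init__(self, max_calls=45, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls inside the current window

    def acquire(self):
        """Block until a call is permitted, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.calls[0]))
```

Calling `acquire()` immediately before each geolocation lookup keeps the hunt under the free-tier limit without scattering sleeps through the analysis code.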

Deploy Sequence

# ==========================================
# IDENTITY SIDE
# ==========================================
cd wireguard-hunt-package/

# Step 1: Build gateway lookup (prerequisite for all correlation)
export SPLUNK_HOST=splunk.internal.example.com
export SPLUNK_TOKEN="your-bearer-token"
./build_gateway_lookup.sh --input gateways.txt --output vpn_gateway_lookup.csv

# Step 2: Run triage
python3 hunt_orchestrator.py --phase triage \
    --gateway-list gateways.txt \
    --output suspect_peers.json

# Step 3: If suspects found → deep hunt
python3 hunt_orchestrator.py --phase deep-hunt \
    --suspect-peers suspect_peers.json \
    --output deep_hunt_findings.json

# Step 4: Deploy canaries
python3 hunt_orchestrator.py --phase canary --deploy

# Step 5: Monitor (blocking)
python3 hunt_orchestrator.py --phase canary --monitor --timeout 3600

# Step 6: If confirmed → contain
python3 hunt_orchestrator.py --phase contain \
    --peer-key <PUBLIC_KEY> \
    --gateway-list gateways.txt \
    --confirm

# ==========================================
# CLOUD SIDE (in parallel)
# ==========================================
cd cloud-hunt-package/

# Step 1: Validate environment
python3 validate_prereqs.py \
    --gateway-lookup ../wireguard-hunt-package/vpn_gateway_lookup.csv

# Step 2: Triage all org accounts
python3 cloud_triage.py \
    --gateway-lookup ../wireguard-hunt-package/vpn_gateway_lookup.csv \
    --org --days 7 \
    --output triage_results.json

# Step 3: Deep egress analysis on Tier 0/1 accounts
python3 egress_hunt.py \
    --tier-file triage_results.json \
    --days 14 \
    --output egress_findings.json

# Step 4: Deploy honeytokens while investigation proceeds
python3 deploy_honeytokens.py --create --canary-trail

# Step 5: Monitor (blocking, run in screen/tmux)
python3 deploy_honeytokens.py --monitor --timeout 604800

# Step 6: Collect forensics if incident confirmed
./collect_forensics.sh \
    --aws-account 123456789012 \
    --output-dir /secure/forensics/

# Step 7: Cleanup after hunt concludes
python3 deploy_honeytokens.py --destroy

Server Key Compromise - Joint Detection Strategy

Why It Matters

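A WireGuard server's private key is the cryptographic identity of the VPN gateway. An adversary who obtains this key can impersonate the gateway to its peers and, from an on-path position, intercept or inject traffic in newly negotiated sessions (WireGuard's ephemeral key exchange provides forward secrecy, so previously recorded sessions stay protected). Obtaining the key also generally implies host-level access to the gateway, which lets the adversary originate traffic from the gateway's IP address without establishing any peer session. This bypasses all peer-level detection (handshake analysis, endpoint IP diversity, churn detection) because the adversary never appears as a peer.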

Detection

| Signal | Owner | Method |
| --- | --- | --- |
| Unauthorized wg set commands or private-key file modifications on gateway hosts | Identity | File integrity monitoring, command audit logging, Splunk query for wg set events outside maintenance windows |
| CloudTrail events from gateway IPs with zero active WireGuard peer sessions | Cloud | SPL correlation query joining CloudTrail sourceIPAddress against WireGuard peer session data |
| Anomalous sts:AssumeRole from gateway source IPs | Cloud | CloudTrail monitoring - gateway IPs should rarely/never assume IAM roles |
| WireGuard service restarts or config reloads outside maintenance windows | Identity | Systemd/init script audit logging |
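
The second signal is implemented in the package as an SPL join; the same logic can be sketched in Python for offline review. The assumptions are loud ones: input shapes are hypothetical, and a gateway is treated as having an active session if any peer handshake occurred within the preceding 180 seconds (chosen to match WireGuard's session lifetime, but tunable).

```python
from bisect import bisect_right

def orphan_events(events, handshakes, gateway_ips, session_window=180):
    """CloudTrail events from gateway IPs with no recent peer handshake.

    events:      [(epoch_seconds, source_ip, event_name), ...]
    handshakes:  {gateway_ip: sorted list of handshake epoch seconds}
    gateway_ips: set of gateway public IPs from the lookup table
    """
    orphans = []
    for ts, ip, name in events:
        if ip not in gateway_ips:
            continue
        times = handshakes.get(ip, [])
        idx = bisect_right(times, ts)          # handshakes at or before ts
        last = times[idx - 1] if idx else None
        if last is None or ts - last > session_window:
            orphans.append((ts, ip, name))
    return orphans
```

Any event returned here means API activity left the gateway's IP while no peer could have been tunnelling through it, which is the pattern expected from an adversary operating on the gateway itself.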

Response

  1. Isolate affected gateway from cloud networks - VPC endpoint policy denial, security group/NACL blocking
  2. Revoke all credentials accessible from gateway's network path - IAM users, roles, STS tokens
  3. Rotate WireGuard server key - generate new keypair, distribute updated configs to all legitimate peers
  4. Monitor for failed handshakes with old key - indicates adversary attempting to reconnect
  5. Rotate ALL credentials in every account accessible from the compromised gateway's network segment
  6. Forensic imaging of gateway machine - preserve memory dumps, WireGuard configs, system logs; capture these before performing the key rotation in step 3
  7. Long-term hardening - network segmentation, EventBridge alerting on gateway IP activity, client certificate authentication as defense-in-depth

Known Gaps and Limitations

| Gap | Severity | Mitigation |
| --- | --- | --- |
| No automated cross-correlation between Identity peer list and Cloud account tier list | Medium | Manual cross-reference in Phase 2; next iteration should add to hunt_orchestrator.py |
| S3 access logs and CloudTrail data events must be pre-enabled on target buckets | High | Documented in scripts; requires pre-configuration before hunt begins |
| CloudTrail LookupEvents limited to 7 days history and 50 results per page | Medium | Documented fallback: Athena queries against CloudTrail S3 data or Splunk for deeper history |
| VPC Flow Log analysis depends on CloudWatch Logs publishing; S3-only publishing requires pre-downloaded files | Low | Documented in egress_hunt.py |
| WireGuard server key compromise bypasses peer-level detection entirely | High | Addressed via joint detection strategy (see above); server key rotation procedure documented |
| ip-api.com free tier: 45 requests/minute, no SLA | Low | Built-in rate limiting in egress_hunt.py; optional MaxMind GeoLite2 local database path |

Package Sizes

| Package | Files | Total Lines | Python | Bash |
| --- | --- | --- | --- | --- |
| Identity | 7 | ~3,800 | 4 scripts | 2 scripts |
| Cloud | 6 | ~3,200 | 4 scripts | 1 script |
| Combined | 13 | ~7,000 | 8 scripts | 3 scripts |

Identity Scripts

  • wg_analysis.py - 713 lines (WireGuard peer analysis engine)
  • wg_canary.py - 506 lines (canary keypair lifecycle)
  • splunk_search.py - 430 lines (Splunk REST API wrapper)
  • hunt_orchestrator.py - 667 lines (phase orchestrator)
  • wg_contain.sh - 297 lines (emergency containment)
  • build_gateway_lookup.sh - 181 lines (gateway lookup builder)

Cloud Scripts

  • cloud_triage.py - 390 lines (account triage via CloudTrail)
  • egress_hunt.py - 497 lines (deep egress analysis)
  • deploy_honeytokens.py - 711 lines (honeytoken lifecycle)
  • collect_forensics.sh - 430 lines (evidence preservation)
  • validate_prereqs.py - 410 lines (pre-flight readiness)

Review Verdict

Both packages are cleared for operational deployment.

  • All scripts are production-ready, not stubs
  • Error handling is present throughout
  • Python dependencies documented at top of each script
  • Bash scripts handle Linux/macOS portability
  • SPL queries preserved from initial detection library
  • Playbooks and checklists are phase-structured with explicit decision gates
  • Server key compromise coverage exists from both Identity (gateway-side) and Cloud (API-side) perspectives
  • Escalation triggers are specific and threshold-based
  • Prioritization matrix scales triage to 200+ accounts by ranking them into P0-P5 bands with explicit action windows