Zero Trust Firewalls: Hardening AWS VPCs Against Lateral Movement

If you’d like to play around please clone my repo at https://github.com/cybersecbella/zero-trust-aws

git clone https://github.com/cybersecbella/zero-trust-aws.git

Firewalls

Firewalls act as the bodyguard at the entrance of a network. When a data packet reaches the firewall, its contents are checked against the firewall’s rules, and both incoming and outgoing traffic are inspected at this point. Based on the rules configured for a specific network, the firewall returns an allow or a deny. This is also known as packet filtering: deciding which packets can go in and out of a network.

Firewalls can be either stateless (they do not remember past interactions and do not store session data) or stateful (they do remember past interactions and store session data). Proxy firewalls sit between the Internet and the private network; they can inspect the contents of a packet and allow or deny entry to the private network based on that content. A next-generation firewall (NGFW) adds capabilities such as an intrusion prevention system (IPS), which recognizes the patterns of an attack and blocks the packet from entering the network.

Firewall rules have:

Source address: the IP address of the machine that originates the traffic
Destination address: the IP address of the machine that receives the data
Port: the port number for the traffic
Protocol: the protocol used during the communication
Action: the action taken when traffic matching the rule is identified
Direction: whether the rule applies to incoming or outgoing traffic

Directions of rules:

Inbound rules: rules apply to incoming traffic
Outbound rules: rules apply to outgoing traffic
Forward rules: forward specific traffic inside the network.

Firewalls can:

Allow: the packet does not violate any rule; accept the data packet
Deny: the packet violates a rule; block it from entering
Forward: send traffic to a different network segment using forwarding rules; the firewall acts as a gateway between network segments

fwall1

A firewall log is an example of a network-centric log: it records what is happening between devices (source and destination IPs, protocols, actions taken) and tells the story of an attacker’s whereabouts.

fwall2

What this means: the same external IP (203.0.113.10) is trying to connect to multiple ports on the same internal machine in quick succession. The attacker is performing a port scan, looking for an open service to target.

You can check your inbound and outbound rules in Windows Defender Firewall.
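The port-scan pattern described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the log has already been parsed into (timestamp, source IP, destination IP, destination port) tuples. The sample entries, the find_port_scans name, and both thresholds are illustrative assumptions, not output from a real firewall.

```python
from collections import defaultdict

# Hypothetical, pre-parsed firewall log entries:
# (seconds since start, source IP, destination IP, destination port)
SAMPLE_LOG = [
    (0.0, "203.0.113.10", "10.0.0.5", 21),
    (0.4, "203.0.113.10", "10.0.0.5", 22),
    (0.9, "203.0.113.10", "10.0.0.5", 23),
    (1.3, "203.0.113.10", "10.0.0.5", 80),
    (1.8, "203.0.113.10", "10.0.0.5", 443),
    (2.0, "198.51.100.7", "10.0.0.5", 443),
]


def find_port_scans(entries, min_ports=5, window_seconds=10.0):
    """Return (src, dst) pairs that hit min_ports distinct ports within the window."""
    by_pair = defaultdict(list)
    for ts, src, dst, port in entries:
        by_pair[(src, dst)].append((ts, port))

    suspects = []
    for pair, hits in by_pair.items():
        hits.sort()
        start = 0
        for end in range(len(hits)):
            # Slide the window start forward until it spans at most window_seconds.
            while hits[end][0] - hits[start][0] > window_seconds:
                start += 1
            distinct_ports = {port for _, port in hits[start:end + 1]}
            if len(distinct_ports) >= min_ports:
                suspects.append(pair)
                break
    return suspects


print(find_port_scans(SAMPLE_LOG))
```

The second source IP only touches one port, so it is not reported. In practice both thresholds would need tuning against normal traffic.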

nftables – engine

iptables – legacy interface to roughly that same engine

firewalld/ufw – wrappers

Tool	Layer	Role	Output
iptables	netfilter (legacy interface)	Direct packet-filtering via tables/chains	rules with packet/byte counters, protocol, source/dest, target (ACCEPT/DROP/REJECT)
nftables	netfilter (modern engine)	unified filtering engine	ruleset dump: tables/chains/rules in nested, near-JSON-like syntax
firewalld	Management layer (on top of nftables/iptables)	Zone-based dynamic firewall management	Zone summary: interfaces, allowed services, open ports, rich rules
ufw	Management layer (on top of nftables/iptables)	Simple allow/deny	Simple status table: rule number, port/protocol, action (ALLOW/DENY), source

Zeek

Network monitoring is the process of analyzing a service’s availability/uptime, performance, and network traffic configuration to look for potential threats to the network; it also covers troubleshooting and finding the root cause of an issue. Network security monitoring adds a focus on network traffic and suspicious events. Zeek is a tool used for network monitoring.

Zeek can analyze events and logs to give an analysis of potential threats and actions needed. The Event Engine layer processes the packets and describes each event, breaking it into parts (source and destination addresses, protocol identification, session analysis, and file extraction).

The Policy Script Interpreter layer is where the events are analyzed and correlated using Zeek scripts. Zeek produces log files that help with network monitoring, intrusion detection, and threat hunting. It takes in a pcap file, analyzes the traffic in the packets, and produces logs. Zeek signatures (files with a .sig extension) can match patterns and chain multiple events. There’s much more to learn about Zeek, such as signatures and frameworks for file extraction, hashes, intelligence, and others.

sudo su #need superuser privileges
zeekctl #to start zeek

Run zeek service

sudo su
zeek -C -r sample.pcap #be in the folder that has the pcap; -C ignores checksum errors #generates logs
ls -l #list logs shown

Generates logs for a pcap file

zeek2

Once the logs are generated, you can investigate them with zeek-cut

cat dhcp.log | zeek-cut host_name #available hostnames
cat dns.log | zeek-cut query | sort -u | wc -l #number of unique DNS queries
cat conn.log | zeek-cut duration | sort -n -r | head -n 1 #longest connection duration

zeek3a

The hostname can be found in dhcp.log

conn.log is the backbone: it shows who talked to whom, so start investigating here. Protocol-specific logs (dns, http, ssl, ssh) show what the conversation was about. Suspicious or malformed traffic is flagged in weird.log and notice.log.

The table below describes the most useful logs in detail.

Log File	What it captures	Key Fields
conn.log	Summary of every network connection -- master log	Timestamps, source/dest IP & port, protocol, duration, bytes sent/received, connection state
dns.log	DNS queries and responses	Query name, query type, response codes, answers (resolved IPs), TTL (Time to Live, expiration date)
http.log	HTTP request/response transactions	Method, URI, host, user-agent, status code, MIME type, referrer
ssl.log	TLS/SSL handshake details	Server name (SNI), cipher suite, certificate validation status, TLS version
x509.log	Details of X.509 certificates seen on the wire	Certificate subject/issuer, validity period, key usage, serial number
files.log	Metadata on files extracted/observed in transit (e.g., downloads, attachments)	Filename, MIME type, file size, source protocol, hash (MD5/SHA1/SHA256 if extraction enabled)
weird.log	Anomalies or unexpected protocol behavior that doesn't fit normal parsing	Anomaly name, connection it's tied to, notice flag
notice.log	Alerts generated by Zeek's policy/scripts when something noteworthy happens	Notice type, message, associated connection, severity
ssh.log	SSH session metadata (not decrypted content)	Auth success/failure, client/server software version, SSH protocol version
ftp.log	FTP session activity	Commands issued, usernames, file transfer details
dhcp.log	DHCP lease assignments	Client MAC, assigned IP, lease time, hostname requested
kerberos.log	Kerberos authentication exchanges	Client/service principal, ticket validity, encryption type
ntlm.log	NTLM authentication attempts	Username, domain, hostname, success/failure
rdp.log	RDP session metadata	Client/server info, encryption level, cookie
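The same conn.log question (longest connection) can also be answered from Python, which is handy once the analysis outgrows zeek-cut. This is a minimal sketch, assuming Zeek's default tab-separated log format with a #fields header line. The sample log is hand-written and trimmed to a few columns for illustration, and parse_zeek_log and longest_connection are hypothetical helper names.

```python
# Hand-written sample in Zeek's TSV layout, trimmed to six columns (illustrative only).
SAMPLE_CONN_LOG = "\n".join([
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_h\tid.resp_h\tproto\tduration",
    "#types\ttime\tstring\taddr\taddr\tenum\tinterval",
    "1700000000.0\tC1\t10.0.1.5\t10.0.2.9\ttcp\t1.25",
    "1700000001.0\tC2\t10.0.1.5\t10.0.2.10\ttcp\t332.10",
    "1700000002.0\tC3\t10.0.1.6\t10.0.2.9\tudp\t-",
    "#close\t2023-11-14-22-13-22",
])


def parse_zeek_log(text):
    """Turn a Zeek TSV log into a list of dicts keyed by the #fields names."""
    fields, records = [], []
    for line in text.splitlines():
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            records.append(dict(zip(fields, line.split("\t"))))
    return records


def longest_connection(records):
    """Return the record with the largest duration, skipping unset ('-') values."""
    timed = [r for r in records if r.get("duration", "-") != "-"]
    return max(timed, key=lambda r: float(r["duration"])) if timed else None


records = parse_zeek_log(SAMPLE_CONN_LOG)
print(longest_connection(records))
```

Zeek writes "-" for unset fields, which is why the duration filter is needed before converting to float.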

Zero trust principles

Never trust, always verify – treat every packet as untrusted regardless of source IP or VPC origin

Least privilege access – restrict Security Group (SG) rules to the exact ports/protocols needed (minimum access), not broad CIDR ranges. CIDR (Classless Inter-Domain Routing) is a notation for allocating IP addresses and routing internet traffic (ex. 192.168.1.0/24)
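To make the "broad CIDR range" point concrete, the standard-library ipaddress module can show how many addresses a rule's CIDR actually admits. This is a small illustrative sketch; cidr_size and is_open_to_world are hypothetical helper names, and the example ranges are arbitrary.

```python
import ipaddress


def cidr_size(cidr):
    """Number of addresses covered by a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


def is_open_to_world(cidr):
    """True for 0.0.0.0/0 and ::/0, which match every address."""
    return ipaddress.ip_network(cidr).prefixlen == 0


for cidr in ("10.0.1.15/32", "10.0.1.0/24", "10.0.0.0/16", "0.0.0.0/0"):
    print(cidr, cidr_size(cidr), is_open_to_world(cidr))
```

A /32 admits exactly one host, while 0.0.0.0/0 admits the whole IPv4 internet, which is why least privilege favours the narrowest range that still works.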

Implementing Zero Trust Principles

Microsegmentation — isolate workloads into separate subnets or VPCs to limit blast radius
Assume breach — design controls assuming an attacker is already inside the perimeter
Explicit allow-listing — default deny all, then grant specific access rather than blocking known threats
Identity-aware access — tie network rules to IAM roles and instance identities, not just IP addresses
Continuous verification — enforce re-authentication and re-authorization on session state changes

AWS Console

AWS Management Console – web dashboard for AWS’s cloud platform where you can spin up virtual servers, storage, or databases (EC2 instances, S3 storage buckets, IAM users and roles)

VPC (Virtual Private Cloud) – isolated virtual private network built within a public cloud network (like AWS, Google Cloud, etc.) where you can launch, connect, and manage your computing resources in a secure environment
EC2 (Elastic Compute Cloud) Instance – a scalable virtual server in the AWS cloud that allows you to rent/run computing capacity; can deploy applications globally

Security Groups vs NACLs – both are firewalls, but they work differently

Security Groups are stateful firewalls that control inbound and outbound traffic at the instance level – return traffic is automatically allowed

NACLs (Network Access Control Lists) are stateless firewalls that act at the subnet level to control traffic entering and leaving a subnet – both directions must be explicitly permitted

The table below compares and contrasts SGs and NACLs.

Security Groups	NACLs
stateful – remember past interactions	stateless – forget past interactions
layer: instance (elastic network interface)	layer: subnet
allow-only model – cannot block specific IPs	explicit allow/deny rules – can block specific IPs
evaluate all rules	process rules in numerical order, stop at first match
use for east-west traffic (traffic within an internal network)	use for perimeter control (ex. blocking known bad CIDRs)

Security Groups have rule limits (by default, 60 inbound and 60 outbound rules per group). Combining both creates defense-in-depth: a misconfigured Security Group doesn’t automatically expose a subnet.
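The difference in rule evaluation is easy to see in a toy model. The sketch below is a simplified simulation, not AWS code: it only matches on source CIDR and a single port, ignores protocols and return traffic, and the function names and rule tuples are illustrative assumptions.

```python
import ipaddress


def _matches(cidr, rule_port, src_ip, port):
    return port == rule_port and ipaddress.ip_address(src_ip) in ipaddress.ip_network(cidr)


def nacl_decision(rules, src_ip, port):
    """NACL model: rules are (number, cidr, port, action); lowest number wins, default deny."""
    for _number, cidr, rule_port, action in sorted(rules, key=lambda r: r[0]):
        if _matches(cidr, rule_port, src_ip, port):
            return action  # stop at the first matching rule
    return "deny"


def sg_decision(rules, src_ip, port):
    """Security Group model: rules are (cidr, port); allow-only, any match allows."""
    return "allow" if any(_matches(c, p, src_ip, port) for c, p in rules) else "deny"


NACL_RULES = [
    (100, "203.0.113.0/24", 443, "deny"),   # block a known bad range first
    (200, "0.0.0.0/0", 443, "allow"),
]
SG_RULES = [("0.0.0.0/0", 443)]

print(nacl_decision(NACL_RULES, "203.0.113.10", 443))
print(sg_decision(SG_RULES, "203.0.113.10", 443))
```

The NACL can carve the bad range out of an otherwise open rule because of rule ordering; the Security Group has no deny action, so the same address is allowed.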

Project Architecture

Purpose: revolves around the Zero Trust principle – never implicitly trust traffic, every identity and rule has to be continuously verified

Building: A zero trust security automation toolkit for AWS with 5 tools

(1) SG Auditor (auditor/sg_auditor.py)

Purpose: Scans every Security Group in every AWS region in your account and flags any rule that allows unrestricted internet access (0.0.0.0/0 or ::/0) on admin ports like SSH (22) or RDP (3389).

Why it matters: An SSH port (22) left open to the entire internet is one of the most common ways AWS instances get compromised.

What it produces: ASFF-formatted findings (Amazon Security Finding Format – JSON schema used by AWS Security Hub to aggregate compliance findings) posted directly to AWS Security Hub, which is AWS’s centralised security dashboard. It also exits with code 1 if findings exist, so it can act as a gate in CI — if someone accidentally opens port 22 to the world, the daily audit workflow fails and you get notified.
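The core check can be sketched without any AWS calls. This is a minimal sketch and not the repo's sg_auditor.py: it assumes a Security Group dict shaped like the EC2 DescribeSecurityGroups response (IpPermissions, IpRanges, Ipv6Ranges), and the find_open_admin_rules name, the finding fields, and the sample group are illustrative assumptions.

```python
ADMIN_PORTS = {22: "SSH", 3389: "RDP"}
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_admin_rules(security_group):
    """Flag ingress rules that expose an admin port to the whole internet."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
        cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
        open_cidrs = [c for c in cidrs if c in OPEN_CIDRS]
        if not open_cidrs:
            continue
        proto = perm.get("IpProtocol")
        if proto == "-1":            # "-1" means all protocols and all ports
            low, high = 0, 65535
        elif proto in ("tcp", "udp"):
            low, high = perm.get("FromPort"), perm.get("ToPort")
            if low is None or high is None:
                continue
        else:                        # e.g. icmp: no port range to check
            continue
        for port, service in ADMIN_PORTS.items():
            if low <= port <= high:
                for cidr in open_cidrs:
                    findings.append({"group_id": security_group.get("GroupId"),
                                     "port": port, "service": service, "cidr": cidr})
    return findings


SAMPLE_SG = {  # hypothetical group, shaped like one entry of "SecurityGroups"
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
        {"IpProtocol": "tcp", "FromPort": 3389, "ToPort": 3389,
         "IpRanges": [{"CidrIp": "10.0.0.0/8"}], "Ipv6Ranges": []},
    ],
}

findings = find_open_admin_rules(SAMPLE_SG)
exit_code = 1 if findings else 0   # non-zero exit is what makes a CI gate fail
print(findings, exit_code)
```

Only the SSH rule is reported: port 443 is not an admin port, and the RDP rule is restricted to an internal range.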

(2) Ansible automation: Ansible Hardening + AI Rule Reviewer

ansible/ – a set of Ansible playbooks that harden EC2 instances to the CIS (Center for Internet Security) Level 2 benchmark. CIS Level 2 is a published standard of 200+ security controls for Linux servers, such as disabling unused filesystems, hardening the SSH config, enabling auditd, enforcing strict file permissions, and installing AIDE integrity checking.

The playbooks run in sequence: pre-scan → harden → post-scan → AI review

reviewer/sg_diff.py — the AI rule reviewer

Ansible runs > script takes a snapshot of the Security Group rules before and after the hardening run > sends the changed rules to Claude with context about what the instance does (its role, environment, owner tags) > Claude returns a structured JSON assessment of whether any changes look dangerous or unintended

Why the two together: Ansible can apply known-good configurations, but it can’t reason about whether a specific rule change makes sense for a specific workload. The AI rule reviewer can: for example, it can say “this rule opens port 5985 (WinRM) on a Linux web server, which makes no sense.”

What it produces: A JSON findings file with severity, rule_id, finding, and suggested_fix fields. If any finding is critical, the Ansible play fails and the hardening is blocked
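The before/after comparison and the "fail on critical" gate can be sketched as follows. This is a hedged illustration rather than the actual sg_diff.py: the input dicts follow the DescribeSecurityGroups rule shape, the finding fields are the ones named above, and flatten_rules, diff_rules, has_critical, and the sample data are hypothetical.

```python
def flatten_rules(security_group):
    """Flatten ingress rules into a set of (protocol, from_port, to_port, cidr) tuples."""
    rules = set()
    for perm in security_group.get("IpPermissions", []):
        base = (perm.get("IpProtocol"), perm.get("FromPort"), perm.get("ToPort"))
        for r in perm.get("IpRanges", []):
            rules.add(base + (r["CidrIp"],))
        for r in perm.get("Ipv6Ranges", []):
            rules.add(base + (r["CidrIpv6"],))
    return rules


def diff_rules(before, after):
    """Return the rules added and removed between two snapshots of one group."""
    old, new = flatten_rules(before), flatten_rules(after)
    return {"added": sorted(new - old, key=str), "removed": sorted(old - new, key=str)}


def has_critical(findings):
    """Gate: True if any reviewer finding is marked critical."""
    return any(f.get("severity") == "critical" for f in findings)


def _rule(port, cidr):
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}]}


BEFORE = {"IpPermissions": [_rule(22, "0.0.0.0/0"), _rule(3389, "0.0.0.0/0")]}
AFTER = {"IpPermissions": [_rule(22, "0.0.0.0/0")]}

changes = diff_rules(BEFORE, AFTER)
print(changes)
```

Only the changed rules would need to be sent for review, which keeps the prompt small and the assessment focused.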

(3) Zeek Log Analyzer (zeek/)

Zeek is a network analysis framework that runs on your VPC traffic (via VPC Traffic Mirroring) and writes structured logs. This tool reads those logs and looks for attack patterns.

zeek/analyzer.py — the entry point. Loads conn.log and dns.log from disk, hands them to the detection modules, collects findings, and writes NDJSON (Newline Delimited JSON) output.

zeek/detections/lateral_movement.py — looks for signs an attacker is moving through your network after an initial compromise:

Fan-out: one internal host making SSH/RDP connections to 5+ other internal hosts (credential spray or worm)
Sequential sweep: connections to incrementally addressed IPs (automated scanning)
Admin from workstation: successful SSH connections between internal hosts where the source isn’t a designated jump host. This violates zero trust, since all admin access should go through SSM (AWS Systems Manager, an operations hub for managing EC2 instances) or a bastion
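The fan-out check is the simplest of the three to sketch. The code below is an illustration under stated assumptions, not the repo's lateral_movement.py: records are dicts keyed by Zeek conn.log field names, "internal" is taken to mean an assumed VPC range of 10.0.0.0/16, and find_fan_out is a hypothetical name. The threshold of 5 hosts comes from the description above.

```python
import ipaddress
from collections import defaultdict

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")  # assumed VPC range
ADMIN_PORTS = {22, 3389}                        # SSH and RDP


def is_internal(ip):
    return ipaddress.ip_address(ip) in VPC_CIDR


def find_fan_out(conns, threshold=5):
    """Map each internal source to its internal SSH/RDP targets if it reached >= threshold hosts."""
    targets = defaultdict(set)
    for conn in conns:
        src, dst = conn["id.orig_h"], conn["id.resp_h"]
        if int(conn["id.resp_p"]) in ADMIN_PORTS and is_internal(src) and is_internal(dst):
            targets[src].add(dst)
    return {src: sorted(dsts) for src, dsts in targets.items() if len(dsts) >= threshold}


# Hypothetical records: one host touching five peers over SSH, another touching two.
SAMPLE_CONNS = [
    {"id.orig_h": "10.0.1.5", "id.resp_h": f"10.0.2.{i}", "id.resp_p": "22"}
    for i in range(10, 15)
] + [
    {"id.orig_h": "10.0.1.6", "id.resp_h": "10.0.2.10", "id.resp_p": "22"},
    {"id.orig_h": "10.0.1.6", "id.resp_h": "10.0.2.11", "id.resp_p": "3389"},
]

print(find_fan_out(SAMPLE_CONNS))
```

An explicit VPC range is used instead of a generic "private address" test so that the definition of internal stays under your control.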

zeek/detections/data_staging.py — looks for signs an attacker is collecting data before exfiltrating it:

Volume spike: a single source→destination pair transfers more than 100 MB (database dump, file copy)
Fan-in: many internal hosts all sending large transfers to one destination (staging host)
Unusual protocol bulk: large transfers on non-standard ports (custom exfil tooling)
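As an illustration of the volume-spike idea, the sketch below sums bytes sent per source and destination pair and reports pairs above 100 MB. It is not the repo's data_staging.py: it assumes records keyed by Zeek conn.log field names (orig_bytes is the byte count sent by the originator), reads "100 MB" as 100 MiB, and uses a hypothetical find_volume_spikes name.

```python
from collections import defaultdict

THRESHOLD_BYTES = 100 * 1024 * 1024  # "100 MB" interpreted as MiB (an assumption)


def find_volume_spikes(conns, threshold=THRESHOLD_BYTES):
    """Return {(src, dst): total_bytes} for pairs whose summed orig_bytes exceed the threshold."""
    totals = defaultdict(int)
    for conn in conns:
        sent = conn.get("orig_bytes", "-")
        if sent in ("-", ""):        # Zeek writes "-" for unset fields
            continue
        totals[(conn["id.orig_h"], conn["id.resp_h"])] += int(sent)
    return {pair: total for pair, total in totals.items() if total > threshold}


SIXTY_MIB = 60 * 1024 * 1024
SAMPLE_CONNS = [  # hypothetical records
    {"id.orig_h": "10.0.1.5", "id.resp_h": "10.0.2.9", "orig_bytes": str(SIXTY_MIB)},
    {"id.orig_h": "10.0.1.5", "id.resp_h": "10.0.2.9", "orig_bytes": str(SIXTY_MIB)},
    {"id.orig_h": "10.0.1.6", "id.resp_h": "10.0.2.9", "orig_bytes": "4096"},
    {"id.orig_h": "10.0.1.7", "id.resp_h": "10.0.2.9", "orig_bytes": "-"},
]

print(find_volume_spikes(SAMPLE_CONNS))
```

Summing across connections matters here: neither 60 MiB transfer crosses the threshold alone, but the pair does in total.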

zeek/detections/dns_entropy.py — looks for malicious DNS behaviour:

High-entropy subdomains: random-looking subdomain labels that indicate DGA (Domain Generation Algorithm) malware trying to find its C2 server
Query volume spike: one host making hundreds of DNS queries per hour (C2 beaconing)
Long labels: subdomain labels >50 characters (DNS tunnelling tools like iodine encode data in DNS queries)
NX domain storm: a host getting NXDOMAIN on 70%+ of its queries (DGA malware cycling through generated domains)
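The entropy and long-label checks can be sketched with Shannon entropy over the characters of each subdomain label. This is a minimal sketch, not the repo's dns_entropy.py: the 3.5-bit entropy threshold, the choice to treat everything left of the last two labels as the subdomain, and the function names are all assumptions. The 50-character limit comes from the description above.

```python
import math
from collections import Counter


def shannon_entropy(label):
    """Shannon entropy in bits per character; 0 for empty or single-character-repeated labels."""
    if not label:
        return 0.0
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in Counter(label).values())


def flag_query(query, entropy_threshold=3.5, max_label_len=50):
    """Return the reasons a DNS query name looks suspicious (empty list if none)."""
    labels = query.rstrip(".").split(".")
    subdomain_labels = labels[:-2]   # naive: assumes a two-label registered domain
    reasons = []
    if any(len(label) > max_label_len for label in subdomain_labels):
        reasons.append("long_label")
    if any(shannon_entropy(label) > entropy_threshold for label in subdomain_labels):
        reasons.append("high_entropy")
    return reasons


for name in ("www.example.com", "x7f3k9q2zb8v1m4c.example.com", "a" * 60 + ".example.com"):
    print(name[:30], flag_query(name))
```

The naive two-label split mishandles suffixes like co.uk; a real implementation would likely consult a public suffix list.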

What it produces: NDJSON findings to stdout or a file, one JSON object per line. Designed to be ingested into S3/Athena or OpenSearch for historical analysis.
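NDJSON output with a minimum-severity filter is small enough to show whole. This is a sketch under assumptions: the severity ordering (low, medium, high, critical), the write_ndjson name, and the sample findings are illustrative and not taken from the repo.

```python
import io
import json

SEVERITY_ORDER = ["low", "medium", "high", "critical"]  # assumed ordering


def write_ndjson(findings, stream, min_severity="low"):
    """Write one JSON object per line for findings at or above min_severity; return the count."""
    floor = SEVERITY_ORDER.index(min_severity)
    written = 0
    for finding in findings:
        if SEVERITY_ORDER.index(finding["severity"]) >= floor:
            stream.write(json.dumps(finding, sort_keys=True) + "\n")
            written += 1
    return written


SAMPLE_FINDINGS = [  # hypothetical findings
    {"detection": "fan_out", "severity": "high", "src": "10.0.1.5"},
    {"detection": "long_label", "severity": "low", "src": "10.0.1.6"},
]

buffer = io.StringIO()
count = write_ndjson(SAMPLE_FINDINGS, buffer, min_severity="medium")
print(buffer.getvalue(), end="")
```

Because each line is a complete JSON document, the output can be appended to, streamed, or split without re-parsing the whole file.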

(4) Gap Analyzer (gap_analyzer/)

gap_analyzer/controls/aws_controls_map.py — pure data; Maps each of the 7 NIST SP 800-207 Zero Trust tenets to concrete AWS controls with a scoring rubric (0=missing, 1=partial, 2=implemented, 3=automated)

gap_analyzer/nist_800_207.py — the analyzer;

Makes live read-only AWS API calls to check whether each control is actually in place (ex. Is GuardDuty enabled? Is CloudTrail multi-region with log validation? Are there Security Groups open to the internet? Is root MFA enabled? Are there stale access keys?) > scores each tenet (one of the framework’s core principles) 0–3 > feeds the scores to Claude > Claude returns an executive summary and prioritized remediation list > Exports a Markdown report.

Why it matters: it lists misconfigurations while mapping the actual AWS posture against a published security framework, and tells you specifically what to fix in order of impact

The table below maps each tenet to AWS tools.

Tenet	Purpose	Tool
1	All resources identified	AWS Config, SSM Inventory
2	All communication secured	VPC Flow Logs, ACM, Macie
3	Per-session access	IAM roles, MFA, STS
4	Dynamic policy	Security Groups, GuardDuty, WAF
5	Continuous monitoring	CloudTrail, Inspector, CloudWatch
6	Dynamic auth/authz	Access Analyzer, credential rotation
7	Data collection	S3 logging, Config conformance packs
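One plausible way to turn per-control scores into per-tenet scores and an overall figure is sketched below. The 0 to 3 rubric comes from the description above, but the aggregation is an assumption: taking the weakest control as the tenet score and ranking gaps lowest-first are illustrative choices, not necessarily what nist_800_207.py does, and the sample scores are made up.

```python
RUBRIC = {0: "missing", 1: "partial", 2: "implemented", 3: "automated"}


def score_tenet(control_scores):
    """Weakest-link scoring: a tenet is only as strong as its lowest-scoring control."""
    return min(control_scores) if control_scores else 0


def summarize(tenet_controls):
    """Return (per-tenet scores, overall percentage, tenets ordered worst-first)."""
    tenet_scores = {t: score_tenet(scores) for t, scores in tenet_controls.items()}
    max_total = 3 * len(tenet_scores)
    overall = round(100 * sum(tenet_scores.values()) / max_total) if max_total else 0
    gaps = sorted(tenet_scores, key=lambda t: (tenet_scores[t], t))
    return tenet_scores, overall, gaps


# Hypothetical control scores for the 7 tenets.
SAMPLE = {1: [2, 3], 2: [1, 2], 3: [3], 4: [0, 2], 5: [2], 6: [1], 7: [2, 2]}

tenet_scores, overall, gaps = summarize(SAMPLE)
for tenet in gaps:
    print(tenet, RUBRIC[tenet_scores[tenet]])
print(f"overall: {overall}%")
```

Ordering the tenets worst-first gives a natural starting point for the prioritized remediation list.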

(5) Infrastructure files

ansible/inventory/hosts.yml — tells Ansible which EC2 instances to harden; Mode 1 for static IPs for testing, Mode 2 for aws_ec2 dynamic inventory plugin that queries the AWS API at run time based on instance tags

ansible/inventory/group_vars/all.yml — global CIS override settings; Which CIS rules to skip, SSH policy, password policy, audit config

ansible/inventory/group_vars/ec2_ubuntu.yml – Ubuntu-specific settings; crucially, it sets ansible_connection: aws_ssm so Ansible talks to instances through SSM Session Manager rather than SSH. No port 22 is needed, which directly supports zero trust

ansible/playbooks/scan_pre.yml and scan_post.yml — run OpenSCAP (framework used to enforce security compliance and vulnerability baselines on Linux systems) before and after hardening to produce a compliance score. The post-scan playbook diffs the results and fails if hardening introduced regressions (controls that were passing before but aren’t after)

ansible/playbooks/ai_review.yml — calls sg_diff.py after the hardening run
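The regression check described for scan_post.yml boils down to comparing two sets of results. The sketch below assumes the scan results have already been reduced to a mapping of rule ID to "pass" or "fail"; the rule IDs, function names, and sample data are hypothetical and do not reflect real OpenSCAP output.

```python
def compliance_score(results):
    """Percentage of rules that pass."""
    if not results:
        return 0.0
    passed = sum(1 for outcome in results.values() if outcome == "pass")
    return 100.0 * passed / len(results)


def find_regressions(pre, post):
    """Rules that passed before hardening but no longer pass afterwards."""
    return sorted(rule for rule, outcome in pre.items()
                  if outcome == "pass" and post.get(rule) != "pass")


# Hypothetical rule IDs and outcomes.
PRE = {"ssh_disable_root_login": "fail", "auditd_enabled": "fail",
       "aide_installed": "pass", "tmp_noexec": "pass"}
POST = {"ssh_disable_root_login": "pass", "auditd_enabled": "pass",
        "aide_installed": "pass", "tmp_noexec": "fail"}

regressions = find_regressions(PRE, POST)
print(compliance_score(PRE), compliance_score(POST), regressions)
```

Note that the overall score improves here even though one control regressed, which is why the per-rule diff is checked rather than the score alone.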

Testing – tested against live EC2 instances in the AWS Console

In simple terms, a VPC is a network and an EC2 instance is like a VM that sits in that network.

Note: Ansible does not run natively on Windows machines, so you have to test through WSL

Need boto3 – the AWS Software Development Kit (SDK) for Python; it lets scripts interact with AWS services such as EC2

pip install boto3 'moto[ec2,securityhub,sts]' pytest pytest-cov -r requirements.txt

Installs boto3 and the test dependencies

Commands to set up the AWS CLI and deploy live EC2 instances are in README.md

Note: There is a default limit of 5 VPCs per region; change regions if necessary.

Step 1) Create a VPC > Enable DNS hostnames > Create a subnet > Create and attach an Internet Gateway > add route to the internet

What it looks like in AWS console

VPC created:

vpc1

Subnet & Routing table:

subrout

Step 2) Create security groups: one deliberately bad/misconfigured, with open SSH (port 22), open RDP (port 3389), and the default all-traffic egress rule, and one clean one with port 443 (HTTPS)

sg1

Good – port 443 is open only to trusted IPs; traffic is encrypted via TLS/SSL; the Security Group is stateful, so return traffic for allowed connections on port 443 is allowed automatically

sg2

Bad – Rule that allows unrestricted internet access (0.0.0.0/0 or ::/0) on admin ports like SSH (22) or RDP (3389)

sg3

Bad – All traffic allowed to go outbound

Step 3) Launch EC2 instances

ins1

(1) SG Auditor

python3 auditor/sg_auditor.py \
  --regions us-east-2 \
  --output test_findings.json

This posts findings to Security Hub and writes the full ASFF JSON to test_findings.json

sgaud1

(2) Gap Analyzer

python3 gap_analyzer/nist_800_207.py \
  --region us-east-2 \
  --output test_report_ai.md \
  --json-output test_scores_ai.json

gp1

(3) Zeek Log analyzer

python3 zeek/analyzer.py \
  --conn zeek/tests/fixtures/conn.log \
  --dns zeek/tests/fixtures/dns.log \
  --output zeek_findings.ndjson \
  --min-severity low

zlog1

(4) AI reviews – sg_diff

Take a snapshot of the current state of the bad SG

aws ec2 describe-security-groups \
  --group-ids $BAD_SG_ID \
  --region us-east-2 \
  --output json > /tmp/sg_before.json
echo "Before snapshot saved"
cat /tmp/sg_before.json | python3 -m json.tool | head -30

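Rule modification - remove RDP rule, then take the after snapshot

aws ec2 revoke-security-group-ingress \
  --group-id $BAD_SG_ID \
  --protocol tcp \
  --port 3389 \
  --cidr 0.0.0.0/0 \
  --region us-east-2
echo "RDP rule removed"
aws ec2 describe-security-groups \
  --group-ids $BAD_SG_ID \
  --region us-east-2 \
  --output json > /tmp/sg_after.json
echo "After snapshot saved"

Do a dry run or use Claude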

python3 reviewer/sg_diff.py \
  --before /tmp/sg_before.json \
  --after  /tmp/sg_after.json \
  --instance zt-target \
  --role app \
  --env test \
  --owner yourself \
  --dry-run

To run with Claude, replace the last line (--dry-run) with: --output /tmp/review_findings.json

huh1

This article is a basic overview of firewalls, zero trust principles, and the AWS console, with an implementation of the ZTA principles through hardening. Thank you for reading; if you’d like to try it, please clone my repo and use the commands in the README.md. The goal is to understand what a good SG policy looks like and to work with live EC2 instances to understand the AWS console. Remember to delete the instances afterwards, since the open SSH port (22) leaves those instances vulnerable.