Beyond the Basics#
The basics post covered what each OSI layer does. This post covers how they interact in practice — the things that break in production and the tools to diagnose them.
Packet Captures with tcpdump#
tcpdump lets you observe traffic at Layers 2–7 in real time. It’s the fastest way to confirm what’s actually happening on the wire.
Capture all traffic on an interface#
sudo tcpdump -i eth0 -n
Filter by host and port#
sudo tcpdump -i eth0 host 10.0.0.5 and port 443 -n
Capture to a file for Wireshark analysis#
sudo tcpdump -i eth0 -w /tmp/capture.pcap -c 1000
Watch a TCP handshake#
sudo tcpdump -i eth0 'tcp[tcpflags] & (tcp-syn) != 0' -n
Read specific fields#
# Show packet sizes
sudo tcpdump -i eth0 -n -l | awk '{print $NF}'
# Show only RST packets (connection resets)
sudo tcpdump -i eth0 'tcp[tcpflags] & (tcp-rst) != 0' -n
What to look for:
| Symptom | Likely Layer | Cause |
|---|---|---|
| SYN sent, no SYN-ACK | 3/4 | Firewall drop, host down, wrong route |
| SYN answered with RST | 4 | Port closed (nothing listening) |
| SYN-ACK received, then RST | 4/7 | Application crashed or aborted the connection |
| Retransmissions | 3/4 | Packet loss, congestion, MTU issues |
| Connection established, no data | 7 | Application-level bug, TLS failure |
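To apply the table to a capture, it can help to tally the handshake flags first. The sketch below assumes tcpdump's default text output, in which flags print as Flags [S] (SYN), Flags [S.] (SYN-ACK), and Flags [R] or Flags [R.] (reset). The helper name count_tcp_flags is hypothetical and not part of tcpdump.

```shell
# count_tcp_flags: hypothetical helper. Reads tcpdump text output on stdin
# and prints how many SYN, SYN-ACK, and RST packets it saw.
count_tcp_flags() {
    awk '
        /Flags \[S\],/    { syn++ }     # SYN without ACK: a connection attempt
        /Flags \[S\.\],/  { synack++ }  # SYN-ACK: the server answered
        /Flags \[R\.?\],/ { rst++ }     # RST or RST-ACK: connection reset
        END { printf "syn=%d synack=%d rst=%d\n", syn, synack, rst }
    '
}

# Example: sudo tcpdump -i eth0 -n -l -c 200 'tcp port 443' | count_tcp_flags
```

Many SYNs with few or no SYN-ACKs suggests a firewall drop, a down host, or a routing problem. A high reset count suggests a closed port or a failing application.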
MTU and Fragmentation#
MTU (Maximum Transmission Unit) is the largest packet size a link can carry. Standard Ethernet MTU is 1500 bytes.
Why MTU matters#
When a packet exceeds the link’s MTU:
- If the Don’t Fragment (DF) bit is set → packet is dropped, ICMP “fragmentation needed” is sent back
- If DF is not set → packet is fragmented into smaller pieces
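The fragmentation case can be quantified. As a rough sketch, assuming IPv4 with a 20-byte header and no options: every fragment carries its own IP header, and every fragment except the last must carry a multiple of 8 bytes of data. The helper name fragment_count is hypothetical.

```shell
# fragment_count TOTAL_LEN MTU
# Prints how many IPv4 fragments a packet of TOTAL_LEN bytes (header included)
# needs on a link with the given MTU. Assumes a 20-byte IPv4 header, no options.
fragment_count() {
    local total_len=$1
    local mtu=$2
    local ip_header=20
    local data=$(( total_len - ip_header ))
    # Data per fragment is rounded down to a multiple of 8 bytes
    local per_fragment=$(( (mtu - ip_header) / 8 * 8 ))
    echo $(( (data + per_fragment - 1) / per_fragment ))
}

fragment_count 4000 1500   # prints 3
fragment_count 1500 1400   # prints 2
```

A full-size 1500-byte packet crossing a 1400-byte link becomes two packets, so even a small MTU mismatch can double the packet count for bulk traffic.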
Detecting MTU issues#
Path MTU discovery failures are a common cause of mysterious connectivity problems — connections establish (small packets) but hang when transferring data (large packets).
# Find the path MTU to a host
ping -c 4 -M do -s 1472 example.com
# If this fails, reduce size until it works
ping -c 4 -M do -s 1400 example.com
# Check interface MTU
ip link show eth0 | grep mtu
# Trace path MTU
tracepath example.com
Common MTU scenarios#
| Scenario | Effective MTU | Why |
|---|---|---|
| Standard Ethernet | 1500 | Default |
| VPN tunnel (WireGuard) | ~1420 | Tunnel headers consume space |
| PPPoE (some ISPs) | 1492 | PPPoE header is 8 bytes |
| Docker overlay network | ~1450 | VXLAN encapsulation overhead |
| Jumbo frames (datacenter) | 9000 | Must be configured end-to-end |
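The ping payload to test for each MTU in the table follows from the header sizes. This sketch assumes IPv4 without options (20-byte IP header, 8-byte ICMP header, 20-byte TCP header); IPv6 would need different constants. Both function names are hypothetical.

```shell
# Assumes IPv4 without options: 20-byte IP header, 8-byte ICMP echo header,
# 20-byte TCP header.
ping_payload_for_mtu() { echo $(( $1 - 20 - 8 )); }
tcp_mss_for_mtu()      { echo $(( $1 - 20 - 20 )); }

for mtu in 1500 1492 1450 1420; do
    echo "MTU $mtu: ping -M do -s $(ping_payload_for_mtu "$mtu"), TCP MSS $(tcp_mss_for_mtu "$mtu")"
done
```

The first output line reproduces the 1472-byte payload used in the ping test earlier: 1500 minus 28 bytes of headers.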
Fixing MTU problems#
# Set interface MTU
sudo ip link set eth0 mtu 1400
# Persistent (NetworkManager)
nmcli connection modify "Wired" 802-3-ethernet.mtu 1400
# Clamp TCP MSS in iptables (fixes MTU issues for TCP without changing MTU)
sudo iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
TCP Deep Dive#
Connection states#
A TCP connection transitions through states. Understanding these helps diagnose stuck connections:
LISTEN → SYN_RECEIVED → ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED
                                    → CLOSE_WAIT → LAST_ACK → CLOSED
# Count connections by state
ss -ant | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn
TIME_WAIT accumulation#
After closing a connection, the side that closed first holds the socket in TIME_WAIT for 2×MSL (60 seconds on Linux). High-traffic servers can accumulate thousands of these sockets.
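A back-of-the-envelope calculation shows when this becomes a problem for outbound connections. The sketch below assumes each connection to a given destination IP and port holds one ephemeral port for the full TIME_WAIT period, and that the port range is the common Linux default of 32768 to 60999. The helper name max_conn_rate is hypothetical.

```shell
# max_conn_rate FIRST_PORT LAST_PORT TIME_WAIT_SECONDS
# Prints the sustained new connections per second to one destination before
# the ephemeral port range is used up by sockets in TIME_WAIT.
max_conn_rate() {
    echo $(( ($2 - $1 + 1) / $3 ))
}

max_conn_rate 32768 60999 60   # prints 470

# Same calculation with this host's actual range, if the file is readable
range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ]; then
    read -r first last < "$range_file"
    max_conn_rate "$first" "$last" 60
fi
```

A client or proxy that opens more than a few hundred short-lived connections per second to the same backend can therefore run out of ports, even though each connection is closed cleanly.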
# Count TIME_WAIT connections
ss -Hant state time-wait | wc -l
# Check system limits
cat /proc/sys/net/ipv4/tcp_max_tw_buckets
If TIME_WAIT is exhausting ports:
# Enable socket reuse (safe for clients)
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
# Make persistent
echo "net.ipv4.tcp_tw_reuse = 1" | sudo tee -a /etc/sysctl.d/99-tcp.conf
TCP window size and throughput#
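TCP window size limits how much data can be in flight before an acknowledgment arrives. On high-latency links, small windows throttle throughput, because a sender can deliver at most one window per round trip (throughput ≤ window / RTT).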
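The window a link needs is its bandwidth-delay product. The following sketch uses integer shell arithmetic and two hypothetical helpers, bdp_bytes and max_throughput_kbit. The bandwidth and RTT figures are illustrative assumptions, not measurements.

```shell
# bdp_bytes BANDWIDTH_MBIT RTT_MS
# Bandwidth-delay product: bytes that must be in flight to fill the link.
# (Mbit/s * 1,000,000 / 8) bytes/s * (ms / 1000) s = Mbit * ms * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

# max_throughput_kbit WINDOW_BYTES RTT_MS
# Upper bound on throughput for a fixed window: window / RTT, in kbit/s.
max_throughput_kbit() {
    echo $(( $1 * 8 / $2 ))
}

bdp_bytes 1000 80             # prints 10000000 (about 10 MB for 1 Gbit/s at 80 ms)
max_throughput_kbit 65535 80  # prints 6553 (a 64 KiB window caps near 6.5 Mbit/s)
```

A buffer maximum of 16777216 bytes (16 MiB) is enough to fill a 1 Gbit/s path at 80 ms, with some headroom.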
# Check current TCP buffer sizes
sysctl net.ipv4.tcp_rmem
sysctl net.ipv4.tcp_wmem
# Increase for high-bandwidth links
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
Retransmissions#
Retransmissions indicate packet loss. High retransmission rates degrade performance:
# Check retransmission stats
netstat -s | grep -i retrans
# Watch in real time
watch -n 1 'netstat -s | grep retrans'
# Per-connection retransmissions
ss -ti | grep -B1 retrans
NAT (Network Address Translation)#
NAT rewrites packet headers at Layer 3, allowing multiple devices to share a single public IP.
Types of NAT#
| Type | Description | Use Case |
|---|---|---|
| SNAT | Rewrites source IP | Outbound traffic from private network |
| DNAT | Rewrites destination IP | Port forwarding, load balancing |
| Masquerade | SNAT with dynamic source IP | Home routers, VPNs |
NAT with nftables#
# Masquerade outbound traffic
sudo nft add table nat
sudo nft add chain nat postrouting { type nat hook postrouting priority 100 \; }
sudo nft add rule nat postrouting oifname "eth0" masquerade
# Port forward 8080 → internal server:80
sudo nft add chain nat prerouting { type nat hook prerouting priority -100 \; }
sudo nft add rule nat prerouting iifname "eth0" tcp dport 8080 dnat to 192.168.1.10:80
Diagnosing NAT issues#
# View active NAT connections
sudo conntrack -L
# Count NAT table entries
sudo conntrack -C
# Check if NAT table is full
cat /proc/sys/net/netfilter/nf_conntrack_max
cat /proc/sys/net/netfilter/nf_conntrack_count
# Symptoms of NAT table exhaustion: new connections randomly fail
# Fix: increase the limit
sudo sysctl -w net.netfilter.nf_conntrack_max=262144
Firewalls Mapped to OSI Layers#
Linux firewalls (iptables/nftables) operate across multiple layers:
| Table (chain) | OSI Layer | What it filters |
|---|---|---|
| Raw/prerouting | 3 | Before connection tracking |
| Mangle | 3/4 | Packet modification (TTL, TOS, MSS) |
| NAT | 3/4 | Address/port rewriting |
| Filter (input/forward/output) | 3/4 | Accept/drop decisions |
nftables examples at different layers#
# Layer 3: Block an IP
sudo nft add rule inet filter input ip saddr 10.0.0.50 drop
# Layer 4: Allow only specific ports
sudo nft add rule inet filter input tcp dport { 22, 80, 443 } accept
sudo nft add rule inet filter input drop
# Layer 4: Rate limit connections
sudo nft add rule inet filter input tcp dport 22 ct state new limit rate 3/minute accept
# Stateful filtering (Layer 4/5)
sudo nft add rule inet filter input ct state established,related accept
sudo nft add rule inet filter input ct state invalid drop
Diagnosing firewall drops#
# Log dropped packets
sudo nft add rule inet filter input log prefix "DROPPED: " drop
# Watch the log
sudo journalctl -f | grep DROPPED
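# Count rules that contain a drop verdict (this counts rules, not dropped packets;
# add the counter keyword to a rule to get per-rule packet counts)
sudo nft list ruleset | grep -c drop
VLAN Trunking (Layer 2/3 Boundary)#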
VLANs segment a physical network into logical broadcast domains at Layer 2.
Creating a VLAN interface#
# Add VLAN 100 on eth0
sudo ip link add link eth0 name eth0.100 type vlan id 100
sudo ip addr add 10.100.0.1/24 dev eth0.100
sudo ip link set eth0.100 up
Inter-VLAN routing#
Devices on different VLANs can’t communicate at Layer 2. They need a Layer 3 router:
# Enable IP forwarding (acts as router between VLANs)
sudo sysctl -w net.ipv4.ip_forward=1
# Device on VLAN 100 (10.100.0.0/24) can now reach VLAN 200 (10.200.0.0/24)
# if this host has interfaces on both VLANs
Diagnosing VLAN issues#
# Verify VLAN is configured
cat /proc/net/vlan/config
# Check tagged traffic is arriving
sudo tcpdump -i eth0 -e vlan -n
# Common problem: switch port not set to trunk mode
# You'll see no traffic on the VLAN interface
L4 vs L7 Load Balancing#
Load balancers operate at different OSI layers with different tradeoffs:
Layer 4 (Transport)#
Routes based on IP and port. Doesn’t inspect content.
Client → [L4 LB] → Backend
            ↓
         Decision: IP hash, round-robin, least connections
         Sees: Source/dest IP, source/dest port
         Doesn't see: HTTP headers, URLs, cookies
Pros: Fast, low overhead, protocol-agnostic
Cons: Can’t route based on content, no SSL termination
Example: Linux IPVS
# Install
sudo dnf install ipvsadm
# Add virtual service
sudo ipvsadm -A -t 10.0.0.1:80 -s rr
# Add backends
sudo ipvsadm -a -t 10.0.0.1:80 -r 10.0.0.10:80 -m
sudo ipvsadm -a -t 10.0.0.1:80 -r 10.0.0.11:80 -m
# Check status
sudo ipvsadm -L -n
Layer 7 (Application)#
Inspects application content to make routing decisions.
Client → [L7 LB] → Backend
            ↓
         Decision: URL path, Host header, cookie, method
         Sees: Full HTTP request
         Can: Terminate TLS, rewrite headers, cache
Pros: Content-based routing, SSL termination, header manipulation
Cons: Higher latency, more resource intensive, protocol-specific
Example: nginx as L7 load balancer
upstream api_servers {
    server 10.0.0.10:8080;
    server 10.0.0.11:8080;
}
upstream static_servers {
    server 10.0.0.20:80;
    server 10.0.0.21:80;
}
server {
    listen 443 ssl;
    # ssl_certificate and ssl_certificate_key must also be set here
    location /api/ {
        proxy_pass http://api_servers;
    }
    location /static/ {
        proxy_pass http://static_servers;
    }
}
When to use which#
| Requirement | Use |
|---|---|
| Simple TCP/UDP distribution | L4 |
| Route by URL or header | L7 |
| Non-HTTP protocols (database, MQTT) | L4 |
| SSL termination | L7 |
| Maximum performance | L4 |
| A/B testing, canary deploys | L7 |
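To make the L4 rows concrete: an L4 balancer decides using addressing fields only. The sketch below illustrates source-IP hashing with standard tools. It is an illustration of the idea, not the algorithm IPVS or any other balancer uses, and pick_backend is a hypothetical helper.

```shell
# pick_backend CLIENT_IP BACKEND...
# Hashes the client IP with cksum (a CRC) and uses the result to index
# into the list of backends, so one client always maps to one backend.
pick_backend() {
    local client_ip=$1
    shift
    local hash
    hash=$(printf '%s' "$client_ip" | cksum | cut -d ' ' -f 1)
    local index=$(( hash % $# ))
    shift "$index"
    echo "$1"
}

pick_backend 203.0.113.7 10.0.0.10:80 10.0.0.11:80
pick_backend 203.0.113.7 10.0.0.10:80 10.0.0.11:80   # same client, same backend
```

Nothing in the function looks at a URL, header, or cookie. Routing on those requires parsing the request, which is the L7 case.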
Best Practices#
- Capture packets before guessing: tcpdump eliminates speculation
- Test MTU end-to-end when setting up VPNs or overlay networks
- Monitor TIME_WAIT and conntrack table size on high-traffic systems
- Log firewall drops — silent drops are the hardest to troubleshoot
- Choose load balancer layer based on whether you need content awareness
- When debugging, correlate layers — a Layer 7 timeout might be caused by Layer 3 packet loss
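The conntrack monitoring point can be turned into a small check. This is a minimal sketch that reads the two /proc counters shown in the NAT section. The 80% warning threshold and the function names usage_pct and check_conntrack are assumptions for illustration.

```shell
# usage_pct COUNT MAX: integer percentage of a table that is in use
usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# The 80% warning threshold is an arbitrary assumption; tune it per system.
check_conntrack() {
    local dir=/proc/sys/net/netfilter
    if [ -r "$dir/nf_conntrack_count" ] && [ -r "$dir/nf_conntrack_max" ]; then
        local pct
        pct=$(usage_pct "$(cat "$dir/nf_conntrack_count")" "$(cat "$dir/nf_conntrack_max")")
        echo "conntrack table ${pct}% full"
        if [ "$pct" -ge 80 ]; then
            echo "WARNING: conntrack table above 80%" >&2
        fi
    else
        echo "conntrack counters not available (module not loaded?)"
    fi
}

check_conntrack
```

Run from cron or a systemd timer, a check like this gives a warning before new connections start failing.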

