From Understanding to Operating#
The fundamentals covered what each layer does. The intermediate post covered how to diagnose issues. This post covers how production systems are architected across the network stack.
BGP — How the Internet Routes#
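BGP (Border Gateway Protocol) is how autonomous systems (ASes), such as ISPs, cloud providers, and large organizations, exchange routing information. It distributes Layer 3 reachability (which IP prefixes each AS can reach), but BGP itself runs as an application over TCP port 179.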
Why it matters#
Every time you connect to a website, BGP determines the path your packets take across multiple networks. BGP misconfigurations or hijacks can take down large portions of the internet.
Key concepts#
| Term | Meaning |
|---|---|
| AS (Autonomous System) | A network under single administrative control (e.g., your ISP) |
| AS Number (ASN) | Unique identifier for an AS (e.g., AS13335 is Cloudflare) |
| Prefix | An IP range advertised via BGP (e.g., 104.16.0.0/12) |
| Peering | Direct connection between two ASes |
| Transit | Paying another AS to carry your traffic |
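Prefixes can overlap, and routers settle that with longest-prefix match: a packet follows the most specific prefix that contains its destination. The sketch below shows the rule in plain bash for IPv4 only. The function names are illustrative, and real routers use hardware tables or tries rather than scanning a list.

```bash
#!/usr/bin/env bash
# Sketch: IPv4 longest-prefix match, the rule routers use to choose between
# overlapping prefixes. Function names are illustrative.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit 0) if address $1 lies inside prefix $2, e.g. 1.1.1.0/24.
ip_in_prefix() {
  local net=${2%/*} len=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$1") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Print the most specific of the remaining arguments (prefixes) that contains $1.
longest_match() {
  local ip=$1 best="" best_len=-1 p
  shift
  for p in "$@"; do
    if ip_in_prefix "$ip" "$p" && (( ${p#*/} > best_len )); then
      best=$p
      best_len=${p#*/}
    fi
  done
  [[ -n $best ]] && echo "$best"
}
```

For example, `longest_match 1.1.1.1 1.0.0.0/8 1.1.1.0/24` prints `1.1.1.0/24`. The same rule explains why a hijacker who announces a more specific prefix than the legitimate owner can attract the owner's traffic.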
Viewing BGP routes#
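```bash
# Look up registered route objects (with their origin AS) in the RADb routing registry
whois -h whois.radb.net 1.1.1.1
# Show the AS number of each hop along the path
traceroute -A 1.1.1.1
# Map an IP to its origin ASN via Team Cymru's DNS service
# (octets must be reversed; 1.1.1.1 reads the same either way)
dig +short TXT 1.1.1.1.origin.asn.cymru.com
# Returns a line like: "13335 | 1.1.1.0/24 | AU | apnic | ..."
```
BGP on Linux (BIRD)#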
For environments where you run your own BGP (datacenters, Kubernetes with MetalLB/Calico):
```bash
# Check BGP session status
sudo birdc show protocols
# View received routes
sudo birdc show route
# Check a specific prefix
sudo birdc show route for 10.0.0.0/24
```
BGP security concerns#
- Route hijacking — malicious AS advertises someone else’s prefix
- Route leaks — accidental propagation of routes to unintended networks
- Mitigation — RPKI (Resource Public Key Infrastructure) validates route origins
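Route origin validation boils down to comparing each announcement against published ROAs (Route Origin Authorizations). Each ROA lists a prefix, a maximum prefix length, and the AS allowed to originate it. The sketch below applies the valid / invalid / not-found outcomes from RFC 6811 to ROAs passed in as text lines. The input format and function names are assumptions for illustration, and production validators (such as Routinator) fetch and cryptographically verify ROAs first.

```bash
#!/usr/bin/env bash
# Sketch of RPKI route origin validation using the RFC 6811 outcomes.
# Assumption: ROAs arrive on stdin as "prefix maxLength asn" lines; real
# validators fetch and cryptographically verify them first. IPv4 only.

ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if route prefix $1 equals or is more specific than ROA prefix $2.
covered_by() {
  local r_net=${1%/*} r_len=${1#*/} a_net=${2%/*} a_len=${2#*/}
  (( r_len >= a_len )) || return 1
  local mask=$(( (0xFFFFFFFF << (32 - a_len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$r_net") & mask) == ($(ip_to_int "$a_net") & mask) ))
}

# rov_state PREFIX ORIGIN_ASN < roas  ->  valid | invalid | not-found
rov_state() {
  local route=$1 origin=$2 roa max_len asn covered=0
  while read -r roa max_len asn; do
    [[ -z $roa ]] && continue
    covered_by "$route" "$roa" || continue
    covered=1                          # at least one ROA covers this prefix
    if (( ${route#*/} <= max_len )) && [[ $asn == "$origin" ]]; then
      echo valid
      return 0
    fi
  done
  if (( covered )); then echo invalid; else echo not-found; fi
}
```

With the illustrative ROA line `1.1.1.0/24 24 AS13335`, `rov_state 1.1.1.128/25 AS13335` comes out `invalid`: the origin matches, but the /25 exceeds the ROA's maximum length. That is how maxLength limits more-specific hijacks.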
```bash
# Check RPKI validation status of a prefix
# (quote the URL so the shell does not treat & as a background operator)
curl -s "https://stat.ripe.net/data/rpki-validation/data.json?resource=AS13335&prefix=1.1.1.0/24"
```
Anycast — One IP, Many Locations#
Anycast assigns the same IP address to multiple servers in different geographic locations. Routing delivers each user to the nearest instance in network terms, which is usually, but not always, the geographically closest one.
How it works#
```
User in Tokyo    → routes to Tokyo server    (same IP: 1.1.1.1)
User in London   → routes to London server   (same IP: 1.1.1.1)
User in New York → routes to New York server (same IP: 1.1.1.1)
```
BGP makes this possible: each location advertises the same prefix, and every network forwards traffic along the route it prefers, usually the one with the shortest AS path.
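Here is a simplified sketch of that selection. Given several announcements of the same anycast prefix, it prefers the one with the fewest ASes in its path. Real BGP evaluates local preference and other attributes before AS path length. The site names and path format are made up for illustration, and the test data uses documentation ASNs (64496 to 64511).

```bash
#!/usr/bin/env bash
# Simplified sketch: among announcements of the same anycast prefix, prefer
# the shortest AS path. Real BGP checks local preference (and more) first;
# ties here go to the first route read.

# best_path < routes, one route per line: "<site> <asn> <asn> ... <origin-asn>"
best_path() {
  awk 'NF > 1 {
         len = NF - 1                          # ASNs in this path
         if (best == "" || len < best_len) { best = $1; best_len = len }
       }
       END { if (best != "") print best }'
}
```

For instance, `printf '%s\n' "tokyo 64496 64497 13335" "london 64498 13335" | best_path` prints `london`, because its path crosses one fewer AS.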
Use cases#
| Service | Why Anycast |
|---|---|
| DNS (root servers, Cloudflare, Google) | Low-latency name resolution worldwide |
| CDNs | Serve static content from nearest edge |
| DDoS mitigation | Distribute attack traffic across many locations |
Unicast vs Anycast vs Multicast vs Broadcast#
| Type | Description | Example |
|---|---|---|
| Unicast | One-to-one, single destination | Regular web server |
| Anycast | One-to-nearest, same IP at many locations | CDN, DNS |
| Multicast | One-to-many, group delivery | IPTV, market data feeds |
| Broadcast | One-to-all on a segment | ARP, DHCP discover |
Verifying anycast behavior#
```bash
# Ask the resolver which PoP answered; the reply (an airport-style code) differs by location
dig @1.1.1.1 id.server CH TXT +short
# Trace to see which PoP you hit
traceroute 1.1.1.1
mtr --report 1.1.1.1
```
Container Networking — Network Namespaces in Practice#
Containers use Linux network namespaces to create isolated network stacks. Understanding this demystifies Docker and Kubernetes networking.
Network namespaces#
Each namespace gets its own interfaces, routes, and iptables rules:
```bash
# Create a network namespace
sudo ip netns add app1
# List namespaces
ip netns list
# Run a command inside the namespace
sudo ip netns exec app1 ip addr show
# Only loopback is visible (and it starts down): the namespace is fully isolated
# Create a veth pair (virtual cable between namespaces)
sudo ip link add veth-host type veth peer name veth-app1
# Move one end into the namespace
sudo ip link set veth-app1 netns app1
# Assign IPs
sudo ip addr add 10.0.0.1/24 dev veth-host
sudo ip netns exec app1 ip addr add 10.0.0.2/24 dev veth-app1
# Bring interfaces up
sudo ip link set veth-host up
sudo ip netns exec app1 ip link set veth-app1 up
# Test connectivity
ping -c 2 10.0.0.2
# Clean up (deleting the namespace also removes the veth pair)
sudo ip netns del app1
```
Docker networking#
Docker uses this same mechanism with a bridge:
```bash
# View Docker's bridge network
docker network inspect bridge
# See veth pairs connecting containers to the bridge
bridge link show
# View NAT rules Docker creates
sudo iptables -t nat -L -n | grep DOCKER
# Debug container DNS
docker exec mycontainer cat /etc/resolv.conf
```
Kubernetes networking model#
Kubernetes requires:
- Every Pod gets its own IP
- Pods can reach all other Pods, on any node, without NAT
- Agents on a node (such as the kubelet) can reach all Pods on that node without NAT
CNI plugins (Calico, Cilium, Flannel) implement this differently:
| CNI Plugin | Approach | Layer |
|---|---|---|
| Flannel | VXLAN overlay | L2 over L3 |
| Calico | BGP routing | L3 |
| Cilium | eBPF datapath | L3/L4 |
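Overlay approaches such as Flannel's VXLAN backend cost MTU headroom. Every pod packet is wrapped in an extra Ethernet header plus VXLAN, UDP, and IP headers before it crosses the underlay, so pod interfaces need a smaller MTU than the node's. The quick arithmetic sketch below assumes an IPv4 underlay without VLAN tags, and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: MTU left for pods inside a VXLAN overlay, assuming an IPv4 underlay
# with no VLAN tags. Function names are illustrative.

# Bytes VXLAN adds inside the underlay MTU:
# outer IPv4 (20) + outer UDP (8) + VXLAN header (8) + inner Ethernet (14)
VXLAN_OVERHEAD=$(( 20 + 8 + 8 + 14 ))

# overlay_mtu UNDERLAY_MTU -> MTU for pod interfaces
overlay_mtu() { echo $(( $1 - VXLAN_OVERHEAD )); }

# tcp_mss MTU -> largest TCP payload per segment (20-byte IPv4 + 20-byte TCP, no options)
tcp_mss() { echo $(( $1 - 40 )); }
```

On a standard 1500-byte network, `overlay_mtu 1500` gives 1450, the value Flannel's VXLAN backend configures on such networks, and `tcp_mss 1450` gives 1410. If pod interfaces keep 1500 instead, full-size packets no longer fit after encapsulation, and large transfers can stall while small requests succeed.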
```bash
# View Pod IPs and node assignments
kubectl get pods -o wide
# Check CNI configuration (run on a node)
ls /etc/cni/net.d/
# View routes on a node (Calico names pod interfaces cali*)
ip route | grep cali
# Debug connectivity between pods (the image must include ping/traceroute)
kubectl exec pod-a -- ping -c 3 <pod-b-ip>
kubectl exec pod-a -- traceroute <pod-b-ip>
```
mTLS — Mutual TLS for Zero Trust#
Standard TLS (HTTPS) only verifies the server’s identity. mTLS requires both client and server to present certificates — enabling zero-trust service-to-service authentication.
How mTLS works#
```
Client                              Server
  │                                   │
  ├── ClientHello ───────────────────→│
  │←── ServerHello + ServerCert ──────┤
  │←── CertificateRequest ────────────┤  (server asks for client cert)
  ├── ClientCert + Verify ───────────→│  (client proves identity)
  │                                   │
  │←── Encrypted connection ─────────→│  (both sides verified)
```
Testing mTLS with curl#
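```bash
# Generate a CA
openssl req -x509 -newkey rsa:4096 -keyout ca-key.pem -out ca-cert.pem -days 365 -nodes -subj "/CN=MyCA"
# Generate the server cert (clients match the hostname against subjectAltName, so add one)
openssl req -newkey rsa:4096 -keyout server-key.pem -out server-csr.pem -nodes -subj "/CN=server.local"
openssl x509 -req -in server-csr.pem -CA ca-cert.pem -CAkey ca-key.pem -CAcreateserial -out server-cert.pem -days 365 \
  -extfile <(printf "subjectAltName=DNS:server.local")
# Generate the client cert
openssl req -newkey rsa:4096 -keyout client-key.pem -out client-csr.pem -nodes -subj "/CN=client-app"
openssl x509 -req -in client-csr.pem -CA ca-cert.pem -CAkey ca-key.pem -CAcreateserial -out client-cert.pem -days 365
# Start a test server that requires a CA-signed client cert (run in a separate terminal)
openssl s_server -accept 8443 -www -cert server-cert.pem -key server-key.pem -CAfile ca-cert.pem -Verify 1
# Connect with mTLS (--resolve points server.local at localhost without editing /etc/hosts)
curl --cacert ca-cert.pem --cert client-cert.pem --key client-key.pem \
  --resolve server.local:8443:127.0.0.1 https://server.local:8443/
```
Where mTLS is used#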
- Service meshes (Istio, Linkerd) — automatic mTLS between all services
- Kubernetes API server — kubelet ↔ API server communication
- Database connections — PostgreSQL, MongoDB client certificate auth
- Zero-trust networks — replace VPN with per-service identity
eBPF and XDP — Programmable Packet Processing#
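eBPF (extended Berkeley Packet Filter) lets you run sandboxed programs, checked by an in-kernel verifier, inside the Linux kernel. XDP (eXpress Data Path) is an eBPF hook in the network driver that runs before the kernel allocates an sk_buff, which is what lets it drop or redirect packets at line rate.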
Where eBPF operates in the stack#
```
Packet arrives at NIC
  → XDP (before kernel networking stack)   ← fastest, can drop/redirect
  → tc (traffic control, after sk_buff allocation)
  → Netfilter/iptables
  → Socket layer
  → Application
```
Practical eBPF networking tools#
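```bash
# Install the BCC tools (Debian/Ubuntu package; commands carry a -bpfcc suffix)
sudo apt install bpfcc-tools
# On Fedora/RHEL: sudo dnf install bcc-tools (commands live in /usr/share/bcc/tools/, without the suffix)
# Trace TCP connect, accept, and close events in real time
sudo tcptracer-bpfcc
# Monitor TCP retransmissions
sudo tcpretrans-bpfcc
# Track TCP connection latency
sudo tcpconnlat-bpfcc
# Trace hostname lookups (getaddrinfo/gethostbyname) and their latency, without tcpdump
sudo gethostlatency-bpfcc
```
Cilium — eBPF-based Kubernetes networking#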
Cilium replaces iptables with eBPF programs for faster, more observable networking:
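```bash
# Check Cilium status (cilium CLI, from a workstation)
cilium status
# The next two use the agent CLI inside a Cilium pod (named cilium-dbg in Cilium 1.15+)
# List endpoints and their policy enforcement status
cilium endpoint list
# Monitor packet drops with reason
cilium monitor --type drop
# Trace a specific flow between two IPs (Hubble CLI)
hubble observe --from-ip 10.0.0.5 --to-ip 10.0.0.10
```
Why eBPF matters#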
| Traditional | eBPF |
|---|---|
| iptables with thousands of rules → slow | O(1) lookup via BPF maps |
| tcpdump captures everything → disk pressure | Targeted in-kernel filtering |
| Sidecar proxy per pod → overhead | Kernel-level enforcement |
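The first row of the table is about data structures. The toy below contrasts two lookups: a first-match walk over an ordered rule list, which is how iptables evaluates a chain, and a single keyed lookup, which is how a BPF hash map is queried. It models lookup cost only. The rule format, function names, and 10.0.x.x addresses are made up, and it needs bash 4+ for associative arrays.

```bash
#!/usr/bin/env bash
# Toy model of the two lookup styles (bash 4+). It compares how many entries
# are examined, not real iptables or BPF map internals.

declare -a RULES=()      # ordered "ip action" rules, like an iptables chain
declare -A POLICY=()     # ip -> action, like a BPF hash map

# load_rules N: create N DROP rules for hypothetical addresses 10.0.x.y
load_rules() {
  local i ip
  RULES=()
  POLICY=()
  for (( i = 1; i <= $1; i++ )); do
    ip="10.0.$(( i / 256 )).$(( i % 256 ))"
    RULES+=("$ip DROP")
    POLICY[$ip]=DROP
  done
}

# iptables style: walk the rules in order until the first match (O(n)).
linear_lookup() {
  local rule ip action checked=0
  for rule in "${RULES[@]}"; do
    checked=$(( checked + 1 ))
    read -r ip action <<< "$rule"
    if [[ $ip == "$1" ]]; then
      echo "$action after $checked rules"
      return
    fi
  done
  echo "ACCEPT (default) after $checked rules"
}

# BPF map style: one keyed lookup, independent of the number of entries.
hash_lookup() {
  echo "${POLICY[$1]:-ACCEPT (default)} after 1 lookup"
}
```

After `load_rules 1000`, `linear_lookup 10.0.3.232` (the last rule) reports 1000 rules checked, while `hash_lookup 10.0.3.232` needs one lookup. The gap widens with every rule added.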
Service Mesh — L4/L7 Networking Abstraction#
A service mesh runs a network proxy alongside each service (Envoy in Istio, linkerd2-proxy in Linkerd). The proxy handles routing, observability, and security without changes to application code.
Architecture#
```
Service A → [Sidecar proxy] → network → [Sidecar proxy] → Service B
                   ↕                           ↕
                   Control Plane (Istio/Linkerd)
```
What the mesh handles by layer#
| OSI Layer | Mesh Feature |
|---|---|
| 7 | HTTP routing, retries, header-based traffic splitting |
| 6 | mTLS encryption between services |
| 5 | Connection pooling, circuit breaking |
| 4 | TCP health checks, load balancing |
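Header-based and percentage-based traffic splitting from the L7 row comes down to weighted selection: lay the weights end to end and see where a random number lands. Below is a minimal bash sketch. The subset names and the `NAME:WEIGHT` argument format are chosen for illustration, and Envoy's own load-balancing code is more involved.

```bash
#!/usr/bin/env bash
# Sketch of weighted traffic splitting: lay weights end to end and pick the
# entry whose range contains ROLL (0 <= ROLL < sum of weights).

# pick_subset ROLL NAME:WEIGHT [NAME:WEIGHT ...]
pick_subset() {
  local roll=$1 total=0 entry
  shift
  for entry in "$@"; do
    total=$(( total + ${entry#*:} ))     # running sum of weights
    if (( roll < total )); then
      echo "${entry%%:*}"
      return 0
    fi
  done
  return 1                               # roll outside the weight range
}
```

`pick_subset $((RANDOM % 100)) v1:90 v2:10` returns `v2` only when the roll falls between 90 and 99, so over many requests roughly 10% reach the canary.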
Istio traffic management#
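```yaml
# Route 90% of traffic to v1, 10% to v2 (canary)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
        subset: v1
      weight: 90
    - destination:
        host: my-service
        subset: v2
      weight: 10
---
# The v1/v2 subsets must be defined in a DestinationRule
# (assumes pods carry a version: v1 / version: v2 label)
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```
Debugging mesh networking#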
```bash
# Check sidecar proxy sync status
istioctl proxy-status
# View Envoy route configuration for a workload
istioctl proxy-config routes deploy/my-service
# Check whether mTLS is in effect for a pod (replaces "istioctl authn tls-check" from older releases)
istioctl x describe pod <pod-name>
```
Network Observability#
At production scale, you need visibility across all layers simultaneously.
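Even without a tracing backend, curl can time the layers of a single request. Its `-w` timers (`time_namelookup`, `time_connect`, `time_appconnect`, `time_starttransfer`) report cumulative seconds since the request started, so each phase's duration is the difference between consecutive timers. The helper name and output format below are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: split curl's cumulative -w timers into per-phase durations.

# phase_breakdown NAMELOOKUP CONNECT APPCONNECT STARTTRANSFER  (seconds since start)
phase_breakdown() {
  awk -v dns="$1" -v tcp="$2" -v tls="$3" -v ttfb="$4" '
    function ms(s) { return int(s * 1000 + 0.5) }   # seconds -> rounded ms
    BEGIN {
      if (tls + 0 == 0) tls = tcp     # plain HTTP: curl reports appconnect as 0
      printf "dns=%dms tcp=%dms tls=%dms wait=%dms\n",
        ms(dns), ms(tcp - dns), ms(tls - tcp), ms(ttfb - tls)
    }'
}

# Example against a live endpoint (needs network; the URL is a placeholder):
# read -r n c a s < <(curl -s -o /dev/null \
#   -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer}\n' \
#   https://example.com/)
# phase_breakdown "$n" "$c" "$a" "$s"
```

The `wait` figure covers sending the request, server processing, and one network round trip. A high `tcp` value relative to `dns` points at the network path, while a high `wait` points at the server.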
Distributed tracing#
Traces follow a request across network hops and services. Viewing them requires a tracing backend such as Jaeger or Zipkin. With suitable client instrumentation, a trace can include spans for:
- DNS resolution time (L7)
- TCP connection time (L4)
- TLS handshake time (L6)
- Application processing time (L7)

Key metrics by layer#
| Layer | What to Monitor | Tool |
|---|---|---|
| 1/2 | Link errors, interface drops | ethtool -S eth0 |
| 3 | Route changes, ICMP unreachable | ip monitor route |
| 4 | Retransmissions, connection states, port exhaustion | ss -s, netstat -s |
| 7 | Request latency, error rates, throughput | Prometheus, application metrics |
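Raw counters from the table above are easier to alert on as ratios. The sketch below computes a retransmission percentage from `netstat -s` output, and conntrack table fill from the two /proc values. The function names are made up, and what counts as a worrying level depends on the workload.

```bash
#!/usr/bin/env bash
# Sketch: turn raw counters into ratios that are easier to alert on.

# retrans_pct < output of "netstat -s": retransmitted / sent TCP segments.
# Both counters are cumulative since boot. Older net-tools prints
# "segments send out", newer versions "segments sent out".
retrans_pct() {
  awk '/segments sen[dt] out/     { sent = $1 }
       /segments retransmitted/   { retx = $1 }
       END { if (sent > 0) printf "%.2f%%\n", 100 * retx / sent }'
}

# conntrack_pct COUNT MAX -> conntrack table fill level (integer percent)
conntrack_pct() { echo "$(( 100 * $1 / $2 ))%"; }

# Live usage (Linux):
# netstat -s | retrans_pct
# conntrack_pct "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#               "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

When the conntrack table is full, the kernel drops packets for new connections. That makes its fill level worth alerting on well before it reaches 100%.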
Continuous network monitoring#
```bash
# Watch for interface errors (-A1 prints the counter row below each RX/TX header)
watch -n 5 'ip -s link show eth0 | grep -A1 -E "RX|TX"'
# Monitor conntrack table usage (current / max)
watch -n 5 'echo "$(cat /proc/sys/net/netfilter/nf_conntrack_count) / $(cat /proc/sys/net/netfilter/nf_conntrack_max)"'
# Log TCP retransmissions per minute (the kernel counter is cumulative since boot)
prev=$(netstat -s | awk '/segments retransmitted/ {print $1}')
while true; do
  sleep 60
  cur=$(netstat -s | awk '/segments retransmitted/ {print $1}')
  echo "$(date): $((cur - prev)) retransmissions in the last minute"
  prev=$cur
done
# Packet drop reasons (requires dropwatch or perf)
sudo dropwatch -l kas
```
Flow logs and packet metadata#
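```bash
# Summarize the top 20 source/destination pairs from 1000 packets
# (-c bounds the capture so sort can finish; $2/$4 are src and dst in -q -t output)
sudo tcpdump -i eth0 -n -q -t -c 1000 | awk '{print $2, $4}' | sort | uniq -c | sort -rn | head -20
# Count tracked connections by L4 protocol
sudo conntrack -L -o extended | awk '{print $3}' | sort | uniq -c | sort -rn
```
Best Practices#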
- Learn BGP fundamentals even if you don’t run it — understanding AS paths helps explain latency and outages
- Use network namespaces locally to test firewall rules and routing before deploying
- Implement mTLS between services instead of relying on network perimeter security
- Consider eBPF-based tools over iptables for high-connection-rate environments
- Instrument at every layer, since a slow HTTP response might stem from TCP retransmissions, which might in turn stem from an MTU mismatch
- Treat the network as untrusted by default — zero trust means verifying identity at Layer 6/7 regardless of network position
- Automate observability — you can’t troubleshoot what you can’t see

