#iptables #Netfilter #Linux
## 1. The Firewall Under Everything
Here is something that tends to surprise engineers who are new to Kubernetes: when you install a cluster and deploy a service, iptables rules start appearing on your nodes whether you asked for it or not. kube-proxy writes them. Flannel writes them. Calico writes them. Your CNI plugin is manipulating Netfilter chains every time a pod starts or a service changes. You may have spent years configuring cloud security groups and AWS VPC routing, never touching a raw iptables command, and yet your entire workload routing has been running on top of it the whole time.
That is worth pausing on. iptables is not legacy technology being phased out. It is the load balancer for ClusterIP services on most clusters today. It is how Docker builds network isolation between containers. It is why your VPN server correctly routes traffic and why your homelab bastion can forward SSH to internal nodes. If you operate Linux infrastructure at any depth, you are operating on top of iptables — and if you do not understand it, you are flying blind.
> [!note]
> The engineers who get burned by iptables are almost never the ones who know too much about it. They are the ones who assumed someone else handled it.
This guide is not a command reference. You can read man pages for that. What this is about is building the correct mental model — understanding how packets actually move through the system, where the real risks are, and how to design firewall logic that does not fail you at 2am.
---
![[Pasted image 20260217122547.png]]
## 2. What iptables Actually Is
The first thing to understand is that `iptables` itself does almost nothing. It is a user-space tool — a command-line interface that talks to the kernel subsystem called Netfilter. Netfilter is a set of hooks embedded into the Linux network stack. Every packet that enters, exits, or is forwarded through the kernel passes through these hooks. Netfilter is the engine. iptables is how you program it.
This distinction matters because it explains the architecture's stability. The Netfilter hooks have been in the Linux kernel since version 2.4, and they have not fundamentally changed. iptables as a front-end is older than most current infrastructure engineers' careers. When people say iptables is "old," they are right — and that is part of why it is everywhere. Every Linux tool that needs packet filtering either uses it directly or builds on the same Netfilter hooks underneath.
The complexity of iptables comes from one design decision: it tries to be general enough to handle almost any packet manipulation task. Filtering, NAT, packet marking, connection tracking, logging — all of it flows through the same framework. That generality is what makes it powerful and what makes it confusing. When you start learning iptables, you are not learning a firewall. You are learning a packet processing pipeline that can be used to build a firewall among other things.
---
## 3. How Packets Actually Move Through the System
If there is one concept to internalize before writing a single rule, it is the packet flow model. Most production mistakes I have seen come directly from engineers who did not understand where in the flow their rule was being applied.
### Tables and Their Purpose
Rules in iptables are organized into tables, each serving a different concern. The four you will realistically encounter are:
| Table | Purpose | When You Use It |
| -------- | -------------------------- | ---------------------------------------------------------------------- |
| `filter` | Allow or deny packets | All standard firewalling. This is the default if you specify no table. |
| `nat` | Address translation | Port forwarding, masquerading, DNAT for load balancing. |
| `mangle` | Packet header modification | Changing TTL, marking packets for routing policy, QoS. |
| `raw` | Bypass connection tracking | High-throughput scenarios where conntrack overhead is a problem. |
### Chains: Where Packets Enter the Pipeline
Within each table, packets enter at a chain. The chain name reflects the hook in the network stack. `INPUT` handles packets destined for the local machine. `OUTPUT` handles packets generated by local processes. `FORWARD` handles packets that are routing through the machine to somewhere else — this is what kube-proxy is writing to when it routes between pods.
The critical mental model: a packet destined for your server hits `PREROUTING` first, then gets routed, and if it is for a local process it hits `INPUT`. A packet you are forwarding hits `PREROUTING`, then `FORWARD`, then `POSTROUTING`, which is where `MASQUERADE` is applied. It never touches `INPUT` or `OUTPUT`. This is where engineers write rules in the wrong chain and then spend an hour wondering why traffic is not matching.
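One way to see this flow on a live machine (a sketch; requires root) is to list the chains a forwarded packet actually traverses, with the `-v` packet counters showing what each rule has matched:

```bash
# Chains a forwarded packet traverses, in order
iptables -t nat    -L PREROUTING  -n -v   # before the routing decision
iptables -t filter -L FORWARD     -n -v   # routed through this host, not local
iptables -t nat    -L POSTROUTING -n -v   # after routing; MASQUERADE lives here
```

Watching these counters while generating test traffic is the fastest way to confirm which chain your packets are really hitting.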
### Rules and Targets
Each rule in a chain specifies match conditions — protocol, source IP, destination port, connection state — and a target, which is what happens when the packet matches. `ACCEPT` and `DROP` are the common ones. `REJECT` sends back an ICMP error. `LOG` writes a log entry and continues processing. Rules are evaluated in order, and the first match wins. That ordering is not optional — it is the entire logic of how the firewall behaves.
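A minimal illustration of first-match-wins, using a hypothetical pair of rules: the narrow `DROP` below can never fire, because the broad `ACCEPT` above it already terminated evaluation for every packet on that port.

```bash
# First match wins: the broad ACCEPT shadows the narrow DROP beneath it
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -s 203.0.113.50 -j DROP   # never reached
```

To get the intended behavior, the specific rule must come first.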
---
## 4. Practical DevOps Usage Examples
### Protecting SSH Without Locking Yourself Out
SSH access is both the most important thing to protect and the most dangerous rule to write incorrectly. The canonical failure mode: engineer sets default policy to DROP, forgets to allow port 22 first, saves the rules, and is now locked out of a cloud VM with no console access.
The discipline is simple. Before you set a restrictive default policy, put the allow rules in first. Always test with the default policy still set to `ACCEPT`, verify your rules match, then change the policy. And add rate limiting to SSH from day one — it is not optional on anything internet-facing:
```bash
# Rate-limit new SSH connections to 4 per minute per IP
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW \
-m recent --name ssh --rcheck --seconds 60 --hitcount 4 -j DROP
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW \
-m recent --name ssh --set -j ACCEPT
```
The `recent` module tracks per-IP hit counts in a kernel table. No external dependency, no daemon to manage. This alone eliminates a significant fraction of brute force exposure on SSH.
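The safe ordering described above, allow rules first and restrictive policy last, can be sketched like this (ideally run from a console session, with a second SSH session open for verification):

```bash
# 1. While the default policy is still ACCEPT, add the allow rules
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# 2. Verify from a second session that SSH still works, THEN restrict
iptables -P INPUT DROP
```

If step 2 is done before step 1, the moment the policy flips you are locked out.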
### Database Server Isolation
Database servers are where misconfiguration is most costly. The correct design is to explicitly enumerate which source IPs are allowed on the database port, and drop everything else, rather than relying only on application-layer authentication:
```bash
# Only these application servers may connect to PostgreSQL
iptables -A INPUT -p tcp -s 192.168.10.5 --dport 5432 -j ACCEPT
iptables -A INPUT -p tcp -s 192.168.10.6 --dport 5432 -j ACCEPT
# Drop everything else attempting port 5432 — no REJECT, no logging, just drop
iptables -A INPUT -p tcp --dport 5432 -j DROP
```
This pattern gives you defense in depth. Even if a database credential leaks, an attacker connecting from an unauthorized IP hits a wall before the authentication handshake begins.
### Kubernetes Node Hardening
On a Kubernetes node, you are largely a passenger with respect to iptables — kube-proxy owns the `KUBE-*` chains and will overwrite rules it manages. What you can control is host-level traffic. Restrict `INPUT` on the host to kubelet API, SSH, and node-to-node ports; let your CNI handle pod-level policy. Do not try to manually manage rules in the same chains kube-proxy writes to. You will lose that battle.
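A sketch of what host-level `INPUT` restriction might look like; the ports and the management CIDR below are illustrative assumptions, so check your distribution, kubeadm setup, and CNI documentation for the real list:

```bash
# Host-level policy on a node; 10.0.0.0/24 is an assumed management network
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22    -s 10.0.0.0/24 -j ACCEPT   # SSH
iptables -A INPUT -p tcp --dport 10250 -s 10.0.0.0/24 -j ACCEPT   # kubelet API
```

Note these go in `INPUT` only; `FORWARD` and the `KUBE-*` chains stay under kube-proxy's control.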
> **Operational Warning:** Never run `iptables -F` or `iptables -t nat -F` on a live Kubernetes node in production. Flushing the filter table destroys chains like `KUBE-FORWARD`, and flushing the nat table destroys `KUBE-SERVICES`, breaking all ClusterIP routing immediately. Pods will still run, but inter-service communication dies. Recovery requires restarting kube-proxy (or waiting for its periodic resync) to rebuild its chains.
---
## 5. Security and Operational Best Practices
### Connection Tracking Is Not Optional
Stateless packet filtering — matching only on IP, protocol, and port — was the approach before conntrack existed. You could make it work, but you had to write explicit rules for both directions of every connection. Connection tracking was a significant improvement: the kernel maintains a state table of active connections, and rules can match on state rather than re-specifying the whole connection pattern.
The rule that should appear near the top of every INPUT chain:
```bash
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP
```
The first line allows return traffic for connections your server already established outbound — outbound DNS queries, package downloads, API calls. The second drops malformed packets that do not match any tracked connection. Without these two rules, you will write rules that seem to allow outbound connections but responses never make it back, and you will spend real time debugging what is a fundamentally simple misconfiguration.
### Log What You Drop, Not Everything
Logging every dropped packet on an internet-facing server will fill your disk and degrade performance. The useful logging strategy is targeted: log on custom chains for specific threat categories, rate-limit the log rule itself, and write to a dedicated file. A `LOG` target does not terminate processing — the packet continues to the next rule — so you can log and then drop cleanly:
```bash
# Create a reusable chain that logs then drops
iptables -N LOG_AND_DROP
iptables -A LOG_AND_DROP -m limit --limit 5/min \
-j LOG --log-prefix "FW-DROP: " --log-level 4
iptables -A LOG_AND_DROP -j DROP
```
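With the chain defined, any rule can jump to it instead of dropping silently. For example, a hypothetical rule catching Telnet probes:

```bash
# Send unwanted Telnet attempts through the logging chain defined above
iptables -A INPUT -p tcp --dport 23 -j LOG_AND_DROP
```

One reusable chain keeps the logging policy (rate limit, prefix, level) in a single place instead of duplicated across every drop rule.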
### Rule Order Is Not a Detail
Rules are evaluated top to bottom, first match wins. A broad `ACCEPT` rule before a narrow `DROP` rule means the `DROP` never fires. Engineers who learned iptables by adding rules incrementally often end up with chains where early rules silently supersede later ones. If a rule never shows packet hits in `iptables -L -v`, it is either never matched or shadowed by an earlier rule — both of which deserve investigation.
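One way to investigate shadowing is to read the counters alongside rule positions, then insert (rather than append) the narrow rule above the broad one. The rule position and source address below are hypothetical:

```bash
# Show per-rule packet/byte counters with positional numbers
iptables -L INPUT -v -n --line-numbers
# Insert at position 2, above whatever rule currently shadows it
iptables -I INPUT 2 -p tcp -s 203.0.113.50 --dport 8080 -j DROP
```

`-I` with a position number is the tool for fixing ordering problems without rebuilding the whole chain.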
---
## 6. Performance and Scaling Considerations
iptables performance becomes a real concern when rule sets grow large. Each packet traverses the chain linearly until it matches. A chain with five hundred rules means, in the worst case, five hundred comparisons per packet. This is not hypothetical — Kubernetes clusters with hundreds of services accumulate `KUBE-SERVICES` chain entries at exactly this rate, and it was a known scalability problem that motivated the development of IPVS mode for kube-proxy and eventually nftables.
### The Connection Tracking Table Limit
The most operationally significant performance constraint is `nf_conntrack_max` — the maximum number of connections the kernel will track simultaneously. The default on many distributions is 65,536. On a high-traffic server or a Kubernetes node handling many short-lived pod connections, this fills up. When it does, new connections fail with cryptic errors and the kernel logs `nf_conntrack: table full, dropping packet`. Check your current table usage:
```bash
# Current tracked connections vs maximum
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max
# Increase limit and reduce TCP timeout for established connections
# (persist these in /etc/sysctl.d/ or they are lost on reboot)
sysctl -w net.netfilter.nf_conntrack_max=262144
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=300
```
### Use ipset for Large IP Lists
If you need to block or allow hundreds of IP addresses, do not write hundreds of iptables rules. Use `ipset` to create a hash table of IPs in the kernel, then match the set in a single rule. The performance difference is not marginal — an ipset lookup is O(1), a chain traversal is O(n). For geo-blocking or blocklist management, ipset is the only operationally sane approach:
```bash
# Create and populate a blocklist set
ipset create blocklist hash:net   # hash:net stores CIDR blocks natively
ipset add blocklist 198.51.100.0/24
# One rule handles the entire list
iptables -A INPUT -m set --match-set blocklist src -j DROP
```
---
## 7. Common Mistakes That Will Ruin Your Day
- **Forgetting persistence.** iptables rules live in kernel memory. A reboot or a service restart flushes everything. If you wrote the perfect rule set and did not run `iptables-save` or configure `iptables-persistent`, it is gone in the morning. This one has caught engineers who have been working with iptables for years.
- **No conntrack rule, but a restrictive default policy.** The server accepts the connection, sends back SYN-ACK, then the response to the client's ACK hits INPUT and drops because there is no ESTABLISHED allow rule. The client sees a connection that never completes. Debugging this via tcpdump and iptables logs takes time that a three-line conntrack rule would have prevented.
- **Testing rule changes in production without a fallback.** The safe approach: set a cron job to run `iptables -P INPUT ACCEPT && iptables -F` every ten minutes before you start, then delete the cron job after you verify access. If you lock yourself out, the cron fires and saves you. This feels paranoid until the one time it matters.
- **Assuming cloud security groups replace host firewalls.** They are different layers. A cloud security group blocks traffic before it reaches your instance. iptables operates on traffic after it arrives at the kernel. For internal VPC traffic, or for Kubernetes pod-to-pod communication, the cloud security group may not be enforcing what you expect.
- **Writing DROP rules instead of logging first.** When debugging a service that cannot reach another service, if you have no logging on your DROP rules you have no visibility. Write your rules to log first, verify the log shows what you expect, then convert to drop. Debugging is much faster when you have evidence.
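The persistence and fallback points above can be sketched as follows; the paths follow Debian/Ubuntu `iptables-persistent` conventions, so adjust for your distribution:

```bash
# Persist the live rule set so it survives reboot
iptables-save > /etc/iptables/rules.v4
# Temporary failsafe before risky changes: revert to open every 10 minutes.
# Delete this file once you have verified you still have access.
echo '*/10 * * * * root iptables -P INPUT ACCEPT && iptables -F' \
  > /etc/cron.d/fw-failsafe
```

The failsafe trades ten minutes of open firewall for a guaranteed way back in; on an internet-facing host, keep the window as short as your workflow allows.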
---
## 8. iptables in Modern Infrastructure
The question engineers ask is: if we have cloud security groups, Kubernetes NetworkPolicies, and service meshes, why do I still need to understand iptables? The answer is that none of those things replace it at the kernel level — they either implement their policies using it, or they sit at a different layer entirely.
Kubernetes NetworkPolicies, depending on your CNI plugin, are often implemented as iptables rules. Calico translates NetworkPolicy objects into iptables chains. Flannel uses iptables for masquerading egress traffic. Even with Cilium — which uses eBPF — the iptables rules do not entirely disappear from a cluster running in a compatibility mode. Docker builds network bridges and masquerade rules using iptables every time a container starts. You can verify this yourself on any Docker host by comparing `iptables -L -t nat` before and after `docker run`.
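The Docker check mentioned above can be done as a before/after diff (a sketch; requires a Docker host, and the `nginx` image is just an example):

```bash
# Capture NAT rules, start a container with a published port, diff the result
iptables-save -t nat > /tmp/nat-before
docker run -d -p 8080:80 --name nat-demo nginx
iptables-save -t nat > /tmp/nat-after
diff /tmp/nat-before /tmp/nat-after   # shows the new DNAT/MASQUERADE entries
docker rm -f nat-demo
```

The diff makes it concrete that publishing a port is, at the kernel level, Docker writing a DNAT rule on your behalf.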
The practical implication: when something in your container or Kubernetes networking is broken, knowing how to read and interpret iptables rules is a core debugging skill. The engineer who can look at `iptables -L -v -n` output and trace where a packet is being dropped will find the problem in minutes. The engineer who cannot will escalate it as a "network issue" and wait for someone else.
---
## 9. When to Move Past iptables
iptables has real limitations that are worth being honest about. The linear rule traversal model does not scale well past a few hundred rules in a chain. Rule management is not transactional: individual `iptables` commands apply one at a time, and only a full `iptables-restore` replaces a table atomically, so incremental updates leave brief windows where the firewall state is inconsistent. The syntax is arcane enough that rule review in code review is error-prone.
**nftables** is the direct successor, built on the same Netfilter hooks but with a cleaner rule language and better performance characteristics. It ships with all major current distributions, and on many of them the `iptables` command is already the `iptables-nft` compatibility layer, translating your rules into nftables underneath. If you are designing a new system from scratch and you have the option, nftables is the correct choice. The concepts transfer almost directly.
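As a sense of how directly the concepts transfer, the basic stateful SSH policy from earlier looks like this in `nft` syntax (a sketch):

```bash
# nftables equivalent of the basic conntrack-plus-SSH INPUT policy
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0 ; policy drop ; }'
nft add rule inet filter input ct state established,related accept
nft add rule inet filter input ct state invalid drop
nft add rule inet filter input tcp dport 22 accept
```

The tables, chains, hooks, and state matching map one-to-one; mostly the spelling changes.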
**Cloud security groups** are appropriate for external perimeter control on cloud VMs. They operate before traffic reaches the instance and have no per-packet processing overhead at the OS level. Use them. But do not use them as a substitute for host-level firewalling — they are a different layer.
**eBPF-based solutions** like Cilium operate at a fundamentally different performance level and allow policy enforcement with much finer granularity and lower latency than iptables. On large Kubernetes clusters running thousands of services, the difference in kube-proxy behavior between iptables mode and eBPF mode is measurable. But eBPF is a newer operational surface — if your team does not have experience debugging eBPF programs, adding that dependency to your production stack requires honest assessment.
For most teams managing ten to a hundred Linux servers, a mix of cloud security groups at the perimeter and iptables (or nftables) on the host is still the right answer. It is well-understood, well-documented, and the operational tooling is mature. Adopt newer technology when you have a specific, measurable problem that the new technology solves — not because it is newer.
---
## 10. The Engineering Takeaway
iptables is powerful, genuinely useful, and still operating under most Linux infrastructure whether you interact with it directly or not. The engineers who get burned by it are rarely the ones who spent time understanding the packet flow model. They are the ones who copied rules from Stack Overflow without understanding what the rules were doing, or who assumed their cloud security group made host firewalling unnecessary, or who never thought about what happens to their firewall rules after a reboot.
The discipline iptables rewards is systematic thinking about network state: what traffic should be allowed, where rules are applied in the packet flow, what happens when an assumption is wrong. That discipline transfers to every other network security tool in your stack. Whether you end up managing iptables directly, debugging Kubernetes CNI behavior, or tuning nftables rules for a high-throughput router, the mental model you build here applies.
Understand the flow. Write rules with intent. Test before you trust. Log before you drop. And always, before you change the default policy to DROP, make sure your SSH allow rule is already in place.
---
_Infrastructure Engineering Series — Linux, Networking & Security_