You may never touch an iptables command directly — but it is already running on every Linux server, every Kubernetes node, and every Docker host you operate. Here is what you actually need to understand about it.

## 1. The Firewall You Did Not Configure Is Already Running

The internet works by sending and receiving small chunks of data called packets. In the early days, when the internet was still finding its footing, packets moved freely between all connected systems. Anyone could send packets to your machine; all services were exposed by default. The connected world was small enough that this felt acceptable. Then it grew, and the problems grew with it — worms, viruses, unauthorized access, denial-of-service attacks, IP spoofing. The need to control what traffic could reach a system became urgent. iptables is the answer Linux arrived at after two earlier attempts that did not go far enough.

Here is something that tends to surprise engineers who are new to Kubernetes: when you install a cluster and deploy a service, iptables rules start appearing on your nodes whether you asked for it or not. kube-proxy writes them. Flannel writes them. Calico writes them. Your CNI plugin is manipulating Netfilter chains every time a pod starts or a service changes. You may have spent years configuring cloud security groups and AWS VPC routing, never touching a raw iptables command, and yet your entire workload routing has been running on top of it the whole time.

That is worth pausing on. iptables is not legacy technology being phased out. It is the load balancer for ClusterIP services on most clusters today. It is how Docker builds network isolation between containers. It is why your VPN server correctly routes traffic and why your homelab bastion can forward SSH to internal nodes. If you operate Linux infrastructure at any depth, you are operating on top of iptables — and if you do not understand it, you are flying blind.
> The engineers who get burned by iptables are almost never the ones who know too much about it. They are the ones who assumed someone else handled it.

This guide is not a command reference. You can read man pages for that. What this is about is building the correct mental model — understanding how packets actually move through the system, where the real risks are, and how to design firewall logic that does not fail you at 2am.

## 2. What iptables Actually Is

The first thing to understand is that `iptables` itself does almost nothing. It is a user-space tool — a command-line interface that talks to the kernel subsystem called Netfilter. Netfilter is a set of hooks embedded into the Linux network stack. Every packet that enters, exits, or is forwarded through the kernel passes through these hooks. Netfilter is the engine. iptables is how you program it.

This distinction matters because it explains the architecture's stability. The Netfilter hooks have been in the Linux kernel since version 2.4, and they have not fundamentally changed. iptables as a front-end is older than most current infrastructure engineers' careers. When people say iptables is "old," they are right — and that is part of why it is everywhere. Every Linux tool that needs packet filtering either uses it directly or builds on the same Netfilter hooks underneath.

The complexity of iptables comes from one design decision: it tries to be general enough to handle almost any packet manipulation task. Filtering, NAT, packet marking, connection tracking, logging — all of it flows through the same framework. That generality is what makes it powerful and what makes it confusing. When you start learning iptables, you are not learning a firewall. You are learning a packet processing pipeline that can be used to build a firewall among other things.

## 3. How We Got Here: A Brief History

Understanding why iptables is designed the way it is requires a short look at what came before it. In early Linux systems there were basic tools to handle networking — connecting to remote machines, sharing data, running servers. But raw connectivity quickly proved insufficient as the internet grew. Two demands drove the need for something better: firewalls that could block or allow traffic based on defined rules, and Network Address Translation to let multiple devices share a single IPv4 address as the address space ran short.

The first attempt was **ipfwadm**, a user-space front-end for filtering code built directly into the kernel (through Linux 2.0). It offered only rudimentary packet filtering and had no stateful inspection whatsoever. If you allowed outbound ICMP, the kernel had no automatic mechanism to permit the reply — you had to configure both directions manually, for every protocol, every service. There was no concept of connection state. You were writing rules for a stateless world, and the operational overhead was significant.

The second attempt was **ipchains**, which arrived with Linux 2.2. It was a meaningful improvement: more flexible rule matching and the ability to organize rules into chains. But filtering remained essentially stateless, there was no IPv6 support, and the design did not scale — as rule sets grew, traversal slowed and management became increasingly error-prone. The architecture was showing its age before the internet had finished growing into it.

iptables was the third attempt, and it was designed differently from the ground up. Rather than embedding packet filtering logic directly into the kernel networking code, the developers created a general-purpose framework — Netfilter — that could sit underneath multiple tools, not just iptables. That architectural decision is why iptables has remained relevant for two decades while most other tools from that era are long retired.

## 4. Netfilter: The Engine Underneath

Before iptables made sense as a tool, the kernel needed a framework it could plug into. That framework is Netfilter.

Initial packet filtering in Linux was linear and inflexible — there was no clean way to intercept, modify, or drop a packet mid-flight without patching the core networking code. Netfilter solved this by introducing well-defined **hook points** at critical stages of the Linux networking stack, and providing an API that lets kernel modules register callback functions at those hooks. The five hook points correspond to stages every packet passes through:

- `PREROUTING` — Immediately after a packet arrives on an interface, before any routing decision.
- `INPUT` — After routing confirms the packet is destined for the local machine.
- `FORWARD` — For packets the routing decision sends to another interface rather than delivering locally.
- `OUTPUT` — Packets generated by local processes, before they leave.
- `POSTROUTING` — After all routing decisions are complete, just before a packet exits the machine.

With these hooks in place, the Linux networking stack calls each registered function as packets traverse the pipeline. This is how iptables installs its filtering, NAT, and logging logic — it is just a structured set of Netfilter hook registrations. The registration API looks like this:

```c
/* Register a single hook function at a Netfilter hook point */
int nf_register_net_hook(struct net *net, const struct nf_hook_ops *ops);

/* Register multiple hooks at once */
int nf_register_net_hooks(struct net *net, const struct nf_hook_ops *ops,
                          unsigned int n);

/* nf_hook_ops defines the hook function pointer, the hook point
   (e.g. NF_INET_LOCAL_IN for the INPUT hook), the protocol family,
   and the priority — a lower priority value runs first */
```

You can think of iptables as the frontend and Netfilter as the backend.
Instead of writing custom kernel modules for every packet manipulation requirement, iptables gives you a configuration layer on top of Netfilter's hooks — covering filtering, NAT, packet marking, logging, and more through a single, consistent interface.

This architecture also explains something important for modern infrastructure: Netfilter is why kube-proxy can install `KUBE-SERVICES` chains and Calico can install `cali-*` chains and both coexist on the same node without conflict. They are both registering Netfilter hooks, at different priorities, on the same pipeline. Understanding Netfilter is understanding why modern Linux networking is composable — and occasionally very difficult to debug when something goes wrong.

## 5. How Packets Actually Move Through the System

If there is one concept to internalize before writing a single rule, it is the packet flow model. Most production mistakes I have seen come directly from engineers who did not understand where in the flow their rule was being applied.

### Tables and Their Purpose

Rules in iptables are organized into tables, each serving a different concern. The four you will realistically encounter are:

| Table | Purpose | When You Use It |
|----------|----------------------------|------------------------------------------------------------------------|
| `filter` | Allow or deny packets | All standard firewalling. This is the default if you specify no table. |
| `nat` | Address translation | Port forwarding, masquerading, DNAT for load balancing. |
| `mangle` | Packet header modification | Changing TTL, marking packets for routing policy, QoS. |
| `raw` | Bypass connection tracking | High-throughput scenarios where conntrack overhead is a problem. |

### Chains: Where Packets Enter the Pipeline

Within each table, packets enter at a chain. The chain name reflects the hook in the network stack. `INPUT` handles packets destined for the local machine. `OUTPUT` handles packets generated by local processes. `FORWARD` handles packets routing through the machine to somewhere else — this is what kube-proxy writes to when it routes between pods.

The critical mental model: a packet destined for your server hits `PREROUTING` first, then gets routed, and if it is for a local process it hits `INPUT`. A packet you are forwarding hits `PREROUTING`, then `FORWARD`, then `POSTROUTING`. Never `INPUT`, never `OUTPUT`. This is where engineers write rules in the wrong chain and then spend an hour wondering why traffic is not matching.

Beyond directing packets to the correct chain, chains also serve as the primary tool for keeping rule sets manageable. A flat list of rules is easy to start with but painful to maintain — a badly placed broad `ACCEPT` can silently override a specific `DROP` you added three weeks later. iptables addresses this through **custom chains**: you can create named chains and call them from built-in chains using the jump (`-j`) option. This lets you group related rules logically — one chain for SSH policy, one for web traffic, one for database access — and jump into them conditionally:

```bash
# Create a custom chain for SSH policy
iptables -N SSH_POLICY

# Add rules to the custom chain
iptables -A SSH_POLICY -s 192.168.1.0/24 -j ACCEPT
iptables -A SSH_POLICY -m recent --name ssh --rcheck --seconds 60 --hitcount 4 -j DROP
iptables -A SSH_POLICY -m recent --name ssh --set -j ACCEPT

# Jump to it from INPUT
iptables -A INPUT -p tcp --dport 22 -j SSH_POLICY
```

If no rule in a custom chain matches, the packet falls back to the calling chain and continues processing. This makes large rule sets significantly easier to reason about and review.

### Rules and Targets

Each rule in a chain specifies match conditions — protocol, source IP, destination port, connection state — and a target, which is what happens when the packet matches. `ACCEPT` and `DROP` are the common ones. `REJECT` sends back an ICMP error. `LOG` writes a log entry and continues processing.
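Putting tables, chains, rules, and targets together: here is what a small, complete `filter` table looks like in the `iptables-save`/`iptables-restore` format — a sketch with illustrative ports and policies, not a recommendation for any particular host:

```
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
# Allow return traffic for connections this host participates in
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Allow loopback
-A INPUT -i lo -j ACCEPT
# Allow SSH and HTTPS
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT
# Log, then fall through to the DROP policy
-A INPUT -j LOG --log-prefix "INPUT-DROP: "
COMMIT
```

Loading a file like this with `iptables-restore` replaces the filter table in one step, and it is the same format that persistence tools save and reload on boot — which makes it the natural unit for code review.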
Rules are evaluated in order, and the first match wins. That ordering is not optional — it is the entire logic of how the firewall behaves.

## 6. Practical DevOps Usage: What You Actually Need

### Protecting SSH Without Locking Yourself Out

SSH access is both the most important thing to protect and the most dangerous rule to write incorrectly. The canonical failure mode: engineer sets default policy to DROP, forgets to allow port 22 first, saves the rules, and is now locked out of a cloud VM with no console access.

The discipline is simple. Before you set a restrictive default policy, put the allow rules in first. Always test with the default policy still set to `ACCEPT`, verify your rules match, then change the policy. And add rate limiting to SSH from day one — it is not optional on anything internet-facing:

```bash
# Drop the fourth and subsequent new SSH connections
# from a single IP within any 60-second window
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW \
  -m recent --name ssh --rcheck --seconds 60 --hitcount 4 -j DROP
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW \
  -m recent --name ssh --set -j ACCEPT
```

The `recent` module tracks per-IP hit counts in a kernel table. No external dependency, no daemon to manage. This alone eliminates a significant fraction of brute force exposure on SSH.

### Database Server Isolation

Database servers are where misconfiguration is most costly. The correct design is to explicitly enumerate which source IPs are allowed on the database port, and drop everything else, rather than relying only on application-layer authentication:

```bash
# Only these application servers may connect to PostgreSQL
iptables -A INPUT -p tcp -s 192.168.10.5 --dport 5432 -j ACCEPT
iptables -A INPUT -p tcp -s 192.168.10.6 --dport 5432 -j ACCEPT

# Drop everything else attempting port 5432 — no REJECT, no logging, just drop
iptables -A INPUT -p tcp --dport 5432 -j DROP
```

This pattern gives you defense in depth. Even if a database credential leaks, an attacker connecting from an unauthorized IP hits a wall before the authentication handshake begins.

### Kubernetes Node Hardening

On a Kubernetes node, you are largely a passenger with respect to iptables — kube-proxy owns the `KUBE-*` chains and will overwrite rules it manages. What you can control is host-level traffic. Restrict `INPUT` on the host to kubelet API, SSH, and node-to-node ports; let your CNI handle pod-level policy. Do not try to manually manage rules in the same chains kube-proxy writes to. You will lose that battle.

> **Operational Warning:** Never flush rules (`iptables -F`, and especially `iptables -t nat -F`) on a live Kubernetes node in production. Flushing the nat table destroys the `KUBE-SERVICES` DNAT chains and breaks ClusterIP routing immediately; flushing the filter table removes `KUBE-FORWARD` and the protections around it. Pods will still run, but inter-service communication dies. Recovery requires restarting kube-proxy to rebuild its chains.

## 7. Security and Operational Best Practices

### Connection Tracking Is Not Optional

Stateless packet filtering — matching only on IP, protocol, and port — was the approach before conntrack existed. You could make it work, but you had to write explicit rules for both directions of every connection. Connection tracking was a significant improvement: the kernel maintains a state table of active connections, and rules can match on state rather than re-specifying the whole connection pattern.

The rule that should appear near the top of every INPUT chain:

```bash
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP
```

The first line allows return traffic for connections your server already established outbound — outbound DNS queries, package downloads, API calls. The second drops malformed packets that do not match any tracked connection.
Without these two rules, you will write rules that seem to allow outbound connections but responses never make it back, and you will spend real time debugging what is a fundamentally simple misconfiguration.

### Log What You Drop, Not Everything

Logging every dropped packet on an internet-facing server will fill your disk and degrade performance. The useful logging strategy is targeted: log on custom chains for specific threat categories, rate-limit the log rule itself, and write to a dedicated file. A `LOG` target does not terminate processing — the packet continues to the next rule — so you can log and then drop cleanly:

```bash
# Create a reusable chain that logs then drops
iptables -N LOG_AND_DROP
iptables -A LOG_AND_DROP -m limit --limit 5/min \
  -j LOG --log-prefix "FW-DROP: " --log-level 4
iptables -A LOG_AND_DROP -j DROP
```

### Rule Order Is Not a Detail

Rules are evaluated top to bottom, first match wins. A broad `ACCEPT` rule before a narrow `DROP` rule means the `DROP` never fires. Engineers who learned iptables by adding rules incrementally often end up with chains where early rules silently supersede later ones. If a rule never shows packet hits in `iptables -L -v`, either the traffic it describes never arrives or the rule is shadowed by an earlier one — both of which deserve investigation.

## 8. Performance and Scaling Considerations

iptables performance becomes a real concern when rule sets grow large. Each packet traverses the chain linearly until it matches. A chain with five hundred rules means, in the worst case, five hundred comparisons per packet. This is not hypothetical — Kubernetes clusters with hundreds of services accumulate `KUBE-SERVICES` chain entries at exactly this rate, and it was a known scalability problem that motivated the development of IPVS mode for kube-proxy and eventually nftables.
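A quick way to see whether chain length is becoming a problem is to count rules per chain in `iptables-save` output. On a real host you would pipe `iptables-save` (as root) straight into `awk`; the canned sample file here just keeps the sketch self-contained:

```bash
# Count how many rules each chain holds. The heredoc stands in for
# the real `iptables-save` output on an actual node.
cat > /tmp/ruleset.sample <<'EOF'
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A FORWARD -s 10.0.0.0/8 -j ACCEPT
COMMIT
EOF

# Field 2 of every "-A" line is the chain name; tally and print counts
awk '/^-A/ { count[$2]++ } END { for (c in count) print c, count[c] }' /tmp/ruleset.sample
```

On a node where `KUBE-SERVICES` or a CNI chain shows hundreds of entries, this is an easy first check before reaching for IPVS mode or nftables.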
### The Connection Tracking Table Limit

The most operationally significant performance constraint is `nf_conntrack_max` — the maximum number of connections the kernel will track simultaneously. The default on many distributions is 65,536. On a high-traffic server or a Kubernetes node handling many short-lived pod connections, this fills up. When it does, new connections fail with cryptic errors and the kernel logs `nf_conntrack: table full, dropping packet`. Check your current table usage:

```bash
# Current tracked connections vs the maximum
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# Increase the limit and reduce the TCP timeout for established connections
sysctl -w net.netfilter.nf_conntrack_max=262144
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=300
```

### Use ipset for Large IP Lists

If you need to block or allow hundreds of IP addresses, do not write hundreds of iptables rules. Use `ipset` to create a hash table of IPs or networks in the kernel, then match the set in a single rule. The performance difference is not marginal — an ipset lookup is O(1), a chain traversal is O(n). For geo-blocking or blocklist management, ipset is the only operationally sane approach:

```bash
# Create and populate a blocklist set (hash:net accepts CIDR blocks;
# use hash:ip if you only ever store individual addresses)
ipset create blocklist hash:net
ipset add blocklist 198.51.100.0/24

# One rule handles the entire list
iptables -A INPUT -m set --match-set blocklist src -j DROP
```

## 9. Common Mistakes That Will Ruin Your Day

- **Forgetting persistence.** iptables rules live in kernel memory. A reboot or a service restart flushes everything. If you wrote the perfect rule set and did not run `iptables-save` or configure `iptables-persistent`, it is gone in the morning. This one has caught engineers who have been working with iptables for years.
- **No conntrack rule, but a restrictive default policy.** The server accepts the connection, sends back SYN-ACK, then the response to the client's ACK hits INPUT and drops because there is no ESTABLISHED allow rule. The client sees a connection that never completes. Debugging this via tcpdump and iptables logs takes time that a three-line conntrack rule would have prevented.
- **Testing rule changes in production without a fallback.** The safe approach: set a cron job to run `iptables -P INPUT ACCEPT && iptables -F` every ten minutes before you start, then delete the cron job after you verify access. If you lock yourself out, the cron fires and saves you. This feels paranoid until the one time it matters.
- **Assuming cloud security groups replace host firewalls.** They are different layers. A cloud security group blocks traffic before it reaches your instance. iptables operates on traffic after it arrives at the kernel. For internal VPC traffic, or for Kubernetes pod-to-pod communication, the cloud security group may not be enforcing what you expect.
- **Writing DROP rules instead of logging first.** When debugging a service that cannot reach another service, if you have no logging on your DROP rules you have no visibility. Write your rules to log first, verify the log shows what you expect, then convert to drop. Debugging is much faster when you have evidence.

## 10. iptables in Modern Infrastructure

The question engineers ask is: if we have cloud security groups, Kubernetes NetworkPolicies, and service meshes, why do I still need to understand iptables? The answer is that none of those things replace it at the kernel level — they either implement their policies using it, or they sit at a different layer entirely.

Kubernetes NetworkPolicies, depending on your CNI plugin, are often implemented as iptables rules. Calico translates NetworkPolicy objects into iptables chains. Flannel uses iptables for masquerading egress traffic.
Even with Cilium — which uses eBPF — iptables rules do not entirely disappear from a cluster running in a compatibility mode. Docker builds network bridges and masquerade rules using iptables every time a container starts. You can verify this yourself on any Docker host by comparing `iptables -L -t nat` before and after `docker run`.

The practical implication: when something in your container or Kubernetes networking is broken, knowing how to read and interpret iptables rules is a core debugging skill. The engineer who can look at `iptables -L -v -n` output and trace where a packet is being dropped will find the problem in minutes. The engineer who cannot will escalate it as a "network issue" and wait for someone else.

## 11. When to Move Past iptables

iptables has real limitations that are worth being honest about. The linear rule traversal model does not scale well past a few hundred rules in a chain. Rule management is not transactional: individual `iptables` commands apply one change at a time, so scripted updates leave brief windows where the firewall state is inconsistent, and while `iptables-restore` can replace a table atomically, there is no transaction spanning multiple tables. The syntax is arcane enough that rule review in code review is error-prone.

**nftables** is the direct successor, built on the same Netfilter hooks but with a cleaner rule language and better performance characteristics. It ships with all major current distributions and is what the `iptables` command transparently translates to on many systems via a compatibility layer. If you are designing a new system from scratch and you have the option, nftables is the correct choice. The concepts transfer almost directly.

**Cloud security groups** are appropriate for external perimeter control on cloud VMs. They operate before traffic reaches the instance and have no per-packet processing overhead at the OS level. Use them. But do not use them as a substitute for host-level firewalling — they are a different layer.
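To make the nftables comparison above concrete, here is an input policy like the earlier iptables examples expressed in nftables syntax — a sketch of an `/etc/nftables.conf` fragment, with illustrative ports:

```
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;

        # Return traffic for established connections
        ct state established,related accept
        ct state invalid drop

        # Loopback, SSH, HTTPS
        iif "lo" accept
        tcp dport { 22, 443 } accept
    }
}
```

Inline sets in braces, named chains attached explicitly to hooks, and a single `inet` family covering IPv4 and IPv6 are the main ergonomic wins over iptables; `nft -c -f <file>` checks a ruleset file without applying it.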
**eBPF-based solutions** like Cilium operate at a fundamentally different performance level and allow policy enforcement with much finer granularity and lower latency than iptables. On large Kubernetes clusters running thousands of services, the difference in kube-proxy behavior between iptables mode and eBPF mode is measurable. But eBPF is a newer operational surface — if your team does not have experience debugging eBPF programs, adding that dependency to your production stack requires honest assessment.

For most teams managing ten to a hundred Linux servers, a mix of cloud security groups at the perimeter and iptables (or nftables) on the host is still the right answer. It is well-understood, well-documented, and the operational tooling is mature. Adopt newer technology when you have a specific, measurable problem that the new technology solves — not because it is newer.

## 12. The Engineering Takeaway

iptables is powerful, genuinely useful, and still operating under most Linux infrastructure whether you interact with it directly or not. The engineers who get burned by it are rarely the ones who spent time understanding the packet flow model. They are the ones who copied rules from Stack Overflow without understanding what the rules were doing, or who assumed their cloud security group made host firewalling unnecessary, or who never thought about what happens to their firewall rules after a reboot.

The discipline iptables rewards is systematic thinking about network state: what traffic should be allowed, where rules are applied in the packet flow, what happens when an assumption is wrong. That discipline transfers to every other network security tool in your stack. Whether you end up managing iptables directly, debugging Kubernetes CNI behavior, or tuning nftables rules for a high-throughput router, the mental model you build here applies.

Understand the flow. Write rules with intent. Test before you trust. Log before you drop.
And always, before you change the default policy to DROP, make sure your SSH allow rule is already in place.

---

_Vladimiras Levinas is a Lead DevOps Engineer with 18+ years in fintech infrastructure. He runs a production K3s homelab and writes about AI infrastructure at doc.thedevops.dev_