#docker
---
## Introduction to Docker Swarm {#introduction}
Docker Swarm is Docker's native container orchestration platform that allows you to manage a cluster of Docker engines as a single virtual system. It enables you to deploy, scale, and manage containerized applications across multiple hosts.
### Why Use Docker Swarm?
- **High Availability**: Distribute services across multiple nodes
- **Load Balancing**: Built-in load balancing and service discovery
- **Scaling**: Easy horizontal scaling of services
- **Rolling Updates**: Zero-downtime deployments
- **Simple Setup**: Easier to learn than Kubernetes
- **Declarative Model**: Define desired state, Swarm maintains it
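The declarative model is easy to demonstrate: you declare a replica count, and Swarm continuously reconciles reality against it. A sketch (assumes an initialized swarm; the `docker rm -f` must run on a node actually hosting one of the tasks):

```shell
# Declare a desired state of 3 replicas
docker service create --name demo --replicas 3 nginx:alpine

# Kill one task's container on any node...
docker ps --filter name=demo -q | head -n1 | xargs docker rm -f

# ...and Swarm schedules a replacement to restore 3/3
docker service ps demo
```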
### Swarm vs Kubernetes
**Docker Swarm Advantages:**
- Simpler learning curve
- Faster setup and deployment
- Built into Docker (no additional installation)
- Less resource overhead
- Better for small to medium deployments
**Kubernetes Advantages:**
- More features and flexibility
- Larger ecosystem
- Better for complex, large-scale deployments
- More community support
---
## Architecture & Concepts {#architecture}
### Node Types
**Manager Nodes:**
- Orchestrate and manage the cluster
- Maintain cluster state using Raft consensus
- Schedule services to worker nodes
- Handle API requests
- Minimum of 1 required; 3 or 5 (an odd number, for quorum) recommended for production
**Worker Nodes:**
- Execute containers (tasks)
- Report status back to managers
- Can be promoted to managers
### Key Concepts
**Service**: Definition of tasks to execute (like a container specification)
**Task**: A single container running on a node
**Stack**: Collection of services defined in a compose file
**Overlay Network**: Multi-host network for service communication
**Ingress Network**: Built-in load balancer for published ports
---
## Setting Up a Swarm Cluster {#setup}
### Initialize a Swarm (Manager Node)
```bash
# Initialize swarm on the manager node
docker swarm init --advertise-addr 192.168.1.100
# Output includes join tokens for workers and managers
# To add a worker to this swarm, run the following command:
# docker swarm join --token SWMTKN-1-xxx 192.168.1.100:2377
```
### Join Nodes to the Swarm
```bash
# On worker nodes, use the token from init output
docker swarm join --token SWMTKN-1-xxx 192.168.1.100:2377
# Get join token for workers (run on manager)
docker swarm join-token worker
# Get join token for managers (run on manager)
docker swarm join-token manager
# Add a node as a manager
docker swarm join --token SWMTKN-1-yyy 192.168.1.100:2377
```
### View Cluster Information
```bash
# List all nodes in the swarm
docker node ls
# Inspect a specific node
docker node inspect node-1
# View node details in readable format
docker node inspect --pretty node-1
# Check swarm status
docker info | grep -A 10 Swarm
```
### Node Management
```bash
# Promote a worker to manager
docker node promote worker-1
# Demote a manager to worker
docker node demote manager-2
# Set node availability
docker node update --availability drain node-1 # Stop scheduling new tasks
docker node update --availability active node-1 # Resume scheduling
docker node update --availability pause node-1 # Pause (existing tasks stay)
# Add labels to nodes
docker node update --label-add type=database node-1
docker node update --label-add environment=production node-2
# Remove a node from swarm
docker node rm node-1
# Leave swarm (run on the node leaving)
docker swarm leave
# Force manager to leave
docker swarm leave --force
```
---
## Service Management {#services}
### Creating Services
```bash
# Create a simple service
docker service create --name web nginx:alpine
# Create with multiple replicas
docker service create --name web --replicas 3 nginx:alpine
# Create with published ports
docker service create --name web --publish 80:80 nginx:alpine
# Create with environment variables
docker service create \
--name app \
--env NODE_ENV=production \
--env DB_HOST=database \
node:18-alpine
# Create with resource limits
docker service create \
--name app \
--limit-cpu 0.5 \
--limit-memory 512M \
--reserve-cpu 0.25 \
--reserve-memory 256M \
node:18-alpine
# Create with volume mount
docker service create \
--name web \
--mount type=volume,source=web-data,target=/usr/share/nginx/html \
nginx:alpine
# Create with placement constraints
docker service create \
--name db \
--constraint 'node.labels.type==database' \
postgres:15-alpine
# Create with network
docker service create \
--name web \
--network my-overlay-net \
nginx:alpine
```
### Service Operations
```bash
# List all services
docker service ls
# Inspect a service
docker service inspect web
docker service inspect --pretty web
# View service logs
docker service logs web
docker service logs -f web # Follow logs
docker service logs --tail 100 web # Last 100 lines
# List tasks (containers) for a service
docker service ps web
docker service ps --filter desired-state=running web
# Scale a service
docker service scale web=5
docker service scale web=5 app=3 db=1
# Update service configuration
docker service update --image nginx:1.25 web
docker service update --replicas 5 web
docker service update --env-add NEW_VAR=value web
docker service update --publish-add 8080:8080 web
# Remove a service
docker service rm web
```
### Service Modes
```bash
# Replicated mode (default) - specified number of replicas
docker service create --name web --replicas 3 nginx:alpine
# Global mode - one task per node
docker service create --name monitor --mode global prom/node-exporter:latest
```
---
## Networking {#networking}
### Overlay Networks
```bash
# Create overlay network
docker network create --driver overlay my-network
# Create attachable overlay (allows standalone containers)
docker network create --driver overlay --attachable my-network
# Create encrypted overlay network
docker network create --driver overlay --opt encrypted my-secure-net
# Create with custom subnet
docker network create \
--driver overlay \
--subnet 10.0.9.0/24 \
--gateway 10.0.9.1 \
my-network
# List networks
docker network ls
# Inspect network
docker network inspect my-network
# Connect service to network
docker service update --network-add my-network web
# Disconnect service from network
docker service update --network-rm my-network web
# Remove network
docker network rm my-network
```
### Ingress Network
The ingress network provides:
- Load balancing across service replicas
- External access to services
- Routing mesh (any node can handle requests)
```bash
# Publish port (uses ingress by default)
docker service create --name web --publish 80:80 nginx:alpine
# Publish in host mode (bypass ingress)
docker service create \
--name web \
--publish published=80,target=80,mode=host \
nginx:alpine
```
### Service Discovery
Services can communicate using service names:
```bash
# Create backend network
docker network create --driver overlay backend
# Create database service
docker service create \
--name database \
--network backend \
postgres:15-alpine
# Create app that connects to "database" hostname
docker service create \
--name app \
--network backend \
--env DB_HOST=database \
myapp:latest
```
---
## Storage & Volumes {#storage}
### Volume Types
```bash
# Named volume (managed by Docker)
docker service create \
--name db \
--mount type=volume,source=db-data,target=/var/lib/postgresql/data \
postgres:15-alpine
# Bind mount (host directory)
docker service create \
--name web \
--mount type=bind,source=/host/path,target=/container/path \
nginx:alpine
# tmpfs mount (memory-based)
docker service create \
--name cache \
--mount type=tmpfs,target=/cache,tmpfs-size=1G \
redis:alpine
```
### Volume Drivers
```bash
# Create volume
docker volume create --driver local my-volume
# Use NFS volume
docker volume create \
--driver local \
--opt type=nfs \
--opt o=addr=192.168.1.200,rw \
--opt device=:/path/to/dir \
nfs-volume
# Use with service
docker service create \
--name web \
--mount type=volume,source=nfs-volume,target=/data \
nginx:alpine
```
---
## Security {#security}
### Secrets Management
```bash
# Create secret from file
docker secret create db_password /path/to/password.txt
# Create secret from stdin
echo "my-secret-password" | docker secret create db_password -
# List secrets
docker secret ls
# Inspect secret (doesn't show value)
docker secret inspect db_password
# Use secret in service
docker service create \
--name app \
--secret db_password \
myapp:latest
# Secret available at: /run/secrets/db_password
# Use secret with custom target
docker service create \
--name app \
--secret source=db_password,target=/app/password.txt \
myapp:latest
# Remove secret
docker secret rm db_password
# Update service with new secret
docker service update \
--secret-rm db_password \
--secret-add db_password_v2 \
app
```
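Inside the container, the secret is just a file. A common entrypoint pattern reads the mounted file and falls back to an environment variable (a sketch; `secret_or_env` is an illustrative helper, not part of Docker):

```shell
#!/bin/sh
# Read a Swarm-mounted secret file if it exists; otherwise fall back
# to a value already present in the environment.
secret_or_env() {
  # $1 = secret file path, $2 = fallback value
  if [ -f "$1" ]; then
    cat "$1"
  else
    printf '%s' "$2"
  fi
}

# Swarm mounts secrets under /run/secrets/<name> by default
DB_PASSWORD="$(secret_or_env /run/secrets/db_password "${DB_PASSWORD:-}")"
```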
### Configs Management
```bash
# Create config from file
docker config create nginx.conf /path/to/nginx.conf
# Create config from stdin
cat config.json | docker config create app_config -
# List configs
docker config ls
# Inspect config (unlike secrets, the data is visible in the output, base64-encoded)
docker config inspect app_config
# Use config in service
docker service create \
--name web \
--config source=nginx.conf,target=/etc/nginx/nginx.conf \
nginx:alpine
# Remove config
docker config rm app_config
```
### Lock Swarm
```bash
# Enable autolock (protects encryption keys)
docker swarm update --autolock=true
# Unlock swarm after restart
docker swarm unlock
# View unlock key
docker swarm unlock-key
# Rotate unlock key
docker swarm unlock-key --rotate
```
---
## Scaling & Load Balancing {#scaling}
### Manual Scaling
```bash
# Scale service to 5 replicas
docker service scale web=5
# Scale multiple services
docker service scale web=5 app=3 worker=10
# Scale to 0 (pause service)
docker service scale web=0
```
### Auto-Placement
```bash
# Spread replicas evenly across nodes
docker service create \
--name web \
--replicas 6 \
--placement-pref 'spread=node.labels.datacenter' \
nginx:alpine
# Constraint to specific nodes
docker service create \
--name db \
--replicas 1 \
--constraint 'node.labels.type==database' \
--constraint 'node.role==worker' \
postgres:15-alpine
```
### Load Balancing
Swarm provides built-in load balancing through the ingress network:
```bash
# Create service with load balancing
docker service create \
--name web \
--replicas 3 \
--publish 80:80 \
nginx:alpine
# Requests to any node on port 80 are load-balanced across all replicas
# Even nodes without replicas can handle requests (routing mesh)
```
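One way to watch the routing mesh at work is with an echo-style image. A sketch (`traefik/whoami` is a small public image that reports which container served each request; any similar image would do):

```shell
# Three replicas behind ingress port 8080
docker service create --name whoami --replicas 3 --publish 8080:80 traefik/whoami

# Repeated requests to any single node rotate across replicas;
# the Hostname line changes between responses
for i in 1 2 3 4 5 6; do
  curl -s http://localhost:8080 | grep Hostname
done
```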
---
## Updates & Rollbacks {#updates}
### Rolling Updates
```bash
# Update service image
docker service update --image nginx:1.25 web
# Update with parallelism and delay
docker service update \
--image nginx:1.25 \
--update-parallelism 2 \
--update-delay 10s \
web
# Update with failure handling
docker service update \
--image nginx:1.25 \
--update-failure-action rollback \
--update-max-failure-ratio 0.2 \
web
# Configure update at creation
docker service create \
--name web \
--replicas 6 \
--update-parallelism 2 \
--update-delay 10s \
--update-failure-action rollback \
nginx:alpine
```
### Rollback
```bash
# Rollback to previous version
docker service rollback web
# Automatic rollback on failure
docker service update \
--image nginx:broken \
--update-failure-action rollback \
web
```
### Health Checks
```bash
# Add health check
docker service create \
--name web \
--health-cmd "curl -f http://localhost/ || exit 1" \
--health-interval 30s \
--health-timeout 10s \
--health-retries 3 \
nginx:alpine
# Health check in Dockerfile
# HEALTHCHECK --interval=30s --timeout=3s \
# CMD curl -f http://localhost/ || exit 1
```
---
## Monitoring & Troubleshooting {#monitoring}
### Monitoring Commands
```bash
# View service status
docker service ls
docker service ps web
# View service logs
docker service logs -f web
# View node status
docker node ls
# Inspect service details
docker service inspect web
# View events
docker events --filter 'type=service'
docker events --filter 'type=node'
# View system-wide info
docker system df
docker system events
```
### Troubleshooting
```bash
# List tasks by desired state (failed/stopped tasks appear under shutdown)
docker service ps --filter desired-state=running web
docker service ps --filter desired-state=shutdown web
# Get detailed task info
docker inspect <task-id>
# View container logs on specific node
# SSH to the node, then:
docker logs <container-id>
# Check node resources
docker node inspect node-1 --format '{{.Status.State}}'
docker node inspect node-1 --format '{{.Description.Resources}}'
# Drain node for maintenance
docker node update --availability drain node-1
# Common issues and solutions:
# Issue: Service not starting
# Check: docker service ps web (look for error messages)
# Check: docker service logs web
# Verify: Image exists and is accessible
# Verify: Resource constraints are met
# Issue: Port already in use
# Check: docker service inspect --format '{{.Endpoint.Ports}}' <service>
#        (list services with docker service ls, then inspect each;
#        there is no "published" filter)
# Solution: Use a different published port or remove the conflicting service
# Issue: Node not joining
# Check: Firewall rules (2377, 7946, 4789)
# Check: Token is correct
# Verify: Time synchronization across nodes
```
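Task error messages are truncated by default; `--no-trunc` exposes the full text, which is usually the fastest way to see why a task keeps restarting (sketch, using the `web` service from above):

```shell
# Show the full error column for every task of the service
docker service ps --no-trunc --format '{{.Name}}\t{{.CurrentState}}\t{{.Error}}' web
```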
### Visualizer
```bash
# Deploy visualizer to see swarm state
docker service create \
--name viz \
--publish 8080:8080 \
--constraint 'node.role==manager' \
--mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
dockersamples/visualizer:stable
# Access at http://manager-ip:8080
```
---
## Real-World Examples {#examples}
### Example 1: Complete Web Application Stack
```yaml
# stack.yml
version: '3.8'

services:
  # Reverse Proxy
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
    networks:
      - frontend
    configs:
      - source: nginx_config
        target: /etc/nginx/nginx.conf
    volumes:
      - nginx-logs:/var/log/nginx

  # Application
  app:
    image: mycompany/myapp:latest
    deploy:
      replicas: 4
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
      update_config:
        parallelism: 2
        delay: 10s
        failure_action: rollback
      placement:
        constraints:
          - node.role == worker
    networks:
      - frontend
      - backend
    environment:
      - NODE_ENV=production
      - DB_HOST=postgres
      - REDIS_HOST=redis
    secrets:
      - db_password
      - api_key
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Database
  postgres:
    image: postgres:15-alpine
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.database == true
      restart_policy:
        condition: on-failure
    networks:
      - backend
    environment:
      - POSTGRES_DB=appdb
      - POSTGRES_USER=appuser
      # use the mounted secret (the official postgres image supports *_FILE vars)
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser"]
      interval: 10s
      timeout: 5s
      retries: 5

  # Cache
  redis:
    image: redis:alpine
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.cache == true
    networks:
      - backend
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data

  # Worker Queue
  worker:
    image: mycompany/myapp:latest
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '1.0'
          memory: 1024M
    networks:
      - backend
    environment:
      - WORKER_MODE=true
      - REDIS_HOST=redis
    secrets:
      - api_key
    command: npm run worker

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    internal: true

volumes:
  db-data:
  redis-data:
  nginx-logs:

secrets:
  db_password:
    external: true
  api_key:
    external: true

configs:
  nginx_config:
    external: true
```
**Deploy the stack:**
```bash
# Create secrets
echo "secure-db-password" | docker secret create db_password -
echo "api-key-12345" | docker secret create api_key -
# Create config
docker config create nginx_config nginx.conf
# Label nodes
docker node update --label-add database=true node-1
docker node update --label-add cache=true node-2
# Deploy stack
docker stack deploy -c stack.yml myapp
# View stack status
docker stack ps myapp
docker stack services myapp
# Scale specific service
docker service scale myapp_app=6
# Update service
docker service update --image mycompany/myapp:v2 myapp_app
# View logs
docker service logs -f myapp_app
# Remove stack
docker stack rm myapp
```
### Example 2: Monitoring Stack
```yaml
# monitoring-stack.yml
version: '3.8'

services:
  # Prometheus
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager
    networks:
      - monitoring
    volumes:
      - prometheus-data:/prometheus
    configs:
      - source: prometheus_config
        target: /etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'

  # Grafana
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    deploy:
      replicas: 1
    networks:
      - monitoring
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana

  # Node Exporter (one task on every node)
  node-exporter:
    image: prom/node-exporter:latest
    deploy:
      mode: global
    networks:
      - monitoring
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($|/)'

  # cAdvisor (container metrics)
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    deploy:
      mode: global
    networks:
      - monitoring
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    # note: `privileged` is ignored (with a warning) by docker stack deploy;
    # cAdvisor relies on the read-only mounts above in swarm mode
    privileged: true

networks:
  monitoring:
    driver: overlay

volumes:
  prometheus-data:
  grafana-data:

configs:
  prometheus_config:
    external: true
```
### Example 3: CI/CD Pipeline Integration
```bash
#!/bin/bash
# deploy.sh - Automated deployment script
set -e
# Variables
STACK_NAME="myapp"
IMAGE_TAG="${CI_COMMIT_SHA:-latest}"
REGISTRY="myregistry.com"
IMAGE="${REGISTRY}/myapp:${IMAGE_TAG}"
echo "Deploying ${IMAGE}..."
# Build and push image (in CI)
# docker build -t ${IMAGE} .
# docker push ${IMAGE}
# Update service with new image
docker service update \
--image ${IMAGE} \
--update-parallelism 2 \
--update-delay 10s \
--update-failure-action rollback \
${STACK_NAME}_app
# docker service update blocks until the rollout converges (the default
# --detach=false behavior); pause briefly so task states settle
echo "Verifying deployment..."
sleep 5
# Check deployment status
UPDATED=$(docker service ps ${STACK_NAME}_app \
--filter "desired-state=running" \
--format "{{.Image}}" | grep ${IMAGE_TAG} | wc -l)
REPLICAS=$(docker service ls \
--filter "name=${STACK_NAME}_app" \
--format "{{.Replicas}}" | cut -d'/' -f2)
if [ "$UPDATED" -eq "$REPLICAS" ]; then
echo "✓ Deployment successful!"
exit 0
else
echo "✗ Deployment failed!"
docker service ps ${STACK_NAME}_app
exit 1
fi
```
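The `Replicas` column parsed above has the form `running/desired` (e.g. `3/3`). The readiness check can be factored into a small helper (illustrative; `replicas_ready` is not part of the script above):

```shell
# Return success only when all desired replicas are running,
# given a Replicas value such as "3/3" or "2/5"
replicas_ready() {
  running="${1%%/*}"
  desired="${1##*/}"
  [ "$running" -eq "$desired" ]
}
```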
---
## Best Practices
### Production Recommendations
1. **Manager Nodes**: Use 3 or 5 managers (odd number for quorum)
2. **Separate Roles**: Don't run workloads on manager nodes
3. **Resource Management**: Always set resource limits and reservations
4. **Health Checks**: Implement health checks for all services
5. **Secrets**: Use Docker secrets for sensitive data
6. **Networks**: Use overlay networks for service isolation
7. **Updates**: Configure rolling updates with health checks
8. **Monitoring**: Deploy monitoring stack (Prometheus + Grafana)
9. **Backups**: Regularly backup swarm state and volumes
10. **Testing**: Test updates in staging environment first
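For item 9, the Raft state that managers keep lives under `/var/lib/docker/swarm`. A minimal backup sketch (assumes a default Linux install, a writable `/backup` directory, and that briefly stopping Docker on one manager is acceptable):

```shell
# On a manager node: stop Docker so the Raft data is consistent,
# archive the swarm directory, then restart
systemctl stop docker
tar czf /backup/swarm-$(date +%F).tar.gz /var/lib/docker/swarm
systemctl start docker
```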
### Security Best Practices
1. Enable swarm autolock
2. Use TLS for node communication (automatic)
3. Rotate join tokens regularly
4. Use secrets for sensitive data
5. Implement network segmentation
6. Keep Docker and OS updated
7. Use minimal base images (Alpine)
8. Scan images for vulnerabilities
9. Implement RBAC with Docker EE
10. Monitor and audit access logs
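For item 3, token rotation is a single command per role; old tokens stop working immediately, while nodes that already joined are unaffected (run on a manager):

```shell
# Invalidate the current join tokens and print new ones
docker swarm join-token --rotate worker
docker swarm join-token --rotate manager
```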
---
## Common Commands Cheatsheet
```bash
# Swarm Management
docker swarm init
docker swarm join
docker swarm leave
docker node ls
docker node inspect
docker node update
# Service Management
docker service create
docker service ls
docker service ps
docker service logs
docker service inspect
docker service update
docker service scale
docker service rm
# Stack Management
docker stack deploy
docker stack ls
docker stack ps
docker stack services
docker stack rm
# Network Management
docker network create --driver overlay
docker network ls
docker network inspect
# Secret Management
docker secret create
docker secret ls
docker secret inspect
docker secret rm
# Config Management
docker config create
docker config ls
docker config inspect
docker config rm
```
---
## Conclusion
Docker Swarm provides a powerful yet simple way to orchestrate containers across multiple hosts. Its native integration with Docker, ease of setup, and built-in features make it an excellent choice for small to medium-scale deployments.
Key takeaways:
- Start small and scale as needed
- Use declarative configuration with stacks
- Implement proper monitoring and logging
- Follow security best practices
- Test updates before production deployment
- Keep your cluster and images updated
For more complex orchestration needs or larger scale, consider Kubernetes, but Docker Swarm remains a solid choice for many production workloads.