I feel as though a lot of systems are overcomplicating things with the cloud. For a small SaaS to run, you only really need a few things done right.
A lot of this can be done on bare metal at a fraction of the cost it would take to run in the cloud, and with better results. I’ve actually written two prior blog posts, which now redirect here, that were precursors to this final and clean setup.
The other two relied heavily on bash: spinning up services on different ports, then using sed to update nginx. It worked, but if the service ever went down, it wouldn’t come back up on the correct port after a reboot. This wasn’t acceptable. It also felt too fiddly for something which should be idiot-proof. Downtime is not an option.
So I’ve landed on Docker Compose, Traefik and Docker Rollout.
This all runs on two bare metal servers, behind an LB. This is how it works: Traefik as a reverse proxy, Docker Compose for orchestration, and docker-rollout for zero-downtime updates.
Three components:
Everything’s in one docker-compose.yml with two services: Traefik and the app.
Instead of docker-compose restart (which drops requests), I use docker-rollout:
    docker rollout -f docker-compose.yml app
This starts a new container, waits for it to be healthy, then stops the old one. Traefik continues routing traffic the whole time.
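For "waits for it to be healthy" to mean something, the app service needs a healthcheck defined; as far as I understand it, docker-rollout otherwise falls back to waiting a fixed delay. The fragment below is a minimal sketch of what that could look like. The /healthz path, port 8080 and the presence of curl inside the image are all assumptions for illustration, not details from this setup.

```yaml
services:
  app:
    healthcheck:
      # Assumed endpoint and port; also assumes curl exists in the image
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/healthz"]
      interval: 5s       # how often to probe
      timeout: 3s        # a probe taking longer than this counts as failed
      retries: 5         # consecutive failures before "unhealthy"
      start_period: 10s  # grace period while the app boots
```

Short intervals keep rollouts quick, since the old container is only stopped once the new one reports healthy.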
Traefik handles TLS termination and routing. Certs live in ./certs/, and container labels define the routing:
    app:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.app.rule=Host(`<hostname.com>`)"
        - "traefik.http.routers.app.entrypoints=websecure"
        - "traefik.http.routers.app.tls=true"
        - "traefik.http.services.app.loadbalancer.server.port=<application port>"
The app container publishes no ports on the host, so it is only reachable through Traefik, which decrypts HTTPS on 443 and forwards plain HTTP to it. I set memory limits and a read-only filesystem to keep the reverse proxy lightweight and hardened.
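On the TLS side, the compose file points Traefik's file provider at /certs (the --providers.file.directory=/certs flag), so that directory needs a dynamic configuration file telling Traefik which certificates to load. A minimal sketch, where the file name tls.yml and the certificate file names are placeholders rather than the ones actually used here:

```yaml
# ./certs/tls.yml (hypothetical name), seen by Traefik as /certs/tls.yml
tls:
  certificates:
    - certFile: /certs/fullchain.pem  # assumed file name
      keyFile: /certs/privkey.pem     # assumed file name
```

The paths are the in-container paths, since ./certs is mounted read-only at /certs.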
The service is hosted with Hetzner, with 4GB of RAM per instance (8GB total). Traefik uses ~50MB of RAM and the application ~20MB, so there's lots of room to grow, at a fraction of the cost of other cloud providers.
By using Traefik and Docker Compose, with restart: unless-stopped on both services, I know that if a server goes down or restarts, both the reverse proxy and the application will come back online. I also know that if we run out of memory, or into any other issues, the service will come back online, and I’ll get alerts through my observability.
The following is really all that’s needed to run a zero-downtime service.
    networks:
      web:

    services:
      traefik:
        image: traefik:v3.0
        command:
          - --providers.docker=true
          - --providers.docker.exposedbydefault=false
          - --providers.file.directory=/certs
          - --entrypoints.websecure.address=:443
        ports:
          - "443:443"
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock:ro
          - ./certs:/certs:ro
        networks:
          - web
        restart: unless-stopped
        mem_limit: 128m
        mem_reservation: 64m
        cpus: "0.25"
        pids_limit: 64
        read_only: true
        security_opt:
          - no-new-privileges:true
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "3"

      app:
        image: <image name>
        networks:
          - web
        restart: unless-stopped
        mem_limit: 512m
        mem_reservation: 256m
        cpus: "0.75"
        pids_limit: 128
        read_only: true
        security_opt:
          - no-new-privileges:true
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "3"
        labels:
          - "traefik.enable=true"
          - "traefik.http.routers.app.rule=Host(`<service.com>`)"
          - "traefik.http.routers.app.entrypoints=websecure"
          - "traefik.http.routers.app.tls=true"
          - "traefik.http.services.app.loadbalancer.server.port=<port>"
I make use of Docker Rollout, which lets me select which service I want to scale up and replace: it brings the new instance up before taking the old one out.
Deployments are as straightforward as building an image in your CI pipeline, then having each server run docker rollout -f docker-compose.yml app. That’s it.
Each deployment is now automated, easily triggered and reproducible.
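The per-server step can be wrapped in a small script that CI calls over SSH. This is only a sketch under a few assumptions: CI has already pushed the image tag that docker-compose.yml references, the docker-rollout plugin is installed on the server, and the function name deploy is made up for illustration. It uses docker compose for the pull; docker-compose would work the same way.

```shell
#!/bin/sh
# Sketch of a per-server deploy step (function name is illustrative).
set -eu

deploy() {
  compose_file="$1"   # e.g. docker-compose.yml
  service="$2"        # e.g. app

  # Fetch the image that CI just pushed for this service
  docker compose -f "$compose_file" pull "$service"

  # Start a new container, wait for it to be ready, then stop the old one
  docker rollout -f "$compose_file" "$service"
}
```

Calling deploy docker-compose.yml app on each server in turn keeps at least one server fully untouched behind the LB while the other rolls.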
The beauty of using Docker Compose is that the whole process is easy to replicate.
We’re only using ~200MB of 8GB so far, so this won’t be an issue for a long while, but if we needed to grow further:
1. Provision a new server
2. Configure it to be behind the firewall
3. SCP the docker compose file and the server hardening script
4. docker-compose up -d
5. Add it behind the LB.
That’s it.
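Steps 3 and 4 above could be scripted roughly like this. Treat it as a sketch: the function name bootstrap_server, the script name harden.sh and the SSH user and address are all hypothetical, and it assumes Docker is already installed on the new server. It also copies the certs directory, since the compose file mounts ./certs.

```shell
#!/bin/sh
# Sketch: copy the deployment files to a new server and start the stack.
# Run from the directory holding docker-compose.yml, harden.sh and certs/.
set -eu

bootstrap_server() {
  host="$1"   # e.g. deploy@203.0.113.10 (placeholder address)

  # Copy the compose file, hardening script (assumed name) and certificates
  scp -r docker-compose.yml harden.sh certs "${host}:"

  # Harden the box first, then bring up Traefik and the app
  ssh "$host" 'sudo bash harden.sh && docker-compose up -d'
}
```

After bootstrap_server finishes, the only manual step left is adding the server behind the LB.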