Part 6 of the Docker Roadmap Series
Alright, enough playing around with other people’s containers. You’ve been living like a digital tenant, running pre-built images that someone else cobbled together, probably with more bloat than a Thanksgiving dinner and security holes you could drive a truck through.
It’s time to graduate from being a Docker consumer to a Docker producer. Time to build your own images that are lean, mean, and tailored exactly to your needs. No more “it works on my machine” excuses, no more inheriting someone else’s questionable life choices baked into their base image.
The Hidden Costs of Pre-Built Images
Before we dive into building our own images, let’s talk about why you should be building your own instead of just grabbing whatever looks convenient from Docker Hub.
That `node:latest` image you’ve been using? It’s probably 900MB of bloated nonsense that includes:
- A full Ubuntu/Debian base system with packages you’ll never use
- Multiple versions of build tools “just in case”
- Debug symbols for every library known to mankind
- Documentation that takes up more space than your actual application
- Vulnerabilities from packages that were outdated the moment the image was built
Let’s see this bloat in action:
# Pull a few "convenient" images and check their sizes
docker pull node:latest
docker pull python:latest
docker pull openjdk:latest
# Now check the damage
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
# Expected output (your mileage may vary):
# REPOSITORY TAG SIZE
# node latest 993MB # Nearly a gigabyte for Node.js!
# python latest 1.02GB # Python shouldn't need a GB
# openjdk latest 471MB # Java, but still chunky
993MB for Node.js? Are you kidding me? Node.js is supposed to be lightweight! That’s like buying a sports car and getting a delivery truck.
To be fair, these images aim for broad compatibility, bundling tools and libraries for every possible use case, not just yours. That’s why they aren’t lightweight.
Your First Dockerfile: The Recipe for Sanity
A Dockerfile is like a recipe, but instead of making cookies, you’re making a reproducible, portable environment for your application. And just like a recipe, if you mess it up, everyone who tries to use it will know exactly who to blame.
Let’s start with the simplest possible Dockerfile - one that actually makes sense:
# This is a comment. Use them liberally or you'll forget what you were thinking.
# We start with Alpine Linux because it's tiny and gets the job done
FROM alpine:3.18
# Set a working directory so we're not dumping files all over the place
WORKDIR /app
# Install only what we absolutely need - no kitchen sink approach
RUN apk add --no-cache nodejs npm
# Copy your application files
COPY package.json package-lock.json ./
RUN npm ci --only=production
# Copy the rest of your application
COPY . .
# Tell Docker what port your app uses (this is documentation, not magic)
EXPOSE 3000
# Define what command runs when the container starts
CMD ["node", "server.js"]
Let’s build this and see the difference:
# Build your image (don't forget the dot at the end!)
docker build -t my-lean-node-app .
# Compare the sizes
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}" | grep -E "(node|my-lean)"
# Expected output:
# my-lean-node-app latest 45MB # Much better!
# node latest 993MB # The bloated monster
45MB vs 993MB. That’s a 95% reduction in size. Your deployment pipeline just got 20x faster, and your disk space thanks you.
What is Alpine Linux? Alpine Linux is a super lightweight, security-oriented Linux distribution commonly used in Docker images. It’s designed to be minimal (just ~5 MB) while still being functional, making it ideal for building small, fast, and secure containers.
Understanding Dockerfile Instructions (The Tools of the Trade)
Let’s break down the most important Dockerfile instructions, because understanding these is the difference between a functioning image and a digital disaster:
FROM: Choose Your Foundation Wisely
# Bad: Using a massive base image
FROM ubuntu:latest # ~77MB base, but grows fast with packages
# Better: Using a smaller base
FROM alpine:3.18 # ~7MB base, minimal but functional
# Best: Using a specific, minimal base for your language (recommended for most Node.js apps)
FROM node:18-alpine # ~40MB base with Node.js already installed and optimized
Pro tip: Never use `latest` tags in production Dockerfiles. `latest` is a lie: it’s whatever the maintainer decided to tag as latest, and it changes without warning. Use specific versions like `3.18` or `18-alpine`.
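For maximum reproducibility, you can go one step further and pin by digest, which stays fixed even if someone re-pushes the tag. A sketch (the digest below is a placeholder; find the real one with `docker images --digests`):
# Strictest option: pin by digest (placeholder digest, not a real one)
FROM alpine:3.18@sha256:<your-digest-here>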
WORKDIR: Don’t Be Messy
# Bad: Dumping files everywhere like a digital pack rat
COPY . /
# Good: Having a dedicated workspace
WORKDIR /app
COPY . .
`WORKDIR` sets the working directory for subsequent instructions. Think of it as `cd /app`, but permanent. All your `RUN`, `COPY`, and `CMD` instructions will execute from this directory.
RUN: Execute Commands During Build
# Bad: Multiple RUN commands create multiple layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
RUN apt-get clean
# Good: Chain commands to minimize layers
RUN apt-get update && \
apt-get install -y curl git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
Each `RUN` instruction creates a new layer in your image. More layers = bigger image. Chain related commands together with `&&` to keep your image lean.
COPY vs ADD: Know the Difference
# COPY: Simple file copying (use this 99% of the time)
COPY package.json ./
COPY src/ ./src/
# ADD: Like COPY but with superpowers (auto-extracts archives, can fetch URLs)
ADD https://example.com/file.tar.gz /tmp/ # Downloads the file (remote files are NOT auto-extracted)
ADD archive.tar.gz /tmp/ # Extracts automatically
Rule of thumb: Use `COPY` unless you specifically need `ADD`’s extra features. `COPY` is explicit and predictable; `ADD` is magical and can surprise you.
EXPOSE: Documentation, Not Magic
# This documents that your app uses port 3000
EXPOSE 3000
# This does NOT automatically publish the port!
# You still need -p when running: docker run -p 3000:3000 my-app
`EXPOSE` is pure documentation. It tells other developers (and your future confused self) what ports your application uses, but it doesn’t actually publish them.
CMD vs ENTRYPOINT: The Startup Drama
# CMD: Default command, easily overridden
CMD ["node", "server.js"]
# docker run my-app # Runs: node server.js
# docker run my-app echo hi # Runs: echo hi (CMD overridden)
# ENTRYPOINT: Always runs, arguments get appended
ENTRYPOINT ["node"]
CMD ["server.js"]
# docker run my-app # Runs: node server.js
# docker run my-app app.js # Runs: node app.js (CMD replaced, ENTRYPOINT preserved)
Use `CMD` for simple applications. Use `ENTRYPOINT` + `CMD` when you want to ensure your main executable always runs but allow flexibility in arguments.
Multi-Stage Builds: The Swiss Army Knife
Here’s where things get interesting. Multi-stage builds let you use multiple `FROM` instructions in a single Dockerfile, allowing you to build your application in one stage and create a minimal runtime image in another.
This is like having a fully equipped workshop to build your furniture, but only shipping the finished product - not the entire workshop.
# Stage 1: Build stage (the messy workshop)
FROM node:18-alpine AS builder
WORKDIR /app
# Install ALL dependencies (including dev dependencies for building)
COPY package.json package-lock.json ./
RUN npm ci
# Copy source code and build
COPY . .
RUN npm run build
# Stage 2: Production stage (the clean, minimal result)
FROM node:18-alpine AS production
WORKDIR /app
# Install only production dependencies
COPY package.json package-lock.json ./
RUN npm ci --only=production && npm cache clean --force
# Copy built application from the builder stage
COPY --from=builder /app/dist ./dist
# Create a non-root user (security best practice)
RUN addgroup -g 1001 -S nodejs && \
adduser -S nextjs -u 1001
USER nextjs
EXPOSE 3000
CMD ["node", "dist/server.js"]
Let’s see the magic:
# Build the multi-stage image
docker build -t my-multistage-app .
# Check the size
docker images my-multistage-app
# Expected: Much smaller than if we included all build dependencies
The build stage includes all the heavy build tools, TypeScript compilers, webpack, and other development dependencies. The production stage only includes the runtime and your built application. It’s like cooking a meal with all your kitchen equipment but only serving the food.
Layer Caching: The Performance Game Changer
Docker builds images in layers, and it’s smart enough to cache layers that haven’t changed. Understanding this is crucial for fast builds and maintaining your sanity during development.
The Wrong Way (Cache-Busting Nightmare)
# Bad: This busts the cache every time ANY file changes
FROM node:18-alpine
WORKDIR /app
COPY . . # Copies everything, cache busts on ANY change
RUN npm install # Reinstalls ALL packages on every build
CMD ["node", "server.js"]
Every time you change a single line of code, Docker has to reinstall all your dependencies. This is like rebuilding your entire house because you changed a light bulb.
The Right Way (Cache-Friendly)
# Good: Optimized for caching
FROM node:18-alpine
WORKDIR /app
# Step 1: Copy only dependency files first
COPY package.json package-lock.json ./
RUN npm ci --only=production # This layer only rebuilds if dependencies change
# Step 2: Copy application code after dependencies
COPY . . # This layer rebuilds on code changes, but deps are cached
CMD ["node", "server.js"]
Now when you change your application code, Docker reuses the cached dependency layer and only rebuilds from the `COPY . .` step forward.
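You can watch the cache work. With the cache-friendly Dockerfile above, change only application code and rebuild; the dependency layers report CACHED in the build output:
# First build: every step runs
docker build -t my-app .
# Touch a source file, then rebuild
echo "// tweak" >> server.js
docker build -t my-app .
# The npm ci layer shows CACHED; only COPY . . onward re-runs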
.dockerignore: The Unsung Hero
Just like `.gitignore` keeps junk out of your Git repo, `.dockerignore` keeps junk out of your Docker build context:
# .dockerignore file
node_modules
npm-debug.log
.git
.gitignore
README.md
.env
.nyc_output
coverage
Dockerfile
docker-compose.yml
*.md
Without this file, Docker copies everything into the build context, including your massive `node_modules` directory, `.git` history, and other files that have no business being in your image.
Image Size Optimization: Every Byte Counts
Large images are slow to build, slow to deploy, and expensive to store. Here are battle-tested techniques to keep your images lean:
1. Use Alpine Linux Base Images
# Instead of this heavyweight champion:
FROM node:18 # ~993MB
# Use this lightweight contender:
FROM node:18-alpine # ~40MB
Alpine Linux uses `musl` libc instead of `glibc` and `apk` instead of `apt`. It’s minimal but functional.
2. Multi-Stage Builds for Compiled Languages
# Go application with multi-stage build
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main .
# Final stage: just the binary
FROM alpine:3.18
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
CMD ["./main"]
The final image only contains the compiled binary, not the entire Go toolchain.
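Since `CGO_ENABLED=0` produces a statically linked binary, you can shrink this even further with `scratch`, an empty base image with no OS at all. A sketch; note there’s no shell for debugging, and you must copy CA certificates yourself if the app makes TLS calls:
# Final stage on scratch: nothing but the binary
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/main /main
ENTRYPOINT ["/main"]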
3. Remove Package Managers After Installation
# Bad: Leaves package manager cache and metadata
RUN apt-get update && apt-get install -y curl
# Good: Cleans up after itself
RUN apt-get update && \
apt-get install -y curl && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
Package managers leave behind caches, metadata, and temporary files. Clean up after yourself.
4. Use Specific Package Versions
# Bad: Installs latest versions (unpredictable, potentially large)
RUN apk add --no-cache nodejs npm
# Good: Pins specific versions
RUN apk add --no-cache nodejs=18.19.0-r0 npm=10.2.4-r0
This ensures reproducible builds and prevents surprise updates that might break your application.
Security Best Practices: Don’t Be a Sitting Duck
Security in Docker isn’t an afterthought, it’s foundational. Containers may be isolated, but they’re not invincible. Here are the non-negotiables for building secure Docker images:
1. Never Run as Root (Seriously)
Running your app as `root` inside a container is like locking your front door but leaving the key under the mat; it’s just a matter of time.
Why it’s dangerous:
- If an attacker exploits a vulnerability in your app while it’s running as root, they can:
  - Escalate privileges to the host system (especially if Docker is misconfigured)
  - Delete or corrupt host-mounted volumes
  - Install persistent malware across containers
- Even accidental bugs (like deleting `/`) become disasters with root access.
Bad (Default is root):
# Bad: Running as root (security nightmare)
FROM alpine:3.18
RUN apk add --no-cache python3
COPY app.py /app.py
CMD ["python3", "/app.py"]
Good (Use a non-root user):
# Good: Create and use a non-root user
FROM alpine:3.18
RUN apk add --no-cache python3
RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup
USER appuser
COPY app.py /app.py
CMD ["python3", "/app.py"]
Running as root means if your application gets compromised, the attacker has full control over the container. Don’t make it easy for them.
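A quick way to verify you got this right (assuming you built the “good” image above and tagged it `my-app`):
# Confirm the container is not running as root
docker run --rm my-app id
# Expected: uid=1001(appuser) gid=1001(appgroup) groups=1001(appgroup)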
2. Use Minimal Base Images
Every additional package is a potential attack surface.
# Bad: Full Ubuntu with everything and the kitchen sink
FROM ubuntu:latest
# Better: Minimal Alpine
FROM alpine:3.18
# Best: Distroless (no shell, no package manager, minimal attack surface)
FROM gcr.io/distroless/java:11
Distroless images contain only your application and its runtime dependencies. No shell, no package manager, no unnecessary tools that could be exploited.
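Here’s what distroless looks like for the Node.js app from earlier. A sketch; the distroless Node images use `node` as their entrypoint, so `CMD` is just your script:
# Build stage: install production dependencies
FROM node:18-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --only=production
COPY . .
# Runtime: no shell, no package manager, just Node and your app
FROM gcr.io/distroless/nodejs18-debian12
WORKDIR /app
COPY --from=builder /app /app
CMD ["server.js"]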
3. Scan Your Images for Vulnerabilities
Don’t blindly trust your images, always scan them before shipping to prod.
# Get a quick vulnerability overview with Docker Scout (bundled with recent Docker versions)
docker scout quickview
# Scan your image for vulnerabilities
docker scout cves my-app:latest
# Get recommendations for fixes
docker scout recommendations my-app:latest
Other great tools: Trivy, Grype, and Snyk can all scan images for known CVEs.
Don’t ship vulnerable images. Scan them and fix issues before deployment.
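For example, with Trivy (assuming you’ve installed the `trivy` CLI):
# Scan a local image for known vulnerabilities
trivy image my-app:latest
# Fail the build on high/critical findings (handy in CI)
trivy image --severity HIGH,CRITICAL --exit-code 1 my-app:latest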
4. Don’t Embed Secrets in Images
Putting secrets directly in your Dockerfile is like writing your passwords on the front page of your resume.
Bad:
ENV API_KEY=super-secret-key
RUN curl -H "Authorization: Bearer super-secret-key" https://api.example.com
- Every layer is cached, and secrets become part of the image history.
- Anyone with access to your image can extract them with `docker history` or by simply running a shell in the container.
Better:
# Use build-time arguments
ARG BUILD_TIME_TOKEN
RUN curl -H "Authorization: Bearer $BUILD_TIME_TOKEN" https://api.example.com
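One caveat: build args used in `RUN` commands can still surface in `docker history`. If you’re on BuildKit (the default builder in recent Docker versions), secret mounts keep the value out of every layer and out of the history; a sketch:
# syntax=docker/dockerfile:1
# The secret is mounted only for this RUN step and never written to a layer
RUN --mount=type=secret,id=api_token \
    curl -H "Authorization: Bearer $(cat /run/secrets/api_token)" https://api.example.com
# Supply the secret at build time from a local file
docker build --secret id=api_token,src=./api-token.txt -t my-app .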
Best:
- Pass secrets at runtime using environment variables or secret stores:
docker run -e API_KEY=secret my-app
- Or mount secrets via files:
docker run -v /secrets/api-key.txt:/run/secrets/api_key my-app
Combine this with `.dockerignore` to ensure secret files never get baked into the image.
Real-World Example: Building a Production-Ready Python API
Let’s put it all together with a realistic example - a Python FastAPI application:
# Multi-stage build for a Python FastAPI application
FROM python:3.11-slim AS builder
# Install build dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
curl && \
rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Production stage
FROM python:3.11-slim AS production
# Install only runtime dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl && \
rm -rf /var/lib/apt/lists/*
# Copy virtual environment from builder stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
# Set up application directory
WORKDIR /app
RUN chown appuser:appuser /app
# Copy application code
COPY --chown=appuser:appuser . .
# Switch to non-root user
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Expose port
EXPOSE 8000
# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
This Dockerfile includes:
- Multi-stage build to minimize final image size
- Non-root user for security
- Proper dependency management
- Health check for monitoring
- Optimized layer caching
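To build and run it (assuming a `main.py` that exposes `app` and serves a `/health` route):
# Build and run the API
docker build -t my-fastapi-app .
docker run -d -p 8000:8000 --name api my-fastapi-app
# Check the endpoint the HEALTHCHECK uses
curl http://localhost:8000/health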
Building and Tagging Best Practices
Building a Docker image isn’t just about making it work, it’s about doing it predictably, reproducibly, and responsibly, especially in CI/CD environments. Here’s how to do it right.
Build Context Optimization
The Docker build context is everything sent to the Docker daemon when you run `docker build`. It includes all the files in the current directory, unless filtered by `.dockerignore`.
The smaller and cleaner your context, the faster your builds, and the fewer surprises you get.
Examples with Explanations:
# Most common: build using the Dockerfile in the current directory
docker build -t my-app .
It uses `Dockerfile` by default and sends the current directory (`.`) as the build context.
# Specify a custom Dockerfile
docker build -f Dockerfile.prod -t my-app:prod .
It’s useful if you maintain multiple Dockerfiles for different environments (e.g., `Dockerfile.dev`, `Dockerfile.test`, `Dockerfile.prod`).
# Use build arguments to customize your image
docker build --build-arg NODE_ENV=production -t my-app:prod .
It passes variables into the Dockerfile using `ARG` (e.g., environment, token, config path). You need to declare the argument in your Dockerfile:
ARG NODE_ENV
ENV NODE_ENV=$NODE_ENV
# Disable layer caching (forces fresh rebuild)
docker build --no-cache -t my-app .
Use it when caching causes weird behavior (e.g., stale deps, broken layers). Slow, but sometimes necessary in CI or debugging.
Tagging Strategy: Version Like a Pro
A tag is like a label on a container image. If you skip the tag, Docker uses `:latest`, but relying on `latest` is dangerous in production.
Tagging is about predictability. You want to know what you’re deploying and where it came from.
Semantic Versioning (Recommended):
# Build with multiple tags
docker build -t my-app:latest -t my-app:1.0.0 -t my-app:1.0 .
- `1.0.0`: Full release version (exact and immutable)
- `1.0`: Major/minor version, useful for quick upgrades
- `latest`: Convenience for local testing (avoid in prod)
Best Practice: Automate this tagging in your CI/CD pipeline based on `git tag` or the commit hash, as sketched below.
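A minimal sketch of what that automation might look like in a CI script:
# Derive a version from git metadata, then tag the build with it
VERSION=$(git describe --tags --always)
docker build -t my-app:"$VERSION" -t my-app:latest .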
Environment-Specific Tags:
docker build -t my-app:dev .
docker build -t my-app:staging .
docker build -t my-app:prod .
Useful for deploying the same app with different configs, secrets, or environments. Often paired with `--build-arg` to inject environment-specific values.
Git-Based Tagging
docker build -t my-app:$(git rev-parse --short HEAD) .
Tags image with the current Git commit hash. Great for debugging, rollbacks, and reproducibility.
Why Not Just Use latest?
docker build -t my-app .
docker run my-app:latest
Seems fine… until you realize:
- `latest` changes every time you rebuild
- You lose track of what version is running in production
- It makes rollbacks nearly impossible
Think of `latest` like naming every commit in Git “final_version_2_really_final_v3”. Don’t do it.
Common Dockerfile Antipatterns (Don’t Be This Person)
Even smart engineers make these mistakes, often because things “just work.” But bad Dockerfiles can lead to bloated images, slow builds, insecure containers, and CI nightmares. Here’s what to avoid:
The Kitchen Sink Approach
# ❌ Bad: Installing everything "just in case"
FROM ubuntu:latest
RUN apt-get update && apt-get install -y \
curl wget git vim nano emacs \
python3 python2 nodejs npm \
gcc g++ make cmake \
mysql-client postgresql-client \
# ... and 47 more tools you'll never use
This is like packing your entire house for a weekend trip.
Why it’s bad:
- Bloats your image by hundreds of MBs
- Expands the attack surface (more tools = more vulnerabilities)
- Slows down build times and deployment
Fix it: Install only the tools your app needs to run; build and debug tools belong in a separate build stage, as sketched below.
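A sketch of the pattern: heavyweight toolchains live in a build stage that never ships, and the runtime stage installs only what the app needs (package names here are illustrative):
# Build stage: compilers and build tools stay here
FROM ubuntu:22.04 AS builder
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc make && \
    rm -rf /var/lib/apt/lists/*
# ... build your application here ...
# Runtime stage: only runtime dependencies
FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
# ... copy the built artifact from the builder stage ...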
The Update Addict
# Bad: Updating packages unnecessarily
RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y curl
You might think you’re securing the image, but you may actually be breaking it.
Why it’s bad:
- `apt-get upgrade -y` may override base image assumptions and break compatibility
- Reproducibility goes out the window: `apt-get upgrade` pulls moving targets
- CI builds may start failing for “no reason”
Fix it:
- Only upgrade packages if required for a CVE patch or explicit dependency
- Pin versions if you care about reproducibility
In regulated environments, regular scheduled upgrades (not ad hoc) are standard — often handled via rebuilds with patched base images.
The Root Enthusiast
# ❌ Bad: Runs as root and sets permissions wide open
FROM alpine:3.18
RUN apk add --no-cache python3
COPY app.py /
RUN chmod 777 /app.py
CMD ["python3", "/app.py"]
This is like leaving your front door open with a “Free Stuff Inside” sign.
Why it’s bad:
- Running as root gives attackers full control of the container and potentially the host
- `chmod 777` is lazy and insecure: anyone inside the container can modify and execute the file
Fix it:
FROM alpine:3.18
RUN apk add --no-cache python3
RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup
USER appuser
COPY app.py /app.py
CMD ["python3", "/app.py"]
Principle of least privilege: your container should do one thing, with just enough permissions to do it and nothing more.
Debugging Docker Build Issues Like a Pro
Docker builds can fail for many reasons: network issues, caching bugs, permission errors, or just bad assumptions. Here’s your toolbelt:
See Detailed Build Output
# Build with verbose output
docker build --progress=plain -t my-app .
It’s useful in CI or when diagnosing a step that silently fails in `auto` or `tty` progress mode.
Get a Shell Inside the Image
# Inspect intermediate layers
docker build -t my-app .
docker run -it my-app /bin/sh # You might need to adapt if you don't have sh or bash
Use this to explore your image, inspect installed tools, debug path issues, or run ls, env, etc.
Build Specific Stages (e.g., builder)
docker build --target=builder -t my-app-debug .
For multi-stage builds: lets you stop before the final minimal image, so you can inspect the build stage contents.
See What You’re Actually Sending
# See what's in your build context
docker build --no-cache -t my-app . 2>&1 | grep -i "sending build context"
This shows how big your build context is (e.g., `Sending build context to Docker daemon 187MB`). With BuildKit, the default builder in recent Docker versions, look for the `transferring context` line instead.
If it’s large, check your `.dockerignore`; you’re probably sending `.git`, `node_modules`, or test assets unintentionally.
Bonus: View Image Layers
docker history my-app
See each layer, its size, and the command that created it; great for spotting bloat. You can also use Dive for a beautiful TUI to explore layers, as shown below.
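If you have Dive installed, it’s a single command:
# Interactive layer-by-layer exploration (https://github.com/wagoodman/dive)
dive my-app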
The Bottom Line
Building Docker images isn’t rocket science, but it does require thinking like an engineer instead of a script kiddie. Here’s your checklist:
- ✅ Use specific, minimal base images
- ✅ Optimize for layer caching
- ✅ Never run as root
- ✅ Use multi-stage builds for compiled languages
- ✅ Keep secrets out of images
- ✅ Use `.dockerignore` files
- ✅ Scan for vulnerabilities
- ✅ Tag images meaningfully
Stop accepting bloated, insecure images just because they’re convenient. Take control of your containerization destiny. Build lean, secure, purpose-built images that do exactly what you need and nothing more.
Your deployment pipeline will thank you, your security team will thank you, and your future self will definitely thank you when you’re not debugging someone else’s questionable life choices at 3 AM.
Next up, we’ll dive into container registries, because building great images is only half the battle - you need to store and distribute them properly too.
Built your first Docker image? Good. Now build it again, but smaller. Then do it again. Perfection is iteration, not accident.