Part 6 of the Docker Roadmap Series
Alright, enough playing around with other people’s containers. You’ve been living like a digital tenant, running pre-built images that someone else cobbled together, probably with more bloat than a Thanksgiving dinner and security holes you could drive a truck through.
It’s time to graduate from being a Docker consumer to a Docker producer. Time to build your own images that are lean, mean, and tailored exactly to your needs. No more “it works on my machine” excuses, no more inheriting someone else’s questionable life choices baked into their base image.
The Hidden Costs of Pre-Built Images
Before we dive into building our own images, let’s talk about why you should be building your own instead of just grabbing whatever looks convenient from Docker Hub.
That `node:latest` image you’ve been using? It’s probably 900MB of bloated nonsense that includes:
- A full Ubuntu/Debian base system with packages you’ll never use
- Multiple versions of build tools “just in case”
- Debug symbols for every library known to mankind
- Documentation that takes up more space than your actual application
- Vulnerabilities from packages that were outdated the moment the image was built
Let’s see this bloat in action:
# Pull a few "convenient" images and check their sizes
docker pull node:latest
docker pull python:latest
docker pull openjdk:latest
# Now check the damage
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
# Expected output (your mileage may vary):
# REPOSITORY TAG SIZE
# node latest 993MB # Nearly a gigabyte for Node.js!
# python latest 1.02GB # Python shouldn't need a GB
# openjdk latest 471MB # Java, but still chunky
993MB for Node.js? Are you kidding me? Node.js is supposed to be lightweight! That’s like buying a sports car and getting a delivery truck.
To be fair, these images aim for broad compatibility, bundling tools and libraries for every possible use case, not just yours. That’s why they aren’t lightweight.
Your First Dockerfile: The Recipe for Sanity
A Dockerfile is like a recipe, but instead of making cookies, you’re making a reproducible, portable environment for your application. And just like a recipe, if you mess it up, everyone who tries to use it will know exactly who to blame.
Let’s start with the simplest possible Dockerfile - one that actually makes sense:
# This is a comment. Use them liberally or you'll forget what you were thinking.
# We start with Alpine Linux because it's tiny and gets the job done
FROM alpine:3.18
# Set a working directory so we're not dumping files all over the place
WORKDIR /app
# Install only what we absolutely need - no kitchen sink approach
RUN apk add --no-cache nodejs npm
# Copy your application files
COPY package.json package-lock.json ./
RUN npm ci --only=production
# Copy the rest of your application
COPY . .
# Tell Docker what port your app uses (this is documentation, not magic)
EXPOSE 3000
# Define what command runs when the container starts
CMD ["node", "server.js"]
Let’s build this and see the difference:
# Build your image (don't forget the dot at the end!)
docker build -t my-lean-node-app .
# Compare the sizes
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}" | grep -E "(node|my-lean)"
# Expected output:
# my-lean-node-app latest 45MB # Much better!
# node latest 993MB # The bloated monster
45MB vs 993MB. That’s a 95% reduction in size. Your deployment pipeline just got 20x faster, and your disk space thanks you.
What is Alpine Linux? Alpine Linux is a super lightweight, security-oriented Linux distribution commonly used in Docker images. It’s designed to be minimal (just ~5 MB) while still being functional, making it ideal for building small, fast, and secure containers.
Understanding Dockerfile Instructions (The Tools of the Trade)
Let’s break down the most important Dockerfile instructions, because understanding these is the difference between a functioning image and a digital disaster:
FROM: Choose Your Foundation Wisely
# Bad: Using a massive base image
FROM ubuntu:latest # ~77MB base, but grows fast with packages
# Better: Using a smaller base
FROM alpine:3.18 # ~7MB base, minimal but functional
# Best: Using a specific, minimal base for your language (recommended for most Node.js apps)
FROM node:18-alpine # ~40MB base with Node.js already installed and optimized
Pro tip: Never use `latest` tags in production Dockerfiles. `latest` is a lie: it’s whatever the maintainer decided to tag as latest, and it changes without warning. Use specific versions like `3.18` or `18-alpine`.
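For maximum reproducibility, you can go one step further and pin by digest, which stays fixed even if someone re-pushes the tag. A sketch (the digest below is a placeholder; find the real one with `docker images --digests`):
# Strictest option: pin by digest (placeholder digest, not a real one)
FROM alpine:3.18@sha256:<your-digest-here>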
WORKDIR: Don’t Be Messy
# Bad: Dumping files everywhere like a digital pack rat
COPY . /
# Good: Having a dedicated workspace
WORKDIR /app
COPY . .
`WORKDIR` sets the working directory for subsequent instructions. Think of it as `cd /app`, but permanent. All your `RUN`, `COPY`, and `CMD` instructions will execute from this directory.
RUN: Execute Commands During Build
# Bad: Multiple RUN commands create multiple layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
RUN apt-get clean
# Good: Chain commands to minimize layers
RUN apt-get update && \
apt-get install -y curl git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
Each `RUN` instruction creates a new layer in your image. More layers = bigger image. Chain related commands together with `&&` to keep your image lean.
COPY vs ADD: Know the Difference
# COPY: Simple file copying (use this 99% of the time)
COPY package.json ./
COPY src/ ./src/
# ADD: Like COPY but with superpowers (auto-extracts archives, can fetch URLs)
ADD https://example.com/file.tar.gz /tmp/ # Downloads the file (remote files are NOT auto-extracted)
ADD archive.tar.gz /tmp/ # Extracts automatically
Rule of thumb: Use `COPY` unless you specifically need `ADD`’s extra features. `COPY` is explicit and predictable; `ADD` is magical and can surprise you.
EXPOSE: Documentation, Not Magic
# This documents that your app uses port 3000
EXPOSE 3000
# This does NOT automatically publish the port!
# You still need -p when running: docker run -p 3000:3000 my-app
`EXPOSE` is pure documentation. It tells other developers (and your future confused self) what ports your application uses, but it doesn’t actually publish them.
CMD vs ENTRYPOINT: The Startup Drama
# CMD: Default command, easily overridden
CMD ["node", "server.js"]
# docker run my-app # Runs: node server.js
# docker run my-app echo hi # Runs: echo hi (CMD overridden)
# ENTRYPOINT: Always runs, arguments get appended
ENTRYPOINT ["node"]
CMD ["server.js"]
# docker run my-app # Runs: node server.js
# docker run my-app app.js # Runs: node app.js (CMD replaced, ENTRYPOINT preserved)
Use `CMD` for simple applications. Use `ENTRYPOINT` + `CMD` when you want to ensure your main executable always runs but allow flexibility in arguments.
Multi-Stage Builds: The Swiss Army Knife
Here’s where things get interesting. Multi-stage builds let you use multiple `FROM` instructions in a single Dockerfile, allowing you to build your application in one stage and create a minimal runtime image in another.
This is like having a fully equipped workshop to build your furniture, but only shipping the finished product - not the entire workshop.
# Stage 1: Build stage (the messy workshop)
FROM node:18-alpine AS builder
WORKDIR /app
# Install ALL dependencies (including dev dependencies for building)
COPY package.json package-lock.json ./
RUN npm ci
# Copy source code and build
COPY . .
RUN npm run build
# Stage 2: Production stage (the clean, minimal result)
FROM node:18-alpine AS production
WORKDIR /app
# Install only production dependencies
COPY package.json package-lock.json ./
RUN npm ci --only=production && npm cache clean --force
# Copy built application from the builder stage
COPY --from=builder /app/dist ./dist
# Create a non-root user (security best practice)
RUN addgroup -g 1001 -S nodejs && \
adduser -S nextjs -u 1001
USER nextjs
EXPOSE 3000
CMD ["node", "dist/server.js"]
Let’s see the magic:
# Build the multi-stage image
docker build -t my-multistage-app .
# Check the size
docker images my-multistage-app
# Expected: Much smaller than if we included all build dependencies
The build stage includes all the heavy build tools, TypeScript compilers, webpack, and other development dependencies. The production stage only includes the runtime and your built application. It’s like cooking a meal with all your kitchen equipment but only serving the food.
Layer Caching: The Performance Game Changer
Docker builds images in layers, and it’s smart enough to cache layers that haven’t changed. Understanding this is crucial for fast builds and maintaining your sanity during development.
The Wrong Way (Cache-Busting Nightmare)
# Bad: This busts the cache every time ANY file changes
FROM node:18-alpine
WORKDIR /app
COPY . . # Copies everything, cache busts on ANY change
RUN npm install # Reinstalls ALL packages on every build
CMD ["node", "server.js"]
Every time you change a single line of code, Docker has to reinstall all your dependencies. This is like rebuilding your entire house because you changed a light bulb.
The Right Way (Cache-Friendly)
# Good: Optimized for caching
FROM node:18-alpine
WORKDIR /app
# Step 1: Copy only dependency files first
COPY package.json package-lock.json ./
RUN npm ci --only=production # This layer only rebuilds if dependencies change
# Step 2: Copy application code after dependencies
COPY . . # This layer rebuilds on code changes, but deps are cached
CMD ["node", "server.js"]
Now when you change your application code, Docker reuses the cached dependency layer and only rebuilds from the `COPY . .` step forward.
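You can watch the cache work. With the cache-friendly Dockerfile above, change only application code and rebuild; the dependency layers report CACHED in the build output:
# First build: every step runs
docker build -t my-app .
# Touch a source file, then rebuild
echo "// tweak" >> server.js
docker build -t my-app .
# The npm ci layer shows CACHED; only COPY . . onward re-runs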
.dockerignore: The Unsung Hero
Just like `.gitignore` keeps junk out of your Git repo, `.dockerignore` keeps junk out of your Docker build context:
# .dockerignore file
node_modules
npm-debug.log
.git
.gitignore
README.md
.env
.nyc_output
coverage
Dockerfile
docker-compose.yml
*.md
Without this file, Docker copies everything into the build context, including your massive `node_modules` directory, `.git` history, and other files that have no business being in your image.
Image Size Optimization: Every Byte Counts
Large images are slow to build, slow to deploy, and expensive to store. Here are battle-tested techniques to keep your images lean:
1. Use Alpine Linux Base Images
# Instead of this heavyweight champion:
FROM node:18 # ~993MB
# Use this lightweight contender:
FROM node:18-alpine # ~40MB
Alpine Linux uses `musl` libc instead of `glibc` and `apk` instead of `apt`. It’s minimal but functional.
2. Multi-Stage Builds for Compiled Languages
# Go application with multi-stage build
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main .
# Final stage: just the binary
FROM alpine:3.18
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
CMD ["./main"]
The final image only contains the compiled binary, not the entire Go toolchain.
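Since `CGO_ENABLED=0` produces a statically linked binary, you can shrink this even further with `scratch`, an empty base image with no OS at all. A sketch; note there’s no shell for debugging, and you must copy CA certificates yourself if the app makes TLS calls:
# Final stage on scratch: nothing but the binary
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/main /main
ENTRYPOINT ["/main"]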
3. Remove Package Managers After Installation
# Bad: Leaves package manager cache and metadata
RUN apt-get update && apt-get install -y curl
# Good: Cleans up after itself
RUN apt-get update && \
apt-get install -y curl && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
Package managers leave behind caches, metadata, and temporary files. Clean up after yourself.
4. Use Specific Package Versions
# Bad: Installs latest versions (unpredictable, potentially large)
RUN apk add --no-cache nodejs npm
# Good: Pins specific versions
RUN apk add --no-cache nodejs=18.19.0-r0 npm=10.2.4-r0
This ensures reproducible builds and prevents surprise updates that might break your application.
Security Best Practices: Don’t Be a Sitting Duck
Security in Docker isn’t an afterthought, it’s foundational. Containers may be isolated, but they’re not invincible. Here are the non-negotiables for building secure Docker images:
1. Never Run as Root (Seriously)
Running your app as `root` inside a container is like locking your front door but leaving the key under the mat; it’s just a matter of time.
Why it’s dangerous:
- If an attacker exploits a vulnerability in your app while it’s running as root, they can:
  - Escalate privileges to the host system (especially if Docker is misconfigured)
  - Delete or corrupt host-mounted volumes
  - Install persistent malware across containers
- Even accidental bugs (like deleting `/`) become disasters with root access.
Bad (Default is root):
# Bad: Running as root (security nightmare)
FROM alpine:3.18
RUN apk add --no-cache python3
COPY app.py /app.py
CMD ["python3", "/app.py"]
Good (Use a non-root user):
# Good: Create and use a non-root user
FROM alpine:3.18
RUN apk add --no-cache python3
RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup
USER appuser
COPY app.py /app.py
CMD ["python3", "/app.py"]
Running as root means if your application gets compromised, the attacker has full control over the container. Don’t make it easy for them.
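A quick way to verify you got this right (assuming you built the “good” image above and tagged it `my-app`):
# Confirm the container is not running as root
docker run --rm my-app id
# Expected: uid=1001(appuser) gid=1001(appgroup) groups=1001(appgroup)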
2. Use Minimal Base Images
Every additional package is a potential attack surface.
# Bad: Full Ubuntu with everything and the kitchen sink
FROM ubuntu:latest
# Better: Minimal Alpine
FROM alpine:3.18
# Best: Distroless (no shell, no package manager, minimal attack surface)
FROM gcr.io/distroless/java:11
Distroless images contain only your application and its runtime dependencies. No shell, no package manager, no unnecessary tools that could be exploited.
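Here’s what distroless looks like for the Node.js app from earlier. A sketch; the distroless Node images use `node` as their entrypoint, so `CMD` is just your script:
# Build stage: install production dependencies
FROM node:18-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --only=production
COPY . .
# Runtime: no shell, no package manager, just Node and your app
FROM gcr.io/distroless/nodejs18-debian12
WORKDIR /app
COPY --from=builder /app /app
CMD ["server.js"]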
3. Scan Your Images for Vulnerabilities
Don’t blindly trust your images, always scan them before shipping to prod.
# Get a quick vulnerability overview with Docker Scout (bundled with recent Docker versions)
docker scout quickview
# Scan your image for vulnerabilities
docker scout cves my-app:latest
# Get recommendations for fixes
docker scout recommendations my-app:latest
Other great tools: Trivy, Grype, and Snyk can all scan images for known CVEs.
Don’t ship vulnerable images. Scan them and fix issues before deployment.
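For example, with Trivy (assuming you’ve installed the `trivy` CLI):
# Scan a local image for known vulnerabilities
trivy image my-app:latest
# Fail the build on high/critical findings (handy in CI)
trivy image --severity HIGH,CRITICAL --exit-code 1 my-app:latest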
4. Don’t Embed Secrets in Images
Putting secrets directly in your Dockerfile is like writing your passwords on the front page of your resume.
Bad:
ENV API_KEY=super-secret-key
RUN curl -H "Authorization: Bearer super-secret-key" https://api.example.com
- Every layer is cached, and secrets become part of the image history.
- Anyone with access to your image can extract them with `docker history` or by simply running a shell in the container.
Better:
# Use build-time arguments
ARG BUILD_TIME_TOKEN
RUN curl -H "Authorization: Bearer $BUILD_TIME_TOKEN" https://api.example.com
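One caveat: build args used in `RUN` commands can still surface in `docker history`. If you’re on BuildKit (the default builder in recent Docker versions), secret mounts keep the value out of every layer and out of the history; a sketch:
# syntax=docker/dockerfile:1
# The secret is mounted only for this RUN step and never written to a layer
RUN --mount=type=secret,id=api_token \
    curl -H "Authorization: Bearer $(cat /run/secrets/api_token)" https://api.example.com
# Supply the secret at build time from a local file
docker build --secret id=api_token,src=./api-token.txt -t my-app .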
Best:
- Pass secrets at runtime using environment variables or secret stores:
docker run -e API_KEY=secret my-app
- Or mount secrets via files:
docker run -v /secrets/api-key.txt:/run/secrets/api_key my-app
Combine this with `.dockerignore` to ensure secret files never get baked into the image.
Real-World Example: Building a Production-Ready Python API
Let’s put it all together with a realistic example - a Python FastAPI application:
# Multi-stage build for a Python FastAPI application
FROM python:3.11-slim AS builder
# Install build dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
curl && \
rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Production stage
FROM python:3.11-slim AS production
# Install only runtime dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl && \
rm -rf /var/lib/apt/lists/*
# Copy virtual environment from builder stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
# Set up application directory
WORKDIR /app
RUN chown appuser:appuser /app
# Copy application code
COPY --chown=appuser:appuser . .
# Switch to non-root user
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Expose port
EXPOSE 8000
# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
This Dockerfile includes:
- Multi-stage build to minimize final image size
- Non-root user for security
- Proper dependency management
- Health check for monitoring
- Optimized layer caching
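To build and run it (assuming a `main.py` that exposes `app` and serves a `/health` route):
# Build and run the API
docker build -t my-fastapi-app .
docker run -d -p 8000:8000 --name api my-fastapi-app
# Check the endpoint the HEALTHCHECK uses
curl http://localhost:8000/health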
Building and Tagging Best Practices
Building a Docker image isn’t just about making it work, it’s about doing it predictably, reproducibly, and responsibly, especially in CI/CD environments. Here’s how to do it right.
Build Context Optimization
The Docker build context is everything sent to the Docker daemon when you run `docker build`. It includes all the files in the current directory, unless filtered by `.dockerignore`.
The smaller and cleaner your context, the faster your builds, and the fewer surprises you get.
Examples with Explanations:
# Most common: build using the Dockerfile in the current directory
docker build -t my-app .
It uses `Dockerfile` by default and sends the current directory (`.`) as the build context.
# Specify a custom Dockerfile
docker build -f Dockerfile.prod -t my-app:prod .
It’s useful if you maintain multiple Dockerfiles for different environments (e.g., `Dockerfile.dev`, `Dockerfile.test`, `Dockerfile.prod`).
# Use build arguments to customize your image
docker build --build-arg NODE_ENV=production -t my-app:prod .
It passes variables into the Dockerfile using `ARG` (e.g., environment, token, config path). You need to declare the argument in your Dockerfile:
ARG NODE_ENV
ENV NODE_ENV=$NODE_ENV
# Disable layer caching (forces fresh rebuild)
docker build --no-cache -t my-app .
Use it when caching causes weird behavior (e.g., stale deps, broken layers). Slow, but sometimes necessary in CI or debugging.
Tagging Strategy: Version Like a Pro
A tag is like a label on a container image. If you skip the tag, Docker uses `:latest`, but relying on `latest` is dangerous in production.
Tagging is about predictability. You want to know what you’re deploying and where it came from.
Semantic Versioning (Recommended):
# Build with multiple tags
docker build -t my-app:latest -t my-app:1.0.0 -t my-app:1.0 .
- `1.0.0`: Full release version (exact and immutable)
- `1.0`: Major/minor version, useful for quick upgrades
- `latest`: Convenience for local testing (avoid in prod)
Best Practice: Automate this tagging in your CI/CD pipeline based on `git tag` or the commit hash, as sketched below.
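A minimal sketch of what that automation might look like in a CI script:
# Derive a version from git metadata, then tag the build with it
VERSION=$(git describe --tags --always)
docker build -t my-app:"$VERSION" -t my-app:latest .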
Environment-Specific Tags:
docker build -t my-app:dev .
docker build -t my-app:staging .
docker build -t my-app:prod .
Useful for deploying the same app with different configs, secrets, or environments. Often paired with `--build-arg` to inject environment-specific values.
Git-Based Tagging
docker build -t my-app:$(git rev-parse --short HEAD) .
Tags image with the current Git commit hash. Great for debugging, rollbacks, and reproducibility.
Why Not Just Use latest?
docker build -t my-app .
docker run my-app:latest
Seems fine… until you realize:
- `latest` changes every time you rebuild
- You lose track of what version is running in production
- It makes rollbacks nearly impossible
Think of `latest` like naming every commit in Git “final_version_2_really_final_v3”. Don’t do it.
Common Dockerfile Antipatterns (Don’t Be This Person)
Even smart engineers make these mistakes, often because things “just work.” But bad Dockerfiles can lead to bloated images, slow builds, insecure containers, and CI nightmares. Here’s what to avoid:
The Kitchen Sink Approach
# ❌ Bad: Installing everything "just in case"
FROM ubuntu:latest
RUN apt-get update && apt-get install -y \
curl wget git vim nano emacs \
python3 python2 nodejs npm \
gcc g++ make cmake \
mysql-client postgresql-client \
# ... and 47 more tools you'll never use
This is like packing your entire house for a weekend trip.
Why it’s bad:
- Bloats your image by hundreds of MBs
- Expands the attack surface (more tools = more vulnerabilities)
- Slows down build times and deployment
Fix it: Install only the tools your app needs to run; build and debug tools belong in a separate build stage, as sketched below.
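A sketch of the pattern: heavyweight toolchains live in a build stage that never ships, and the runtime stage installs only what the app needs (package names here are illustrative):
# Build stage: compilers and build tools stay here
FROM ubuntu:22.04 AS builder
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc make && \
    rm -rf /var/lib/apt/lists/*
# ... build your application here ...
# Runtime stage: only runtime dependencies
FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
# ... copy the built artifact from the builder stage ...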
The Update Addict
# Bad: Updating packages unnecessarily
RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y curl
You might think you’re securing the image, but you may actually be breaking it.
Why it’s bad:
- `apt-get upgrade -y` may override base image assumptions and break compatibility
- Reproducibility goes out the window: `apt-get upgrade` pulls moving targets
- CI builds may start failing for “no reason”
Fix it:
- Only upgrade packages if required for a CVE patch or explicit dependency
- Pin versions if you care about reproducibility
In regulated environments, regular scheduled upgrades (not ad hoc) are standard — often handled via rebuilds with patched base images.
The Root Enthusiast
# ❌ Bad: Runs as root and sets permissions wide open
FROM alpine:3.18
RUN apk add --no-cache python3
COPY app.py /
RUN chmod 777 /app.py
CMD ["python3", "/app.py"]
This is like leaving your front door open with a “Free Stuff Inside” sign.
Why it’s bad:
- Running as root gives attackers full control of the container and potentially the host
- `chmod 777` is lazy and insecure: anyone inside the container can modify and execute the file
Fix it:
FROM alpine:3.18
RUN apk add --no-cache python3
RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup
USER appuser
COPY app.py /app.py
CMD ["python3", "/app.py"]
Principle of least privilege: your container should do one thing, with just enough permissions to do it and nothing more.
Debugging Docker Build Issues Like a Pro
Docker builds can fail for many reasons: network issues, caching bugs, permission errors, or just bad assumptions. Here’s your toolbelt:
See Detailed Build Output
# Build with verbose output
docker build --progress=plain -t my-app .
It’s useful in CI or when diagnosing a step that silently fails in `auto` or `tty` progress mode.
Get a Shell Inside the Image
# Inspect intermediate layers
docker build -t my-app .
docker run -it my-app /bin/sh # You might need to adapt if you don't have sh or bash
Use this to explore your image, inspect installed tools, debug path issues, or run ls, env, etc.
Build Specific Stages (e.g., builder)
docker build --target=builder -t my-app-debug .
For multi-stage builds: lets you stop before the final minimal image, so you can inspect the build stage contents.
See What You’re Actually Sending
# See what's in your build context
docker build --no-cache -t my-app . 2>&1 | grep -i "sending build context"
This shows how big your build context is (e.g., `Sending build context to Docker daemon 187MB`). With BuildKit, the default builder in recent Docker versions, look for the `transferring context` line instead.
If it’s large, check your `.dockerignore`; you’re probably sending `.git`, `node_modules`, or test assets unintentionally.
Bonus: View Image Layers
docker history my-app
See each layer, its size, and the command that created it; great for spotting bloat. You can also use Dive for a beautiful TUI to explore layers, as shown below.
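If you have Dive installed, it’s a single command:
# Interactive layer-by-layer exploration (https://github.com/wagoodman/dive)
dive my-app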
The Bottom Line
Building Docker images isn’t rocket science, but it does require thinking like an engineer instead of a script kiddie. Here’s your checklist:
- ✅ Use specific, minimal base images
- ✅ Optimize for layer caching
- ✅ Never run as root
- ✅ Use multi-stage builds for compiled languages
- ✅ Keep secrets out of images
- ✅ Use `.dockerignore` files
- ✅ Scan for vulnerabilities
- ✅ Tag images meaningfully
Stop accepting bloated, insecure images just because they’re convenient. Take control of your containerization destiny. Build lean, secure, purpose-built images that do exactly what you need and nothing more.
Your deployment pipeline will thank you, your security team will thank you, and your future self will definitely thank you when you’re not debugging someone else’s questionable life choices at 3 AM.
Next up, we’ll dive into container registries, because building great images is only half the battle - you need to store and distribute them properly too.
Built your first Docker image? Good. Now build it again, but smaller. Then do it again. Perfection is iteration, not accident.