Part 4 of the Docker Roadmap Series

Alright, let’s talk about one of the biggest “gotchas” in Docker that has probably made you want to throw your laptop out the window at least once. You spend hours setting up a perfect database container, configuring everything just right, adding test data, and then… POOF! You remove the container and everything vanishes into the digital void.

Welcome to the wonderful world of ephemeral filesystems, where data goes to die if you don’t know what you’re doing.

The Great Data Disappearing Act

Here’s the harsh reality that Docker newbies discover the hard way: containers are designed to be disposable. Every single file you create, every database record you insert, every log you write inside a container lives in what I like to call “digital quicksand”: it looks solid until it suddenly isn’t.

Let’s witness this ephemeral behavior firsthand. We’ll start a simple Ubuntu container, create some files inside it, then remove the container. Watch what happens to our ‘important’ data:

# Step 1: Start a container and create some "important" data
# We use -it for interactive mode, so you can type commands directly inside the container.
docker run -it --name data-disaster ubuntu:latest /bin/bash
 
# Inside the container (you'll be in a bash prompt here)
echo "My super important data" > /tmp/critical-info.txt
exit # Type 'exit' to leave the container
 
# Step 2: Verify the data is there (for now)
# The container is stopped, but its filesystem layer still exists.
docker start data-disaster
docker exec data-disaster cat /tmp/critical-info.txt
# Expected Output: My super important data
 
# Step 3: Remove the container
# This is the crucial step – the top writable layer is deleted.
docker rm --force data-disaster
 
# Step 4: Try to access the data again...
# The container (and its writable layer) is gone, so referencing it now fails.
docker exec data-disaster cat /tmp/critical-info.txt
# Expected Output: Error response from daemon: No such container: data-disaster
 
 
# Alternatively, if you try to create a *new* container and look for the old data:
docker run --rm ubuntu:latest ls /tmp/critical-info.txt
# Expected Output: ls: cannot access '/tmp/critical-info.txt': No such file or directory
 
# It's gone. Forever. Kaput. The container, and its changes, are no more.

This isn’t a bug, it’s a feature! Containers are supposed to be stateless, immutable, and replaceable. But in the real world, we need to store data somewhere, and that’s where Docker’s persistence mechanisms come to the rescue.

Understanding the Container Filesystem Layers

Before we dive into solutions, let’s understand why this happens. Remember those union filesystems we talked about in the previous articles? Here’s how they work in practice:

When you run a container, Docker creates a thin, writable layer on top of the read-only image layers. This is where all your changes go:

┌─────────────────────────┐ ← Container Layer (Read-Write)
├─────────────────────────┤ ← Image Layer 3 (Read-Only)
├─────────────────────────┤ ← Image Layer 2 (Read-Only)  
└─────────────────────────┘ ← Base Layer (Read-Only)

Please note that there can be more or fewer layers, depending on the image your container is built from.

Everything you write gets stored in that top layer. When you delete the container, that layer gets nuked from orbit. It’s gone, and there’s no recovery.

You can actually peek into the ephemeral layer and see exactly what changes Docker tracks within a running container. Let’s start an Nginx container, make a couple of modifications, and then use docker diff:

# Start a container and make some changes
docker run -d --name change-tracker nginx:latest
docker exec change-tracker touch /tmp/new-file.txt
docker exec change-tracker sh -c "echo 'modified' >> /etc/issue"
 
# See what Docker is tracking
docker diff change-tracker
# C /tmp
# A /tmp/new-file.txt
# C /etc
# C /etc/issue
# ...plus other files Nginx touches at startup

The A means “Added,” and C means “Changed.” All of this is living dangerously in that ephemeral container layer.

Volume Mounts: Docker’s Data Lifeline

Volumes are Docker’s answer to persistent data storage, and they’re managed entirely by Docker. Think of them as external hard drives that you can plug into any container.

Creating and Using Volumes

# Create a named volume
docker volume create my-precious-data
 
# List all volumes
docker volume ls
 
# Get detailed info about a volume
docker volume inspect my-precious-data

The output of inspect will show you where Docker actually stores this volume on your host system. On Linux, it’s typically under /var/lib/docker/volumes/, but honestly, you shouldn’t care about the exact location – that’s Docker’s job to manage.
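
For reference, on a Linux host the output looks roughly like this (the timestamp and exact path will differ on your machine):

docker volume inspect my-precious-data
# [
#     {
#         "CreatedAt": "2025-06-10T17:00:00Z",
#         "Driver": "local",
#         "Labels": null,
#         "Mountpoint": "/var/lib/docker/volumes/my-precious-data/_data",
#         "Name": "my-precious-data",
#         "Options": null,
#         "Scope": "local"
#     }
# ]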

Mounting Volumes in Containers

Now for the main event: mounting our my-precious-data volume to an Nginx container. We’ll write some content to it, then demonstrate that even if we destroy the container, our data remains safe and accessible for a new container.

# Step 1: Run Nginx, mounting our 'my-precious-data' volume
# We'll map the volume to Nginx's default HTML directory.
docker run -d \
  -p 80:80 \
  --name persistent-nginx \
  -v my-precious-data:/usr/share/nginx/html \
  nginx:latest
 
# Step 2: Add some content to the mounted volume
# This content is now written to the volume, not the container's ephemeral layer.
docker exec persistent-nginx sh -c 'echo "<h1>I will survive!</h1>" > /usr/share/nginx/html/index.html'
 
# Step 3: (Optional) Test that the content is served (we published port 80 in Step 1).
# curl http://localhost
 
# Step 4: The moment of truth - destroy the container!
docker stop persistent-nginx
docker rm persistent-nginx
 
# Step 5: Create a brand new container, mounting the *same* volume
# Notice we expose port 8080 this time to avoid conflicts if you had a previous Nginx running.
docker run -d --rm \
  --name phoenix-nginx \
  -v my-precious-data:/usr/share/nginx/html \
  -p 8080:80 \
  nginx:latest
 
# Step 6: Check if our data survived in the new container!
curl http://localhost:8080
# Expected Output: <h1>I will survive!</h1> (Proof that the data persisted!)

Victory! The data survived the container apocalypse.

Volume Best Practices

Here are some hard-learned lessons about volumes:

1. Name your volumes explicitly

Anonymous volumes are created automatically by Docker when you don’t specify a name (e.g., just -v /path/in/container). While convenient for quick tests, they are hard to track, debug, and manage, often leading to orphaned data. Always give your volumes descriptive names so you know exactly what data they contain.

# Good - descriptive name
docker volume create postgres-production-data
 
# Bad - anonymous volume (you'll lose track of it)
docker run -v /var/lib/postgresql/data postgres:15

2. Use volumes for databases

Databases are the quintessential example of stateful applications that absolutely require persistent storage. Never run a database in Docker without mounting a volume for its data directory. If you don’t, every time the container is removed or recreated, your entire database will vanish.

# PostgreSQL with persistent data
docker run -d \
  --name postgres-db \
  -e POSTGRES_PASSWORD=secretpassword \
  -v postgres-data:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:15
 
# The database will survive container restarts and recreations
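
If you want to prove that claim to yourself, here's a quick round trip, a sketch that assumes the 'postgres-db' container and 'postgres-data' volume from above (give the server a few seconds to start before running it):

# Write something into the database via psql (bundled in the postgres image).
docker exec postgres-db psql -U postgres -c "CREATE TABLE survivors (id int); INSERT INTO survivors VALUES (1);"
# Destroy the container – the 'postgres-data' volume stays behind.
docker rm -f postgres-db
# Start a brand-new container against the same volume.
docker run -d \
  --name postgres-db-2 \
  -e POSTGRES_PASSWORD=secretpassword \
  -v postgres-data:/var/lib/postgresql/data \
  postgres:15
# Once it's up, the table written by the old container is still there.
docker exec postgres-db-2 psql -U postgres -c "SELECT * FROM survivors;"
# Expected Output: a single row with id = 1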

3. Share volumes between containers

Sometimes, you’ll have multiple containers that need to access the same persistent data. A common pattern is a “producer” container that writes data, and a “consumer” container that reads or processes it. Volumes are perfect for this shared access.

Let’s illustrate with a simple example where one container writes timestamps to a file, and another container continuously reads the latest timestamp from that same file.

# Step 1: Create a dedicated shared volume
docker volume create shared-storage
 
# Step 2: Start a 'producer' container that writes the current date/time to a file every 10 seconds
# This container mounts 'shared-storage' to /output and keeps appending to timestamps.log
docker run -d --name producer -v shared-storage:/output alpine \
  sh -c "while true; do date >> /output/timestamps.log; sleep 10; done"
 
# Step 3: Start a 'consumer' container that continuously reads the last line of the same file
# This container mounts 'shared-storage' to /input and will see the updates from the producer.
docker run -d --name consumer -v shared-storage:/input alpine \
  sh -c "while true; do tail -1 /input/timestamps.log; sleep 5; done"
 
# Step 4: Verify that the consumer is receiving data from the producer
# Wait a few seconds, then view the logs of the consumer container.
docker logs consumer --follow # Press Ctrl+C to exit logs
 
# Expected output from 'docker logs consumer': the latest timestamp is re-printed every ~5 seconds, and a new one appears every ~10 seconds:
# Tue Jun 10 17:00:00 UTC 2025
# Tue Jun 10 17:00:10 UTC 2025
# Tue Jun 10 17:00:20 UTC 2025
# ... and so on, confirming data sharing.
 
# Clean up the containers (important!)
docker stop producer consumer
docker rm producer consumer
docker volume rm shared-storage

Bind Mounts: When You Need Host Access

While Docker volumes are ideal for managing data within Docker’s ecosystem, sometimes you need direct access to files or directories that reside directly on your host machine. That’s where bind mounts come in. They create a direct, live link (a “portal”) between a specific path on your host and a specific path inside your container. Changes made on either side are immediately reflected on the other.

This makes bind mounts incredibly powerful for development, configuration management, and scenarios where you want the container to work directly with your host’s filesystem.

Basic Bind Mount Usage (Serving Static Content)

Let’s start with a simple example: serving static website content directly from a directory on your host machine using an Nginx container. This is extremely useful for quickly testing local web projects without copying files into the container image.

# Step 1: Create a directory on your host machine for your website content.
# Replace '/path/to/your/website' with an actual path, e.g., 'mkdir ~/my-nginx-site'
mkdir -p /path/to/your/website
 
# Step 2: Run an Nginx container, binding the host directory to Nginx's HTML serving path.
# The `-v /path/to/your/website:/usr/share/nginx/html` part is the bind mount.
# It means "mount my host's /path/to/your/website into the container's /usr/share/nginx/html".
docker run -d \
  --name web-server \
  -v /path/to/your/website:/usr/share/nginx/html \
  -p 8080:80 \
  nginx:latest
 
# Step 3: Create an index.html file directly on your host machine within the mounted directory.
# Watch how changes on the host immediately appear inside the container!
echo "<h1>Live editing! Hello from the host!</h1>" > /path/to/your/website/index.html
 
# Step 4: Access the Nginx server to see the content.
curl http://localhost:8080
 
# Expected Output: <h1>Live editing! Hello from the host!</h1>
 
# Clean up (important!):
docker stop web-server
docker rm web-server
# Optionally, remove the directory you created on your host: rm -rf /path/to/your/website

Key Takeaway: With bind mounts, your container isn’t just copying files; it’s directly accessing the files on your host. This is a fundamental difference from volumes where Docker manages the storage location itself.

Development Workflow with Bind Mounts (Live Reload)

This is where bind mounts truly shine: enabling rapid iteration in development environments. By mounting your application’s source code directly from your host into the container, any changes you save on your local machine are immediately visible to the application running inside the container, often triggering live reloads.

# Assume you are in your Node.js project's root directory (e.g., where package.json is).
# $(pwd) expands to your current working directory on the host.
 
# Run a Node.js development server, mounting your local source code and package.json.
# -v $(pwd)/src:/app/src: Your local 'src' directory is mounted to '/app/src' in the container.
# -v $(pwd)/package.json:/app/package.json: Your local package.json is mounted to '/app/package.json'.
# -w /app: Sets the working directory so 'npm run dev' can find package.json.
# This assumes your dependencies are available inside the container (e.g., node_modules baked into the image or mounted as well).
docker run -d \
  --name dev-server \
  -w /app \
  -v $(pwd)/src:/app/src \
  -v $(pwd)/package.json:/app/package.json \
  -p 3000:3000 \
  node:18 \
  npm run dev
 
# Now, edit any file in your local 'src' directory on your host.
# Your development server inside the container will likely detect the change
# and automatically reload, providing a seamless development experience.

Why this is useful: You avoid rebuilding your Docker image every time you make a code change. The container becomes a consistent runtime environment, while your code remains on your host machine, accessible by your favorite IDE.

Configuration Management (Read-Only Mounts)

Bind mounts are also excellent for providing configuration files to your containers without embedding them directly into the image. This allows you to easily change configurations without rebuilding the image, and even share them between different services.

# Assume you have 'nginx.conf' and an 'ssl-certs' directory in your current host directory.
 
# Run an Nginx container, mounting specific configuration files and an SSL certs directory.
# -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro: Mounts your local nginx.conf as read-only.
# -v $(pwd)/ssl-certs:/etc/nginx/ssl:ro: Mounts your local SSL certificate directory as read-only.
docker run -d \
  --name configured-nginx \
  -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro \
  -v $(pwd)/ssl-certs:/etc/nginx/ssl:ro \
  -p 443:443 \
  nginx:latest

Notice the :ro suffix? This stands for read-only. It’s a crucial security and stability feature that makes the mount read-only from the container’s perspective. This prevents the container from accidentally (or maliciously) modifying or deleting your critical host configuration files.
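
You can verify this yourself: with the 'configured-nginx' container from above still running, any write attempt against the mounted config from inside the container is rejected:

# Trying to modify the read-only mount from inside the container fails.
docker exec configured-nginx sh -c "echo 'tamper' >> /etc/nginx/nginx.conf"
# Expected Output: an error along the lines of
# sh: 1: cannot create /etc/nginx/nginx.conf: Read-only file system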

Bind Mounts vs Volumes: The Eternal Debate

This is a fundamental choice when dealing with Docker persistence, and while there’s often overlap, each mechanism has its strengths. Think of it less as a strict rule and more about the context and lifecycle of the data you’re managing.

Here’s a detailed comparison to help you decide when to use each:

| Feature / Use Case | Volumes (Docker-managed) | Bind Mounts (Host-managed) |
| --- | --- | --- |
| Primary use case | Persistent application data (databases, queues, logs) | Accessing host files/directories; development |
| Host path | Docker manages the location (opaque to the user) | You explicitly define the host path |
| Management | Managed entirely by Docker (create, ls, inspect, prune) | Managed by the host operating system; Docker just links to it |
| Data lifecycle | Independent of the container lifecycle; easy backup/restore | Tied directly to the host filesystem; easier for manual inspection |
| Performance | Generally good, optimized by Docker | Can be slightly slower (especially on macOS/Windows due to the VM layer) |
| Cross-OS compatibility | More portable across Docker host OSes (Linux, Windows, macOS) | Host path syntax varies (e.g., C:\ on Windows) |
| Typical scenarios | Databases (PostgreSQL, MySQL, Redis); message queues (Kafka, RabbitMQ); application logs that need persistence; sharing data between related containers (e.g., app & DB) | Local development with live reload; mounting configuration files (often :ro); injecting specific host-level secrets/certs; providing large datasets that already exist on the host |
| Best for | Production data storage, Docker-native solutions | Development, host integration, specific config injection |

My opinionated take:

  • Use volumes for data that truly belongs to your application and needs to persist independently of the host. Think of volumes as Docker’s internal, optimized way to store your application’s state. This is almost always the default choice for databases and persistent application data in production.
  • Use bind mounts for data that belongs to your host system or your development workflow. This includes your source code, local configuration files you’re actively tweaking, or scenarios where you need direct access to host resources. They bridge the gap between your containerized app and your local development environment.
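
To keep the two straight, here's a quick syntax side-by-side, including Docker's more explicit --mount form (the volume and path names are just placeholders):

# Named volume – Docker decides where the data lives on the host.
docker run --rm -v my-volume:/data alpine ls /data
docker run --rm --mount type=volume,source=my-volume,target=/data alpine ls /data
# Bind mount – you choose the host path explicitly (must be an absolute path).
docker run --rm -v "$(pwd)/data":/data alpine ls /data
docker run --rm --mount type=bind,source="$(pwd)/data",target=/data alpine ls /data
# Note: with --mount, the host path must already exist; the -v form creates it for you on Linux.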

Tmpfs Mounts: The Forgotten Hero

Beyond volumes and bind mounts, there’s a third type of mount that often goes unmentioned but is incredibly useful for specific scenarios: tmpfs mounts. These create a temporary filesystem that lives entirely in the host’s memory (RAM), not on disk.

Because they reside in volatile memory, any data written to a tmpfs mount is ephemeral: it disappears permanently when the container stops or is removed, and it’s also gone if the host system reboots. This makes them unsuitable for persistent application data, but perfect for temporary, sensitive, or high-performance data that doesn’t need to survive restarts.

Let’s see how to use one:

# Step 1: Create a container with a tmpfs mount
# We'll mount a tmpfs filesystem to the /tmp directory inside the container.
# '--tmpfs /tmp:rw,size=100m' specifies the mount point (/tmp),
# 'rw' for read/write permissions, and 'size=100m' limits its memory usage to 100MB.
docker run -d \
  --name memory-storage \
  --tmpfs /tmp:rw,size=100m \
  nginx:latest
 
echo "Container 'memory-storage' started with a tmpfs mount on /tmp."
echo ""
 
# Step 2: Write some data to the tmpfs mount inside the container.
docker exec memory-storage sh -c "echo 'This data lives only in RAM.' > /tmp/volatile-info.txt"
docker exec memory-storage sh -c "ls -lh /tmp/volatile-info.txt"
# Expected Output (showing the file exists): -rw-r--r-- 1 root root 29 ... /tmp/volatile-info.txt
echo "Data written to /tmp inside the container."
echo ""
 
# Step 3: Demonstrate that the data disappears when the container stops.
echo "Stopping and removing the container..."
docker stop memory-storage > /dev/null
docker rm memory-storage > /dev/null
echo "Container removed. Now, let's try to find the data again (it won't be there)."
echo ""
 
# Step 4: Try to access the data from the previously created tmpfs mount (it's gone).
docker run --rm alpine ls /tmp/volatile-info.txt || true # This will fail, indicating the file is gone.
# Expected Output: ls: /tmp/volatile-info.txt: No such file or directory
 
echo "--- Data written to tmpfs has vanished as the container was stopped/removed. ---"

This is perfect for use cases like:

  • Storing sensitive, short-lived data: Think session tokens, temporary encryption keys, or authentication credentials that you want to ensure are not written to persistent disk at any point.
  • High-performance temporary storage: For applications that create a lot of small, temporary files during processing (e.g., compilers, transcoders), using tmpfs can offer faster read/write speeds compared to disk-backed volumes, reducing I/O overhead.
  • Application cache that doesn’t need persistence: If your application generates a cache that can be easily rebuilt and doesn’t need to survive container restarts, tmpfs prevents unnecessary disk writes and ensures a clean slate on each launch.

However, remember that tmpfs mounts reduce available RAM and should be used carefully in resource-constrained environments (e.g., on shared CI runners).
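
A quick way to sanity-check the size limit (a throwaway example; adjust the size to your needs):

# The tmpfs shows up as a 64 MB in-memory filesystem inside the container...
docker run --rm --tmpfs /scratch:rw,size=64m alpine df -h /scratch
# ...and writing past the limit fails fast instead of silently eating more RAM.
docker run --rm --tmpfs /scratch:rw,size=64m alpine sh -c "dd if=/dev/zero of=/scratch/fill bs=1M count=100"
# Expected Output: dd: ... No space left on device (after roughly 64 MB written)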

Real-World Data Persistence Patterns

Understanding individual persistence mechanisms is crucial, but knowing how to combine them into robust, production-ready patterns is where the real power lies. Let me show you some patterns I’ve seen work well in production environments:

Database Container Pattern (Separation of Concerns)

It’s common practice to separate your database’s core data from its configuration files, even within Docker. This allows you to manage config changes without risking your precious data, and simplifies backups/restores if you only need the data.

# Step 1: Create dedicated named volumes for different data types.
# 'postgres-data' will hold the actual database files (e.g., tables, indexes).
# 'postgres-config' can hold custom configuration files (e.g., postgresql.conf).
docker volume create postgres-data
docker volume create postgres-config
 
# Step 2: Run your production database container.
# Mount both volumes to their respective paths inside the PostgreSQL container.
# '--restart unless-stopped' ensures the database automatically restarts if Docker restarts.
docker run -d \
  --name production-db \
  -e POSTGRES_PASSWORD=secure-password \
  -v postgres-data:/var/lib/postgresql/data \
  -v postgres-config:/etc/postgresql \
  --restart unless-stopped \
  postgres:15
 
echo "PostgreSQL production database 'production-db' running with data and config volumes."
echo "Data is persisted in 'postgres-data' and config in 'postgres-config'."
# For a real scenario, you might then copy custom configs into postgres-config volume.
# Example: docker run --rm -v postgres-config:/config alpine cp /tmp/my-custom-pg.conf /config/

Why this pattern? This separation makes managing database upgrades, configuration changes, and data backups more flexible and less risky. You can swap out a database image or update a configuration without touching the core data volume.
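
To make that concrete, here's one way you might seed the config volume with a custom postgresql.conf from your host and point the server at it (the file names here are illustrative):

# Copy a custom config from the host into the 'postgres-config' volume.
docker run --rm \
  -v postgres-config:/config \
  -v "$(pwd)":/host:ro \
  alpine cp /host/postgresql.conf /config/postgresql.conf
# Recreate the database container, telling PostgreSQL to use that config file.
docker rm -f production-db
docker run -d \
  --name production-db \
  -e POSTGRES_PASSWORD=secure-password \
  -v postgres-data:/var/lib/postgresql/data \
  -v postgres-config:/etc/postgresql \
  --restart unless-stopped \
  postgres:15 -c config_file=/etc/postgresql/postgresql.conf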

Application + Database Pattern (Networked Services)

Most real-world applications consist of multiple services working together, typically an application server (e.g., Node.js, Python, Java) and a database. This pattern demonstrates how to set up persistent storage for both within a networked environment.

# Step 1: Create a dedicated Docker network for your application components.
# This allows the app and database containers to communicate securely by name.
docker network create app-network
 
# Step 2: Run your database container with persistent storage within the new network.
# 'db-data' volume ensures database records survive.
# '-e POSTGRES_PASSWORD' sets the initial password.
docker run -d \
  --name app-database \
  --network app-network \
  -v db-data:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:15
echo "Database container 'app-database' started on 'app-network'."
echo ""
 
# Step 3: Run your application container, also on the same network, with its own persistence.
# 'app-logs' volume ensures your application logs are persistent.
# '-p 8080:8080' exposes the application's port to your host.
# Your application code would connect to 'app-database' via its name on the network.
docker run -d \
  --name web-app \
  --network app-network \
  -v app-logs:/var/log/app \
  -p 8080:8080 \
  my-web-app:latest # Replace 'my-web-app:latest' with your actual app image
echo "Application container 'web-app' started on 'app-network', connecting to 'app-database'."
echo "Logs are persisted in 'app-logs' volume."
 
# To clean up:
# docker stop app-database web-app
# docker rm app-database web-app
# docker network rm app-network
# docker volume rm db-data app-logs

Why this pattern? This is the foundation for microservices architectures. Services are isolated, persistent, and communicate efficiently over a dedicated network, making deployments and scaling easier.

Backup and Restore Strategy for Volumes

Docker volumes contain your most valuable data. Implementing a backup and restore strategy is non-negotiable for production. This pattern shows a simple way to back up a named volume to your host filesystem using a temporary Alpine container and tar.

# Prerequisite: Ensure you have a 'postgres-data' volume from a previous example.
# If not, create one and put some data in it:
# docker volume create postgres-data
# docker run --rm -v postgres-data:/data alpine sh -c "echo 'test data' > /data/my_db.txt"
 
# Step 1: Backup a Docker volume to your host machine.
# '--rm': The container is removed immediately after it exits.
# '-v postgres-data:/source:ro': Mounts your database volume as READ-ONLY to /source.
# '-v $(pwd)/backups:/backup': Mounts a 'backups' directory on your host to /backup inside the container.
# 'alpine tar czf ...': Uses Alpine Linux to create a compressed tarball of the volume's content.
mkdir -p backups # Ensure the host backups directory exists
docker run --rm \
  -v postgres-data:/source:ro \
  -v $(pwd)/backups:/backup \
  alpine tar czf /backup/postgres-backup-$(date +%Y%m%d).tar.gz -C /source .
 
echo "Volume 'postgres-data' backed up to $(pwd)/backups/postgres-backup-$(date +%Y%m%d).tar.gz"
echo ""
 
# Step 2: (Optional) Simulate data loss by removing the original volume.
# docker volume rm postgres-data
 
# Step 3: Restore a volume from a backup.
# '--rm': Container removed after exit.
# '-v postgres-data:/target': Mounts the target volume (here we reuse the original; you could also restore into a brand-new volume).
# '-v $(pwd)/backups:/backup': Mounts your host's backup directory.
# 'alpine tar xzf ...': Extracts the tarball into the target volume.
docker run --rm \
  -v postgres-data:/target \
  -v $(pwd)/backups:/backup \
  alpine tar xzf /backup/postgres-backup-$(date +%Y%m%d).tar.gz -C /target
 
echo "Volume 'postgres-data' restored from backup."
 
# Cleanup:
# rm -rf backups
# docker volume rm postgres-data # If you created a new one for testing restore

Why this pattern? This shows a portable way to manage your Docker volume data. You can back up volumes to any host path, move them, and restore them, providing essential disaster recovery capabilities.
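
Because the backup is just a tarball on your host, you can sanity-check it with ordinary tools before you ever need it:

# List the first few entries of today's backup to confirm it actually contains data.
tar tzf backups/postgres-backup-$(date +%Y%m%d).tar.gz | head
# Check the size of all backups at a glance.
ls -lh backups/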

Common Pitfalls and How to Avoid Them

Even with the right patterns, there are common missteps that can lead to frustration and data loss. Being aware of these will save you considerable headaches.

1. Anonymous Volumes Everywhere

As discussed earlier, using anonymous volumes (by only specifying the container path with -v) is a common mistake for beginners. They are difficult to manage and prone to being orphaned.

# This creates an anonymous volume you'll never find easily or reuse.
# Docker assigns it a random, long hash as a name.
docker run -d --name risky-db -v /var/lib/postgresql/data postgres:15
 
# Do this instead: Name your volumes!
# This creates a clearly identifiable and manageable volume.
docker run -d --name proper-db -v postgres-data:/var/lib/postgresql/data postgres:15
 
# To see the anonymous volume created by the first command (look for a long hash without a friendly name):
# docker volume ls --filter dangling=true  # Lists volumes not currently referenced by any container
# docker volume ls

How to avoid: Always use named volumes for any data you care about. If your Dockerfile uses the VOLUME instruction, remember it produces an anonymous volume at run time unless you explicitly map a named volume to that path (e.g., via docker run -v or Docker Compose).
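
For context, here's roughly how that VOLUME behavior plays out (the image name below is hypothetical):

# Dockerfile snippet – VOLUME declares a mount point:
#   FROM postgres:15
#   VOLUME /var/lib/postgresql/data
#
# Running this image without -v gives you an anonymous volume at that path.
# Supplying a named volume for the same path at run time keeps it manageable:
docker run -d \
  --name proper-db-2 \
  -e POSTGRES_PASSWORD=secretpassword \
  -v postgres-data:/var/lib/postgresql/data \
  my-custom-postgres:latest   # hypothetical image built from the Dockerfile above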

2. Bind Mount Permission Issues (Host User vs. Container User)

A very common problem with bind mounts, especially in development, is permission conflicts. If your container’s process runs as root (default for many images) but tries to write to a bind-mounted host directory owned by your unprivileged host user, it might fail. Conversely, if the container runs as a specific user, it might not have permissions to a host directory owned by root or another user.

# Problem: This might fail due to the container (running as root) trying to write
# to a directory on your host owned by your local user, or vice versa.
# docker run -v $(pwd)/data:/app/data node:18 npm start
 
# Fix: Map the container's user to your host user (most common fix for development).
# $(id -u) gets your host user ID, $(id -g) gets your host group ID.
# This tells Docker to run the container's process with your host user's permissions.
docker run -it --rm \
  --user $(id -u):$(id -g) \
  -v $(pwd)/data:/app/data \
  node:18 /bin/bash # Using bash for interactive demo
 
# Inside the container, you can now create/modify files in /app/data with your host user's permissions:
# touch /app/data/test.txt
# ls -l /app/data/test.txt # Should show your host user/group

How to avoid: For bind mounts, especially in development, try to match the user ID (UID) and group ID (GID) of the process inside the container to your host user. Some application images (e.g., Node.js, Python) allow you to specify the user via environment variables or image-specific settings.
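
A quick way to diagnose a mismatch, assuming a container named 'dev-server' with './data' bind-mounted into it (both names are placeholders here):

# Which user is the containerized process actually running as?
docker exec dev-server id
# e.g. uid=0(root) gid=0(root) groups=0(root)
# Who owns the bind-mounted directory on the host? Compare the numeric IDs.
ls -ln ./data
# e.g. drwxr-xr-x 2 1000 1000 4096 ... – UID 1000 is your host user, not root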

3. Forgetting to Clean Up (Orphaned Volumes)

Volumes consume disk space, and if you’re not careful, they can accumulate quickly, especially anonymous ones or those from old, abandoned projects. This leads to wasted disk space and potential confusion.

# See all volumes on your system (including anonymous and orphaned ones).
# Look for volumes without meaningful names.
docker volume ls
 
# Clean up all unused (dangling) volumes.
# These are volumes that are not attached to any container.
docker volume prune
 
# Remove a specific named volume when you are sure you no longer need its data.
# This will delete the volume and all data within it.
docker volume rm old-unused-volume

How to avoid: Regularly use docker volume ls to review your volumes. Integrate docker volume prune into your development cleanup scripts or run it periodically. Always explicitly remove named volumes when they are no longer needed.

Docker Compose: Making Persistence Easy

By now, you’ve seen that managing multiple containers with various volumes and bind mounts using raw docker run commands can quickly become tedious and error-prone. This is exactly where Docker Compose shines.

Docker Compose allows you to define your entire multi-service application, including all its services, networks, and persistent storage, in a single, easy-to-read YAML file. It simplifies the orchestration of complex applications and makes managing data persistence significantly cleaner and more declarative.

Here’s an example docker-compose.yml file demonstrating how to define volumes and bind mounts for a typical web application setup (Nginx web server, PostgreSQL database, and a custom application):

services:
  web:
    image: nginx:latest # The Nginx web server
    ports:
      - "8080:80" # Map host port 8080 to container port 80
    volumes:
      # Named volume: 'web-content' volume (defined at the bottom) mounted to Nginx's HTML directory.
      # This is where your website's static files would live persistently.
      - web-content:/usr/share/nginx/html
      # Bind mount: Mounts your local 'nginx.conf' file into the container as read-only.
      # Ideal for external configuration management.
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
 
  database:
    image: postgres:15 # The PostgreSQL database server
    environment:
      POSTGRES_PASSWORD: secret # Sets the database password (for development, use secrets in prod!)
    volumes:
      # Named volume: 'postgres-data' volume (defined at the bottom) for persistent database files.
      - postgres-data:/var/lib/postgresql/data
      # Bind mount: Mounts a local 'init.sql' script for initial database setup as read-only.
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    
  app:
    image: my-app:latest # Your custom application image
    depends_on:
      - database # Ensures the database starts before the application
    volumes:
      # Named volume: 'app-logs' volume (defined at the bottom) for persistent application logs.
      - app-logs:/var/log/app
      # Bind mount: Mounts a local 'config' directory containing application config files as read-only.
      - ./config:/app/config:ro
 
# This section declares the named volumes used by the services above.
# Docker Compose will automatically create these volumes if they don't exist.
volumes:
  web-content:
  postgres-data:
  app-logs:

Why this is so much cleaner:

Instead of juggling multiple docker run commands with a dozen volume flags, all your persistence configurations are centralized and clearly readable in one docker-compose.yml file. You can then start, stop, and manage your entire application stack, including all its volumes, with simple commands like docker compose up -d and docker compose down. This significantly reduces complexity and improves collaboration in development and deployment.
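
The day-to-day commands are correspondingly simple (run them from the directory containing the docker-compose.yml):

# Start the whole stack; named volumes are created automatically on first run.
docker compose up -d
# Stop and remove the containers and network – the named volumes are kept.
docker compose down
# Tear everything down INCLUDING the named volumes (this deletes your data!).
docker compose down -v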

Monitoring and Debugging Storage

When things go wrong (and they will), here are your debugging tools:

# 1. See overall Docker disk space usage, including volumes.
# This gives you a summary of how much space containers, images, and volumes are consuming.
docker system df   # add -v for a per-image, per-container, per-volume breakdown
# Expected Output Example (truncated):
# TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
# Images          X         X         X         X
# Containers      X         X         X         X
# Local Volumes   X         X         X         X
# Build Cache     X         X         X         X
 
# 2. Inspect a specific container's mounts.
# This command provides detailed information about all mounts (volumes, bind mounts, tmpfs)
# associated with a specific container, including their source (host path) and destination (container path).
# The 'jq' part filters the output to only show the 'Mounts' array.
docker inspect <container-name-or-id> | jq '.[0].Mounts'
# Replace <container-name-or-id> with the actual name or ID of your container.
# Expected Output Example:
# [
#   {
#     "Type": "volume",
#     "Name": "my-precious-data",
#     "Source": "/var/lib/docker/volumes/my-precious-data/_data",
#     "Destination": "/usr/share/nginx/html",
#     ...
#   },
#   {
#     "Type": "bind",
#     "Source": "/host/website",
#     "Destination": "/usr/share/nginx/html",
#     ...
#   }
# ]
 
# 3. Check detailed information about a specific volume.
# This command is invaluable for understanding where a named volume is stored on your host,
# its driver, labels, and other metadata managed by Docker.
docker volume inspect <volume-name>
# Replace <volume-name> with the actual name of your volume (e.g., 'postgres-data').
# Expected Output Example (truncated):
# [
#   {
#     "Driver": "local",
#     "Labels": {},
#     "Mountpoint": "/var/lib/docker/volumes/my-precious-data/_data", # This is the crucial part!
#     "Name": "my-precious-data",
#     "Options": {},
#     "Scope": "local"
#   }
# ]
 
# 4. See what's taking up space inside a specific volume.
# You can use a temporary Alpine container to run standard Linux commands like 'du' (disk usage)
# directly inside a volume. This helps pinpoint large files or directories within your volume.
docker run --rm -v <volume-name>:/data alpine sh -c 'du -sh /data/*'
# Replace <volume-name> with the actual name of your volume (e.g., 'postgres-data').
# Expected Output Example:
# 16K    /data/lost+found
# 24M    /data/base
# 5.0M   /data/pg_wal
# 48K    /data/pg_xact
# ...

The Bottom Line

Data persistence in Docker isn’t magic, but it does require intentional design. Here’s my TL;DR:

  • 🗂 Use volumes for persistent app data
  • 🛠 Use bind mounts for development and host integration
  • ⚡ Use tmpfs for fast, ephemeral data
  • 📛 Always name your volumes
  • 💾 Set up a backup/restore strategy
  • 📦 Use Docker Compose to manage it all cleanly

The ephemeral nature of containers isn’t a bug, it’s what makes them powerful. But when you need persistence, Docker gives you the tools to do it right. Use them wisely, and you’ll never lose another byte of precious data to the container void.

Next up, we’ll dive into building your own Docker images, because running other people’s containers is just the beginning of your Docker journey.


Got questions about data persistence? Hit me up! And if you’ve lost data because you forgot to use volumes… well, we’ve all been there. Consider it a rite of passage.