Learn the ins, outs, and limits of Docker's native technology for integrating containers with local file systems.

Docker containers are meant to be immutable, meaning the code and data they hold never change. Immutability is useful when you want to be sure the code running in production is the same as the code that passed QA testing; it's not so useful when you need to write data and persist it across application lifetimes. Most of the time, you can address the need for data persistence by using an external database. But sometimes an application in a container just needs to use a local file system, or something that looks like a local file system.

Enter Docker volumes, Docker's native mechanism for dealing with local storage. A Docker volume is a convenient way to allow containerized apps to write and retrieve data through a local file system or file system-like interface. But Docker volumes are not a panacea for managing state. We must use them wisely.

How Docker volumes work

Docker volumes provide a way to map a file path inside a container, called a mount point, to a file system path or file-like object outside the container. Anything written to the Docker volume is stored externally, so it persists across the lifetime of one or more containers. It is also possible for multiple containers to access the same volume at once (with some caveats).

Docker volumes use a volume driver to control where data is stored. One example, Blockbridge, provides direct access to iSCSI targets through the volume driver layer. A more ambitious example is REX-Ray, a storage engine that works with a variety of storage vendors and standards. REX-Ray provides connectivity via Docker's volume plug-in system or the more general Container Storage Interface spec.

Creating Docker volumes manually

The most basic way to create a volume is to include the -v or --volume flag and a mount point when you start a container:

$ docker run -P --name websvc -v /websvcdata myorg/websvc python app.py

This creates an "anonymous" volume with the mount point /websvcdata, with the data stored in a randomly named directory managed by the Docker process. You can accomplish the same thing in a Dockerfile by including a VOLUME instruction that describes the location of a volume:

FROM ubuntu:latest
VOLUME /websvcdata

This is a good way to create a quick-and-dirty dumping ground for data in the course of a given container session. But it's not as useful for persisting state across container sessions, since the name of the volume isn't known ahead of time and the volume can't be reused efficiently.

Using the Docker volume API

A better solution to the problem is to use Docker's volume API to create named volumes. Named volumes can be easily attached to one or more containers, and thus reused a good deal more easily.

$ docker volume create websvcdata

This creates a Docker volume named websvcdata. However, the volume doesn't yet have a mount point in a container, so no container can access it by default. To create a mount point, you'd launch the container with a command like this:

$ docker run -P --name websvc -v websvcdata:/websvcdata myorg/websvc python app.py

This command is the same as the previous docker run example, but instead of the volume being created with an anonymous name, it's created with the name websvcdata on the host. You can run docker inspect on the container and read the "Mounts" section in the resulting dump to determine whether the mounts are as you intended.
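For example, a quick way to see just the mount information is to filter the docker inspect output with a Go template. This is a minimal sketch that assumes the websvc container from the example above is running:

$ docker inspect --format '{{ json .Mounts }}' websvc

The command prints a JSON array describing each mount: its type, the volume name, the source path on the host, and the destination path inside the container.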
Note that you can't create a named volume with a Dockerfile, because names for Docker volumes must be specified at runtime. This is intentional, since Dockerfiles cannot assume a given host and its volume paths exist; they're meant to be run on any system with any set of volume paths. A volume specified in a Dockerfile is simply created in a location, chosen by Docker, where the data it stores will persist for the life of the container.

If you run docker volume create with flags specific to the Docker storage driver, you can dictate many options for the volume's creation. With the local file system driver, for instance, you can describe where to place the volume, what device or file system to use (such as an NFS share or a temporary file system), and many other controls. This way, you can place the volume on the best device for the particular use case.
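As a rough sketch of what those flags look like with the local driver, the following creates one volume backed by an NFS export and another backed by a temporary file system. The NFS server address, export path, and volume names here are placeholders, not values from the examples above:

$ docker volume create --driver local \
    --opt type=nfs \
    --opt o=addr=10.0.0.10,rw \
    --opt device=:/exports/websvcdata \
    websvcdata-nfs

$ docker volume create --driver local \
    --opt type=tmpfs \
    --opt device=tmpfs \
    --opt o=size=256m \
    websvcdata-tmp

Either volume can then be mounted like any other named volume, e.g. -v websvcdata-nfs:/websvcdata.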
A useful tip: If you create a volume and bind it to a path inside the base image that already contains data, the data inside the base image is copied to the volume at bind time. This is a handy way to pre-populate a volume with a set of data that you want to use as a starting point. (Note that cleaning up the populated volume is your responsibility.)

Sharing Docker volumes between containers

If you want more than one container to attach to the same Docker volume, all you have to do is create the volume and attach it to multiple containers:

$ docker run -ti --name instance1 -v DataVol1:/datavol1 ubuntu
$ docker run -ti --name instance2 --volumes-from instance1 ubuntu
$ docker run -ti --name instance3 --volumes-from instance1:ro ubuntu

This creates three containers, instance1 through instance3, and attaches DataVol1 to each of them. The instance2 and instance3 containers inherit the volume mount from instance1 via --volumes-from, which takes a container name rather than a volume name; instance3 has DataVol1 mounted as read-only, as per the :ro suffix. Be warned that Docker does not automatically mediate conflicts between containers that share the same volume. That's up to your application. (More on this below.)

Removing Docker volumes

Volumes are not automatically removed from disk when a container is removed. This is by design, because you don't want to remove a volume that could conceivably be used by another, as-yet-unused container in the future. That means volume unmounting and on-disk cleanup are your responsibility.

Docker provides built-in tools for facilitating volume cleanup. The docker volume command has a subcommand, docker volume prune, that removes all volumes not in use by at least one container on the system. You can also narrow the scope of the cleanup: docker volume rm removes specific named volumes, and passing the -v flag to docker rm removes the anonymous volumes associated with a given container.

The limits of Docker volumes

Docker volumes aren't a cure-all for local persistence. Because of the way containers interact with local file systems, Docker volumes can create more problems than they solve.

One key limitation is that Docker does not handle file locking in volumes used by multiple containers. That becomes the responsibility of whatever application you're using. If you're not confident that the application in question knows how to write to a shared file system, you could end up with file corruption in that volume. One possible solution is to use an object storage server, such as the Minio project, instead of the local file system.

Another issue with Docker volumes is that they can make application portability more difficult. Every machine's storage topology is different. If you create volumes based on assumptions about where things are in the system, you may find those assumptions don't hold when you try to deploy the same containers on a system you didn't build yourself. This is less problematic if you're using containers only on systems where you have rigorous control over the topology, such as an internal private cluster, but it can come back to bite you if you decide to re-architect things later.

Finally, avoid using volumes to store stateful data that is better handled through another native mechanism in Docker. Application secrets, for instance, should be handled by Docker's own secrets system or a third-party product like HashiCorp's Vault, and never by way of volumes or writable container image layers.
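As a brief illustration of that alternative, Docker's built-in secrets system (which requires the engine to be running in Swarm mode) stores a secret in the cluster rather than in a volume and exposes it to a service as a file under /run/secrets. The secret value and names below are placeholders, and myorg/websvc is the hypothetical image from the earlier examples:

$ echo "s3cr3t-passw0rd" | docker secret create db_password -
$ docker service create --name websvc --secret db_password myorg/websvc

Inside the service's containers, the application reads the value from /run/secrets/db_password instead of from a volume, an image layer, or an environment variable.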