Understanding Volumes in Docker

This post was updated on 6 Jan 2017 to cover new versions of Docker.

It’s clear from looking at the questions asked on the Docker IRC channel (#docker on Freenode), Slack and Stack Overflow that there’s a lot of confusion over how volumes work in Docker. In this post, I’ll try to explain how volumes work and present some best practices. Whilst this post is primarily aimed at Docker users with little to no knowledge of volumes, even experienced users are likely to learn something as there are some subtleties that many people aren’t aware of.

In order to understand what a Docker volume is, we first need to be clear about how the filesystem normally works in Docker. Docker images are stored as a series of read-only layers. When we start a container, Docker takes the read-only image and adds a read-write layer on top. If the running container modifies an existing file, the file is copied out of the underlying read-only layer and into the top-most read-write layer where the changes are applied. The version in the read-write layer hides the underlying file but does not destroy it; it still exists in the underlying layer. When a Docker container is deleted, relaunching the image will start a fresh container without any of the changes made in the previously running container; those changes are lost. Docker calls this combination of read-only layers with a read-write layer on top a Union File System.

In order to be able to save (persist) data and also to share data between containers, Docker came up with the concept of volumes. Quite simply, volumes are directories (or files) that are outside of the default Union File System and exist as normal directories and files on the host filesystem.

There are several ways to initialise volumes, with some subtle differences that are important to understand. The most direct way is to declare a volume at run-time with the -v flag:
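A minimal sketch of such a command, assuming the debian image and a container named container-test (the name this post refers to later). The hostname CONTAINER is only an illustrative choice to make the prompt easy to recognise:

```shell
# Names are illustrative; container-test matches the name used later in this post.
container_name="container-test"
volume_path="/data"

# Start an interactive container with an anonymous volume mounted at /data.
# -h sets the hostname shown in the container's prompt.
docker run -it --name "$container_name" -h CONTAINER -v "$volume_path" debian /bin/bash
```

Once the shell opens, running ls /data inside the container shows the (initially empty) volume directory.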

This will make the directory /data inside the container live outside the Union File System and directly accessible on the host. Any files that the image held inside the /data directory will be copied into the volume. We can find out where the volume lives on the host by using the docker inspect command on the host (open a new terminal and leave the previous container running if you’re following along):
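One way to do this, assuming the container is named container-test and that jq is installed on the host (jq only pretty-prints the JSON and can be dropped):

```shell
# Print only the Mounts section of the container's metadata as JSON.
container_name="container-test"
docker inspect -f '{{json .Mounts}}' "$container_name" | jq .
```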

And you should see something like:
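The snippet below only illustrates the general shape of that output; it is not captured from a real run. The placeholder <volume-id> stands in for the long generated volume name, the list of fields is abridged, and the exact fields vary between Docker versions:

```shell
# Illustrative, abridged shape of one entry in the Mounts list.
# <volume-id> is a placeholder, not a real volume name.
sample_output='[
  {
    "Name": "<volume-id>",
    "Source": "/var/lib/docker/volumes/<volume-id>/_data",
    "Destination": "/data",
    "Driver": "local",
    "RW": true
  }
]'
printf '%s\n' "$sample_output"
```

The two fields that matter here are Source (the directory on the host) and Destination (the path inside the container).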

I’ve used a Go template to pull out the data we’re interested in and piped the output into jq for pretty printing. Now that we know the name of the volume, we can also get similar information from the docker volume inspect command:
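A sketch of that lookup, assuming a container named container-test whose first mount is the volume of interest. Rather than copying the long generated name by hand, it reads the name with another Go template:

```shell
# Read the generated name of the container's first mount...
volume_name=$(docker inspect -f '{{(index .Mounts 0).Name}}' container-test)

# ...and ask Docker about that volume directly.
docker volume inspect "$volume_name"
```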

In both cases, the output tells us that Docker has mounted /data inside the container as a directory somewhere under /var/lib/docker. Let’s add a file to the directory from the host:
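A sketch of this step, assuming a Linux host where the Docker daemon runs locally and a container named container-test. The file name test-file is arbitrary, and sudo is used because /var/lib/docker is normally only writable by root:

```shell
# Resolve the host directory that backs the container's first mount.
host_dir=$(docker inspect -f '{{(index .Mounts 0).Source}}' container-test)

# Only touch the file if the lookup succeeded, so we never write to "/test-file".
if [ -n "$host_dir" ]; then
  sudo touch "$host_dir/test-file"
fi
```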

Then switch back to our container and have a look:
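Inside the container’s own shell this is just ls /data. The sketch below performs the same check from a host terminal with docker exec, assuming the container is named container-test and is still running:

```shell
# List the volume directory as seen from inside the running container.
listing=$(docker exec container-test ls /data)
printf '%s\n' "$listing"
```

If a file was added from the host beforehand, it should appear in this listing.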

Changes are reflected immediately as the directory on the container is simply a mount of the directory on the host.

The exact same effect can be achieved by using a VOLUME instruction in a Dockerfile:
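A minimal sketch, assuming the debian base image. The directory and image tag volume-demo are hypothetical names chosen for this example:

```shell
# Write a two-line Dockerfile into a scratch directory.
mkdir -p volume-demo
cat > volume-demo/Dockerfile <<'EOF'
FROM debian
VOLUME /data
EOF

# Build the image; containers started from it get an anonymous volume at /data.
docker build -t volume-demo volume-demo
```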

We can also create volumes using the docker volume create command:
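For instance, using the volume name my-vol. The --name flag matches the Docker versions this post covers; newer versions also accept the name as a plain argument:

```shell
# Create a named volume managed by Docker.
volume_name="my-vol"
docker volume create --name "$volume_name"
```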

Which we can then attach to a container at run-time e.g:
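A sketch, assuming the debian image and a volume called my-vol. The ls command is only there to give the container something to do:

```shell
# Mount the named volume at /data and list its contents.
# --rm removes the container when ls exits; the named volume itself is kept.
volume_name="my-vol"
docker run --rm -v "$volume_name":/data debian ls -la /data
```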

This example will mount the my-vol volume at /data inside the container.

There is another major use case for volumes that can only be accomplished through the -v flag — mounting a specific directory from the host into a container. For example:
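A sketch of such a command, assuming the debian image. The host path is the one this post uses; substitute a directory that exists on your own machine:

```shell
# Mount a specific host directory at /data inside the container and list it.
host_dir="/home/adrian/data"
docker run --rm -v "$host_dir":/data debian ls /data
```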

Will mount the directory /home/adrian/data on the host as /data inside the container. Any files already existing in the /home/adrian/data directory will be available inside the container. This is very useful for sharing files between the host and the container, for example mounting source code to be compiled. The host directory for a volume cannot be specified from a Dockerfile, in order to preserve portability (the host directory may not be available on all systems). When this form of the -v argument is used any files in the image under the directory are not copied into the volume. Such volumes are not “managed” by Docker as per the previous examples — they will not appear in the output of docker volume ls and will never be deleted by the Docker daemon.

Sharing Data

To give another container access to a container’s volumes, we can provide the --volumes-from argument to docker run. For example:
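A sketch, assuming an existing container named container-test that has a volume at /data. The hostname NEWCONTAINER is an illustrative choice:

```shell
# Start a second container that mounts every volume of the first one.
source_container="container-test"
docker run -it -h NEWCONTAINER --volumes-from "$source_container" debian /bin/bash
```

Inside the new container, ls /data shows the same files as in the original container.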

This works whether container-test is running or not. A volume will never be deleted as long as a container is linked to it. We could also have mounted the volume by giving its name to the -v flag i.e:
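A sketch of that alternative, assuming a container named container-test whose first mount is the volume to share. The target directory /data-new is a hypothetical path:

```shell
# Look up the volume's name from the original container.
volume_name=$(docker inspect -f '{{(index .Mounts 0).Name}}' container-test)

# Mount the same volume by name, here at a different path.
if [ -n "$volume_name" ]; then
  docker run -it -h NEWCONTAINER -v "$volume_name":/data-new debian /bin/bash
fi
```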

Note that using this syntax allows us to mount the volume to a different directory inside the container.

Data Containers

Prior to the introduction of the docker volume commands, it was common to use “data containers” for storing persistent and shared data such as databases or configuration data. This approach meant that the container essentially became a “namespace” for the data: a handle for managing it and sharing it with other containers. However, in modern versions of Docker, this approach should never be used; simply create named volumes using docker volume create --name instead.

Permissions and Ownership

Often you will need to set the permissions and ownership on a volume, or initialise the volume with some default data or configuration files. A key point to be aware of here is that anything after the VOLUME instruction in a Dockerfile will not be able to make changes to that volume e.g:
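A sketch of a Dockerfile with this problem, assuming the debian base image. The user foo, the file /data/x and the directory name volume-perms-broken are hypothetical. The behaviour is the one described in this post; newer builders may treat these steps differently:

```shell
# Write a Dockerfile that modifies /data AFTER declaring it as a volume.
mkdir -p volume-perms-broken
cat > volume-perms-broken/Dockerfile <<'EOF'
FROM debian
RUN useradd foo
VOLUME /data
RUN touch /data/x
RUN chown -R foo:foo /data
EOF

docker build -t volume-perms-broken volume-perms-broken
```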

Will not work as expected. We want the touch command to run in the image’s filesystem but it is actually running in the volume of a temporary container. The following will work:
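A sketch of the working order, with the same hypothetical user foo and file /data/x. The only change is that the files and ownership are set up before the VOLUME instruction:

```shell
# Write a Dockerfile that prepares /data BEFORE declaring it as a volume.
mkdir -p volume-perms-ok
cat > volume-perms-ok/Dockerfile <<'EOF'
FROM debian
RUN useradd foo
RUN mkdir /data && touch /data/x
RUN chown -R foo:foo /data
VOLUME /data
EOF

docker build -t volume-perms-ok volume-perms-ok
```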

Docker is clever enough to copy any files that exist in the image under the volume mount into the volume and set the ownership correctly. This won’t happen if you specify a host directory for the volume (so that host files aren’t accidentally overwritten).

If you can’t set permissions and ownership in a RUN command, you will have to do so using a CMD or ENTRYPOINT script that runs when the container is started.
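A sketch of such an entrypoint script, assuming the container starts as root, that a user named foo exists in the image, and that the gosu tool is installed there (all three are assumptions for this example). It fixes ownership and then drops privileges before running the container’s command:

```shell
# Write a small entrypoint script to be copied into the image.
cat > docker-entrypoint.sh <<'EOF'
#!/bin/sh
set -e
# Runs as root at container start: take ownership of the volume...
chown -R foo:foo /data
# ...then switch to the unprivileged user and run the container's command.
exec gosu foo "$@"
EOF
chmod +x docker-entrypoint.sh
```

The script would then be copied into the image and named in an ENTRYPOINT instruction, with the application command supplied via CMD.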

Deleting Volumes

Chances are, if you’ve been using docker rm to delete your containers, you probably have lots of orphan volumes lying about taking up space.

Volumes are only automatically deleted if the parent container is removed with the docker rm -v command (the -v is essential) or the --rm flag was provided to docker run. Even then, a volume will only be deleted if no other container links to it. Volumes linked to user specified host directories are never deleted by docker.

To have a look at the volumes in your system use docker volume ls:
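For example, the following lists every volume and then only those that no container references (the dangling filter):

```shell
# All volumes known to the Docker daemon.
docker volume ls

# Only volumes not referenced by any container.
dangling_filter="dangling=true"
docker volume ls -f "$dangling_filter"
```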

To delete all volumes not in use, try:
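One possible approach, sketched here with the dangling filter so that only volumes no container references are selected. This permanently deletes data, so check the list first:

```shell
# Collect the names of volumes that no container references.
unused=$(docker volume ls -qf dangling=true)

# Remove them, but only if there is something to remove.
# $unused is deliberately unquoted so each name becomes its own argument.
if [ -n "$unused" ]; then
  docker volume rm $unused
fi
```

Docker 1.13 and later also provide docker volume prune, which does the same job with a confirmation prompt.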

Conclusion

There is quite a lot to volumes, but once you understand the underlying philosophy it should be fairly intuitive. Using volumes effectively is essential to an efficient Docker workflow, so it’s worth playing around with the commands until you understand how they work.

Adrian Mouat

Adrian Mouat is Chief Scientist at Container Solutions and the author of the O'Reilly book "Using Docker". He has been a professional software developer for over 10 years, working on a wide range of projects from small webapps to large data mining platforms.

46 Comments

  1. Thank you for writing this post. I looked at many posts, but none were as detailed and complete as this. This was extremely helpful.

  2. This post is crystal clear and helpful, thanks a lot.
    The “Backup” chapter saved me a lot of time…

  3. Great piece but I was wondering you could expand on one of the statements you made:

    “When this form of the -v argument is used any files in the image under the directory are not copied into the volume.”

    Docker may have changed since this was written but when I was using the -v HOST-DIR:CONTAINER-DIR the files still seemed to be copied both to and from the image. Files that were in HOST-DIR still were copied into CONTAINER-DIR and vice-versa.

    Could you please elaborate on what is meant by that sentence?

    • Hi Adam,

      This sentence is referring to files that exist in the image at the location where the volume is later declared. For example, if you build an image from the following Dockerfile:

      FROM debian

      RUN mkdir data && echo "bla" > /data/file
      VOLUME /data

      The file “/data/file” will be copied into the volume, unless the volume is mounted to a specified directory at run-time e.g: “-v $(pwd)/mydata:/data”.

      Note that no copying goes on with volumes normally; the same directory is being used on the host and container.

  4. Oh finally there was an article that explains everything! Thanks a lot. Highly recommended for those who cannot wrap their heads around data volumes and data persistence with docker.

  5. Hi Adrian,

    I have just started using docker and I am still a beginner. I need to create two applications, each launched in their own container. These two applications will at some point write into a SQLite database. I am not quite sure how to deal with concurrent transactions. Per my understanding, the two containers are essentially two processes trying to write into the same database at the same instant which may cause contention. Any suggestions?

    Thanks,
    -Priyanka

    • Hi Priyanka,

      I’m not sure I understand. There is a lot of work in databases to handle concurrent transactions correctly, and it shouldn’t make any difference whether you connect via a container or normal process. It should all just work…

      Adrian.

    • Hi, Priyanka.

      The way SQLite3 handles concurrency is up to the driver you use to connect, I guess. Besides that, I think that you will quickly run into issues because that database is pretty simplistic so everything (table definitions and data) is stored in one single file, so I’m not sure if that is what you are looking for. For development it might be fine but for production perhaps you need something that can properly take simultaneous/concurrent connections.

  6. Hi, I had a data container that has accidentally been deleted.

    Other containers use it via the --volumes-from flag, but now that isn’t working as the container they reference has been deleted.

    The volume is still there, so is there any way to re run that data container and point it back to the original volume on disk and get things back on track?

    • I guess you can just run the container again with the volume set up manually e.g:

      docker run -v /path/to/volume:/path/in/container my_data_container_image echo "dc"

    • In the end I simply re-ran the original container so it created a new volume, and then just copied the files from the old volume to the new one.

    • The answers don’t help me with the question of “how do I not lose my data”. If I’ve inserted 1M rows into Postgres and for some reason the storage container restarts, I don’t want to lose my data!

      It seems to me it’s critical that I be able to start a container and have it reference an existing volume, but in now 2 hours of Googling I can’t find that answer.

    • Paul, things are a bit different in Docker nowadays. The Docker volume subcommand has effectively removed the need for data containers, see https://docs.docker.com/engine/reference/commandline/volume_create/ for usage.

      Volumes are just pointers to directories on the host. To this end, to (re)attach a volume to a container you just need to use the -v syntax to attach the directory (or volume name if you’re using the new volumes stuff). If a container restarts, it won’t “lose the data” and it won’t delete it (unless started with --rm).

    • Hi Adrian.

      Excellent article – many thanks!

      This is just what I was looking for – a way to give sensible names to volumes with the new docker volume command:

      root@vagrant-ubuntu-trusty-64:/var/lib/docker# docker version
      Client:
      Version: 1.11.2
      API version: 1.23
      Go version: go1.5.4
      Git commit: b9f10c9
      Built: Wed Jun 1 21:47:50 2016
      OS/Arch: linux/amd64

      Server:
      Version: 1.11.2
      API version: 1.23
      Go version: go1.5.4
      Git commit: b9f10c9
      Built: Wed Jun 1 21:47:50 2016
      OS/Arch: linux/amd64

      root@vagrant-ubuntu-trusty-64:/var/lib/docker# docker volume ls
      DRIVER VOLUME NAME
      local 5cc067532842712dc73ff3c2a47f4524038164b486945a7a8b3c7369baf1b548
      local c396fb2966121a1186618443124f896992a76ea192b0177a37dde990cb14901a
      root@vagrant-ubuntu-trusty-64:/var/lib/docker#

      root@vagrant-ubuntu-trusty-64:/var/lib/docker# docker volume create --name my-new-volume
      my-new-volume
      root@vagrant-ubuntu-trusty-64:/var/lib/docker#

      root@vagrant-ubuntu-trusty-64:/var/lib/docker# docker volume ls
      DRIVER VOLUME NAME
      local 5cc067532842712dc73ff3c2a47f4524038164b486945a7a8b3c7369baf1b548
      local c396fb2966121a1186618443124f896992a76ea192b0177a37dde990cb14901a
      local my-new-volume

      Regards,

      Keith

  7. Hi Adrian,
    Thanks for the post! It finally clicked somewhere in my brain why chown doesn’t work from Dockerfiles if misplaced.

    I am struggling with another issue on OS X however: what if “chown -R user /mounted/folder” is done from the ENTRYPOINT script? If you need an example, have a look at https://github.com/docker-library/ghost/
    No matter what I try, I can’t overcome the permissions error thrown when the entrypoint script is executing. It copies files to the mounted folder (under /Users), but cannot set owner.

    • I assume that’s to do with how the filesystem is mounted into the boot2docker VM. I’m not sure what the best solution is.

    • Hi Alexander,

      Working with Docker on OSX is quite confusing. You have 3 layers of operating system, each with its own users and userid numbers. You have your OSX system, the virtual machine (usually VirtualBox), and then the OS inside the docker image. It looks like your docker image has a user named “user”, and you are trying to change permissions on the VM-mounted /Users directory, which is owned by your OSX user (e.g. “alex”).

      To make life easier, what I did in my entrypoint script is to change my app user’s userid number to match my OSX userid number:

      echo "Updating app user id to match mine"
      cat /etc/passwd | sed s/9999/1009/g > /tmp/pwupdate
      mv /tmp/pwupdate /etc/passwd
      chmod 444 /etc/passwd

      In the example above, the user is “app” and its user id number is 9999, and my OSX user has id 1009. You’ll have to look in your image to see what id the user “user” has. Run “id ” to get your uid numbers.

    • … actually I looked at the Ghost image, and they create user “user” in the Dockerfile. You could just edit that and specify the userid number with the ‘useradd’ command.

    • Hi again ;-). What I said above is a bit wrong, at least with the latest version of Docker Toolbox (1.10). The user in your docker container needs to have the uid of the user in the VM. By default that user is “docker” with a uid of 1000.

      So in your case, you can specify the uid of 1000 by adding “--uid 1000” to the “useradd…” line in the Dockerfile, or update the uid in the entrypoint script. A better way to do that than what I said above is to use the usermod command: “usermod -u 1000 user”

      Cheers,
      Brent

    • Hi all.

      I’m using Mac OS X 10.9.4 and running a vagrant virtualbox image of ubuntu 14.01 LTS, which runs the docker daemon. So I work directly in the vagrant VM and everything works OK for me.

      HTH

      Keith

  8. Great post, Adrian! Thank you very much.

    I have some questions about this advice:

    * Don’t use a “minimal image” such as busybox or scratch for the data-container. Just use the database image itself. You already have the image, so it isn’t taking up any additional space and it also allows the volume to be seeded with data from image.

    If the version of the database image changes, i.e., goes from postgres:9.4 to postgres:9.5, and I update my database image to the latest version, will the data volume still rely on the older postgres:9.4 image?

    Also, I don’t understand “it also allows the volume to be seeded with data from the image”. Are you here assuming that prior to creating the data-container, we are using the database image for data storage? In my case, because I didn’t yet understand data containers, I mounted a host filesystem directory inside the database image, for data persistence. So in this case, the seeding advantage doesn’t apply — or am I missing something?

    Thanks again!

    • Regarding upgrading DBs, it depends on the update. Note that updating a DB is a fraught process at the best of times. However, all that you will be using from the data container is the data at a certain path, so unless the data format or paths have changed, there shouldn’t be any issues. Otherwise you will have to export and re-import your data. You will have the same issues if you use a minimal container; the data will still have originally been created with a previous version of the DB.

      With regard to seeding data, what I’m referring to is that Docker will copy files into a volume if they existed in the Dockerfile prior to the VOLUME declaration. This means you can do things like copy default conf files or data into the volume. This won’t happen if you explicitly supply a host directory.

  9. Hi Adrian;
    Thank you so much for this blog on Docker~ It really new and help full.
    I would like to know something on docker- Suppose I have one Oracle Database Images which have 50G in size. now on the same host i configured docker and created file system “/var/lib/docker” with 200GB for contain docker image and other pre-requisite for this. Now I have one new container using the latest images and my oracle DB is running very fine there. here as per your blog if I use -v and -rw option . I mean to say will get disk from storage and attach the same on server or can use this with -v option. then the changes which we will do on container ‘ll go to new volume (-v) and if we commit then it will sync with images (Inside /var/lib/docker) or its will discard if we drop container.?

    Please help me to understand .
    Thanks.
    – Raj Gupta

    • Hi Raj,

      I’m sorry, I don’t fully understand the question. When using volumes, the storage will be from wherever the directory is mounted on the host. Changes to volumes are *not* committed to images.

      I hope that helps.

  10. Great article. We’re undergoing corp training and that trainer doesn’t able to explain what Volumes are. So in the class I’ve to google and found this article. it really helped me to understand and explained to all the class members 🙂

  11. Hi Adrian,

    Thanks for the helpful article. Regarding the function of the “--volumes-from” flag in a command such as “docker run -it --name container2 --volumes-from container1 ubuntu /bin/bash” , is there a way to specify which specific volume(s) from container1 you wish to mount in container2 if there are multiple volumes yet you do not wish to mount all of them? Furthermore, can you specify different mount points for the volumes in container2 compared to where they were mounted in container1?

  12. Hi Adrian,
    Thanks for the helpful article. It helped clear a lot of questions. I have one question though. From the post (and comments) I could understand that, irrespective of the command used to create the volume, any changes to a local directory reflect immediately on the container directory. But what about the changes to container directory ? Are they reflected immediately in the local directory or when the container shuts down ?

    Thanks,
    Puneet

    • I feel I have failed slightly in the article if you have this question. Yes, changes made in a container to a volume are reflected immediately. When using a volume the files really only exist in place – the container does not have its own separate copy.

  13. Hi,

    Great article. Helped me get up to speed quickly. One thing you don’t really cover is what to do about permissions for the mount an existing directory (or newly created volume) from the host case. e.g. docker run -v /home/adrian/data:/data debian ls /data. In the majority of Docker usage it seems that people just use the root user inside the container and thus the permissions on the volume being mounted into the container will (if also running as root) reflect this. That makes sense.

    In my case, the docker image has another user setup which it uses to run a service. The Dockerfile has, among other things:


    USER myuser
    VOLUME ["/MYSTUFF"]

    It seems that in this case, when the MYSTUFF volume is created, it is owned by myuser. However, if I then need to add another volume (not specified in the Dockerfile) or replace this mount with one which mounts from a specific location in the host, then I am out of luck as it will be owned by root and thus not accessible to myuser inside the container at runtime. I can’t figure out what the best solution to this is as effectively it requires that the host has implicit knowledge about the container. i.e. That I should chown the directory that I am sharing (if created with docker volume then it’ll be a _data directory) to whatever uid myuser has inside the container. I also considered that perhaps the responsibility for this should lie within the container, but if that is executing as myuser, then it wouldn’t have permission to change the permissions on that directory, so that wont work either.

    In this specific case, there is only one user in the container, so I can assume that the uid for myuser is 1000, but I can imagine scenarios where this might not be the case. I would also have expected docker volume create to have taken an optional uid to match up with what is happening in the Dockerfile created volumes.

    A final extra piece of information is that I am making use of ECS on AWS, which will manage the volumes for me, however, this then gives me no way to do a chown on the volumes before they get mounted. The only way I can think of to do this is to have an additional container as part of my stack which runs as root and makes this change to the permissions while my main application blocks (in my run script) until the permissions are correct.

    I feel like I’m between a rock and a hard place on this. Any idea what the best practice solution is?

    Thanks!

    • I think this is discussed in some of the other comments.

      The most common solution, and the one used by the official images (e.g redis), is to start the container as root and in the entrypoint script perform a chown and then use sudo (or gosu) to change to a less privileged user before starting the application. You can see an example of this in the redis entrypoint script: https://github.com/docker-library/redis/blob/master/3.2/docker-entrypoint.sh

      Another solution is to set the privileges on the volume so that they can be openly read/written.

  14. tried your sample, and verified that docker is clever enough to copy the files to the volume you gave it
    but only to those *normally* case

    is it possible to let docker to copy files to the volume when you were using --volumes-from ?

  15. Hi Adrian,

    Nice article indeed. By the way, why don’t you talk about docker-compose.yml file with volumes in it ? This approach offers other capabilities of declarations, right ?
    I would be interested to hear from you what you think about this yml file.

    Thanks a lot.

    • Compose really just uses the underlying features mentioned above. It does do some work to try to retain volumes between runs however. I don’t want to extend this article for fear of causing confusion. First understand volumes, then understand how Compose uses them.
