Understanding Volumes in Docker

It’s clear from the questions asked on the Docker IRC channel (#docker on Freenode) and Stack Overflow that there’s a lot of confusion over how volumes work in Docker. In this post I’ll try to explain how volumes work and present some best practices. Whilst this post is primarily aimed at Docker users with little to no knowledge of volumes, even experienced users are likely to learn something, as there are several subtleties regarding volumes that many people aren’t aware of.

In order to understand what a Docker volume is, we first need to be clear about how the filesystem normally works in Docker. Docker images are stored as a series of read-only layers. When we start a container, Docker takes the read-only image and adds a read-write layer on top. If the running container modifies an existing file, the file is copied out of the underlying read-only layer and into the top-most read-write layer where the changes are applied. The version in the read-write layer hides the underlying file but does not destroy it; it still exists in the underlying image. When a Docker container is deleted, relaunching the image will start a fresh container without any of the changes made in the previously running container; those changes are lost. Docker calls this combination of read-only layers with a read-write layer on top a Union File System.

In order to be able to save (persist) data and share data between containers, Docker came up with the concept of volumes. Quite simply, volumes are directories (or files) that are outside of the default Union File System and exist as normal directories and files on the host filesystem.

There are two ways to initialise volumes, with some subtle differences that are important to understand. We can declare a volume at run-time with the -v flag:
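A minimal sketch of such a command, assuming the debian image and an interactive bash shell. The container name container-test matches the name used later in this post; the hostname CONTAINER and the function name run_with_volume are purely illustrative. The command is wrapped in a function so the snippet can be sourced without starting anything:

```shell
# Start an interactive Debian container with a volume declared at /data.
# container-test and CONTAINER are illustrative names.
run_with_volume() {
  docker run -it --name container-test -h CONTAINER -v /data debian /bin/bash
}
```

Calling run_with_volume drops you into a shell inside the container, where ls /data should show an empty directory, since the debian image ships nothing at that path.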

This will make the directory /data inside the container live outside the Union File System and directly accessible on the host. Any files that the image held inside the /data directory will be copied into the volume. We can find out where the volume lives on the host by using the docker inspect command on the host (open a new terminal and leave the previous container running if you’re following along):
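A sketch of the inspect call, assuming a reasonably recent Docker release where mount details are exposed under .Mounts (older releases, such as the one this post was written against, used a .Volumes field instead). The function name show_mounts is hypothetical:

```shell
# Print the mount details of a container as JSON.
# Defaults to the container-test container from the text.
show_mounts() {
  docker inspect -f '{{ json .Mounts }}' "${1:-container-test}"
}
```

The Source field of each entry in the output is the directory on the host that backs the volume.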

The output should include a host path, telling us that Docker has mounted /data inside the container as a directory somewhere under /var/lib/docker. Let’s add a file to the directory from the host:
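As a rough sketch, assuming you pass in whatever host path docker inspect reported for the volume (the function name add_test_file and the file name test-file are illustrative):

```shell
# Create an empty file called test-file inside the given host directory.
# Paths under /var/lib/docker are normally owned by root, so this may
# need to be run from a root shell.
add_test_file() {
  touch "$1/test-file"
}
```

For example, add_test_file /path/reported/by/inspect, where the argument is a placeholder for the real path on your machine.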

Then switch back to our container and have a look with ls /data; the new file should be listed.

Changes are reflected immediately as the directory on the container is simply a mount of the directory on the host. We can achieve exactly the same effect by using the VOLUME instruction in a Dockerfile:
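A minimal sketch, assuming debian as the base image (any base image would do). The snippet writes the Dockerfile from the shell so it can be pasted as-is:

```shell
# Write a two-line Dockerfile that declares /data as a volume.
# The debian base image is an assumption.
cat > Dockerfile <<'EOF'
FROM debian
VOLUME /data
EOF
```

Building it with docker build -t volume-test . (the tag volume-test is illustrative) and starting a container from the result should give /data the same treatment as passing -v /data to docker run.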

But there’s one more thing the -v argument to docker run can do and that can’t be done from a Dockerfile, and that’s mount a specific directory on the host to the container. For example:
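A sketch of the host-directory form, using the /home/adrian/data path discussed in this section and assuming the debian image. The function name list_host_data is hypothetical:

```shell
# Mount a specific host directory at /data and list its contents from
# inside the container. Substitute your own host path before the colon.
list_host_data() {
  docker run -v /home/adrian/data:/data debian ls /data
}
```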

Will mount the directory /home/adrian/data on the host as /data inside the container. Any files already existing in the /home/adrian/data directory will be available inside the container. This is very useful for sharing files between the host and the container, for example mounting source code to be compiled. The host directory for a volume cannot be specified from a Dockerfile, in order to preserve portability (the host directory may not be available on all systems). When this form of the -v argument is used any files in the image under the directory are not copied into the volume.

Sharing Data

To give another container access to a container’s volumes, we can simply give the --volumes-from argument to docker run. For example:
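A minimal sketch, assuming the first container is named container-test and using the debian image again. The hostname NEWCONTAINER and the function name run_sharing_volumes are illustrative:

```shell
# Start a second container that mounts every volume of container-test
# at the same paths.
run_sharing_volumes() {
  docker run -it -h NEWCONTAINER --volumes-from container-test debian /bin/bash
}
```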

It’s important to note that this works whether container-test is running or not. A volume will never be deleted as long as a container is linked to it.

Data Containers

It’s common practice to use a data-only container for storing persistent databases, configuration files, data files etc. The Docker website has some good documentation on this. For example:
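A sketch of creating such a data container from the official postgres image. The container name dbdata, the echo message and the function name create_data_container are all assumptions made for illustration:

```shell
# Create a data-only container from the postgres image. The container
# runs echo and exits straight away; its volume stays behind.
create_data_container() {
  docker run --name dbdata postgres echo "Data-only container for postgres"
}
```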

This command will create a container from the postgres image, including the volume defined in the Dockerfile, run the echo command and exit. The echo command is useful in so far as it helps us identify the purpose of the container when looking at docker ps -a output. We can use this volume from other containers with the --volumes-from argument, e.g.:
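A sketch, assuming the data container was named dbdata; the names dbdata and db1, the password value and the function name are hypothetical. Recent versions of the official postgres image refuse to initialise a new database unless POSTGRES_PASSWORD is set, so it is included here:

```shell
# Run a postgres server whose data directory lives in the volume owned
# by the dbdata container. The password value is a placeholder.
run_db_with_data_container() {
  docker run -d --volumes-from dbdata --name db1 \
    -e POSTGRES_PASSWORD=change-me postgres
}
```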

There are two important points about using data containers:

  • Don’t leave the data-container running; it’s a pointless waste of resources
  • Don’t use a “minimal image” such as busybox or scratch for the data-container. Just use the database image itself. You already have the image, so it isn’t taking up any additional space and it also allows the volume to be seeded with data from the image.

Backups

If you’re using a data-container, it’s pretty trivial to do a backup:
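A minimal sketch of such a backup, assuming a data container named dbdata (a hypothetical name) and a throwaway debian container to run tar. The archive lands in the current directory on the host, which is mounted at /backup:

```shell
# Archive the postgres data volume into ./backup.tar on the host.
# --rm removes the helper container once tar has finished.
backup_data_container() {
  docker run --rm --volumes-from dbdata -v "$(pwd)":/backup debian \
    tar cvf /backup/backup.tar /var/lib/postgresql/data
}
```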

This should create a tarball of everything in the volume (the official postgres Dockerfile defines a volume at /var/lib/postgresql/data).

Permissions and Ownership

Often you will need to set the permissions and ownership on a volume or initialise the volume with some default data or configuration files. The key point to be aware of here is that anything after the VOLUME instruction in a Dockerfile will not be able to make changes to that volume e.g:
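A sketch of the problematic ordering, assuming a debian base image, a hypothetical user foo and a file /data/x. The RUN instructions that touch /data come after VOLUME:

```shell
# Write a Dockerfile in which /data is modified AFTER it has been
# declared a volume. The user foo and file x are illustrative.
cat > Dockerfile <<'EOF'
FROM debian
RUN useradd foo
VOLUME /data
RUN touch /data/x
RUN chown -R foo:foo /data
EOF
```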

Will not work as expected. We want the touch command to run in the image’s filesystem but it is actually running in the volume of a temporary container. The following will work:
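A sketch of an ordering that avoids the problem, under the same assumptions (debian base image, hypothetical user foo and file /data/x). Everything that modifies /data happens before the VOLUME instruction:

```shell
# Write a Dockerfile in which /data is populated and chowned BEFORE it
# is declared a volume, so the changes end up in the image.
cat > Dockerfile <<'EOF'
FROM debian
RUN useradd foo
RUN mkdir /data && touch /data/x
RUN chown -R foo:foo /data
VOLUME /data
EOF
```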

Docker is clever enough to copy any files that exist in the image under the volume mount into the volume and set the ownership correctly. This won’t happen if you specify a host directory for the volume (so that host files aren’t accidentally overwritten).

If you can’t set permissions and ownership in a RUN command, you will have to do so using a CMD or ENTRYPOINT script that runs after container creation.

Deleting Volumes

This is a bit more subtle than most people realise. Chances are, if you’ve been using docker rm to delete your containers, you probably have lots of orphan volumes lying about taking up space.

Volumes are only deleted if the container is removed with the docker rm -v command (the -v is essential) or the --rm flag was provided to docker run. Even then, a volume will only be deleted if no other container links to it. Volumes linked to user-specified host directories are never deleted by Docker.
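Both forms can be sketched as follows. The function names are hypothetical, and the second example assumes the debian image with a volume declared at /data:

```shell
# Remove a container together with the volumes Docker created for it.
remove_with_volumes() {
  docker rm -v "$1"
}

# Run a one-off container that is removed, along with its volume,
# as soon as the command exits.
run_disposable() {
  docker run --rm -v /data debian ls /data
}
```

For example, remove_with_volumes container-test removes the container from earlier in the post.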

Unless you’ve been very careful about always running your containers like this, you’re going to have zombie files and directories under /var/lib/docker/vfs/dir and no easy way of telling what they represent. You can use the Docker Volume Manager to help keep track of your volumes and clean up orphan ones.
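On Docker releases that include the docker volume subcommand (newer than the release this post describes), orphan volumes can also be listed with a built-in filter. A sketch, with a hypothetical function name:

```shell
# List the names of volumes that no container references any more.
list_orphan_volumes() {
  docker volume ls -q -f dangling=true
}
```

Each name it prints can then be passed to docker volume rm once you are sure the data is no longer needed.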

Adrian Mouat

Adrian Mouat is Chief Scientist at Container Solutions and the author of the O'Reilly book "Using Docker". He has been a professional software developer for over 10 years, working on a wide range of projects from small webapps to large data mining platforms.

32 Comments

  1. Thank you for writing this post. I looked at many posts, but none were as detailed and complete as this. This was extremely helpful.

  2. This post is crystal clear and helpful, thanks a lot.
    The “Backup” chapter saved me a lot of time…

  3. Great piece, but I was wondering if you could expand on one of the statements you made:

    “When this form of the -v argument is used any files in the image under the directory are not copied into the volume.”

    Docker may have changed since this was written, but when I was using -v HOST-DIR:CONTAINER-DIR the files still seemed to be copied both to and from the image. Files that were in HOST-DIR were still copied into CONTAINER-DIR and vice versa.

    Could you please elaborate on what is meant by that sentence?

    • Hi Adam,

      This sentence is referring to files that exist in the image at the location where the volume is later declared. For example, if you build an image from the following Dockerfile:

      FROM debian

      RUN mkdir /data && echo "bla" > /data/file
      VOLUME /data

      The file “/data/file” will be copied into the volume, unless the volume is mounted to a specified directory at run-time, e.g. “-v $(pwd)/mydata:/data”.

      Note that no copying goes on with volumes normally; the same directory is being used on the host and container.

  4. Oh, finally an article that explains everything! Thanks a lot. Highly recommended for those who cannot wrap their heads around data volumes and data persistence with Docker.

  5. Hi Adrian,

    I have just started using Docker and I am still a beginner. I need to create two applications, each launched in its own container. These two applications will at some point write into a SQLite database. I am not quite sure how to deal with concurrent transactions. As I understand it, the two containers are essentially two processes trying to write into the same database at the same instant, which may cause contention. Any suggestions?

    Thanks,
    -Priyanka

    • Hi Priyanka,

      I’m not sure I understand. There is a lot of work in databases to handle concurrent transactions correctly, and it shouldn’t make any difference whether you connect via a container or normal process. It should all just work…

      Adrian.

    • Hi, Priyanka.

      The way SQLite3 handles concurrency is up to the driver you use to connect, I guess. Besides that, I think that you will quickly run into issues because that database is pretty simplistic so everything (table definitions and data) is stored in one single file, so I’m not sure if that is what you are looking for. For development it might be fine but for production perhaps you need something that can properly take simultaneous/concurrent connections.

  6. Hi, I had a data container that has accidentally been deleted.

    Other containers use it via the --volumes-from flag, but now that isn’t working as the container they reference has been deleted.

    The volume is still there, so is there any way to re run that data container and point it back to the original volume on disk and get things back on track?

    • I guess you can just run the container again with the volume set up manually e.g:

      docker run -v /path/to/volume:/path/in/container my_data_container_image echo "dc"

    • In the end I simply re-ran the original container so it created a new volume, and then just copied the files from the old volume to the new one.

    • The answers don’t help me with the question of “how do I not lose my data”. If I’ve inserted 1M rows into Postgres and for some reason the storage container restarts, I don’t want to lose my data!

      It seems to me it’s critical that I be able to start a container and have it reference an existing volume, but in now 2 hours of Googling I can’t find that answer.

    • Paul, things are a bit different in Docker nowadays. The Docker volume subcommand has effectively removed the need for data containers, see https://docs.docker.com/engine/reference/commandline/volume_create/ for usage.

      Volumes are just pointers to directories on the host. To this end, to (re)attach a volume to a container you just need to use the -v syntax to attach the directory (or volume name if you’re using the new volumes stuff). If a container restarts, it won’t “lose the data” and it won’t delete it (unless started with --rm).

    • Hi Adrian.

      Excellent article – many thanks!

      This is just what I was looking for – a way to give sensible names to volumes with the new docker volume command:

      root@vagrant-ubuntu-trusty-64:/var/lib/docker# docker version
      Client:
      Version: 1.11.2
      API version: 1.23
      Go version: go1.5.4
      Git commit: b9f10c9
      Built: Wed Jun 1 21:47:50 2016
      OS/Arch: linux/amd64

      Server:
      Version: 1.11.2
      API version: 1.23
      Go version: go1.5.4
      Git commit: b9f10c9
      Built: Wed Jun 1 21:47:50 2016
      OS/Arch: linux/amd64

      root@vagrant-ubuntu-trusty-64:/var/lib/docker# docker volume ls
      DRIVER VOLUME NAME
      local 5cc067532842712dc73ff3c2a47f4524038164b486945a7a8b3c7369baf1b548
      local c396fb2966121a1186618443124f896992a76ea192b0177a37dde990cb14901a
      root@vagrant-ubuntu-trusty-64:/var/lib/docker#

      root@vagrant-ubuntu-trusty-64:/var/lib/docker# docker volume create --name my-new-volume
      my-new-volume
      root@vagrant-ubuntu-trusty-64:/var/lib/docker#

      root@vagrant-ubuntu-trusty-64:/var/lib/docker# docker volume ls
      DRIVER VOLUME NAME
      local 5cc067532842712dc73ff3c2a47f4524038164b486945a7a8b3c7369baf1b548
      local c396fb2966121a1186618443124f896992a76ea192b0177a37dde990cb14901a
      local my-new-volume

      Regards,

      Keith

  7. Hi Adrian,
    Thanks for the post! It finally clicked somewhere in my brain why chown doesn’t work from Dockerfiles if misplaced.

    I am struggling with another issue on OS X however: what if “chown -R user /mounted/folder” is done from the ENTRYPOINT script? If you need an example, have a look at https://github.com/docker-library/ghost/
    No matter what I try, I can’t overcome the permissions error thrown when the entrypoint script is executing. It copies files to the mounted folder (under /Users), but cannot set the owner.

    • I assume that’s to do with how the filesystem is mounted into the boot2docker VM. I’m not sure what the best solution is.

    • Hi Alexander,

      Working with Docker on OSX is quite confusing. You have 3 layers of operating system, each with its own users and userid numbers. You have your OSX system, the virtual machine (usually VirtualBox), and then the OS inside the docker image. It looks like your docker image has a user named “user”, and you are trying to change permissions on the VM-mounted /Users directory, which is owned by your OSX user (e.g. “alex”).

      To make life easier, what I did in my entrypoint script is to change my app user’s userid number to match my OSX userid number:

      echo "Updating app user id to match mine"
      cat /etc/passwd | sed s/9999/1009/g > /tmp/pwupdate
      mv /tmp/pwupdate /etc/passwd
      chmod 444 /etc/passwd

      In the example above, the user is “app” and its user id number is 9999, and my OSX user has id 1009. You’ll have to look in your image to see what id the user “user” has. Run “id ” to get your uid numbers.

    • … actually I looked at the Ghost image, and they create user “user” in the Dockerfile. You could just edit that and specify the userid number with the ‘useradd’ command.

    • Hi again ;-). What I said above is a bit wrong, at least with the latest version of Docker Toolbox (1.10). The user in your docker container needs to have the uid of the user in the VM. By default that user is “docker” with a uid of 1000.

      So in your case, you can specify the uid of 1000 by adding “--uid 1000” to the “useradd…” line in the Dockerfile, or update the uid in the entrypoint script. A better way to do that than what I said above is to use the usermod command: “usermod -u 1000 user”

      Cheers,
      Brent

    • Hi all.

      I’m using Mac OS X 10.9.4 and running a Vagrant VirtualBox image of Ubuntu 14.04 LTS, which runs the Docker daemon. So I work directly in the Vagrant VM and everything works OK for me.

      HTH

      Keith

  8. Great post, Adrian! Thank you very much.

    I have some questions about this advice:

    * Don’t use a “minimal image” such as busybox or scratch for the data-container. Just use the database image itself. You already have the image, so it isn’t taking up any additional space and it also allows the volume to be seeded with data from image.

    If the version of the database image changes, i.e., goes from postgres:9.4 to postgres:9.5, and I update my database image to the latest version, will the data volume still rely on the older postgres:9.4 image?

    Also, I don’t understand “it also allows the volume to be seeded with data from the image”. Are you here assuming that prior to creating the data-container, we are using the database image for data storage? In my case, because I didn’t yet understand data containers, I mounted a host filesystem directory inside the database image, for data persistence. So in this case, the seeding advantage doesn’t apply — or am I missing something?

    Thanks again!

    • Regarding upgrading DBs, it depends on the update. Note that updating a DB is a fraught process at the best of times. However, all that you will be using from the data container is the data at a certain path, so unless the data format or paths have changed, there shouldn’t be any issues. Otherwise you will have to export and re-import your data. You will have the same issues if you use a minimal container; the data will still have originally been created with a previous version of the DB.

      With regard to seeding data, what I’m referring to is that Docker will copy files into a volume if they existed in the Dockerfile prior to the VOLUME declaration. This means you can do things like copy default conf files or data into the volume. This won’t happen if you explicitly supply a host directory.

  9. Hi Adrian;
    Thank you so much for this blog post on Docker. It is really new and helpful.
    I would like to know something about Docker. Suppose I have an Oracle Database image that is 50 GB in size. On the same host I configured Docker and created a 200 GB filesystem at “/var/lib/docker” to hold Docker images and other prerequisites. Now I have a new container using the latest image, and my Oracle DB is running fine there. As per your blog, if I use the -v and -rw options, can I get a disk from storage and attach it to the server, or use it with the -v option? Will the changes we make in the container then go to the new volume (-v)? And if we commit, will they sync with the image (inside /var/lib/docker), or will they be discarded if we drop the container?

    Please help me to understand .
    Thanks.
    – Raj Gupta

    • Hi Raj,

      I’m sorry, I don’t fully understand the question. When using volumes, the storage will be from wherever the directory is mounted on the host. Changes to volumes are *not* committed to images.

      I hope that helps.

  10. Great article. We’re undergoing corporate training and the trainer wasn’t able to explain what volumes are. So in the class I had to google and found this article. It really helped me to understand and explain it to all the class members 🙂
