Tagging Docker images the right way

 


In our consultancy work, we often see companies tagging production images in an ad-hoc manner. Taking a look at their registry, we find a list of images like:
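
(Illustrative tags only; the acmecorp/foo repository name follows the example used in the comments below.)

acmecorp/foo:v1.2.3
acmecorp/foo:v1.2.4
acmecorp/foo:v1.3.0
acmecorp/foo:hotfix-payment-bug
acmecorp/foo:latest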

and so on.

There is nothing wrong with using semantic versioning for your software, but using it as the only strategy for tagging your images often results in a manual, error-prone process (how do you teach your CI/CD pipeline when to bump your versions?).

I’m going to explain an easy, yet robust, method for tagging your images. Spoiler alert: use the commit hash as the image tag.

Suppose the HEAD of our Git repository has the hash ff613f07328fa6cb7b87ddf9bf575fa01b0d8e43. We can manually build an image with this hash like so:
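
A sketch of that build command (assuming the acmecorp/foo image name that also appears in the comments below):

docker build -t acmecorp/foo:ff613f07328fa6cb7b87ddf9bf575fa01b0d8e43 .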

If we know that this is the latest image tag (but beware of the confusion caused by latest!), we can also immediately tag it as such:
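
For example (same assumed image name):

docker tag acmecorp/foo:ff613f07328fa6cb7b87ddf9bf575fa01b0d8e43 acmecorp/foo:latest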

Once all that’s done you can push your image to the registry:
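
Something along these lines:

docker push acmecorp/foo:ff613f07328fa6cb7b87ddf9bf575fa01b0d8e43
docker push acmecorp/foo:latest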

That works, but we don’t really want to copy and paste the Git hash by hand each time, so let’s see how we can fix that next.

Git Magic

To get the latest commit of your repository:
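
One way to do it (a minimal example):

git log -1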

That’s great information but quite verbose. Trim it down to just get the hash from the last commit:
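
One option is rev-parse (the same command the commenters below use in its short form):

git rev-parse HEAD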

You can even get a shorter version of the hash but I prefer the other one:
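
That shorter form would be:

git rev-parse --short HEAD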

Automation

I’m a big fan of Makefiles. I’ve been using them for all of my Go projects, but also for anything that has Docker-related tasks, like building containers, for example 🙂

A normal Makefile for building projects with Docker would look more or less like this:
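
A minimal sketch of what such a Makefile could look like (the image name is an assumption; the build, push and login targets and the DOCKER_USER/DOCKER_PASS variables match the ones referenced below):

NAME   := acmecorp/foo
TAG    := $(shell git rev-parse HEAD)
IMG    := ${NAME}:${TAG}
LATEST := ${NAME}:latest

# Recipe lines below must be indented with a tab character.
build:
	docker build -t ${IMG} .
	docker tag ${IMG} ${LATEST}

push:
	docker push ${IMG}
	docker push ${LATEST}

login:
	docker login -u ${DOCKER_USER} -p ${DOCKER_PASS}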

Now it’s just a matter of running make build push to generate a new image with automated tagging.

If you have a CI/CD pipeline in place for your project, it gets even easier. The only gotcha is that you need to be logged in to your registry before attempting to push the image there. That’s what the login task is for. Invoke it before invoking the push task and you’re good to go. Don’t forget to add the DOCKER_USER and DOCKER_PASS environment variables to your pipeline, otherwise the login task will not work.

Also, do not call all the tasks on the same line; it’s best if you break them down into different steps.
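
In a pipeline, that could be as simple as three separate steps, each invoking one task of the Makefile sketched above:

make login
make build
make push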

Coming back to Semantic Versioning

As I said earlier in this post, there is nothing wrong with using semantic versioning for your software. If you offer an image that is consumed by many users, you probably want to start tagging the stable releases that match major milestones in your project’s roadmap.
These are points in time that you want to be able to look back on easily, so having something like v1.2.3 is something that not only you but also your users can benefit from. But then again, this is a manual process that can’t (?) be automated, so it needs to be used in conjunction with something like what I have proposed here.

Conclusions

  • Whenever possible, you should automate the generation of your container images. If you can think of a scenario where this is not a good idea, please leave a comment below.
  • Automatically tagging your container images should not be difficult. Think of a streamlined process to achieve this. So far we have used this approach with our customers and they are all quite happy with it. Let me know in the comments if you have a different approach in place that is also working for you.
  • One of the main benefits of tagging your container images with the corresponding commit hash is that it’s easy to trace back to a specific point in time, know how the application looked and behaved at that specific point in history, and, most importantly, blame the people who broke the application 😉
  • There is nothing wrong with semantic versioning. Use that for your image tags together with this other, dynamic way of tagging.

If you’d like to learn more about cloud native, grab a copy of the new book.


31 Comments

  1. I think this is okay if we don’t expose to the public.
    But what if we expose our image to the public, where they will see it as
    acmecorp/foo:ff613f07328fa6cb7b87ddf9bf575fa01b0d8e43

    Which I think is not comfortable.
    Your views?

    • Hello, Bathula

      You can think of this strategy as a “nightly build” release on steroids but I would like to know what specifically makes you feel uncomfortable about this approach.

      Also, as I mentioned in the conclusions, you don’t have to drop the semantic versioning of your releases at all. You can use this in conjunction with semver which points to major releases that go together with major announcements through your common communication channels.

      Regards,

      Carlos

    • Hi Carlos,

      Thanks for this very nice and inspiring post.

      In addition to your unix-ish code, here’s a fragment that I use in a Windows batch file and which may be of use for other die-hard Windows users:
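
      A minimal batch sketch of that idea (the image name and variable name are illustrative assumptions):

      @echo off
      rem Capture the current commit hash into a variable
      for /f "delims=" %%i in ('git rev-parse HEAD') do set GIT_HASH=%%i
      docker build -t acmecorp/foo:%GIT_HASH% .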

      Best regards from a Swiss glider pilot,

      Remi.

    • Awesome.
      Is your CICD pipeline running on a Windows machine or is this for your local development machine?

      All the best (+10m/s!),

      Carlos

    • And that led me to this neat little trick with Docker Compose.

      In the ‘docker-compose.yml’, “version” needs to be at least 2.1

      Set:
      image: IMAGENAME:${TAG?Use TAG=`git rev-parse --short HEAD` docker-compose …}
      in your service definition, and then run docker-compose as the message says to use the Git commit as your tag automatically (in Linux, probably Mac, not Windows).
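
      A minimal docker-compose.yml sketch of that trick (the app service name, the build key and the acmecorp/foo image name are illustrative assumptions):

      version: "2.1"
      services:
        app:
          build: .
          image: acmecorp/foo:${TAG?Run as TAG=`git rev-parse --short HEAD` docker-compose build}

      Running it that way, e.g. TAG=$(git rev-parse --short HEAD) docker-compose build, tags the built image with the short commit hash; if TAG is unset, Compose aborts with the hint above.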

  2. Carlos,

    Looks really good with the syntax highlighting!

    I have prepared a github repo with a simple Java sample app that illustrates tagging the docker images the way you suggested: https://github.com/Remigius2011/webapp-hello-java (more git/docker sorcery to follow in the same theater – I am about to finish and publish a repo with an iac working base using terraform/ansible/lxd/docker to spin up some containers, then there’s a pure docker based cd to follow this fall).

    Where are you soaring? In Colombia? Unfortunately, the season is over here, so +10m/s only next year.

    • Nice that you created that repo.

      A couple of things that I noticed:
      1. Your Dockerfile expects a .war file to be at the same level so that when you create the container you can copy the WAR file into it. The best practice is to generate the WAR file from within the container itself. In this way you create a uniform and consistent development AND production environment (people don’t have to install Java or any other dependencies on their machines; instead, everything is self-contained – pun intended).
      2. You placed your Dockerfile inside a docker/ folder. The most common and best practice is to place the Dockerfile at the root of your project. It’s just practical and it doesn’t hurt to leave it there. Think of Ruby + Gemfile, NodeJS + package.json and such combos.

      Soaring mostly in Àger, Spain, but have explored the French Alps (Annecy, St. Hilaire, Chamoux, Sapenay) a bit. You?
      Also, yes, very sad that the season is over but hopefully will be soaring in the dunes in winter.

  3. Hi Carlos,

    Thanks for the hints.

    @1: I think this is a matter of taste – in Java you often have a multi-level project built using mvn/gradle. Building a WAR file inside a container mandates a whole build infrastructure that is too heavy-weight for a production container. Then there are multi-stage dockerfiles. IMHO they are good as long as you don’t use persistent volumes during the build (e.g. to host a mvn local repository between build runs in order to avoid downloading all necessary dependencies on each run). In this case it might be better to have a dedicated build container that produces all the artifacts.
    @2: In a multi-level project you typically don’t want to pollute your project root and also you don’t want to expose *everything* to the Dockerfile (similar considerations apply to other artifacts that are first accrued in a separate directory, like e.g. the war file itself or jar files therein).

    My home base is LSZI, but I have also been in the southern Alps (Sisteron, Vinon). Nowadays I typically spend some time in summer in LFGO, a small airfield near Paris with very friendly French colleagues.

    • 1. I agree. Multi-stage Dockerfiles are so far only good for frameworks like the Golang one, but for things like Java, Ruby and NodeJS (just to name a few) they are still not very usable. So having two Dockerfiles, one for dev and one for build, is still the best approach.

      2. You can always decide what you put in your container and what not by both using the COPY/ADD commands and the .dockerignore file 😉
      But, agreed, it depends on the technology being containerised.

  4. Nice automation, and you make a valid point against manual versioning. However, the fact that versions are made by hand by humans is especially useful if you want humans to be able to reference them between themselves easily. v1.2.3 is human readable, memorisable and usable, while a hash is really only useful for automated/machine processing.

    So IMHO it’s really a matter of target and use case.

    • I agree with you, Samuel. I think, though, that I might have been misunderstood. I’m not against manual (sem)versioning. I just think that in a scenario where you’re releasing (ideally) several times a day, manual versioning can rather get in the way.
      However, I would like to say that if you have a manual release process in place, you should also generate a version of your container image that matches that release. It could be triggered manually by the click of a button on your CI/CD tool, a special commit message, or whatever other clever idea you might come up with 🙂

    • Carlos’ suggested method actually directly mirrors what GitHub has always done. Commits are hashes and Releases (and branches) are manually tagged. So, this seems not only reasonable, but totally logical.

  5. Why not tag off Git tags? You can have Jenkins automatically increase your Git tag (even control it with minor/major) by reading your last tag number and increasing it. Then use the same system you mentioned, but use Git tags instead of the Git commit hash.

    • I like this idea, Barak. The problem with this approach is that you would have to build in some logic to see if a tag hasn’t yet been released.
      Take for example tag v0.1.0. If I push three new commits, the tag won’t change, so I would be building the container image with tag v0.1.0 three times. Three different outcomes but named the same. It’s inconsistent.
      One could get around this by building some sort of logic inside the pipeline to check if a tag hasn’t yet been released and proceed accordingly, for sure. Up to you. I’d rather keep it simple.

    • This is correct, Eugene. The Makefile featured this but the first code snippet didn’t. I updated accordingly.

      Thanks for your input!

    • Using docker labels.

      # capture the commit hash and a cleaned-up `git describe` output
      image_name=repo/name
      git_hash=$(git rev-parse --short HEAD)
      git_described=$(git describe --tags | sed 's/[\-\s]//')

      # record the hash as an image label, tag with the described version
      docker build --label "com.repo.git.hash=${git_hash}" -t app .
      docker tag app ${image_name}:${git_described}
      docker tag app ${image_name}:latest

      # read the label back from the built image
      docker inspect app --format '{{ index .Config.Labels "com.repo.git.hash" }}'

    • Not bad! The use of labels is not something that I have seen often but I can see how this is a good idea.

  6. I have a git repository with multiple docker containers. Each docker container has its own folder inside the git repository.
    I guess you would probably want a more complex build strategy, like having version files in every folder and using this version to tag the docker container. So I can check which docker container has already been built and pushed to the registry and then just build the new ones.
    What do you think?

    • For repositories that feature multiple containers (a common example is a Ruby+Nginx mix), the strategy stays the same. There is a level of coupling between the two, so you really want the version to stay the same.
      It could be, though, that for your use case you do need to add version files to your repo, but that solution isn’t very elegant and you’d need to hack even further. YMMV.
      I guess the outstanding question is: how do you handle versioning in your project today?

    • How do you test all images together? I’m in the same situation as you: a Git repo, with many images. I wrote a Bash script that starts all of them and tests them together, before pushing to Docker Hub.

      https://github.com/debiki/talkyard/blob/master/s/build-and-release.sh

      I avoid having a version file for each image. I would just mess up all the version numbers with each other. Instead, whenever I release a new version of the app, I bump the versions of all images. They all have the exact same version. And I bump an image even if it didn’t change. For example, the App image changes all the time, but the Search image, with ElasticSearch, changes maybe only a few times a year.

  7. I have a continuous build server that, for each commit pushed to any branch in my Git repository, will run my unit test suite. The environment for building and running the unit tests is contained in a docker image, which I use multi-stage builds to construct. The Dockerfile builds the image, which is based off ubuntu, to have the required packages, build tools (cmake, ninja), and manually built and installed libraries (like boost, openssl).

    The Dockerfile is stored in the main repository with the test code. The continuous build server works as follows:

    1. Detect commit in Git repo & check out that SHA1
    2. Run docker build on the Dockerfile. If the Dockerfile hasn’t changed, it uses the previous images in the cache.
    3. Run docker run to execute a shell script inside of the docker container to execute the build. A volume is mounted to get access to the source code on the host machine.

    Right now, on step #2, I do not version my images. Basically, I only have latest. I want to version my build environment every time my Dockerfile changes because this will allow convenient reusability of previous images when I need to go back to earlier tags to hotfix builds.

    The concern I have with the solutions proposed here is that your makefile creates a new image, with a new tag, unconditionally using each SHA1 being built. This appears to happen regardless of whether the Dockerfile itself has changed. Ideally, I only want to rebuild my docker image when the build environment changes in a backward-breaking way. Or at least, if it changes at all. Isn’t it wasteful to build so many images? For example, if people commit 100 times a day, that’s 100 images (or rather, tags of an image) a day, and 100 pushes, right? How slow and wasteful is this process? For example, if my image is 1GB in size, is that 100GB of wasted space in my docker registry? I’m new to Docker and learning a lot as I go, but I am not familiar with the underlying mechanics of Docker, nor am I familiar with any space-saving features it has for docker images.

    What would be ideal is to tag based on my software version. The NEXT version of my software is always in a version.txt file. So when I tag a release, in the next commit I modify version.txt to be the version of the next release. Example:

    1. New repo
    2. Create and set version.txt to 1.0
    3. Tag 1.0 release
    4. Change version.txt to 2.0, commit

    So using this process, assume I have the following tags:

    1.0, 2.0, 3.0, 4.0, 5.0

    And let’s say that my Dockerfile only changed in 1.0 and 3.0. I should only have two images:

    company/unit-tests:1.0
    company/unit-tests:3.0

    When I create a hotfix branch based on 4.0, my build script should know to obtain and run the docker image tagged with a version number that is the closest to the current version of HEAD (4.0 in this case) but not greater than it. In other words: PV ≤ CV, where PV is “proposed version” and CV is “current version”. Each version we iterate over in the list of available image tags is inserted as PV to validate its candidacy for the build. Not sure if such logic is feasible, and it certainly isn’t trivial, but I think it’s the most flexible, while still being conservative.

    Would love your thoughts on solutions specific to my scenario.

  8. Take a look at dependabot. It supports docker images and will automatically bump the image tag and open a PR so tests run on the new image…
