Lean Go Containers with Multi-Stage Dockerfiles


On June 28, 2017, Docker 17.06 CE was released, which among other improvements adds support for multi-stage image builds. While traditional Docker builds had to use a single container for their work and output, multi-stage builds allow the use of intermediate containers to generate artifacts. Artifacts from intermediate containers are then copied into the final build image, meaning we needn’t ship the intermediate tools in the final image. While the community had found ways to perform multi-stage builds in prior versions of Docker, this is the first time that multi-stage builds can be accomplished in a single Dockerfile. By placing all of the build logic in a single Dockerfile, we can use build tools without fear of bloating the output image, and integrate cleanly with build pipelines that accept Dockerfiles, even for complex builds.

These improvements have a particularly large impact on Go projects. Go can use static compilation to generate a self-contained executable, so many projects can now easily build containers holding just a single binary without resorting to hacks or breaking your build pipeline.

Let’s take a look at how this new feature impacts the containerization of Go’s introductory hello world app:
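The original listing isn’t reproduced in this text; a minimal stand-in looks like the following (the greeting is factored into a function only so the value can be checked directly):

```go
package main

import "fmt"

// greeting returns the classic message; factored out of main only
// so the value can be inspected without capturing stdout.
func greeting() string {
	return "hello, world"
}

func main() {
	fmt.Println(greeting())
}
```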

Containerizing this simple program with traditional Docker techniques results in a staggering 700MB image:
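The Dockerfile itself isn’t shown in this text; a typical single-stage build might have looked like this (the base image tag and paths are illustrative — the golang base image alone accounts for most of the 700MB):

```dockerfile
# Hypothetical single-stage build; everything below ships in the final image.
FROM golang:1.8
WORKDIR /go/src/app
COPY hello.go .
RUN go build -o /go/bin/hello hello.go
CMD ["/go/bin/hello"]
```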

The application works, but why is the image so large? Upon inspecting the image it becomes obvious that most of that 700MB is extraneous data, as the generated binary is a standalone, statically linked executable:
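The inspection isn’t reproduced here, but it amounts to running ldd against the compiled binary:

```shell
# For a statically linked Go binary, ldd reports no dynamic dependencies.
ldd hello
# not a dynamic executable
```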

If you don’t see the “not a dynamic executable” message when inspecting your go binaries with ldd, don’t worry, we’ll cover that case later.

While “not a dynamic executable” may look like an error message, this is exactly what we were hoping to see. This message indicates that the binary packages all of its runtime code in a self-contained manner and is portable. Static binaries are easily moved between systems, and in this case, between containers. Multi-stage Dockerfiles allow us to easily move that static binary into an empty image and execute it from there, leaving behind the 698.5 MB of build-time dependencies.

There are hacks and other techniques that have been developed over time to work around this problem of large images, but most result in something that isn’t as easy to grok or maintain, or doesn’t play well with systems that expect a standard, monolithic Dockerfile. Let’s see how the new multi-stage build feature handles this problem:
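The article’s Dockerfile isn’t reproduced in this text; based on the description below, a sketch might look like this (tags, paths, and names are illustrative):

```dockerfile
# Stage 1: build and verify the static binary.
FROM golang:1.8 AS builder
WORKDIR /go/src/app
COPY hello.go .
RUN go build -o hello hello.go
# Fail the build unless the binary is fully static; ldd's message may land
# on stdout or stderr depending on the platform, so capture both.
RUN ldd hello 2>&1 | grep -q "not a dynamic executable"

# Stage 2: start from an empty image and copy in only the binary.
FROM scratch
COPY --from=builder /go/src/app/hello /hello
ENTRYPOINT ["/hello"]
```

Building this with docker build then produces a final image containing nothing but the binary.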

By using a multi-stage build, the output image size was reduced by a factor of roughly 450, but how does the Dockerfile work? The first block, starting with “FROM golang”, creates an intermediary image nicknamed builder, which contains our static binary. Note that I’ve automated the binary inspection using ldd and grep to ensure the compiled binary is statically linked, as that is critical to the next step. If ldd does not report a static binary, the build will fail.

The second block, beginning with “FROM scratch”, starts a new, empty image into which our build artifact is copied from the builder image. The --from=builder argument on the COPY instruction tells Docker to take files from our intermediary image.

Here’s a slightly more complex example, which downloads and prints RFC 2795:
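The program itself isn’t included in this text; a minimal stand-in might look like the following (the exact URL is an assumption — any copy of RFC 2795 served over HTTPS exercises the same code paths):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// rfcURL is an assumption; the article does not show the address it used.
const rfcURL = "https://tools.ietf.org/rfc/rfc2795.txt"

func main() {
	// Fetch the RFC over HTTPS and stream it to stdout.
	resp, err := http.Get(rfcURL)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```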

Let’s try to containerize this simple HTTPS application and inspect the result:
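The Dockerfile isn’t shown in this text; a hypothetical single-stage version mirrors the earlier one (names and paths are illustrative):

```dockerfile
FROM golang:1.8
WORKDIR /go/src/app
COPY https-get.go .
RUN go build -o /go/bin/https-get https-get.go
CMD ["/go/bin/https-get"]
```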

Note that the “broken pipe” message is expected, and is a result of piping the output to head.

Using this Dockerfile the image builds and the app downloads RFC 2795 as expected, but can we apply the same strategy as before to create a lean container? Let’s inspect the resulting binary and see:
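The ldd output isn’t reproduced here, but for a cgo-linked Go binary it looks roughly like this (library names and paths vary by system; this is an illustration, not the article’s exact output):

```shell
ldd https-get
#	linux-vdso.so.1
#	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
#	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
#	/lib64/ld-linux-x86-64.so.2
```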

The output from ldd looks quite different this time. Instead of giving us a nice “not a dynamic executable” message we see various .so files required at runtime, which indicates that this binary isn’t self-contained. We can also see that adding the automated ldd test now breaks the build:

This happens because Go uses C libraries by default to perform DNS resolution, and uses dynamic linking and something called cgo to call into those C libraries. The way to fix this is to change the install command, forcing static linking:
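The article’s exact command isn’t shown; one common fix is to disable cgo for the build, which makes the net package fall back to its pure-Go DNS resolver (the flags and paths here are illustrative):

```dockerfile
# With cgo disabled, the resulting binary is statically linked.
RUN CGO_ENABLED=0 go build -o /go/bin/https-get https-get.go
```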

That looks better: the application runs and is reported as a static binary. Let’s try creating a lean container again:
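The final Dockerfile isn’t reproduced in this text; a sketch combining the static build with the certificate copy mentioned below might look like this (paths are illustrative; /etc/ssl/certs/ca-certificates.crt is where Debian-based golang images keep the CA bundle):

```dockerfile
FROM golang:1.8 AS builder
WORKDIR /go/src/app
COPY https-get.go .
RUN CGO_ENABLED=0 go build -o https-get https-get.go
RUN ldd https-get 2>&1 | grep -q "not a dynamic executable"

FROM scratch
# TLS verification fails from scratch without a CA bundle, so copy one in.
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /go/src/app/https-get /https-get
ENTRYPOINT ["/https-get"]
```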

It works! Note that to avoid TLS errors we also had to copy in the SSL certificates from the build environment, since we are contacting an https:// URL. How do the image sizes compare?

The https-get-multi-stage image is barely larger than the dynamically linked binary by itself, and significantly smaller than the full size image produced by the legacy Dockerfile.

When using multi-stage Dockerfiles, only the image resulting from the last block of the Dockerfile is tagged when the build completes. What happens to the intermediary images? They’re still present, taking up precious disk space, and at some point they will need to be garbage collected. A few such intermediary images can be seen below, untagged (<none>) but present on disk:
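The listing isn’t reproduced in this text; dangling images can be found with Docker’s built-in filter:

```shell
# List untagged (dangling) images left behind by multi-stage builds.
docker images --filter "dangling=true"
```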

The --force-rm flag, which cleans up intermediary containers throughout the build, seems promising here, but it doesn’t resolve the issue: it removes intermediary containers, not intermediary images. Make sure you understand whether your build process creates untagged images, and have a plan for regularly cleaning them up. The docker image prune command could prove useful for this purpose.

The code used here can be found at https://github.com/ContainerSolutions/lean-go-containers.
