This post is the first of two about how you can improve your Continuous Integration and Delivery systems with Docker. Before we dig into the details, let me quickly introduce the essence of Docker and of Continuous Integration. If you are familiar with both concepts, skip the next two paragraphs.
Docker is a tool which provides operating-system-level virtualization: the kernel of the operating system creates multiple isolated user spaces. The abstraction is so strong that such instances (also called containers) can be perceived by end users as real servers. This is a big improvement over classic virtualization systems, where hardware has to be emulated for every guest operating system.
Continuous Integration is the practice of keeping the application in a working state at all times. There is a group of dedicated tools supporting it, and Jenkins is one of the most popular (it is also free). Such systems offer various ways of scheduling your application's compilation/build and verifying its state. You can read more about Continuous Integration best practices here.
Docker (and containerization in general) can facilitate your Continuous Integration and Delivery processes significantly. There are two major ways to use Docker in this context:
- Docker containers as Jenkins slaves – unless you are building very simple applications, you probably have several machines attached to your Jenkins (or other CI system) instance and use them to execute many builds simultaneously. Docker helps you use those machines far more effectively,
- Docker images as your application artifact – you can deploy your application to a Docker container and save the result as an image. The subsequent stages of your deployment pipeline then use that image, so you can be sure that each stage operates on the same application instance deployed in the same manner.
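As an illustrative sketch of the first approach, a Jenkins declarative pipeline can ask for a job to run inside a container built from a Dockerfile kept in the repository. The stage name and build command below are hypothetical – substitute your project's own:

```groovy
// Jenkinsfile -- a minimal sketch, not a complete pipeline.
// Jenkins builds an image from the Dockerfile in the repository root
// and runs every stage inside a container created from that image.
pipeline {
    agent {
        dockerfile true
    }
    stages {
        stage('Build') {
            steps {
                // Hypothetical build command; use whatever your project runs.
                sh './gradlew build'
            }
        }
    }
}
```

When the job finishes, Jenkins discards the container, so the next run starts from a clean environment.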
In this post I will focus on the first use case.
A typical Continuous Integration server has from a few to hundreds of machines allocated for building and testing software, and you can use Docker to provision them. In a typical scenario, each CI job is configured to run on a predefined image or on one built from a Docker configuration file – a Dockerfile. Once the image is built or downloaded, Jenkins creates a container from it (an image instance) and runs the defined job inside. Upon completion, the container is destroyed and the freed hardware resources are returned to the pool.
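A minimal sketch of such a Dockerfile for a build slave might look as follows. The base image, tool list and user name are all assumptions – pick whatever your builds actually need:

```dockerfile
# Hypothetical image for a Java build slave.
FROM ubuntu:16.04

# Tools the CI jobs need; adjust to your stack.
RUN apt-get update && apt-get install -y \
        git \
        openjdk-8-jdk \
        maven \
    && rm -rf /var/lib/apt/lists/*

# Run builds as a non-root user, as Jenkins agents usually do.
RUN useradd -m jenkins
USER jenkins
WORKDIR /home/jenkins
```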
Fortunately, jobs do not have to download or build a Docker image on every run. Docker caches aggressively, so if nothing has changed in the image or the Dockerfile since the previous build, the already-created image is reused. As a result, provisioning is almost instant – there is practically no overhead, especially in comparison with booting an operating system or provisioning a new virtual machine. It is almost as fast as starting a new process. Consequently, there is no point in maintaining constantly running slaves, except for some reference deployments.
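The caching behaviour is easy to observe from the command line. The image name and test script below are hypothetical, and the commands assume a running Docker daemon:

```shell
# First build: every instruction in the Dockerfile is executed.
docker build -t ci-slave .

# Second build with an unchanged Dockerfile: each step is served from
# the layer cache (reported as "Using cache"), so it finishes almost
# instantly.
docker build -t ci-slave .

# Starting a job container is then about as cheap as starting a process;
# --rm destroys the container when the job is done.
docker run --rm ci-slave ./run-tests.sh
```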
A great advantage of this approach is that each time a CI job executes, it uses a fresh workspace. There are no leftovers from previous builds (and no need to clean them up), and there is a much lower chance of slowly exhausting resources such as disk space. This improves the stability of your jobs and makes your builds fully repeatable.
Another significant improvement is automated management of all of your slaves. If your Jenkins has more than a dozen machines attached, maintaining their state manually is painful. You might quickly end up in a situation where each slave has a different setup and you cannot easily move jobs between them. Docker lets you keep the exact same state across all machines. While this is also achievable with tools like Puppet or Ansible, I personally find Docker much more effective in this case.
If you would like to track exactly what changes have been made to your Docker images, use Dockerfiles. They are recipes for how to build specific images – a kind of living specification of what the image is created from. Since they are text files, you can develop them the same way as your software: keep them in the source code repository, make changes via pull requests with a strict review process, and even write automated tests using RSpec and Serverspec. Such change tracking greatly helps when figuring out why a job failed – you know exactly what has changed in the environment since the previous build.
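A Serverspec test for such an image might look roughly like this. The package and command checked are assumptions; running it requires the serverspec and docker-api gems plus a local Docker daemon:

```ruby
require "serverspec"
require "docker"

describe "CI slave image" do
  before(:all) do
    # Build the image from the Dockerfile in the current directory
    # and point Serverspec's Docker backend at it.
    image = Docker::Image.build_from_dir(".")
    set :backend, :docker
    set :docker_image, image.id
  end

  # Assert that the tools our jobs rely on are actually present.
  describe package("git") do
    it { should be_installed }
  end

  describe command("java -version") do
    its(:exit_status) { should eq 0 }
  end
end
```

Run through RSpec, such a suite fails the moment an image change removes something your jobs depend on.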
If you currently maintain dozens of virtualized CI slaves, Docker will bring you another great benefit: an out-of-the-box performance improvement. While virtualization typically imposes a 2–10% overhead on CPU and memory, several case studies around the web report that Jenkins jobs running in Docker are on average 20% faster than on virtual machines. Given the modest cost of migrating to containerization, the gains are rather spectacular.
Almost no one runs CI servers on bare metal; virtualization dominates the market, for good reasons. Containerization (with Docker as the example here), however, pushes infrastructure management to the next level. It is fairly simple to learn and start with, yet it brings great benefits in performance, stability, configuration management, standardization, traceability and the development process. Run a simple experiment with your CI servers: spend a few hours on creating a Docker image and plug it into your CI!