Jenkins master and Docker slaves


Jenkins and Docker

This post is the first of two about how you can improve your Continuous Integration and Delivery systems with Docker. However, before we dig into details, let me quickly introduce what Docker and Continuous Integration are about. If you are familiar with both concepts, skip the next two paragraphs.

Docker is a tool that provides operating-system-level virtualization: the kernel of the host operating system creates multiple isolated user spaces. The isolation is so strong that such instances (also called containers) can be perceived by end users as real servers. This is a big improvement over classic virtualization systems, where hardware has to be emulated for the guest operating system.

Continuous Integration is the idea of keeping the application in a working state at all times. There is a group of dedicated tools supporting the concept, and Jenkins is one of the most popular (and it is also free). Such systems offer various ways of scheduling your application builds and verifying their state. You can read more about Continuous Integration best practices here.

Docker (and containerization in general) can facilitate your Continuous Integration and Delivery processes significantly. There are two major ways to use Docker in this context:

  • Docker containers as Jenkins slaves – unless you are building only very simple applications, you probably have several machines attached to your Jenkins (or other CI system) instance and use them to execute many builds simultaneously. Docker helps you achieve much better results along many dimensions here,
  • Docker images as your application artifact – you can deploy your application to a Docker container and save it as an image. Next, you can populate the subsequent stages of your deployment pipeline (more about pipelines) with the created image. As a result, you can be sure that each following stage operates on the same application instance deployed in the same manner.

In this post I will focus on the first use case.

A typical Continuous Integration server consists of a few to hundreds of machines allocated for building and testing software. You can use Docker to provision them. In a typical scenario, each CI job is configured to build on a predefined image or on one created from a Docker configuration file – a Dockerfile. Once the image is built or downloaded, Jenkins creates a container based on it (an image instance) and runs the defined job within it. Upon completion, the container is destroyed and the freed hardware resources are returned to the pool.
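As a sketch of that flow, a declarative Jenkins pipeline can request a container per build. This assumes the Docker Pipeline plugin is installed; the image name and build command are only examples:

```groovy
// Illustrative Jenkinsfile: Jenkins spins up a container from the image,
// runs the stages inside it, and destroys the container when the build ends.
pipeline {
    agent {
        docker {
            image 'maven:3-eclipse-temurin-17'  // any prepared build image
            args '-v $HOME/.m2:/root/.m2'       // optional: reuse a dependency cache
        }
    }
    stages {
        stage('Build') {
            steps {
                sh 'mvn -B clean verify'
            }
        }
    }
}
```

Alternatively, `agent { dockerfile true }` tells Jenkins to build the image from a Dockerfile kept in the job's repository instead of pulling a predefined one.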

Fortunately, jobs do not have to download or build a Docker image each time they run. Docker uses caching, so if there have been no changes to the image or Dockerfile since the previous build, the already-created image is reused. As a result, provisioning is almost instant. There is practically no overhead, especially in comparison with booting an operating system or provisioning a new virtual machine – it is almost as fast as starting a new process. Consequently, there is no point in maintaining constantly running slaves, except for some reference deployments.
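To illustrate the caching, a minimal Dockerfile for a build slave might look like the sketch below. Each instruction becomes a cached layer, so unchanged instructions are not re-executed on subsequent builds; the base image and package list are only examples:

```dockerfile
# Base image and toolchain rarely change, so these layers stay cached
# across builds and are reused instead of being re-executed.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        openjdk-17-jdk-headless git \
    && rm -rf /var/lib/apt/lists/*

# A dedicated build user; also a stable, cacheable layer.
RUN useradd -m builder
USER builder
WORKDIR /home/builder
```

Running `docker build` a second time hits the cache for every unchanged layer, which is why provisioning a container for a job is nearly instant.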


Picture by geralt, on Creative Commons

A great advantage of this approach is that each CI job executes in a fresh workspace. As a result, there are no leftovers from previous builds (and no need to clean them up). There is also less chance of slowly exhausting resources such as disk space. This behaviour improves the stability of your jobs and allows you to execute fully repeatable builds.

Another significant improvement is automated management of all of your slaves. If your Jenkins has more than a dozen machines attached, it is a pain to maintain their state manually. You might quickly end up in a situation where each slave has a different setup and you cannot easily switch your jobs between them. Docker allows you to have exactly the same state across all machines. While this is still achievable with tools like Puppet or Ansible, I personally find Docker much more effective in this case.

If you would like to track what exact changes have been made to Docker images, you can use Dockerfiles. They are recipes for exactly how to build specific images – a kind of living specification of what the image should be created from. As they are text files, you can develop them the same way as your software: put them into the source code repository, develop changes via pull requests with a strict review process, and even write automated tests using RSpec and ServerSpec. Such change tracking greatly facilitates figuring out why a job failed – you know exactly what has changed in the environment since the previous build.
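As a sketch of the automated tests mentioned above, a ServerSpec spec can assert what the built image must contain. This assumes the serverspec gem and a container built from your Dockerfile; the package and user names are illustrative:

```ruby
# spec/slave_image_spec.rb -- run with `rspec` inside (or against) a
# container built from the Dockerfile; names here are examples only.
require 'serverspec'

# Run checks directly on the local system (i.e. inside the container).
set :backend, :exec

describe package('git') do
  it { should be_installed }
end

describe user('builder') do
  it { should exist }
end

describe command('java -version') do
  its(:exit_status) { should eq 0 }
end
```

Because the spec lives next to the Dockerfile in the repository, a pull request that changes the image can be required to keep these checks green.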

If you are maintaining dozens of CI slaves via virtualization, Docker provides another great benefit: an out-of-the-box performance improvement. While virtualization usually carries a 2% to 10% overhead on processor and memory, several case studies around the web report that Jenkins jobs using Docker (in comparison to virtualization) are on average 20% faster. Considering the cost of migrating to containerization, the gains are rather spectacular.

Almost no one runs CI servers on bare metal. Virtualization dominates the market (for good reasons, of course). However, containerization (with Docker as the example here) pushes infrastructure management to the next level. While it is pretty simple to learn and start with, it provides great benefits in performance, stability, configuration management, standardization, traceability and the development process. Make a simple test with your CI servers: spend a few hours on Docker image creation and plug it into your CI!

  • Junyu Wang

    Great article! But I have some questions. The first one is: since you said that instead of building jobs on slave machines we build them in containers, where should those containers be located? On the same machine as the Jenkins master, or on a couple of slave nodes? If it's the first option, then if we have 20 containers running at the same time on the Jenkins master, wouldn't that use lots of resources and cause side effects on the master? If it's the second option, then where do we gain the speed boost compared with having a pre-configured AMI for our slave instances and running jobs directly on it? Isn't it true that we still need to boot up those slave nodes first and then start building Docker images on them? Thanks a lot!

    • Piotr Oktaba

      In the first case (containers located on the Jenkins master), you can add containers until you run into overall performance problems. The best rule of thumb is probably one container per processor core. It is pretty much the same as using the standard Jenkins executor mechanism; the only difference is full isolation between executors.

      Regarding the second case (containerization on Jenkins slaves), the classical setup is to install several virtual machines on bare hardware and attach them as slaves. Using containerization instead of VMs gives you a significant performance boost, so the idea is that you get more execution power out of the same hardware.

      Cloud solutions (AMIs) fully depend on the provider. You usually pay for CPU time, so the only things that matter to you are probably the price and the time required to provision/boot a new machine. It is the provider's problem which solution (VMs or containers) to use to get as much CPU power as possible out of their hardware.

  • vishal sahasrabuddhe

    Nice one, very detailed explanation of everything. It covers all aspects in detail.

    However, I have also written an article on how to set up a Jenkins slave via Docker, which is more of an implementation guide with steps.

    I'd appreciate your feedback on it.