Accelerating continuous integration with Docker

Key Takeaways

Docker containers can accelerate continuous integration by using machine resources more efficiently and reducing idle time compared with traditional CI setups.
Docker’s lightweight virtualization, with its smaller footprint and faster setup and teardown, makes it preferable to VMs for Linux-targeted products like Volt Active Data.
Implementing a Docker workflow involves building a specialized Docker image, using shell commands to manage the container lifecycle, and integrating with a CI system such as Jenkins.
Key considerations when using Docker in CI include managing volume mounts to avoid write conflicts, using ‘docker cp’ to collect artifacts, and cleaning up containers after tests to prevent storage pressure.
Integrating Docker with Jenkins and repartitioning unit tests by past duration roughly halved CI job completion time at Volt Active Data while using two-thirds as many machines.

Continuous integration (CI) has become a fundamental process in many software companies in recent years. Fortunately, frameworks such as Jenkins do most of the heavy lifting and give the engineering team the flexibility to examine and improve the process. With the growing popularity of lightweight virtualization, CI has begun to embrace technologies like Docker. Here, I’d like to share my experience of accelerating continuous integration with Docker during my internship at Volt Active Data.

Traditionally, a single CI job occupies all the resources of its assigned machine, which often leaves resources idle and slows down the whole CI system. It’s possible to execute multiple CI jobs concurrently on one machine, but this raises many issues. For instance, two jobs may compete for the same port or write to the same file at the same time. Developers could adjust the code to remove possible collisions, but doing so takes considerable effort and forces them to abandon the assumption that each job has the machine to itself. Virtualization is the solution here. For Volt Active Data, which targets the Linux platform, Docker seemed a better choice than a virtual machine (VM), since Docker containers support most Linux distributions with a smaller footprint, faster setup and teardown, and better extensibility.

From a low-level point of view, Docker containers are just regular processes on the host machine, isolated by kernel features such as namespaces and control groups. For my innovation week project, I validated the efficiency of Docker containers by running three different sets of JUnit tests concurrently, one per container, on a typical in-house machine. The result was very promising: the total time taken was very close to the execution time of the longest single job on the same machine. In contrast, running four or five concurrent jobs caused noticeable overhead. Based on this observation, the remaining work was to integrate the Docker workflow into the Jenkins CI system and optimize how work was divided among jobs.

The first component required was a specialized Docker image for testing, which for now supported only JUnit testing. JDK, Ant and Python were installed, followed by locale and timezone settings. All tests assumed a non-root user, and running as root inside a container can cause permission issues with host volume mounts, so the image created a non-root user and gave it ownership of the newly created working directories and the /tmp directory. This user matched the user and group of the Jenkins agent, so files written to mounted host directories kept the same ownership as files the agent wrote itself. The final statements in the Dockerfile switched the default user to the non-root user and set the starting directory to the JUnit working directory.
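The image described above can be sketched as follows. The post does not include its Dockerfile, so this Python helper renders a hypothetical one: the base image (ubuntu:22.04), the package names, the locale, the timezone, the user name "jenkins", the paths and the default UID/GID of 1000 are all assumptions, to be replaced with the Jenkins agent’s real values.

```python
def render_dockerfile(uid: int = 1000, gid: int = 1000,
                      user: str = "jenkins",
                      workdir: str = "/home/jenkins/junit") -> str:
    """Render a hypothetical Dockerfile for a JUnit-only test image.

    Base image, package names, locale, timezone and paths are assumptions.
    """
    lines = [
        "FROM ubuntu:22.04",
        # Toolchain for JUnit runs: JDK, Ant and Python (package names assumed).
        "RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y "
        "openjdk-17-jdk ant python3 locales tzdata && rm -rf /var/lib/apt/lists/*",
        # Locale and timezone settings.
        "RUN locale-gen en_US.UTF-8",
        "ENV LANG=en_US.UTF-8 TZ=UTC",
        # Non-root user whose UID/GID match the Jenkins agent on the host.
        f"RUN groupadd -g {gid} {user} && useradd -m -u {uid} -g {gid} {user}",
        # Hand the working directory and /tmp to that user.
        f"RUN mkdir -p {workdir} && chown -R {user}:{user} {workdir} /tmp",
        # Final statements: default user and starting directory.
        f"USER {user}",
        f"WORKDIR {workdir}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the output to a file named Dockerfile and running ‘docker build -t junit-test .’ would build the image; keeping the UID and GID as parameters makes it easy to match whichever agent account the machines actually use.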

The second component was a series of shell commands that controlled the configuration and lifecycle of Docker containers. Docker containers support mounting directories on the host machine as volumes, which makes it easy to inject any version of the code into a container without bundling it into the image. Since the three containers were designed to test the same code, it was natural for them to mount and share the same input directory. However, the tests wrote temporary files and results into specific subdirectories, so unless the tests were reconfigured, the containers risked write conflicts. To avoid both reconfiguration and write collisions, each container mounted the shared input directory but copied it into a working directory inside the container. To collect the artifacts, the “docker cp” command conveniently copied files from the container to the host machine. Stopping and removing the containers after the tests reclaimed all their storage, so nothing accumulated on the host.
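A minimal sketch of the mount-then-copy approach, written as Python helpers that only build docker CLI argument lists, so they can be inspected or passed to subprocess.run. The mount point /input, the working directory, the read-only mount, the image name and the Ant invocation are assumptions for illustration, not the original script.

```python
import shlex


def run_argv(name: str, image: str, src_dir: str, target: str,
             workdir: str = "/home/jenkins/junit/work") -> list[str]:
    """Build 'docker run' arguments for one test container (hypothetical layout)."""
    w = shlex.quote(workdir)
    # Copy the shared, read-only input into a private working directory first,
    # so concurrent containers never write into the shared mount.
    script = (f"mkdir -p {w} && cp -a /input/. {w} && "
              f"cd {w} && ant {shlex.quote(target)}")
    return ["docker", "run", "-d", "--name", name,
            "-v", f"{src_dir}:/input:ro",  # same host directory for every container
            image, "bash", "-c", script]


def collect_argv(name: str, container_path: str, host_dir: str) -> list[str]:
    """Build 'docker cp' arguments; docker cp also works on stopped containers."""
    return ["docker", "cp", f"{name}:{container_path}", host_dir]


def cleanup_argv(name: str) -> list[str]:
    """Build 'docker rm -f' arguments: stop if needed, then remove and free storage."""
    return ["docker", "rm", "-f", name]
```

For example, subprocess.run(run_argv("junit-1", image, "/path/to/src", "some-target"), check=True) would start one container. Mounting the input read-only (:ro) is a design choice that lets Docker itself guarantee no container writes into the shared directory.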

As for the lifecycle, the JUnit Ant target names, container names and output directory names were derived first. After the output directories were created and any containers with conflicting names were removed, “docker run” started the containers one after another in the background, with volumes mounted and the launch command specified, and the container ID it printed was recorded. The command “docker wait” accepted a container ID, blocked until the corresponding container stopped, and printed its exit status. Issuing it on every container waited for all of them to complete, much like calling pthread_join() on each thread. The job was marked a success only when every container completed successfully.
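The lifecycle above can also be sketched in Python. This is not the original shell script: the container naming scheme is hypothetical, the launch command is simplified to a bare ‘ant <target>’ (omitting the copy out of the shared mount), and the runner parameter exists only so the logic can be exercised without a Docker daemon.

```python
import subprocess
from typing import Callable, Sequence

# A runner executes one docker CLI command and returns its stdout.
Runner = Callable[[Sequence[str]], str]


def docker(argv: Sequence[str]) -> str:
    """Run a docker CLI command and return its trimmed stdout."""
    return subprocess.run(list(argv), check=True, capture_output=True,
                          text=True).stdout.strip()


def run_all(targets: Sequence[str], image: str, src_dir: str,
            runner: Runner = docker) -> bool:
    names = [f"junit-{t}" for t in targets]  # hypothetical naming scheme
    # Remove leftovers with conflicting names; a missing container is fine.
    for name in names:
        try:
            runner(["docker", "rm", "-f", name])
        except subprocess.CalledProcessError:
            pass
    # Start every container in the background and record the ID
    # that 'docker run -d' prints.
    ids = [runner(["docker", "run", "-d", "--name", name,
                   "-v", f"{src_dir}:/input:ro", image, "ant", t])
           for name, t in zip(names, targets)]
    # 'docker wait' blocks until a container stops and prints its exit code;
    # waiting on each ID in turn resembles pthread_join() on each thread.
    codes = [int(runner(["docker", "wait", cid])) for cid in ids]
    # Success only if every container exited with status 0.
    return all(code == 0 for code in codes)
```

Waiting on every container before inspecting the exit codes, rather than stopping at the first failure, ensures the job does not report its status while some containers are still running.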

The last component was integration with Jenkins and the wider proprietary CI infrastructure. We used Puppet to ensure that Docker was installed and running on all eligible machines, and added the Jenkins agent user to the docker group so the agent could issue Docker commands directly. The container lifecycle described above was captured in a shell script in which the user specified a range of JUnit targets. The number of targets determined how many containers were started at once, making it easy to adapt to jobs with different workloads and make the most of Docker containers. Jenkins ran the shell script as a freestyle build step. As a further performance optimization, a private Docker registry was set up on one internal machine to improve latency and security. Once the testing image was pushed to the internal registry, every machine could pull it quickly the first time it was assigned a Docker testing job.
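The mapping from a target range to containers, and the image reference a private registry implies, might look like this. The prefix-plus-index target naming, the registry host ci-registry.internal:5000 and the repository name junit-test are hypothetical; the only real convention relied on is Docker’s host[:port]/repository:tag format for images stored in a private registry.

```python
def expand_targets(prefix: str, first: int, last: int) -> list[str]:
    """Expand an inclusive index range into Ant target names (hypothetical naming)."""
    if first > last:
        raise ValueError("empty target range")
    return [f"{prefix}{i}" for i in range(first, last + 1)]


def image_ref(registry: str, repo: str, tag: str = "latest") -> str:
    """Images in a private registry are addressed as host[:port]/repository:tag."""
    return f"{registry}/{repo}:{tag}"


def plan(prefix: str, first: int, last: int,
         registry: str, repo: str) -> list[tuple[str, str, str]]:
    """One (container name, image, target) tuple per target.

    The size of the range therefore sets how many containers run at once.
    """
    image = image_ref(registry, repo)
    return [(f"{t}-container", image, t)
            for t in expand_targets(prefix, first, last)]
```

Each tuple holds a container name, image and Ant target. A machine would run ‘docker pull’ on the image the first time it gets such a job, and later jobs would start from the local copy.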

The unit tests were repartitioned based on past duration data to reduce overall job latency. With higher throughput and lower latency, the same CI job took about half as long to complete while using two-thirds as many machines as before. The unit testing portion of the CI process was greatly accelerated, which sped up the entire development workflow.
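The post does not say how the repartitioning was computed. One common technique is greedy longest-processing-time-first (LPT) scheduling, sketched here as a swapped-in example rather than the method actually used: sort tests by past duration, longest first, and always hand the next test to the currently least-loaded bucket. Because the buckets run concurrently in separate containers, job latency is roughly the largest bucket total, which this heuristic tries to keep small.

```python
import heapq


def partition_by_duration(durations: dict[str, float], k: int) -> list[list[str]]:
    """Greedy LPT: give the longest remaining test to the least-loaded bucket."""
    if k < 1:
        raise ValueError("k must be >= 1")
    # Min-heap of (total seconds assigned so far, bucket index).
    heap = [(0.0, i) for i in range(k)]
    buckets: list[list[str]] = [[] for _ in range(k)]
    # Longest first; ties broken by name so the result is deterministic.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        buckets[i].append(name)
        heapq.heappush(heap, (load + secs, i))
    return buckets
```

With past durations of 10, 8, 6, 5, 4 and 3 seconds split across three buckets, the totals come out at 13, 12 and 11 seconds, against a lower bound of 12 (36 seconds of work divided by three).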

Currently, Docker is used only for unit testing jobs. For other CI jobs, runtime resource consumption must first be profiled to use Docker containers efficiently on machines with different specifications. Docker also offers an opportunity to broaden test coverage: one could install CoreOS on the host machines and run Docker images of various Linux distributions on top of them to test against. Different flavors of JDK and Python, among others, could be bundled in separate images to expand coverage further.
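Such a coverage matrix is just the cross product of the image dimensions. A small sketch, in which every distribution, JDK version, Python version and naming scheme is a hypothetical example rather than a configuration Volt Active Data uses:

```python
from itertools import product

# Hypothetical axes of the test matrix (not taken from the original setup).
DISTROS = ["ubuntu:22.04", "debian:12", "rockylinux:9"]
JDKS = ["11", "17"]
PYTHONS = ["3.10", "3.12"]


def image_names(distros: list[str], jdks: list[str], pythons: list[str]) -> list[str]:
    """Return one image name per (distro, JDK, Python) combination."""
    # ':' is not allowed inside an image name, so it becomes '-'.
    return [f"junit-{d.replace(':', '-')}-jdk{j}-py{p}"
            for d, j, p in product(distros, jdks, pythons)]
```

Each name would correspond to one image built with that combination, and a job could launch one container per image in the same way as the unit-test containers.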

To summarize, my experiment showcased the simplicity and power of Docker. There is still plenty of room to improve continuous integration with Docker.

The experience with Docker complemented my regular development work at Volt Active Data. I learned about Docker and Jenkins, polished my shell scripting skills, and became familiar with many related parts of the code base.

This project started with an idea proposed during Volt Active Data’s quarterly innovation week, and it was added to the production workflow thanks to the support and advice of many colleagues. It felt really great to contribute work that benefitted the whole team, and it was truly a rich and unforgettable internship experience at Volt Active Data.

