This lesson is in the early stages of development (Alpha version)

Introducing containers

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • What are containers, and why might they be useful to me?

Objectives
  • Show how software depending on other software leads to configuration management problems.

  • Explain the notion of virtualisation in computing.

  • Explain the ways in which virtualisation may be useful.

  • Explain how containers streamline virtualisation.

Disclaimers / apologies

This lesson comes with a whole set of disclaimers; my apologies for needing them!

  1. You are the first learners to have engaged with this material. Various wheels are likely to fall off during parts of the session. Thankfully we have excellent helpers on hand. There are also separate episodes that do not depend on each other, so if you have to give up on one episode, you should be able to return later in the session.

  2. This looks like a Carpentries lesson because it uses their stylesheet (which is openly available for such purposes). However, the similarity is only visual: so far I have developed this lesson myself, which means that it has not been through the quality assurance applied to the official Carpentries lessons.

  3. Docker is complex software used for many different purposes. We are unlikely to give examples that suit all of your potential ideal use-cases, but would be delighted to at least open up discussion of what those use-cases might be.

  4. Containers are a topic that requires significant amounts of technical background to understand in detail. Most of the time containers, particularly as wrapped up by Docker, do not require you to have a deep technical understanding of container technology, but when things go wrong, the diagnostic messages may turn opaque rather quickly.

The fundamental problem: software has dependencies that are difficult to manage

Consider Microsoft Excel: a typical workplace productivity tool. Most Excel users probably give little thought to the underlying software dependencies allowing them to open a given XLSX file on their computer. Indeed, in an ideal world none of us would need to think about fixing software dependencies, but we are far from that world. For example:

All of the above discussion is about just one piece of software: Microsoft Excel. Excel mostly depends only on the version of the operating system you’re running. However, the situation gets far worse when building software in programming environments like R or Python. Those languages change over time, and depend on an enormous set of software libraries written by unrelated software development teams.

What if you wanted to distribute a software tool that automated interaction between R and Python? Both of these language environments have independent version and software dependency lineages. As the number of software components such as R and Python increases, this can rapidly lead to a combinatorial explosion in the number of possible configurations, only some of which will work as intended. This situation is sometimes informally termed “dependency hell”.
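To get a feel for how quickly the number of configurations grows, here is a minimal sketch in Python. The component names and version lists are hypothetical, invented purely for illustration; real projects typically have far more components and versions than this.

```python
from itertools import product

# Hypothetical components and version lists, invented for illustration only.
components = {
    "R": ["3.6", "4.0", "4.1"],
    "Python": ["3.8", "3.9", "3.10"],
    "bridging tool": ["1.0", "2.0"],
    "operating system": ["Linux", "macOS", "Windows"],
}


def count_configurations(components):
    """Count every way of choosing exactly one version per component."""
    return sum(1 for _ in product(*components.values()))


total = count_configurations(components)
print(total)  # 3 * 3 * 2 * 3 = 54

# Adding one more component with four versions multiplies the total by four.
with_library = {**components, "plotting library": ["1", "2", "3", "4"]}
print(count_configurations(with_library))  # 54 * 4 = 216
```

Each extra component multiplies the count rather than adding to it, which is why testing every combination by hand quickly becomes impractical.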

The situation is often mitigated in part by factors such as:

It’s not very practical to have every version of every piece of software installed on any computer that might need to resolve dependencies, as the mostly-redundant space used by the different versions will continue to mount.

Thankfully there are ways to get underneath (a lot of) this mess: containers to the rescue! Containers provide a way to package up software dependencies and access to resources such as files and communications networks in a uniform manner.

Background: virtualisation in computing

Some uses of computers and the software they run should be deterministic, particularly when building reproducible computational environments. The meaning of “deterministic” is: if you feed in the same input, to the same computer, then the same output will appear.
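As a small illustration of this idea, the sketch below contrasts a deterministic function with one that is not. The function names are hypothetical and chosen only for this example; the point is simply that the first result depends on nothing but the input, while the second also depends on hidden state (here, random bytes from the operating system).

```python
import hashlib
import os


def deterministic_summary(data: bytes) -> str:
    """The result depends only on the input bytes."""
    return hashlib.sha256(data).hexdigest()


def non_deterministic_summary(data: bytes) -> str:
    """The result also depends on random bytes, so repeated runs differ."""
    salt = os.urandom(16)  # hidden input that changes on every call
    return hashlib.sha256(salt + data).hexdigest()


first = deterministic_summary(b"same input")
second = deterministic_summary(b"same input")
print(first == second)  # True: same input, same output

# The two results below will almost certainly differ from each other.
print(non_deterministic_summary(b"same input"))
print(non_deterministic_summary(b"same input"))
```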

A computer running deterministic software, which we will call the “guest”, can in principle itself be simulated as a deterministic computational process running on another computer, which we will call the “host”. The guest computer is then said to have been virtualised: it is no longer a physical computer but a “virtual machine”, frequently abbreviated to “VM”.

We have avoided the software dependency issue by looking for the lowest common denominator across all software systems, which is the computer itself, beneath even the operating system software.

Omitting details and avoiding complexities…

Note that this description omits many details and avoids discussing complexities that are not particularly relevant to this introduction session, for example:

  • Thinking with analogy to movies such as Inception, The Matrix, etc., can the guest computer figure out that it’s not actually a physical computer, and that it’s running as software inside a host physical computer? Yes, it probably can… but let’s not go there within this episode: we can talk later.
  • Can you run a host as a guest itself within another host? Sometimes… but let’s not go there, either, and again, we can talk later if you are interested in further details.

Motivation for virtualisation

What features does virtualisation offer?

Downsides of “full” virtualisation:

Containers are a type of lightweight virtualisation

Containers are cut-down virtual machines. Containers sacrifice the strong isolation that full virtualisation provides in order to vastly reduce the resource requirements on the virtualisation host.

The term “container” can be usefully considered with reference to shipping containers. Before shipping containers were developed, packing and unpacking cargo ships was time-consuming and error-prone, with high potential for different clients’ goods to become mixed up. Software containers standardise the packaging of a complete software system (the lightweight virtual machine): you can drop a container into a container host, and it should “just work”.

Hopefully this lesson will demonstrate the portability aspect of containers, showing the same containers running on:

Docker is software that manages containers and the resources that containers need. While Docker is a clear leader in the container space, there are many similar technologies available… we just need to pick one to use for this workshop.

Docker’s terminology

Key Points

  • Almost all software depends on other software components to function, but these components have independent evolutionary paths.

  • Projects involving many software components can rapidly run into a combinatorial explosion in the number of software version configurations available, yet only a subset of possible configurations actually works as desired.

  • Containers collect software components together and can help avoid software dependency problems.

  • Virtualisation is an old technology that container technology makes more practical.

  • Docker is just one software platform that can create and manage containers and the resources they use.