Docker and Swarmkit – Part 4


So far we have experimented with Docker Swarmkit on our local development machine using VirtualBox as our playground. Now it is time to extend what we have learned, create a swarm in the cloud and run our sample application on it. No worries if you don’t have a cloud account with the necessary resources: you can sign up for a free one-year test account on AWS which will provide you with everything needed.

You can find links to the previous 3 parts of this series here. There you will also find links to all my other Docker related posts.

Creating the Swarm

Technically I could build a Docker swarm from scratch, but to keep things simple I will be using the new Docker for AWS tool which is currently in private beta. This tool allows me to set up a production-ready environment for a swarm on AWS in a matter of minutes.

Docker for AWS is a tool that quickly and easily generates a Docker swarm specifically tailored to AWS. It has a sister product, Docker for Azure, which pursues the same goal of creating a swarm in the cloud but is tailored to Microsoft Azure.

Creating a swarm in the cloud that is ready for production requires a bit more than just creating a bunch of VMs and installing Docker on them. In the case of AWS the tool creates the whole environment for the swarm, comprising things like a VPC, security groups, IAM policies and roles, auto scaling groups, load balancers and VMs (EC2 instances), to name just the most important elements. If we had to do all of that ourselves from scratch it would be intimidating, error prone and labor intensive. But no worries, Docker for AWS manages all of that for us.

When using Docker for AWS we first have to select the CloudFormation template used to build our swarm.

on the next page we have to answer a few questions about our stack and swarm properties. It is all very straightforward, for example:

  • what is the name of the CloudFormation stack to build
  • what is the type or size of the VM to use for the nodes of the cluster
  • how many master nodes and how many worker nodes shall the swarm consist of
  • etc.

The answers to these questions become parameters of a CloudFormation template that Docker has created for us, which will then be used to create what’s called a stack.

Note that we are also asked which SSH key to use. We can either use an existing one that we might have created previously or create a new one here. If we create a new SSH key we can download the corresponding *.pem file to a safe place on our computer. We will use this key file later on when we want to work with the swarm and SSH into one of the master nodes.
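SSH will refuse to use a key file whose permissions are too open, so it is worth locking the file down right after downloading it. A minimal sketch, assuming the key was saved as ~/.ssh/docker-swarm.pem (path and name are just placeholders):

# make the key file readable by the owner only
chmod 400 ~/.ssh/docker-swarm.pem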

On the next page we can specify some additional options.

Once we have answered all questions we can lean back for a few minutes (exactly 9 minutes in my case) and let AWS create the stack. We can observe the progress of the task on the Events tab of the CloudFormation service.
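If you have the AWS CLI configured locally you can follow the same progress from a terminal; a sketch, assuming the stack was named swarm-demo (a placeholder):

# overall status of the stack
aws cloudformation describe-stacks --stack-name swarm-demo \
    --query 'Stacks[0].StackStatus'

# the five most recent stack events
aws cloudformation describe-stack-events --stack-name swarm-demo \
    --query 'StackEvents[0:5].[ResourceStatus,LogicalResourceId]'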

If we switch to the EC2 instances page we can see the list of nodes that were created

as expected we have 3 master and 5 worker nodes. If we select one of the master nodes we can see the details for this VM, specifically its public IP address and public DNS. We will use either of them to SSH into this master node later on.

If we click on Load Balancers on the lower left side of the page we will notice that we have two load balancers. One is for SSH access and the other one load balances the public traffic to all of the swarm nodes. Of this latter ELB we should note the public DNS name, since we will use it to access the application we are going to deploy.
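As an aside, the DNS names of the load balancers can also be looked up with the AWS CLI instead of the console; a sketch, again assuming a configured AWS CLI:

# list name and DNS name of all classic ELBs in the region
aws elb describe-load-balancers \
    --query 'LoadBalancerDescriptions[].[LoadBalancerName,DNSName]' \
    --output table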

Deploying the Dockercoins application

Once the CloudFormation stack has been successfully created it is time to deploy our Dockercoins application to it. For this we need to SSH into one of the 3 master nodes. Let’s take one of them. We can find the public IP address or DNS name on the properties page of the corresponding EC2 instance as shown above.

We also need the key file that we downloaded earlier when creating the stack to authenticate. With the following command we can now SSH into the chosen master node

ssh -i [path-to-key-file] docker@[public ip or DNS]

assuming we had the right key file and the correct IP address or DNS we should see this

we can use uname -a to discover what type of OS we’re running on and we should see something similar to this

OK, evidently we are running on Moby Linux, which is a heavily customized and stripped-down version of Alpine Linux optimized to serve as a container host. That unfortunately also means that we’ll not find any useful tools installed on the node other than the Docker engine and CLI. So, there is no cURL, no bash, no git, etc. It is even impossible to use apk to install those tools.

Did I just say “unfortunately”? Shame on me… This is intentional, since the nodes of a swarm are not meant to do anything other than reliably host Docker containers. OK, what am I going to do now? I need to execute some commands on the leader, like cloning J. Petazzo’s repo with the application, and I need to run a local registry and test it.

Ha, we should never forget that containers are not just made to run or host applications or services; they can and should equally be used to run commands, scripts or batch jobs. And that is exactly what I will do to achieve my goals, no matter how limited the host operating system of the node is.

First let us have a look and see how our swarm is built

docker node ls

And we can see the list of 8 nodes in total, of which 3 are masters; the third-last one in the list is our swarm leader. Next let’s clone the repo. For this we’ll run a container that already has git installed. We will run this container in interactive mode and mount the directory into which we want the repo to be cloned. Execute this command

docker run --rm -it -v $(pwd):/src -w /src python:2.7 \
    git clone https://github.com/jpetazzo/orchestration-workshop.git

After this command has executed we should find a folder orchestration-workshop in the current directory on our swarm leader, containing the content of the cloned repository.
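A quick listing confirms that the clone worked; the node’s BusyBox shell is perfectly capable of that:

ls orchestration-workshop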

Next let’s run a Docker registry on the swarm, similar to what we did in our local swarm.

docker service create --name registry --publish 5000:5000 registry:2

We can use cURL to test whether the registry is running and accessible

curl localhost:5000/v2/_catalog

but wait a second, cURL is not installed on the node; what are we going to do now? No worries, we can run an Alpine container, install curl in it and execute the above command there. Hold on though, how will that work? We are using localhost in the above command, but if we execute curl inside a container, localhost means local to the container and not local to the host. Hmmm…

Luckily Docker provides us with an option to overcome this obstacle. We can run our container attached to the so-called host network. This means that the container uses the network stack of the host, and thus localhost inside the container is also localhost on the host. Great! So execute this

docker run --rm -it --net host alpine /bin/sh

now inside the container execute

apk update && apk add curl

and finally execute

curl localhost:5000/v2/_catalog

Oh no, what’s this? We don’t get any result

Turns out that localhost is not mapped to the loopback address 127.0.0.1. So let’s just try to use the loopback address directly

curl 127.0.0.1:5000/v2/_catalog

and this indeed works. Great, we have learned a valuable lesson: no matter how limited the host OS we have to operate on is, we can always use a Docker container and run the necessary command within that container.
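The three steps above can also be collapsed into a single throwaway container; a sketch of such a one-liner:

docker run --rm --net host alpine \
    sh -c "apk add --no-cache curl >/dev/null && curl -s 127.0.0.1:5000/v2/_catalog"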

So, now we have the registry running and we can build and push the images for all four services: webui, worker, hasher and rng. We can use the same code we used in part 3 of this series. We just need to use the loopback address instead of localhost.

cd orchestration-workshop/dockercoins

REGISTRY=127.0.0.1:5000
TAG=v0.1
for SERVICE in rng hasher worker webui; do
  docker build -t $SERVICE $SERVICE
  docker tag $SERVICE $REGISTRY/$SERVICE:$TAG
  docker push $REGISTRY/$SERVICE:$TAG
done;

After this we can again use the technique described above to curl our registry. Now we should see the 4 services that we just built.
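Since the registry’s catalog endpoint returns a plain JSON document, the output should look roughly like this (the exact formatting may differ):

{"repositories":["hasher","rng","webui","worker"]}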

We have run the above build script directly on the host. Let’s assume we couldn’t do that for some reason; we could then run it inside a container again. Since we’re dealing with Docker commands we can use the official docker image and use this command

docker run --rm -it --net host \
    -v /var/run/docker.sock:/var/run/docker.sock \
    docker /bin/sh

note how we again run the container on the host network so that it uses the network stack of the host, and how we mount the Docker socket to get access to the Docker engine running on the host.

Now we can run the above script inside the container; neat.

It’s time to create an overlay network on which we will run the application

docker network create dockercoins --driver overlay

and we then have a new overlay network called dockercoins in our swarm.
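To double check, we can list the overlay networks on the swarm:

docker network ls --filter driver=overlay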

now we run redis as our data store

docker service create --name redis --network dockercoins redis

and finally we run all 4 services

REGISTRY=127.0.0.1:5000
TAG=v0.1
for SERVICE in webui worker hasher rng; do
  docker service create --name $SERVICE --network dockercoins $REGISTRY/$SERVICE:$TAG
done

once again we need to update the webui service and publish a port

docker service update --publish-add 8080:80 webui
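To confirm that the port has indeed been published, we can inspect the service endpoint; a quick sketch:

docker service inspect --format '{{ json .Endpoint.Ports }}' webui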

Let’s see whether our application is working correctly and mining Docker coins. For this we need to determine the DNS (or public IP address) of the load balancer in front of our swarm (ELB). We have described how to do this earlier in this post. So let’s open a browser and use this public DNS. We should see our mining dashboard

Scaling a Service

Now that we have seen that the application runs just fine we can scale our services to a) make them highly available and b) increase the throughput of the application.

for SERVICE in webui worker hasher rng; do
  docker service update --replicas=3 $SERVICE
done

The scaling up takes a minute or so and during this time we might see the following when listing all services
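The listing itself is done with the usual command; while the swarm is still converging, the REPLICAS column will show values such as 2/3:

docker service ls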

and also in the UI we’ll see the effect of scaling up: roughly a 3-fold increase in throughput.

Updating a Service

To witness a rolling update (with zero downtime) of a service let’s make a minor code change in the rng service. Let’s decrease the sleep time in the rng.py file from 100 ms to 50 ms. Exactly how to make this modification I leave up to you, dear reader, as an exercise. Just a little hint: use a container…
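One possible sketch, assuming we are still in the orchestration-workshop/dockercoins directory and that the delay in rng.py is written as time.sleep(0.1) (check the file first, the exact line may differ):

# edit rng.py in place using sed from a throwaway Alpine container
docker run --rm -v $(pwd)/rng:/src -w /src alpine \
    sed -i 's/time.sleep(0.1)/time.sleep(0.05)/' rng.py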

Once done with the modification let’s build and push the new version of the rng service

REGISTRY=127.0.0.1:5000
TAG=v0.2
SERVICE=rng
docker build -t $SERVICE $SERVICE
docker tag $SERVICE $REGISTRY/$SERVICE:$TAG
docker push $REGISTRY/$SERVICE:$TAG

and then trigger the rolling update with

docker service update --image $REGISTRY/$SERVICE:$TAG $SERVICE

confirm that the service has been updated by executing

docker service ps rng

and you should see something similar to this

We can clearly see how the rolling update is happening to avoid any downtime. In the image above we see that rng.1 has been updated and the new version is running while rng.3 is currently starting the new version and rng.2 has not yet been updated.

Chaos in the Swarm

Let’s see how the swarm reacts when bad things happen. Let’s try to kill one of the nodes that has running containers on it. In my case I take the node ip-192-168-33-226.us-west-2.compute.internal since it has at least rng.1 running on it, as we know from the image above.
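Stopping the instance can be done from the EC2 console or, with a configured AWS CLI, with a command like the following (the instance ID is just a placeholder; use the ID of the node you picked):

aws ec2 stop-instances --instance-ids i-0123456789abcdef0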

After stopping the corresponding EC2 instance it takes only a second for the swarm to re-deploy the service instances that had been running on this node to another node as we can see from the following picture.

Note how rng.1 and rng.2 have been redeployed to the nodes ip-192-168-33-224.us-west-2.compute.internal and ip-192-168-33-225.us-west-2.compute.internal respectively.

And what about the swarm as a whole? Does it auto-heal? Let’s have a look.
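On one of the remaining master nodes we list the nodes once more:

docker node ls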

Note how node ip-192-168-33-226.us-west-2.compute.internal is marked as down and how we have a new node ip-192-168-33-135.us-west-2.compute.internal in the swarm. Neat.

Summary

In this part of my series about Docker Swarmkit we have created a swarm in the cloud, more precisely on AWS, using the tool Docker for AWS which is currently in private beta. We then cloned the repository with the sample application to the leader of the swarm masters, built all images there and pushed them to a registry we ran in the swarm. After this we created a service for each of the modules of the application and made it highly available by scaling each service to 3 instances. We also saw how a service can be upgraded with a new image without incurring any downtime. Finally we showed how the swarm auto-heals even after a very brutal shutdown of one of its nodes.

Although the Docker Swarmkit is pretty new and Docker for AWS is only in private beta we can attest that running a containerized application in the cloud has never been easier.
