
How To Install DC/OS on Packet or other Bare Metal Environments

In this blog post I describe how to install DC/OS on Packet, the bare metal cloud, using the advanced DC/OS installer. I like the advanced installer because it gives a better understanding of the DC/OS installation process than the GUI and CLI installers. In summary, the installation consists of creating a few Packet CentOS 7 devices, writing some configuration files and running a few shell commands to install all DC/OS packages. Read on for the nitty-gritty details.

Choosing a DC/OS installer

My previous blog post described how to create a DC/OS cluster on Packet with Terraform. Even though running Terraform is simple and straightforward, it doesn't give much insight into the core installation process of DC/OS. To get a deeper understanding, let's pick one of the custom DC/OS installers. There are currently three custom DC/OS installers suitable for bare metal environments: the GUI, the CLI and the advanced installer. The GUI installer is a web application on the bootstrap node that asks you to enter the configuration for the cluster. When you then hit 'Install', the app goes off and installs everything. The CLI installer performs the same steps as the GUI installer but uses a script instead of a GUI. Besides these there are also cloud-specific DC/OS installers.

Advanced in name only

Despite its name, the advanced installer is in fact more basic than the GUI or CLI installer. It is also easier to use in my experience. Why? Because it only uses SSH, plain shell scripts and config files. The GUI and CLI installers might seem easier to use, but they are less forgiving: when an install fails, it is harder to figure out what went wrong and to recover from the broken installation. Another benefit of a basic installation process built on shell scripts is that it's easier to automate. Therefore I recommend using the advanced installer. Now let's create our infrastructure and kick off the installation process.

Creating Packet Devices

Installing a DC/OS cluster involves setting up a `bootstrap` machine which hosts the configuration and packages DC/OS requires. The other machines are installed from the bootstrap node. To run a small cluster we need 5 machines: 1 bootstrap machine, 1 master and 3 agents. One of the agents will be a public agent. You can create the Packet devices from the Packet web UI or with the `packet baremetal` CLI command.

Networking and Security

To make things easier, map `bootstrap`, `master1`, `agent1`, `agent2` and `agent3` to their private IPs and add them to `/etc/hosts` on all devices. Alternatively you can run your own DNS server for all nodes in the cluster. To secure the cluster you can use an iptables firewall, as I described in my previous blog post.
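
As a minimal sketch of the `/etc/hosts` approach, the script below appends one entry per node and skips names that already have an entry, so it can be re-run safely. The bootstrap and master addresses are the example ones used in `config.yaml` further down; the three agent addresses and the `add_hosts` helper are hypothetical and need to be replaced with your own values.

```shell
#!/bin/bash
# Example name-to-IP mapping. The agent addresses are made up for illustration;
# use the private IPs of your own Packet devices.
NODES="10.11.12.13 bootstrap
11.12.13.14 master1
10.11.12.15 agent1
10.11.12.16 agent2
10.11.12.17 agent3"

# add_hosts FILE: append each missing "IP name" pair to FILE.
add_hosts() {
  local hosts_file="$1" ip name
  while read -r ip name; do
    # Skip names that already have an entry so re-running is harmless.
    if ! grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$hosts_file"; then
      echo "$ip $name" >> "$hosts_file"
    fi
  done <<< "$NODES"
}

# On each device, as root:
#   add_hosts /etc/hosts
```

Keeping the mapping in one variable means the same script can be copied to every device unchanged.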

Running the Advanced Installer

The advanced installation consists of 6 steps:

  1. Download `dcos_generate_config.sh`
  2. Run `dcos_generate_config.sh`
  3. Edit `config.yaml`
  4. Create `ip-detect.sh`
  5. Run bootstrap nginx server
  6. Run `dcos_install.sh`

Let's look at each step in more detail.

1. Download dcos_generate_config.sh

This 700+MB script contains everything you need to configure DC/OS. You can download it from the DC/OS release page.

2. Execute dcos_generate_config.sh

Plain and simple. It creates the genconf folder and an initial config.yaml file.

3. Edit config.yaml

Now edit the generated config.yaml. You only have to fill in the private IP addresses of the bootstrap and master nodes and the port the bootstrap nginx server will be listening on. See Run Nginx below.


---
bootstrap_url: http://10.11.12.13:5000
cluster_name: 'my-cluster'
exhibitor_storage_backend: static
ip_detect_filename: /genconf/ip-detect.sh
master_discovery: static
master_list:
- 11.12.13.14
resolvers:
- 8.8.4.4
- 8.8.8.8
use_proxy: 'false'
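
If you want to script this step, the same file can be rendered from a few variables. This is only a sketch: `write_config` and the variable names are hypothetical, and the values are the example ones from the config above.

```shell
#!/bin/bash
# Example values taken from the config above; adjust them to your cluster.
BOOTSTRAP_IP="10.11.12.13"
BOOTSTRAP_PORT="5000"
MASTER_IPS="11.12.13.14"   # space-separated list of master private IPs

# write_config DIR: render DIR/config.yaml from the variables above.
write_config() {
  local dir="$1" ip
  mkdir -p "$dir"
  {
    echo "---"
    echo "bootstrap_url: http://${BOOTSTRAP_IP}:${BOOTSTRAP_PORT}"
    echo "cluster_name: 'my-cluster'"
    echo "exhibitor_storage_backend: static"
    echo "ip_detect_filename: /genconf/ip-detect.sh"
    echo "master_discovery: static"
    echo "master_list:"
    for ip in $MASTER_IPS; do echo "- ${ip}"; done
    echo "resolvers:"
    echo "- 8.8.4.4"
    echo "- 8.8.8.8"
    echo "use_proxy: 'false'"
  } > "${dir}/config.yaml"
}

# In the directory that holds dcos_generate_config.sh:
#   write_config genconf
```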

4. Create ip-detect.sh

The purpose of `ip-detect.sh` is to return the private IPv4 address that your master or agent node will be associated with during its lifetime. This IP is meant to be stable for the node. If it changes, the node should be wiped and reinstalled. The script can either use `ip addr` commands or a cloud API to determine the IP address. Since Packet devices use a bonded network interface, the script below outputs the private IP of the node on the `bond0` interface.


#!/bin/bash
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
echo $(ip address show bond0 label bond0:0 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -1)
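
Before relying on the script it can help to check that it prints exactly one well-formed IPv4 address. The `is_ipv4` helper below is a hypothetical sanity check, not part of DC/OS, and the path `genconf/ip-detect.sh` in the usage comment is an assumption based on the `ip_detect_filename` setting.

```shell
#!/bin/bash
# is_ipv4 STRING: succeed only if STRING is a single dotted-quad IPv4 address
# with every octet in the range 0-255.
is_ipv4() {
  local addr="$1" octet
  [[ "$addr" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || return 1
  local IFS=.
  for octet in $addr; do
    # 10# forces base 10 so octets with leading zeros are not read as octal.
    (( 10#$octet <= 255 )) || return 1
  done
}

# On a node:
#   is_ipv4 "$(bash genconf/ip-detect.sh)" && echo "ip-detect output looks valid"
```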

5. Run Nginx

Now we are ready to run nginx which will host the packages and config under `genconf/serve`.

  
sudo docker run -d -p 5000:80 -v $PWD/genconf/serve:/usr/share/nginx/html:ro nginx

Make sure you use the same port as specified in `config.yaml` above.

Now check if you can download the DC/OS install script at `http://10.11.12.13:5000/dcos_install.sh` from one of the agents. If this works you are ready for the next step.

6. Run dcos_install.sh

Everything is set up and we can now run the install script on all the master and agent machines. The `dcos_install.sh` script gives good feedback on what it will install: it lists all packages it installs and all services it will start. If one of the required ports is taken by a process that is already running, the script exits with an error code and shows what is wrong. See the Troubleshooting section below for how to fix this.
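
The install script takes the node role as its argument. As a hedged sketch, the loop below prints the command to run on each node rather than executing it. The role names `master`, `slave` and `slave_public` follow the DC/OS 1.9 advanced installer documentation; the choice of `agent3` as the public agent and the `role_for` and `install_command` helpers are assumptions for illustration.

```shell
#!/bin/bash
BOOTSTRAP_URL="http://10.11.12.13:5000"   # must match bootstrap_url in config.yaml

# role_for NODE: map a host name to the role argument of dcos_install.sh.
role_for() {
  case "$1" in
    master*) echo "master" ;;
    agent3)  echo "slave_public" ;;   # assumption: agent3 is the public agent
    agent*)  echo "slave" ;;
    *)       return 1 ;;
  esac
}

# install_command NODE: print the commands to run on NODE.
install_command() {
  local role
  role="$(role_for "$1")" || return 1
  echo "curl -O ${BOOTSTRAP_URL}/dcos_install.sh && sudo bash dcos_install.sh ${role}"
}

for node in master1 agent1 agent2 agent3; do
  echo "${node}: $(install_command "$node")"
done
```

Printing the commands first makes it easy to review them before running each one over SSH on the matching machine.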

Troubleshooting

During installation a few things can go wrong:

Existing partial installation

To get rid of an existing partial installation stop all DC/OS related processes. These can be found by running

  
systemctl show -p Wants dcos.target
Wants=dcos-metrics-agent.socket dcos-logrotate-agent.service dcos-logrotate-agent.timer dcos-docker-gc.service dcos-adminrouter-agent-reload.service dcos-signal.timer dcos-gen-resolvconf.service dcos-spartan-watchdog.service dcos-3dt.service dcos-adminrouter-agent-reload.timer dcos-mesos-slave.service dcos-rexray.service dcos-epmd.service dcos-spartan.service dcos-3dt.socket dcos-gen-resolvconf.timer dcos-spartan-watchdog.timer dcos-navstar.service dcos-docker-gc.timer dcos-metrics-agent.service dcos-pkgpanda-api.socket dcos-log-agent.socket dcos-adminrouter-agent.service dcos-pkgpanda-api.service dcos-log-agent.service

You can stop them with this one-liner:

  
systemctl stop -- $(systemctl show -p Wants dcos.target | cut -d= -f2)

Note that running `systemctl stop dcos.target` has no effect. The Unix StackExchange thread 'How to stop all systemd units belonging to the same target?' explains why.

With all DC/OS processes stopped, remove the following folders and files:

Configuration folders

  • `/etc/systemd/system/dcos*.service`
  • `/etc/profile.d/dcos.sh`
  • `/etc/systemd/journald.conf.d/dcos.conf`
  • `/opt/mesosphere`

State folders

  • `/var/lib/mesosphere`
  • `/var/lib/mesos`
  • `/var/lib/dcos`
  • `/var/lib/zookeeper`

This has to be done manually, as the installers do not yet support removing partial installations. See the corresponding issue on the DC/OS JIRA.
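
The manual cleanup of the paths listed above can be sketched as a small function. The `remove_dcos_files` name and its optional `ROOT` prefix are hypothetical; the prefix exists only so the function can be tried on a scratch directory before it is pointed at a real node.

```shell
#!/bin/bash
# remove_dcos_files [ROOT]: delete the DC/OS configuration and state paths.
# ROOT is an optional prefix for a dry run on a scratch directory; leave it
# empty to act on the real filesystem.
remove_dcos_files() {
  local root="${1:-}"
  # Configuration
  rm -f  "${root}"/etc/systemd/system/dcos*.service
  rm -f  "${root}/etc/profile.d/dcos.sh"
  rm -f  "${root}/etc/systemd/journald.conf.d/dcos.conf"
  rm -rf "${root}/opt/mesosphere"
  # State
  rm -rf "${root}/var/lib/mesosphere" "${root}/var/lib/mesos" \
         "${root}/var/lib/dcos" "${root}/var/lib/zookeeper"
}

# On a node, as root, after stopping the DC/OS units:
#   remove_dcos_files
```

Since this deletes data irreversibly, only run it on a node you intend to reinstall.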

Unhealthy DC/OS components

The Components view in the DC/OS Dashboard shows the health of all components. If some of them are unhealthy, run `systemctl status dcos-service-name` for the unhealthy service or use `journalctl -u dcos-service-name` to check its logs. Another common problem is that machines cannot communicate with each other because of firewall rules. Add a log rule to your firewall if you don't already have one, so that dropped packets show up in the logs.
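
To find candidates from the command line instead of the Dashboard, the sketch below lists every DC/OS service unit that systemd does not report as active. `service_units` and `report_inactive` are hypothetical helpers. Some units may be timer-triggered and therefore inactive between runs, so treat the output as a starting point rather than a list of failures.

```shell
#!/bin/bash
# service_units: read a "Wants=..." line on stdin and print one .service
# unit per line, dropping sockets and timers.
service_units() {
  cut -d= -f2 | tr ' ' '\n' | grep '\.service$'
}

# report_inactive: print every DC/OS service that systemd does not report
# as active. Requires systemd, so run it on a cluster node.
report_inactive() {
  local unit
  systemctl show -p Wants dcos.target | service_units | while read -r unit; do
    systemctl is-active --quiet "$unit" || echo "$unit"
  done
}

# On a node:
#   report_inactive
#   journalctl -u dcos-spartan.service   # then inspect the logs of a listed unit
```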

Conclusions

When the installation is complete you can visit `http://master1` and log in with your social account to start using DC/OS. The advanced installer gives the best insight into a DC/OS installation and can be a building block for further automation.

Resources

* DC/OS 1.9 advanced installer documentation
* Unix StackExchange 'How to stop all systemd units belonging to the same target?'

Keep in touch!

Thanks for reading! Questions? Comment on the blog, talk to us at @ContainerSoluti or reach me at @Frank_Scholten.
