How To Install DC/OS on Packet or other Bare Metal Environments

 

by Frank Scholten

In this blog post I will describe how to install DC/OS on Packet, the bare metal cloud, using the advanced DC/OS installer. I like the advanced installer because it gives a better understanding of the DC/OS installation process than the GUI and CLI installers. In summary, the installation consists of creating a few Packet CentOS 7 devices, writing some configuration files and running a handful of shell commands that install all DC/OS packages. Read on for the nitty-gritty details.

Choosing a DC/OS installer

My previous blog post described how to create a DC/OS cluster on Packet with Terraform. Even though running Terraform is simple and straightforward, it doesn’t give much insight into the core installation process of DC/OS. To get a deeper understanding, let’s pick one of the custom DC/OS installers. There are currently 3 custom DC/OS installers suitable for bare metal environments: the GUI, the CLI and the advanced installer. The GUI installer is a web application on the bootstrap node that asks you to enter the configuration for the cluster. When you hit ‘Install’, the app goes off and installs everything. The CLI installer performs the same steps as the GUI installer but is driven by a script instead of a GUI. Besides these, there are also cloud-specific DC/OS installers.

Advanced in name only

Despite its name, the advanced installer is in fact more basic than the GUI or CLI installer. In my experience it is also easier to use. Why? Because it only uses SSH, plain shell scripts and config files. The GUI and CLI installers might seem easier to use, but they are less forgiving: when an installation with the GUI or CLI fails, it’s harder to figure out what went wrong and to recover from the broken installation. Another benefit of a basic installation process built on shell scripts is that it’s easier to automate. Therefore I recommend using the advanced installer. Now let’s create our infrastructure and kick off the installation process.

Creating Packet Devices

Installing a DC/OS cluster involves setting up a bootstrap machine, which hosts the configuration and packages DC/OS requires. The other machines are installed from the bootstrap node. To run a small cluster we need 5 machines: 1 bootstrap machine, 1 master and 3 agents, one of which will be a public agent. You can create the Packet devices from the Packet web UI or with the packet baremetal CLI command.

Networking and Security

To make things easier, map the hostnames bootstrap, master1, agent1, agent2 and agent3 to their private IPs in /etc/hosts on all devices. Alternatively, you can run your own DNS server for all nodes in the cluster. To secure the cluster you can use an iptables firewall, as I described in my previous blog post.
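A minimal sketch of the /etc/hosts entries. Only 10.11.12.13 (the bootstrap node) appears later in this post; the other addresses and the cluster_hosts helper name are hypothetical placeholders, so substitute the private IPs Packet shows for your devices.

```shell
# Print /etc/hosts entries for the cluster.
# The IPs below are hypothetical placeholders; use your devices' private IPs.
cluster_hosts() {
  cat <<'EOF'
10.11.12.13 bootstrap
10.11.12.14 master1
10.11.12.15 agent1
10.11.12.16 agent2
10.11.12.17 agent3
EOF
}
```

On every device you could then append the entries with cluster_hosts | sudo tee -a /etc/hosts.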

Running the Advanced Installer

The advanced installation consists of 6 steps:

  1. Download dcos_generate_config.sh
  2. Run dcos_generate_config.sh
  3. Edit config.yaml
  4. Create ip-detect.sh
  5. Run bootstrap nginx server
  6. Run dcos_install.sh

Let’s look at each step in more detail.

1. Download dcos_generate_config.sh

This 700+MB script contains everything you need to configure DC/OS. You can download it from the DC/OS release page.

2. Run dcos_generate_config.sh

Plain and simple. It creates the genconf folder and an initial config.yaml file.
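Steps 1 and 2 on the bootstrap node might look like the sketch below. The download URL is an assumption based on where DC/OS published stable releases around version 1.9, so copy the exact link for your version from the release page. prepare_bootstrap is a hypothetical helper name. We also assume that you run sudo bash dcos_generate_config.sh once more after steps 3 and 4, so that the files nginx later serves from genconf/serve reflect your edited configuration.

```shell
# Run on the bootstrap node.
# Assumed download URL; copy the exact link for your DC/OS version from the release page.
DCOS_INSTALLER_URL="https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh"

prepare_bootstrap() {
  # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name
  curl -fLO "$DCOS_INSTALLER_URL" || return 1
  # Creates the genconf folder with an initial config.yaml.
  sudo bash dcos_generate_config.sh
}
```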

3. Edit config.yaml

Now edit the generated config.yaml. You only have to fill in the private IP addresses of the bootstrap and master nodes, and the port the bootstrap nginx server will listen on (see 5. Run Nginx below).
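Instead of editing the file by hand, you could write it from a script. The sketch below assumes the DC/OS 1.9 config.yaml keys bootstrap_url, cluster_name, exhibitor_storage_backend, master_discovery, master_list and resolvers. The master IP, the cluster name dcos-packet and the resolver 8.8.8.8 are hypothetical placeholders. The bootstrap address 10.11.12.13:5000 matches the one used later in this post.

```shell
# Write a minimal genconf/config.yaml, run from the directory holding dcos_generate_config.sh.
# All values are placeholders; substitute your own IPs, port and resolvers.
BOOTSTRAP_IP=10.11.12.13   # bootstrap node's private IP
BOOTSTRAP_PORT=5000        # port nginx will listen on (step 5)
MASTER_IP=10.11.12.14      # hypothetical private IP of master1

mkdir -p genconf
cat > genconf/config.yaml <<EOF
bootstrap_url: http://${BOOTSTRAP_IP}:${BOOTSTRAP_PORT}
cluster_name: dcos-packet
exhibitor_storage_backend: static
master_discovery: static
master_list:
- ${MASTER_IP}
resolvers:
- 8.8.8.8
EOF
```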

4. Create ip-detect.sh

The purpose of ip-detect.sh is to return the private IPv4 address that a master or agent node will be associated with during its lifetime. This IP is meant to be stable for the node: if it changes, the node should be wiped and reinstalled. The script can determine the address either with ip addr commands or through a cloud API. Since Packet devices use a bonded network interface, the script should output the node's private IP on the bond0 interface.
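A possible ip-detect script, created on the bootstrap node. It assumes that Packet assigns the private address from 10.0.0.0/8 to bond0, typically with a bond0:0 label. It also assumes that DC/OS looks for the script at genconf/ip-detect, which is the default location in the DC/OS documentation. Check both assumptions against your devices and your DC/OS version.

```shell
# Create the ip-detect script in genconf/ on the bootstrap node.
mkdir -p genconf
cat > genconf/ip-detect <<'EOF'
#!/usr/bin/env bash
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
# First IPv4 address on bond0 inside 10.0.0.0/8, with the /prefix stripped.
ip_addr=$(ip -4 addr show dev bond0 | awk '$1 == "inet" && $2 ~ /^10\./ { split($2, a, "/"); print a[1]; exit }')
# Fail loudly instead of printing an empty address.
[ -n "$ip_addr" ] || { echo "no 10.x.x.x address found on bond0" >&2; exit 1; }
echo "$ip_addr"
EOF
chmod +x genconf/ip-detect
```

Running genconf/ip-detect on a Packet node should print a single private address such as 10.x.x.x and nothing else.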

5. Run Nginx

Now we are ready to run nginx, which will serve the packages and configuration under genconf/serve.
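A sketch that runs nginx with the official nginx Docker image, assuming Docker is installed on the bootstrap node. The helper start_bootstrap_nginx and the container name dcos-bootstrap-nginx are hypothetical. nginx listens on port 80 inside the container, and that port is mapped to BOOTSTRAP_PORT on the host.

```shell
# Serve genconf/serve read-only over HTTP from the bootstrap node.
# BOOTSTRAP_PORT must match the port in bootstrap_url in config.yaml.
BOOTSTRAP_PORT="${BOOTSTRAP_PORT:-5000}"

start_bootstrap_nginx() {
  sudo docker run -d --name dcos-bootstrap-nginx \
    -p "${BOOTSTRAP_PORT}:80" \
    -v "$PWD/genconf/serve:/usr/share/nginx/html:ro" \
    nginx
}
```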

Make sure you use the same port as specified in config.yaml above.

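Now check from one of the agents that you can download the DC/OS install script at http://10.11.12.13:5000/dcos_install.sh (in this example 10.11.12.13 is the private IP of the bootstrap node and 5000 is the nginx port). If this works, you are ready for the next step.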
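The check can be scripted with curl. check_bootstrap is a hypothetical helper, and it defaults to the example bootstrap address above.

```shell
# Run on one of the agents. Pass a different URL to override the example address.
check_bootstrap() {
  local url="${1:-http://10.11.12.13:5000/dcos_install.sh}"
  # -f: non-zero exit on HTTP errors such as 404; -sS: no progress bar, but show errors
  if curl -fsS --max-time 10 -o /dev/null "$url"; then
    echo "OK: $url is reachable"
  else
    echo "FAILED: cannot fetch $url" >&2
    return 1
  fi
}
```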

6. Run dcos_install.sh

Everything is set up, and we can now run the install script on all master and agent machines. The dcos_install.sh script gives good feedback on what it installs: it lists all packages and the services it will start. If one of the required ports is already taken by a running process, the script exits with an error code and shows what is wrong. See the Troubleshooting section below for how to fix this.
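A sketch of what to run on each node. dcos_install.sh takes the node role as its argument: master, slave for a private agent, or slave_public for a public agent. The helper install_dcos_node is hypothetical and defaults to the example bootstrap address.

```shell
# Run on each master and agent node.
BOOTSTRAP_URL="${BOOTSTRAP_URL:-http://10.11.12.13:5000}"

install_dcos_node() {
  local role="${1:-}"
  case "$role" in
    master|slave|slave_public) ;;
    *) echo "usage: install_dcos_node master|slave|slave_public" >&2; return 2 ;;
  esac
  local workdir
  workdir=$(mktemp -d) || return 1
  # Download the install script into a scratch directory and run it for the given role.
  (
    cd "$workdir" &&
      curl -fsSO "$BOOTSTRAP_URL/dcos_install.sh" &&
      sudo bash dcos_install.sh "$role"
  )
}
```

For the cluster in this post, run install_dcos_node master on master1, install_dcos_node slave on agent1 and agent2, and install_dcos_node slave_public on agent3. Which agent becomes the public one is up to you.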

Troubleshooting

During installation a few things can go wrong:

Existing partial installation

To get rid of an existing partial installation, first stop all DC/OS-related processes. These are the systemd units listed in the Wants property of dcos.target, and you can stop them all with this one-liner:

$ systemctl stop -- $(systemctl show -p Wants dcos.target | cut -d= -f2)

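Note that running systemctl stop dcos.target has no effect. The DC/OS units are only pulled in by the target through Wants=, and stopping a target does not propagate to such units. See the Unix StackExchange thread ‘How to stop all systemd units belonging to the same target?’ for details.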

With all DC/OS processes stopped, remove the following files and folders:

Configuration folders

  • /etc/systemd/system/dcos*.service
  • /etc/profile.d/dcos.sh
  • /etc/systemd/journald.conf.d/dcos.conf
  • /opt/mesosphere

State folders

  • /var/lib/mesosphere
  • /var/lib/mesos
  • /var/lib/dcos
  • /var/lib/zookeeper

This has to be done manually, as the installers do not yet support removing partial installations; see the corresponding issue on the DC/OS JIRA.
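The removal can be scripted with the paths listed above. remove_dcos_leftovers is a hypothetical helper. Its optional argument is a root prefix, which is only there so you can try it on a scratch directory. Leave the argument empty on a real node, and run the helper as root after stopping the DC/OS units. The systemctl daemon-reload call is an addition that makes systemd forget the deleted unit files.

```shell
# Remove leftovers of a partial DC/OS installation.
# $1 (optional): root prefix for a dry run in a scratch directory; empty on a real node.
remove_dcos_leftovers() {
  local root="${1:-}"
  # Configuration files and folders
  rm -rf "$root"/etc/systemd/system/dcos*.service \
         "$root"/etc/profile.d/dcos.sh \
         "$root"/etc/systemd/journald.conf.d/dcos.conf \
         "$root"/opt/mesosphere
  # State folders
  rm -rf "$root"/var/lib/mesosphere \
         "$root"/var/lib/mesos \
         "$root"/var/lib/dcos \
         "$root"/var/lib/zookeeper
  # Reload systemd only when operating on the real system
  if [ -z "$root" ]; then
    systemctl daemon-reload
  fi
}
```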

Unhealthy DC/OS components

The Components view in the DC/OS dashboard shows the health of all components. If some of them are unhealthy, run systemctl status 'dcos-service-name' for the unhealthy service or use journalctl -u 'dcos-service-name' to check its logs. Another problem can be that machines cannot communicate with each other because of firewall rules. Add a logging rule to your firewall, if you don’t already have one, so that dropped packets show up in the logs.
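A small helper, hypothetically named inspect_failed_dcos_units, that runs both commands for every DC/OS unit that systemd reports as failed. It assumes the unit names start with dcos-. It only catches units in the failed state; a component can be running and still be reported as unhealthy, so in that case inspect its unit by name.

```shell
# Show status and the last 50 journal lines for each failed dcos-* unit.
inspect_failed_dcos_units() {
  local unit
  # --plain --no-legend: one unit per line, unit name in the first column
  for unit in $(systemctl list-units --state=failed --plain --no-legend 'dcos-*' | awk '{print $1}'); do
    echo "=== $unit ==="
    systemctl status --no-pager "$unit"
    journalctl -u "$unit" --no-pager -n 50
  done
}
```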

Conclusions

When the installation is complete, you can visit http://master1 and log in with your social account to start using DC/OS. The advanced installer gives the best insight into a DC/OS installation and can be a building block for further automation.

Resources

* DC/OS 1.9 advanced installer documentation
* Unix StackExchange: ‘How to stop all systemd units belonging to the same target?’

Keep in touch!

Thanks for reading! Questions? Comment on the blog or talk to us at @ContainerSoluti or to me at @Frank_Scholten.


Frank Scholten

Senior Software Engineer at Container Solutions
Frank is a senior software engineer at Container Solutions. He focuses on Cloud Native applications with DC/OS and Apache Mesos, containers and Continuous Delivery. He created minimesos, the experimentation and testing tool for Apache Mesos. He is enthusiastic about Open Source software development, and about process improvement and automation in particular. Drawing on experience from a wide range of projects, he is always on the lookout for new technologies, methods and ways to improve things, and he likes to write and speak about these topics.
