With DC/OS you can configure Mesos mount disk resources across your cluster by simply mounting storage resources on agents using a well-known path. When a DC/OS agent initially starts, it scans for volumes that match the pattern /dcos/volumeN, where N is an integer. The agent is then automatically configured to offer these disk resources to other services.
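
To preview which mounts on a host follow the naming pattern, a small helper like the one below can be used. This is a minimal sketch, not the agent's own discovery logic: the function name `list_dcos_volumes` is hypothetical, and it assumes a mounts table with the same columns as /proc/mounts (device, mount point, filesystem type, and so on).

```shell
# Hypothetical helper: print mount points that match /dcos/volumeN,
# where N is an integer. Reads a mounts table in /proc/mounts format;
# the file to read can be passed as the first argument.
list_dcos_volumes() {
    awk '$2 ~ /^\/dcos\/volume[0-9]+$/ { print $2 }' "${1:-/proc/mounts}"
}
```

Running `list_dcos_volumes` with no arguments on an agent reads /proc/mounts. A path such as /dcos/volumeX is skipped because its suffix is not an integer.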

Example using loopback device

In this example, a disk resource is added to a DC/OS agent post-install on a running cluster. These same steps can be used pre-install without having to stop services or clear the agent state.

Warning: This procedure terminates any running tasks or services on the node.

  1. Connect to an agent in the cluster with SSH.

  2. Examine the current agent resource state.

    Note that there are no references to /dcos/volume0 yet.

    $ cat /var/lib/dcos/mesos-resources
    # Generated by make_disk_resources.py on 2016-05-05 17:04:29.868595
    MESOS_RESOURCES='[{"ranges": {"range": [{"end": 21, "begin": 1}, {"end": 5050, "begin": 23}, {"end": 32000, "begin": 5052}]}, "type":  "RANGES", "name": "ports"}, {"role": "*", "type": "SCALAR", "name": "disk", "scalar": {"value": 47540}}]'
  3. Stop the agent.

    On a private agent:

    $ sudo systemctl stop dcos-mesos-slave.service

    On a public agent:

    $ sudo systemctl stop dcos-mesos-slave-public.service
  4. Clear agent state.

    Remove Volume Mount Discovery resource state with this command:

    $ sudo rm -f /var/lib/dcos/mesos-resources

    Remove agent checkpoint state with this command:

    $ sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
  5. Create a 200 MB loopback device.

    This is suitable for testing purposes only. Mount volumes must have at least 200 MB of free space available. 100 MB on each volume is reserved by DC/OS and is not available for other services.

    $ sudo mkdir -p /dcos/volume0
    $ sudo dd if=/dev/zero of=/root/volume0.img bs=1M count=200
    $ sudo losetup /dev/loop0 /root/volume0.img
    $ sudo mkfs -t ext4 /dev/loop0
    $ sudo losetup -d /dev/loop0
  6. Create fstab entry and mount.

    Ensure the volume is mounted automatically at boot time. A systemd mount unit could be used to the same effect.

    $ echo "/root/volume0.img /dcos/volume0 auto loop 0 2" | sudo tee -a /etc/fstab
    $ sudo mount /dcos/volume0
  7. Reboot.

    $ sudo reboot
  8. SSH to the agent and verify the new resource state.

    Review the journald logs for references to the new volume /dcos/volume0. In particular, there should be an entry for the agent starting up and the new volume0 Disk Mount resource.

    $ journalctl -b | grep '/dcos/volume0'
    May 05 19:18:40 dcos-agent-public-01234567000001 systemd[1]: Mounting /dcos/volume0...
    May 05 19:18:42 dcos-agent-public-01234567000001 systemd[1]: Mounted /dcos/volume0.
    May 05 19:18:46 dcos-agent-public-01234567000001 make_disk_resources.py[888]: Found matching mounts : [('/dcos/volume0', 74)]
    May 05 19:18:46 dcos-agent-public-01234567000001 make_disk_resources.py[888]: Generated disk resources map: [{'name': 'disk', 'type': 'SCALAR', 'disk': {'source': {'mount': {'root': '/dcos/volume0'}, 'type': 'MOUNT'}}, 'role': '*', 'scalar': {'value': 74}}, {'name': 'disk', 'type': 'SCALAR', 'role': '*', 'scalar': {'value': 47540}}]
    May 05 19:18:58 dcos-agent-public-01234567000001 mesos-slave[1891]: " --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resources="[{"name": "ports", "type": "RANGES", "ranges": {"range": [{"end": 21, "begin": 1}, {"end": 5050, "begin": 23}, {"end": 32000, "begin": 5052}]}}, {"name": "disk", "type": "SCALAR", "disk": {"source": {"mount": {"root": "/dcos/volume0"}, "type": "MOUNT"}}, "role": "*", "scalar": {"value": 74}}, {"name": "disk", "type": "SCALAR", "role": "*", "scalar": {"value": 47540}}]" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --slave_subsystems="cpu,memory" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos/slave"
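
The verification in step 8 can also be scripted against the regenerated resource file. The following is a minimal sketch under stated assumptions: the function name `has_mount_resource` is hypothetical, and it assumes the regenerated /var/lib/dcos/mesos-resources file spells the mount entry the same way as the `--resources` flag in the log above (`"root": "/dcos/volume0"`).

```shell
# Hypothetical helper: succeed if the resource file contains a disk
# resource rooted at the given mount path.
# Assumption: the file uses the same JSON spelling as the agent log.
has_mount_resource() {
    volume="$1"
    resources_file="${2:-/var/lib/dcos/mesos-resources}"
    # -F matches the string literally; the closing quote in the pattern
    # keeps /dcos/volume1 from matching /dcos/volume10.
    grep -qF "\"root\": \"${volume}\"" "$resources_file"
}
```

For example, `has_mount_resource /dcos/volume0 && echo "volume0 is offered"` prints the message only when the entry is present.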

Cloud Provider Resources

Cloud provider storage services are typically used to back DC/OS Mount Volumes. This reference material can be useful when designing a production DC/OS deployment:

Best Practices

Disk Mount Resources are primarily for stateful services like Kafka and Cassandra, which can benefit from having dedicated storage available throughout the cluster. Any service that uses a Disk Mount Resource has exclusive access to the reserved resource. However, it is still important to consider the performance and reliability requirements of the service. The performance of a Disk Mount Resource depends on the characteristics of the underlying storage, and DC/OS does not provide any data replication services. Consider the following:

  • Use Disk Mount Resources with stateful services that have strict storage requirements.
  • Carefully consider the filesystem type, storage media (network attached, SSD, etc.), and volume characteristics (RAID levels, sizing, etc.) based on the storage needs and requirements of the stateful service.
  • Label Mesos agents using a Mesos attribute that reflects the characteristics of the agent’s Disk Mounts, for example IOPS200 or RAID1.
  • Associate stateful services with storage agents using Mesos attribute constraints.
  • Consider isolating demanding storage services to dedicated storage agents, since the filesystem page cache is a host-level shared resource.
  • Ensure all services using Disk Mount Resources are designed to handle the permanent loss of one or more Disk Mount Resources. Services are still responsible for managing data replication and retention, graceful recovery from failed agents, and backups of critical service state.
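
As an illustration of the labeling practice above, the sketch below builds an attribute string in the Mesos `key:value;key:value` form. The function name `format_mesos_attributes` and the attribute keys `disk_raid` and `disk_iops` are hypothetical. Where the resulting line is stored on an agent is deliberately left out, since that depends on the DC/OS version and is not covered in this section.

```shell
# Hypothetical helper: turn key=value arguments into a single
# MESOS_ATTRIBUTES line using the Mesos "key:value;key:value" syntax.
format_mesos_attributes() {
    out=""
    for pair in "$@"; do
        key="${pair%%=*}"      # text before the first '='
        value="${pair#*=}"     # text after the first '='
        out="${out:+${out};}${key}:${value}"
    done
    printf 'MESOS_ATTRIBUTES=%s\n' "$out"
}

format_mesos_attributes disk_raid=RAID1 disk_iops=IOPS200
# prints: MESOS_ATTRIBUTES=disk_raid:RAID1;disk_iops:IOPS200
```

A stateful service could then be pinned to matching agents with an attribute constraint on `disk_raid` in its scheduler configuration.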