
HDFS

Preview Updated: April 18, 2017

DC/OS Apache HDFS is a managed service that makes it easy to deploy and manage a highly available (HA) Apache HDFS cluster on Mesosphere DC/OS. Apache HDFS (Hadoop Distributed File System) is an open-source distributed file system based on Google’s GFS (Google File System) paper. It provides replicated, distributed storage for “big data” and “fast data” applications.

DC/OS HDFS offers the following benefits:

  • Easy installation
  • Multiple HDFS clusters
  • Elastic scaling of data nodes
  • Integrated monitoring

Features

DC/OS HDFS provides the following features:

  • Single-command installation for rapid provisioning
  • Persistent storage volumes for enhanced data durability
  • Runtime configuration and software updates for high availability
  • Health checks and metrics for monitoring
  • Distributed storage scale out
  • HA name service with Quorum Journaling and ZooKeeper failure detection

Install and Customize

HDFS is available in the Universe and can be installed by using either the web interface or the DC/OS CLI. Prerequisites: Depending on your security mode in Enterprise DC/OS, you ma...

Uninstall

Uninstalling a cluster is straightforward. Replace hdfs with the name of the HDFS instance to be uninstalled.

    $ dcos package uninstall --app-id=hdfs

Note: Alternatively, you can un...
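As a minimal sketch of the "replace hdfs with the instance name" step, the helper below prints the uninstall command for a given instance instead of running it, so the result can be reviewed first. The function name uninstall_cmd and the instance name hdfs-analytics are hypothetical and used only for illustration; the command itself is the one shown above.

```shell
# Print (rather than run) the uninstall command for a given HDFS instance.
# This is a dry-run sketch: nothing is uninstalled by calling it.
uninstall_cmd() {
  # Default to "hdfs", the instance name used in the text above.
  app_id="${1:-hdfs}"
  printf 'dcos package uninstall --app-id=%s\n' "$app_id"
}

uninstall_cmd hdfs             # the default instance
uninstall_cmd hdfs-analytics   # hypothetical second instance name
```

Once the printed command looks right, it can be run as-is on a machine where the DC/OS CLI is installed and attached to the cluster.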

Configuring

Changing Configuration at Runtime: You can customize your cluster in place while it is up and running. The HDFS scheduler runs as a Marathon process and can be reconfigured by changi...

Connecting Clients

Applications interface with HDFS as they would with any POSIX file system. However, applications that will act as client nodes of the HDFS deployment require an hdf...

Managing

Add a Data Node: Increase the DATA_COUNT value from the DC/OS dashboard as described in the Configuring section. This creates an update plan as described in that section. An additio...

API Reference

The DC/OS HDFS Service implements a REST API that may be accessed from outside the cluster. The parameter referenced below indicates the base URL of the DC/OS cluster on which the ...
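To illustrate how a request URL might be assembled from the cluster's base URL, here is a small sketch. Everything specific in it is an assumption rather than something stated in this summary: the function name hdfs_api_url, the example host dcos.example.com, the /service/<name> route, and the v1/plan endpoint are placeholders, so check the full API Reference for the actual paths.

```shell
# Build a URL for the HDFS service's REST API from the cluster base URL.
# ASSUMPTIONS: the /service/<name> route and the v1/plan endpoint are
# illustrative placeholders, not confirmed by the text above.
hdfs_api_url() {
  base="${1%/}"             # strip one trailing slash from the base URL
  service="${2:-hdfs}"      # service instance name, "hdfs" by default
  endpoint="${3:-v1/plan}"  # hypothetical endpoint path
  printf '%s/service/%s/%s\n' "$base" "$service" "$endpoint"
}

hdfs_api_url "https://dcos.example.com/" hdfs v1/plan
```

Stripping the trailing slash keeps the result well-formed whether or not the base URL is written with one.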

Troubleshooting

Replacing a Permanently Failed Node: The DC/OS HDFS Service is resilient to temporary node failures. However, if a DC/OS agent hosting an HDFS node is permanently lost, manual interv...

Provisioning HDFS

This topic describes when and how to provision HDFS with a service account.