Configure Spark for HDFS

Updated: March 28, 2017

To configure Spark for a specific HDFS cluster, set hdfs.config-url to a URL that serves your hdfs-site.xml and core-site.xml files. For example:

  "hdfs": {
    "config-url": "http://mydomain.com/hdfs-config"
  }

where http://mydomain.com/hdfs-config/hdfs-site.xml and http://mydomain.com/hdfs-config/core-site.xml are valid URLs.

For DC/OS HDFS, these configuration files are served at http://<hdfs.framework-name>.marathon.mesos:<port>/v1/connection, where <hdfs.framework-name> is a configuration variable set in the HDFS package, and <port> is the port of its Marathon app.
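For example, assuming the HDFS framework is named hdfs and its Marathon app listens on port 9000 (both placeholder values for illustration), the corresponding Spark configuration would be:

```json
{
  "hdfs": {
    "config-url": "http://hdfs.marathon.mesos:9000/v1/connection"
  }
}
```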

HDFS Kerberos

You can access external (i.e. non-DC/OS) Kerberos-secured HDFS clusters from Spark on Mesos.

HDFS Configuration

Once you’ve set up a Kerberos-enabled HDFS cluster, configure Spark to connect to it as follows.


  1. A krb5.conf file tells Spark how to connect to your KDC. Base64 encode this file:
    $ cat krb5.conf | base64
  2. Add the following to your JSON configuration file to enable Kerberos in Spark:
       "security": {
         "kerberos": {
           "krb5conf": "<base64 encoding>"
         }
       }
  3. If you’ve enabled the history server via history-server.enabled, you must also configure the principal and keytab for the history server. WARNING: The keytab contains secrets, so ensure you have SSL enabled while installing DC/OS Spark.

    Base64 encode your keytab:

    $ cat spark.keytab | base64

    And add the following to your configuration file:

        "history-server": {
          "kerberos": {
            "principal": "spark@REALM",
            "keytab": "<base64 encoding>"
          }
        }
  4. Install Spark with your custom configuration, here called options.json:
    $ dcos package install --options=options.json spark
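Putting the steps above together, a complete options.json might look like the following sketch. The base64 strings, URL, and principal are placeholders for illustration; substitute the output of the base64 commands and your own config URL and realm:

```json
{
  "hdfs": {
    "config-url": "http://mydomain.com/hdfs-config"
  },
  "security": {
    "kerberos": {
      "krb5conf": "<base64-encoded krb5.conf>"
    }
  },
  "history-server": {
    "enabled": true,
    "kerberos": {
      "principal": "spark@REALM",
      "keytab": "<base64-encoded keytab>"
    }
  }
}
```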

Job Submission

To authenticate to a Kerberos KDC, DC/OS Spark supports keytab files as well as ticket-granting tickets (TGTs).

Keytabs do not expire, whereas tickets do. Keytabs are therefore recommended, especially for long-running streaming jobs.

Keytab Authentication

Submit the job with the keytab:

$ dcos spark run --submit-args="--principal user@REALM --keytab <keytab-file-path>..."

TGT Authentication

Submit the job with the ticket:

$ dcos spark run --principal user@REALM --tgt <ticket-file-path>

Note: These credentials are security-critical. We highly recommend configuring SSL encryption between the Spark components when accessing Kerberos-secured HDFS clusters. See the Security section for information on how to do this.