Install

Spark is available in the Universe and can be installed by using either the web interface or the DC/OS CLI.

Prerequisites

  • Depending on your security mode in Enterprise DC/OS, you may need to provision a service account before installing Spark. Only someone with superuser permission can create the service account (see the example after this list).
    • In strict security mode, a service account is required.
    • In permissive security mode, a service account is optional.
    • In disabled security mode, no service account is required.
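
If a service account is required, the general provisioning flow looks like the sketch below. This is illustrative only: it assumes the DC/OS Enterprise CLI is installed, the account name spark-principal and secret path spark/secret are placeholder values, and the exact subcommands may vary by DC/OS version, so consult the security documentation for your release.

# Generate a keypair for the service account
$ dcos security org service-accounts keypair private-key.pem public-key.pem

# Create the service account (requires superuser permission)
$ dcos security org service-accounts create -p public-key.pem -d "Spark service account" spark-principal

# Store the private key as a secret that the service can read
$ dcos security secrets create-sa-secret private-key.pem spark-principal spark/secret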

Default Installation

To start a basic Spark cluster, run the following command from the DC/OS CLI.

$ dcos package install spark

This command installs the Spark dispatcher. The history server is not installed by default; see Custom Installation to enable it.

Go to the Services > Deployments tab of the DC/OS web interface to monitor the deployment. Once it is
complete, visit Spark at http://<dcos-url>/service/spark/.

You can also install Spark via the DC/OS web interface.

Note: If you install Spark via the web interface, run the following command from the DC/OS CLI to install the Spark CLI:

$ dcos package install spark --cli

Custom Installation

You can customize the default configuration properties by creating a
JSON options file and passing it to dcos package install --options.
For example, to install the history server, create a file called
options.json:

{
  "history-server": {
    "enabled": true
  }
}

Then, install Spark with your custom configuration:

$ dcos package install --options=options.json spark

Run the following command to see all configuration options:

$ dcos package describe spark --config

Customize Spark Distribution

DC/OS Spark does not support arbitrary Spark distributions, but
Mesosphere does provide multiple pre-built distributions, primarily
used to select Hadoop versions. To use one of these distributions,
first select your desired Spark distribution from
here, then select the corresponding Docker image from here, and
use those values to set the following configuration variables:

{
  "service": {
    "spark-dist-uri": "<spark-dist-uri>"
    "docker-image": "<docker-image>"
  }
}
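
As with the history server example, save these values in an options file and pass it at install time; for instance, if you saved them as options.json:

$ dcos package install --options=options.json spark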

Minimal Installation

For development purposes, you may wish to install Spark on a local
DC/OS cluster. For this, you can use dcos-vagrant.

  1. Install DC/OS Vagrant:

    Install a minimal DC/OS Vagrant according to the instructions
    here.

  2. Install Spark:

    $ dcos package install spark
    
  3. Run a simple job (a hedged status-check example follows this list):
    $ dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar"
    
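The dcos spark run command prints a submission ID that you can use to track or stop the job. A hedged example, where driver-20170322000000-0001 is an illustrative ID copied from the run output:

$ dcos spark status driver-20170322000000-0001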

NOTE: A limited resource environment such as DC/OS Vagrant restricts
some of the features available in DC/OS Spark. For example, unless you
have enough resources to start up a 5-agent cluster, you will not be
able to install DC/OS HDFS, and you thus won’t be able to enable the
history server.

Also, a limited resource environment can restrict how you size your
executors, for example with spark.executor.memory.
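
For instance, a sketch of shrinking executor memory through --submit-args, reusing the SparkPi example above (512m is an illustrative value; --conf is standard spark-submit syntax):

$ dcos spark run --submit-args="--conf spark.executor.memory=512m --class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar"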

Multiple Installations

Installing multiple instances of the DC/OS Spark package provides basic
multi-team support. Each dispatcher displays only the jobs submitted
to it by a given team, and each team can be assigned different
resources.

To install multiple instances of the DC/OS Spark package, set each
service.name to a unique name (e.g., "spark-dev") in your JSON
configuration file during installation:

{
  "service": {
    "name": "spark-dev"
  }
}
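
For example, a second instance for a development team could be installed alongside the default one (spark-dev.json is an illustrative file name for the JSON above):

$ dcos package install --options=spark-dev.json spark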

To use a specific Spark instance from the DC/OS Spark CLI:

$ dcos config set spark.app_id <service.name>
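
Subsequent Spark CLI commands are then routed to that dispatcher. For example, to target the "spark-dev" instance installed above and submit the SparkPi job from the Minimal Installation section to it:

$ dcos config set spark.app_id spark-dev
$ dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar"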