History Server

Updated: March 16, 2017

DC/OS Spark includes the Spark History Server. Because the history
server requires HDFS, you must explicitly enable it.

  1. Install HDFS:
    $ dcos package install hdfs

    Note: HDFS requires 5 private nodes.

  2. Create a history HDFS directory (default is /history). SSH into
    your cluster and run:

    $ hdfs dfs -mkdir /history
  3. Create spark-history-options.json:

      {
        "hdfs-config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
      }
  4. Install the Spark History Server:
    $ dcos package install spark-history --options=spark-history-options.json
  5. Create spark-dispatcher-options.json:

      {
        "service": {
          "spark-history-server-url": "http://<dcos_url>/service/spark-history"
        },
        "hdfs": {
          "config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
        }
      }
  6. Install the Spark Dispatcher:
    $ dcos package install spark --options=spark-dispatcher-options.json
  7. Run jobs with the event log enabled:
    $ dcos spark run --submit-args="-Dspark.eventLog.enabled=true -Dspark.eventLog.dir=hdfs://hdfs/history ... --class MySampleClass  http://external.website/mysparkapp.jar"
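    The --submit-args string above packs the event-log properties, the
    entry class, and the application artifact into a single argument. As
    an illustrative sketch (this helper function is ours, not part of
    the DC/OS CLI), the pieces compose like this:

```python
def build_submit_args(app_url, main_class, history_dir="hdfs://hdfs/history"):
    """Assemble a --submit-args value for `dcos spark run` with the
    event log enabled so the history server can pick up the job.
    (Illustrative helper, not part of the DC/OS CLI.)"""
    props = {
        "spark.eventLog.enabled": "true",   # write an event log at all
        "spark.eventLog.dir": history_dir,  # must match the directory from step 2
    }
    flags = " ".join(f"-D{key}={value}" for key, value in props.items())
    return f"{flags} --class {main_class} {app_url}"

print(build_submit_args("http://external.website/mysparkapp.jar", "MySampleClass"))
```

    Note that spark.eventLog.dir must point at the same HDFS directory
    created in step 2, or the history server will find no entries.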
  8. Visit your job in the dispatcher at
    http://<dcos_url>/service/spark/. It will include a link to the
    history server entry for that job.