Building an IoT Pipeline

Enterprise DC/OS Updated: June 29, 2017

In this tutorial, a containerized Ruby on Rails app named Tweeter in installed and deployed using DC/OS. Tweeter is an app similar to Twitter that you can use to post 140-character messages to the internet. Then, you use Zeppelin to perform real-time analytics on the data created by Tweeter.

Important: Mesosphere does not support this tutorial, associated scripts, or commands, which are provided without warranty of any kind. The purpose of this tutorial is to demonstrate capabilities, and may not be suited for use in a production environment. Before using a similar solution in your environment, you must adapt, validate, and test.


  • Stores tweets in the DC/OS Cassandra service.
  • Streams tweets to the DC/OS Kafka service in real-time.
  • Performs real-time analytics with the DC/OS Spark and Zeppelin services.

This tutorial uses DC/OS to launch and deploy these microservices to your cluster.

  • The Cassandra database is used on the backend to store the Tweeter app data.
  • The Kafka publish-subscribe message service receives tweets from Cassandra and routes them to Zeppelin for real-time analytics.
  • The Marathon load balancer (Marathon-LB) is an HAProxy based load balancer for Marathon only. It is useful when you require external routing or layer 7 load balancing features.
  • Zeppelin is an interactive analytics notebook that works with DC/OS Spark on the backend to enable interactive analytics and visualization. Because it’s possible for Spark and Zeppelin to consume all of your cluster resources, you must specify a maximum number of cores for the Zeppelin service.

This tutorial demonstrates how you can build a complete IoT pipeline on DC/OS in about 15 minutes! You will learn:

  • How to install DC/OS services.
  • How to add apps to DC/OS Marathon.
  • How to route public traffic to the private application with Marathon-LB.
  • How your apps are discovered.
  • How to scale your apps.


Install the DC/OS services you’ll need

From the DC/OS web interface Universe -> Packages tab, install Cassandra, Kafka, and Zeppelin.

Tip: You can also install DC/OS packages from the DC/OS CLI with the dcos package install command.

  1. Find the cassandra package and click the INSTALL PACKAGE button and accept the default installation. Cassandra will spin up to at least 3 nodes.
  2. Find the kafka package and click the INSTALL button and accept the default installation. Kafka will spin up 3 brokers.
  3. Install Zeppelin.
    1. Find the zeppelin package and click the INSTALL button and then choose the ADVANCED INSTALLATION option.
    2. Click the spark tab and set cores_max to 8.
    3. Click REVIEW AND INSTALL and INSTALL to complete your installation.
  4. Install Marathon-LB.
    1. Install the security CLI (dcos-enterprise-cli) by using the DC/OS CLI package install commands. You will use this to partially configure the Marathon-LB security.
      1. Search for the security CLI package repository by using the dcos package search command. In this example the partial value enterprise* is used as an argument.
        dcos package search enterprise*

        Here is the output:

        NAME                 VERSION  SELECTED  FRAMEWORK  DESCRIPTION           
        dcos-enterprise-cli  1.0.7    False     False      Enterprise DC/OS CLI 
      2. Install the security CLI package.
        dcos package install dcos-enterprise-cli

        Here is the output:

        Installing CLI subcommand for package [dcos-enterprise-cli] version [1.0.7]
        New command available: dcos security
    2. Configure service authentication for Marathon-LB.
      1. Create a public-private key pair by using the security CLI.
        dcos security org service-accounts keypair private-key.pem public-key.pem
      2. Create a new service account with the ID marathon-lb-service-acct. This command uses the public-key.pem created in the previous step.
        dcos security org service-accounts create -p public-key.pem -d "Marathon-LB service account" marathon-lb-service-acct
      3. Create a new secret (marathon-lb-secret) using the private key (private-key.pem) and the name of the service account (marathon-lb-service-acct).
        dcos security secrets create-sa-secret private-key.pem marathon-lb-service-acct marathon-lb-secret

        You can verify that the secret was created successfully with this command.

        dcos security secrets list /

        You should see output similar to this:

        - marathon-lb-secret
    3. Assign the Marathon-LB permissions.
      1. Run this command to get the DC/OS certificate for your cluster, where <master-ip> is your master IP address.
        curl -k -v https://<master-ip>/ca/dcos-ca.crt

        You should see output similar to this:

        > GET /ca/dcos-ca.crt HTTP/1.1
        > Host:
        > User-Agent: curl/7.43.0
        > Accept: */*
        < HTTP/1.1 200 OK
        < Server: openresty/
        < Date: Tue, 11 Oct 2016 18:30:49 GMT
        < Content-Type: application/x-x509-ca-cert
        < Content-Length: 1241
        < Last-Modified: Tue, 11 Oct 2016 15:17:28 GMT
        < Connection: keep-alive
        < ETag: "57fd0288-4d9"
        < Accept-Ranges: bytes
        -----BEGIN CERTIFICATE-----
        -----END CERTIFICATE-----
      2. Copy the contents of dcos-ca.crt between -----BEGIN CERTIFICATE----- and -----END CERTIFICATE-----, and save as dcos-cert.pem.

      3. Create the necessary permissions by using the dcos-cert.pem file.

        curl -X PUT --cacert dcos-cert.pem -H "Authorization: token=$(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/acs/api/v1/acls/dcos:service:marathon:marathon:services:%252F -d '{"description":"Allows access to any service launched by the native Marathon instance"}' -H 'Content-Type: application/json'
        curl -X PUT --cacert dcos-cert.pem -H "Authorization: token=$(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/acs/api/v1/acls/dcos:service:marathon:marathon:admin:events -d '{"description":"Allows access to Marathon events"}' -H 'Content-Type: application/json'
      4. Grant the permissions and the allowed action to the service account.
        curl -X PUT --cacert dcos-cert.pem -H "Authorization: token=$(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/acs/api/v1/acls/dcos:service:marathon:marathon:services:%252F/users/marathon-lb-service-acct/read
        curl -X PUT --cacert dcos-cert.pem -H "Authorization: token=$(dcos config show core.dcos_acs_token)" $(dcos config show core.dcos_url)/acs/api/v1/acls/dcos:service:marathon:marathon:admin:events/users/marathon-lb-service-acct/read
    4. Install the Marathon-LB package by using the DC/OS CLI.
      1. Create a config.json Marathon app definition file with these contents. A Marathon app definition file specifies the required parameters for launching a containerized app with Marathon.
            "marathon-lb": {
                "secret_name": "marathon-lb-secret"
      2. Install Marathon-LB from the DC/OS CLI with the config.json file specified.
        dcos package install --options=config.json marathon-lb
  5. Monitor the Services tab to watch as your microservices are deployed on DC/OS. You will see the Health status go from Idle to Unhealthy, and finally to Healthy as the nodes come online. This may take several minutes.

Note: It can take up to 10 minutes for Cassandra to initialize with DC/OS because of race conditions.

Deploy the containerized app

In this step you deploy the containerized Tweeter app to a public node.

  1. Navigate to the Tweeter GitHub repository and save the /tweeter/tweeter.json Marathon app definition file.

  2. Add the HAPROXY_0_VHOST definition with the public IP address of your public agent node to your tweeter.json file.

    Important: You must remove the leading http:// and the trailing /.

      "labels": {
        "HAPROXY_GROUP": "external",
        "HAPROXY_0_VHOST": "<public-agent-IP>"

    In this example, a DC/OS cluster is running on AWS:

      "labels": {
        "HAPROXY_GROUP": "external",
        "HAPROXY_0_VHOST": "joel-ent-publicsl-e7wjol669l9f-741498241.us-west-2.elb.amazonaws.com"
  3. Install and deploy Tweeter to your DC/OS cluster with this CLI command.
    dcos marathon app add tweeter.json

    Tip: The instances parameter in tweeter.json specifies the number of app instances. Use the following command to scale your app up or down:

    dcos marathon app update tweeter instances=<number_of_desired_instances>

    The service talks to Cassandra via node-0.cassandra.mesos:9042, and Kafka via broker-0.kafka.mesos:9557 in this example. Traffic is routed via Marathon-LB because of the HAPROXY_0_VHOST definition in the tweeter.json app definition file.

  4. Go to the Services tab to verify your app is up and healthy.

  5. Navigate to public agent node endpoint to see the Tweeter UI and post a tweet!

    Tip: If you’re having trouble, verify the HAPROXY_0_VHOST value in the tweeter.json file.

Post 100K Tweets

Deploy the post-tweets containerized app to see DC/OS load balancing in action. This app automatically posts a large number of tweets from Shakespeare. The app will post more than 100k tweets one by one, so you’ll see them coming in steadily when you refresh the page.

  1. Navigate to the Tweeter GitHub repository and save the tweeter/post-tweets.json Marathon app definition file.

  2. Deploy the post-tweets.json Marathon app definition file.

    dcos marathon app add post-tweets.json
  3. After the post-tweets.json is running:
    • Refresh your browser to see the incoming Shakespeare tweets.

    • Click the Networking -> Service Addresses tab in the DC/OS web interface and select the virtual network to see the load balancing in action.

The post-tweets app works by streaming to the VIP This address is declared in the cmd parameter of the post-tweets.json app definition.

  "id": "/post-tweets",
  "cmd": "bin/tweet shakespeare-tweets.json",

The Tweeter app uses the service discovery and load balancer service that is installed on every DC/OS node. This address is defined in the tweeter.json definition VIP_0.

  "containerPort": 3000,
  "hostPort": 0,
  "servicePort": 10000,
  "labels": {
    "VIP_0": ""

Add Streaming Analytics

Next, you’ll perform real-time analytics on the stream of tweets coming in from Kafka.

  1. Navigate to Zeppelin at https://<master_ip>/service/zeppelin/, click Import Note and import tweeter-analytics.json. Zeppelin is preconfigured to execute Spark jobs on the DC/OS cluster, so there is no further configuration or setup required. Be sure to use https://, not http://.

    Tip: Your master IP address is the URL of the DC/OS web interface.

  2. Navigate to Notebook > Tweeter Analytics.

  3. Run the Load Dependencies step to load the required libraries into Zeppelin.

  4. Run the Spark Streaming step, which reads the tweet stream from ZooKeeper and puts them into a temporary table that can be queried using SparkSQL.

  5. Run the Top Tweeters SQL query, which counts the number of tweets per user using the table created in the previous step. The table updates continuously as new tweets come in, so re-running the query will produce a different result every time.