Release Notes for DC/OS 1.9.9

DC/OS 1.9.9 was released on July 10, 2018.

DC/OS 1.9.9 includes the following:

Issues Fixed in DC/OS 1.9.9

  • COPS-2041/DCOS-21467 - DC/OS UI: Fixed labels reducer to accept empty strings.
  • COPS-3180 - Fixed odd exit behavior for tasks that use a volume mount (via rexray). Enterprise
  • DCOS-14199 - Consolidated the Exhibitor bootstrapping shortcut by atomically reading and writing the ZooKeeper PID file. Enterprise
  • DCOS-22171/DCOS-22184 - The Admin Router DNS entry TTL is now overridden to 5 seconds. This fixes an issue where the leader.mesos entry in the Admin Router Mesos state cache was not always updated within the expected 30 seconds. The faulty behavior delayed Admin Router's recognition of a Mesos leader change.
  • DCOS-22399/DCOS-22346 - DC/OS UI: Added privacy policy link to the login modal.
  • DCOS_OSS-847 - Fixed an issue where a custom ip-detect-public script, supplied through ip_detect_public_filename in config.yaml, was not respected during DC/OS installation.

Security Enhancements in DC/OS 1.9.9

  • DCOS-21557/DCOS_OSS-2367 - Updated cURL to version 7.59.

About DC/OS 1.9

DC/OS 1.9 includes many new capabilities for Operators, and expands the collection of Data and Developer Services with a focus on:

  • Tools for Production Operations - Monitoring and troubleshooting for distributed apps.
  • Broader Workload Support - From traditional apps to machine learning.
  • Security - New CLI capabilities, enhanced LDAP support, and many small improvements. Enterprise
  • New data and developer services.

Breaking Changes

The DC/OS Identity and Access Management (IAM) SAML service provider implementation no longer accepts transient subject NameIDs.

New Features and Capabilities

Apache Mesos 1.2.2 and Marathon 1.4.8 integrated

  • Marathon 1.4.8 release notes.
  • Apache Mesos 1.2.2 CHANGELOG. Patches from the forthcoming Apache Mesos 1.2.3 are included.

Container Orchestration

Added support for pods and GPUs, and made significant scalability improvements.

Pods Preview

Multiple co-located containers per instance, scheduled on the same host. For more information, see the documentation.

GPU Preview

  • Leverage GPUs to run novel algorithms.
  • Because DC/OS GPU support is compatible with nvidia-docker, you can test locally with nvidia-docker and then deploy to production with DC/OS.
  • Allocate GPUs on a per-container basis, including isolation guarantees.

For more information, see the documentation.

DC/OS Monitoring and Operations

Remote Process Injection for Debugging Preview

The new dcos task exec command allows you to remotely execute a process inside the container of a deployed Mesos task. It provides the following features:

  • An optional --interactive flag for interactive sessions.
  • Attach to a remote pseudoterminal (aka PTY) inside a container via the optional --tty flag.
  • Combine the --interactive and --tty flags to launch an interactive bash session or to run top and see the resource usage of your container in real time.

For more information, see the debugging documentation.
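The way the two flags combine can be sketched as a small helper that assembles the command line. This is only an illustration: the helper name build_task_exec_argv and the task ID "my-task-id" are hypothetical, and the sketch prints the command rather than running it, so it makes no assumptions about a live cluster.

```python
import shlex


def build_task_exec_argv(task_id, command, interactive=False, tty=False):
    """Assemble the argument list for a `dcos task exec` invocation.

    Only the --interactive and --tty flags named in these notes are used.
    """
    argv = ["dcos", "task", "exec"]
    if interactive:
        argv.append("--interactive")
    if tty:
        argv.append("--tty")
    argv.append(task_id)
    argv.extend(command)
    return argv


# "my-task-id" is a hypothetical placeholder, not a real task ID.
argv = build_task_exec_argv("my-task-id", ["bash"], interactive=True, tty=True)
print(shlex.join(argv))
# prints: dcos task exec --interactive --tty my-task-id bash
```

Passing the resulting list to subprocess.run on a machine with an authenticated DC/OS CLI would start the session; with both flags set, the same pattern applies to running top instead of bash.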

Logging Preview

Stream task and system logs to journald by setting the mesos_container_log_sink install-time parameter to journald or journald+logrotate. This allows you to do the following tasks:

  • Include task metadata like container ID in your queries to more easily locate the logs that you want.
  • Use the new DC/OS CLI commands dcos node log and dcos task log to query logs. You can also make HTTP requests directly against the new Logging API.
  • Set up log aggregation solutions such as Logstash to ship logs into aggregated storage.

For more information, see the documentation.

Metrics Preview

  • Node-level HTTP API that returns metrics from tasks, cgroup allocations per container, and host level metrics such as load and memory allocation.
  • StatsD endpoint in every container for forwarding metrics to the DC/OS metrics service. This service is what exposes the HTTP API.
  • Any metric sent to STATSD_UDP_HOST/PORT is available in the HTTP API’s /container/<container_id>/app endpoint.

For more information, see the documentation.
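As a rough illustration of the StatsD path described above, the following sketch sends a single metric from inside a container. It assumes the endpoint is exposed through environment variables named STATSD_UDP_HOST and STATSD_UDP_PORT and that the plain StatsD line format (name:value|type) is accepted; the function names and the metric name my_app.requests are hypothetical.

```python
import os
import socket


def format_statsd(name, value, metric_type="g"):
    """Encode one metric in the plain StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name, value, metric_type="g"):
    """Send one metric over UDP to the endpoint named by the environment.

    Returns False without sending if the (assumed) STATSD_UDP_HOST or
    STATSD_UDP_PORT variable is not set, for example outside a container.
    """
    host = os.environ.get("STATSD_UDP_HOST")
    port = os.environ.get("STATSD_UDP_PORT")
    if not host or not port:
        return False
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_statsd(name, value, metric_type), (host, int(port)))
    return True


# Hypothetical counter metric; "c" is the StatsD counter type.
sent = send_metric("my_app.requests", 1, "c")
print("metric sent" if sent else "StatsD endpoint not configured")
```

UDP is fire-and-forget, so a True return value only means the datagram was handed to the network stack, not that the metrics service received it.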

Tool for Troubleshooting Service Deployment Failures

  • The new service deployment troubleshooting tool allows you to find out, from the GUI and the CLI, why your applications are not starting.

    Figure: Service deploy GUI

Improved GUI

  • Improved navigation.

    Figure: New GUI

  • Usability improvements to the service create workflow.

    Figure: Improved GUI

Networking Services

  • Support for third-party CNI plugins.
  • Performance improvements across all networking features.

Security and Governance Enterprise

  • DC/OS Identity and Access Management (IAM) highlights: Enterprise

    • LDAP group import: Support importing posixGroup objects according to RFC2307 and RFC2307bis, and ensure compatibility with FreeIPA and OpenLDAP. Enterprise
    • SAML 2.0: Ensure that the authentication flow works against Shibboleth and improve compatibility with a wide range of identity provider configurations. Enterprise
    • OpenID Connect: Ensure that the authentication flow works against dex and Azure Active Directory. Allow customizing the identity provider certificate verification in back-channel communication. Enhance configuration validation for a better user experience. Enterprise
  • DC/OS CLI highlights: Enterprise

    • Support single sign-on authentication via OpenID Connect and SAML 2.0 against the DC/OS IAM. Enterprise
    • Support authentication with service account credentials. Enterprise
  • Introduce various secrets improvements. For more information, see the secrets documentation. Enterprise

  • Security hardening across the platform, including Mesos, Marathon, and Admin Router. Enterprise

Developer Services

  • Jenkins

    • The Jenkins DC/OS service now works with DC/OS clusters in strict mode. Enterprise
    • The Marathon plugin now supports service accounts, allowing easy, automated, and secure deployments to DC/OS clusters. Enterprise

Other Improvements

DC/OS Internals

  • Updated DC/OS internal JDK to 8u112 for security fixes.
  • Updated DC/OS internal Python from 3.4 to 3.5. Enterprise
  • The dcos_generate_config.sh --aws-cloudformation command now automatically determines the region of the S3 bucket, preventing region mistakes.
  • Added dcos-shell which activates the DC/OS environment for running other DC/OS command line tools. Enterprise
  • Added the reset-superuser script which attempts to create or restore superuser privileges for a given DC/OS user. Enterprise

Expanded OS Support Enterprise

Expanded Docker Engine Support Enterprise

  • Docker 1.12 and 1.13 are now supported. Docker 1.13 is the default version.

Upgrades Enterprise

Improved upgrade tooling and experience for on-premises installations. Upgrades now use internal DC/OS APIs to ensure that nodes can be upgraded with minimal disruption to the DC/OS services running on them. The upgrade procedure has also been simplified to improve the user experience.

For more information, see the documentation.

Known Issues and Limitations

  • DCOS_OSS-691 - DNS becomes unavailable during DC/OS version upgrades.

  • DCOS-14005 - Marathon-LB does not support pods.

  • DCOS-14021 - Task logging to journald is disabled by default. Task logs will continue to be written to their sandboxes, and logrotated out. The dcos task log command is an active command.

  • DCOS-16737 - You cannot generate and publish AWS Advanced Templates to AWS GovCloud regions. Enterprise The following error occurs when you run the command dcos_generate_config.ee.sh --aws-cloudformation with GovCloud credentials:

    $ ./dcos_generate_config.sh --aws-cloudformation
    ====> EXECUTING AWS CLOUD FORMATION TEMPLATE GENERATION
    Generating configuration files...
    Starting new HTTPS connection (1): s3.amazonaws.com
    aws_template_storage_region_name: Unable to determine region location of s3 bucket testbucket: An error occurred (InvalidAccessKeyId) when calling the GetBucketLocation operation: The AWS Access Key Id you provided does not exist in our records.
    
  • Marathon-7133 - Marathon application history is lost after Marathon restart.

  • CORE-1191 - The Mesos master’s event queue can get backlogged with the default settings, causing performance problems. This can be mitigated by setting the following configuration parameter in your config.yaml file at install time. See the Configuration Reference for more information.

    mesos_max_completed_tasks_per_framework: 20

    Note: Lowering this parameter also reduces the number of tasks per framework that the dcos task subcommands can access for debugging. If you run a framework with many short tasks, such as Spark, you may not want to reduce this value.