Chapter 2. SmartSense Architecture

The Hortonworks SmartSense Tool (HST):

  1. Collects cluster diagnostic information to help you troubleshoot support cases.

  2. Automatically captures and uploads bundles that are used to produce customized recommendations for your cluster on areas of improvement, such as performance, operational stability, and security.

  3. Allows you to automatically apply recommendations (where possible).

  4. Reports, analyzes, and visualizes cluster activity.

SmartSense is automatically included in Ambari 2.2.x and later. The integration between Ambari and SmartSense is facilitated by the Ambari stack and views extension mechanisms. These extensions enable you to add SmartSense as a native Ambari service, and they automatically deploy an Ambari view, enabling you to quickly capture data using the Ambari web UI.

Cluster Diagnostic Collection

The HST agents capture, anonymize, and encrypt cluster diagnostic data, and then send it to the central HST server, which coalesces it into a single downloadable file called a bundle. The HST agent processes are short-lived services that are started only for specific data capture tasks. To provide the most complete picture of cluster utilization, HST agents must be installed on every node in the cluster. After an HST agent has captured the requested data from the host it is installed on, the process exits.

The following image illustrates the communication between HST agents and the HST server:

SmartSense anonymizes and encrypts the diagnostic information captured in the bundle. For more information about extending the anonymization process with site-specific rules, see Configure Anonymization Rules with Ambari or Configure Anonymization Rules in a Non-Ambari Environment.

There are two types of bundles: one for ad-hoc troubleshooting of support cases, and the other for proactive analysis and recommendations.

Support Case Troubleshooting Bundles

Bundles captured for troubleshooting contain configuration and metrics for each node in the cluster, and logs for only the subset of services and hosts that you chose before initiating the capture process. Additionally, they may contain application logs if collection is for a YARN application or a Hive query. The purpose of these bundles is to provide support engineers with basic diagnostic information that can help them understand the state of your cluster so that they can troubleshoot and quickly resolve issues.

Proactive Analysis Bundles

Bundles captured for analysis contain configuration and metrics for each node in the cluster, but do not contain any logs. Their purpose is to produce recommendations for changing your cluster configuration to ensure better security, performance, and operations. These recommendations are available in the SmartSense view in the Ambari web UI and in the SmartSense tab on the Hortonworks Support Portal.
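The difference between the two bundle types can be summarized as a small data model. The following Python sketch is illustrative only: the `BundleSpec` class, its field names, and the `describe` helper are hypothetical and are not part of any SmartSense API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleSpec:
    """Hypothetical summary of what a bundle type contains."""
    name: str
    include_config: bool        # configuration for each node
    include_metrics: bool       # metrics for each node
    include_logs: bool          # logs for the chosen services and hosts only
    may_include_app_logs: bool  # YARN application or Hive query captures


# Both types carry configuration and metrics; only troubleshooting
# bundles carry logs.
TROUBLESHOOTING = BundleSpec(
    name="support-case-troubleshooting",
    include_config=True,
    include_metrics=True,
    include_logs=True,
    may_include_app_logs=True,
)
ANALYSIS = BundleSpec(
    name="proactive-analysis",
    include_config=True,
    include_metrics=True,
    include_logs=False,
    may_include_app_logs=False,
)


def describe(spec: BundleSpec) -> str:
    """Return a one-line summary of the bundle's contents."""
    contents = []
    if spec.include_config:
        contents.append("configuration")
    if spec.include_metrics:
        contents.append("metrics")
    if spec.include_logs:
        contents.append("logs")
    if spec.may_include_app_logs:
        contents.append("application logs (optional)")
    return spec.name + ": " + ", ".join(contents)
```

Calling `describe(ANALYSIS)` lists only configuration and metrics, which mirrors the statement that analysis bundles never contain logs.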

For more information about bundles, see Bundle Content and Bundle Security.

Bundle Content

SmartSense collects the following types of data:

  • Operating system:

    • Configuration (partition layouts, file system mount options, key service status, network configurations, and so on)

    • Metrics (CPU, memory, I/O statistics, network statistics, and so on)

    • Logs (system messages and driver messages)

  • Hortonworks Data Platform (HDP) service:

    • Configuration

    • Metrics (JMX reports and installed packages)

    • Logs (only for support case troubleshooting, not for SmartSense analysis)

    • Summary of cluster activity

When using SmartSense to capture support case troubleshooting bundles for issues with YARN applications or Hive queries, SmartSense captures additional data.

YARN application capture:

  • Job configuration

  • Job counters

  • Job recommendations

  • Job summary

  • Job logs

  • Task counters

  • Task summary

Hive Query capture:

  • Query plan

  • Explain plan

  • set -v output

  • HS2 HA znode info

  • Hive operations log

  • YARN logs

To see data and files that are captured in your specific environment, perform a capture and then download the unencrypted bundle. To see a step-by-step example of how to do this, refer to the How to inspect SmartSense bundle contents Hortonworks Community Connection post. If any files contain information that you would like to remove, replace, or anonymize, refer to Configuring Data Anonymization Rules.
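Once you have downloaded an unencrypted bundle, a short script can list what it contains. The following Python sketch assumes that the bundle is a tar archive (optionally gzip-compressed); that format and the `list_bundle_files` helper name are assumptions made for illustration, so check them against the bundle in your environment.

```python
import tarfile
from typing import List


def list_bundle_files(bundle_path: str) -> List[str]:
    """Return the sorted names of regular files in a tar archive.

    Assumption: the unencrypted bundle is a tar archive. The mode
    "r:*" lets tarfile detect the compression automatically.
    """
    with tarfile.open(bundle_path, "r:*") as archive:
        return sorted(
            member.name for member in archive.getmembers() if member.isfile()
        )
```

For example, `for name in list_bundle_files("bundle.tgz"): print(name)` prints one captured file per line, where `bundle.tgz` is a placeholder for the path of your downloaded bundle.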

Bundle Security

Hortonworks takes security seriously. Multiple levels of provisions ensure that sensitive data is protected:

  • Anonymization and exclusions:

    • IP addresses and host names are always anonymized.

    • Passwords are not collected.

  • Encryption:

    • SmartSense analysis bundles are encrypted using AES-256 and RSA-2048.

  • Further customizations:

    • You can configure custom anonymization rules to include environment-specific patterns.

    • By default, all IP addresses, the domain component of fully qualified domain names, and S3 and WASB access keys are anonymized.

    • You can add custom configuration to exclude files from collection.
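To make the idea of pattern-based anonymization concrete, the following Python sketch masks IPv4 addresses and the domain component of host names. It is not SmartSense's implementation or rule syntax: the `anonymize` function, the token format, and the use of a truncated SHA-256 digest are assumptions chosen for illustration, and a truncated hash of an IPv4 address is not a strong anonymization scheme on its own.

```python
import hashlib
import re
from typing import List

# Simple IPv4 matcher; it does not validate that each octet is <= 255.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _token(value: str, prefix: str) -> str:
    """Map a value to a stable, illustrative replacement token."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return prefix + "-" + digest


def anonymize(text: str, domains: List[str]) -> str:
    """Mask IPv4 addresses and the given domains in text.

    The same input value always maps to the same token, so repeated
    occurrences can still be correlated after masking.
    """
    text = IPV4.sub(lambda m: _token(m.group(0), "ip"), text)
    for domain in domains:
        # Keep the short host name and replace only the domain component.
        pattern = re.compile(
            r"\b([A-Za-z0-9-]+)\." + re.escape(domain) + r"\b"
        )
        replacement = _token(domain, "domain")
        text = pattern.sub(lambda m, r=replacement: m.group(1) + "." + r, text)
    return text
```

For example, `anonymize("node1.example.com at 10.0.0.5", ["example.com"])` keeps the short host name `node1` but replaces both the domain and the address with tokens.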

Bundles sent to the Hortonworks SmartSense analysis environment are stored in their original anonymized and encrypted form for 90 days before being removed. Specific metadata, such as Apache Ambari and HDP stack version, node count, and amount of storage available and used, are stored for trending rules analysis. Recommendations generated for each bundle are available through the Hortonworks Support Portal and are stored for feedback purposes and used to improve future recommendations.

Services Available for Capture

The following services can be captured for troubleshooting:

  • Accumulo

  • Ambari

  • Ambari Infra

  • Ambari Metrics

  • Atlas

  • Cloudbreak

  • Druid

  • Falcon

  • HBase

  • HDFS

  • Hive

  • Kafka

  • Knox

  • Log Search

  • MapReduce2

  • NiFi

    Capturing NiFi is supported only when NiFi is installed as part of HDP.

  • Oozie

  • Pig

  • Ranger

  • SmartSense

  • Spark

  • Spark2

  • Sqoop

  • Storm

  • Superset

  • Tez

  • YARN

  • Zeppelin

  • Zeppelin Notebook

  • ZooKeeper

Bundle Transport

After a bundle has been captured, there are three ways to upload it to Hortonworks: automatically from the HST server, automatically through a SmartSense Gateway, or manually.

Automated Bundle Upload

Depending on the availability of outbound internet access, you have two choices for automated bundle upload. If the HST server host has outbound internet access, you can configure it to automatically upload captured bundles to Hortonworks. In this case, bundles are uploaded automatically over HTTPS from the HST server to the externally hosted Hortonworks SmartSense environment. If the HST server does not have outbound internet access, you can deploy a standalone SmartSense Gateway to forward bundles to the hosted Hortonworks SmartSense environment.

HST Server

After a bundle has been captured, the HST server by default attempts to upload it to the Hortonworks hosted environment over HTTPS. This upload succeeds if your HST server host has outbound internet access. If it does not, you have two options: configure the HST server to upload bundles through a corporate HTTP proxy, if one is available to the host, as described in Configuring Bundle Upload; or use the SmartSense Gateway.

The following image illustrates bundle upload using the HST server:

SmartSense Gateway

For those whose HST server hosts do not have outbound internet access, Hortonworks created the SmartSense Gateway, which simplifies uploading bundles to Hortonworks. You can deploy a single gateway that supports multiple internal HST server deployments. In this deployment scenario, you do not need direct outbound internet access from the HST server to upload bundles. You need access only from the HST server to the gateway, and the gateway uploads all bundles to Hortonworks Support or to the SmartSense environment for SmartSense analysis.

The following image illustrates bundle upload using the SmartSense Gateway:

Manual Bundle Upload

If you are just getting started with SmartSense, you might still be waiting on your security or network operations resources to provide the necessary access for the HST server or the SmartSense Gateway to send bundles. If you are in this situation, you can manually upload bundles via HTTPS.

After a bundle has been captured, you can go to the SmartSense view in Ambari and download the bundle to your desktop. You can then navigate to and log in using the credentials and steps specified in the following article: (To view this article, you need a valid Hortonworks support account).
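The transport options described in this section can be read as a simple decision based on the network access available to the HST server host. The following Python sketch is a planning aid only: the `UploadPath` enum and the `choose_upload_path` function are hypothetical names, and the order of preference between the proxy and the gateway is an assumption, because either option is valid when the host has no direct outbound access.

```python
from enum import Enum


class UploadPath(Enum):
    DIRECT_HTTPS = "HST server uploads directly over HTTPS"
    HTTP_PROXY = "HST server uploads through a corporate HTTP proxy"
    GATEWAY = "HST server forwards bundles to a SmartSense Gateway"
    MANUAL = "download the bundle and upload it manually over HTTPS"


def choose_upload_path(
    outbound_internet: bool,
    proxy_available: bool,
    gateway_reachable: bool,
) -> UploadPath:
    """Pick a bundle transport option for an HST server host."""
    if outbound_internet:
        return UploadPath.DIRECT_HTTPS
    # Assumed preference: try a proxy before deploying a gateway.
    if proxy_available:
        return UploadPath.HTTP_PROXY
    if gateway_reachable:
        return UploadPath.GATEWAY
    # No automated path is available yet, so fall back to manual upload.
    return UploadPath.MANUAL
```

For example, `choose_upload_path(False, False, True)` returns `UploadPath.GATEWAY`, which corresponds to a host that can reach only an internal SmartSense Gateway.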

Activity Analysis

Activity Analyzer and Activity Explorer provide job utilization metric aggregation, reporting, and visualization for YARN-based workloads.

Activity Analyzer

Activity Analyzer communicates with YARN Application Timeline Server v1.5 and later, and with Hadoop Distributed File System (HDFS) to consume MapReduce history data. It aggregates and transforms this data, and stores it in the Ambari Metrics Collector.

Activity Explorer

Activity Explorer includes an embedded instance of Apache Zeppelin, which hosts prebuilt notebooks that visualize cluster utilization data for YARN, Apache Hive or Apache Tez, and MapReduce workloads. Specifically, Activity Explorer includes data related to user, queue, job duration, and job resource consumption.
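As a rough illustration of the kind of aggregation involved, the following Python sketch totals job count, duration, and resource consumption by user or by queue. It does not reflect Activity Analyzer's actual schema or code: the record fields (`user`, `queue`, `duration_s`, `memory_mb_s`) and the `aggregate_by` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def aggregate_by(jobs: Iterable[dict], key: str) -> Dict[str, dict]:
    """Total job count, duration, and memory use per value of key.

    Each job is a dict with hypothetical fields: "user", "queue",
    "duration_s" (seconds), and "memory_mb_s" (megabyte-seconds).
    """
    totals = defaultdict(
        lambda: {"jobs": 0, "duration_s": 0, "memory_mb_s": 0}
    )
    for job in jobs:
        bucket = totals[job[key]]
        bucket["jobs"] += 1
        bucket["duration_s"] += job["duration_s"]
        bucket["memory_mb_s"] += job["memory_mb_s"]
    return dict(totals)


# Made-up sample records, for illustration only.
SAMPLE_JOBS = [
    {"user": "alice", "queue": "etl", "duration_s": 120, "memory_mb_s": 4096},
    {"user": "alice", "queue": "adhoc", "duration_s": 30, "memory_mb_s": 1024},
    {"user": "bob", "queue": "etl", "duration_s": 60, "memory_mb_s": 2048},
]
```

Calling `aggregate_by(SAMPLE_JOBS, "user")` groups the sample records per user, and `aggregate_by(SAMPLE_JOBS, "queue")` groups the same records per queue.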

The following image illustrates how Activity Analyzer sends aggregated job history data to Ambari Metrics Collector, which makes that data available to Activity Explorer: