Overview

Streaming Analytics Manager Personas

Four main modules within Streaming Analytics Manager offer services to different personas in an organization:

User Persona | Module | Module Features and Functionality

IT Engineer, Operations Engineer, Platform Engineer, Platform Operator | Stream Management
  • Create service pools and environments.

  • Provision, manage, and monitor stream apps.

  • Scale out or scale in stream apps based on resource consumption.

Application Developer | Stream Builder
  • The Stream Builder tool assists in building analytic-focused stream apps.

  • The tool creates streams for event correlation, context enrichment, complex pattern matching, and aggregation. It can create alerts and notifications when patterns are detected and insights are discovered.

  • The interface uses a drag-and-drop visual programming paradigm.

Business Analyst, Data Analyst | Stream Insight Superset
  • The Stream Insight tool assists in generating time-series and real-time analytics dashboards, charts, and graphs of metrics, alerts, and notifications.

  • The tool provides interactive, ad-hoc analytics. You can issue ad-hoc queries, perform multidimensional analyses, and visualize the results in rich configurable dashboards.

  • The tool offers a self-service ability to create alerts and notification dashboards based on insights derived from the real-time streaming data flows.

SDK Developer | Unified Streaming API
  • The unified streaming API abstracts out the underlying streaming engine, making it more straightforward to implement custom components. Initial support is for Storm.

The following subsections describe responsibilities for each persona. For additional information, see the following chapters in this guide:

Persona | Chapter Reference
IT Engineer, Operations Engineer, Platform Engineer, Platform Operator | Installing and Configuring Streaming Analytics Manager; Managing Stream Apps
Application Developer | Running the Sample App; Building an End-to-End Stream App
Business Analyst, Data Analyst | Creating Visualizations: Insight Slices
SDK Developer | Adding Custom Builder Components

Platform Operator Persona

A platform operator typically manages the Streaming Analytics Manager platform, and provisions various services and resources for the application development team. Common responsibilities of a platform operator include:

  • Installing and managing the Streaming Analytics Manager application platform.

  • Provisioning and providing access to services (for example, big data services such as Kafka, Storm, HDFS, and HBase) for use by the development team when building stream apps.

  • Provisioning and providing access to environments such as development, testing, and production, for use by the development team when provisioning stream apps.

Services, Service Pools and Environments

To perform these responsibilities, a platform operator works with three important abstractions in Streaming Analytics Manager:

  • Service is an entity that an application developer works with to build stream apps. Examples of services include a Storm cluster that the stream app is deployed to, a Kafka cluster that the stream app uses to create streams, or an HBase cluster that the stream app writes to.

  • Service Pool is a set of services associated with an Ambari-managed cluster.

  • Environment is a named entity that represents a set of services chosen from different service pools. A stream app is assigned to an environment, and the app can use only the services associated with that environment.

The following diagram illustrates these constructs:

The Service, Service Pool, and Environment abstractions provide the following benefits:

  1. Simplicity and ease of use: An application developer can use the Service abstraction without needing to focus on configuration details. For example, to deploy a stream app to a Storm cluster, the developer does not need to consider how to configure the Storm cluster (Nimbus host, ports, and so on). Instead, the developer simply selects the Storm service from the environment associated with the app. The service abstracts away the configuration details and complexity.

  2. Ease of propagating a stream app between environments: With Service as an abstraction, it is easy for the stream operator or application developer to move a stream app from one environment to another. They simply export the stream app and import it into a different environment.
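The relationship between services, service pools, and environments, and the two benefits listed above, can be sketched with a small in-memory model. This is an illustration only, not the Streaming Analytics Manager API: the class names, the `deploy_target` helper, and the host names and ports are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """A service plus its connection details (hypothetical fields)."""
    name: str      # for example "STORM", "KAFKA", or "HBASE"
    config: dict   # details the application developer never has to handle


@dataclass
class ServicePool:
    """A set of services associated with one Ambari-managed cluster."""
    cluster: str
    services: dict = field(default_factory=dict)  # service name -> Service

    def add(self, service):
        self.services[service.name] = service


@dataclass
class Environment:
    """A named set of services chosen from one or more service pools."""
    name: str
    services: dict = field(default_factory=dict)

    def select(self, pool, service_name):
        self.services[service_name] = pool.services[service_name]

    def resolve(self, service_name):
        # An app can use only the services that belong to its environment.
        if service_name not in self.services:
            raise LookupError(f"{service_name} is not part of environment {self.name}")
        return self.services[service_name]


def deploy_target(app_services, env):
    """Resolve the services an app names against the given environment."""
    return {name: env.resolve(name).config for name in app_services}


# Two pools (made-up clusters), each exposing its own Storm service.
dev_pool = ServicePool("dev-cluster")
dev_pool.add(Service("STORM", {"nimbus.host": "dev-nimbus", "nimbus.port": 6627}))
prod_pool = ServicePool("prod-cluster")
prod_pool.add(Service("STORM", {"nimbus.host": "prod-nimbus", "nimbus.port": 6627}))

dev = Environment("Development")
dev.select(dev_pool, "STORM")
prod = Environment("Production")
prod.select(prod_pool, "STORM")

# The app only names the service; the environment supplies the details.
app_services = ["STORM"]
print(deploy_target(app_services, dev))
print(deploy_target(app_services, prod))
```

Because the app refers to services by name only, moving it from the development environment to the production environment in this sketch changes nothing in the app definition itself; only the environment passed to `deploy_target` differs.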

More Information

See Managing Stream Apps for more information about creating and managing the Streaming Analytics Manager environment.

Application Developer Persona

The application developer uses the Stream Builder component to design, implement, deploy, and debug stream apps in Streaming Analytics Manager.

The following subsections describe component building blocks and schema requirements.

Component Building Blocks

Stream Builder offers several building blocks for stream apps: sources, processors, sinks, and custom components.

Sources

Source builder components are used to create data streams. Streaming Analytics Manager supports the following sources:

  • Kafka

  • Azure Event Hub

  • HDFS

Processors

Processor builder components are used to manipulate events in the stream.

The following table lists processors that are available with Streaming Analytics Manager.

Processor Name | Description
Join
  • Joins two streams together based on a field from each stream.

  • Two join types are supported: inner and left.

  • Joins are based on a window that you can configure based on time or count.

Rule
  • Allows you to configure rule conditions that route events to different streams.

  • Standard conditional operators are supported for rules.

  • Configuring a rule has two modes:

    • General: Guided rule creation using drop-down menus.

    • Advanced: Write complex SQL to construct a rule.

  • Rules are translated to SQL to be applied on the stream.

  • An event is evaluated against all of the conditions. If it matches multiple rules, the event is sent to all of the matching output streams.

Aggregate
  • Performs functions over windows of events.

  • Two types of windows are supported: tumbling and sliding.

  • You can create window criteria based on time interval and count.

  • Window functions supported out of the box include stddev, stddevp, variance, variancep, avg, min, max, sum, and count. The system is extensible, so you can also add custom functions.

Projection
  • Applies transformations to the events in the stream.

  • Provides an extensive set of out-of-the-box functions and the ability to add your own functions.

Branch
  • Performs a standard if-else construct for routing.

  • The event is routed to the first rule it matches. Once an event has matched a rule, no further condition search is performed.

PMML
  • Executes a PMML model that is stored in the Model Registry. PMML has been minimally tested as part of the Tech Preview, and should not be used.
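To make the routing difference between the Rule and Branch processors in the table above concrete, the following minimal sketch models both behaviors in plain Python. It does not use any Streaming Analytics Manager API; the function names, stream names, and event fields are hypothetical.

```python
def route_rule(event, rules):
    """Rule semantics: check every condition and return all matching streams."""
    return [stream for stream, condition in rules if condition(event)]


def route_branch(event, rules):
    """Branch semantics: return only the first matching stream (if-else)."""
    for stream, condition in rules:
        if condition(event):
            return [stream]
    return []


# Ordered (output stream, condition) pairs; both names are made up.
rules = [
    ("speeding", lambda e: e["speed"] > 80),
    ("very_fast", lambda e: e["speed"] > 100),
]
event = {"driver": "d-1", "speed": 110}

print(route_rule(event, rules))    # ['speeding', 'very_fast']
print(route_branch(event, rules))  # ['speeding']
```

With Branch semantics the order of the conditions matters, because evaluation stops at the first match. With Rule semantics the order affects only the order of the output list.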

Sinks

Sink builder components are used to send events to other systems.

Streaming Analytics Manager supports the following sinks:

  • Kafka

  • Druid

  • HDFS

  • HBase

  • Hive

  • JDBC

  • OpenTSDB

  • Notification (out-of-the-box support for Kafka, plus the ability to add custom notifications)

  • Cassandra

  • Solr

Custom Components

For more information about developing custom components, see SDK Developer Persona.

Schema Requirements

Unlike NiFi (the flow management service of the HDF platform), Streaming Analytics Manager requires a schema for stream apps. More specifically, every Builder component requires a schema to function.

The primary data stream source is Kafka, which uses the HDF Schema Registry.

The Builder component for Apache Kafka is integrated with the Schema Registry. When you configure a Kafka source and supply a Kafka topic, Streaming Analytics Manager calls the Schema Registry. Using the Kafka topic as the key, Streaming Analytics Manager retrieves the schema. This schema is then displayed on the tile component, and is passed to downstream components.
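The lookup described above, where the Kafka topic name serves as the key for retrieving a schema that is then handed to downstream components, can be sketched as follows. This is a simplified in-memory stand-in, not the Schema Registry client API: `InMemorySchemaRegistry`, `configure_kafka_source`, `connect_downstream`, the topic name, and the field names are all hypothetical.

```python
class InMemorySchemaRegistry:
    """Stand-in for a schema registry: maps a key to a list of fields."""

    def __init__(self):
        self._schemas = {}

    def register(self, key, fields):
        self._schemas[key] = list(fields)

    def get_schema(self, key):
        if key not in self._schemas:
            raise KeyError(f"no schema registered for key {key!r}")
        return self._schemas[key]


def configure_kafka_source(registry, topic):
    """Retrieve the schema using the topic name as the key."""
    return {
        "type": "KAFKA",
        "topic": topic,
        "output_schema": registry.get_schema(topic),
    }


def connect_downstream(source, component_name):
    """A downstream component receives the upstream schema as its input."""
    return {"name": component_name, "input_schema": source["output_schema"]}


registry = InMemorySchemaRegistry()
registry.register("truck_events", [("driverId", "long"), ("speed", "int")])

source = configure_kafka_source(registry, "truck_events")
rule = connect_downstream(source, "speed-rule")
print(rule["input_schema"])
```

In this sketch, configuring a source for a topic that has no registered schema fails immediately, which mirrors the requirement that every Builder component needs a schema to function.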

Analyst Persona

A business analyst uses the Streaming Analytics Manager Stream Insight module to create time-series and real-time analytics dashboards, charts, and graphs, and to create rich, customizable visualizations of data.

Stream Insight Key Concepts

The following table describes key concepts of the Stream Insight module.

Stream Insight Concept | Description
Analytics Engine
  • The Stream Insight analytics engine is powered by Druid, an open source data store designed for OLAP queries on event data.

  • Data can be streamed into the analytics engine through the Druid/Analytics Engine sink, which app developers can use when building stream apps. The analytics engine sink can stream data into new or existing insight cubes.

Insight Data Source
  • An insight data source is a cube, powered by Druid, that represents the store for streaming data. The cube can be queried to perform rollups, aggregations, and other analytics.

Insight Slice
  • A visualization created by asking questions of the data source. An insight slice can be added to a dashboard.

Dashboard
  • Consists of a set of slices. Business analysts create dashboards to perform descriptive analytics.
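As an illustration of the kind of rollup that an insight data source supports, the following sketch groups events by time bucket and by one dimension, then keeps a count and a sum per group. It is plain Python rather than a Druid query, and the `rollup` function, the event fields, and the sample values are hypothetical.

```python
from collections import defaultdict


def rollup(events, granularity_s, dimension, metric):
    """Aggregate events per (time bucket, dimension value)."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for e in events:
        # Truncate the timestamp (in seconds) to the start of its bucket.
        bucket_start = e["ts"] - e["ts"] % granularity_s
        cell = buckets[(bucket_start, e[dimension])]
        cell["count"] += 1
        cell["sum"] += e[metric]
    return dict(buckets)


events = [
    {"ts": 0, "route": "A", "speed": 60},
    {"ts": 30, "route": "A", "speed": 80},
    {"ts": 61, "route": "A", "speed": 100},
    {"ts": 10, "route": "B", "speed": 50},
]

# One-minute buckets, grouped by route, aggregating speed.
for key, cell in sorted(rollup(events, 60, "route", "speed").items()):
    print(key, cell)
```

A slice on a dashboard would typically visualize aggregated values such as these rather than the raw events.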

A business analyst can create a wide array of visualizations to gather insights on streaming data.

The platform supports more than 30 visualizations that the business analyst can create.

SDK Developer Persona

Streaming Analytics Manager supports the development of custom functionality through the use of its SDK.

More Information

Adding Custom Builder Components