Getting Started with Streaming Analytics

Chapter 2. Prepare Your Environment

Deploying Your HDF Clusters

About This Task

Now that you have reviewed the reference architecture and planned the deployment of your trucking application, you can begin installing HDF according to your use case specifications. To fully build the trucking application as described in this Getting Started with Streaming Analytics document, use the following steps.

Steps

  1. Install Ambari 2.5.1.

  2. Install HDP 2.6.1 Cluster via Ambari.

  3. Install HDF 3.0 Management Pack.

  4. Update HDF 3.0 Base URL.

  5. Add HDF 3.0 Services to HDP 2.6.1 cluster.

Find instructions for these installation steps in Installing HDF Services on a New HDP Cluster.
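
As a point of reference, step 3 is typically performed from the Ambari server host with the ambari-server CLI. The bundle path and version below are placeholders; use the exact management pack file identified in the installation instructions linked above.

    # Install the HDF 3.0 management pack on the Ambari server host
    # (the bundle path and version are placeholders; use the file referenced
    # in the installation guide)
    ambari-server install-mpack \
      --mpack=/tmp/hdf-ambari-mpack-<version>.tar.gz \
      --verbose

    # Restart Ambari so the new stack definitions are picked up
    ambari-server restart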

More Information

Planning Your Deployment

Registering Schemas in Schema Registry

The trucking application streams raw events, serialized into Avro, from the two sensors to their respective Kafka topics. NiFi consumes from these topics, and then routes, enriches, and delivers the events to another set of Kafka topics for consumption by the streaming analytics applications. To do this, you must perform the following tasks:

  • Creating the four Kafka topics

  • Registering Schemas for each of the Kafka topics in the Schema Registry

Create the Kafka Topics

About This Task

Kafka topics are categories or feed names to which records are published.

Steps

  1. Log into the node where the Kafka broker is running.

  2. Create the Kafka topics using the following commands:

    cd /usr/[hdf|hdp]/current/kafka-broker/bin/
    
    ./kafka-topics.sh \
    --create \
    --zookeeper <zookeeper-host>:2181 \
    --replication-factor 2 \
    --partitions 3 \
    --topic raw-truck_events_avro
    
    ./kafka-topics.sh \
    --create \
    --zookeeper <zookeeper-host>:2181 \
    --replication-factor 2 \
    --partitions 3 \
    --topic raw-truck_speed_events_avro
    
    ./kafka-topics.sh \
    --create \
    --zookeeper <zookeeper-host>:2181 \
    --replication-factor 2 \
    --partitions 3 \
    --topic truck_events_avro
    
    ./kafka-topics.sh \
    --create \
    --zookeeper <zookeeper-host>:2181 \
    --replication-factor 2 \
    --partitions 3 \
    --topic truck_speed_events_avro
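
    To confirm that all four topics were created, you can list them from the same directory (the ZooKeeper host and port are the same placeholders used above):

    ./kafka-topics.sh \
    --list \
    --zookeeper <zookeeper-host>:2181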
    

More Information

Apache Kafka Component Guide

Register Schemas for the Kafka Topics

About This Task

Register the schemas for the two Kafka topics that NiFi will consume from and the two Kafka topics that NiFi will publish the enriched events to. Registering the Kafka topic schemas is beneficial in several ways. Schema Registry provides a centralized schema location, allowing you to stream records into topics without having to attach the schema to each record.

Steps

  1. Go to the Schema Registry UI by selecting the Registry service in Ambari and then selecting 'Registry UI' under 'Quick Links'.

  2. Click the "+" button to add a schema, schema group and schema metadata for the Raw Geo Event Sensor Kafka topic:

    • Name = raw-truck_events_avro

    • Description = Raw Geo events from trucks in Kafka Topic

    • Type = Avro schema provider

    • Schema Group = truck-sensors-kafka

    • Compatibility: BACKWARD

    • Check the evolve checkbox

    • Copy the schema from here and paste it into the Schema Text area.

    • Click Save

  3. Click the "+" button to add a schema, schema group (exists from previous step), and schema metadata for the Raw Speed Event Sensor Kafka topic:

    • Name = raw-truck_speed_events_avro

    • Description = Raw Speed Events from trucks in Kafka Topic

    • Type = Avro schema provider

    • Schema Group = truck-sensors-kafka

    • Compatibility: BACKWARD

    • Check the evolve checkbox

    • Copy the schema from here and paste it into the Schema Text area.

    • Click Save

  4. Click the "+" button to add a schema, schema group and schema metadata for the Geo Event Sensor Kafka topic:

    • Name = truck_events_avro

    • Description = Schema for the Kafka topic named 'truck_events_avro'

    • Type = Avro schema provider

    • Schema Group = truck-sensors-kafka

    • Compatibility: BACKWARD

    • Check the evolve checkbox

    • Copy the schema from here and paste it into the Schema Text area.

    • Click Save

  5. Click the "+" button to add a schema, schema group (exists from previous step), and schema metadata for the Speed Event Sensor Kafka topic:

    • Name = truck_speed_events_avro

    • Description = Schema for the Kafka topic named 'truck_speed_events_avro'

    • Type = Avro schema provider

    • Schema Group = truck-sensors-kafka

    • Compatibility: BACKWARD

    • Check the evolve checkbox

    • Copy the schema from here and paste it into the Schema Text area.

    • Click Save.
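
In each of the steps above, the text pasted into the Schema Text area must be a valid Avro schema. As a purely illustrative skeleton (the namespace and field names below are hypothetical and are not the actual trucking schemas, which you copy from the links in the steps above), an Avro record schema has this general shape:

    {
      "type": "record",
      "namespace": "com.example.trucking",
      "name": "truckgeoevent",
      "fields": [
        { "name": "eventTime", "type": "string" },
        { "name": "driverId",  "type": "int" },
        { "name": "speed",     "type": "int" }
      ]
    }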

More Information

If you want to create these schemas programmatically using the Schema Registry client via REST rather than through the UI, you can find examples at this GitHub location.
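
As a rough sketch of that approach (the registry host and schema text below are placeholders, and the endpoint paths follow the Schema Registry v1 REST API, which may differ in your Registry version), registering schema metadata and adding a first schema version with curl can look like this:

    # Register schema metadata for the raw geo event topic
    # (<registry-host> is a placeholder; 7788 is the default Registry port in HDF 3.0)
    curl -X POST -H "Content-Type: application/json" \
      http://<registry-host>:7788/api/v1/schemaregistry/schemas \
      -d '{
            "type": "avro",
            "schemaGroup": "truck-sensors-kafka",
            "name": "raw-truck_events_avro",
            "description": "Raw Geo events from trucks in Kafka Topic",
            "compatibility": "BACKWARD",
            "evolve": true
          }'

    # Add the first version of the schema; "schemaText" carries the Avro schema
    # as an escaped JSON string
    curl -X POST -H "Content-Type: application/json" \
      http://<registry-host>:7788/api/v1/schemaregistry/schemas/raw-truck_events_avro/versions \
      -d '{
            "description": "Initial version",
            "schemaText": "<escaped-avro-schema-json>"
          }'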