Getting Started with Streaming Analytics

Streaming Violation Events into a Data Lake and Operational Data Store

About This Task

Another common requirement is to stream data into an operational data store such as HBase to power real-time web applications, as well as into a data lake backed by HDFS for long-term storage and batch ETL and analytics.

Steps

  1. You will need to have the HBase service running. This can be easily done by adding the HDP HBase Service via Ambari. Create a new HBase table by logging into a node where the HBase client is installed, then execute the following commands:

    cd /usr/hdp/current/hbase-client/bin
    
    ./hbase shell
    
    create 'violation_events', {NAME => 'events', VERSIONS => 3}
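    
    # Optional sanity check: confirm the table and its 'events' column family
    # were created, then leave the HBase shell
    describe 'violation_events'
    
    exit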
    
  2. Create the following directory in HDFS and make it writable by all users. Log into a node where the HDFS client is installed and execute the following commands:

    su hdfs
    
    hadoop fs -mkdir /apps/trucking-app
    
    hadoop fs -chmod 777 /apps/trucking-app
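    
    # Optional: confirm the directory exists and is world-writable (drwxrwxrwx)
    hadoop fs -ls /apps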
    
  3. Drag the HBase sink to the canvas and connect it to the ViolationEvents Rule processor.

  4. Configure the HBase sink, pointing it at the 'violation_events' table and 'events' column family created in step 1.

  5. Drag the HDFS sink to the canvas and connect it to the ViolationEvents Rule processor.

  6. Configure the HDFS sink, setting the HDFS path to the directory created in step 2 (/apps/trucking-app). Make sure you have permission to write into the directory you have configured as the HDFS path; a quick way to check this is sketched after these steps.
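
To verify that the topology will be able to write to the configured HDFS path, you can write and then remove a throw-away file as the user the topology runs under. The sketch below assumes the /apps/trucking-app directory from step 2 and uses the storm user as an example; substitute the user your topology actually runs as.

    su storm
    
    # A successful put confirms write access to the configured path;
    # remove the test file afterwards.
    hadoop fs -put /etc/hosts /apps/trucking-app/permission-test
    
    hadoop fs -rm /apps/trucking-app/permission-test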