Hortonworks Cybersecurity Platform
Also available as:
PDF
loading table of contents...

Streaming Data into HCP Overview

The first task in adding a new telemetry data source is to stream all raw events from that source into its own Kafka topic.

Although HCP includes parsers for several data sources (for example, Bro, Snort, and YAF), you must still stream the raw data into HCP through a Kafka topic.

If you choose to use the Snort telemetry data source, you must meet the following configuration requirements:

  • When you install and configure Snort, to ensure proper functioning of indexing and analytics, configure Snort to include the year in the timestamp by modifying thesnort.conf file as follows:

    # Configure Snort to show year in timestamps
    config show_year
  • By default, the Snort parser is configured to use ZoneId.systemDefault() for the source timeZone for the incoming data and MM/dd/yy-HH:mm:ss.SSSSSS as the default dateFormat. Valid timezones are defined in Java's ZoneId.getAvailableZoneIds(). DateFormats should use the options defined in https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html. The following sample configuration shows the dateFormat and timeZone values explicitly set in the parser configuration:

    "parserConfig": {
    "dateFormat" : "MM/dd/yy-HH:mm:ss.SSSSSS",
     "timeZone" : "America/New_York"

Depending on the type of data you are streaming into HCP, you can use one of the following methods:

  • NiFi

    This streaming method works for most types of data sources. To use it with HCP, you must install it manually on port 8089. For information on installing NiFi, see the NiFi documentation.

    Important
    Important

    NiFi cannot be installed on top of HDP, so you must install NiFi manually to use it with HCP.

  • Performant network ingestion probes

    This streaming method is ideal for streaming high-volume packet data.

  • Real-time and batch threat intelligence feed loaders

    This streaming method works for intelligence feeds that you want to view in real-time or collect batches of information to view or query at a later date.