Source, Sink, and Processor Configuration Values

Source Configuration Values

Table 1. Kafka

| Configuration Field | Description, requirements, tips for configuration |
|---|---|
| Cluster Name | Mandatory. Service pool defined in SAM to get metadata information about the Kafka cluster. |
| Security Protocol | Mandatory. Protocol used to communicate with Kafka brokers, e.g. PLAINTEXT. Auto-suggests a list of protocols supported by the Kafka service based on the cluster name selected. If you select a protocol with SSL or SASL, make sure to fill out the related config fields. |
| Bootstrap Servers | Mandatory. A comma-separated string of host:port pairs representing Kafka broker listeners. Auto-suggests a list of options based on the security protocol selected above. |
| Kafka topic | Mandatory. Kafka topic to read data from. Make sure the corresponding schema for the topic is defined in Schema Registry. |
| Consumer Group Id | Mandatory. A unique string that identifies the consumer group this consumer belongs to. Used to keep track of consumer offsets. |
| Reader schema version | Optional. Version of the topic schema to read with. Default is the version used by the producer to write data to the topic. |
| Kerberos client principal | Optional (mandatory for SASL). Client principal used to connect to brokers with the SASL GSSAPI mechanism for Kerberos (when the security protocol is SASL_PLAINTEXT or SASL_SSL). |
| Kerberos keytab file | Optional (mandatory for SASL). Keytab file location on the worker node containing the secret key for the client principal, used with the SASL GSSAPI mechanism for Kerberos (when the security protocol is SASL_PLAINTEXT or SASL_SSL). |
| Kafka service name | Optional (mandatory for SASL). Service name that the Kafka broker runs as (when the security protocol is SASL_PLAINTEXT or SASL_SSL). |
| Fetch minimum bytes | Optional. The minimum number of bytes the broker should return for a fetch request. Default value is 1. |
| Maximum fetch bytes per partition | Optional. The maximum amount of data per partition the broker will return. Default value is 1048576. |
| Maximum records per poll | Optional. The maximum number of records a poll will return. Default value is 500. |
| Poll timeout (ms) | Optional. Time in milliseconds spent waiting in poll if data is not available. Default value is 200. |
| Offset commit period (ms) | Optional. Period in milliseconds at which offsets are committed. Default value is 30000. |
| Maximum uncommitted offsets | Optional. The maximum number of polled records that can be pending commit before another poll can take place. Default value is 10000000. Tune this value based on the size of each Kafka message and the memory available to the worker JVM process. |
| First poll offset strategy | Optional. Offset used by the Kafka spout in the first poll to the Kafka broker. Pick one of the enum values ["EARLIEST", "LATEST", "UNCOMMITTED_EARLIEST", "UNCOMMITTED_LATEST"]. Default value is UNCOMMITTED_EARLIEST, meaning that by default it starts from the earliest uncommitted offset for the consumer group id provided above. |
| Partition refresh period (ms) | Optional. Period in milliseconds at which Kafka is polled for new topics and/or partitions. Default value is 2000. |
| Emit null tuples? | Optional. A flag indicating whether null tuples should be emitted to downstream components. Default value is false. |
| First retry delay (ms) | Optional. Delay in milliseconds before the first retry of a failed Kafka spout message. Default value is 0. |
| Retry delay period (ms) | Optional. Retry delay period (geometric progression) in milliseconds for the second and subsequent retries of a failed Kafka spout message. Default value is 2. |
| Maximum retries | Optional. Maximum number of times a failed message is retried before it is acked and committed. Default value is 2147483647. |
| Maximum retry delay (ms) | Optional. Maximum interval in milliseconds to wait between successive retries of a failed Kafka spout message. Default value is 10000. |
| Consumer startup delay (ms) | Optional. Delay in milliseconds before Kafka is first polled for records. This ensures all executors come up before the first poll from each executor, so that partitions are well balanced among executors and onPartitionsRevoked and onPartitionsAssigned are not called later, which would cause duplicate tuples to be emitted. Default value is 60000. |
| SSL keystore location | Optional. The location of the key store file. Used when Kafka client connectivity is over SSL. |
| SSL keystore password | Optional. The store password for the key store file. |
| SSL key password | Optional. The password of the private key in the key store file. |
| SSL truststore location | Optional (mandatory for SSL). The location of the trust store file. |
| SSL truststore password | Optional (mandatory for SSL). The password for the trust store file. |
| SSL enabled protocols | Optional. Comma-separated list of protocols enabled for SSL connections. |
| SSL keystore type | Optional. File format of the key store file. Default value is JKS. |
| SSL truststore type | Optional. File format of the trust store file. Default value is JKS. |
| SSL protocol | Optional. SSL protocol used to generate the SSLContext. Default value is TLS. |
| SSL provider | Optional. Security provider used for SSL connections. Default is the JVM's default security provider. |
| SSL cipher suites | Optional. Comma-separated list of cipher suites. A cipher suite is a named combination of authentication, encryption, MAC, and key exchange algorithms used to negotiate the security settings for a network connection using the TLS or SSL protocol. By default all available cipher suites are supported. |
| SSL endpoint identification algorithm | Optional. The endpoint identification algorithm used to validate the server hostname against the server certificate. |
| SSL key manager algorithm | Optional. The algorithm used by the key manager factory for SSL connections. Default value is SunX509. |
| SSL secure random implementation | Optional. The SecureRandom PRNG implementation to use for SSL cryptographic operations. |
| SSL trust manager algorithm | Optional. The algorithm used by the trust manager factory for SSL connections. Default value is PKIX, the trust manager factory algorithm configured for the Java Virtual Machine. |
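Many of the fields above map directly onto the standard Apache Kafka consumer property keys that the spout ultimately receives. A minimal sketch of that mapping, with a check of the mandatory fields called out in the table (all host names, paths, and passwords are placeholders):

```python
# Sketch: SAM Kafka source fields expressed as the underlying Kafka
# consumer property keys. All values below are illustrative placeholders.
consumer_config = {
    "bootstrap.servers": "broker1:6667,broker2:6667",          # Bootstrap Servers
    "group.id": "sam-consumer-group",                          # Consumer Group Id
    "security.protocol": "SASL_SSL",                           # Security Protocol
    "sasl.kerberos.service.name": "kafka",                     # Kafka service name
    "fetch.min.bytes": 1,                                      # Fetch minimum bytes
    "max.partition.fetch.bytes": 1048576,                      # Maximum fetch bytes per partition
    "max.poll.records": 500,                                   # Maximum records per poll
    "ssl.truststore.location": "/etc/security/truststore.jks", # SSL truststore location
    "ssl.truststore.password": "changeit",                     # SSL truststore password
    "ssl.keystore.type": "JKS",                                # SSL keystore type
    "ssl.protocol": "TLS",                                     # SSL protocol
}

def missing_mandatory(config):
    """Return the mandatory consumer keys from the table that are absent.

    Cluster Name and Kafka topic are SAM-level fields with no direct
    consumer key, so they are not checked here.
    """
    missing = [k for k in ("bootstrap.servers", "group.id", "security.protocol")
               if k not in config]
    # The table marks the Kerberos service name as mandatory for SASL protocols.
    if config.get("security.protocol", "").startswith("SASL") and \
            "sasl.kerberos.service.name" not in config:
        missing.append("sasl.kerberos.service.name")
    return missing
```

With all mandatory fields present, `missing_mandatory(consumer_config)` returns an empty list; dropping `group.id` or the Kerberos service name under a SASL protocol flags them.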
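The four retry fields combine into a bounded geometric backoff. The exact formula is internal to the spout; the sketch below is one plausible reading of the defaults (first delay 0 ms, period 2 ms, cap 10000 ms), offered as an assumption rather than the spout's actual code:

```python
def retry_delay_ms(attempt, first_delay=0, period=2, max_delay=10000):
    """Illustrative geometric backoff for a failed spout message.

    Assumption (not necessarily the spout's exact internal formula):
    the first retry waits `first_delay` ms, and retry n (n >= 2)
    waits period**(n - 1) ms, capped at `max_delay`.
    """
    if attempt <= 1:
        return min(first_delay, max_delay)
    return min(period ** (attempt - 1), max_delay)
```

Under this reading, retries wait 0, 2, 4, 8, ... ms, flattening out at the 10000 ms maximum; raising "Retry delay period (ms)" steepens the progression.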
Table 2. Event Hubs

| Configuration Field | Description, requirements, tips for configuration |
|---|---|
| Username | The Event Hubs user name (policy name in the Event Hubs portal). |
| Password | The Event Hubs password (shared access key in the Event Hubs portal). |
| Namespace | The Event Hubs namespace. |
| Entity Path | The Event Hubs entity path. |
| Partition Count | The number of partitions in the Event Hub. |
| ZooKeeper Connection String | The ZooKeeper connection string. |
| Checkpoint Interval | The frequency at which offsets are checkpointed. |
| Receiver Credits | The number of receiver credits. |
| Max Pending Messages Per Partition | The maximum number of pending messages per partition. |
| Enqueue Time Filter | The enqueue time filter. |
| Consumer Group Name | The consumer group name. |
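The Username, Password, Namespace, and Entity Path fields correspond to the pieces of a standard Azure Event Hubs connection string. A sketch of how they fit together (the namespace, policy, key, and entity values are placeholders):

```python
def event_hubs_connection_string(namespace, policy_name, shared_access_key, entity_path):
    """Assemble an Azure Event Hubs connection string from the four
    credential fields in the table. All argument values used with
    this sketch are placeholders, not real credentials."""
    return (
        f"Endpoint=sb://{namespace}.servicebus.windows.net/;"
        f"SharedAccessKeyName={policy_name};"   # Username (policy name)
        f"SharedAccessKey={shared_access_key};" # Password (shared access key)
        f"EntityPath={entity_path}"             # Entity Path
    )
```

For example, `event_hubs_connection_string("myns", "mypolicy", "s3cret", "myhub")` yields a string beginning with `Endpoint=sb://myns.servicebus.windows.net/`.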
Table 3. HDFS

| Configuration Field | Description, requirements, tips for configuration |
|---|---|
| Cluster Name | Service pool defined in SAM to get metadata information about the HDFS cluster. |
| HDFS URL | HDFS namenode URL. |
| Input File Format | The format of the file being consumed; it dictates the type of reader used to read the file. Currently only 'com.hortonworks.streamline.streams.runtime.storm.spout.JsonFileReader' is supported. |
| Source Dir | The HDFS directory from which to read the files. |
| Archive Dir | Files from the source dir are moved to this HDFS location after being completely read. |
| Bad Files Dir | Files from the source dir are moved to this HDFS location if an error is encountered while processing them. |
| Lock Dir | Lock files (used to synchronize multiple reader instances) are created in this location. Defaults to a '.lock' subdirectory under the source directory. |
| Commit Frequency Count | Records progress in the lock file after the specified number of records has been processed. Setting it to 0 disables this. |
| Commit Frequency Secs | Records progress in the lock file after the specified number of seconds has elapsed. Must be greater than 0. |
| Max Outstanding | Limits the number of unACKed tuples by pausing tuple generation (if ACKers are used in the topology). |
| Lock Timeout Seconds | Duration of inactivity after which a lock file is considered abandoned and ready for another spout to take ownership. |
| Ignore Suffix | File names with this suffix in the source dir will not be processed. |