General Purpose Parsers
The general-purpose parser is primarily designed for lower-velocity topologies or for quickly setting up a temporary parser for a new telemetry.
General-purpose parsers are defined using a config file, and you need not recompile the topology to change them. HCP supports the following general-purpose parsers: Grok, CSV, and JSON Map.
Grok parser
The Grok parser class name (parserClassName) is
org.apache.metron.parsers.GrokParser
.
Grok has the following entries and predefined patterns for
parserConfig
:
-
grokPath
-
The path in HDFS (or in the JAR) to the Grok statement. By default, the parser attempts to load the statement from HDFS, then falls back to the classpath, and finally throws an exception if it cannot load a pattern.
-
patternLabel
-
The pattern label to use from the Grok statement.
-
timestampField
-
The field to use for the timestamp. If your data does not have a field named exactly "timestamp", this entry is required; otherwise the record will not pass validation. If the timestampField is also included in the list of timeFields, it is first parsed using the provided dateFormat.
-
timeFields
-
A list of fields to be treated as time.
-
dateFormat
-
The date format to use to parse the time fields. Default is "yyyy-MM-dd HH:mm:ss.S z".
-
timezone
-
The timezone to use.
UTC
is the default.
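As an illustration of how these entries fit together, the following sketch builds a Grok parser configuration as a Python dictionary and prints it as JSON. The HDFS path, pattern label, and field names are hypothetical placeholders, not values from a real deployment; only the class name and the two documented defaults come from the text above.

```python
import json

# Hypothetical values: the path, pattern label, and field names below are
# placeholders chosen for illustration only.
grok_sensor_config = {
    "parserClassName": "org.apache.metron.parsers.GrokParser",
    "parserConfig": {
        "grokPath": "/patterns/example_sensor",   # assumed HDFS location
        "patternLabel": "EXAMPLE_SENSOR",         # assumed label in the Grok statement
        "timestampField": "event_time",           # assumed field name
        "timeFields": ["event_time"],             # parsed with dateFormat first
        "dateFormat": "yyyy-MM-dd HH:mm:ss.S z",  # documented default
        "timezone": "UTC",                        # documented default
    },
}

# Serialize to JSON, the form in which the parser config is stored.
print(json.dumps(grok_sensor_config, indent=2))
```

Because timestampField also appears in timeFields here, the value would be parsed with dateFormat before being used as the timestamp.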
CSV Parser
The CSV parser class name (parserClassName) is
org.apache.metron.parsers.csv.CSVParser
CSV has the following entries and predefined patterns for
parserConfig
:
-
timestampFormat
-
The date format of the timestamp to use. If unspecified, the parser assumes the timestamp is expressed as time since the UNIX epoch.
-
columns
-
A map of column names you wish to extract from the CSV to their offsets. For example,
{ 'name' : 1,'profession' : 3}
would be a column map for extracting the 2nd and 4th columns from a CSV.
-
separator
-
The column separator. The default value is ",".
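The zero-based offsets in the columns map can be illustrated with a short sketch. This is not the parser's actual implementation; the function name extract_columns and the sample line are hypothetical, and the sketch only mimics how a column map and separator select fields from one CSV line.

```python
import csv

def extract_columns(line, columns, separator=","):
    """Return {column name: value} using the zero-based offsets in `columns`.

    Hypothetical helper for illustration; it only mirrors the documented
    meaning of the `columns` and `separator` entries.
    """
    # Parse a single CSV line with the configured separator.
    row = next(csv.reader([line], delimiter=separator))
    return {name: row[offset] for name, offset in columns.items()}

# The column map from the text: offsets 1 and 3 are the 2nd and 4th columns.
columns = {"name": 1, "profession": 3}
record = extract_columns("42,Ada,London,engineer", columns)
print(record)  # {'name': 'Ada', 'profession': 'engineer'}
```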
JSON Map Parser
The JSON parser class name (parserClassName) is
org.apache.metron.parsers.json.JSONMapParser
JSON has the following entries and predefined patterns for
parserConfig
:
-
mapStrategy
-
A strategy to indicate how to handle multi-dimensional Maps. This is one of:
-
DROP
-
Drop fields which contain maps
-
UNFOLD
-
Unfold inner maps. So
{ "foo" : { "bar" : 1} }
would turn into { "foo.bar" : 1 }
-
ALLOW
-
Allow multidimensional maps
-
ERROR
-
Throw an error when a multidimensional map is encountered
-
timestamp
-
This field is expected to exist. If it does not, the current time is inserted.
-
jsonQuery
-
If this JSON query string is present, the result of the query will be a list of messages. This is useful if you have a JSON document that contains a list or array of messages embedded in it, and you do not have another means of splitting the message.
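The four mapStrategy values can be sketched as follows. This is an illustration of the documented behavior, not the parser's code: the function name apply_map_strategy is hypothetical, and the recursive handling of maps nested more than one level deep is an assumption that goes beyond the single-level example given above.

```python
def apply_map_strategy(message, strategy):
    """Mimic the documented mapStrategy options on a parsed JSON message.

    Hypothetical helper; nested maps deeper than one level are unfolded
    recursively, which is an assumption.
    """
    if strategy == "ALLOW":
        # Keep multidimensional maps as they are.
        return dict(message)
    if strategy == "DROP":
        # Drop every field whose value is a map.
        return {k: v for k, v in message.items() if not isinstance(v, dict)}
    if strategy == "ERROR":
        # Reject the message if any field contains a map.
        for key, value in message.items():
            if isinstance(value, dict):
                raise ValueError(f"multidimensional map in field '{key}'")
        return dict(message)
    if strategy == "UNFOLD":
        # Flatten inner maps into dotted keys, e.g. foo.bar.
        flat = {}

        def unfold(prefix, value):
            for key, inner in value.items():
                dotted = f"{prefix}.{key}" if prefix else key
                if isinstance(inner, dict):
                    unfold(dotted, inner)
                else:
                    flat[dotted] = inner

        unfold("", message)
        return flat
    raise ValueError(f"unknown mapStrategy: {strategy}")

sample = {"foo": {"bar": 1}, "id": 7}
print(apply_map_strategy(sample, "UNFOLD"))  # {'foo.bar': 1, 'id': 7}
print(apply_map_strategy(sample, "DROP"))    # {'id': 7}
```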