Appendix A. Stellar Language Functions
This section lists the Stellar language functions supported by Hortonworks Cybersecurity Package (HCP) powered by Apache Metron.
The Stellar query language supports the following:
Referencing fields in the enriched JSON
String literals are quoted with either ' or "
String literals support escaping for ', ", \t, \r, \n, and backslash
The literal '\'foo\'' would represent 'foo'
The literal "\"foo\"" would represent "foo"
The literal 'foo \\ bar' would represent foo \ bar
Simple boolean operations: and, not, or
Simple arithmetic operations: *, /, +, - on real numbers or integers
Simple comparison operations: <, >, <=, >=
Simple equality comparison operations: ==, !=
if/then/else comparisons (for example, if var1 < 10 then 'less than 10' else '10 or more')
Determining whether a field exists (via exists)
An in operator that works like the in in Python
The ability to use parentheses to make order of operations explicit
User-defined functions, including lambda expressions
The following keywords need to be single quote escaped in order to be used in Stellar expressions:
Stellar Language Inclusion Checks (in and not in)
"in" supports string contains. For example, 'foo' in 'foobar' == true
"in" supports collection contains. For example, 'foo' in [ 'foo', 'bar' ] == true
"in" supports map key contains. For example, 'foo' in { 'foo' : 5} == true
"not in" is the negation of the in expression. For example, 'grok' not in 'foobar' == true
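Because the text compares Stellar's in operator to Python's, the three containment cases can be tried directly in Python. This is only an illustrative sketch: the helper names stellar_in and stellar_not_in are hypothetical, and treating a null container as containing nothing is an assumption, not documented Stellar behavior.

```python
def stellar_in(needle, haystack):
    """Rough Python model of the Stellar `in` operator."""
    if haystack is None:
        # Assumption: a null container contains nothing.
        return False
    if isinstance(haystack, str):
        # String contains: only a string can be a substring.
        return isinstance(needle, str) and needle in haystack
    # Lists check membership; dicts check keys, matching "map key contains".
    return needle in haystack


def stellar_not_in(needle, haystack):
    """`not in` is simply the negation of `in`."""
    return not stellar_in(needle, haystack)


print(stellar_in("foo", "foobar"))
print(stellar_in("foo", ["foo", "bar"]))
print(stellar_in("foo", {"foo": 5}))
print(stellar_not_in("grok", "foobar"))
```

Each call mirrors one of the four example expressions listed above.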
Stellar Language Comparisons (<, <=, >, >=)
If either side of the comparison is null then return false.
If both values being compared are numbers (instances of Java's Number type) then the following applies:
If either side is a double then get double value from both sides and compare using given operator.
Else if either side is a float then get float value from both sides and compare using given operator.
Else if either side is a long then get long value from both sides and compare using given operator.
Otherwise get the int value from both sides and compare using given operator.
If both sides are of the same type and are comparable then use the compareTo method to compare values.
If none of the above are met then an exception is thrown.
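The comparison rules above amount to a type-promotion ladder. The following Python sketch models that ladder under stated assumptions: Python has no separate int, long, float, and double types, so a hypothetical Num wrapper tags each value with the Java type it stands for, and Python's ordering operators stand in for Java's compareTo. It is a model of the rules as written here, not the actual implementation.

```python
import operator
import struct
from dataclasses import dataclass

# Narrowest to widest, following the order of the rules above.
RANK = {"int": 0, "long": 1, "float": 2, "double": 3}
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Num:
    """Hypothetical stand-in for a Java Number: a value tagged with its Java type."""
    kind: str
    value: float


def _as_kind(value, kind):
    """Convert a raw value to the representation used for the comparison."""
    if kind == "double":
        return float(value)
    if kind == "float":
        # Round-trip through 4 bytes to mimic single precision.
        return struct.unpack("f", struct.pack("f", float(value)))[0]
    return int(value)  # "long" and "int"


def stellar_compare(left, right, op):
    # Rule 1: a null on either side makes the comparison false.
    if left is None or right is None:
        return False
    compare = OPS[op]
    # Rule 2: two numbers are compared using the widest type present.
    if isinstance(left, Num) and isinstance(right, Num):
        widest = max(left.kind, right.kind, key=RANK.__getitem__)
        return compare(_as_kind(left.value, widest), _as_kind(right.value, widest))
    # Rule 3: same-typed comparable values (Python ordering replaces compareTo).
    if not isinstance(left, Num) and type(left) is type(right):
        return compare(left, right)
    # Rule 4: anything else is an error.
    raise TypeError(f"cannot compare {left!r} and {right!r}")


print(stellar_compare(Num("int", 1), Num("double", 1.5), "<"))
print(stellar_compare(None, Num("int", 1), "<"))
```

Note that a null operand yields false rather than an error, so a missing field never makes a comparison succeed.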
Stellar Language Equality Check (==, !=)
Below is how the == operator is expected to work:
If either side of the expression is null then check equality using Java's `==` expression.
Else if both sides of the expression are of Java's type Number then:
If either side of the expression is a double then use the double value of both sides to test equality.
Else if either side of the expression is a float then use the float value of both sides to test equality.
Else if either side of the expression is a long then use the long value of both sides to test equality.
Otherwise use the int value of both sides to test equality.
Otherwise use the equals method to compare the left side with the right side.
The `!=` operator is the negation of the above.
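The equality rules can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions: Python only distinguishes int from float, so the four-step double/float/long/int ladder collapses to two steps, and the function names stellar_equals and stellar_not_equals are hypothetical.

```python
from numbers import Real


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def stellar_equals(left, right):
    # Null handling: Java's == on references is true only if both are null.
    if left is None or right is None:
        return left is right
    # Numbers: promote to the wider representation before comparing.
    if _is_number(left) and _is_number(right):
        if isinstance(left, float) or isinstance(right, float):
            return float(left) == float(right)
        return int(left) == int(right)
    # Everything else: Python's == stands in for Java's equals method.
    return left == right


def stellar_not_equals(left, right):
    """`!=` is the negation of `==`."""
    return not stellar_equals(left, right)


print(stellar_equals(1, 1.0))
print(stellar_equals(None, "foo"))
print(stellar_not_equals("foo", "bar"))
```

Under these rules an integer and a floating-point number with the same value compare as equal, because both sides are promoted before the test.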
Stellar Language Lambda Expressions
Stellar provides the capability to pass lambda expressions to functions that support that layer of indirection. The syntax is:
(named_variables) -> stellar_expression: Lambda expression with named variables. For instance, the lambda expression which calls TO_UPPER on a named argument x could be expressed as (x) -> TO_UPPER(x).
var -> stellar_expression: Lambda expression with a single named variable, var. For instance, the lambda expression which calls TO_UPPER on a named argument x could be expressed as x -> TO_UPPER(x). Note, this is more succinct but equivalent to the example directly above.
() -> stellar_expression: Lambda expression with no named variables. If no named variables are needed, you may omit the named variable section. For instance, the lambda expression which returns a constant false would be () -> false.
where
named_variables is a comma-separated list of variables to use in the Stellar expression
stellar_expression is an arbitrary Stellar expression
In the core language functions, we support basic functional programming primitives such as:
MAP: Applies a lambda expression over a list of input. For instance, MAP([ 'foo', 'bar'], (x) -> TO_UPPER(x) ) returns [ 'FOO', 'BAR' ]
FILTER: Filters a list by a predicate in the form of a lambda expression. For instance, FILTER([ 'foo', 'bar'], (x ) -> x == 'foo' ) returns [ 'foo' ]
REDUCE: Applies a function over a list of input. For instance, REDUCE([ 1, 2, 3], (sum, x) -> sum + x, 0 ) returns 6
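For readers more familiar with general-purpose languages, the three primitives correspond closely to Python's map, filter, and functools.reduce. The sketch below reproduces the three examples above in Python; it is an analogy for intuition, not Stellar code.

```python
from functools import reduce

words = ["foo", "bar"]

# MAP: apply a lambda to every element.
mapped = list(map(lambda x: x.upper(), words))

# FILTER: keep only the elements for which the predicate holds.
filtered = list(filter(lambda x: x == "foo", words))

# REDUCE: fold the list into one value, starting from the initial value 0.
total = reduce(lambda acc, x: acc + x, [1, 2, 3], 0)

print(mapped)
print(filtered)
print(total)
```

As in the REDUCE example, the lambda takes the running result first and the current element second, and the final argument seeds the running result.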
Table A.2. Stellar Language Functions
Function  Description  Input  Returns 

ABS  Returns the absolute value of a number  number - The number to take the absolute value of  The absolute value of the number passed in. 
APPEND_IF_MISSING  Appends the suffix to the end of the string if the string does not already end with any of the suffixes. 
 A new string if the suffix was appended, the same string otherwise. 
BIN  Computes the bin that the value is in given a set of bounds 
 Which bin N the value falls in such that bound(N-1) < value <= bound(N). No min and max bounds are provided, so values smaller than the 0'th bound go in the 0'th bin, and values greater than the last bound go in the M'th bin. 
BLOOM_ADD  Adds an element to the bloom filter passed in 
 Bloom Filter 
BLOOM_EXISTS  Determines whether the bloom filter contains the value 
 True if the filter might contain the value and false otherwise 
BLOOM_INIT  Returns an empty bloom filter 
 Bloom Filter 
BLOOM_MERGE  Returns a merged bloom filter 
 Bloom Filter or null if the list is empty 
CEILING  Returns the ceiling of a number. 
 The ceiling of the number passed in. 
CHOP  Remove the last character from a string. 
 String without last character, null if null string input. 
CHOMP  Removes one newline from the end of a string if it's there, otherwise leaves it alone. A newline is "\n", "\r", or "\r\n". 
 String without newline, null if null string input. 
COS  Returns the cosine of a number. 
 The cosine of the number passed in. 
COUNT_MATCHES  Counts how many times the substring appears in the larger string. 
 
DAY_OF_MONTH  The numbered day within the month. The first day within the month has a value of 1. 
 The numbered day within the month 
DAY_OF_WEEK  The numbered day within the week. The first day of the week, Sunday, has a value of 1. 
 The numbered day within the week. 
DAY_OF_YEAR  The day number within the year. The first day of the year has a value of 1. 
 The day number within the year 
DECODE  Decodes the passed string with the provided encoding, which must be one of the encodings returned from GET_SUPPORTED_ENCODINGS 


DOMAIN_REMOVE_SUBDOMAINS  Remove subdomains from a domain 
 The domain without the subdomains. (For example, DOMAIN_REMOVE_SUBDOMAINS ('mail.yahoo.com') yields 'yahoo.com') 
DOMAIN_REMOVE_TLD  Removes the top level domain (TLD) suffix from a domain 
 The domain without the TLD. (For example, DOMAIN_REMOVE_TLD('mail.yahoo.co.uk') yields 'mail.yahoo') 
DOMAIN_TO_TLD  Extracts the top level domain from a domain 
 The TLD of the domain. (For example, DOMAIN_TO_TLD('mail.yahoo.co.uk') yields 'co.uk') 
ENCODE  Encodes the passed string with the provided encoding, which must be one of the encodings returned from GET_SUPPORTED_ENCODINGS 


ENDS_WITH  Determines whether a string ends with a suffix 
 True if the string ends with the specified suffix and false if otherwise 
ENRICHMENT_EXISTS  Interrogates the HBase table holding the simple HBase enrichment data and returns whether the enrichment type and indicator are in the table 
 True if the enrichment indicator exists and false otherwise 
ENRICHMENT_GET  Interrogates the HBase table holding the simple HBase enrichment data and retrieves the tabular value associated with the enrichment type and indicator 
 A map associated with the indicator and enrichment type. Empty otherwise. 
EXP  Returns Euler's number raised to the power of the argument. 
 Euler's number raised to the power of the argument. 
FILL_LEFT  Fills or pads a given string with a given character, to a given length on the left 
 The filled string 
FILL_RIGHT  Fills or pads a given string with a given character, to a given length on the right 
 The filled string 
FILTER  Applies a filter in the form of a lambda expression to a list. For example, `FILTER( [ 'foo', 'bar' ] , (x) -> x == 'foo')` would yield `[ 'foo' ]`. 
 The input list filtered by the predicate. 
FLOOR  Returns the floor of a number. 
 The floor of the number passed in. 

FUZZY_LANGS  Returns a list of IETF BCP 47 language tags available to the system, such as en, fr, de.  A list of IETF BCP 47 language tag strings  
FUZZY_SCORE  Returns the Fuzzy Score which indicates the similarity score between two strings. One point is given for every matched character. Subsequent matches yield two bonus points. A higher score indicates a higher similarity. 
 An Integer representing the score. 
FORMAT  Returns a formatted string using the specified format string and arguments. Uses Java's string formatting conventions 
 A formatted string 
GEO_GET  Looks up an IPv4 address and returns geographic information about it. 
 If a single field is requested, a string of the field. If multiple fields are requested, a map of the fields as strings. Otherwise null. 
GET  Returns the i'th element of the list 
 The i'th element of the list 
GET_FIRST  Returns the first element of the list 
 First element of this list 
GET_LAST  Returns the last element of the list 
 Last element of the list 
GET_SUPPORTED_ENCODINGS  Returns a list of the encodings that are currently supported.  A List of String  
HASH  Hashes a given value using the given hashing algorithm and returns a hex encoded string. 
 A hex encoded string of a hashed value using the given algorithm. If 'toHash' is null then '00', padded to the necessary length, will be returned. If 'toHash' is not able to be hashed or 'hashType' is null then null is returned. 
HLLP_CARDINALITY  Returns the HyperLogLogPlus-estimated cardinality for this set. 
 Long value representing the cardinality for this set 
HLLP_INIT  Initializes the set 
 A new HyperLogLogPlus set 
HLLP_MERGE  Merge hllp sets together 
 A new merged HyperLogLogPlus estimator set 
HLLP_OFFER  Add value to set 
 The HyperLogLogPlus set with a new object added 
IN_SUBNET  Returns true if an IP is within a subnet range 
 True if the IP address is within at least one of the network ranges and false if otherwise 
IS_DATE  Determines if the date contained in the string conforms to the specified format 
 True if the date is in the specified format and false if otherwise 
IS_DOMAIN  Tests if a string is a valid domain. Domain names are evaluated according to the standards RFC1034 Section 3, and RFC1123 section 2.1. 
 True if the string is a valid domain and false if otherwise 
IS_EMAIL  Tests if a string is a valid email address 
 True if the string is a valid email address and false if otherwise 
IS_EMPTY  Returns true if string or collection is empty or null and false if otherwise 
 True if the string or collection is empty or null and false if otherwise 
IS_ENCODING  Returns true if the passed string is encoded in one of the supported encodings and false if otherwise. 
 True if the passed string is encoded in one of the supported encodings and false if otherwise. 
IS_INTEGER  Determines whether or not an object is an integer 
 True if the object can be converted to an integer and false if otherwise 
IS_IP  Determine if a string is an IP or not 
 True if the string is an IP and false if otherwise 
IS_URL  Tests if a string is a valid URL 
 True if the string is a valid URL and false otherwise 
JOIN  Joins the components in the list of strings with the specified delimiter 
 String 
KAFKA_GET  Retrieves messages from a Kafka topic. Subsequent calls will continue retrieving messages sequentially from the original offset. 
 List of String 
KAFKA_PROPS  Retrieves the Kafka properties that are used by other KAFKA_* functions like KAFKA_GET and KAFKA_PUT. The Kafka properties are compiled from a set of default properties, the global properties, and any overrides. 
 Map of key/value pairs 
KAFKA_PUT  Sends messages to a Kafka topic. 
 N/A 
KAFKA_TAIL  Retrieves messages from a Kafka topic always starting with the most recent message first. 
 List of String 
LENGTH  Returns the length of a string or size of a collection. Returns 0 for empty or null strings. 
 Integer 
LIST_ADD  Adds an element to a list. 
 Resulting list with the item added at the end. 
LN  Returns the natural log of a number. 
 The natural log of the number passed in. 
LOG2  Returns the log (base 2) of a number. 
 The log (base 2) of the number passed in. 
LOG10  Returns the log (base 10) of a number. 
 The log (base 10) of the number passed in. 
MAAS_GET_ENDPOINT  Inspects ZooKeeper and returns a map containing the name, version, and url for the model referred to by the input parameters 
 A map containing the name, version, url for the REST endpoint (fields named name, version, and url). Note that the output of this function is suitable for input into the first argument of MAAS_MODEL_APPLY. 
MAAS_MODEL_APPLY  Returns the output of a model deployed via Model as a Service. Note: Results are cached locally for 10 minutes. 
 The output of the model deployed as a REST endpoint in map form. Assumes REST endpoint returns a JSON map. 
MAP  Applies a lambda expression to a list of arguments. For example, `MAP( [ 'foo', 'bar' ] , (x) -> TO_UPPER(x) )` would yield `[ 'FOO', 'BAR' ]`. 
 The input list transformed item-wise by the lambda expression. 
MAP_EXISTS  Checks for existence of a key in a map 
 True if the key is found in the map and false if otherwise 
MONTH  The number representing the month. The first month, January, has a value of 0. 
 The current month (0-based). 
MULTISET_ADD  Adds to a multiset, which is a map associating objects to their instance counts. 
 A multiset 
MULTISET_INIT  Creates an empty multiset, which is a map associating objects to their instance counts. 
 A multiset 
MULTISET_MERGE  Merges a list of multisets, which is a map associating objects to their instance counts. 
 A multiset 
MULTISET_REMOVE  Removes from a multiset, which is a map associating objects to their instance counts. 
 A multiset 
MULTISET_TO_SET  Create a set out of a multiset, which is a map associating objects to their instance counts. 
 The set of objects in the multiset ignoring multiplicity 
PREPEND_IF_MISSING  Prepends the prefix to the start of the string if the string does not already start with any of the prefixes. 
 A new String if prefix was prepended, the same string otherwise. 
PROFILE_FIXED  The profile periods associated with a fixed lookback starting from now 
 The selected profile measurement timestamps. These are ProfilePeriod objects. 
PROFILE_GET  Retrieves a series of values from a stored profile 
 The profile measurements 
PROFILE_WINDOW  The profiler periods associated with a window selector statement from an optional reference timestamp. 
 The selected profile measurement periods. These are ProfilePeriod objects. 
PROTOCOL_TO_NAME  Converts the IANA protocol number to the protocol name 
 The protocol name associated with the IANA number 
REDUCE  Reduces a list by a binary lambda expression. That is, the expression takes two arguments. Usage example: `REDUCE( [ 1, 2, 3 ] , (x, y) -> x + y, 0)` would sum the input list, yielding `6`. 
 The reduction of the list. 
REGEXP_MATCH  Determines whether a regex matches a string 
 True if the regex pattern matches the string and false if otherwise 
REGEX_GROUP_VAL  Returns the value of a group in a regex against a string 
 The value of the group, or null if not matched or no group at index. 
ROUND  Rounds a number to the nearest integer. This is half-up rounding. 
 The nearest integer (based on half-up rounding). 
SET_ADD  Adds to a set 
 A Set 
SET_INIT  Creates a new set 
 A Set 
SET_MERGE  Merges a list of sets 
 A Set 
SET_REMOVE  Removes from a set 
 A Set 
SPLIT  Splits the string by the delimiter 
 List of strings 
SIN  Returns the sine of a number. 
 The sine of the number passed in. 
SQRT  Returns the square root of a number. 
 The square root of the number passed in. 
STARTS_WITH  Determines whether a string starts with a prefix 
 True if the string starts with the specified prefix and false if otherwise 
STATS_ADD  Add one or more input values to those that are used to calculate the summary statistics 
 A Stellar statistics object 
STATS_BIN  Computes the bin that the value is in based on the statistical distribution. 
 Which bin N the value falls in such that bound(N-1) < value <= bound(N). No min and max bounds are provided, so values smaller than the 0'th bound go in the 0'th bin, and values greater than the last bound go in the M'th bin. 
STATS_COUNT  Calculates the count of the values accumulated (or in the window if a window is used) 
 The count of the values in the window or NaN if the statistics object is null 
STATS_GEOMETRIC_MEAN  Calculates the geometric mean of the accumulated values (or in the window if a window is used). See http://commons.apache.org/proper/commons-math/userguide/stat.html#a1.2_Descriptive_statistics 
 The geometric mean of the values in the window or NaN if the statistics object is null 
STATS_INIT  Initializes a statistics object 
 A Stellar statistics object 
STATS_KURTOSIS  Calculates the kurtosis of the accumulated values (or in the window if a window is used). See http://commons.apache.org/proper/commons-math/userguide/stat.html#a1.2_Descriptive_statistics 
 The kurtosis of the values in the window or NaN if the statistics object is null 
STATS_MAX  Calculates the maximum of the accumulated values (or in the window if a window is used) 
 The maximum of the accumulated values in the window or NaN if the statistics object is null 
STATS_MEAN  Calculates the mean of the accumulated values (or in the window if a window is used) 
 The mean of the values in the window or NaN if the statistics object is null 
STATS_MERGE  Merges statistics objects 
 A Stellar statistics object 
STATS_MIN  Calculates the minimum of the accumulated values (or in the window if a window is used) 
 The minimum of the accumulated values in the window or NaN if the statistics object is null 
STATS_PERCENTILE  Computes the p'th percentile of the accumulated values (or in the window if a window is used) 
 The p'th percentile of the data or NaN if the statistics object is null 
STATS_POPULATION_VARIANCE  Calculates the population variance of the accumulated values (or in the window if a window is used). See http://commons.apache.org/proper/commons-math/userguide/stat.html#a1.2_Descriptive_statistics 
 The population variance of the values in the window or NaN if the statistics object is null 
STATS_QUADRATIC_MEAN  Calculates the quadratic mean of the accumulated values (or in the window if a window is used). See http://commons.apache.org/proper/commons-math/userguide/stat.html#a1.2_Descriptive_statistics 
 The quadratic mean of the values in the window or NaN if the statistics object is null 
STATS_SD  Calculates the standard deviation of the accumulated values (or in the window if a window is used). See http://commons.apache.org/proper/commons-math/userguide/stat.html#a1.2_Descriptive_statistics 
 The standard deviation of the values in the window or NaN if the statistics object is null 
STATS_SKEWNESS  Calculates the skewness of the accumulated values (or in the window if a window is used). See http://commons.apache.org/proper/commons-math/userguide/stat.html#a1.2_Descriptive_statistics 
 The skewness of the values in the window or NaN if the statistics object is null 
STATS_SUM  Calculates the sum of the accumulated values (or in the window if a window is used) 
 The sum of the values in the window or NaN if the statistics object is null 
STATS_SUM_LOGS  Calculates the sum of the (natural) log of the accumulated values (or in the window if a window is used). See http://commons.apache.org/proper/commons-math/userguide/stat.html#a1.2_Descriptive_statistics 
 The sum of the (natural) log of the values in the window or NaN if the statistics object is null 
STATS_SUM_SQUARES  Calculates the sum of the squares of the accumulated values (or in the window if a window is used) 
 The sum of the squares of the values in the window or NaN if the statistics object is null 
STATS_VARIANCE  Calculates the variance of the accumulated values (or in the window if a window is used). See http://commons.apache.org/proper/commons-math/userguide/stat.html#a1.2_Descriptive_statistics 
 The variance of the values in the window or NaN if the statistics object is null 
STRING_ENTROPY  Computes the base-2 Shannon entropy of a string.  input - string  The base-2 Shannon entropy of the string (https://en.wikipedia.org/wiki/Entropy_(information_theory)#Definition). The unit of this is bits. 
SYSTEM_ENV_GET  Returns the value associated with an environment variable 
 String 
SYSTEM_PROPERTY_GET  Returns the value associated with a Java system property 
 String 
TAN  Returns the tangent of a number. 
 The tangent of the number passed in. 
TO_DOUBLE  Transforms the first argument to a double precision number 
 Double version of the first argument 
TO_EPOCH_TIMESTAMP  Returns the epoch timestamp of the dateTime in the specified format. If the format does not include a timezone and you wish to assume a given timezone, you may optionally specify the timezone. 
 Epoch timestamp 
TO_FLOAT  Transforms the first argument to a float 
 Float version of the first argument 
TO_INTEGER  Transforms the first argument to an integer 
 Integer version of the first argument 
TO_LONG  Transforms the first argument to a long integer 
 Long version of the first argument 
TO_LOWER  Transforms the first argument to a lowercase string 
 String 
TO_STRING  Transforms the first argument to a string 
 String 
TO_UPPER  Transforms the first argument to an uppercase string 
 Uppercase string 
TRIM  Trims white space from both sides of a string 
 String 
URL_TO_HOST  Extract the hostname from a URL 
 The hostname from the URL as a string (for example, URL_TO_HOST('http://www.yahoo.com/foo') would yield 'www.yahoo.com') 
URL_TO_PATH  Extract the path from a URL 
 The path from the URL as a string (for example, URL_TO_PATH('http://www.yahoo.com/foo') would yield 'foo') 
URL_TO_PORT  Extract the port from a URL. If the port is not explicitly stated in the URL, then an implicit port is inferred based on the protocol. 
 The port used in the URL as an integer (for example URL_TO_PORT('http://www.yahoo.com/foo') would yield 80) 
URL_TO_PROTOCOL  Extract the protocol from a URL 
 The protocol from the URL as a string (for example, URL_TO_PROTOCOL('http://www.yahoo.com/foo') would yield 'http') 
WEEK_OF_MONTH  The numbered week within the month. The first week within the month has a value of 1. 
 The numbered week within the month 
WEEK_OF_YEAR  The numbered week within the year. The first week in the year has a value of 1. 
 The numbered week within the year 
YEAR  The number representing the year 
 The current year 
ZIP  Zips lists into a single list where the ith element is a list containing the ith items from the constituent lists. See Python and Wikipedia for more context. 


ZIP_LONGEST  Zips lists into a single list where the ith element is a list containing the ith items from the constituent lists. See Python and Wikipedia for more context. 


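Since the ZIP and ZIP_LONGEST rows in the table point to Python for context, the following sketch shows the Python built-ins they are named after. Whether the Stellar functions truncate and pad in exactly the same way is an assumption based on that pointer, not something this table states.

```python
from itertools import zip_longest

numbers = [1, 2, 3]
letters = ["x", "y"]

# zip stops at the end of the shortest input.
zipped = [list(pair) for pair in zip(numbers, letters)]

# zip_longest runs to the end of the longest input and pads gaps with None.
padded = [list(pair) for pair in zip_longest(numbers, letters)]

print(zipped)
print(padded)
```

In both cases the ith element of the result is a list holding the ith items of the inputs; the two differ only in how they treat inputs of unequal length.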
The following is an example query (that is, a function which returns a boolean) that might be seen in threat triage:
IN_SUBNET( ip, '192.168.0.0/24') or ip in [ '10.0.0.1', '10.0.0.2' ] or exists(is_local)
This evaluates to true precisely when one of the following is true:
The value of the ip field is in the 192.168.0.0/24 subnet
The value of the ip field is 10.0.0.1 or 10.0.0.2
The field is_local exists
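To make the three conditions concrete, here is a rough Python equivalent of the example query. It is a sketch under stated assumptions: the message is modeled as a dictionary of fields, exists is modeled as a key-presence check, the function name triage_query is hypothetical, and the ip value is assumed to be a well-formed address whenever it is present.

```python
import ipaddress

SUBNET = ipaddress.ip_network("192.168.0.0/24")
KNOWN_HOSTS = ["10.0.0.1", "10.0.0.2"]


def triage_query(message):
    """Return True when any of the three conditions in the example holds."""
    ip = message.get("ip")
    # IN_SUBNET(ip, '192.168.0.0/24'); a missing ip is treated as no match.
    in_subnet = ip is not None and ipaddress.ip_address(ip) in SUBNET
    # ip in [ '10.0.0.1', '10.0.0.2' ]
    is_known_host = ip in KNOWN_HOSTS
    # exists(is_local), modeled as the field being present in the message.
    has_is_local = "is_local" in message
    return in_subnet or is_known_host or has_is_local


print(triage_query({"ip": "192.168.0.17"}))
print(triage_query({"ip": "8.8.8.8"}))
```

Because the conditions are joined with or, a message only needs to satisfy one of them for the query to return true.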
The following is an example transformation which might be seen in a field transformation:
TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC'))
For a message with a timestamp and dc field, we want to transform the timestamp to an epoch timestamp given a timezone, which we look up in a separate map called dc2tz.
This will convert the timestamp field to an epoch timestamp based on the following:
Format yyyy-MM-dd HH:mm:ss
The value in dc2tz associated with the value of the field dc, defaulting to UTC
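The transformation can be sketched in Python as follows. Several details here are assumptions made for illustration: the format is taken to be the hyphenated yyyy-MM-dd HH:mm:ss pattern, the contents of dc2tz are invented sample data, fixed UTC offsets stand in for named timezones to keep the sketch dependency-free, and the result is expressed in seconds because this section does not state the unit Stellar returns.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table from data center name to timezone.
dc2tz = {"nyc": timezone(timedelta(hours=-5))}


def to_epoch_timestamp(text, tz):
    """Parse a 'yyyy-MM-dd HH:mm:ss' string in the given timezone."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(naive.replace(tzinfo=tz).timestamp())


def transform(message):
    # MAP_GET(dc, dc2tz, 'UTC'): look up the timezone, defaulting to UTC.
    tz = dc2tz.get(message.get("dc"), timezone.utc)
    return to_epoch_timestamp(message["timestamp"], tz)


in_nyc = transform({"timestamp": "2016-01-05 17:00:00", "dc": "nyc"})
in_utc = transform({"timestamp": "2016-01-05 17:00:00", "dc": "unknown"})
print(in_nyc, in_utc)
```

The same wall-clock string yields different epoch values depending on the data center, which is why the timezone lookup is part of the transformation.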