Hortonworks Cybersecurity Platform
Also available as:
PDF

Distributional Statistics

Data scientists frequently want to find anomalies in numerical data. To that end, HCP has some simple distributional statistics.

Table 1. Distributional Statistics
Function Description Input Returns
STATS_ADD Adds one or more input values to those that are used to calculate the summary statistics.
  • stats - The Stellar statistics object. If null, then a new one is initialized.

  • value+ - One or more numbers to add

A Stellar statistics object
STATS_BIN Computes the bin that the value is in based on the statistical distribution.
  • stats - The Stellar statistics object

  • value - The value to bin

  • bounds? - A list of percentile bin bounds (excluding min and max) or a string representing a known and common set of bins. For convenience, we have provided QUARTILE, QUINTILE, and DECILE which you can pass in as a string arg. If this argument is omitted, then we assume a Quartile bin split.

Which bin N the value falls in such that bound(N-1) < value <= bound(N). No min and max bounds are provided, so values smaller than the 0'th bound go in the 0'th bin, and values greater than the last bound go in the M'th bin.
STATS_COUNT Calculates the count of the values accumulated (or in the window if a window is used).
  • stats - The Stellar statistics object

The count of the values in the window or NaN if the statistics object is null
STATS_GEOMETRIC_MEAN Calculates the geometric mean of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The geometric mean of the values in the window or NaN if the statistics object is null.
STATS_INIT Initializes a statistics object
  • window size - The number of input data values to maintain in a rolling window in memory. If window_size is equal to 0, then no rolling window is maintained. Using no rolling window is less memory intensive, but cannot calculate certain statistics like percentiles and kurtosis.

A Stellar statistics object
STATS_KURTOSIS Calculates the kurtosis of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The kurtosis of the values in the window or NaN if the statistics object is null
STATS_MAX Calculates the maximum of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The maximum of the accumulated values in the window or NaN if the statistics object is null.
STATS_MEAN Calculates the mean of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The mean of the values in the window or NaN if the statistics object is null.
STATS_MERGE Merges statistics objects
  • statistics -A list of statistics objects

A Stellar statistics object
STATS_MIN Calculates the minimum of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The minimum of the accumulated values in the window or NaN if the statistics object is null.
STATS_PERCENTILE Computes the p'th percentile of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

  • p - A double where 0 <= p < 1 representing the percentile

The p'th percentile of the data or NaN if the statistics object is null
STATS_POPULATION_VARIANCE Calculates the population variance of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The population variance of the values in the window or NaN if the statistics object is null.
STATS_QUADRATIC_MEAN Calculates the quadratic mean of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The quadratic mean of the values in the window or NaN if the statistics object is null.
STATS_SD Calculates the standard deviation of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The standard deviation of the values in the window or NaN if the statistics object is null.
STATS_SKEWNESS Calculates the skewness of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The skewness of the values in the window or NaN if the statistics object is null.
STATS_SUM Calculates the sum of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The sum of the values in the window or NaN if the statistics object is null.
STATS_SUM_LOGS Calculates the sum of the (natural) log of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The sum of the (natural) log of the values in the window or NaN if the statistics object is null.
STATS_SUM_SQUARES Calculates the sum of the squares of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The sum of the squares of the values in the window or NaN if the statistics object is null.
STATS_VARIANCE Calculates the variance of the accumulated values (or in the window if a window is used).
  • stats - The Stellar statistics object

The variance of the values in the window or NaN if the statistics object is null.
IT_ENTROPY Computes the base-2 entropy of a multiset.
  • input - A multiset (a map of objects to counts)

The [base-2 entropy] of the count . The unit of this is bits.