Performance Tuning for Ambari Infra

When using Ambari Infra to index and store Ranger audit logs, you should properly tune Solr to handle the number of audit records stored per day. The following sections describe recommendations for tuning your operating system and Solr, based on how you use Ambari Infra and Ranger in your environment.

Operating System Tuning

Solr clients use many network connections when indexing and searching, and to avoid many open network connections, the following sysctl parameters are recommended:

net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

These settings can be made permanent by placing them in /etc/sysctl.d/net.conf, or they can be set at runtime using the following sysctl command example:

sysctl -w net.ipv4.tcp_max_tw_buckets=1440000
sysctl -w net.ipv4.tcp_tw_recycle=1
sysctl -w net.ipv4.tcp_tw_reuse=1

Additionally, the number of user processes for solr should be increased to avoid exceptions related to creating new native threads. This can be done by creating a new file named /etc/security/limits.d/infra-solr.conf with the following contents:

infra-solr - nproc 6000

JVM - GC Settings

The heap sizing and garbage collection settings are very important for production Solr instances when indexing many Ranger audit logs. For production deployments, we suggest setting the “Infra Solr Minimum Heap Size,” and “Infra Solr Maximum Heap Size” to 12 GB. These settings can be found and applied by following the steps below:

Steps

In Ambari Web, browse to Services > Ambari Infra > Configs.
In the Settings tab you will see two sliders controlling the Infra Solr Heap Size.
Set the Infra Solr Minimum Heap Size to 12GB or 12,288MB.
Set the Infra Solr Maximum Heap Size to 12GB or 12,288MB.
Click Save to save the configuration and then restart the affected services as prompted by Ambari.

Using the G1 Garbage Collector is also recommended for production deployments. To use the G1 Garbage Collector with the Ambari Infra Solr Instance, follow the steps below:

Steps

In Ambari Web, browse to Services > Ambari Infra > Configs.
In the Advanced tab expand the section for Advanced infra-solr-env

In the infra-solr-env template locate the multi-line GC_TUNE environmental variable definition, and replace it with the following content:

GC_TUNE="-XX:+UseG1GC
  -XX:+PerfDisableSharedMem
  -XX:+ParallelRefProcEnabled
  -XX:G1HeapRegionSize=4m
  -XX:MaxGCPauseMillis=250
  -XX:InitiatingHeapOccupancyPercent=75
  -XX:+UseLargePages
  -XX:+AggressiveOpts"

The value used for the -XX:G1HeapRegionSize is based on the 12GB Solr Maximum Heap recommended. If you choose to use a different heap size for the Solr server, please consult the following table for recommendations:

Heap Size	G1HeapRegionSize
< 4GB	1MB
4-8GB	2MB
8-16GB	4MB
16-32GB	8MB
32-64GB	16MB
>64GB	32MB

Environment-Specific Tuning Parameters

Each of the recommendations below are dependent on the number of audit records that are indexed per day. To quickly determine how many audit records are indexed per day, use the command examples below:

Using a HTTP client such as curl, execute the following command:

curl -g "http://<ambari infra hostname>:8886/solr/ranger_audits/select?q=(evtTime:[NOW-7DAYS+TO+*])&wt=json&indent=true&rows=0"

You should receive a message similar to the following:

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"evtTime:[NOW-7DAYS TO *]",
      "indent":"true",
      "rows":"0",
      "wt":"json"}},
  "response":{"numFound":306,"start":0,"docs":[]
  }}

Take the numFound element of the response and divide it by 7 to get the average number of audit records being indexed per day. You can also replace the ‘7DAYS’ in the curl request with a broader time range, if necessary, using the following key words:

1MONTHS
7DAYS

Just ensure you divide by the appropriate number if you change the event time query. The average number of records per day will be used to identify which recommendations below apply to your environment.

Less Than 50 Million Audit Records Per Day

Based on the Solr REST API call if your average number of documents per day is less than 50 million records per day, the following recommendations apply. In each recommendation, the time to live, or TTL, which controls how long a document should be kept in the index until it is removed is taken into consideration. The default TTL is 90 days, but some customers choose to be more aggressive, and remove documents from the index after 30 days. Due to this, recommendations for both common TTL settings are specified.

These recommendations assume that you are using our recommendation of 12GB heap per Solr server instance. In each situation we have recommendations for co-locating Solr with other master services, and for using dedicated Solr servers. Testing has shown that Solr performance requires different server counts depending on whether Solr is co-located or on dedicated servers. Based on our testing with Ranger, Solr shard sizes should be around 25GB for best overall performance. However, Solr shard sizes can go up to 50GB without a significant performance impact.

This configuration is our best recommendation for just getting started with Ranger and Ambari Infra so the only recommendation is using the default TTL of 90 days.

Default Time To Live (TTL) 90 days:

Estimated total index size: ~150 GB to 450 GB
Total number of primary/leader shards: 6
Total number of shards including 1 replica each: 12
Total number of co-located Solr nodes: ~3 nodes, up to 2 shards per node
(does not include replicas)
Total number of dedicated Solr nodes: ~1 node, up to 12 shards per node
(does not include replicas)

50 - 100 Million Audit Records Per Day

50 to 100 million records ~ 5 - 10 GB data per day.

Default Time To Live (TTL) 90 days:

Estimated total index size: ~ 450 - 900 GB for 90 days
Total number of primary/leader shards: 18-36
Total number of shards including 1 replica each: 36-72
Total number of co-located Solr nodes: ~9-18 nodes, up to 2 shards per node
(does not include replicas)
Total number of dedicated Solr nodes: ~3-6 nodes, up to 12 shards per node
(does not include replicas)

Custom Time To Live (TTL) 30 days:

Estimated total index size: 150 - 300 GB for 30 days
Total number of primary/leader shards: 6-12
Total number of shards including 1 replica each: 12-24
Total number of co-located Solr nodes: ~3-6 nodes, up to 2 shards per node
(does not include replicas)
Total number of dedicated Solr nodes: ~1-2 nodes, up to 12 shards per node
(does not include replicas)

100 - 200 Million Audit Records Per Day

100 to 200 million records ~ 10 - 20 GB data per day.

Default Time To Live (TTL) 90 days:

Estimated total index size: ~ 900 - 1800 GB for 90 days
Total number of primary/leader shards: 36-72
Total number of shards including 1 replica each: 72-144
Total number of co-located Solr nodes: ~18-36 nodes, up to 2 shards per node
(does not include replicas)
Total number of dedicated Solr nodes: ~3-6 nodes, up to 12 shards per node
(does not include replicas)

Custom Time To Live (TTL) 30 days:

Estimated total index size: 300 - 600 GB for 30 days
Total number of primary/leader shards: 12-24
Total number of shards including 1 replica each: 24-48
Total number of co-located Solr nodes: ~6-12 nodes, up to 2 shards per node
(does not include replicas)
Total number of dedicated Solr nodes: ~1-3 nodes, up to 12 shards per node
(does not include replicas)

If you choose to use at least 1 replica for high availability, then increase the number of nodes accordingly. If high availability is a requirement, then consider using no less than 3 Solr nodes in any configuration.

As illustrated in these examples, a lower TTL requires less resources. If your compliance objectives call for longer data retention, you can use the SolrDataManager to archive data into long term storage (HDFS, or S3) and provides Hive tables allowing you to easily query that data. With this strategy, hot data can be stored in Solr for rapid access through the Ranger UI, and cold data can be archived to HDFS, or S3 with access provided through Ranger.

More Information

Archiving and Purging Data

Adding New Shards

If after reviewing the recommendations above, you need to add additional shards to your existing deployment, the following Solr documentation will help you understand how to accomplish that task: https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.5.pdf

Out of Memory Exceptions

When using Ambari Infra with Ranger Audit, if you are seeing many instances of Solr exiting with Java “Out Of Memory” exceptions, a solution exists to update the Ranger Audit schema to use less heap memory by enabling DocValues. This change requires a re-index of data and is disruptive, but helps tremendously with heap memory consumption. Refer to this HCC article for the instructions on making this change: https://community.hortonworks.com/articles/156933/restore-backup-ranger-audits-to-newly-collection.html