Using Apache Solr for Ranger Audits
Apache Solr is an open-source enterprise search platform. Apache Ranger can use Apache Solr to store audit logs, and Solr also provides search over those audit logs through the Ranger Admin UI.
It is recommended that Ranger audits be written to both Solr and HDFS. Audits to Solr are primarily used to enable search queries from the Ranger Admin UI. HDFS is a long-term destination for audits -- audits stored in HDFS can be exported to any SIEM system, or to another audit store.
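To illustrate the kind of search the Ranger Admin UI issues against Solr, the following minimal sketch builds a standard Solr select URL. The collection name `ranger_audits`, the field names `reqUser` and `evtTime`, and the host and port in the usage line are assumptions for illustration and should be checked against your own Solr schema and deployment. The query parameters `q`, `sort`, `rows`, and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_audit_query_url(solr_base_url, user, rows=10, collection="ranger_audits"):
    """Build a Solr select URL listing the most recent audit events for a user.

    Assumptions (not confirmed by this document): the audit collection is
    named "ranger_audits" and exposes the fields "reqUser" and "evtTime".
    """
    # Escape backslashes and double quotes so the user name stays one phrase.
    safe_user = user.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'reqUser:"{safe_user}"',  # match audit events for this user
        "sort": "evtTime desc",         # newest events first
        "rows": rows,                   # limit the number of results
        "wt": "json",                   # request a JSON response
    }
    return f"{solr_base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and port; substitute your own Solr endpoint.
    print(build_audit_query_url("http://solr.example.com:8886/solr", "alice", rows=5))
```

The function only constructs the URL. Sending the request, and authenticating in a secured cluster, is left to the caller.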
Solr must be installed and configured before installing Ranger Admin or any of the Ranger component plugins. The default configuration for Ranger audits to Solr uses the shared Solr instance provided under the Ambari Infra service. Solr is both memory and CPU intensive. If your production system has a high volume of access requests, make sure that the Solr host has adequate memory, CPU, and disk space.
SolrCloud is the preferred setup for production usage of Ranger. SolrCloud, which is deployed with the Ambari Infra service, is a scalable architecture that can run as a single-node or multi-node cluster. It has additional features such as replication and sharding, which are useful for high availability (HA) and scalability. You should plan your deployment based on your cluster size. Because audit records can grow dramatically, plan to have at least 1 TB of free space in the volume on which Solr will store the index data. Solr works well with a minimum of 32 GB of RAM. You should provide as much memory as possible to the Solr process.
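The sizing guidance above (at least 1 TB of free disk and 32 GB of RAM) can be checked with a small script before installing Solr. This is a minimal sketch, not an official tool: the index path in the usage line is hypothetical, the RAM probe relies on `os.sysconf` names that are available on Linux, and treating 1 TB as 10^12 bytes and 32 GB as 32 x 2^30 bytes is an assumption.

```python
import os
import shutil

# Thresholds taken from the sizing guidance; the byte conversions are assumptions.
MIN_FREE_DISK_BYTES = 1 * 1000**4   # 1 TB of free space for index data
MIN_RAM_BYTES = 32 * 1024**3        # 32 GB of RAM


def sizing_warnings(free_disk_bytes, total_ram_bytes):
    """Return a list of warnings for a host that falls short of the guidance."""
    warnings = []
    if free_disk_bytes < MIN_FREE_DISK_BYTES:
        warnings.append("free disk space is below 1 TB")
    if total_ram_bytes < MIN_RAM_BYTES:
        warnings.append("physical memory is below 32 GB")
    return warnings


def probe_host(index_path):
    """Measure the local host (Linux) and compare it with the guidance."""
    free_disk = shutil.disk_usage(index_path).free
    # A host sold as "32 GB" can report slightly less usable memory than
    # 32 * 2**30 bytes, so treat a near miss as a prompt to check manually.
    total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return sizing_warnings(free_disk, total_ram)


if __name__ == "__main__":
    # Hypothetical index location; use the volume where Solr stores its index.
    for warning in probe_host("/"):
        print("WARNING:", warning)
```

Keeping the comparison in `sizing_warnings` separate from the host probe makes the thresholds easy to adjust for larger clusters.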
It is highly recommended to use SolrCloud with at least two Solr nodes running on different servers with replication enabled. You can use the information in this section to configure additional SolrCloud instances.
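One way to confirm that replication actually spans different servers is to inspect the output of the Solr Collections API `CLUSTERSTATUS` action. The sketch below assumes the audit collection is named `ranger_audits` and that the response follows the usual `cluster` / `collections` / `shards` / `replicas` layout. The base URL passed to `fetch_cluster_status` is whatever endpoint your deployment exposes, and a Kerberos-enabled cluster would additionally need SPNEGO authentication, which is not shown.

```python
import json
from urllib.request import urlopen


def fetch_cluster_status(solr_base_url):
    """Fetch CLUSTERSTATUS as a dict (unauthenticated; assumes no Kerberos)."""
    url = f"{solr_base_url.rstrip('/')}/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url) as response:
        return json.load(response)


def shards_without_redundancy(cluster_status, collection="ranger_audits", min_nodes=2):
    """Return shards whose active replicas sit on fewer than min_nodes nodes."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    weak = []
    for shard_name, shard in shards.items():
        # Collect the distinct nodes that host an active replica of this shard.
        nodes = {
            replica["node_name"]
            for replica in shard["replicas"].values()
            if replica.get("state") == "active"
        }
        if len(nodes) < min_nodes:
            weak.append(shard_name)
    return sorted(weak)


if __name__ == "__main__":
    # Hand-written sample in the assumed response shape, for illustration only.
    sample = {"cluster": {"collections": {"ranger_audits": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"node_name": "host1:8886_solr", "state": "active"},
            "core_node2": {"node_name": "host2:8886_solr", "state": "active"}}},
        "shard2": {"replicas": {
            "core_node3": {"node_name": "host1:8886_solr", "state": "active"},
            "core_node4": {"node_name": "host2:8886_solr", "state": "down"}}},
    }}}}}
    print(shards_without_redundancy(sample))
```

In the sample, `shard2` is reported because only one of its replicas is active, so losing `host1` would make that shard unavailable.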
- Ambari Infra Managed Solr (default) -- Audits to Solr default to the shared Solr instance provided under the Ambari Infra service. No additional configuration steps are required for this option. SolrCloud, which is deployed with the Ambari Infra service, is a scalable architecture that can run as a single-node or multi-node cluster. This is the recommended configuration for Ranger. By default, a single-node SolrCloud installation is deployed when the Ambari Infra service is chosen for installation. Hortonworks recommends that you install multiple Ambari Infra Solr Instances to provide distributed indexing and search for Atlas, Ranger, and LogSearch (Technical Preview). To do this, add Ambari Infra Solr Instances to existing cluster hosts by selecting Actions > Add Service on the Ambari dashboard.
- Externally Managed SolrCloud -- You can also install and manage an external SolrCloud that can run as a single-node or multi-node cluster. It includes features such as replication and sharding, which are useful for high availability (HA) and scalability. With SolrCloud, you need to plan the deployment based on the cluster size.
- Externally Managed Solr Standalone -- Solr Standalone is NOT recommended for production use, and should be only used for testing and evaluation. Solr Standalone is a single instance of Solr that does not require ZooKeeper.
  Note: Solr Standalone is NOT recommended and support for this configuration will be deprecated in a future release.
- SolrCloud for Kerberos -- This is the recommended configuration for SolrCloud in Kerberos environments.