Understanding Ambari Infra

Ambari Infra provides common shared services for stack components.

Many services in HDP depend on core services to index data. For example, Apache Atlas uses indexing services for tagging, lineage, and free-text search, and Apache Ranger uses indexing for audit data. The role of Ambari Infra is to provide these common shared services for stack components.

Currently, the Ambari Infra service has only one component: the Infra Solr Instance. The Infra Solr Instance is a fully managed Apache Solr installation. By default, a single-node SolrCloud installation is deployed when the Ambari Infra service is chosen for installation. However, you should install multiple Infra Solr Instances so that you have distributed indexing and search for Atlas, Ranger, and Log Search (Technical Preview).

To install multiple Infra Solr Instances, add them to existing cluster hosts through Ambari’s +Add Service capability. The number of Infra Solr Instances you deploy depends on the number of nodes in the cluster and the services deployed.
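
After adding instances, it can be useful to confirm that each one has joined the SolrCloud cluster. The following is a minimal sketch, not part of Ambari Infra itself, that queries the Solr Collections API (action=CLUSTERSTATUS) and lists the nodes Solr reports as live. The port 8886 and the helper names are assumptions for illustration; check the Infra Solr port in your Ambari configuration. The sketch also assumes an unsecured cluster, so a Kerberos-enabled cluster would need an authenticated request instead.

```python
import json
from urllib.request import urlopen


def cluster_status_url(host, port=8886):
    """Build the Solr Collections API URL for a CLUSTERSTATUS request.

    Assumption: 8886 is the Infra Solr port on this cluster.
    """
    return (
        f"http://{host}:{port}/solr/admin/collections"
        "?action=CLUSTERSTATUS&wt=json"
    )


def live_nodes(status):
    """Return the sorted live-node names from a parsed CLUSTERSTATUS response."""
    return sorted(status.get("cluster", {}).get("live_nodes", []))


def fetch_live_nodes(host, port=8886, timeout=10):
    """Query one Infra Solr Instance and list the nodes it sees as live."""
    with urlopen(cluster_status_url(host, port), timeout=timeout) as response:
        return live_nodes(json.load(response))
```

Calling fetch_live_nodes with the host name of any Infra Solr Instance (for example, a hypothetical infra-solr-1.example.com) should return one entry per running instance. A shorter list than expected suggests that an instance is stopped or has not registered with ZooKeeper.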

Because one Ambari Infra Solr Instance is used by multiple HDP components, restart the service with care to avoid disrupting those dependent services. In HDP 2.5 and later, Atlas, Ranger, and Log Search depend on the Ambari Infra service.

	Note
	Infra Solr Instance is intended for use only by HDP components. Use by third-party components or applications is not supported.

Large clusters produce many log entries, and Ambari Infra provides a convenient utility for archiving and purging logs that are no longer required. This utility is called the Solr Data Manager. The Solr Data Manager is a Python program available at /usr/bin/infra-solr-data-manager. This program allows users to quickly archive, delete, or save data from a Solr collection, with the following usage options.