Activity analyzer placement
The Activity Analyzer component can extract, aggregate, and store utilization data for all three supported analyzers: HDFS, YARN, and MapReduce & Tez.
Each Activity Analyzer requires the HDFS, YARN, MapReduce, and Tez clients to be installed on the same host as the analyzer.
For HDFS analysis, an Activity Analyzer needs to be deployed to each NameNode in the cluster. These instances automatically begin processing the fsimage on startup and reprocess the latest fsimage data once every 24 hours. By default, when deployed on a NameNode, these Activity Analyzers do not process YARN or MapReduce & Tez utilization data. This reduces the amount of processing done on servers that host critical services such as the NameNode.
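The default placement rule described here can be sketched as a small decision function. This is an illustrative sketch only, not the analyzer's actual startup logic: the function name `enabled_analyzers`, the constants, and the host names are hypothetical, and it assumes that a NameNode is identified by comparing host names.

```python
from typing import Iterable, Set

# Labels for the three supported analyzers named in the text.
HDFS = "HDFS"
YARN = "YARN"
MR_TEZ = "MapReduce & Tez"


def enabled_analyzers(host: str, namenode_hosts: Iterable[str]) -> Set[str]:
    """Return the analyzers an instance on `host` would run by default.

    Hypothetical helper: an instance on a NameNode host runs only the HDFS
    analyzer; an instance on any other host runs YARN and MapReduce & Tez.
    """
    # Assumption: host names are compared case-insensitively.
    namenodes = {h.lower() for h in namenode_hosts}
    if host.lower() in namenodes:
        return {HDFS}
    return {YARN, MR_TEZ}
```

For example, with NameNodes on the hypothetical hosts `nn1.example.com` and `nn2.example.com`, an instance on `nn1.example.com` would run only the HDFS analyzer, while an instance on `worker5.example.com` would run the YARN and MapReduce & Tez analyzers.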
Resource requirements: The HDFS Analyzer typically runs for a very short period of time, and its resource consumption depends on the fsimage size. For example, analyzing a 200-million-object fsimage is anticipated to take less than 15 minutes. The HDFS Analyzer is mostly a single-threaded process and consumes up to one core while it runs.
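For rough capacity planning, the reference figure above can be scaled to other fsimage sizes. The sketch below assumes that analysis time grows linearly with the number of objects, which the text does not state. The function name `estimated_max_minutes` is hypothetical, and the result is only an approximate upper bound.

```python
# Reference point from the text: a 200-million-object fsimage is anticipated
# to take less than 15 minutes to analyze.
REFERENCE_OBJECTS = 200_000_000
REFERENCE_MINUTES = 15.0


def estimated_max_minutes(fsimage_objects: int) -> float:
    """Scale the reference figure to another fsimage size.

    Assumption (not from the source): analysis time is proportional to the
    number of objects in the fsimage.
    """
    if fsimage_objects < 0:
        raise ValueError("fsimage_objects must be non-negative")
    return REFERENCE_MINUTES * fsimage_objects / REFERENCE_OBJECTS
```

Under this assumption, a 100-million-object fsimage would be expected to take less than about 7.5 minutes on a single core.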
YARN, MapReduce & Tez analyzer
Activity Analyzers deployed to the NameNodes in the cluster do not process any utilization data besides HDFS. Therefore, to process YARN, MapReduce, and Tez utilization data, another instance of the Activity Analyzer needs to be deployed to another node in the cluster, preferably a non-master node. On startup, the Activity Analyzer checks that it is not deployed to a NameNode, and then begins to process YARN, MapReduce, and Tez utilization data. This Activity Analyzer starts and schedules the analysis of YARN applications and of MapReduce and Tez jobs separately. Both the YARN analysis and the MapReduce and Tez analysis constantly poll for completed applications or jobs. Once an application or job completes, it is analyzed and its utilization data is stored in the Ambari Metrics System.
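The poll-analyze-store cycle described above can be illustrated with a minimal sketch. It is not the Activity Analyzer's implementation: `poll_completed`, `run_forever`, and the three callables (`fetch_completed`, `analyze`, `store`) are hypothetical placeholders for listing completed applications or jobs, computing utilization data, and writing it to a metrics store. The 60-second default interval is also an assumption.

```python
import time
from typing import Callable, Dict, Iterable, Set

Metrics = Dict[str, float]


def poll_completed(
    fetch_completed: Callable[[], Iterable[str]],
    analyze: Callable[[str], Metrics],
    store: Callable[[str, Metrics], None],
    seen: Set[str],
) -> int:
    """Run one poll cycle and return how many new items were processed.

    Each completed application or job is analyzed and stored once; IDs that
    were already handled are tracked in `seen` and skipped.
    """
    processed = 0
    for item_id in fetch_completed():
        if item_id in seen:
            continue
        store(item_id, analyze(item_id))
        seen.add(item_id)
        processed += 1
    return processed


def run_forever(
    fetch_completed: Callable[[], Iterable[str]],
    analyze: Callable[[str], Metrics],
    store: Callable[[str, Metrics], None],
    interval_seconds: float = 60.0,  # assumed value, not from the source
) -> None:
    """Poll repeatedly, sleeping between cycles."""
    seen: Set[str] = set()
    while True:
        poll_completed(fetch_completed, analyze, store, seen)
        time.sleep(interval_seconds)
```

In this sketch, one such loop would run for YARN applications and another for MapReduce and Tez jobs, each with its own `fetch_completed` source, mirroring the separate scheduling described in the text.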