Ambari User's Guide
Overview
Hadoop is a large-scale, distributed data storage and processing infrastructure using
clusters of commodity hosts networked together. Monitoring and managing such complex
distributed systems is a non-trivial task. To help you manage the complexity, Apache
Ambari collects a wide range of information from the cluster's nodes and services
and presents it to you in an easy-to-read, easy-to-use, centralized web interface, Ambari
Web.
Ambari Web displays information such as service-specific summaries, graphs, and alerts.
You use Ambari Web to create and manage your HDP cluster and to perform basic operational
tasks such as starting and stopping services, adding hosts to your cluster, and updating
service configurations. You also use Ambari Web to perform administrative tasks for
your cluster, such as managing users and groups and deploying Ambari Views.
For more information on administering Ambari users, groups and views, refer to the
Ambari Administration Guide.
Architecture
The Ambari Server serves as the collection point for data from across your cluster. Each host has a copy of the Ambari Agent - either installed automatically by the Install wizard or manually - which allows the Ambari Server to control each host.
Figure - Ambari Server Architecture

Sessions
Ambari Web is a client-side JavaScript application, which calls the Ambari REST API (accessible from the Ambari Server) to access cluster information and perform cluster operations. After authenticating to Ambari Web, the application authenticates to the Ambari Server. Communication between the browser and server occurs asynchronously via the REST API.
Ambari Web sessions do not time out. The Ambari Web application constantly accesses the Ambari REST API, which resets the session timeout. During any period of Ambari Web inactivity, the Ambari Web user interface (UI) refreshes automatically. You must explicitly sign out of the Ambari Web UI to destroy the Ambari session with the server.

Accessing Ambari Web
Typically, you start the Ambari Server and Ambari Web as part of the installation process. If Ambari Server is stopped, you can start it from a command line on the Ambari Server host machine. Enter the following command:
ambari-server start
To access Ambari Web, open a supported browser and enter the Ambari Web URL:
http://<your.ambari.server>:8080
Enter your user name and password. If this is the first time Ambari Web is accessed, use the default values, admin/admin.
These values can be changed, and new users provisioned, using the Manage Ambari option.

For more information about managing users and other administrative tasks, see Administering Ambari.
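Because Ambari Web is a client of the Ambari REST API, the same server address and credentials can also be used from a script. The following is a minimal sketch, not a definitive client: the function name `build_clusters_request` is hypothetical, the `/api/v1/clusters` path and the `X-Requested-By` header are assumptions about the REST API that this guide does not spell out, and the request is only built, not sent.

```python
import base64
import urllib.request


def build_clusters_request(server, user, password, port=8080):
    """Build (but do not send) a GET request for the cluster list.

    The path /api/v1/clusters is an assumption about the REST API layout.
    """
    url = f"http://{server}:{port}/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    # HTTP Basic authentication with the Ambari user name and password.
    request.add_header("Authorization", f"Basic {token}")
    # Assumed to be expected on modifying calls; included here for consistency.
    request.add_header("X-Requested-By", "ambari")
    return request
```

To send the request, pass the returned object to `urllib.request.urlopen`. Change the default admin/admin credentials before scripting against a real cluster.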
Monitoring and Managing your HDP Cluster with Ambari
This topic describes how to use Ambari Web features to monitor and manage your HDP cluster. To navigate, select one of the following feature tabs located at the top of the Ambari main window. The selected tab appears white.

Viewing Metrics on the Dashboard
Ambari Web displays the Dashboard page as the home page. Use the Dashboard to view the operating status of your cluster in the following three ways:
Scanning System Metrics
View Metrics that indicate the operating status of your cluster on the Ambari Dashboard. Each metrics widget displays status information for a single service in your HDP cluster. The Ambari Dashboard displays all metrics for the HDFS, YARN, HBase, and Storm services, and cluster-wide metrics by default.
You can add and remove individual widgets, and rearrange the dashboard by dragging and dropping each widget to a new location in the dashboard.

Status information appears as simple pie and bar charts, more complex charts showing usage and load, sets of links to additional data sources, and values for operating parameters such as uptime and average RPC queue wait times. Most widgets display a single fact by default. For example, HDFS Disk Usage displays a load chart and a percentage figure. The Ambari Dashboard includes metrics for the following services:
Ambari Service Metrics and Descriptions
Metric | Description |
---|---|
HDFS | |
HDFS Disk Usage | The percentage of DFS used, which is a combination of DFS and non-DFS used. |
Data Nodes Live | The number of DataNodes live, as reported from the NameNode. |
NameNode Heap | The percentage of NameNode JVM Heap used. |
NameNode RPC | The average RPC queue latency. |
NameNode CPU WIO | The percentage of CPU Wait I/O. |
NameNode Uptime | The NameNode uptime calculation. |
YARN (HDP 2.1 or later Stacks) | |
ResourceManager Heap | The percentage of ResourceManager JVM Heap used. |
ResourceManager Uptime | The ResourceManager uptime calculation. |
NodeManagers Live | The number of NodeManagers live, as reported from the ResourceManager. |
YARN Memory | The percentage of available YARN memory (used vs. total available). |
HBase | |
HBase Master Heap | The percentage of HBase Master JVM Heap used. |
HBase Ave Load | The average load on the HBase server. |
HBase Master Uptime | The HBase Master uptime calculation. |
Region in Transition | The number of HBase regions in transition. |
Storm (HDP 2.1 or later Stacks) | |
Supervisors Live | The number of Supervisors live, as reported from the Nimbus server. |
Drilling Into Metrics for a Service
-
To see more detailed information about a service, hover your cursor over a Metrics widget.
More detailed information about the service displays, as shown in the following example:
-
To remove a widget from the dashboard, click the white X.
-
To edit the display of information in a widget, click the pencil icon. For more information about editing a widget, see Customizing Metrics Display.
Viewing Cluster-Wide Metrics
Cluster-wide metrics display information that represents your whole cluster. The Ambari Dashboard shows the following cluster-wide metrics:

Ambari Cluster-Wide Metrics and Descriptions
Metric | Description |
---|---|
Memory Usage | The cluster-wide memory utilization, including memory cached, swapped, used, and shared. |
Network Usage | The cluster-wide network utilization, including in-and-out. |
CPU Usage | Cluster-wide CPU information, including system, user, and wait IO. |
Cluster Load | Cluster-wide load information, including total number of nodes, total number of CPUs, number of running processes, and 1-min load. |
-
To remove a widget from the dashboard, click the white X.
-
Hover your cursor over each cluster-wide metric to magnify the chart or itemize the widget display.
-
To remove or add metric items from each cluster-wide metric widget, select the item on the widget legend.
-
To see a larger view of the chart, select the magnifying glass icon.
Ambari displays a larger version of the widget in a pop-up window, as shown in the following example:

Use the pop-up window in the same ways that you use cluster-wide metric widgets on the dashboard.
To close the widget pop-up window, choose OK.
Adding a Widget to the Dashboard
To replace a widget that has been removed from the dashboard:
-
Select the Metrics drop-down, as shown in the following example:
-
Choose Add.
-
Select a metric, such as Region in Transition.
-
Choose Apply.
Resetting the Dashboard
To reset all widgets on the dashboard to display default settings:
-
Select the Metrics drop-down, as shown in the following example:
-
Choose Edit.
-
Choose Reset all widgets to default.
Customizing Metrics Display
To customize the way a service widget displays metrics information:
-
Hover your cursor over a service widget.
-
Select the pencil-shaped, edit icon that appears in the upper-right corner.
The Customize Widget pop-up window displays properties that you can edit, as shown in the following example.
-
Follow the instructions in the Customize Widget pop-up to customize widget appearance.
In this example, you can adjust the thresholds at which the HDFS Capacity bar chart changes color, from green to orange to red.
-
To save your changes and close the editor, choose Apply.
-
To close the editor without saving any changes, choose Cancel.
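The color thresholds described above can be pictured as a simple comparison. The following is an illustrative sketch only: the function name `capacity_color` and the example threshold values are hypothetical and are not taken from Ambari itself.

```python
def capacity_color(percent_used, orange_at, red_at):
    """Return the bar color for a usage percentage and two thresholds.

    orange_at and red_at stand in for the values you would enter in the
    Customize Widget pop-up; the names are illustrative only.
    """
    if not 0 <= orange_at <= red_at <= 100:
        raise ValueError("thresholds must satisfy 0 <= orange_at <= red_at <= 100")
    if percent_used >= red_at:
        return "red"
    if percent_used >= orange_at:
        return "orange"
    return "green"
```

For example, with hypothetical thresholds of 75 and 85, a usage of 80 percent would display orange.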
Viewing More Metrics for your HDP Stack
The HDFS Links and HBase Links widgets list HDP components for which links to more metrics information, such as thread stacks, logs, and native component UIs, are available. For example, you can link to NameNode, Secondary NameNode, and DataNode components for HDFS, using the links shown in the following example:

Choose the More drop-down to select from the list of links available for each service. The Ambari Dashboard includes additional links to metrics for the following services:
Links to More Metrics for HDP Services
Service | Metric | Description |
---|---|---|
HDFS | | |
 | NameNode UI | Links to the NameNode UI. |
 | NameNode Logs | Links to the NameNode logs. |
 | NameNode JMX | Links to the NameNode JMX servlet. |
 | Thread Stacks | Links to the NameNode thread stack traces. |
HBase | | |
 | HBase Master UI | Links to the HBase Master UI. |
 | HBase Logs | Links to the HBase logs. |
 | ZooKeeper Info | Links to ZooKeeper information. |
 | HBase Master JMX | Links to the HBase Master JMX servlet. |
 | Debug Dump | Links to debug information. |
 | Thread Stacks | Links to the HBase Master thread stack traces. |
Viewing Heatmaps
Heatmaps provides a graphical representation of your overall cluster utilization using simple color coding.

A colored block represents each host in your cluster. To see more information about a specific host, hover over the block representing the host in which you are interested. A pop-up window displays metrics about HDP components installed on that host. Colors displayed in the block represent usage in a unit appropriate for the selected set of metrics. If any data necessary to determine state is not available, the block displays "Invalid Data". Changing the default maximum values for the heatmap lets you fine tune the representation. Use the Select Metric drop-down to select the metric type.

Heatmaps supports the following metrics:
Metric | Uses |
---|---|
Host/Disk Space Used % | disk.disk_free and disk.disk_total |
Host/Memory Used % | memory.mem_free and memory.mem_total |
Host/CPU Wait I/O % | cpu.cpu_wio |
HDFS/Bytes Read | dfs.datanode.bytes_read |
HDFS/Bytes Written | dfs.datanode.bytes_written |
HDFS/Garbage Collection Time | jvm.gcTimeMillis |
HDFS/JVM Heap Memory Used | jvm.memHeapUsedM |
YARN/Garbage Collection Time | jvm.gcTimeMillis |
YARN/JVM Heap Memory Used | jvm.memHeapUsedM |
YARN/Memory Used % | UsedMemoryMB and AvailableMemoryMB |
HBase/RegionServer read request count | hbase.regionserver.readRequestsCount |
HBase/RegionServer write request count | hbase.regionserver.writeRequestsCount |
HBase/RegionServer compaction queue size | hbase.regionserver.compactionQueueSize |
HBase/RegionServer regions | hbase.regionserver.regions |
HBase/RegionServer memstore sizes | hbase.regionserver.memstoreSizeMB |
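As a rough illustration of how a "used %" heatmap value could be derived from a free/total metric pair such as disk.disk_free and disk.disk_total, consider the sketch below. The formula (total minus free, divided by total) and the function name `used_percent` are assumptions made for illustration; the guide does not state the exact calculation Ambari performs.

```python
def used_percent(metrics, free_key, total_key):
    """Derive a used-percentage from a free/total metric pair.

    Returns the string "Invalid Data" when a value needed for the
    calculation is missing or the total is not positive.
    """
    free = metrics.get(free_key)
    total = metrics.get(total_key)
    if free is None or total is None or total <= 0:
        return "Invalid Data"
    return 100.0 * (total - free) / total
```

Calling `used_percent({"disk.disk_free": 25.0, "disk.disk_total": 100.0}, "disk.disk_free", "disk.disk_total")` yields 75.0, while a host that reports no disk.disk_total yields "Invalid Data".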
Scanning Status
Notice the color of the dot appearing next to each component name in a list of components, services, or hosts. The dot color and blinking action indicate the operating status of each component, service, or host. For example, in the Summary View, notice the green dot next to each service name. The following colors and actions indicate service status:
Status Indicators
Color | Status |
---|---|
Solid Green | All masters are running |
Blinking Green | Starting up |
Solid Red | At least one master is down |
Blinking Red | Stopping |
Click the service name to open the Services screen, where you can see more detailed information on each service.
Managing Hosts
Use Ambari Hosts to manage multiple HDP components such as DataNodes, NameNodes, NodeManagers and RegionServers, running on hosts throughout your cluster. For example, you can restart all DataNode components, optionally controlling that task with rolling restarts. Ambari Hosts supports filtering your selection of host components, based on operating status, host health, and defined host groupings.
Working with Hosts
Use Hosts to view hosts in your cluster on which Hadoop services run. Use options on Actions to perform actions on one or more hosts in your cluster.
View individual hosts, listed by fully-qualified domain name, on the Hosts landing page.

Determining Host Status
A colored dot beside each host name indicates operating status of each host, as follows:
-
Red - At least one master component on that host is down. Hover to see a tooltip that lists affected components.
-
Orange - At least one slave component on that host is down. Hover to see a tooltip that lists affected components.
-
Yellow - Ambari Server has not received a heartbeat from that host for more than 3 minutes.
-
Green - Normal running state.
A red condition flag overrides an orange condition flag, which overrides a yellow condition flag. In other words, a host having a master component down may also have other issues. The following example shows three hosts, one having a master component down, one having a slave component down, and one healthy. Warning indicators appear next to hosts having a component down.
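The precedence among host status colors can be expressed as an ordered series of checks. This is a minimal sketch of the rule as described in this section; the function name `host_status` and its parameters are hypothetical, and the 180-second value mirrors the 3-minute heartbeat limit stated above.

```python
HEARTBEAT_LIMIT_SECONDS = 180  # "more than 3 minutes" without a heartbeat


def host_status(master_down, slave_down, seconds_since_heartbeat):
    """Return the dot color for a host, checking the highest-priority condition first."""
    if master_down:
        return "red"      # overrides orange and yellow
    if slave_down:
        return "orange"   # overrides yellow
    if seconds_since_heartbeat > HEARTBEAT_LIMIT_SECONDS:
        return "yellow"
    return "green"
```

Because the checks run in priority order, a host with a master component down reports red even if it has also lost its heartbeat.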

Filtering the Hosts List
Use Filters to limit listed hosts to only those having a specific operating status. The number of hosts in your cluster having a listed operating status appears after each status name, in parentheses. For example, the following cluster has one host having healthy status and three hosts having Maintenance Mode turned on.

For example, to limit the list of hosts appearing on Hosts home to only those with Healthy status, select Filters, then choose the Healthy option. In this case, one host name appears on Hosts home. Alternatively, to limit the list of hosts appearing on Hosts home to only those having Maintenance Mode on, select Filters, then choose the Maintenance Mode option. In this case, three host names appear on Hosts home.
Use the general filter tool to apply specific search and sort criteria that limit the list of hosts appearing on the Hosts page.
Performing Host-Level Actions
Use Actions to act on one or more hosts in your cluster. Actions performed on multiple hosts are also known as bulk operations.
Actions comprises three menus that list the following option types:
-
Hosts - lists selected, filtered or all hosts options, based on your selections made using Hosts home and Filters.
-
Objects - lists component objects that match your host selection criteria.
-
Operations - lists all operations available for the component objects you selected.
For example, to restart DataNodes on one host:
-
In Hosts, select a host running at least one DataNode.
-
In Actions, choose Selected Hosts > DataNodes > Restart, as shown in the following image.
-
Choose OK to confirm starting the selected operation.
-
Optionally, use Monitoring Background Operations to follow, diagnose or troubleshoot the restart operation.
Viewing Components on a Host
To manage components running on a specific host, choose a FQDN on the Hosts page. For example, choose c6403.ambari.apache.org in the default example shown. Summary-Components lists all components installed on that host.

Choose options in Host Actions to start, stop, restart, delete, or turn on maintenance mode for all components installed on the selected host.
Alternatively, choose action options from the drop-down menu next to an individual component on a host. The drop-down menu shows current operation status for each component. For example, you can decommission, restart, or stop the DataNode component (started) for HDFS, by selecting one of the options shown in the following example:

Decommissioning Masters and Slaves
Decommissioning is a process that supports removing a component from the cluster. You must decommission a master or slave running on a host before removing the component or host from service. Decommissioning helps prevent potential loss of data or service disruption. Decommissioning is available for the following component types:
-
DataNodes
-
NodeManagers
-
RegionServers
Decommissioning executes the following tasks:
-
For DataNodes, safely replicates the HDFS data to other DataNodes in the cluster.
-
For NodeManagers, stops accepting new job requests from the masters and stops the component.
-
For RegionServers, turns on drain mode and stops the component.
How to Decommission a Component
To decommission a component using Ambari Web, browse Hosts to find the host FQDN on which the component resides.
Using Actions, select Hosts > Component Type, then choose Decommission.
For example:

The UI shows "Decommissioning" status while steps process, then "Decommissioned" when complete.

How to Delete a Component
To delete a component using Ambari Web, on Hosts choose the host FQDN on which the component resides.
-
In Components, find a decommissioned component.
-
Stop the component, if necessary.
-
For a decommissioned component, choose Delete from the component drop-down menu.
Deleting a slave component, such as a DataNode, does not automatically inform a master component, such as a NameNode, to remove the slave component from its exclusion list. As a result, adding a deleted slave component back into the cluster presents the following issue: the added slave remains decommissioned from the master's perspective. As a work-around, restart the master component.
Deleting a Host from a Cluster
Deleting a host removes the host from the cluster. Before deleting a host, you must complete the following prerequisites:
-
Stop all components running on the host.
-
Decommission any DataNodes running on the host.
-
Move from the host any master components, such as NameNode or ResourceManager, running on the host.
-
Turn Off Maintenance Mode, if necessary, for the host.
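The prerequisites above can be read as a checklist that must be empty before the host is deleted. The sketch below models that checklist under stated assumptions: the `Component` and `Host` classes, their fields, and the component name "DATANODE" are hypothetical stand-ins for illustration, not Ambari's own data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str
    running: bool = False
    is_master: bool = False
    decommissioned: bool = False


@dataclass
class Host:
    fqdn: str
    maintenance_mode: bool = False
    components: List[Component] = field(default_factory=list)


def unmet_delete_prerequisites(host):
    """List the prerequisite steps still outstanding before deleting a host."""
    todo = []
    for comp in host.components:
        if comp.running:
            todo.append(f"stop {comp.name}")
        if comp.name == "DATANODE" and not comp.decommissioned:
            todo.append(f"decommission {comp.name}")
        if comp.is_master:
            todo.append(f"move master {comp.name} to another host")
    if host.maintenance_mode:
        todo.append("turn off Maintenance Mode")
    return todo
```

An empty list means every prerequisite in this simplified model is satisfied; any remaining entry corresponds to one of the steps listed above.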
How to Delete a Host from a Cluster
-
In Hosts, click on a host name.
-
On the Host-Details page, select the Host Actions drop-down menu.
-
Choose Delete.
If you have not completed prerequisite steps, a warning message similar to the following one appears:

Setting Maintenance Mode
Maintenance Mode supports suppressing alerts and skipping bulk operations for specific services, components and hosts in an Ambari-managed cluster. You typically turn on Maintenance Mode when performing hardware or software maintenance, changing configuration settings, troubleshooting, decommissioning, or removing cluster nodes. You may place a service, component, or host object in Maintenance Mode before you perform necessary maintenance or troubleshooting tasks.
Maintenance Mode affects a service, component, or host object in the following two ways:
-
Maintenance Mode suppresses alerts, warnings and status change indicators generated for the object
-
Maintenance Mode exempts an object from host-level or service-level bulk operations
Explicitly turning on Maintenance Mode for a service implicitly turns on Maintenance Mode for components and hosts that run the service. While Maintenance Mode On prevents bulk operations being performed on the service, component, or host, you may explicitly start and stop a service, component, or host having Maintenance Mode On.
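One way to picture the explicit and implicit behavior is as a lookup across three sets: services, hosts, and individual components that have Maintenance Mode turned on explicitly. The following is a simplified sketch of that idea; the `HostComponent` class and the function names are hypothetical and are not part of Ambari.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostComponent:
    name: str
    service: str
    host: str


def in_maintenance(component, mm_services, mm_hosts, mm_components):
    """True if Maintenance Mode applies explicitly or implicitly to the component."""
    return (
        component in mm_components           # turned on for the component itself
        or component.host in mm_hosts        # implied by its host
        or component.service in mm_services  # implied by its service
    )


def bulk_operation_targets(components, mm_services, mm_hosts, mm_components):
    """Components a bulk operation would act on; those in Maintenance Mode are skipped."""
    return [
        c for c in components
        if not in_maintenance(c, mm_services, mm_hosts, mm_components)
    ]
```

An explicit start or stop of a single component is not modeled here, because Maintenance Mode exempts objects only from bulk operations.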
Setting Maintenance Mode for Services, Components, and Hosts
For example, examine using Maintenance Mode in a 3-node, Ambari-managed cluster installed using default options. This cluster has one DataNode, on host c6403. This example describes how to explicitly turn on Maintenance Mode for the HDFS service, alternative procedures for explicitly turning on Maintenance Mode for a host, and the implicit effects of turning on Maintenance Mode for a service, a component, and a host.
How to Turn On Maintenance Mode for a Service
-
Using Services, select HDFS.
-
Select Service Actions, then choose Turn On Maintenance Mode.
-
Choose OK to confirm.
Notice, on Services Summary, that Maintenance Mode turns on for the NameNode and SNameNode components.
How to Turn On Maintenance Mode for a Host
-
Using Hosts, select c6401.ambari.apache.org.
-
Select Host Actions, then choose Turn On Maintenance Mode.
-
Choose OK to confirm.
Notice, on Components, that Maintenance Mode turns on for all components.
How to Turn On Maintenance Mode for a Host (alternative using filtering for hosts)
-
Using Hosts, select c6403.ambari.apache.org.
-
In Actions > Selected Hosts > Hosts, choose Turn On Maintenance Mode.
-
Choose OK to confirm.
Notice that Maintenance Mode turns on for host c6403.ambari.apache.org.
Your list of Hosts now shows Maintenance Mode On for hosts c6401 and c6403.

-
Hover your cursor over each Maintenance Mode icon appearing in the Hosts list.
-
Notice that hosts c6401 and c6403 have Maintenance Mode On.
-
Notice that on host c6401, HBaseMaster, HDFS client, NameNode, and ZooKeeper Server have Maintenance Mode On.
-
Notice on host c6402, that HDFS client and Secondary NameNode have Maintenance Mode On.
-
Notice on host c6403, that 15 components have Maintenance Mode On.
-
The following behavior also results:
-
Alerts are suppressed for the DataNode.
-
DataNode is skipped from HDFS Start/Stop/Restart All, Rolling Restart.
-
DataNode is skipped from all Bulk Operations except Turn Maintenance Mode ON/OFF.
-
DataNode is skipped from Start All and Stop All components.
-
DataNode is skipped from a host-level restart/restart all/stop all/start.
Maintenance Mode Use Cases
Four common Maintenance Mode Use Cases follow:
-
You want to perform hardware, firmware, or OS maintenance on a host.
You want to:
-
Prevent alerts generated by all components on this host.
-
Be able to stop, start, and restart each component on the host.
-
Prevent host-level or service-level bulk operations from starting, stopping, or restarting components on this host.
To achieve these goals, turn On Maintenance Mode explicitly for the host. Putting a host in Maintenance Mode implicitly puts all components on that host in Maintenance Mode.
-
You want to test a service configuration change. You will stop, start, and restart the service using a rolling restart to test whether restarting picks up the change.
You want:
-
No alerts generated by any components in this service.
-
To prevent host-level or service-level bulk operations from starting, stopping, or restarting components in this service.
To achieve these goals, turn on Maintenance Mode explicitly for the service. Putting a service in Maintenance Mode implicitly turns on Maintenance Mode for all components in the service.
-
You turn off a service completely.
You want:
-
The service to generate no warnings.
-
To ensure that no components start, stop, or restart due to host-level actions or bulk operations.
To achieve these goals, turn On Maintenance Mode explicitly for the service. Putting a service in Maintenance Mode implicitly turns on Maintenance Mode for all components in the service.
-
A host component is generating alerts.
You want to:
-
Check the component.
-
Assess warnings and alerts generated for the component.
-
Prevent alerts generated by the component while you check its condition.
To achieve these goals, turn on Maintenance Mode explicitly for the host component. Putting a host component in Maintenance Mode prevents host-level and service-level bulk operations from starting or restarting the component. You can restart the component explicitly while Maintenance Mode is on.
Adding Hosts to a Cluster
To add new hosts to your cluster, browse to the Hosts page and select Actions > +Add New Hosts. The Add Host Wizard provides a sequence of prompts similar to those in the Ambari Install Wizard. Follow the prompts, providing information similar to that provided to define the first set of hosts in your cluster.

Managing Services
Use Services to monitor and manage selected services running in your Hadoop cluster.
All services installed in your cluster are listed in the leftmost Services panel.

Services supports the following tasks:
Starting and Stopping All Services
To start or stop all listed services at once, select Actions, then choose Start All or Stop All, as shown in the following example:

Selecting a Service
Selecting a service name from the list shows current summary, alert, and health information for the selected service. To refresh the monitoring panels and show information about a different service, select a different service name from the list.
Notice the colored dot next to each service name, indicating service operating status and a small, red, numbered rectangle indicating any alerts generated for the service.
Adding a Service
The Ambari install wizard installs all available Hadoop services by default. You may choose to deploy only some services initially, then add other services at later times. For example, many customers deploy only core Hadoop services initially. Add Service supports deploying additional services without interrupting operations in your Hadoop cluster. When you have deployed all available services, Add Service appears disabled.
For example, if you are using HDP 2.2 Stack and did not install Falcon or Storm, you can use the Add Service capability to add those services to your cluster.
To add a service, select Actions > Add Service, then complete the following procedure using the Add Service Wizard.
Adding a Service to your Hadoop cluster
This example shows the Falcon service selected for addition.
-
Choose Services.
Choose an available service. Alternatively, choose all to add all available services to your cluster. Then, choose Next. The Add Service Wizard displays installed services highlighted green and check-marked, not available for selection.
For more information about installing Ranger, see Installing Ranger.
For more information about installing Spark, see Installing Spark.
-
In Assign Masters, confirm the default host assignment. Alternatively, choose a different host machine to which master components for your selected service will be added. Then, choose Next.
The Add Service Wizard indicates hosts on which the master components for a chosen service will be installed. A service chosen for addition shows a grey check mark.
Using the drop-down, choose an alternate host name, if necessary.
-
A green label located on the host to which its master components will be added, or
-
An active drop-down list on which available host names appear.
-
In Assign Slaves and Clients, accept the default assignment of slave and client components to hosts. Then, choose Next.
Alternatively, select hosts on which you want to install slave and client components. You must select at least one host for the slave of each service being added.
Host Roles Required for Added Services
Service Added | Host Role Required |
---|---|
YARN | NodeManager |
HBase | RegionServer |
The Add Service Wizard skips and disables the Assign Slaves and Clients step for a service requiring neither slave nor client assignment.
-
In Customize Services, accept the default configuration properties.
Alternatively, edit the default values for configuration properties, if necessary. Choose Override to create a configuration group for this service. Then, choose Next.
-
In Review, make sure the configuration settings match your intentions. Then, choose Deploy.
-
Monitor the progress of installing, starting, and testing the service. When the service installs and starts successfully, choose Next.
-
Summary displays the results of installing the service. Choose Complete.
-
Restart any other components having stale configurations.
Editing Service Config Properties
Select a service, then select Configs to view and update configuration properties for the selected service. For example, select MapReduce2, then select Configs. Expand a config category to view configurable service properties. For example, select General to configure Default virtual memory for a job's map task.

Viewing Summary, Alert, and Health Information
After you select a service, the Summary tab displays basic information about the selected service.

Select one of the View Host links, as shown in the following example, to view components and the host on which the selected service is running.

Alerts and Health Checks
On each Service page, in the Summary area, click Alerts to see a list of all health checks and their status for the selected service. Critical alerts are shown first. Click the text title of each alert message in the list to see the alert definition. For example, on Services > HBase, click Alerts. Then, in Alerts for HBase, click HBase Master Process.

Analyzing Service Metrics
Review visualizations in Metrics that chart common metrics for a selected service. Services > Summary displays metrics widgets for HDFS, HBase, and Storm services. For more information about using metrics widgets, see Scanning System Metrics.
Performing Service Actions
Manage a selected service on your cluster by performing service actions. In Services, select the Service Actions drop-down menu, then choose an option. Available options depend on the service you have selected. For example, HDFS service action options include:

Optionally, choose Turn On Maintenance Mode to suppress alerts generated by a service before performing a service action. Maintenance Mode suppresses alerts and status indicator changes generated by the service, while allowing you to start, stop, restart, move, or perform maintenance tasks on the service. For more information about how Maintenance Mode affects bulk operations for host components, see Setting Maintenance Mode.
Monitoring Background Operations
Optionally, use Background Operations to monitor progress and completion of bulk operations such as rolling restarts.
Background Operations opens by default when you run a job that executes bulk operations.
-
Select the right-arrow for each operation to show restart operation progress on each host.
-
After restarts complete, select the right-arrow, or a host name, to view log files and any error messages generated on the selected host.
-
Select links at the upper-right to copy or open text files containing log and error information.
Optionally, select the option to not show the bulk operations dialog.
Using Quick Links
Select Quick Links options to access additional sources of information about a selected service. For example, HDFS Quick Links options include the native NameNode GUI, NameNode logs, the NameNode JMX output, and thread stacks for the HDFS service. Quick Links are not available for every service.

Rolling Restarts
When you restart multiple services, components, or hosts, use rolling restarts to distribute the task, minimizing cluster downtime and service disruption. A rolling restart stops, then starts, multiple running slave components such as DataNodes, NodeManagers, RegionServers, or Supervisors, using a batch sequence. You set rolling restart parameter values to control the number of restarts, the time between them, the tolerance for failures, and the limits for restarts of many components across large clusters.
To run a rolling restart:
-
Select a Service, then link to a list of specific components or hosts that Require Restart.
-
Select Restart, then choose a slave component option.
-
Review and set values for Rolling Restart Parameters.
-
Optionally, reset the flag to only restart components with changed configurations.
-
Choose Trigger Restart.
Use Monitoring Background Operations to monitor progress of rolling restarts.
Setting Rolling Restart Parameters
When you choose to restart slave components, use parameters to control how restarts of components roll. Parameter values based on ten percent of the total number of components in your cluster are set as default values. For example, default settings for a rolling restart of components in a 3-node cluster restart one component at a time, wait two minutes between restarts, proceed if only one failure occurs, and restart all existing components that run this service.
If you trigger a rolling restart of components, Restart components with stale configs defaults to true. If you trigger a rolling restart of services, Restart services with stale configs defaults to false.
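The interaction between batch size, wait time, and failure tolerance can be sketched as a simple loop. This is an illustrative model only, not Ambari's implementation: the function `rolling_restart`, the `restart` callback, and the injectable `sleep` parameter are all hypothetical names introduced for the example.

```python
import time
from typing import Callable, List, Sequence, Tuple


def rolling_restart(
    components: Sequence[str],
    restart: Callable[[str], bool],
    batch_size: int,
    wait_seconds: int,
    tolerate_failures: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], int]:
    """Restart components batch by batch; halt once failures exceed the tolerance."""
    restarted: List[str] = []
    failures = 0
    for start in range(0, len(components), batch_size):
        for name in components[start:start + batch_size]:
            if restart(name):  # the callback returns True on success
                restarted.append(name)
            else:
                failures += 1
        if failures > tolerate_failures:
            break  # stop queuing further batches
        if start + batch_size < len(components):
            sleep(wait_seconds)  # pause before the next batch
    return restarted, failures
```

With a batch size of 1, a wait time of 120 seconds, and a tolerance of 1, this model continues after the first failed restart and halts after the second.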

Rolling restart parameter values must satisfy the following criteria:
Validation Rules for Rolling Restart Parameters
Parameter | Required | Value | Description |
---|---|---|---|
Batch Size | Yes | Must be an integer > 0 | Number of components to include in each restart batch. |
Wait Time | Yes | Must be an integer >= 0 | Time (in seconds) to wait between queuing each batch of components. |
Tolerate up to x failures | Yes | Must be an integer >= 0 | Total number of restart failures to tolerate, across all batches, before halting the restarts and not queuing batches. |
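The validation rules in the table can be sketched as a small shell check. This is an illustration only, not Ambari's own validation code; the function names validate_rolling_restart and is_non_negative_integer are hypothetical.

```shell
# Hypothetical helper: succeeds when the argument consists of digits only.
is_non_negative_integer() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: checks values against the rules in the table.
# Usage: validate_rolling_restart <batch_size> <wait_time_seconds> <tolerated_failures>
validate_rolling_restart() {
  batch_size=$1
  wait_time=$2
  tolerated_failures=$3

  # All three parameters are required and must be integers >= 0.
  for value in "$batch_size" "$wait_time" "$tolerated_failures"; do
    if ! is_non_negative_integer "$value"; then
      echo "invalid: '$value' is not a non-negative integer"
      return 1
    fi
  done

  # Batch Size must additionally be strictly greater than zero.
  if [ "$batch_size" -eq 0 ]; then
    echo "invalid: Batch Size must be greater than 0"
    return 1
  fi

  echo "valid"
}
```

For example, validate_rolling_restart 1 120 1 corresponds to the 3-node defaults described earlier: one component per batch, a two-minute wait, and one tolerated failure.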
Aborting a Rolling Restart
To abort future restart operations in the batch, choose Abort Rolling Restart.

Refreshing YARN Capacity Scheduler
After you modify the Capacity Scheduler configuration, YARN supports refreshing the queues without requiring you to restart your ResourceManager. The “refresh” operation is valid if you have made no destructive changes to your configuration. Removing a queue is an example of a destructive change.
How to refresh the YARN Capacity Scheduler
This topic describes how to refresh the Capacity Scheduler in cases where you have added or modified existing queues.
-
In Ambari Web, browse to Services > YARN > Summary.
-
Select Service Actions, then choose Refresh YARN Capacity Scheduler.
-
Confirm you would like to perform this operation.
The refresh operation is submitted to the YARN ResourceManager.
Rebalancing HDFS
HDFS provides a “balancer” utility to help balance the blocks across DataNodes in the cluster.
How to rebalance HDFS
This topic describes how you can initiate an HDFS rebalance from Ambari.
-
In Ambari Web, browse to Services > HDFS > Summary.
-
Select Service Actions, then choose Rebalance HDFS.
-
Enter the Balance Threshold value as a percentage of disk capacity.
-
Click Start to begin the rebalance.
-
You can check rebalance progress or cancel a rebalance in process by opening the Background Operations dialog.
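As the HDFS balancer is commonly documented, a DataNode counts as balanced when its utilization differs from the overall cluster utilization by no more than the threshold. The sketch below illustrates that comparison with whole-number percentages. The helper is_within_threshold is hypothetical and is not part of Ambari or HDFS; the real balancer works with more precise figures.

```shell
# Hypothetical helper: succeeds when a DataNode's used-space percentage is within
# the threshold of the cluster-wide used-space percentage.
# Usage: is_within_threshold <node_used_pct> <cluster_used_pct> <threshold_pct>
is_within_threshold() {
  node_used_pct=$1
  cluster_used_pct=$2
  threshold_pct=$3

  # Absolute difference between node and cluster utilization.
  diff=$((node_used_pct - cluster_used_pct))
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - diff))
  fi

  [ "$diff" -le "$threshold_pct" ]
}
```

With a threshold of 10 and a cluster that is 50 percent full, a node at 55 percent would count as balanced, while a node at 75 percent would not.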
Managing Service High Availability
Ambari provides the ability to configure the High Availability features available with the HDP Stack services. This section describes how to enable HA for the various Stack services.
NameNode High Availability
To ensure that a NameNode in your cluster is always available if the primary NameNode
host fails, enable and set up NameNode High Availability on your cluster using Ambari
Web.
Follow the steps in the Enable NameNode HA Wizard.
For more information about using the Enable NameNode HA Wizard, see How to Configure NameNode High Availability.
How To Configure NameNode High Availability
-
Check to make sure you have at least three hosts in your cluster and are running at least three ZooKeeper servers.
-
In Ambari Web, select
Services > HDFS > Summary.
-
Select Service Actions and choose Enable NameNode HA.
-
The Enable HA Wizard launches. This wizard describes the set of automated and manual steps you must take to set up NameNode high availability.
-
Get Started : This step gives you an overview of the process and allows you to select a Nameservice ID. You use this Nameservice ID instead of the NameNode FQDN once HA has been set up. Click Next to proceed.
-
Select Hosts : Select a host for the additional NameNode and the JournalNodes. The wizard suggests options that you can adjust using the drop-down lists. Click Next to proceed.
-
Review : Confirm your host selections and click Next.
-
Create Checkpoints : Follow the instructions in the step. You need to log in to your current NameNode host to run the commands to put your NameNode into safe mode and create a checkpoint. When Ambari detects success, the message on the bottom of the window changes. Click Next.
-
Configure Components : The wizard configures your components, displaying progress bars to let you track the steps. Click Next to continue.
-
Initialize JournalNodes : Follow the instructions in the step. You need to log in to your current NameNode host to run the command to initialize the JournalNodes. When Ambari detects success, the message on the bottom of the window changes. Click Next.
-
Start Components : The wizard starts the ZooKeeper servers and the NameNode, displaying progress bars to let you track the steps. Click Next to continue.
-
Initialize Metadata : Follow the instructions in the step. For this step you must log in to both the current NameNode and the additional NameNode. Make sure you are logged in to the correct host for each command. Click Next when you have completed the two commands. A Confirmation pop-up window displays, reminding you to do both steps. Click OK to confirm.
-
Finalize HA Setup : The wizard finalizes the setup, displaying progress bars to let you track the steps. Click Done to finish the wizard. After the Ambari Web GUI reloads, you may see some alert notifications. Wait a few minutes until the services come back up. If necessary, restart any components using Ambari Web.
-
If you are using Hive, you must manually change the Hive Metastore FS root to point to the Nameservice URI instead of the NameNode URI. You created the Nameservice ID in the Get Started step.
-
Check the current FS root. On the Hive host:
hive --config /etc/hive/conf.server --service metatool -listFSRoot
The output looks similar to the following:
Listing FS Roots...
hdfs://<namenode-host>/apps/hive/warehouse
-
Use this command to change the FS root:
$ hive --config /etc/hive/conf.server --service metatool -updateLocation <new-location> <old-location>
For example, where the Nameservice ID is mycluster:
$ hive --config /etc/hive/conf.server --service metatool -updateLocation hdfs://mycluster/apps/hive/warehouse hdfs://c6401.ambari.apache.org/apps/hive/warehouse
The output looks similar to the following:
Successfully updated the following locations...
Updated X records in SDS table
-
Adjust the ZooKeeper Failover Controller retries setting for your environment.
-
Browse to Services > HDFS > Configs > core-site.
-
Set ha.failover-controller.active-standby-elector.zk.op.retries=120
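For the Hive Metastore step above, the new location is the old FS root with the NameNode host replaced by the Nameservice ID. A minimal sketch of that substitution follows; to_nameservice_uri is a hypothetical helper, not part of Hive or Ambari, and it assumes the old location starts with hdfs://.

```shell
# Hypothetical helper: swaps the authority (host and optional port) of an hdfs:// URI
# for a Nameservice ID and leaves the path untouched.
# Usage: to_nameservice_uri <old-location> <nameservice-id>
to_nameservice_uri() {
  old_location=$1
  nameservice_id=$2
  printf '%s\n' "$old_location" | sed "s#^hdfs://[^/]*#hdfs://${nameservice_id}#"
}
```

The printed value is what you would pass as <new-location> to the metatool -updateLocation command, with the original FS root as <old-location>.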
How to Roll Back NameNode HA
To roll back NameNode HA to the previous non-HA state use the following step-by-step manual process, depending on your installation.
Stop HBase
-
From Ambari Web, go to the Services view and select HBase.
-
Choose Service Actions > Stop.
-
Wait until HBase has stopped completely before continuing.
Checkpoint the Active NameNode
If HDFS has been in use after you enabled NameNode HA, but you wish to revert to a non-HA state, you must checkpoint the HDFS state before proceeding with the rollback.
If the Enable NameNode HA wizard failed and you need to revert, you can skip this step and move on to Stop All Services.
-
If Kerberos security has not been enabled on the cluster:
On the Active NameNode host, execute the following commands to save the namespace. You must be the HDFS service user to do this.
sudo su -l <HDFS_USER> -c 'hdfs dfsadmin -safemode enter'
sudo su -l <HDFS_USER> -c 'hdfs dfsadmin -saveNamespace'
-
If Kerberos security has been enabled on the cluster:
sudo su -l <HDFS_USER> -c 'kinit -kt /etc/security/keytabs/nn.service.keytab nn/<HOSTNAME>@<REALM>;hdfs dfsadmin -safemode enter'
sudo su -l <HDFS_USER> -c 'kinit -kt /etc/security/keytabs/nn.service.keytab nn/<HOSTNAME>@<REALM>;hdfs dfsadmin -saveNamespace'
Where <HDFS_USER> is the HDFS service user (for example, hdfs), <HOSTNAME> is the Active NameNode hostname, and <REALM> is your Kerberos realm.
Stop All Services
Browse to Ambari Web > Services, then choose Stop All in the Services navigation panel. You must wait until all the services are completely stopped.
Prepare the Ambari Server Host for Rollback
Log into the Ambari server host and set the following environment variables to prepare for the rollback procedure:
Variable | Value |
---|---|
export AMBARI_USER=AMBARI_USERNAME | Substitute the value of the administrative user for Ambari Web. The default value is admin. |
export AMBARI_PW=AMBARI_PASSWORD | Substitute the value of the administrative password for Ambari Web. The default value is admin. |
export AMBARI_PORT=AMBARI_PORT | Substitute the Ambari Web port. The default value is 8080. |
export AMBARI_PROTO=AMBARI_PROTOCOL | Substitute the value of the protocol for connecting to Ambari Web. Options are http or https. The default value is http. |
export CLUSTER_NAME=CLUSTER_NAME | Substitute the name of your cluster, set during the Ambari Install Wizard process. For example: mycluster. |
export NAMENODE_HOSTNAME=NN_HOSTNAME | Substitute the FQDN of the host for the non-HA NameNode. For example: nn01.mycompany.com. |
export ADDITIONAL_NAMENODE_HOSTNAME=ANN_HOSTNAME | Substitute the FQDN of the host for the additional NameNode in your HA setup. |
export SECONDARY_NAMENODE_HOSTNAME=SNN_HOSTNAME | Substitute the FQDN of the host for the standby NameNode for the non-HA setup. |
export JOURNALNODE1_HOSTNAME=JOUR1_HOSTNAME | Substitute the FQDN of the host for the first Journal Node. |
export JOURNALNODE2_HOSTNAME=JOUR2_HOSTNAME | Substitute the FQDN of the host for the second Journal Node. |
export JOURNALNODE3_HOSTNAME=JOUR3_HOSTNAME | Substitute the FQDN of the host for the third Journal Node. |
Double check that these environment variables are set correctly.
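One way to double check the variables is a short loop that reports any that are unset or empty. This is a sketch only; check_rollback_env is a hypothetical helper, and it verifies presence, not whether the values are correct for your cluster.

```shell
# Hypothetical helper: prints the name of every rollback variable that is unset or
# empty. Returns 0 when all variables are set, 1 otherwise.
check_rollback_env() {
  missing=0
  for name in AMBARI_USER AMBARI_PW AMBARI_PORT AMBARI_PROTO CLUSTER_NAME \
              NAMENODE_HOSTNAME ADDITIONAL_NAMENODE_HOSTNAME SECONDARY_NAMENODE_HOSTNAME \
              JOURNALNODE1_HOSTNAME JOURNALNODE2_HOSTNAME JOURNALNODE3_HOSTNAME; do
    # Indirect lookup of the variable called $name; the names are fixed above.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_rollback_env in the same shell session where you exported the variables prints nothing when every variable has a value.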
Restore the HBase Configuration
If you have installed HBase, you may need to restore a configuration to its pre-HA state.
-
To check if your current HBase configuration needs to be restored, on the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> hbase-site
Where the environment variables you set up in Prepare the Ambari Server Host for Rollback substitute for the variable names.
Look for the configuration property hbase.rootdir. If the value is set to the NameService ID you set up using the Enable NameNode HA wizard, you need to revert the hbase-site configuration back to non-HA values. If it points instead to a specific NameNode host, it does not need to be rolled back and you can go on to Delete ZooKeeper Failover Controllers.
For example:
"hbase.rootdir":"hdfs://<name-service-id>:8020/apps/hbase/data"
The hbase.rootdir property points to the NameService ID and the value needs to be rolled back.
"hbase.rootdir":"hdfs://<nn01.mycompany.com>:8020/apps/hbase/data"
The hbase.rootdir property points to a specific NameNode host and not a NameService ID. This does not need to be rolled back.
-
If you need to roll back the hbase.rootdir value, on the Ambari Server host, use the configs.sh script to make the necessary change:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> set localhost <CLUSTER_NAME> hbase-site hbase.rootdir hdfs://<NAMENODE_HOSTNAME>:8020/apps/hbase/data
Where the environment variables you set up in Prepare the Ambari Server Host for Rollback substitute for the variable names.
-
Verify that the hbase.rootdir property has been restored properly. On the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> hbase-site
The hbase.rootdir property should now be set to the NameNode hostname, not the NameService ID.
Delete ZooKeeper Failover Controllers
You may need to delete ZooKeeper (ZK) Failover Controllers.
-
To check if you need to delete ZK Failover Controllers, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=ZKFC
If this returns an empty items array, you may proceed to Modify HDFS Configurations. Otherwise you must use the following DELETE commands:
-
To delete all ZK Failover Controllers, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X DELETE <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<NAMENODE_HOSTNAME>/host_components/ZKFC
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X DELETE <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<ADDITIONAL_NAMENODE_HOSTNAME>/host_components/ZKFC
-
Verify that the ZK Failover Controllers have been deleted. On the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=ZKFC
This command should return an empty items array.
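The check and verify commands in this procedure all query the same host_components resource and differ only in the component name. A small helper can build that URL from the exported variables. This is a sketch; host_components_url is a hypothetical name, and it assumes AMBARI_PROTO, AMBARI_PORT, and CLUSTER_NAME are set as described in Prepare the Ambari Server Host for Rollback.

```shell
# Hypothetical helper: prints the host_components query URL for one component name.
# Reads AMBARI_PROTO, AMBARI_PORT, and CLUSTER_NAME from the environment.
# Usage: host_components_url <COMPONENT_NAME>
host_components_url() {
  component_name=$1
  printf '%s://localhost:%s/api/v1/clusters/%s/host_components?HostRoles/component_name=%s\n' \
    "$AMBARI_PROTO" "$AMBARI_PORT" "$CLUSTER_NAME" "$component_name"
}
```

With the variables exported, the ZKFC check could then be written as shown here. Quoting the URL keeps the shell from treating the ? character as a wildcard.
curl -u "$AMBARI_USER:$AMBARI_PW" -H "X-Requested-By: ambari" -i "$(host_components_url ZKFC)"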
Modify HDFS Configurations
You may need to modify your hdfs-site configuration and/or your core-site configuration.
-
To check if you need to modify your hdfs-site configuration, on the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> hdfs-site
If you see any of the following properties, you must delete them from your configuration.
-
dfs.nameservices
-
dfs.client.failover.proxy.provider.<NAMESERVICE_ID>
-
dfs.ha.namenodes.<NAMESERVICE_ID>
-
dfs.ha.fencing.methods
-
dfs.ha.automatic-failover.enabled
-
dfs.namenode.http-address.<NAMESERVICE_ID>.nn1
-
dfs.namenode.http-address.<NAMESERVICE_ID>.nn2
-
dfs.namenode.rpc-address.<NAMESERVICE_ID>.nn1
-
dfs.namenode.rpc-address.<NAMESERVICE_ID>.nn2
-
dfs.namenode.shared.edits.dir
-
dfs.journalnode.edits.dir
-
dfs.journalnode.http-address
-
dfs.journalnode.kerberos.internal.spnego.principal
-
dfs.journalnode.kerberos.principal
-
dfs.journalnode.keytab.file
Where <NAMESERVICE_ID> is the NameService ID you created when you ran the Enable NameNode HA wizard.
-
To delete these properties, execute the following for each property you found. On the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> delete localhost <CLUSTER_NAME> hdfs-site property_name
Replace property_name with the name of each of the properties to be deleted.
-
Verify that all of the properties have been deleted. On the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> hdfs-site
None of the properties listed above should be present.
-
To check if you need to modify your core-site configuration, on the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> core-site
-
If you see the property ha.zookeeper.quorum, it must be deleted. On the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> delete localhost <CLUSTER_NAME> core-site ha.zookeeper.quorum
-
If the property fs.defaultFS is set to the NameService ID, it must be reverted back to its non-HA value. For example:
"fs.defaultFS":"hdfs://<name-service-id>"
The property fs.defaultFS needs to be modified as it points to a NameService ID.
"fs.defaultFS":"hdfs://<nn01.mycompany.com>"
The property fs.defaultFS does not need to be changed as it points to a specific NameNode, not to a NameService ID.
-
To revert the property fs.defaultFS to the NameNode host value, on the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> set localhost <CLUSTER_NAME> core-site fs.defaultFS hdfs://<NAMENODE_HOSTNAME>
-
Verify that the core-site properties are now properly set. On the Ambari Server host:
/var/lib/ambari-server/resources/scripts/configs.sh -u <AMBARI_USER> -p <AMBARI_PW> -port <AMBARI_PORT> get localhost <CLUSTER_NAME> core-site
The property fs.defaultFS should be set to point to the NameNode host and the property ha.zookeeper.quorum should not be there.
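The hdfs-site cleanup earlier in this section repeats one configs.sh delete command per property. The sketch below prints those commands for review instead of running them, so you can delete only the properties that are actually present. The function name print_hdfs_ha_delete_commands is hypothetical; the property names are the ones listed above, and the printed commands refer to the environment variables from Prepare the Ambari Server Host for Rollback.

```shell
# Hypothetical helper: prints one configs.sh delete command per HA-related hdfs-site
# property. Nothing is executed. The argument is the NameService ID.
# Usage: print_hdfs_ha_delete_commands <NAMESERVICE_ID>
print_hdfs_ha_delete_commands() {
  ns=$1
  script=/var/lib/ambari-server/resources/scripts/configs.sh
  for property in \
    dfs.nameservices \
    "dfs.client.failover.proxy.provider.${ns}" \
    "dfs.ha.namenodes.${ns}" \
    dfs.ha.fencing.methods \
    dfs.ha.automatic-failover.enabled \
    "dfs.namenode.http-address.${ns}.nn1" \
    "dfs.namenode.http-address.${ns}.nn2" \
    "dfs.namenode.rpc-address.${ns}.nn1" \
    "dfs.namenode.rpc-address.${ns}.nn2" \
    dfs.namenode.shared.edits.dir \
    dfs.journalnode.edits.dir \
    dfs.journalnode.http-address \
    dfs.journalnode.kerberos.internal.spnego.principal \
    dfs.journalnode.kerberos.principal \
    dfs.journalnode.keytab.file
  do
    # The $AMBARI_* references are printed literally so the password is not echoed.
    printf '%s -u "$AMBARI_USER" -p "$AMBARI_PW" -port "$AMBARI_PORT" delete localhost "$CLUSTER_NAME" hdfs-site %s\n' \
      "$script" "$property"
  done
}
```

For example, print_hdfs_ha_delete_commands mycluster prints fifteen lines, one per property in the list.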
Recreate the Standby NameNode
You may need to recreate your standby NameNode.
-
To check to see if you need to recreate the standby NameNode, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=SECONDARY_NAMENODE
If this returns an empty items array, you must recreate your standby NameNode. Otherwise you can go on to Re-enable the Standby NameNode.
-
Recreate your standby NameNode. On the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X POST -d '{"host_components" : [{"HostRoles":{"component_name":"SECONDARY_NAMENODE"}}] }' <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts?Hosts/host_name=<SECONDARY_NAMENODE_HOSTNAME>
-
Verify that the standby NameNode now exists. On the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=SECONDARY_NAMENODE
This should return a non-empty items array containing the standby NameNode.
Re-enable the Standby NameNode
To re-enable the standby NameNode, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X PUT -d '{"RequestInfo":{"context":"Enable Secondary NameNode"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<SECONDARY_NAMENODE_HOSTNAME>/host_components/SECONDARY_NAMENODE
-
If this returns 200, go to Delete All JournalNodes.
-
If this returns 202, wait a few minutes and run the following command on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET "<AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=SECONDARY_NAMENODE&fields=HostRoles/state"
When "state" : "INSTALLED" is in the response, go on to the next step.
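Instead of re-running the state query by hand, the wait can be scripted. The sketch below only checks a response for the INSTALLED state; response_is_installed is a hypothetical helper, and the polling loop shown in the comments assumes a url variable holding the state query URL and an arbitrary 30-second interval.

```shell
# Hypothetical helper: reads an API response on stdin and succeeds when it reports
# the INSTALLED state. Whitespace around the colon is tolerated.
response_is_installed() {
  grep -Eq '"state"[[:space:]]*:[[:space:]]*"INSTALLED"'
}

# Example polling loop, commented out because it needs a live Ambari Server.
# "$url" is assumed to hold the state query URL shown above.
# until curl -s -u "$AMBARI_USER:$AMBARI_PW" -H "X-Requested-By: ambari" "$url" | response_is_installed; do
#   sleep 30
# done
```

The pattern matches only the exact value INSTALLED, so a response reporting another state does not end the wait.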
Delete All JournalNodes
You may need to delete any JournalNodes.
-
To check to see if you need to delete JournalNodes, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=JOURNALNODE
If this returns an empty items array, you can go on to Delete the Additional NameNode. Otherwise you must delete the JournalNodes.
-
To delete the JournalNodes, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X DELETE <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<JOURNALNODE1_HOSTNAME>/host_components/JOURNALNODE
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X DELETE <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<JOURNALNODE2_HOSTNAME>/host_components/JOURNALNODE
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X DELETE <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<JOURNALNODE3_HOSTNAME>/host_components/JOURNALNODE
-
Verify that all the JournalNodes have been deleted. On the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=JOURNALNODE
This should return an empty items array.
Delete the Additional NameNode
You may need to delete your Additional NameNode.
-
To check to see if you need to delete your Additional NameNode, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=NAMENODE
If the items array contains two NameNodes, the Additional NameNode must be deleted.
-
To delete the Additional NameNode that was set up for HA, on the Ambari Server host:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X DELETE <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/hosts/<ADDITIONAL_NAMENODE_HOSTNAME>/host_components/NAMENODE
-
Verify that the Additional NameNode has been deleted:
curl -u <AMBARI_USER>:<AMBARI_PW> -H "X-Requested-By: ambari" -i -X GET <AMBARI_PROTO>://localhost:<AMBARI_PORT>/api/v1/clusters/<CLUSTER_NAME>/host_components?HostRoles/component_name=NAMENODE
This should return an items array that shows only one NameNode.
Verify the HDFS Components
Make sure you have the correct components showing in HDFS.
-
Go to Ambari Web UI > Services, then select HDFS.
-
Check the Summary panel and make sure that the first three lines look like this:
-
NameNode
-
SNameNode
-
DataNodes
You should not see any line for JournalNodes.
Start HDFS
-
In the Ambari Web UI, select Service Actions, then choose Start.
Wait until the progress bar shows that the service has completely started and has passed the service checks.
If HDFS does not start, you may need to repeat the previous step.
-
To start all of the other services, select Actions > Start All in the Services navigation panel.
ResourceManager High Availability
The following topic explains How to Configure ResourceManager High Availability.
How to Configure ResourceManager High Availability
-
Check to make sure you have at least three hosts in your cluster and are running at least three ZooKeeper servers.
-
In Ambari Web, browse to Services > YARN > Summary. Select Service Actions and choose Enable ResourceManager HA.
-
The Enable ResourceManager HA Wizard launches. The wizard describes a set of automated and manual steps you must take to set up ResourceManager High Availability.
-
Get Started: This step gives you an overview of enabling ResourceManager HA. Click Next to proceed.
-
Select Host: The wizard shows you the host on which the current ResourceManager is installed and suggests a default host on which to install an additional ResourceManager. Accept the default selection, or choose an available host. Click Next to proceed.
-
Review Selections: The wizard shows you the host selections and configuration changes that will occur to enable ResourceManager HA. Expand YARN, if necessary, to review all the YARN configuration changes. Click Next to approve the changes and start automatically configuring ResourceManager HA.
-
Configure Components: The wizard configures your components automatically, displaying progress bars to let you track the steps. After all progress bars complete, click Complete to finish the wizard.
HBase High Availability
During the HBase service install, depending on your component assignment, Ambari installs and configures one HBase Master component and multiple RegionServer components. To set up high availability for the HBase service, you can run two or more HBase Master components by adding an HBase Master component. When two or more HBase Masters are running, HBase uses ZooKeeper to coordinate which Master is active.
Adding an HBase Master Component
-
In Ambari Web, browse to Services > HBase.
-
In Service Actions, select the + Add HBase Master option.
-
Choose the host to install the additional HBase Master, then choose Confirm Add.
Ambari installs the new HBase Master and reconfigures HBase to handle multiple Master instances.
Hive High Availability
The Hive service has multiple associated components. The primary Hive components are Hive Metastore and HiveServer2. To set up high availability for the Hive service, you can run two or more of each of those components.
Adding a Hive Metastore Component
-
In Ambari Web, browse to Services > Hive.
-
In Service Actions, select the + Add Hive Metastore option.
-
Choose the host to install the additional Hive Metastore, then choose Confirm Add.
-
Ambari installs the component and reconfigures Hive to handle multiple Hive Metastore instances.
Adding a HiveServer2 Component
-
In Ambari Web, browse to the host where you would like to install another HiveServer2.
-
On the Host page, choose +Add.
-
Select HiveServer2 from the list.
-
Ambari installs the new HiveServer2.
Ambari installs the component and reconfigures Hive to handle multiple Hive Metastore instances.
Oozie High Availability
To set up high availability for the Oozie service, you can run two or more instances of the Oozie Server component.
Adding an Oozie Server Component
-
In Ambari Web, browse to the host where you would like to install another Oozie Server.
-
On the Host page, click the “+Add” button.
-
Select “Oozie Server” from the list and Ambari will install the new Oozie Server.
-
After configuring your external Load Balancer, update the Oozie configuration.
-
Browse to Services > Oozie > Configs and in oozie-site add the following:
Property | Value |
---|---|
oozie.zookeeper.connection.string | List of ZooKeeper hosts with ports. For example: c6401.ambari.apache.org:2181,c6402.ambari.apache.org:2181,c6403.ambari.apache.org:2181 |
oozie.services.ext | org.apache.oozie.service.ZKLocksService,org.apache.oozie.service.ZKXLogStreamingService,org.apache.oozie.service.ZKJobsConcurrencyService |
oozie.base.url | http://<loadbalancer.hostname>:11000/oozie |
-
In oozie-env, uncomment the OOZIE_BASE_URL property and change its value to point to the Load Balancer. For example:
export OOZIE_BASE_URL="http://<loadbalancer.hostname>:11000/oozie"
-
Restart the Oozie service for the changes to take effect.
-
Update HDFS configs for the Oozie proxy user. Browse to Services > HDFS > Configs and in core-site update the hadoop.proxyuser.oozie.hosts property to include the newly added Oozie Server host. Hosts should be comma separated.
-
Restart all needed services.
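The oozie.zookeeper.connection.string value used in the steps above is a comma-separated list of host:port pairs. If you keep your ZooKeeper hostnames in a script, the value can be assembled as sketched below. The helper zk_connection_string is hypothetical, and the port is passed explicitly because your ZooKeeper client port may differ from the 2181 used in the example.

```shell
# Hypothetical helper: joins ZooKeeper hostnames into comma-separated host:port pairs.
# Usage: zk_connection_string <port> <host> [<host> ...]
zk_connection_string() {
  port=$1
  shift
  result=""
  for host in "$@"; do
    # Prefix a comma only when result already holds at least one entry.
    result="${result:+$result,}$host:$port"
  done
  printf '%s\n' "$result"
}
```

Calling zk_connection_string 2181 with the three example hosts reproduces the example value shown for the property.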
Managing Configurations
Use Ambari Web to manage your HDP component configurations. Select any of the following topics:
Configuring Services
Select a service, then select Configs
to view and update configuration properties for the selected service. For example,
select MapReduce2, then select Configs. Expand a config category to view configurable
service properties.
Updating Service Properties
-
Expand a configuration category.
-
Edit values for one or more properties that have the Override option.
Edited values, also called stale configs, show an Undo option.
-
Choose Save.
Restarting components
After editing and saving a service configuration, Restart indicates components that
you must restart.
Select the Components or Hosts links to view details about components or hosts requiring
a restart.
Then, choose an option appearing in Restart. For example, options to restart YARN
components include:

Using Host Config Groups
Ambari initially assigns all hosts in your cluster to one, default configuration group
for each service you install. For example, after deploying a three-node cluster with
default configuration settings, each host belongs to one configuration group that
has default configuration settings for the HDFS service. In Configs, select Manage Config Groups
to create new groups, re-assign hosts, and override default settings for host components
you assign to each group.

To create a Configuration Group:
-
Choose Add New Configuration Group.
-
Name and describe the group, then choose Save.
-
Select a Config Group, then choose Add Hosts to Config Group.
-
Select Components and choose from available Hosts to add hosts to the new group.
Select Configuration Group Hosts enforces host membership in each group, based on installed components for the selected service.
-
Choose OK.
-
In Manage Configuration Groups, choose Save.
To edit settings for a configuration group:
-
In Configs, choose a Group.
-
Select a Config Group, then expand components to expose settings that allow Override.
-
Provide a non-default value, then choose Override or Save.
Configuration groups enforce configuration properties that allow override, based on installed components for the selected service and group.
-
Override prompts you to choose one of the following options:
-
Select an existing configuration group (to which the property value override provided in step 3 will apply), or
-
Create a new configuration group (which will include default properties, plus the property override provided in step 3).
-
Then, choose OK.
-
In Configs, choose Save.
Customizing Log Settings
Ambari Web displays default logging properties in Service Configs > Custom log 4j Properties.
Log 4j properties control logging activities for the selected service.

Restarting components in the service pushes the configuration properties displayed in Custom log 4j Properties to each host running components for that service. If you have customized logging properties that define how activities for each service are logged, you will see refresh indicators next to each service name after upgrading to Ambari 1.5.0 or higher. Make sure that logging properties displayed in Custom log 4j Properties include any customization. Optionally, you can create configuration groups that include custom logging properties. For more information about saving and overriding configuration settings, see Editing Service Config Properties.
Downloading Client Configs
For Services that include client components (for example Hadoop Client or Hive Client), you can download the client configuration files associated with that client from Ambari.
-
In Ambari Web, browse to the Service with the client for which you want the configurations.
-
Choose Service Actions.
-
Choose Download Client Configs. You are prompted for a location to save the client configs bundle.
-
Save the bundle.
Service Configuration Versions
Ambari provides the ability to manage configurations associated with a Service. You can make changes to configurations, see a history of changes, compare and revert changes, and push configuration changes to the cluster hosts.
Basic Concepts
It’s important to understand how service configurations are organized and stored in
Ambari. Properties are grouped into Configuration Types (config types). A set of config
types makes up the set of configurations for a service.
For example, the HDFS Service includes the following config types: hdfs-site, core-site,
hdfs-log4j, hadoop-env, hadoop-policy. If you browse to Services > HDFS > Configs,
the configuration properties for these config types are available for edit.
Versioning of configurations is performed at the service-level. Therefore, when you
modify a configuration property in a service, Ambari will create a Service Config
Version. The figure below shows V1 and V2 of a Service Configuration Version with
a change to a property in Config Type A. After making the property change to Config
Type A in V1, V2 is created.

Terminology
The following table lists configuration versioning terms and concepts that you should know.
Term | Description |
---|---|
Configuration Property | Configuration property managed by Ambari, such as NameNode heapsize or replication factor. |
Configuration Type (Config Type) | Group of configuration properties. For example: hdfs-site is a Config Type. |
Service Configurations | Set of configuration types for a particular service. For example: hdfs-site and core-site Config Types are part of the HDFS Service Configuration. |
Change Notes | Optional notes to save with a service configuration change. |
Service Config Version (SCV) | Particular version of configurations for a specific service. Ambari saves a history of service configuration versions. |
Host Config Group (HCG) | Set of configuration properties to apply to a specific set of hosts. Each service has a default Host Config Group, and custom config groups can be created on top of the default configuration group to target property overrides to one or more hosts in the cluster. See Managing Configuration Groups for more information. |
Saving a Change
-
Make the configuration property change.
-
Choose Save.
-
You are prompted to enter notes that describe the change.
-
Click Save to confirm your change. Cancel does not save; instead, it returns you to the configuration page to continue editing.
To revert the changes you made and not save, choose Discard.
To return to the configuration page and continue editing without saving changes, choose Cancel.
Viewing History
Service Config Version history is available from Ambari Web in two places: On the
Dashboard page under the Config History tab; and on each Service page under the Configs
tab.
The Dashboard > Config History
tab shows a list of all versions across services with each version number and the
date and time the version was created. You can also see which user authored the change
with the notes entered during save. Using this table, you can filter, sort and search
across versions.

The most recent configuration changes are shown on the Service > Configs
tab. Users can navigate the version scrollbar left-right to see earlier versions.
This provides a quick way to access the most recent changes to a service configuration.

Click any version in the scrollbar to view it, and hover over a version to display an option menu that allows you to compare versions and perform a revert. Performing a revert makes the config version that you select the current version.

Comparing Versions
When navigating the version scroll area on the Services > Configs
tab, you can hover over a version to display options to view, compare or revert.

- To perform a compare between two service configuration versions:
-
Navigate to a specific configuration version. For example, “V6”.
-
Using the version scrollbar, find the version you would like to compare against “V6”. For example, if you want to compare V6 to V2, find V2 in the scrollbar.
-
Hover over the version to display the option menu. Click “Compare”.
-
Ambari displays a comparison of V6 to V2, with an option to revert to V2.
-
Ambari also filters the display to show only “Changed properties”. This option is available under the Filter control.

Reverting a Change
You can revert to an older service configuration version by using the “Make Current” feature. “Make Current” creates a new service configuration version with the configuration properties from the version you are reverting to; it is effectively a “clone”. After initiating the Make Current operation, you are prompted to enter notes for the new version (that is, the clone) and save. The notes include text about the version being cloned.

There are multiple methods to revert to a previous configuration version:
-
View a specific version and click the “Make V* Current” button.
-
Use the version navigation dropdown and click the “Make Current” button.
-
Hover on a version in the version scrollbar and click the “Make Current” button.
-
Perform a comparison and click the “Make V* Current” button.
Versioning and Host Config Groups
Service configuration versions are scoped to a host config group. For example, changes made in the default group can be compared and reverted in that config group. The same applies to custom config groups.
The following example describes a flow where you have multiple host config groups and create service configuration versions in each config group.





Administering the Cluster
From the cluster dashboard, use the Admin options to manage the Stack and Versions, view Service Accounts, and enable Kerberos security.
Managing Stack and Versions
The Stack section includes information about the Services installed and available in the cluster
Stack. Browse the list of Services and click Add Service to start the wizard to install Services into your cluster.
The Versions section shows what version of software is currently running and installed in the
cluster. This section also exposes the capability to perform an automated cluster
upgrade for maintenance and patch releases for the Stack. This capability is available
for HDP 2.2 Stack only. If you have a cluster running HDP 2.2, you can perform Stack
upgrades to later maintenance and patch releases. For example: you can upgrade from
the GA release of HDP 2.2 (which is HDP 2.2.0.0) to the first maintenance release
of HDP 2.2 (which is HDP 2.2.4.2).
The process for managing versions and performing an upgrade consists of three main steps:
-
Register a Version into Ambari
-
Install the Version into the Cluster
-
Perform Upgrade to the New Version
Register a Version
Ambari can manage multiple versions of Stack software.
To register a new version:
-
On the Versions tab, click Manage Versions.
-
Proceed to register a new version by clicking + Register Version.
-
Enter a two-digit version number. For example, enter 4.2 (which makes the version HDP-2.2.4.2).
-
Select one or more OS families and enter the respective Base URLs.
-
Click Save.
-
You can click “Install On...” or you can browse back to the Admin > Stack and Versions > Versions tab. You will see the version currently running and the version you just registered. Proceed to Install the Version.
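To illustrate how the two-digit number entered during registration maps to the full version label, here is a minimal sketch. The function name `full_hdp_version` and the hard-coded `HDP-2.2` prefix are assumptions made for this example; they are not part of Ambari.

```shell
# Hypothetical helper: builds the full Stack version label from the
# two-part suffix entered on the Register Version page.
full_hdp_version() {
  suffix="$1"
  case "$suffix" in
    *[!0-9.]* | *.*.* | .* | *. | "")
      # Reject anything that is not exactly two numeric parts.
      echo "expected a two-part version such as 4.2" >&2
      return 1 ;;
    *.*)
      echo "HDP-2.2.${suffix}" ;;
    *)
      echo "expected a two-part version such as 4.2" >&2
      return 1 ;;
  esac
}

full_hdp_version 4.2   # prints HDP-2.2.4.2
```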
Install the Version
To install a version in the cluster:
-
On the Versions tab, click Install Packages.
-
Click OK to confirm.
-
The Install version operation starts and the new version is installed on all hosts.
-
You can browse to Hosts and to each Host > Versions tab to see that the new version is installed. Proceed to Perform Upgrade.
Perform Upgrade
Once your target version has been registered into Ambari and installed on all hosts in the cluster, and you meet the Prerequisites, you are ready to perform an upgrade.
The perform upgrade process switches over the services in the cluster to a new version
in a rolling fashion. The process follows the flow below, starting with ZooKeeper
and the Core Master components and ending with a Finalize step. To ensure the process
runs smoothly, it includes some manual prompts for you to perform cluster
verification and testing along the way. You will be prompted when your input is required.

Upgrade Prerequisites
To perform an automated cluster upgrade from Ambari, your cluster must meet the following prerequisites:
Item | Requirement | Description |
---|---|---|
Cluster | Stack Version | Must be running HDP 2.2 Stack. This capability is not available for HDP 2.0 or 2.1 Stacks. |
Version | New Version | All hosts must have the new version installed. |
HDFS | NameNode HA | NameNode HA must be enabled and working properly. For more information, see Configuring NameNode High Availability in the Ambari User’s Guide. |
HDFS | Decommission | No components should be in decommissioning or decommissioned state. |
YARN | YARN WPR | Work Preserving Restart must be configured. |
Hosts | Heartbeats | All Ambari Agents must be heartbeating to Ambari Server. Any hosts that are not heartbeating must be in Maintenance Mode. |
Hosts | Maintenance Mode | Any hosts in Maintenance Mode must not be hosting any Service master components. |
Services | Services Started | All Services must be started. |
Services | Maintenance Mode | No Services can be in Maintenance Mode. |
To perform an upgrade to a new version:
-
On the Versions tab, click Perform Upgrade on the new version.
-
Follow the steps in the wizard.
Service Accounts
To view the list of users and groups used by the cluster services, choose Admin > Service Accounts.

Kerberos
If Kerberos has not been enabled in your cluster, click the Enable Kerberos button to launch the Kerberos wizard. For more information on configuring Kerberos in your cluster, see the Ambari Security Guide. Once Kerberos is enabled, you can regenerate keytabs or disable Kerberos, as described in the following procedures.
How To Regenerate Keytabs
-
Browse to Admin > Kerberos.
-
Click the Regenerate Kerberos button.
-
Confirm your selection to proceed.
-
Optionally, you can regenerate keytabs for only those hosts that are missing keytabs, for example, hosts that were not online or available to Ambari when Kerberos was enabled.
-
Once you confirm, Ambari will connect to the KDC and regenerate the keytabs for the Service and Ambari principals in the cluster.
-
Once complete, you must restart all services for the new keytabs to be used.
How To Disable Kerberos
-
Browse to Admin > Kerberos.
-
Click the Disable Kerberos button.
-
Confirm your selection to proceed. Cluster services will be stopped and the Ambari Kerberos security settings will be reset.
-
To re-enable Kerberos, click Enable Kerberos and follow the wizard steps. For more information on configuring Kerberos in your cluster, see the Ambari Security Guide.
Monitoring and Alerts
Ambari monitors cluster health and can alert you to certain situations to help you identify and troubleshoot problems. You manage how alerts are organized, under which conditions notifications are sent, and by which method. This section provides information on managing alerts, configuring notifications, and the predefined alerts.
Managing Alerts
Ambari predefines a set of alerts that monitor the cluster components and hosts. Each alert is defined by an Alert Definition, which specifies the checking interval and thresholds (which are dependent on the Alert Type). When a cluster is created or modified, Ambari reads the Alert Definitions and creates Alert Instances for the specific components to watch.
Terms and Definitions
The following basic terms help describe the key concepts associated with Ambari Alerts:
Terminology
Term | Definition |
---|---|
Alert Definition | Defines the alert including the description, check interval, type and thresholds. |
Type | The type of alert, such as PORT or METRIC. |
State | Indicates the state of an alert definition: enabled or disabled. When disabled, no alert instances are created. |
Alert Instance | Represents the specific alert instances based on an alert definition. For example, the alert definition for DataNode process will have an alert instance per DataNode in the cluster. |
Status | An alert instance status is defined by severity. The most common severity levels are OK, WARN, CRIT but there are also severities for UNKNOWN and NONE. See “Alert Instances” for more information. |
Threshold | The thresholds assigned to each status. |
Alert Group | Grouping of alert definitions, useful for handling notification targets. |
Notification | A notification target for when an alert instance status changes. Methods of notification include EMAIL and SNMP. |
Alert Definitions and Instances
An Alert Definition includes name, description and check interval, as well as configurable
thresholds for each status (depending on the Alert Type).
The following table lists the types of alerts, their possible status and if the thresholds
are configurable:
Alert Types
Type | Description | Status | Thresholds Configurable | Units |
---|---|---|---|---|
PORT | Watches a port based on a configuration property as the uri. Example: Hive Metastore Process | OK, WARN, CRIT | Yes | seconds |
METRIC | Watches a metric based on a configuration property. Example: ResourceManager RPC Latency | OK, WARN, CRIT | Yes | variable |
AGGREGATE | Aggregate of status for another alert definition. Example: percentage NodeManagers Available | OK, WARN, CRIT | Yes | percentage |
WEB | Watches a Web UI and adjusts status based on response. Example: App Timeline Web UI | OK, WARN, CRIT | No | n/a |
SCRIPT | Uses a custom script to handle checking. Example: NodeManager Health Summary | OK, CRIT | No | n/a |
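The relationship between thresholds and status can be sketched as a simple comparison. The helper below is a hypothetical illustration, not Ambari's own evaluation logic: it assumes integer values, that higher values are worse, and that a value equal to a threshold already counts as reaching it.

```shell
# Hypothetical helper: maps a measured value to an alert status.
# Usage: alert_status VALUE WARN_THRESHOLD CRIT_THRESHOLD
alert_status() {
  value="$1"; warn="$2"; crit="$3"
  if [ "$value" -ge "$crit" ]; then
    echo "CRIT"
  elif [ "$value" -ge "$warn" ]; then
    echo "WARN"
  else
    echo "OK"
  fi
}

# Example with the thresholds documented for the HDFS capacity
# utilization alert (80% warn, 90% critical).
alert_status 85 80 90   # prints WARN
```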
How To Change an Alert
-
Browse to the Alerts section in Ambari Web.
-
Find the alert definition to modify and click to view the definition details.
-
Click Edit to modify the description, check interval, or thresholds.
-
Changes will take effect on all alert instances at the next interval check.
How To View a List of Alert Instances
-
Browse to the Alerts section in Ambari Web.
-
Find the alert definition and click to view the definition details.
-
The list of alert instances is shown.
-
Alternatively, you can browse to a specific host via the Hosts section of Ambari Web to view the list of alert instances specific to that host.
How To Enable or Disable an Alert
-
Browse to the Alerts section in Ambari Web.
-
Find the alert definition. Click to enable/disable.
-
Alternatively, you can click to view the definition details and click to enable/disable.
-
When disabled, no alert instances are in effect; therefore, no alerts will be reported or dispatched for the alert definition.
Configuring Notifications
With Alert Groups and Notifications, you can create groups of alerts and set up notification
targets for each group. This way, you can notify different parties interested in certain
sets of alerts via different methods. For example, you might want your Hadoop Operations
team to receive all alerts via EMAIL, regardless of status, while your System Administration
team receives only Critical RPC- and CPU-related alerts via SNMP. To achieve this scenario,
you would have one Alert Notification that handles EMAIL for all alert groups and all severity
levels, and a different Alert Notification that handles SNMP on critical severity for an
Alert Group that contains the RPC and CPU alerts.
Ambari defines a set of default Alert Groups for each service installed in the cluster.
For example, you will see a group for HDFS Default. These groups cannot be deleted
and the alerts in these groups are not modifiable. If you choose not to use these
groups, just do not set a notification target for them.
- Creating or Editing Notifications
-
Browse to the Alerts section in Ambari Web.
-
Under the Actions menu, click Manage Notifications.
-
The list of existing notifications is shown.
-
Click + to “Create new Alert Notification”. The Create Alert Notification dialog is displayed.
-
Enter the notification name, select the groups the notification should be assigned to (all or a specific set), select the Severity levels that this notification responds to, include a description, and choose the method for notification (EMAIL or SNMP).
-
For EMAIL: provide information about your SMTP infrastructure, such as SMTP Server, Port, To/From address, and whether authentication is required to relay messages through the server. You can add custom properties to the SMTP configuration based on the JavaMail SMTP options.
-
For SNMP: select the SNMP version, OIDs, community, and port.
-
After completing the notification, click Save.
- Creating or Editing Alert Groups
-
Browse to the Alerts section in Ambari Web.
-
From the Actions menu, choose Manage Alert Groups.
-
The list of existing groups (default and custom) is shown.
-
Choose + to “Create Alert Group”. Enter a name for the group and click Save.
-
By clicking on the custom group in the list, you can add or delete alert definitions from this group, and change the notification targets for the group.
List of Predefined Alerts
HDFS Service Alerts
Alert | Description | Potential Causes | Possible Remedies |
---|---|---|---|
NameNode Blocks health | This service-level alert is triggered if the number of corrupt or missing blocks exceeds the configured critical threshold. | Some DataNodes are down and the replicas that are missing blocks are only on those DataNodes. | For critical data, use a replication factor of 3. |
NameNode process | This host-level alert is triggered if the NameNode process cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds. | The NameNode process is down on the HDFS master host. | Check for any errors in the logs (/var/log/hadoop/hdfs/) and restart the NameNode host/process using Ambari Web. |
DataNode Storage | This host-level alert is triggered if storage capacity is full on the DataNode (90% critical). It checks the DataNode JMX Servlet for the Capacity and Remaining properties. | Cluster storage is full. | If cluster still has storage, use Balancer to distribute the data to relatively less-used DataNodes. |
DataNode process | This host-level alert is triggered if the individual DataNode processes cannot be established to be up and listening on the network for the configured critical threshold, given in seconds. | DataNode process is down or not responding. | Check for dead DataNodes in Ambari Web. |
DataNode Web UI | This host-level alert is triggered if the DataNode Web UI is unreachable. | The DataNode process is not running. | Check whether the DataNode process is running. |
NameNode host CPU utilization | This host-level alert is triggered if CPU utilization of the NameNode exceeds certain thresholds (200% warning, 250% critical). It checks the NameNode JMX Servlet for the SystemCPULoad property. This information is only available if you are running JDK 1.7. | Unusually high CPU utilization: Can be caused by a very unusual job/query workload, but this is generally the sign of an issue in the daemon. | Use the top command to determine which processes are consuming excess CPU. |
NameNode Web UI | This host-level alert is triggered if the NameNode Web UI is unreachable. | The NameNode process is not running. | Check whether the NameNode process is running. |
Percent DataNodes with Available Space | This service-level alert is triggered if the storage is full on a certain percentage of DataNodes (10% warn, 30% critical). It aggregates the result from the check_datanode_storage.php plug-in. | Cluster storage is full. | If cluster still has storage, use Balancer to distribute the data to relatively less-used DataNodes. |
Percent DataNodes Available | This alert is triggered if the number of down DataNodes in the cluster is greater than the configured critical threshold. It uses the check_aggregate plug-in to aggregate the results of DataNode process checks. | DataNodes are down. | Check for dead DataNodes in Ambari Web. |
NameNode RPC latency | This host-level alert is triggered if the NameNode operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for NameNode operations. | A job or an application is performing too many NameNode operations. | Review the job or the application for potential bugs causing it to perform too many NameNode operations. |
NameNode Last Checkpoint | This alert will trigger if the last time that the NameNode performed a checkpoint was too long ago or if the number of uncommitted transactions is beyond a certain threshold. | Too much time elapsed since last NameNode checkpoint. | Set NameNode checkpoint. |
Secondary NameNode Process | This alert is triggered if the Secondary NameNode process cannot be confirmed to be up and listening on the network. This alert is not applicable when NameNode HA is configured. | The Secondary NameNode is not running. | Check that the Secondary NameNode process is running. |
NameNode Directory Status | This alert checks if the NameNode NameDirStatus metric reports a failed directory. | One or more of the directories are reporting as not healthy. | Check the NameNode UI for information about unhealthy directories. |
HDFS capacity utilization | This service-level alert is triggered if the HDFS capacity utilization exceeds the configured critical threshold (80% warn, 90% critical). It checks the NameNode JMX Servlet for the CapacityUsed and CapacityRemaining properties. | Cluster storage is full. | Delete unnecessary data. |
DataNode Health Summary | This service-level alert is triggered if there are unhealthy DataNodes. | A DataNode is in an unhealthy state. | Check the NameNode UI for the list of dead DataNodes. |
NameNode HA Alerts
Alert | Description | Potential Causes | Possible Remedies |
---|---|---|---|
JournalNode process | This host-level alert is triggered if the individual JournalNode process cannot be established to be up and listening on the network for the configured critical threshold, given in seconds. | The JournalNode process is down or not responding. | Check if the JournalNode process is dead. |
NameNode High Availability Health | This service-level alert is triggered if either the Active NameNode or Standby NameNode is not running. | The Active, Standby or both NameNode processes are down. | On each host running NameNode, check for any errors in the logs (/var/log/hadoop/hdfs/) and restart the NameNode host/process using Ambari Web. |
ZooKeeper Failover Controller process | This alert is triggered if the ZooKeeper Failover Controller process cannot be confirmed to be up and listening on the network. | The ZKFC process is down or not responding. | Check if the ZKFC process is running. |
YARN Alerts
Alert | Description | Potential Causes | Possible Remedies |
---|---|---|---|
Percent NodeManagers Available | This alert is triggered if the number of down NodeManagers in the cluster is greater than the configured critical threshold. It aggregates the results of NodeManager process alert checks. | NodeManagers are down. | Check for dead NodeManagers. |
ResourceManager Web UI | This host-level alert is triggered if the ResourceManager Web UI is unreachable. | The ResourceManager process is not running. | Check if the ResourceManager process is running. |
ResourceManager RPC latency | This host-level alert is triggered if the ResourceManager operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for ResourceManager operations. | A job or an application is performing too many ResourceManager operations. | Review the job or the application for potential bugs causing it to perform too many ResourceManager operations. |
ResourceManager CPU utilization | This host-level alert is triggered if CPU utilization of the ResourceManager exceeds certain thresholds (200% warning, 250% critical). It checks the ResourceManager JMX Servlet for the SystemCPULoad property. This information is only available if you are running JDK 1.7. | Unusually high CPU utilization: Can be caused by a very unusual job/query workload, but this is generally the sign of an issue in the daemon. | Use the top command to determine which processes are consuming excess CPU. |
NodeManager Web UI | This host-level alert is triggered if the NodeManager process cannot be established to be up and listening on the network for the configured critical threshold, given in seconds. | NodeManager process is down or not responding. | Check if the NodeManager is running. |
NodeManager health | This host-level alert checks the node health property available from the NodeManager component. | Node Health Check script reports issues or is not configured. | Check the NodeManager logs (/var/log/hadoop/yarn) for health check errors and restart the NodeManager if necessary. |
MapReduce2 Alerts
Alert | Description | Potential Causes | Possible Remedies |
---|---|---|---|
HistoryServer Web UI | This host-level alert is triggered if the HistoryServer Web UI is unreachable. | The HistoryServer process is not running. | Check if the HistoryServer process is running. |
HistoryServer RPC latency | This host-level alert is triggered if the HistoryServer operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for HistoryServer operations. | A job or an application is performing too many HistoryServer operations. | Review the job or the application for potential bugs causing it to perform too many HistoryServer operations. |
HistoryServer CPU utilization | This host-level alert is triggered if the percent of CPU utilization on the HistoryServer exceeds the configured critical threshold. | Unusually high CPU utilization: Can be caused by a very unusual job/query workload, but this is generally the sign of an issue in the daemon. | Use the top command to determine which processes are consuming excess CPU. |
HistoryServer process | This host-level alert is triggered if the HistoryServer process cannot be established to be up and listening on the network for the configured critical threshold, given in seconds. | HistoryServer process is down or not responding. | Check that the HistoryServer is running. |
HBase Service Alerts
Alert | Description | Potential Causes | Possible Remedies |
---|---|---|---|
Percent RegionServers live | This service-level alert is triggered if the configured percentage of Region Server processes cannot be determined to be up and listening on the network for the configured critical threshold. The default setting is 10% to produce a WARN alert and 30% to produce a CRITICAL alert. It aggregates the results of RegionServer process down checks. | Misconfiguration or less-than-ideal configuration caused the RegionServers to crash. | Check the dependent services to make sure they are operating correctly. |
HBase Master process | This alert is triggered if the HBase master processes cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds. | The HBase master process is down. | Check the dependent services. |
HBase Master Web UI | This host-level alert is triggered if the HBase Master Web UI is unreachable. | The HBase Master process is not running. | Check if the Master process is running. |
HBase Master CPU utilization | This host-level alert is triggered if CPU utilization of the HBase Master exceeds certain thresholds (200% warning, 250% critical). It checks the HBase Master JMX Servlet for the SystemCPULoad property. This information is only available if you are running JDK 1.7. | Unusually high CPU utilization: Can be caused by a very unusual job/query workload, but this is generally the sign of an issue in the daemon. | Use the top command to determine which processes are consuming excess CPU. |
RegionServer process | This host-level alert is triggered if the RegionServer processes cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds. | The RegionServer process is down on the host. | Check for any errors in the logs (/var/log/hbase/) and restart the RegionServer process using Ambari Web. |
Hive Alerts
Alert | Description | Potential Causes | Possible Remedies |
---|---|---|---|
HiveServer2 Process | This host-level alert is triggered if HiveServer2 cannot be determined to be up and responding to client requests. | HiveServer2 process is not running. | Using Ambari Web, check status of HiveServer2 component. Stop and then restart. |
Hive Metastore Process | This host-level alert is triggered if the Hive Metastore process cannot be determined to be up and listening on the network for the configured critical threshold, given in seconds. | The Hive Metastore service is down. | Using Ambari Web, stop the Hive service and then restart it. |
WebHCat Server status | This host-level alert is triggered if the WebHCat server cannot be determined to be up and responding to client requests. | The WebHCat server is down. | Restart the WebHCat server using Ambari Web. |
Oozie Alerts
Alert | Description | Potential Causes | Possible Remedies |
---|---|---|---|
Oozie status | This host-level alert is triggered if the Oozie server cannot be determined to be up and responding to client requests. | The Oozie server is down. | Restart the Oozie service using Ambari Web. |
ZooKeeper Alerts
Alert | Description | Potential Causes | Possible Remedies |
---|---|---|---|
Percent ZooKeeper Servers Available | This service-level alert is triggered if the configured percentage of ZooKeeper processes cannot be determined to be up and listening on the network for the configured critical threshold, given in seconds. It aggregates the results of ZooKeeper process checks. | The majority of your ZooKeeper servers are down and not responding. | Check the dependent services to make sure they are operating correctly. |
ZooKeeper Server process | This host-level alert is triggered if the ZooKeeper server process cannot be determined to be up and listening on the network for the configured critical threshold, given in seconds. | The ZooKeeper server process is down on the host. | Check for any errors in the ZooKeeper logs (/var/log/zookeeper/) and restart the ZooKeeper process using Ambari Web. |
Ambari Alerts
Alert | Description | Potential Causes | Possible Remedies |
---|---|---|---|
Ambari Agent Disk Usage | This host-level alert is triggered if the amount of disk space used on a host goes above specific thresholds. The default values are 50% for WARNING and 80% for CRITICAL. | The host is running out of disk space. | Check logs and temporary directories for items to remove. |
Installing HDP Using Ambari
This section describes the information and materials you should prepare before installing an HDP cluster using Ambari. Ambari provides an end-to-end management and monitoring solution for your HDP cluster. Using the Ambari Web UI and REST APIs, you can deploy, operate, manage configuration changes, and monitor services for all nodes in your cluster from a central point.
Determine Stack Compatibility
Use this table to determine whether your Ambari and HDP stack versions are compatible.
Ambari |
HDP 2.2[1] |
HDP 2.1[2] |
HDP 2.0[3] |
HDP1.3 |
---|---|---|---|---|
2.0.0 |
x |
x |
x |
|
1.7.0 |
x |
x |
x |
x |
1.6.1 |
x |
x |
x |
|
1.6.0 |
x |
x |
x |
|
1.5.1 |
x |
x |
x |
|
1.5.0 |
x |
x |
||
1.4.4.23 |
x |
x |
||
1.4.3.38 |
x |
x |
||
1.4.2.104 |
x |
x |
||
1.4.1.61 |
x |
x |
||
1.4.1.25 |
x |
x |
||
1.2.5.17 |
x |
-
Installing Accumulo, Hue, and Solr services, see Installing HDP Manually.
-
Installing Spark, see Installing Spark.
-
Installing Ranger, see Installing Ranger.
Meet Minimum System Requirements
To run Hadoop, your system must meet the following minimum requirements:
Hardware Recommendations
There is no single hardware requirement set for installing Hadoop.
For more information about hardware components that may affect your installation, see Hardware Recommendations For Apache Hadoop.
Operating Systems Requirements
The following 64-bit operating systems are supported:
-
Red Hat Enterprise Linux (RHEL) v6.x
-
Red Hat Enterprise Linux (RHEL) v5.x (deprecated)
-
CentOS v6.x
-
CentOS v5.x (deprecated)
-
Oracle Linux v6.x
-
Oracle Linux v5.x (deprecated)
-
SUSE Linux Enterprise Server (SLES) v11, SP1 and SP3
-
Ubuntu Precise v12.04
Browser Requirements
The Ambari Install Wizard runs as a browser-based Web application. You must have a machine capable of running a graphical browser to use this tool. The minimum required browser versions are:
-
Windows (Vista, 7, 8): Internet Explorer 9.0, Firefox 18, Google Chrome 26
-
Mac OS X (10.6 or later): Firefox 18, Safari 5, Google Chrome 26
-
Linux (RHEL, CentOS, SLES, Oracle Linux, Ubuntu): Firefox 18, Google Chrome 26
On any platform, we recommend updating your browser to the latest, stable version.
Software Requirements
Each of your hosts must have the following software installed:
-
yum and rpm (RHEL/CentOS/Oracle Linux)
-
zypper and php_curl (SLES)
-
apt (Ubuntu)
-
scp, curl, unzip, tar, and wget
-
OpenSSL (v1.01, build 16 or later)
-
python v2.6
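A quick way to spot missing tools on a host is to test whether each command resolves on the PATH. The sketch below assumes the tools are installed under their usual command names (for example `openssl` and `python`); the function name `missing_tools` is made up for this example, and the check does not verify versions.

```shell
# Prints each command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools listed for every host. Add yum and rpm, zypper, or apt
# depending on the OS family of the host.
missing=$(missing_tools scp curl unzip tar wget openssl python)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All listed tools found."
fi
```

Run the check on every host before starting the install wizard, and confirm the OpenSSL and Python versions separately.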
JDK Requirements
The following Java runtime environments are supported:
-
Oracle JDK 1.7_67 64-bit (default)
-
Oracle JDK 1.6_31 64-bit (DEPRECATED)
-
OpenJDK 7 64-bit (not supported on SLES)
To install OpenJDK 7 for RHEL, run the following command on all hosts:
yum install java-1.7.0-openjdk
Database Requirements
Ambari requires a relational database to store information about the cluster configuration and topology. If you install HDP Stack with Hive or Oozie, they also require a relational database. The following table outlines these database requirements:
Component | Description |
---|---|
Ambari | By default, Ambari installs an instance of PostgreSQL on the Ambari Server host. Optionally, you can use an existing instance of PostgreSQL, MySQL or Oracle. For further information, see Using Non-Default Databases - Ambari. |
Hive | By default (on RHEL/CentOS/Oracle Linux 6), Ambari installs an instance of MySQL on the Hive Metastore host. Otherwise, you need to use an existing instance of PostgreSQL, MySQL or Oracle. See Using Non-Default Databases - Hive for more information. |
Oozie | By default, Ambari installs an instance of Derby on the Oozie Server host. Optionally, you can use an existing instance of PostgreSQL, MySQL or Oracle. See Using Non-Default Databases - Oozie for more information. |
Memory Requirements
The Ambari host should have at least 1 GB RAM, with 500 MB free.
The Ambari Metrics Collector host should have the following memory and disk space
available based on cluster size:
Number of hosts |
Memory Available (MB) |
Disk Space |
---|---|---|
1 |
1024 |
10 GB |
10 |
1024 |
20 GB |
50 |
2048 |
50 GB |
100 |
4096 |
100 GB |
300 |
4096 |
100 GB |
500 |
8096 |
200 GB |
1000 |
12288 |
200 GB |
2000 |
16384 |
500 GB |
To check available memory on any host, run
free -m
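As a rough illustration, the sizing table above can be turned into a small shell check. This is only a sketch: the function name collector_mem_required_mb is hypothetical, and it assumes that each row of the table is an upper bound for cluster size, which the guide does not state explicitly. Clusters larger than 2000 hosts fall back to the last row.

```shell
# Hypothetical helper: map a cluster size to the Metrics Collector memory (MB)
# listed in the table above. Assumption: each row is treated as an upper bound.
collector_mem_required_mb() {
  hosts=$1
  if   [ "$hosts" -le 10 ];   then echo 1024
  elif [ "$hosts" -le 50 ];   then echo 2048
  elif [ "$hosts" -le 300 ];  then echo 4096
  elif [ "$hosts" -le 500 ];  then echo 8096
  elif [ "$hosts" -le 1000 ]; then echo 12288
  else echo 16384
  fi
}

# Example: compare free memory on this host with the need for a 100-host cluster.
if command -v free >/dev/null 2>&1; then
  free_mb=$(free -m | awk '/^Mem:/ { print $4 }')
  free_mb=${free_mb:-0}
  need_mb=$(collector_mem_required_mb 100)
  if [ "$free_mb" -ge "$need_mb" ]; then
    echo "OK: ${free_mb} MB free, ${need_mb} MB needed"
  else
    echo "Low memory: ${free_mb} MB free, ${need_mb} MB needed"
  fi
fi
```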
Package Size and Inode Count Requirements
*Size and inode values are approximate.
|
Size |
Inodes |
---|---|---|
Ambari Server |
100MB |
5,000 |
Ambari Agent |
8MB |
1,000 |
Ambari Metrics Collector |
225MB |
4,000 |
Ambari Metrics Monitor |
1MB |
100 |
Ambari Metrics Hadoop Sink |
8MB |
100 |
After Ambari Server Setup |
N/A |
4,000 |
After Ambari Server Start |
N/A |
500 |
After Ambari Agent Start |
N/A |
200 |
Check the Maximum Open File Descriptors
The recommended maximum number of open file descriptors is 10000, or more.
To check the current value set for the maximum number of open file descriptors, execute
the following shell commands on each host:
ulimit -Sn
ulimit -Hn
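The two values reported above can be compared with the recommended minimum of 10000 in a short script. The following is a minimal sketch; the function name fd_limit_ok is hypothetical and not part of Ambari.

```shell
# Succeeds when a ulimit value meets the recommended minimum of 10000.
# The literal value "unlimited" is accepted as well.
fd_limit_ok() {
  limit=$1
  if [ "$limit" = "unlimited" ]; then
    return 0
  fi
  [ "$limit" -ge 10000 ]
}

# Report the soft and hard limits of the current shell.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
if fd_limit_ok "$soft"; then echo "soft limit OK: $soft"; else echo "soft limit too low: $soft"; fi
if fd_limit_ok "$hard"; then echo "hard limit OK: $hard"; else echo "hard limit too low: $hard"; fi
```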
Collect Information
Before deploying an HDP cluster, you should collect the following information:
-
The fully qualified domain name (FQDN) of each host in your system. The Ambari install wizard supports using IP addresses. You can use
hostname -f
to check or verify the FQDN of a host.
-
A list of components you want to set up on each host.
-
The base directories you want to use as mount points for storing:
-
NameNode data
-
DataNodes data
-
Secondary NameNode data
-
Oozie data
-
YARN data (Hadoop version 2.x)
-
ZooKeeper data, if you install ZooKeeper
-
Various log, pid, and db files, depending on your install type
-
Prepare the Environment
To deploy your Hadoop instance, you need to prepare your deployment environment:
Check Existing Package Versions
During installation, Ambari overwrites current versions of some packages required by Ambari to manage a Hadoop cluster. Package versions other than those that Ambari installs can cause problems running the installer. Remove any package versions that do not match the following ones:
RHEL/CentOS/Oracle Linux 6
Component - Description |
Files and Versions |
---|---|
Ambari Server Database |
postgresql 8.4.13-1.el6_3, postgresql-libs 8.4.13-1.el6_3, postgresql-server 8.4.13-1.el6_3 |
Ambari Agent - Installed on each host in your cluster. Communicates with the Ambari Server to execute commands. |
None |
SLES 11
Component - Description |
Files and Versions |
---|---|
Ambari Server Database |
postgresql 8.3.5-1, postgresql-server 8.3.5-1, postgresql-libs 8.3.5-1 |
Ambari Agent - Installed on each host in your cluster. Communicates with the Ambari Server to execute commands. |
None |
UBUNTU 12
Component - Description |
Files and Versions |
---|---|
Ambari Server Database |
libpq5 postgresql postgresql-9.1 postgresql-client-9.1 postgresql-client-common postgresql-common ssl-cert |
Ambari Agent - Installed on each host in your cluster. Communicates with the Ambari Server to execute commands. |
zlibc_0.9k-4.1_amd64 |
RHEL/CentOS/Oracle Linux 5 (DEPRECATED)
Component - Description |
Files and Versions |
---|---|
Ambari Server Database |
libffi 3.0.5-1.el5, python26 2.6.8-2.el5, python26-libs 2.6.8-2.el5, postgresql 8.4.13-1.el6_3, postgresql-libs 8.4.13-1.el6_3, postgresql-server 8.4.13-1.el6_3 |
Ambari Agent - Installed on each host in your cluster. Communicates with the Ambari Server to execute commands. |
libffi 3.0.5-1.el5, python26 2.6.8-2.el5, python26-libs 2.6.8-2.el5 |
Set Up Password-less SSH
To have Ambari Server automatically install Ambari Agents on all your cluster hosts, you must set up password-less SSH connections between the Ambari Server host and all other hosts in the cluster. The Ambari Server host uses SSH public key authentication to remotely access and install the Ambari Agent.
-
Generate public and private SSH keys on the Ambari Server host.
ssh-keygen
-
Copy the SSH Public Key (id_rsa.pub) to the root account on your target hosts.
.ssh/id_rsa .ssh/id_rsa.pub
-
Add the SSH Public Key to the authorized_keys file on your target hosts.
cat id_rsa.pub >> authorized_keys
-
Depending on your version of SSH, you may need to set permissions on the .ssh directory (to 700) and the authorized_keys file in that directory (to 600) on the target hosts.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
-
From the Ambari Server, make sure you can connect to each host in the cluster using SSH, without having to enter a password.
ssh root@<remote.target.host>
where <remote.target.host>
has the value of each host name in your cluster.
-
If the following warning message displays during your first connection:
Are you sure you want to continue connecting (yes/no)?
Enter Yes.
-
Retain a copy of the SSH Private Key on the machine from which you will run the web-based Ambari Install Wizard.
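Two of the steps above lend themselves to a scripted check: the permissions on the .ssh directory and a non-interactive login test. The sketch below is illustrative only. The function names ssh_perms_ok and check_passwordless are hypothetical, and the login test assumes the root account and the standard OpenSSH client options BatchMode and ConnectTimeout.

```shell
# Succeeds when the given .ssh directory is mode 700 and its
# authorized_keys file is mode 600, judged from the ls permission string.
ssh_perms_ok() {
  dir=$1
  [ "$(ls -ld "$dir" | cut -c1-10)" = "drwx------" ] &&
    [ "$(ls -l "$dir/authorized_keys" | cut -c1-10)" = "-rw-------" ]
}

# Succeeds when root@<host> accepts key-based login without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

# Example (run on a target host):   ssh_perms_ok "$HOME/.ssh"
# Example (run on the Ambari Server host, hypothetical host name):
#   check_passwordless node1.example.com
```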
Set up Service User Accounts
Each HDP service requires a service user account. The Ambari Install wizard creates new and preserves any existing service user accounts, and uses these accounts when configuring Hadoop services. Service user account creation applies to service user accounts on the local operating system and to LDAP/AD accounts.
For more information about customizing service user accounts for each HDP service, see Defining Service Users and Groups for a HDP 2.x Stack.
Enable NTP on the Cluster and on the Browser Host
The clocks of all the nodes in your cluster and the machine that runs the browser
through which you access the Ambari Web interface must be able to synchronize with
each other.
To check that the NTP service is on, run the following command on each host:
chkconfig --list ntpd
To set the NTP service to start on reboot, run the following command on each host:
chkconfig ntpd on
To turn on the NTP service, run the following command on each host:
service ntpd start
Check DNS
All hosts in your system must be configured for both forward and reverse DNS.
If you are unable to configure DNS in this way, you should edit the /etc/hosts file on every host in your cluster to contain the IP address and Fully Qualified Domain Name of each of your hosts. The following instructions are provided as an overview and cover a basic network setup for generic Linux hosts. Different versions and flavors of Linux might require slightly different commands and procedures. Please refer to the documentation for the operating system(s) deployed in your environment.
Edit the Host File
-
Using a text editor, open the hosts file on every host in your cluster. For example:
vi /etc/hosts
-
Add a line for each host in your cluster. The line should consist of the IP address and the FQDN. For example:
1.2.3.4 <fully.qualified.domain.name>
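When many hosts need the same entries, the edit can be scripted instead of made by hand. The following is a minimal sketch, assuming a hosts file in the usual "IP address, then names" layout. The function name add_host_entry and the example host name are hypothetical.

```shell
# Append "<ip> <fqdn>" to a hosts file unless the FQDN is already listed
# as a host name (any field after the IP address) on some line.
add_host_entry() {
  file=$1
  ip=$2
  fqdn=$3
  if awk -v name="$fqdn" '
       { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
       END { exit !found }' "$file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$fqdn" >> "$file"
}

# Example (as root, hypothetical host):
#   add_host_entry /etc/hosts 1.2.3.4 node1.example.com
```

Running the function twice with the same arguments leaves a single entry, so it is safe to repeat on every host.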
Set the Hostname
-
Use the "hostname" command to set the hostname on each host in your cluster. For example:
hostname <fully.qualified.domain.name>
-
Confirm that the hostname is set by running the following command:
hostname -f
This should return the <fully.qualified.domain.name> you just set.
Edit the Network Configuration File
-
Using a text editor, open the network configuration file on every host and set the desired network configuration for each host. For example:
vi /etc/sysconfig/network
-
Modify the HOSTNAME property to set the fully qualified domain name.
NETWORKING=yes
NETWORKING_IPV6=yes
HOSTNAME=<fully.qualified.domain.name>
Configuring iptables
For Ambari to communicate during setup with the hosts it deploys to and manages, certain
ports must be open and available. The easiest way to do this is to temporarily disable
iptables, as follows:
chkconfig iptables off
/etc/init.d/iptables stop
You can restart iptables after setup is complete. If the security protocols in your
environment prevent disabling iptables, you can proceed with iptables enabled, if
all required ports are open and available. For more information about required ports,
see Configuring Network Port Numbers.
Ambari checks whether iptables is running during the Ambari Server setup process.
If iptables is running, a warning displays, reminding you to check that required ports
are open and available. The Host Confirm step in the Cluster Install Wizard also issues
a warning for each host that has iptables running.
Disable SELinux and PackageKit and check the umask Value
-
You must temporarily disable SELinux for the Ambari setup to function. On each host in your cluster,
setenforce 0
-
On an installation host running RHEL/CentOS with PackageKit installed, open
/etc/yum/pluginconf.d/refresh-packagekit.conf
using a text editor. Make the following change:
enabled=0
-
UMASK (User Mask or User file creation MASK) sets the default permissions or base permissions granted when a new file or folder is created on a Linux machine. Most Linux distros set 022 as the default umask value. A umask value of 022 grants read, write, execute permissions of 755 for new folders (644 for new files). A umask value of 027 grants permissions of 750 for new folders (640 for new files). Ambari supports a umask value of 022 or 027. For example, to set the umask value to 022, run the following command as root on all hosts:
vi /etc/profile
then append the following line:
umask 022
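The relationship between a umask value and the resulting permissions is a bitwise operation: the mask bits are cleared from a base mode. The sketch below works through that arithmetic. The function names are hypothetical. It relies on standard Linux behavior, where new directories start from 777 and new regular files start from 666.

```shell
# Default mode for a new directory: 777 with the umask bits cleared.
dir_perms_for_umask() {
  printf '%03o\n' $(( 0777 & ~0$1 ))
}

# Default mode for a new regular file: 666 with the umask bits cleared.
file_perms_for_umask() {
  printf '%03o\n' $(( 0666 & ~0$1 ))
}

dir_perms_for_umask 022    # prints 755
dir_perms_for_umask 027    # prints 750
file_perms_for_umask 022   # prints 644
file_perms_for_umask 027   # prints 640
```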
Using a Local Repository
If your cluster is behind a firewall that prevents or limits Internet access, you can install Ambari and a Stack using local repositories. This section describes how to:
-
Set up a local repository having:
Obtaining the Repositories
This section describes how to obtain:
Ambari Repositories
If you do not have Internet access for setting up the Ambari repository, use the link appropriate for your OS family to download a tarball that contains the software.
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/ambari-2.0.0-centos6.tar.gz
SLES 11
wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/ambari-2.0.0-suse11.tar.gz
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu12/ambari-2.0.0-ubuntu12.tar.gz
RHEL/CentOS/ORACLE Linux 5 (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/ambari/centos5/ambari-2.0.0-centos5.tar.gz
If you have temporary Internet access for setting up the Ambari repository, use the link appropriate for your OS family to download a repository that contains the software.
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.0.0/ambari.repo
SLES 11
wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.0.0/ambari.repo
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu12/2.x/updates/2.0.0/ambari.list
RHEL/CentOS/ORACLE Linux 5 (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/ambari/centos5/2.x/updates/2.0.0/ambari.repo
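The repository file URLs listed above follow one pattern, which a script can build from the OS family and the Ambari version. This is a sketch under that assumption. The function name ambari_repo_url is hypothetical, and the pattern is inferred only from the four URLs shown in this section.

```shell
# Build the public Ambari repository file URL for an OS family and version.
# Ubuntu uses an apt source list (ambari.list); the others use ambari.repo.
ambari_repo_url() {
  os=$1
  version=$2
  case "$os" in
    ubuntu*) file=ambari.list ;;
    *)       file=ambari.repo ;;
  esac
  echo "http://public-repo-1.hortonworks.com/ambari/$os/2.x/updates/$version/$file"
}

ambari_repo_url centos6 2.0.0
ambari_repo_url ubuntu12 2.0.0
```

The output can be passed to wget -nv in place of a hand-typed URL.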
HDP Stack Repositories
If you do not have Internet access to set up the Stack repositories, use the link appropriate for your OS family to download a tarball that contains the HDP Stack version you plan to install.
HDP 2.2
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/HDP-2.2.4.2-centos6-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos6/HDP-UTILS-1.1.0.20-centos6.tar.gz
SLES 11SP3
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11sp3/HDP-2.2.4.2-suse11sp3-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/suse11sp3/HDP-UTILS-1.1.0.20-suse11sp3.tar.gz
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/HDP/ubuntu12/HDP-2.2.4.2-ubuntu12-deb.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/ubuntu12/HDP-UTILS-1.1.0.20-ubuntu12.tar.gz
RHEL/CentOS/ORACLE Linux 5 (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/HDP/centos5/HDP-2.2.4.2-centos5-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos5/HDP-UTILS-1.1.0.20-centos5.tar.gz
HDP 2.1
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/HDP-2.1.10.0-centos6-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.19/repos/centos6/HDP-UTILS-1.1.0.19-centos6.tar.gz
SLES 11
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11sp3/HDP-2.1.10.0-suse11sp3-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.19/repos/suse11/HDP-UTILS-1.1.0.19-suse11.tar.gz
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/HDP/ubuntu12/HDP-2.1.10.0-ubuntu12-tars-tarball.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.19/repos/ubuntu12/
RHEL/CentOS/ORACLE Linux 5 (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/HDP/centos5/HDP-2.1.10.0-centos5-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.19/repos/centos5/HDP-UTILS-1.1.0.17-centos5.tar.gz
HDP 2.0
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/HDP-2.0.13.0-centos6-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.17/repos/centos6/HDP-UTILS-1.1.0.17-centos6.tar.gz
SLES 11
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11/HDP-2.0.13.0-suse11-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.17/repos/suse11/HDP-UTILS-1.1.0.17-suse11.tar.gz
RHEL/CentOS/ORACLE Linux 5 (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/HDP/centos5/HDP-2.0.13.0-centos5-rpm.tar.gz
wget -nv http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.17/repos/centos5/HDP-UTILS-1.1.0.17-centos5.tar.gz
If you have temporary Internet access for setting up the Stack repositories, use the link appropriate for your OS family to download a repository that contains the HDP Stack version you plan to install.
HDP 2.2
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.4.2/hdp.repo -O /etc/yum.repos.d/HDP.repo
SLES 11SP3
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11sp3/2.x/updates/2.2.4.2/hdp.repo -O /etc/zypp/repos.d/HDP.repo
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/HDP/ubuntu12/2.x/updates/2.2.4.2/hdp.list -O /etc/apt/sources.list.d/HDP.list
RHEL/CentOS/Oracle Linux 5 (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/HDP/centos5/2.x/updates/2.2.4.2/hdp.repo -O /etc/yum.repos.d/HDP.repo
HDP 2.1
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.10.0/hdp.repo -O /etc/yum.repos.d/HDP.repo
SLES 11SP3
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11sp3/2.x/updates/2.1.10.0/hdp.repo -O /etc/zypp/repos.d/HDP.repo
UBUNTU 12
wget -nv http://public-repo-1.hortonworks.com/HDP/ubuntu12/2.x/updates/2.1.10.0/hdp.list -O /etc/apt/sources.list.d/HDP.list
RHEL/CentOS/Oracle Linux 5 (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/HDP/centos5/2.x/updates/2.1.10.0/hdp.repo -O /etc/yum.repos.d/hdp.repo
HDP 2.0
RHEL/CentOS/Oracle Linux 6
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.0.13.0/hdp.repo -O /etc/yum.repos.d/HDP.repo
SLES 11
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11/2.x/updates/2.0.13.0/hdp.repo -O /etc/zypp/repos.d/HDP.repo
RHEL/CentOS/Oracle Linux 5 (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/HDP/centos5/2.x/updates/2.0.13.0/hdp.repo -O /etc/yum.repos.d/HDP.repo
Setting Up a Local Repository
Based on your Internet access, choose one of the following options:
-
No Internet Access
This option involves downloading the repository tarball, moving the tarball to the selected mirror server in your cluster, and extracting to create the repository.
-
Temporary Internet Access
This option involves using your temporary Internet access to sync (using reposync) the software packages to your selected mirror server and creating the repository.
Both options proceed in a similar, straightforward way. Setting up for each option presents some key differences, as described in the following sections:
Getting Started Setting Up a Local Repository
To get started setting up your local repository, complete the following prerequisites:
-
Select an existing server in, or accessible to, the cluster that runs a supported operating system.
-
Enable network access from all hosts in your cluster to the mirror server.
-
Ensure the mirror server has a package manager installed such as yum (RHEL / CentOS / Oracle Linux), zypper (SLES), or apt-get (Ubuntu).
-
Optional: If your mirror server has temporary Internet access, and you are using RHEL/CentOS/Oracle Linux as your OS, install yum utilities:
yum install yum-utils createrepo
-
Create an HTTP server.
-
On the mirror server, install an HTTP server (such as Apache httpd) using the instructions provided in that server's documentation.
-
Activate this web server.
-
Ensure that any firewall settings allow inbound HTTP access from your cluster nodes to your mirror server.
-
-
On your mirror server, create a directory for your web server.
-
For example, from a shell window, type:
-
For RHEL/CentOS/Oracle Linux:
mkdir -p /var/www/html/
-
For SLES:
mkdir -p /srv/www/htdocs/rpms
-
For Ubuntu:
mkdir -p /var/www/html/
-
-
If you are using a symlink, enable the followsymlinks option on your web server.
-
Setting Up a Local Repository with No Internet Access
After completing the Getting Started Setting up a Local Repository procedure, finish setting up your repository by completing the following steps:
-
Obtain the tarball for the repository you would like to create. For options, see Obtaining the Repositories.
-
Copy the repository tarballs to the web server directory and untar.
-
Browse to the web server directory you created.
-
For RHEL/CentOS/Oracle Linux:
cd /var/www/html/
-
For SLES:
cd /srv/www/htdocs/rpms
-
For Ubuntu:
cd /var/www/html/
-
-
Untar the repository tarballs to the following locations, where <web.server>, <web.server.directory>, <OS>, <version>, and <latest.version> represent the web server name, web server home directory, operating system type, version, and most recent release version, respectively.
Untar Locations for a Local Repository - No Internet Access
Repository Content |
Repository Location |
---|---|
Ambari Repository |
Untar under <web.server.directory> |
HDP Stack Repositories |
Create directory and untar under <web.server.directory>/hdp |
-
-
Confirm you can browse to the newly created local repositories.
URLs for a Local Repository - No Internet Access
Repository |
URL |
---|---|
Ambari Base URL |
http://<web.server>/ambari/<OS>/2.x/updates/2.0.0 |
HDP Base URL |
http://<web.server>/hdp/HDP/<OS>/2.x/updates/<latest.version> |
HDP-UTILS Base URL |
http://<web.server>/hdp/HDP-UTILS-<version>/repos/<OS> |
where <web.server> = FQDN of the web server host, and <OS> is centos5, centos6, sles11, or ubuntu12.
-
Optional: If you have multiple repositories configured in your environment, deploy the following plug-in on all the nodes in your cluster.
-
Install the plug-in.
-
For RHEL and CentOS 6:
yum install yum-plugin-priorities
-
For RHEL and CentOS 5:
yum install yum-priorities
-
-
Edit the /etc/yum/pluginconf.d/priorities.conf file to add the following:
[main]
enabled=1
gpgcheck=0
-
Setting up a Local Repository With Temporary Internet Access
After completing the Getting Started Setting up a Local Repository procedure, finish setting up your repository by completing the following steps:
-
Put the repository configuration files for Ambari and the Stack in place on the host. For options, see Obtaining the Repositories.
-
Confirm availability of the repositories.
-
For RHEL/CentOS/Oracle Linux:
yum repolist
-
For SLES:
zypper repos
-
For Ubuntu:
dpkg --list
-
-
Synchronize the repository contents to your mirror server.
-
Browse to the web server directory:
-
For RHEL/CentOS/Oracle Linux:
cd /var/www/html
-
For SLES:
cd /srv/www/htdocs/rpms
-
For Ubuntu:
cd /var/www/html
-
-
For Ambari, create the ambari directory and run reposync.
mkdir -p ambari/<OS>
cd ambari/<OS>
reposync -r Updates-ambari-2.0.0
where <OS> is centos5, centos6, sles11, or ubuntu12.
-
For HDP Stack Repositories, create the hdp directory and run reposync.
mkdir -p hdp/<OS>
cd hdp/<OS>
reposync -r HDP-<latest.version>
reposync -r HDP-UTILS-<version>
-
-
Generate the repository metadata.
-
For Ambari:
createrepo <web.server.directory>/ambari/<OS>/Updates-ambari-2.0.0
-
For HDP Stack Repositories:
createrepo <web.server.directory>/hdp/<OS>/HDP-<latest.version>
createrepo <web.server.directory>/hdp/<OS>/HDP-UTILS-<version>
-
-
Confirm that you can browse to the newly created repository.
URLs for the New Repository
Repository |
URL |
---|---|
Ambari Base URL |
http://<web.server>/ambari/<OS>/Updates-ambari-2.0.0 |
HDP Base URL |
http://<web.server>/hdp/<OS>/HDP-<latest.version> |
HDP-UTILS Base URL |
http://<web.server>/hdp/<OS>/HDP-UTILS-<version> |
where <web.server> = FQDN of the web server host, and <OS> is centos5, centos6, sles11, or ubuntu12.
-
Optional. If you have multiple repositories configured in your environment, deploy the following plug-in on all the nodes in your cluster.
-
Install the plug-in.
-
For RHEL and CentOS 6:
yum install yum-plugin-priorities
-
-
Edit the /etc/yum/pluginconf.d/priorities.conf file to add the following:
[main]
enabled=1
gpgcheck=0
-
Preparing The Ambari Repository Configuration File
-
Download the ambari.repo file from the mirror server you created in the preceding sections or from the public repository.
-
From your mirror server:
http://<web.server>/ambari/<OS>/2.x/updates/2.0.0/ambari.repo
-
From the public repository:
http://public-repo-1.hortonworks.com/ambari/<OS>/2.x/updates/2.0.0/ambari.repo
where <web.server> = FQDN of the web server host, and <OS> is CENTOS6, SLES11, or UBUNTU12.
-
-
Edit the ambari.repo file using the Ambari repository Base URL obtained when setting up your local repository. Refer to step 3 in Setting Up a Local Repository with No Internet Access, or step 5 in Setting Up a Local Repository with Temporary Internet Access, if necessary.
Base URL for a Local Repository
Repository |
URL |
---|---|
Ambari Base URL |
http://<web.server>/ambari/<OS>/2.x/updates/2.0.0 |
where <web.server> = FQDN of the web server host, and <OS> is CENTOS6, SLES11, or UBUNTU12.
-
If this is an Ambari updates release, disable the GA repository definition.
[ambari-2.x]
name=Ambari 2.x
baseurl=http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.0.0
gpgcheck=1
gpgkey=http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=0
priority=1
-
Place the ambari.repo file on the machine you plan to use for the Ambari Server.
-
For RHEL/CentOS/Oracle Linux:
/etc/yum.repos.d/ambari.repo
-
For SLES:
/etc/zypp/repos.d/ambari.repo
-
For Ubuntu:
/etc/apt/sources.list.d/ambari.list
-
Edit the /etc/yum/pluginconf.d/priorities.conf file to add the following:
[main]
enabled=1
gpgcheck=0
-
-
Proceed to Installing Ambari Server to install and setup Ambari Server.
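The step that edits ambari.repo to use the local Base URL can also be scripted. The following is a minimal sketch, assuming the repository file contains a single baseurl= line as in the stanza shown above. The function name set_repo_baseurl and the mirror host name are hypothetical, and the sketch assumes the URL contains no "|" or "&" characters, which have special meaning in the sed expression.

```shell
# Rewrite every "baseurl=" line in a repository file to point at a new URL.
# A temporary file is used so the sketch does not depend on "sed -i".
set_repo_baseurl() {
  repo_file=$1
  base_url=$2
  tmp_file=$(mktemp) || return 1
  if sed "s|^baseurl=.*|baseurl=${base_url}|" "$repo_file" > "$tmp_file"; then
    cat "$tmp_file" > "$repo_file"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp_file"
  return "$status"
}

# Example (hypothetical mirror host):
#   set_repo_baseurl /etc/yum.repos.d/ambari.repo \
#     http://mirror.example.com/ambari/centos6/2.x/updates/2.0.0
```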
Download the Ambari Repo
Follow the instructions in the section for the operating system that runs your installation host.
Use a command line editor to perform each instruction.
RHEL/CentOS/Oracle Linux 6
On a server host that has Internet access, use a command line editor to perform the following steps:
-
Log in to your host as root. You may sudo as su if your environment requires such access. For example, type:
ssh <username>@<hostname.FQDN>
sudo su -
where <username> is your user name and <hostname.FQDN> is the fully qualified domain name of your server host.
-
Download the Ambari repository file to a directory on your installation host.
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.0.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
-
Confirm that the repository is configured by checking the repo list.
yum repolist
You should see values similar to the following for Ambari repositories in the list. Version values vary, depending on the installation.
repo id |
repo name |
status |
---|---|---|
AMBARI.2.0.0-.x |
Ambari 2.x |
5 |
base |
CentOS-6 - Base |
6,518 |
extras |
CentOS-6 - Extras |
15 |
updates |
CentOS-6 - Updates |
209 |
-
Install the Ambari bits. This also installs the default PostgreSQL Ambari database.
yum install ambari-server
-
Enter y when prompted to confirm transaction and dependency checks.
A successful installation displays output similar to the following:
Installing : postgresql-libs-8.4.20-1.el6_5.x86_64 1/4
Installing : postgresql-8.4.20-1.el6_5.x86_64 2/4
Installing : postgresql-server-8.4.20-1.el6_5.x86_64 3/4
Installing : ambari-server-2.0.0-147.noarch 4/4
Verifying : postgresql-server-8.4.20-1.el6_5.x86_64 1/4
Verifying : postgresql-libs-8.4.20-1.el6_5.x86_64 2/4
Verifying : ambari-server-2.0.0-147.noarch 3/4
Verifying : postgresql-8.4.20-1.el6_5.x86_64 4/4
Installed : ambari-server.noarch 0:1.7.0-135
Dependency Installed:
postgresql.x86_64 0:8.4.20-1.el6_5
postgresql-libs.x86_64 0:8.4.20-1.el6_5
postgresql-server.x86_64 0:8.4.20-1.el6_5
Complete!
SLES 11
On a server host that has Internet access, use a command line editor to perform the following steps:
-
Log in to your host as root. You may sudo as su if your environment requires such access. For example, type:
ssh <username>@<hostname.FQDN>
sudo su -
where <username> is your user name and <hostname.FQDN> is the fully qualified domain name of your server host.
-
Download the Ambari repository file to a directory on your installation host.
wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.0.0/ambari.repo -O /etc/zypp/repos.d/ambari.repo
-
Confirm the downloaded repository is configured by checking the repo list.
zypper repos
You should see the Ambari repositories in the list. Version values vary, depending on the installation.
Alias |
Name |
Enabled |
Refresh |
---|---|---|---|
AMBARI.2.0.0-1.x |
Ambari 2.x |
Yes |
No |
http-demeter.uni-regensburg.de-c997c8f9 |
SUSE-Linux-Enterprise-Software-Development-Kit-11-SP1 11.1.1-1.57 |
Yes |
Yes |
opensuse |
OpenSuse |
Yes |
Yes |
-
Install the Ambari bits. This also installs PostgreSQL.
zypper install ambari-server
-
Enter y when prompted to confirm transaction and dependency checks.
A successful installation displays output similar to the following:
Retrieving package postgresql-libs-8.3.5-1.12.x86_64 (1/4), 172.0 KiB (571.0 KiB unpacked)
Retrieving: postgresql-libs-8.3.5-1.12.x86_64.rpm [done (47.3 KiB/s)]
Installing: postgresql-libs-8.3.5-1.12 [done]
Retrieving package postgresql-8.3.5-1.12.x86_64 (2/4), 1.0 MiB (4.2 MiB unpacked)
Retrieving: postgresql-8.3.5-1.12.x86_64.rpm [done (148.8 KiB/s)]
Installing: postgresql-8.3.5-1.12 [done]
Retrieving package postgresql-server-8.3.5-1.12.x86_64 (3/4), 3.0 MiB (12.6 MiB unpacked)
Retrieving: postgresql-server-8.3.5-1.12.x86_64.rpm [done (452.5 KiB/s)]
Installing: postgresql-server-8.3.5-1.12 [done]
Updating etc/sysconfig/postgresql...
Retrieving package ambari-server-1.7.0-135.noarch (4/4), 99.0 MiB (126.3 MiB unpacked)
Retrieving: ambari-server-1.7.0-135.noarch.rpm [done (3.0 MiB/s)]
Installing: ambari-server-1.7.0-135 [done]
ambari-server 0:off 1:off 2:off 3:on 4:off 5:on 6:off
UBUNTU 12
On a server host that has Internet access, use a command line editor to perform the following steps:
-
Log in to your host as root. You may sudo as su if your environment requires such access. For example, type:
ssh <username>@<hostname.FQDN>
sudo su -
where <username> is your user name and <hostname.FQDN> is the fully qualified domain name of your server host.
-
Download the Ambari repository file to a directory on your installation host.
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu12/2.x/updates/2.0.0/ambari.list -O /etc/apt/sources.list.d/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
-
Confirm that Ambari packages downloaded successfully by checking the package name list.
apt-cache pkgnames
You should see the Ambari packages in the list. Version values vary, depending on the installation.
Alias |
Name |
---|---|
AMBARI-dev-2.x |
Ambari 2.x |
-
Install the Ambari bits. This also installs PostgreSQL.
apt-get install ambari-server
RHEL/CentOS/ORACLE Linux 5 (DEPRECATED)
On a server host that has Internet access, use a command line editor to perform the following steps:
-
Log in to your host as root. You may sudo as su if your environment requires such access. For example, type:
ssh <username>@<hostname.FQDN>
sudo su -
where <username> is your user name and <hostname.FQDN> is the fully qualified domain name of your server host.
-
Download the Ambari repository file to a directory on your installation host.
wget -nv http://public-repo-1.hortonworks.com/ambari/centos5/2.x/updates/2.0.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
-
Confirm the repository is configured by checking the repo list.
yum repolist
You should see the Ambari repositories in the list.
AMBARI.2.0.0-1.x | 951 B 00:00
AMBARI.2.0.0-1.x/primary | 1.6 kB 00:00
AMBARI.2.0.0-1.x 5/5
epel | 3.7 kB 00:00
epel/primary_db | 3.9 MB 00:01
repo Id |
repo Name |
status |
---|---|---|
AMBARI.2.2.0-1.x |
Ambari 2.x |
5 |
base |
CentOS-5 - Base |
3,667 |
epel |
Extra Packages for Enterprise Linux 5 - x86_64 |
7,614 |
puppet |
Puppet |
433 |
updates |
CentOS-5 - Updates |
118 |
-
Install the Ambari bits. This also installs PostgreSQL.
yum install ambari-server
Set Up the Ambari Server
The ambari-server command manages the setup process. Run the following command on the Ambari Server host:
ambari-server setup
You may append Setup Options to the command.
Respond to the following prompts:
-
If you have not temporarily disabled SELinux, you may get a warning. Accept the default (y), and continue.
-
By default, Ambari Server runs under root. Accept the default (n) at the Customize user account for ambari-server daemon prompt, to proceed as root. If you want to create a different user to run the Ambari Server, or to assign a previously created user, select y at the Customize user account for ambari-server daemon prompt, then provide a user name.
-
If you have not temporarily disabled iptables, you may get a warning. Enter y to continue.
-
Select a JDK version to download. Enter 1 to download Oracle JDK 1.7.
-
Accept the Oracle JDK license when prompted. You must accept this license to download the necessary JDK from Oracle. The JDK is installed during the deploy phase.
-
Select n at Enter advanced database configuration to use the default, embedded PostgreSQL database for Ambari. The default PostgreSQL database name is ambari. The default user name and password are ambari/bigdata. Otherwise, to use an existing PostgreSQL, MySQL or Oracle database with Ambari, select y.
-
If you are using an existing PostgreSQL, MySQL, or Oracle database instance, use one of the following prompts:
-
To use an existing Oracle 11g r2 instance, and select your own database name, user name, and password for that database, enter 2.
Select the database you want to use and provide any information requested at the prompts, including host name, port, Service Name or SID, user name, and password.
-
To use an existing MySQL 5.x database, and select your own database name, user name, and password for that database, enter 3.
Select the database you want to use and provide any information requested at the prompts, including host name, port, database name, user name, and password.
-
To use an existing PostgreSQL 9.x database, and select your own database name, user name, and password for that database, enter 4.
Select the database you want to use and provide any information requested at the prompts, including host name, port, database name, user name, and password.
-
-
At Proceed with configuring remote database connection properties [y/n] choose y.
-
Setup completes.
Setup Options
The following table describes options frequently used for Ambari Server setup.
Option | Description |
---|---|
-j (or --java-home) | Specifies the JAVA_HOME path to use on the Ambari Server and all hosts in the cluster. By default, when you do not specify this option, Ambari Server setup downloads the Oracle JDK 1.7 binary and accompanying Java Cryptography Extension (JCE) Policy Files to /var/lib/ambari-server/resources. Ambari Server then installs the JDK to /usr/jdk64. |
--jdbc-driver | Specifies the path to the JDBC driver JAR file. Use this option to make that JAR available to Ambari Server for distribution to cluster hosts during configuration. Use this option with the --jdbc-db option to specify the database type. |
--jdbc-db | Specifies the database type. Valid values are postgres, mysql, and oracle. Use this option with the --jdbc-driver option to specify the location of the JDBC driver JAR file. |
-s (or --silent) | Setup runs silently. Accepts all default prompt values. |
-v (or --verbose) | Prints verbose info and warning messages to the console during Setup. |
-g (or --debug) | Starts Ambari Server in debug mode. |
Next Steps
Start the Ambari Server
-
Run the following command on the Ambari Server host:
ambari-server start
-
To check the Ambari Server processes:
ambari-server status
-
To stop the Ambari Server:
ambari-server stop
Next Steps
Install, configure and deploy an HDP cluster
Install, Configure and Deploy an HDP Cluster
This section describes how to use the Ambari install wizard running in your browser to install, configure, and deploy your cluster.
Log In to Apache Ambari
After starting the Ambari service, open Ambari Web using a web browser.
-
Point your browser to http://<your.ambari.server>:8080.
-
Log in to the Ambari Server using the default user name/password: admin/admin. You can change these credentials later.
For a new cluster, the Ambari install wizard displays a Welcome page from which you launch the Ambari Install wizard.
Launching the Ambari Install Wizard
From the Ambari Welcome page, choose Launch Install Wizard.

Name Your Cluster
-
In Name your cluster, type a name for the cluster you want to create. Use no white spaces or special characters in the name.
-
Choose Next.
Select Stack
The Service Stack (the Stack) is a coordinated and tested set of HDP components. Use a radio button to select the Stack version you want to install. To install an HDP 2.x stack, select the HDP 2.2, HDP 2.1, or HDP 2.0 radio button.

Expand Advanced Repository Options to select the Base URL of a repository from which Stack software packages download. Ambari sets the default Base URL for each repository, depending on the Internet connectivity available to the Ambari server host, as follows:
-
For an Ambari Server host having Internet connectivity, Ambari sets the repository Base URL for the latest patch release for the HDP Stack version. For an Ambari Server having NO Internet connectivity, the repository Base URL defaults to the latest patch release version available at the time of Ambari release.
-
You can override the repository Base URL for the HDP Stack with an earlier patch release if you want to install a specific patch release for a given HDP Stack version. For example, the HDP 2.1 Stack will default to the HDP 2.1 Stack patch release 7, or HDP-2.1.7. If you want to install HDP 2.1 Stack patch release 2, or HDP-2.1.2 instead, obtain the Base URL from the HDP Stack documentation, then enter that location in Base URL.
-
If you are using a local repository, see Using a Local Repository for information about configuring a local repository location, then enter that location as the Base URL instead of the default, public-hosted HDP Stack repositories.

Operating Systems mapped to each OS Family
OS Family | Operating Systems |
---|---|
redhat6 | Red Hat 6, CentOS 6, Oracle Linux 6 |
suse11 | SUSE Linux Enterprise Server 11 |
ubuntu12 | Ubuntu Precise 12.04 |
redhat5 | Red Hat 5, CentOS 5, Oracle Linux 5 |
Install Options
In order to build up the cluster, the install wizard prompts you for general information about how you want to set it up. You need to supply the FQDN of each of your hosts. The wizard also needs to access the private key file you created in Set Up Password-less SSH. Using the host names and key file information, the wizard can locate, access, and interact securely with all hosts in the cluster.
-
Use the Target Hosts text box to enter your list of host names, one per line. You can use ranges inside brackets to indicate larger sets of hosts. For example, for host01.domain through host10.domain, use host[01-10].domain
-
If you want to let Ambari automatically install the Ambari Agent on all your hosts using SSH, select Provide your SSH Private Key and either use the Choose File button in the Host Registration Information section to find the private key file that matches the public key you installed earlier on all your hosts, or cut and paste the key into the text box manually. Fill in the user name for the SSH key you have selected. If you do not want to use root, you must provide the user name for an account that can execute sudo without entering a password.
-
If you do not want Ambari to automatically install the Ambari Agents, select Perform manual registration. For further information, see Installing Ambari Agents Manually.
-
Choose Register and Confirm to continue.
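The bracketed range notation accepted by Target Hosts can be read as shorthand for a list of host names. The following Python sketch only illustrates what a pattern such as host[01-10].domain stands for; it is not Ambari's own implementation. The function name expand_host_pattern is hypothetical, and the sketch assumes a single numeric range per pattern with zero padding taken from the width of the range start.

```python
import re

# Hypothetical helper: illustrates the bracket range notation only.
# Assumes at most one [start-end] range per pattern.
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_pattern(pattern):
    """Expand 'host[01-10].domain' into the host names it stands for."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # plain host name, nothing to expand
    start, end = match.group(1), match.group(2)
    if int(start) > int(end):
        raise ValueError("range start must not exceed range end: %s" % pattern)
    width = len(start)  # keep the zero padding used in the pattern
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [
        prefix + str(number).zfill(width) + suffix
        for number in range(int(start), int(end) + 1)
    ]


hosts = expand_host_pattern("host[01-10].domain")
print(hosts[0], hosts[-1], len(hosts))
```

A pattern without brackets is returned unchanged, so the same helper can be applied to every line of a host list.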
Confirm Hosts
Confirm Hosts prompts you to confirm that Ambari has located the correct hosts for your cluster and to check those hosts to make sure they have the correct directories, packages, and processes required to continue the install.
If any hosts were selected in error, you can remove them by selecting the appropriate checkboxes and clicking the grey Remove Selected button. To remove a single host, click the small white Remove button in the Action column.
At the bottom of the screen, you may notice a yellow box that indicates some warnings were encountered during the check process. For example, your host may have already had a copy of wget or curl. Choose Click here to see the warnings to see a list of what was checked and what caused the warning. The warnings page also provides access to a Python script that can help you clear any issues you may encounter and let you run Rerun Checks.
When you are satisfied with the list of hosts, choose Next.
Choose Services
Based on the Stack chosen during Select Stack, you are presented with the choice of Services to install into the cluster. HDP Stack comprises many services. You may choose to install any other available services now, or to add services later. The install wizard selects all available services for installation by default.
-
Choose none to clear all selections, or choose all to select all listed services.
-
Choose or clear individual checkboxes to define a set of services to install now.
-
After selecting the services to install now, choose Next.
Assign Masters
The Ambari install wizard assigns the master components for selected services to appropriate hosts in your cluster and displays the assignments in Assign Masters. The left column shows services and current hosts. The right column shows current master component assignments by host, indicating the number of CPU cores and amount of RAM installed on each host.
-
To change the host assignment for a service, select a host name from the drop-down menu for that service.
-
To remove a ZooKeeper instance, click the green minus icon next to the host address you want to remove.
-
When you are satisfied with the assignments, choose Next.
Assign Slaves and Clients
The Ambari installation wizard assigns the slave components (DataNodes, NodeManagers, and RegionServers) to appropriate hosts in your cluster. It also attempts to select hosts for installing the appropriate set of clients.
-
Use all or none to select all of the hosts in the column or none of the hosts, respectively.
If a host has a red asterisk next to it, that host is also running one or more master components. Hover your mouse over the asterisk to see which master components are on that host.
-
Fine-tune your selections by using the checkboxes next to specific hosts.
-
When you are satisfied with your assignments, choose Next.
Customize Services
Customize Services presents you with a set of tabs that let you manage configuration settings for HDP components. The wizard sets reasonable defaults for each of the options here, but you can use this set of tabs to tweak those settings. You are strongly encouraged to do so, as your requirements may be slightly different. Pay particular attention to the directories suggested by the installer.
Hover your cursor over each of the properties to see a brief description of what it does. The number of tabs you see is based on the type of installation you have decided to do. A typical installation has at least ten groups of configuration properties and other related options, such as database settings for Hive/HCat and Oozie, admin name/password, and alert email for Nagios.
The install wizard sets reasonable defaults for all properties. You must provide database passwords for the Hive, Nagios, and Oozie services, the Master Secret for Knox, and a valid email address to which system alerts will be sent. Select each service that displays a number highlighted red. Then, fill in the required field on the Service Config tab. Repeat this until the red flags disappear.
For example, choose Hive. Expand the Hive Metastore section, if necessary. In Database Password, provide a password, then retype to confirm it, in the fields marked red and "This is required."
For more information about customizing specific services for a particular HDP Stack, see Customizing HDP Services.
After you complete Customizing Services, choose Next.
Review
The assignments you have made are displayed. Check to make sure everything is correct. If you need to make changes, use the left navigation bar to return to the appropriate screen.
To print your information for later reference, choose Print.
When you are satisfied with your choices, choose Deploy.
Install, Start and Test
The progress of the install displays on the screen. Ambari installs, starts, and runs a simple test on each component. Overall status of the process displays in the progress bar at the top of the screen and host-by-host status displays in the main section. Do not refresh your browser during this process. Refreshing the browser may interrupt the progress indicators.
To see specific information on what tasks have been completed per host, click the link in the Message column for the appropriate host. In the Tasks pop-up, click the individual task to see the related log files. You can select filter conditions by using the Show drop-down list. To see a larger version of the log contents, click the Open icon. To copy the contents to the clipboard, use the Copy icon.
When Successfully installed and started the services appears, choose Next.
Complete
The Summary page provides a summary list of the accomplished tasks. Choose Complete. The Ambari Web GUI displays.
Upgrading Ambari
Ambari and the HDP Stack being managed by Ambari can be upgraded independently. This guide provides information on:
Ambari 2.0 Upgrade Guide
Upgrading to Ambari 2.0
Use this procedure to upgrade Ambari 1.4.1 through 1.7.0 to Ambari 2.0.0. If your current Ambari version is 1.4.1 or below, you must upgrade the Ambari Server version to 1.7.0 before upgrading to version 2.0.0. Upgrading the Ambari version does not change the underlying HDP Stack being managed by Ambari.
Before Upgrading Ambari to 2.0.0, make sure that you perform the following actions:
-
You must have root, administrative, or root-equivalent authorization on the Ambari server host and all servers in the cluster.
-
You must know the location of the Nagios server before you begin the upgrade process.
-
You must know the location of the Ganglia server before you begin the upgrade process.
-
You must back up the Ambari Server database.
-
You must make a safe copy of the Ambari Server configuration file found at /etc/ambari-server/conf/ambari.properties.
-
Plan to remove Nagios and Ganglia from your cluster and replace with Ambari Alerts and Metrics. For more information, see Planning for Ambari Alerts and Metrics in Ambari 2.0.
-
If you have a Kerberos-enabled cluster, you must review Upgrading Ambari with Kerberos-Enabled Cluster and be prepared to perform post-upgrade steps required.
-
If you are using Ambari with Oracle, you must create an Ambari user in the Oracle database and grant that user all required permissions. Specifically, you must alter the Ambari database user and grant the SEQUENCE permission. For more information about creating users and granting required user permissions, see Using Ambari with Oracle.
-
If you plan to upgrade your HDP Stack, back up the configuration properties for your current Hadoop services. For more information about upgrading the Stack and locating the configuration files for your current services, see one of the following topics:
-
Stop the Nagios and Ganglia services. In Ambari Web:
-
Browse to Services and select the Nagios service.
-
Use Service Actions to stop the Nagios service.
-
Wait for the Nagios service to stop.
-
Browse to Services and select the Ganglia service.
-
Use Service Actions to stop the Ganglia service.
-
Wait for the Ganglia service to stop.
-
Stop the Ambari Server. On the Ambari Server host,
ambari-server stop
-
Stop all Ambari Agents. On each Ambari Agent host,
ambari-agent stop
-
Fetch the new Ambari repo and replace the old repository file with the new repository file on all hosts in your cluster.
Select the repository appropriate for your environment from the following list:
-
For RHEL/CentOS 6/Oracle Linux 6:
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.0.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
-
For SLES 11:
wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.0.0/ambari.repo -O /etc/zypp/repos.d/ambari.repo
-
For Ubuntu 12:
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu12/2.x/updates/2.0.0/ambari.list -O /etc/apt/sources.list.d/ambari.list
-
For RHEL/CentOS 5/Oracle Linux 5: (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/ambari/centos5/2.x/updates/2.0.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
-
-
Upgrade Ambari Server. On the Ambari Server host:
-
For RHEL/CentOS/Oracle Linux:
yum clean all
yum upgrade ambari-server ambari-log4j
-
For SLES:
zypper clean
zypper up ambari-server ambari-log4j
-
For Ubuntu:
apt-get clean all
apt-get install ambari-server ambari-log4j
-
-
Check for upgrade success by noting progress during the Ambari server installation process you started in Step 5.
-
As the process runs, the console displays output similar, although not identical, to the following:
Setting up Upgrade Process
Resolving Dependencies
--> Running transaction check
---> Package ambari-log4j.noarch 0:1.7.0.169-1 will be updated ...
---> Package ambari-log4j.noarch 0:2.0.0.1129-1 will be an update ...
---> Package ambari-server.noarch 0:1.7.0-169 will be updated ...
---> Package ambari-log4j.noarch 0:2.0.0.1129 will be an update ...
-
If the upgrade fails, the console displays output similar to the following:
Setting up Upgrade Process
No Packages marked for Update
-
A successful upgrade displays the following output:
Updated: ambari-log4j.noarch 0:2.0.0.111-1 ambari-server.noarch 0:2.0.0-111
Complete!
-
-
On the Ambari Server host: if ambari-agent is also installed on this host, first run "yum upgrade ambari-agent" (or the equivalent command on other operating systems). Then upgrade the server database schema by running:
ambari-server upgrade
-
Upgrade the Ambari Agent on each host. On each Ambari Agent host:
-
For RHEL/CentOS/Oracle Linux:
yum upgrade ambari-agent ambari-log4j
-
For SLES:
zypper up ambari-agent ambari-log4j
-
For Ubuntu:
apt-get update
apt-get install ambari-agent ambari-log4j
-
-
After the upgrade process completes, check each host to make sure the new 2.0.0 files have been installed:
rpm -qa | grep ambari
-
Start the Ambari Server. On the Ambari Server host:
ambari-server start
-
Start the Ambari Agents on all hosts. On each Ambari Agent host:
ambari-agent start
-
Open Ambari Web.
Point your browser to http://<your.ambari.server>:8080
where <your.ambari.server> is the name of your Ambari Server host. For example, c6401.ambari.apache.org.
-
Log in, using the Ambari administrator credentials that you have set up.
For example, the default name/password is admin/admin.
-
If you have customized logging properties, you will see a Restart indicator next to each service name after upgrading to Ambari 2.0.0.
To preserve any custom logging properties after upgrading, for each service:
-
Replace default logging properties with your custom logging properties, using Service Configs > Custom log4j.properties.
-
Restart all components in any services for which you have customized logging properties.
-
-
Review the HDP-UTILS repository Base URL setting in Ambari.
If you are upgrading from Ambari 1.6.1 or earlier, the HDP-UTILS repository Base URL is no longer set in the ambari.repo file.
- If using HDP 2.2 Stack:
-
Browse to Ambari Web > Admin > Stack and Versions.
-
Click on the Versions tab.
-
You will see the current installed HDP Stack version displayed.
-
Click the Edit repositories icon in the upper-right of the version display and confirm the value of the HDP-UTILS repository Base URL is correct for your environment.
-
If you are using a local repository for HDP-UTILS, be sure to confirm the Base URL is correct for your locally hosted HDP-UTILS repository.
- If using HDP 2.0 or 2.1 Stack:
-
Browse to Ambari Web > Admin > Stack and Versions.
-
Under the Services table, the current Base URL settings are displayed.
-
Confirm the value of the HDP-UTILS repository Base URL is correct for your environment or click the Edit button to modify the HDP-UTILS Base URL.
-
If you are using a local repository for HDP-UTILS, be sure to confirm the Base URL is correct for your locally hosted HDP-UTILS repository.
-
If using HDP 2.2 Stack, you must get the cluster hosts to advertise the "current version". This can be done by restarting a master or slave component (such as a DataNode) on each host to have the host advertise its version so Ambari can record the version. For example, in Ambari Web, navigate to the Hosts page and select any Host that has the DataNode component, then restart that DataNode component on that single host.
-
If you have configured Ambari to authenticate against an external LDAP or Active Directory, review your Ambari LDAP authentication settings. You must re-run "ambari-server setup-ldap". For more information, see Set Up LDAP or Active Directory Authentication.
-
If you have configured your cluster for Hive or Oozie with an external database (Oracle, MySQL or PostgreSQL), you must re-run "ambari-server setup" with the --jdbc-db and --jdbc-driver options to get the JDBC driver JAR file in place. For more information, see Using Non-Default Databases - Hive and Using Non-Default Databases - Oozie.
-
Adjust your cluster for Ambari Alerts and Metrics. For more information, see Planning for Ambari Alerts and Metrics in Ambari 2.0.
-
Adjust your cluster for Kerberos (if already enabled). For more information, see Upgrading Ambari with Kerberos-Enabled Cluster.
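The package check in the procedure above (rpm -qa | grep ambari) can be automated when many hosts are involved. The following Python sketch takes the lines printed by that command and reports any Ambari package that does not carry the expected version. It assumes the default rpm -qa naming of name-version-release.arch; the function name packages_not_at_version and the sample package lines are hypothetical.

```python
def packages_not_at_version(rpm_lines, expected_version):
    """Return Ambari package entries that do not carry expected_version.

    Assumes entries look like 'name-version-release.arch', the default
    format printed by 'rpm -qa'.
    """
    outdated = []
    for line in rpm_lines:
        entry = line.strip()
        if not entry.startswith("ambari-"):
            continue  # ignore blank lines and unrelated packages
        if ("-" + expected_version) not in entry:
            outdated.append(entry)
    return outdated


# Made-up sample output; real build numbers will differ.
sample_output = [
    "ambari-server-2.0.0-1129.noarch",
    "ambari-log4j-2.0.0.1129-1.noarch",
    "ambari-agent-1.7.0-169.x86_64",
]
print(packages_not_at_version(sample_output, "2.0.0"))
```

An empty result suggests that every Ambari package on the host reports the expected version; any entry returned points to a package that still needs upgrading.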
Planning for Ambari Alerts and Metrics in Ambari 2.0
As part of Ambari 2.0, Ambari includes built-in systems for alerting and metrics collection. Therefore, when upgrading to Ambari 2.0, the legacy Nagios and Ganglia services must be removed and replaced with the new systems.
Moving from Nagios to Ambari Alerts
After upgrading to Ambari 2.0, the Nagios service will be removed from the cluster. The Nagios server and packages will remain on the existing installed host but Nagios itself is removed from Ambari management.
The Ambari Alerts system is configured automatically to replace Nagios but you must:
-
Configure email notifications in Ambari to handle dispatch of alerts. Browse to Ambari Web > Alerts.
-
In the Actions menu, select Manage Notifications.
-
Click to Create a new Notification. Enter information about the SMTP host, port, and to and from email addresses, and select the alerts for which to receive notifications.
-
Click Save.
For more information about Ambari Alerts, see Managing Alerts in the Ambari User’s Guide.
Moving from Ganglia to Ambari Metrics
After upgrading to Ambari 2.0, the Ganglia service stays intact in the cluster. You must perform the following steps to remove Ganglia from the cluster and to move to the new Ambari Metrics system.
-
Stop Ganglia service via Ambari Web.
-
Using the Ambari REST API, remove the Ganglia service by executing the following:
curl -u <admin_user_name>:<admin_password> -H 'X-Requested-By:ambari' -X DELETE 'http://<ambari_server_host>:8080/api/v1/clusters/<cluster_name>/services/GANGLIA'
-
Refresh Ambari Web and make sure that Ganglia service is no longer visible.
-
In the Actions menu on the left beneath the list of Services, use the "Add Service" wizard to add Ambari Metrics to the cluster.
-
This will install an Ambari Metrics Collector into the cluster, and an Ambari Metrics Monitor on each host.
-
Pay careful attention to the following service configurations:
Section | Property | Description | Default Value |
---|---|---|---|
Advanced ams-hbase-site | hbase.rootdir | Ambari Metrics service uses HBase as its default storage backend. Set the rootdir for HBase to either a local filesystem path, if using Ambari Metrics in embedded mode, or to an HDFS directory. For example: hdfs://namenode.example.org:8020/amshbase. | file:///var/lib/ambari-metrics-collector/hbas |
-
For the cluster services to start sending metrics to Ambari Metrics, restart all services. For example, restart HDFS, YARN, HBase, Flume, Storm and Kafka.
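The curl command used above to remove the Ganglia service can also be issued from a script. The following Python 3 sketch builds the equivalent DELETE request with the standard library, using the same URL path and X-Requested-By header as the curl example. The function name build_delete_service_request and the host, cluster, and credential values are placeholders chosen for illustration; the request is only constructed here, not sent.

```python
import base64
import urllib.request


def build_delete_service_request(server_host, cluster_name, service_name,
                                 user, password, port=8080):
    """Build (but do not send) the DELETE request for one Ambari service."""
    url = "http://%s:%d/api/v1/clusters/%s/services/%s" % (
        server_host, port, cluster_name, service_name)
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    headers = {
        # HTTP Basic authentication, the same scheme curl -u uses by default
        "Authorization": "Basic " + token,
        # Header taken from the curl command in the procedure
        "X-Requested-By": "ambari",
    }
    return urllib.request.Request(url, headers=headers, method="DELETE")


# Placeholder values; substitute your own host, cluster name, and credentials.
request = build_delete_service_request(
    "ambari.example.org", "mycluster", "GANGLIA", "admin", "admin")
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers before anything is deleted from the cluster.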
Upgrading Ambari with Kerberos-Enabled Cluster
If you are upgrading to Ambari 2.0 from an Ambari-managed cluster that is already Kerberos enabled, you need to perform the following steps after the Ambari upgrade because of the new Ambari 2.0 Kerberos features.
-
Review the procedure for Configuring Ambari and Hadoop for Kerberos in the Ambari Security Guide.
-
Have your Kerberos environment information readily available, including your KDC Admin account credentials.
-
Take note of current Kerberos security settings for your cluster.
-
Browse to
Services > HDFS > Configs
. -
Record the core-site auth-to-local property value.
-
-
Upgrade Ambari according to the steps in Upgrading to Ambari 2.0.
-
Ensure your cluster and the Services are healthy.
-
Browse to Admin > Kerberos. Ambari indicates that Kerberos is not enabled. Run the Enable Kerberos Wizard, following the instructions in the Ambari Security Guide.
-
Ensure your cluster and the Services are healthy.
-
Verify the Kerberos security settings for your cluster are correct.
-
Browse to
Services > HDFS > Configs
. -
Check the core-site auth-to-local property value.
-
Adjust as necessary, based on the pre-upgrade value recorded in Step 3.
-
Upgrading the HDP Stack from 2.1 to 2.2
The HDP Stack is the coordinated set of Hadoop components that you have installed
on hosts in your cluster. Your set of Hadoop components and hosts is unique to your
cluster. Before upgrading the Stack on your cluster, review all Hadoop services and
hosts in your cluster. For example, use the Hosts and Services views in Ambari Web,
which summarize and list the components installed on each Ambari host, to determine
the components installed on each host. For more information about using Ambari to
view components in your cluster, see Working with Hosts, and Viewing Components on a Host.
Upgrading the HDP Stack is a three-step procedure:
In preparation for future HDP 2.2 releases to support rolling upgrades, the HDP RPM
package version naming convention has changed to include the HDP 2.2 product version
in file and directory names. HDP 2.2 marks the first release where HDP rpms, debs,
and directories contain versions in the names to permit side-by-side installations
of later HDP releases. To transition between previous releases and HDP 2.2, Hortonworks
provides hdp-select, a script that symlinks your directories to hdp/current
and lets you maintain using the same binary and configuration paths that you were
using before.
The following instructions have you remove your older version HDP components, install
hdp-select, and install HDP 2.2 components to prepare for rolling upgrade.
Prepare the 2.1 Stack for Upgrade
To prepare for upgrading the HDP Stack, perform the following tasks:
-
Disable Security.
-
Checkpoint user metadata and capture the HDFS operational state.
This step supports rollback and restore of the original state of HDFS data, if necessary. -
Backup Hive and Oozie metastore databases.
This step supports rollback and restore of the original state of Hive and Oozie data, if necessary.
-
Stop all HDP and Ambari services.
-
Make sure to finish all current jobs running on the system before upgrading the stack.
-
Using Ambari Web, browse to Services. Go through each service, except for HDFS and ZooKeeper, and in the Service Actions menu, select Stop All.
-
Stop any client programs that access HDFS.
Perform steps 3 through 8 on the NameNode host. In a highly-available NameNode configuration, execute the following procedure on the primary NameNode.
-
If HDFS is in a non-finalized state from a prior upgrade operation, you must finalize HDFS before upgrading further. Finalizing HDFS will remove all links to the metadata of the prior HDFS version. Do this only if you do not want to rollback to that prior HDFS version.
On the NameNode host, as the HDFS user,
su -l <HDFS_USER>
hdfs dfsadmin -finalizeUpgrade
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
-
Check the NameNode directory to ensure that there is no snapshot of any prior HDFS upgrade.
Specifically, using Ambari Web > HDFS > Configs > NameNode, examine the <dfs.namenode.name.dir> or the <dfs.name.dir> directory in the NameNode Directories property. Make sure that only a "/current" directory and no "/previous" directory exists on the NameNode host.
-
Create the following logs and other files.
Creating these logs allows you to check the integrity of the file system, post-upgrade.
As the HDFS user,
su -l <HDFS_USER>
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
-
Run fsck with the following flags and send the results to a log. The resulting file contains a complete block map of the file system. You use this log later to confirm the upgrade.
hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log
-
Optional: Capture the complete namespace of the file system.
The following command does a recursive listing of the root file system:
hadoop dfs -ls -R / > dfs-old-lsr-1.log
-
Create a list of all the DataNodes in the cluster.
hdfs dfsadmin -report > dfs-old-report-1.log
-
Optional: Copy all unrecoverable data stored in HDFS to a local file system or to a backup instance of HDFS.
-
-
Save the namespace.
You must be the HDFS service user to do this and you must put the cluster in Safe Mode.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
-
Copy the checkpoint files located in <dfs.name.dir>/current into a backup directory. Find the directory, using Ambari Web > HDFS > Configs > NameNode > NameNode Directories on your primary NameNode host.
-
Store the layoutVersion for the NameNode.
Make a copy of the file at <dfs.name.dir>/current/VERSION, where <dfs.name.dir> is the value of the config parameter NameNode directories. This file will be used later to verify that the layout version is upgraded.
-
Stop HDFS.
-
Stop ZooKeeper.
-
Using Ambari Web > Services > <service.name> > Summary, review each service and make sure that all services in the cluster are completely stopped.
-
At the Hive Metastore database host, stop the Hive metastore service, if you have not done so already.
-
If you are upgrading Hive and Oozie, back up the Hive and Oozie metastore databases on the Hive and Oozie database host machines, respectively.
-
Optional - Back up the Hive Metastore database.
Hive Metastore Database Backup and Restore
Database Type | Backup | Restore |
---|---|---|
MySQL | mysqldump <dbname> > <outputfilename.sql> For example: mysqldump hive > /tmp/mydir/backup_hive.sql | mysql <dbname> < <inputfilename.sql> For example: mysql hive < /tmp/mydir/backup_hive.sql |
Postgres | sudo -u <username> pg_dump <databasename> > <outputfilename.sql> For example: sudo -u postgres pg_dump hive > /tmp/mydir/backup_hive.sql | sudo -u <username> psql <databasename> < <inputfilename.sql> For example: sudo -u postgres psql hive < /tmp/mydir/backup_hive.sql |
Oracle | Connect to the Oracle database using sqlplus. Export the database: exp username/password@database full=yes file=output_file.dmp | Import the database: imp username/password@database file=input_file.dmp |
-
Optional - Back up the Oozie Metastore database.
Oozie Metastore Database Backup and Restore
Database Type | Backup | Restore |
---|---|---|
MySQL | mysqldump <dbname> > <outputfilename.sql> For example: mysqldump oozie > /tmp/mydir/backup_oozie.sql | mysql <dbname> < <inputfilename.sql> For example: mysql oozie < /tmp/mydir/backup_oozie.sql |
Postgres | sudo -u <username> pg_dump <databasename> > <outputfilename.sql> For example: sudo -u postgres pg_dump oozie > /tmp/mydir/backup_oozie.sql | sudo -u <username> psql <databasename> < <inputfilename.sql> For example: sudo -u postgres psql oozie < /tmp/mydir/backup_oozie.sql |
-
-
Back up Hue. If you are using the embedded SQLite database, you must perform a backup of the database before you upgrade Hue to prevent data loss. To make a backup copy of the database, stop Hue, then "dump" the database content to a file, as follows:
/etc/init.d/hue stop
su $HUE_USER
mkdir ~/hue_backup
cd /var/lib/hue
sqlite3 desktop.db .dump > ~/hue_backup/desktop.bak
For other databases, follow your vendor-specific instructions to create a backup.
-
Stage the upgrade script.
-
Create an "Upgrade Folder", for example /work/upgrade_hdp_2, on a host that can communicate with Ambari Server. The Ambari Server host is a suitable candidate.
-
Copy the upgrade script to the Upgrade Folder. The script is available at /var/lib/ambari-server/resources/scripts/upgradeHelper.py on the Ambari Server host.
-
Copy the upgrade catalog to the Upgrade Folder. The catalog is available at /var/lib/ambari-server/resources/upgrade/catalog/UpgradeCatalog_2.1_to_2.2.x.json.
-
Make sure that Python is available on the host and that the version is 2.6 or higher:
python --version
For RHEL/CentOS/Oracle Linux 5, you must use Python 2.6.
-
-
Backup current configuration settings.
-
Go to the Upgrade Folder you just created in step 15.
-
Execute the backup-configs action:
python upgradeHelper.py --hostname <HOSTNAME> --user <USERNAME> --password <PASSWORD> --clustername <CLUSTERNAME> backup-configs
Where
<HOSTNAME> is the name of the Ambari Server host
<USERNAME> is the admin user for Ambari Server
<PASSWORD> is the password for the admin user
<CLUSTERNAME> is the name of the cluster
This step produces a set of files named TYPE_TAG, where TYPE is the configuration type and TAG is the tag. These files contain copies of the various configuration settings for the current (pre-upgrade) cluster. You can use these files as a reference later.
-
-
On the Ambari Server host, stop Ambari Server and confirm that it is stopped.
ambari-server stop
ambari-server status
-
Stop all Ambari Agents. On every host in your cluster known to Ambari,
ambari-agent stop
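The copy of the NameNode VERSION file saved in the preparation steps is used later to confirm that the layout version changed. The following Python sketch shows one way to read the layoutVersion value from the text of such a file so that the saved copy and the post-upgrade file can be compared. It assumes the file holds plain key=value lines with optional # comment lines; the function name read_layout_version and the sample content are hypothetical.

```python
def read_layout_version(version_text):
    """Return the layoutVersion recorded in the text of a VERSION file.

    Assumes plain 'key=value' lines; lines starting with '#' are skipped.
    """
    for line in version_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "layoutVersion":
            return int(value.strip())
    raise ValueError("no layoutVersion entry found")


# Illustrative content only; the values below are made up.
sample = """#Mon Jan 01 00:00:00 UTC 2015
namespaceID=123456789
cTime=0
storageType=NAME_NODE
layoutVersion=-47
"""
print(read_layout_version(sample))
```

Reading both the backup copy and the current file with the same function and comparing the two integers gives a quick check of whether the layout version differs after the upgrade.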
Upgrade the 2.1 Stack to 2.2
-
Upgrade the HDP repository on all hosts and replace the old repository file with the new file:
-
For RHEL/CentOS/Oracle Linux 6:
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.2.x.x/hdp.repo -O /etc/yum.repos.d/HDP.repo
-
For SLES 11 SP3:
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11sp3/2.x/GA/2.2.x.x/hdp.repo -O /etc/zypp/repos.d/HDP.repo
-
For SLES 11 SP1:
wget -nv http://public-repo-1.hortonworks.com/HDP/sles11sp1/2.x/GA/2.2.x.x/hdp.repo -O /etc/zypp/repos.d/HDP.repo
-
For Ubuntu12:
wget -nv http://public-repo-1.hortonworks.com/HDP/ubuntu12/2.x/GA/2.2.x.x/hdp.list -O /etc/apt/sources.list.d/HDP.list
-
For RHEL/CentOS/Oracle Linux 5: (DEPRECATED)
wget -nv http://public-repo-1.hortonworks.com/HDP/centos5/2.x/GA/2.2.x.x/hdp.repo -O /etc/yum.repos.d/HDP.repo
-
-
Update the Stack version in the Ambari Server database.
On the Ambari Server host, use the following command to update the Stack version to HDP-2.2:
ambari-server upgradestack HDP-2.2
-
Back up the files in the following directory on the Oozie server host and make sure that all files, including *site.xml files, are copied.
mkdir oozie-conf-bak
cp -R /etc/oozie/conf/* oozie-conf-bak
-
Remove the old oozie directories on all Oozie server and client hosts
-
rm -rf /etc/oozie/conf
-
rm -rf /usr/lib/oozie/
-
rm -rf /var/lib/oozie/
-
-
Upgrade the Stack on all Ambari Agent hosts.
-
For RHEL/CentOS/Oracle Linux:
-
On all hosts, clean the yum repository.
yum clean all
-
Remove all HDP 2.1 components that you want to upgrade.
This command un-installs the HDP 2.1 component bits. It leaves the user data and metadata, but removes your configurations.
yum erase "hadoop*" "webhcat*" "hcatalog*" "oozie*" "pig*" "hdfs*" "sqoop*" "zookeeper*" "hbase*" "hive*" "tez*" "storm*" "falcon*" "flume*" "phoenix*" "accumulo*" "mahout*" "hue*" "hdp_mon_nagios_addons"
-
Install all HDP 2.2 components that you want to upgrade.
yum install "hadoop_2_2_x_0_*" "oozie_2_2_x_0_*" "pig_2_2_x_0_*" "sqoop_2_2_x_0_*" "zookeeper_2_2_x_0_*" "hbase_2_2_x_0_*" "hive_2_2_x_0_*" "tez_2_2_x_0_*" "storm_2_2_x_0_*" "falcon_2_2_x_0_*" "flume_2_2_x_0_*" "phoenix_2_2_x_0_*" "accumulo_2_2_x_0_*" "mahout_2_2_x_0_*"
rpm -e --nodeps hue-shell
yum install hue hue-common hue-beeswax hue-hcatalog hue-pig hue-oozie
-
Verify that the components were upgraded.
yum list installed | grep HDP-<old.stack.version.number>
No component file names should appear in the returned list.
-
-
For SLES:
-
On all hosts, clean the zypper repository.
zypper clean --all
-
Remove all HDP 2.1 components that you want to upgrade.
This command un-installs the HDP 2.1 component bits. It leaves the user data and metadata, but removes your configurations.
zypper remove "hadoop*" "webhcat*" "hcatalog*" "oozie*" "pig*" "hdfs*" "sqoop*" "zookeeper*" "hbase*" "hive*" "tez*" "storm*" "falcon*" "flume*" "phoenix*" "accumulo*" "mahout*" "hue*" "hdp_mon_nagios_addons"
-
Install all HDP 2.2 components that you want to upgrade.
zypper install "hadoop_2_2_x_0_*" "oozie_2_2_x_0_*" "pig_2_2_x_0_*" "sqoop_2_2_x_0_*" "zookeeper_2_2_x_0_*" "hbase_2_2_x_0_*" "hive_2_2_x_0_*" "tez_2_2_x_0_*" "storm_2_2_x_0_*" "falcon_2_2_x_0_*" "flume_2_2_x_0_*" "phoenix_2_2_x_0_*" "accumulo_2_2_x_0_*" "mahout_2_2_x_0_*"
rpm -e --nodeps hue-shell
zypper install hue hue-common hue-beeswax hue-hcatalog hue-pig hue-oozie
-
Verify that the components were upgraded.
rpm -qa | grep hdfs && rpm -qa | grep hive && rpm -qa | grep hcatalog
No component file names should appear in the returned list.
-
If any components were not upgraded, upgrade them as follows:
yast --update hdfs hcatalog hive
-
-
-
Symlink directories, using hdp-select.
Check that the hdp-select package is installed:
rpm -qa | grep hdp-select
You should see hdp-select-2.2.x.x-xxxx.el6.noarch for the HDP 2.2.x release.
If not, then run:
yum install hdp-select
Run hdp-select as root, on every node.
hdp-select set all 2.2.x.x-<$version>
where <$version> is the build number. For the HDP 2.2.4.2 release, <$version> = 2.
-
Verify that all components are on the new version. The output of this statement should be empty,
hdp-select status | grep -v 2\.2\.x\.x | grep -v None
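The same check can be wrapped so that it takes the real build string instead of the 2.2.x.x placeholder. This is a minimal sketch; the function name components_not_on is hypothetical, and it only filters text that you pipe into it.

```shell
# Sketch: print the lines of "hdp-select status" output that are neither
# on the target build nor reported as None.
# $1 is the full build string, for example 2.2.4.2-2.
components_not_on() {
  # -F matches the build string literally, so the dots need no escaping.
  grep -v -F -- "$1" | grep -v None
}

# Example (as root):
#   if hdp-select status | components_not_on 2.2.4.2-2; then
#     echo "the components listed above are not on the new version" >&2
#   fi
```

The function returns a non-zero status when nothing is left over, which is the result you want here, so the message in the example is printed only when some component still needs attention.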
-
If you are using Hue, you must upgrade Hue manually. For more information, see Configure and Start Hue.
-
On the Hive Metastore database host, stop the Hive Metastore service, if you have not done so already. Make sure that the Hive Metastore database is running.
-
Upgrade the Hive metastore database schema from v13 to v14, using the following instructions:
-
Set java home:
export JAVA_HOME=/path/to/java
-
Copy (rewrite) old Hive configurations to new conf dir:
cp -R /etc/hive/conf.server/* /etc/hive/conf/
-
Copy the JDBC connector to /usr/hdp/2.2.x.x-<$version>/hive/lib, if it is not already in that location.
-
<HIVE_HOME>/bin/schematool -upgradeSchema -dbType <databaseType>
where <HIVE_HOME> is the Hive installation directory.
For example, on the Hive Metastore host:
/usr/hdp/2.2.x.x-<$version>/hive/bin/schematool -upgradeSchema -dbType <databaseType>
where <$version> is the 2.2.x build number and <databaseType> is derby, mysql, oracle, or postgres.
-
Complete the Upgrade of the 2.1 Stack to 2.2
-
Start Ambari Server.
On the Ambari Server host,
ambari-server start
-
Start all Ambari Agents.
At each Ambari Agent host,
ambari-agent start
-
Update the repository Base URLs in Ambari Server for the HDP-2.2 stack.
Browse to Ambari Web > Admin > Repositories, then set the values for the HDP and HDP-UTILS repository Base URLs. For more information about viewing and editing repository Base URLs, see Viewing Cluster Stack Version and Repository URLs.
-
Update the respective configurations.
-
Go to the Upgrade Folder you created when Preparing the 2.1 Stack for Upgrade.
-
Execute the update-configs action:
python upgradeHelper.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=$FROMSTACK --toStack=$TOSTACK --upgradeCatalog=$UPGRADECATALOG update-configs [configuration item]
Where
<HOSTNAME> is the name of the Ambari Server host
<USERNAME> is the admin user for Ambari Server
<PASSWORD> is the password for the admin user
<CLUSTERNAME> is the name of the cluster
<FROMSTACK> is the version number of the pre-upgrade stack, for example 2.1
<TOSTACK> is the version number of the upgraded stack, for example 2.2.x
<UPGRADECATALOG> is the path to the upgrade catalog file, for example UpgradeCatalog_2.1_to_2.2.x.json
For example, to update all configuration items:
python upgradeHelper.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=2.1 --toStack=2.2.x --upgradeCatalog=UpgradeCatalog_2.1_to_2.2.x.json update-configs
To update configuration item hive-site:
python upgradeHelper.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=2.1 --toStack=2.2.x --upgradeCatalog=UpgradeCatalog_2.1_to_2.2.x.json update-configs hive-site
-
-
Using the Ambari Web UI, add the Tez service if it has not been installed already. For more information about adding a service, see Adding a Service.
-
Using the Ambari Web UI, add any new services that you want to run on the HDP 2.2.x stack. You must add a Service before editing configuration properties necessary to complete the upgrade.
-
Using the Ambari Web UI > Services, start the ZooKeeper service.
-
Copy (rewrite) the old HDFS configurations to the new conf directory, on all DataNode and NameNode hosts:
cp /etc/hadoop/conf.empty/hdfs-site.xml.rpmsave /etc/hadoop/conf/hdfs-site.xml; cp /etc/hadoop/conf.empty/hadoop-env.sh.rpmsave /etc/hadoop/conf/hadoop-env.sh; cp /etc/hadoop/conf.empty/log4j.properties.rpmsave /etc/hadoop/conf/log4j.properties; cp /etc/hadoop/conf.empty/core-site.xml.rpmsave /etc/hadoop/conf/core-site.xml
-
If you are upgrading from an HA NameNode configuration, start all JournalNodes.
At each JournalNode host, run the following command:
su -l <HDFS_USER> -c "/usr/hdp/2.2.x.x-<$version>/hadoop/sbin/hadoop-daemon.sh start journalnode"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
-
Because the file system version has now changed, you must start the NameNode manually. On the active NameNode host, as the HDFS user,
su -l <HDFS_USER> -c "export HADOOP_LIBEXEC_DIR=/usr/hdp/2.2.x.x-<$version>/hadoop/libexec && /usr/hdp/2.2.x.x-<$version>/hadoop/sbin/hadoop-daemon.sh start namenode -upgrade"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
To check whether the upgrade is progressing, check that the "previous" directory has been created in the NameNode and JournalNode directories. The "previous" directory contains a snapshot of the data before the upgrade.
-
Start all DataNodes.
At each DataNode, as the HDFS user,
su -l <HDFS_USER> -c "/usr/hdp/2.2.x.x-<$version>/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
-
Restart HDFS.
-
Open the Ambari Web GUI. If the browser in which Ambari is running has been open throughout the process, clear the browser cache, then refresh the browser.
-
Choose Ambari Web > Services > HDFS > Service Actions > Restart All.
-
Choose Service Actions > Run Service Check. Make sure the service check passes.
-
-
After the DataNodes are started, HDFS exits SafeMode. To monitor the status, run the following command, on each DataNode:
sudo su -l <HDFS_USER> -c "hdfs dfsadmin -safemode get"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
When HDFS exits SafeMode, the following message displays:
Safe mode is OFF
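Rather than re-running the command by hand, you can poll for the message. This is a minimal sketch that assumes it is run as the HDFS service user. The function name wait_for_safemode_off and the SAFEMODE_POLL_SECONDS variable are hypothetical.

```shell
# Sketch: poll "hdfs dfsadmin -safemode get" until it reports
# "Safe mode is OFF" or the number of attempts runs out.
# $1 = maximum number of attempts (default 30).
# SAFEMODE_POLL_SECONDS = pause between attempts (default 10).
wait_for_safemode_off() {
  tries=${1:-30}
  while [ "$tries" -gt 0 ]; do
    if hdfs dfsadmin -safemode get | grep -q "Safe mode is OFF"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "${SAFEMODE_POLL_SECONDS:-10}"
  done
  echo "HDFS is still in SafeMode" >&2
  return 1
}

# Example: wait_for_safemode_off 60 && echo "HDFS has left SafeMode"
```

If a blocking call with no attempt limit is enough, hdfs dfsadmin -safemode wait serves the same purpose.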
-
Make sure that the HDFS upgrade was successful. Optionally, repeat step 5 in Prepare the 2.1 Stack for Upgrade to create new versions of the logs and reports, substituting "-new" for "-old" in the file names as necessary.
-
Compare the old and new versions of the following log files:
-
dfs-old-fsck-1.log versus dfs-new-fsck-1.log.
The files should be identical unless the hadoop fsck reporting format has changed in the new version.
-
dfs-old-lsr-1.log versus dfs-new-lsr-1.log.
The files should be identical unless the format of hadoop fs -lsr reporting or the data structures have changed in the new version.
-
dfs-old-report-1.log versus dfs-new-report-1.log.
Make sure that all DataNodes that were in the cluster before upgrading are up and running.
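The comparison can be scripted as a first pass. This is a minimal sketch that assumes the old and new logs sit in the same directory and follow the dfs-old-*-1.log and dfs-new-*-1.log naming used above. The function name compare_upgrade_logs is hypothetical.

```shell
# Sketch: report which of the three old/new log pairs are byte-identical.
# $1 = directory that holds the logs (default: current directory).
compare_upgrade_logs() {
  dir=${1:-.}
  rc=0
  for kind in fsck lsr report; do
    old="$dir/dfs-old-$kind-1.log"
    new="$dir/dfs-new-$kind-1.log"
    # cmp -s is silent and returns 0 only when both files exist and match.
    if cmp -s "$old" "$new"; then
      echo "$kind: identical"
    else
      echo "$kind: differs or missing, review with diff"
      rc=1
    fi
  done
  return "$rc"
}

# Example: compare_upgrade_logs /work/upgrade_hdp_2
```

A pair that differs is not necessarily a problem, since run-specific lines such as timestamps may change between runs. Review those pairs with diff before deciding.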
-
-
-
If YARN is installed in your HDP 2.1 stack, and the Application Timeline Server (ATS) component is NOT, then you must create and install the ATS component using the API.
Run the following commands on the server that will host the YARN ATS in your cluster. Be sure to replace <your_ATS_component_hostname> with a host name appropriate for your environment.
-
Create the ATS Service Component.
curl --user admin:admin -H "X-Requested-By: ambari" -i -X POST http://localhost:8080/api/v1/clusters/<your_cluster_name>/services/YARN/components/APP_TIMELINE_SERVER
-
Create the ATS Host Component.
curl --user admin:admin -H "X-Requested-By: ambari" -i -X POST http://localhost:8080/api/v1/clusters/<your_cluster_name>/hosts/<your_ATS_component_hostname>/host_components/APP_TIMELINE_SERVER
-
Install the ATS Host Component.
curl --user admin:admin -H "X-Requested-By: ambari" -i -X PUT -d '{"HostRoles": { "state": "INSTALLED"}}' http://localhost:8080/api/v1/clusters/<your_cluster_name>/hosts/<your_ATS_component_hostname>/host_components/APP_TIMELINE_SERVER
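The three calls differ only in method, path, and body, so they can be parameterized. This is a minimal sketch built from the commands above. The variable names AMBARI_URL and AMBARI_CRED and the function names ambari_api and add_ats_component are hypothetical, and the defaults simply repeat the example values.

```shell
# Sketch: the three ATS calls, with cluster and host passed as arguments.
# AMBARI_URL and AMBARI_CRED are illustrative names; override them to
# match your Ambari Server address and admin credentials.
AMBARI_URL=${AMBARI_URL:-http://localhost:8080}
AMBARI_CRED=${AMBARI_CRED:-admin:admin}

# $1 = HTTP method, $2 = path below /api/v1, remaining args go to curl.
ambari_api() {
  method=$1
  path=$2
  shift 2
  curl --user "$AMBARI_CRED" -H "X-Requested-By: ambari" -i -X "$method" "$@" "$AMBARI_URL/api/v1/$path"
}

# $1 = cluster name, $2 = host that will run the ATS component.
add_ats_component() {
  cluster=$1
  ats_host=$2
  ambari_api POST "clusters/$cluster/services/YARN/components/APP_TIMELINE_SERVER" &&
  ambari_api POST "clusters/$cluster/hosts/$ats_host/host_components/APP_TIMELINE_SERVER" &&
  ambari_api PUT "clusters/$cluster/hosts/$ats_host/host_components/APP_TIMELINE_SERVER" \
    -d '{"HostRoles": { "state": "INSTALLED"}}'
}

# Example: add_ats_component <your_cluster_name> <your_ATS_component_hostname>
```

Because curl is called without a fail option, it reports success even when the server answers with an HTTP error, so check the status line that -i prints for each call.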
-
-
Prepare MR2 and YARN for work. Execute the HDFS commands on any host.
-
Create mapreduce dir in hdfs.
su -l <HDFS_USER> -c "hdfs dfs -mkdir -p /hdp/apps/2.2.x.x-<$version>/mapreduce/"
-
Copy new mapreduce.tar.gz to HDFS mapreduce dir.
su -l <HDFS_USER> -c "hdfs dfs -copyFromLocal /usr/hdp/2.2.x.x-<$version>/hadoop/mapreduce.tar.gz /hdp/apps/2.2.x.x-<$version>/mapreduce/."
-
Grant permissions for created mapreduce dir in hdfs.
su -l <HDFS_USER> -c "hdfs dfs -chown -R <HDFS_USER>:<HADOOP_GROUP> /hdp"; su -l <HDFS_USER> -c "hdfs dfs -chmod -R 555 /hdp/apps/2.2.x.x-<$version>/mapreduce"; su -l <HDFS_USER> -c "hdfs dfs -chmod -R 444 /hdp/apps/2.2.x.x-<$version>/mapreduce/mapreduce.tar.gz"
-
Update YARN Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Service > Yarn > Configs > Custom > yarn-site:
-
Add the following properties:
Name: hadoop.registry.zk.quorum
Value: <!--List of hostname:port pairs defining the zookeeper quorum binding for the registry-->
Name: yarn.resourcemanager.zk-address
Value: localhost:2181
-
-
Update Hive Configuration Properties for HDP 2.2.x
-
Using Ambari Web UI > Services > Hive > Configs > Advanced webhcat-site:
Find the templeton.hive.properties property and remove whitespaces after "," from the value.
-
Using Ambari Web UI > Services > Hive > Configs > hive-site.xml:
-
Add the following properties:
Name: hive.cluster.delegation.token.store.zookeeper.connectString
Value: <!-- The ZooKeeper token store connect string. -->
Name: hive.zookeeper.quorum
Value: <!-- List of zookeeper server to talk to -->
-
-
-
-
Using Ambari Web > Services > Service Actions, start YARN.
-
Using Ambari Web > Services > Service Actions, start MapReduce2.
-
Using Ambari Web > Services > Service Actions, start HBase and ensure the service check passes.
-
Using Ambari Web > Services > Service Actions, start the Hive service.
-
Upgrade Oozie.
-
Perform the following preparation steps on each Oozie server host:
-
Copy configurations from oozie-conf-bak to the /etc/oozie/conf directory on each Oozie server and client.
-
Create the /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22 directory.
mkdir /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22
-
Copy the JDBC jar of your Oozie database to both /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22 and /usr/hdp/2.2.x.x-<$version>/oozie/libtools. For example, if you are using MySQL, copy your mysql-connector-java.jar.
-
Copy these files to the /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22 directory:
cp /usr/lib/hadoop/lib/hadoop-lzo*.jar /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22; cp /usr/share/HDP-oozie/ext-2.2.zip /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22; cp /usr/share/HDP-oozie/ext-2.2.zip /usr/hdp/2.2.x.x-<$version>/oozie/libext
-
Grant read/write access to the Oozie user.
chmod -R 777 /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22
-
-
Upgrade steps:
-
On the Services view, make sure that YARN and MapReduce2 services are running.
-
Make sure that the Oozie service is stopped.
-
In /etc/oozie/conf/oozie-env.sh, comment out the CATALINA_BASE property. Do the same using the Ambari Web UI in Services > Oozie > Configs > Advanced oozie-env.
-
Upgrade Oozie. At the Oozie database host, as the Oozie service user:
sudo su -l <OOZIE_USER> -c "/usr/hdp/2.2.x.x-<$version>/oozie/bin/ooziedb.sh upgrade -run"
where <OOZIE_USER> is the Oozie service user. For example, oozie.
Make sure that the output contains the string "Oozie DB has been upgraded to Oozie version <OOZIE_Build_Version>".
-
Prepare the Oozie WAR file.
On the Oozie server, as the Oozie user
sudo su -l <OOZIE_USER> -c "/usr/hdp/2.2.x.x-<$version>/oozie/bin/oozie-setup.sh prepare-war -d /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22"
where <OOZIE_USER> is the Oozie service user. For example, oozie.
Make sure that the output contains the string "New Oozie WAR file added".
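Both Oozie steps above ask you to look for a specific string in the command output. This is a minimal sketch of a helper that runs a command, echoes its output, and fails when the expected text is missing. The function name run_and_expect is hypothetical.

```shell
# Sketch: run a command and verify that its output contains a given text.
# $1 = text that must appear in the output; remaining args = the command.
run_and_expect() {
  expected=$1
  shift
  # Capture stdout and stderr together, then show them to the operator.
  output=$("$@" 2>&1)
  printf '%s\n' "$output"
  case $output in
    *"$expected"*) return 0 ;;
    *) echo "expected text not found: $expected" >&2
       return 1 ;;
  esac
}

# Example, with the placeholders filled in for your environment:
#   run_and_expect "New Oozie WAR file added" \
#     sudo su -l <OOZIE_USER> -c "<oozie-setup.sh prepare-war command from above>"
```

The helper checks only for the text. It ignores the exit status of the command, so review the echoed output when the check fails.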
-
Using Ambari Web, choose Services > Oozie > Configs, expand oozie-log4j, then add the following property:
log4j.appender.oozie.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - SERVER[${oozie.instance.id}] %m%n
where ${oozie.instance.id} is determined by Oozie, automatically.
-
Replace the content of /user/oozie/share in HDFS.
On the Oozie server host:
-
Extract the Oozie sharelib into a tmp folder.
mkdir -p /tmp/oozie_tmp; cp /usr/hdp/2.2.x.x-<$version>/oozie/oozie-sharelib.tar.gz /tmp/oozie_tmp; cd /tmp/oozie_tmp; tar xzvf oozie-sharelib.tar.gz;
-
Back up the /user/oozie/share folder in HDFS and then delete it.
If you have any custom files in this folder, back them up separately and then add them to the /share folder after updating it.
mkdir /tmp/oozie_tmp/oozie_share_backup; chmod 777 /tmp/oozie_tmp/oozie_share_backup;
su -l <HDFS_USER> -c "hdfs dfs -copyToLocal /user/oozie/share /tmp/oozie_tmp/oozie_share_backup"; su -l <HDFS_USER> -c "hdfs dfs -rm -r /user/oozie/share";
where <HDFS_USER> is the HDFS service user. For example, hdfs.
-
Add the latest share libs that you extracted in step 1. After you have added the files, modify ownership and ACL.
su -l <HDFS_USER> -c "hdfs dfs -copyFromLocal /tmp/oozie_tmp/share /user/oozie/."; su -l <HDFS_USER> -c "hdfs dfs -chown -R <OOZIE_USER>:<HADOOP_GROUP> /user/oozie"; su -l <HDFS_USER> -c "hdfs dfs -chmod -R 755 /user/oozie";
where <HDFS_USER> is the HDFS service user. For example, hdfs.
-
-
-
-
Use the Ambari Web UI > Services view to start the Oozie service.
Make sure that ServiceCheck passes for Oozie.
-
Update WebHCat.
-
Modify the webhcat-site config type.
Using Ambari Web > Services > WebHCat, modify the following configuration:
Action: Modify
Property Name: templeton.storage.class
Property Value: org.apache.hive.hcatalog.templeton.tool.ZooKeeperStorage
-
Expand Advanced > webhcat-site.xml.
Check if property templeton.port exists. If not, then add it using the Custom webhcat-site panel. The default value for templeton.port = 50111.
-
On each WebHCat host, update the Pig and Hive tar bundles, by updating the following files:
-
/apps/webhcat/pig.tar.gz
-
/apps/webhcat/hive.tar.gz
- For example, to update a *.tar.gz file:
-
Move the file to a local directory.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -copyToLocal /apps/webhcat/*.tar.gz <local_backup_dir>"
-
Remove the old file.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -rm /apps/webhcat/*.tar.gz"
-
Copy the new file.
su -l <HCAT_USER> -c "hdfs --config /etc/hadoop/conf dfs -copyFromLocal /usr/hdp/2.2.x.x-<$version>/hive/hive.tar.gz /apps/webhcat/"; su -l <HCAT_USER> -c "hdfs --config /etc/hadoop/conf dfs -copyFromLocal /usr/hdp/2.2.x.x-<$version>/pig/pig.tar.gz /apps/webhcat/";
where <HCAT_USER> is the HCatalog service user. For example, hcat.
-
-
On each WebHCat host, update the /apps/webhcat/hadoop-streaming.jar file.
-
Move the file to a local directory.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -copyToLocal /apps/webhcat/hadoop-streaming*.jar <local_backup_dir>"
-
Remove the old file.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -rm /apps/webhcat/hadoop-streaming*.jar"
-
Copy the new hadoop-streaming.jar file.
su -l <HCAT_USER> -c "hdfs --config /etc/hadoop/conf dfs -copyFromLocal /usr/hdp/2.2.x.x-<$version>/hadoop-mapreduce/hadoop-streaming*.jar /apps/webhcat"
where <HCAT_USER> is the HCatalog service user. For example, hcat.
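The tar bundles and the streaming jar are all updated with the same three moves: back up, remove, copy. This is a minimal sketch of that pattern. It assumes it is run as a user with write access to /apps/webhcat, for example through su -l <HCAT_USER> as above. The function name replace_hdfs_file is hypothetical.

```shell
# Sketch: back up, remove, and re-upload one file (or glob) in HDFS.
# $1 = new file on the local disk
# $2 = HDFS directory, for example /apps/webhcat
# $3 = file name or glob inside that directory
# $4 = local backup directory
replace_hdfs_file() {
  new_local=$1
  hdfs_dir=$2
  name_glob=$3
  backup_dir=$4
  # Each step runs only if the previous one succeeded, so the old file
  # is never removed without a local backup.
  hdfs --config /etc/hadoop/conf dfs -copyToLocal "$hdfs_dir/$name_glob" "$backup_dir" &&
  hdfs --config /etc/hadoop/conf dfs -rm "$hdfs_dir/$name_glob" &&
  hdfs --config /etc/hadoop/conf dfs -copyFromLocal "$new_local" "$hdfs_dir/"
}

# Example, with your own build number and backup directory:
#   replace_hdfs_file /usr/hdp/2.2.x.x-<$version>/pig/pig.tar.gz /apps/webhcat pig.tar.gz <local_backup_dir>
```

The glob is quoted so that the local shell passes it through unchanged and HDFS resolves it.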
-
-
-
If Tez was not installed during the upgrade, you must prepare Tez for work, using the following steps:
If you use Tez as the Hive execution engine, and if the variable hive.server2.enabled.doAs is set to true, you must create a scratch directory on the NameNode host for the username that will run the HiveServer2 service. If you installed Tez before upgrading the Stack, use the following commands:
sudo su -c "hdfs dfs -mkdir /tmp/hive-<username>"
sudo su -c "hdfs dfs -chmod 777 /tmp/hive-<username>"
where <username> is the name of the user that runs the HiveServer2 service.
-
Put the Tez libraries in HDFS. Execute at any host:
su -l <HDFS_USER> -c "hdfs dfs -mkdir -p /hdp/apps/2.2.x.x-<$version>/tez/"
su -l <HDFS_USER> -c "hdfs dfs -copyFromLocal -f /usr/hdp/2.2.x.x-<$version>/tez/lib/tez.tar.gz /hdp/apps/2.2.x.x-<$version>/tez/."
su -l <HDFS_USER> -c "hdfs dfs -chown -R <HDFS_USER>:<HADOOP_GROUP> /hdp"
su -l <HDFS_USER> -c "hdfs dfs -chmod -R 555 /hdp/apps/2.2.x.x-<$version>/tez"
su -l <HDFS_USER> -c "hdfs dfs -chmod -R 444 /hdp/apps/2.2.x.x-<$version>/tez/tez.tar.gz"
-
-
Prepare the Storm service properties.
-
Edit nimbus.childopts.
Using Ambari Web UI > Services > Storm > Configs > Nimbus > find nimbus.childopts. Update the path for the jmxetric-1.0.4.jar to: /usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar. If nimbus.childopts property value contains "-Djava.security.auth.login.config=/path/to/storm_jaas.conf", remove this text.
-
Edit supervisor.childopts.
Using Ambari Web UI > Services > Storm > Configs > Supervisor > find supervisor.childopts. Update the path for the jmxetric-1.0.4.jar to: /usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar. If supervisor.childopts property value contains "-Djava.security.auth.login.config=/etc/storm/conf/storm_jaas.conf", remove this text.
-
Edit worker.childopts.
Using Ambari Web UI > Services > Storm > Configs > Advanced > storm-site find worker.childopts. Update the path for the jmxetric-1.0.4.jar to: /usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar.
Check if the _storm.thrift.nonsecure.transport property exists. If not, add it, _storm.thrift.nonsecure.transport = backtype.storm.security.auth.SimpleTransportPlugin, using the Custom storm-site panel.
-
Remove the storm.local.dir from every host where the Storm component is installed.
You can find this property in the Storm > Configs > General tab.
rm -rf <storm.local.dir>
-
If you are planning to enable secure mode, navigate to Ambari Web UI > Services > Storm > Configs > Advanced storm-site and add the following property:
_storm.thrift.secure.transport=backtype.storm.security.auth.kerberos.KerberosSaslTransportPlugin
-
Stop the Storm Rest_API Component.
curl -u admin:admin -X PUT -H 'X-Requested-By:1' -d '{"RequestInfo":{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://server:8080/api/v1/clusters/c1/hosts/host_name/host_components/STORM_REST_API
-
Delete the Storm Rest_API Component.
curl -u admin:admin -X DELETE -H 'X-Requested-By:1' http://server:8080/api/v1/clusters/c1/hosts/host_name/host_components/STORM_REST_API
-
-
Upgrade Pig.
Copy the Pig configuration files to /etc/pig/conf.
cp /etc/pig/conf.dist/pig-env.sh /etc/pig/conf/
-
Using Ambari Web UI > Services > Storm, start the Storm service.
-
Using Ambari Web > Services > Service Actions, re-start all stopped services.
-
The upgrade is now fully functional but not yet finalized. Using the finalize command removes the previous version of the NameNode and DataNode storage directories.
The upgrade must be finalized before another upgrade can be performed.
To finalize the upgrade, execute the following command once, on the primary NameNode host in your HDP cluster,
sudo su -l <HDFS_USER> -c "hdfs dfsadmin -finalizeUpgrade"
where <HDFS_USER> is the HDFS service user. For example, hdfs.
Upgrading the HDP Stack from 2.0 to 2.2
The HDP Stack is the coordinated set of Hadoop components that you have installed
on hosts in your cluster. Your set of Hadoop components and hosts is unique to your
cluster. Before upgrading the Stack on your cluster, review all Hadoop services and
hosts in your cluster to confirm the location of Hadoop components. For example, use
the Hosts and Services views in Ambari Web, which summarize and list the components
installed on each Ambari host, to determine the components installed on each host.
For more information about using Ambari to view components in your cluster, see
Working with Hosts, and Viewing Components on a Host.
Complete the following procedures to upgrade the Stack from version 2.0 to version 2.2.x on your current, Ambari-installed-and-managed cluster.
In preparation for future HDP 2.2 releases to support rolling upgrades, the HDP RPM
package version naming convention has changed to include the HDP 2.2 product version
in file and directory names. HDP 2.2 marks the first release where HDP rpms, debs,
and directories contain versions in the names to permit side-by-side installations
of later HDP releases. To transition between previous releases and HDP 2.2, Hortonworks
provides hdp-select, a script that symlinks your directories to hdp/current
and lets you continue using the same binary and configuration paths that you were
using before.
Prepare the 2.0 Stack for Upgrade
To prepare for upgrading the HDP Stack, this section describes how to perform the following tasks:
-
Disable Security.
-
Checkpoint user metadata and capture the HDFS operational state. This step supports rollback and restore of the original state of HDFS data, if necessary.
-
Backup Hive and Oozie metastore databases. This step supports rollback and restore of the original state of Hive and Oozie data, if necessary.
-
Stop all HDP and Ambari services.
-
Make sure to finish all current jobs running on the system before upgrading the stack.
-
Use Ambari Web > Services > Service Actions to stop all services except HDFS and ZooKeeper.
-
Stop any client programs that access HDFS.
Perform steps 3 through 8 on the NameNode host. In a highly-available NameNode configuration, execute the following procedure on the primary NameNode.
-
If HDFS is in a non-finalized state from a prior upgrade operation, you must finalize HDFS before upgrading further. Finalizing HDFS will remove all links to the metadata of the prior HDFS version - do this only if you do not want to rollback to that prior HDFS version.
On the NameNode host, as the HDFS user,
su -l <HDFS_USER>
hdfs dfsadmin -finalizeUpgrade
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
-
Check the NameNode directory to ensure that there is no snapshot of any prior HDFS upgrade. Specifically, using Ambari Web > HDFS > Configs > NameNode, examine the <$dfs.namenode.name.dir> or the <$dfs.name.dir> directory in the NameNode Directories property. Make sure that only a "current" directory and no "previous" directory exists on the NameNode host.
-
Create the following logs and other files.
Creating these logs allows you to check the integrity of the file system, post-upgrade.
As the HDFS user,
su -l <HDFS_USER>
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
-
Run fsck with the following flags and send the results to a log. The resulting file contains a complete block map of the file system. You use this log later to confirm the upgrade.
hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log
-
Optional: Capture the complete namespace of the filesystem. The following command does a recursive listing of the root file system:
hadoop dfs -ls -R / > dfs-old-lsr-1.log
-
Create a list of all the DataNodes in the cluster.
hdfs dfsadmin -report > dfs-old-report-1.log
-
Optional: Copy all unrecoverable data stored in HDFS to a local file system or to a backup instance of HDFS.
-
-
Save the namespace.
You must be the HDFS service user to do this and you must put the cluster in Safe Mode.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
-
Copy the checkpoint files located in <$dfs.name.dir/current> into a backup directory.
Find the directory, using Ambari Web > HDFS > Configs > NameNode > NameNode Directories on your primary NameNode host.
-
Store the layoutVersion for the NameNode. Make a copy of the file at <dfs.name.dir>/current/VERSION, where <dfs.name.dir> is the value of the config parameter NameNode directories. This file will be used later to verify that the layout version is upgraded.
-
Stop HDFS.
-
Stop ZooKeeper.
-
Using Ambari Web > Services > <service.name> > Summary, review each service and make sure that all services in the cluster are completely stopped.
-
On the Hive Metastore database host, stop the Hive metastore service, if you have not done so already.
-
If you are upgrading Hive and Oozie, back up the Hive and Oozie metastore databases on the Hive and Oozie database host machines, respectively.
-
Optional - Back up the Hive Metastore database.
Hive Metastore Database Backup and Restore
MySQL
Backup: mysqldump <dbname> > <outputfilename.sql>
For example: mysqldump hive > /tmp/mydir/backup_hive.sql
Restore: mysql <dbname> < <inputfilename.sql>
For example: mysql hive < /tmp/mydir/backup_hive.sql
Postgres
Backup: sudo -u <username> pg_dump <databasename> > <outputfilename.sql>
For example: sudo -u postgres pg_dump hive > /tmp/mydir/backup_hive.sql
Restore: sudo -u <username> psql <databasename> < <inputfilename.sql>
For example: sudo -u postgres psql hive < /tmp/mydir/backup_hive.sql
Oracle
Backup: Connect to the Oracle database using sqlplus, then export the database: exp username/password@database full=yes file=output_file.dmp
Restore: Import the database: imp username/password@database file=input_file.dmp
-
Optional - Back up the Oozie Metastore database.
Oozie Metastore Database Backup and Restore
MySQL
Backup: mysqldump <dbname> > <outputfilename.sql>
For example: mysqldump oozie > /tmp/mydir/backup_oozie.sql
Restore: mysql <dbname> < <inputfilename.sql>
For example: mysql oozie < /tmp/mydir/backup_oozie.sql
Postgres
Backup: sudo -u <username> pg_dump <databasename> > <outputfilename.sql>
For example: sudo -u postgres pg_dump oozie > /tmp/mydir/backup_oozie.sql
Restore: sudo -u <username> psql <databasename> < <inputfilename.sql>
For example: sudo -u postgres psql oozie < /tmp/mydir/backup_oozie.sql
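For MySQL, the backup commands above can be wrapped so that each dump gets a timestamped file name and an empty dump is treated as a failure. This is a minimal sketch that assumes mysqldump can connect without extra options, as in the examples above. The function name backup_mysql_db is hypothetical.

```shell
# Sketch: dump one MySQL database to a timestamped file.
# $1 = database name (for example hive or oozie)
# $2 = existing directory that receives the dump
backup_mysql_db() {
  db=$1
  dir=$2
  out="$dir/backup_${db}_$(date +%Y%m%d%H%M%S).sql"
  # Fail if mysqldump fails or if the resulting file is empty.
  mysqldump "$db" > "$out" && [ -s "$out" ] && echo "$out"
}

# Example: backup_mysql_db hive /tmp/mydir && backup_mysql_db oozie /tmp/mydir
```

On success the function prints the path of the dump, which you can record with the other upgrade artifacts.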
-
-
Stage the upgrade script.
-
Create an "Upgrade Folder". For example, /work/upgrade_hdp_2, on a host that can communicate with Ambari Server. The Ambari Server host is a suitable candidate.
-
Copy the upgrade script to the Upgrade Folder. The script is available on the Ambari Server host in /var/lib/ambari-server/resources/scripts/upgradeHelper.py.
-
Copy the upgrade catalog to the Upgrade Folder. The catalog is available in /var/lib/ambari-server/resources/upgrade/catalog/UpgradeCatalog_2.0_to_2.2.x.json.
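The staging steps amount to one mkdir and two copies. This is a minimal sketch for the case where the Upgrade Folder is on the Ambari Server host itself. The function name stage_upgrade_files is hypothetical, and the optional second argument exists only so that the resources directory can be overridden.

```shell
# Sketch: create the Upgrade Folder and copy the script and the
# 2.0-to-2.2.x catalog into it.
# $1 = Upgrade Folder, for example /work/upgrade_hdp_2
# $2 = Ambari Server resources directory (defaults to the path above)
stage_upgrade_files() {
  upgrade_dir=$1
  resources=${2:-/var/lib/ambari-server/resources}
  mkdir -p "$upgrade_dir" &&
  cp "$resources/scripts/upgradeHelper.py" "$upgrade_dir/" &&
  cp "$resources/upgrade/catalog/UpgradeCatalog_2.0_to_2.2.x.json" "$upgrade_dir/"
}

# Example: stage_upgrade_files /work/upgrade_hdp_2
```

If the Upgrade Folder is on a different host, copy the two files there with the transfer tool you normally use.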
-
-
Backup current configuration settings:
-
Go to the Upgrade Folder you just created in step 14.
-
Execute the backup-configs action:
python upgradeHelper.py --hostname <HOSTNAME> --user <USERNAME> --password <PASSWORD> --clustername <CLUSTERNAME> backup-configs
Where
<HOSTNAME> is the name of the Ambari Server host
<USERNAME> is the admin user for Ambari Server
<PASSWORD> is the password for the admin user
<CLUSTERNAME> is the name of the cluster
This step produces a set of files named TYPE_TAG, where TYPE is the configuration type and TAG is the tag. These files contain copies of the various configuration settings for the current (pre-upgrade) cluster. You can use these files as a reference later.
-
-
On the Ambari Server host, stop Ambari Server and confirm that it is stopped.
ambari-server stop
ambari-server status
-
Stop all Ambari Agents.
At every host in your cluster known to Ambari,
ambari-agent stop
Upgrade the 2.0 Stack to 2.2
-
Upgrade the HDP repository on all hosts and replace the old repository file with the new file:
-
For RHEL/CentOS/Oracle Linux 6:
wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.2.x.x/hdp.repo -O /etc/yum.repos.d/HDP.repo
-
For SLES 11 SP3:
wget -nv http://public-repo-1.hortonworks.com/HDP/suse11sp3/2.x/GA/2.2.x.x/hdp.repo -O /etc/zypp/repos.d/HDP.repo
-
For SLES 11 SP1:
wget -nv http://public-repo-1.hortonworks.com/HDP/sles11sp1/2.x/GA/2.2.x.x/hdp.repo -O /etc/zypp/repos.d/HDP.repo
-
For Ubuntu 12:
wget -nv http://public-repo-1.hortonworks.com/HDP/ubuntu12/2.x/GA/2.2.x.x/hdp.list -O /etc/apt/sources.list.d/HDP.list
-
For RHEL/CentOS/Oracle Linux 5:
wget -nv http://public-repo-1.hortonworks.com/HDP/centos5/2.x/GA/2.2.x.x/hdp.repo -O /etc/yum.repos.d/HDP.repo
-
-
Update the Stack version in the Ambari Server database. On the Ambari Server host, use the following command to update the Stack version to HDP-2.2:
ambari-server upgradestack HDP-2.2
-
Back up the files in the following directory on the Oozie server host, and make sure that all files, including *site.xml files, are copied.
mkdir oozie-conf-bak
cp -R /etc/oozie/conf/* oozie-conf-bak
-
Remove the old oozie directories on all Oozie server and client hosts.
-
rm -rf /etc/oozie/conf
-
rm -rf /usr/lib/oozie/
-
rm -rf /var/lib/oozie/
-
-
Upgrade the Stack on all Ambari Agent hosts.
-
For RHEL/CentOS/Oracle Linux:
-
On all hosts, clean the yum repository.
yum clean all
-
Remove all components that you want to upgrade. At a minimum, remove the WebHCat, HCatalog, and Oozie components. This command un-installs the HDP 2.0 component bits. It leaves the user data and metadata, but removes your configurations.
yum erase "hadoop*" "webhcat*" "hcatalog*" "oozie*" "pig*" "hdfs*" "sqoop*" "zookeeper*" "hbase*" "hive*" "phoenix*" "accumulo*" "mahout*" "hue*" "flume*" "hdp_mon_nagios_addons"
-
Install the following components:
yum install "hadoop_2_2_x_0_*" "oozie_2_2_x_0_*" "pig_2_2_x_0_*" "sqoop_2_2_x_0_*" "zookeeper_2_2_x_0_*" "hbase_2_2_x_0_*" "hive_2_2_x_0_*" "flume_2_2_x_0_*" "phoenix_2_2_x_0_*" "accumulo_2_2_x_0_*" "mahout_2_2_x_0_*"
rpm -e --nodeps hue-shell
yum install hue hue-common hue-beeswax hue-hcatalog hue-pig hue-oozie
-
Verify that the components were upgraded.
yum list installed | grep HDP-<old-stack-version-number>
Nothing should appear in the returned list.
-
-
For SLES:
-
On all hosts, clean the zypper repository.
zypper clean --all
-
Remove WebHCat, HCatalog, and Oozie components. This command uninstalls the HDP 2.0 component bits. It leaves the user data and metadata, but removes your configurations.
zypper remove "hadoop*" "webhcat*" "hcatalog*" "oozie*" "pig*" "hdfs*" "sqoop*" "zookeeper*" "hbase*" "hive*" "phoenix*" "accumulo*" "mahout*" "hue*" "flume*" "hdp_mon_nagios_addons"
-
Install the following components:
zypper install "hadoop_2_2_x_0_*" "oozie_2_2_x_0_*" "pig_2_2_x_0_*" "sqoop_2_2_x_0_*" "zookeeper_2_2_x_0_*" "hbase_2_2_x_0_*" "hive_2_2_x_0_*" "flume_2_2_x_0_*" "phoenix_2_2_x_0_*" "accumulo_2_2_x_0_*" "mahout_2_2_x_0_*"
rpm -e --nodeps hue-shell
zypper install hue hue-common hue-beeswax hue-hcatalog hue-pig hue-oozie
-
Verify that the components were upgraded.
rpm -qa | grep hadoop && rpm -qa | grep hive && rpm -qa | grep hcatalog
No 2.0 components should appear in the returned list.
-
If components were not upgraded, upgrade them as follows:
yast --update hadoop hcatalog hive
-
-
-
Symlink directories, using hdp-select.
Check that the hdp-select package is installed:
rpm -qa | grep hdp-select
You should see:
hdp-select-2.2.4.2-2.el6.noarch
If not, then run:
yum install hdp-select
Run hdp-select as root, on every node. In /usr/bin:
hdp-select set all 2.2.x.x-<$version>
where <$version> is the build number. For the HDP 2.2.4.2 release, <$version> = 2.
-
Verify that all components are on the new version. The output of this command should be empty:
hdp-select status | grep -v 2\.2\.x\.x | grep -v None
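The empty-output check above can also be scripted. The following is a minimal sketch in Python, assuming that `hdp-select status` prints one `<component> - <version>` pair per line (an assumption about the output format; verify it on your hosts). The function name `components_off_version` and the sample text, including its version strings, are illustrative only.

```python
def components_off_version(status_output, target_version):
    """Return (component, version) pairs that are not on target_version.

    Assumes each line looks like '<component> - <version>'. Components
    reported as 'None' (not installed) are ignored, as in the grep above.
    """
    stale = []
    for line in status_output.splitlines():
        name, sep, version = line.partition(" - ")
        if not sep:
            continue  # skip lines that do not match the assumed format
        version = version.strip()
        if version != "None" and version != target_version:
            stale.append((name.strip(), version))
    return stale


# Made-up sample output; real component names and versions will differ.
sample = (
    "hadoop-client - 2.2.4.2-2\n"
    "hive-metastore - 2.0.6.0-76\n"
    "pig-client - None\n"
)
print(components_off_version(sample, "2.2.4.2-2"))
```

An empty list corresponds to the empty output that this step expects.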
-
If you are using Hue, you must upgrade Hue manually. For more information, see Configure and Start Hue.
-
On the Hive Metastore database host, stop the Hive Metastore service, if you have not done so already. Make sure that the Hive Metastore database is running.
-
Upgrade the Hive metastore database schema from v12 to v14, using the following instructions:
-
Set java home:
export JAVA_HOME=/path/to/java
-
Copy (rewrite) old Hive configurations to new conf dir:
cp -R /etc/hive/conf.server/* /etc/hive/conf/
-
Copy the JDBC connector to /usr/hdp/2.2.x.x-<$version>/hive/lib, if it is not there yet.
-
<HIVE_HOME>/bin/schematool -upgradeSchema -dbType <databaseType>
where <HIVE_HOME> is the Hive installation directory.
For example, on the Hive Metastore host:
/usr/hdp/2.2.x.x-<$version>/hive/bin/schematool -upgradeSchema -dbType <databaseType>
where <$version> is the 2.2.x build number and <databaseType> is derby, mysql, oracle, or postgres.
-
Complete the Upgrade of the 2.0 Stack to 2.2
-
Start Ambari Server.
On the Server host,
ambari-server start
-
Start all Ambari Agents.
On each Ambari Agent host,
ambari-agent start
-
Update the repository Base URLs in the Ambari Server for the HDP 2.2.0 stack.
Browse to
Ambari Web
>Admin
>Repositories
, then set the value of the HDP and HDP-UTILS repository Base URLs. For more information about viewing and editing repository Base URLs, see Viewing Cluster Stack Version and Repository URLs. -
Update the respective configurations.
-
Go to the Upgrade Folder you created when Preparing the 2.0 Stack for Upgrade.
-
Execute the update-configs action:
python upgradeHelper.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=$FROMSTACK --toStack=$TOSTACK --upgradeCatalog=$UPGRADECATALOG update-configs [configuration item]
Where
<HOSTNAME> is the name of the Ambari Server host
<USERNAME> is the admin user for Ambari Server
<PASSWORD> is the password for the admin user
<CLUSTERNAME> is the name of the cluster
<FROMSTACK> is the version number of the pre-upgraded stack, for example 2.0
<TOSTACK> is the version number of the upgraded stack, for example 2.2.x
<UPGRADECATALOG> is the path to the upgrade catalog file, for example UpgradeCatalog_2.0_to_2.2.x.json
For example, to update all configuration items:
python upgradeHelper.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=2.0 --toStack=2.2.x --upgradeCatalog=UpgradeCatalog_2.0_to_2.2.x.json update-configs
To update configuration item hive-site:
python upgradeHelper.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=2.0 --toStack=2.2.x --upgradeCatalog=UpgradeCatalog_2.0_to_2.2.x.json update-configs hive-site
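If you run the update-configs action from a script, it can help to assemble the argument list in one place. The sketch below only builds and prints the command line from the options documented above; it does not run upgradeHelper.py. The function name `update_configs_command` and the example host, password, and cluster values are hypothetical.

```python
import shlex


def update_configs_command(hostname, user, password, cluster,
                           from_stack, to_stack, catalog, config_item=None):
    """Build the argument list for the update-configs action."""
    args = [
        "python", "upgradeHelper.py",
        "--hostname", hostname,
        "--user", user,
        "--password", password,
        "--clustername", cluster,
        "--fromStack=" + from_stack,
        "--toStack=" + to_stack,
        "--upgradeCatalog=" + catalog,
        "update-configs",
    ]
    if config_item:
        # Optional: limit the update to one configuration item.
        args.append(config_item)
    return args


# Hypothetical values, for illustration only.
cmd = update_configs_command(
    "ambari.example.com", "admin", "secret", "MyCluster",
    "2.0", "2.2.x", "UpgradeCatalog_2.0_to_2.2.x.json", "hive-site")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the list to `subprocess.call` from the Upgrade Folder would run the action; omit the last argument to update all configuration items.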
-
-
Using the
Ambari Web UI > Services
, start the ZooKeeper service. -
At all Datanode and Namenode hosts, copy (rewrite) old hdfs configurations to new conf directory:
cp /etc/hadoop/conf.empty/hdfs-site.xml.rpmsave /etc/hadoop/conf/hdfs-site.xml;
cp /etc/hadoop/conf.empty/hadoop-env.sh.rpmsave /etc/hadoop/conf/hadoop-env.sh;
cp /etc/hadoop/conf.empty/log4j.properties.rpmsave /etc/hadoop/conf/log4j.properties;
cp /etc/hadoop/conf.empty/core-site.xml.rpmsave /etc/hadoop/conf/core-site.xml
-
If you are upgrading from an HA NameNode configuration, start all JournalNodes.
On each JournalNode host, run the following command:
su -l <HDFS_USER> -c "/usr/hdp/2.2.x.x-<$version>/hadoop/sbin/hadoop-daemon.sh start journalnode"
where <HDFS_USER> is the HDFS Service user. For example, hdfs. -
Because the file system version has now changed, you must start the NameNode manually.
On the active NameNode host, as the HDFS user:
su -l <HDFS_USER> -c "export HADOOP_LIBEXEC_DIR=/usr/hdp/2.2.x.x-<$version>/hadoop/libexec && /usr/hdp/2.2.x.x-<$version>/hadoop/sbin/hadoop-daemon.sh start namenode -upgrade"
To check if the Upgrade is in progress, check that the "previous" directory has been created in the NameNode and JournalNode directories. The "previous" directory contains a snapshot of the data before upgrade.
-
Start all DataNodes.
On each DataNode, as the HDFS user,
su -l <HDFS_USER> -c "/usr/hdp/2.2.x.x-<$version>/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
-
Restart HDFS.
-
Open the Ambari Web GUI. If the browser in which Ambari is running has been open throughout the process, clear the browser cache, then refresh the browser.
-
Choose
Ambari Web
>Services
>HDFS
>Service Actions
>Restart All.
-
Choose Service Actions > Run Service Check. Make sure the service checks pass.
-
-
After the DataNodes are started, HDFS exits safe mode. Monitor the status, by running the following command, as the HDFS user:
sudo su -l <HDFS_USER> -c "hdfs dfsadmin -safemode get"
When HDFS exits safe mode, the following message displays:
Safe mode is OFF
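Rather than re-running the command by hand, you can poll for the message above. This is a minimal sketch, assuming it runs as the HDFS user on a host where the `hdfs` command is on the PATH. The function names, the polling interval, and the attempt limit are illustrative choices, not values from this guide.

```python
import subprocess
import time


def safemode_is_off(output):
    """True when the dfsadmin output contains the documented OFF message."""
    return "Safe mode is OFF" in output


def wait_for_safemode_off(run=None, interval=10, attempts=30):
    """Poll until HDFS reports safe mode OFF; return True on success."""
    if run is None:
        def run():
            # Same command as in the step above, without the su wrapper.
            return subprocess.check_output(
                ["hdfs", "dfsadmin", "-safemode", "get"]).decode()
    for _ in range(attempts):
        if safemode_is_off(run()):
            return True
        time.sleep(interval)
    return False
```

The `run` parameter exists so that the loop can be exercised without a cluster; in normal use, call `wait_for_safemode_off()` with no arguments.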
-
Make sure that the HDFS upgrade was successful.
-
Compare the old and new versions of the following log files:
-
dfs-old-fsck-1.log versus dfs-new-fsck-1.log.
The files should be identical unless the hadoop fsck reporting format has changed in the new version.
-
dfs-old-lsr-1.log versus dfs-new-lsr-1.log.
The files should be identical unless the format of hadoop fs -lsr reporting or the data structures have changed in the new version.
-
dfs-old-report-1.log versus dfs-new-report-1.log.
Make sure that all DataNodes in the cluster before upgrading are up and running.
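The log comparisons above can be done with any diff tool. As one possible approach, the sketch below uses Python's standard `difflib` module to list the differences between an old and a new log file; an empty result means the files are identical. The function names are illustrative, and the file names are the ones listed in this step.

```python
import difflib


def log_differences(old_text, new_text, old_name="old", new_name="new"):
    """Return unified-diff lines between two log texts (empty if identical)."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_name, tofile=new_name, lineterm=""))


def compare_log_files(old_path, new_path):
    """Read two log files and return their differences."""
    with open(old_path) as old, open(new_path) as new:
        return log_differences(old.read(), new.read(), old_path, new_path)
```

For example, `compare_log_files("dfs-old-fsck-1.log", "dfs-new-fsck-1.log")` returns the lines that changed. Differences caused only by a changed reporting format are expected, as noted above.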
-
-
-
Using Ambari Web, navigate to Services > Hive > Configs > Advanced
Hive (Advanced)
hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
hive.security.metastore.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator
-
Update Hive Configuration Properties for HDP 2.2.x
Using Ambari Web UI > Services > Hive > Configs > hive-site.xml:
-
hive-site
Name | Value |
---|---|
hive.cluster.delegation.token.store.zookeeper.connectString | <!-- The ZooKeeper token store connect string. --> |
hive.zookeeper.quorum | <!-- List of zookeeper servers to talk to --> |
-
webhcat-site
Name | Value |
---|---|
templeton.hive.properties | <!-- Properties to set when running hive --> |
-
-
If YARN is installed in your HDP 2.0 stack, and the Application Timeline Server (ATS) components are NOT, then you must create and install the ATS service and host components via the API by running the following commands on the server that will host the YARN application timeline server in your cluster. Be sure to replace <your_ATS_component_hostname> with a host name appropriate for your environment.
-
Create the ATS Service Component.
curl --user admin:admin -H "X-Requested-By: ambari" -i -X POST http://localhost:8080/api/v1/clusters/<your_cluster_name>/services/YARN/components/APP_TIMELINE_SERVER
-
Create the ATS Host Component.
curl --user admin:admin -H "X-Requested-By: ambari" -i -X POST http://localhost:8080/api/v1/clusters/<your_cluster_name>/hosts/<your_ATS_component_hostname>/host_components/APP_TIMELINE_SERVER
-
Install the ATS Host Component.
curl --user admin:admin -H "X-Requested-By: ambari" -i -X PUT -d '{ "HostRoles": { "state": "INSTALLED"}}' http://localhost:8080/api/v1/clusters/<your_cluster_name>/hosts/<your_ATS_component_hostname>/host_components/APP_TIMELINE_SERVER
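The three curl calls above can also be expressed with Python's standard library. The sketch below only constructs the requests (same URLs, methods, `X-Requested-By` header, and body as the curl commands) and does not send them. The helper names `ambari_request` and `ats_requests` are hypothetical, and the default `admin`/`admin` credentials are simply the ones used in the curl examples.

```python
import base64
import json
import urllib.request


def ambari_request(method, url, user, password, body=None):
    """Build an Ambari REST request with basic auth and the required header."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    req.add_header("X-Requested-By", "ambari")
    return req


def ats_requests(base, cluster, host, user="admin", password="admin"):
    """Return the create-service, create-host, and install requests in order."""
    comp = "APP_TIMELINE_SERVER"
    cluster_url = "{}/api/v1/clusters/{}".format(base, cluster)
    host_comp = "{}/hosts/{}/host_components/{}".format(cluster_url, host, comp)
    service_comp = "{}/services/YARN/components/{}".format(cluster_url, comp)
    return [
        ambari_request("POST", service_comp, user, password),
        ambari_request("POST", host_comp, user, password),
        ambari_request("PUT", host_comp, user, password,
                       {"HostRoles": {"state": "INSTALLED"}}),
    ]
```

To send them, pass each request to `urllib.request.urlopen` in order, for example `for req in ats_requests("http://localhost:8080", "<your_cluster_name>", "<your_ATS_component_hostname>"): urllib.request.urlopen(req)`.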
-
-
Make the following config changes required for Application Timeline Server. Use the Ambari web UI to navigate to the service dashboard and add/modify the following configurations:
YARN (Custom yarn-site.xml)
yarn.timeline-service.leveldb-timeline-store.path=/var/log/hadoop-yarn/timeline
yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=300000
yarn.timeline-service.store-class=org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore
yarn.timeline-service.ttl-enable=true
yarn.timeline-service.ttl-ms=2678400000
yarn.timeline-service.generic-application-history.store-class=org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore
yarn.timeline-service.webapp.address=<PUT_THE_FQDN_OF_ATS_HOST_NAME_HERE>:8188
yarn.timeline-service.webapp.https.address=<PUT_THE_FQDN_OF_ATS_HOST_NAME_HERE>:8190
yarn.timeline-service.address=<PUT_THE_FQDN_OF_ATS_HOST_NAME_HERE>:10200
HIVE (hive-site.xml)
hive.execution.engine=mr
hive.exec.failure.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
hive.tez.container.size=<map-container-size>
*If mapreduce.map.memory.mb > 2GB then set it equal to mapreduce.map.memory.mb. Otherwise, set it equal to mapreduce.reduce.memory.mb*
hive.tez.java.opts="-server -Xmx" + Math.round(0.8 * map-container-size) + "m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC"
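The hive.tez.container.size rule and the hive.tez.java.opts formula above amount to a small calculation. The following sketch works it through in Python, assuming that "2GB" means 2048 MB and that both memory settings are given in MB. The function names are illustrative.

```python
import math


def tez_container_size(map_memory_mb, reduce_memory_mb):
    """Pick hive.tez.container.size per the rule above (2GB taken as 2048 MB)."""
    return map_memory_mb if map_memory_mb > 2048 else reduce_memory_mb


def tez_java_opts(container_size_mb):
    """Build hive.tez.java.opts with a heap of 80% of the container size."""
    # floor(x + 0.5) mirrors Math.round for positive values.
    heap_mb = int(math.floor(0.8 * container_size_mb + 0.5))
    return ("-server -Xmx{}m -Djava.net.preferIPv4Stack=true "
            "-XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC").format(heap_mb)


size = tez_container_size(map_memory_mb=4096, reduce_memory_mb=8192)
print(size, tez_java_opts(size))
```

With a 4096 MB map container, 0.8 * 4096 = 3276.8, which rounds to a 3277 MB heap.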
-
Prepare MR2 and Yarn for work. Execute hdfs commands on any host.
-
Create mapreduce dir in hdfs.
su -l <HDFS_USER> -c "hdfs dfs -mkdir -p /hdp/apps/2.2.x.x-<$version>/mapreduce/"
-
Copy new mapreduce.tar.gz to hdfs mapreduce dir.
su -l <HDFS_USER> -c "hdfs dfs -copyFromLocal /usr/hdp/2.2.x.x-<$version>/hadoop/mapreduce.tar.gz /hdp/apps/2.2.x.x-<$version>/mapreduce/."
-
Grant permissions for created mapreduce dir in hdfs.
su -l <HDFS_USER> -c "hdfs dfs -chown -R <HDFS_USER>:<HADOOP_GROUP> /hdp"; su -l <HDFS_USER> -c "hdfs dfs -chmod -R 555 /hdp/apps/2.2.x.x-<$version>/mapreduce"; su -l <HDFS_USER> -c "hdfs dfs -chmod -R 444 /hdp/apps/2.2.x.x-<$version>/mapreduce/mapreduce.tar.gz"
-
Using Ambari Web UI > Service > Yarn > Configs > Advanced > yarn-site, add or modify the following properties:
Name | Value |
---|---|
hadoop.registry.zk.quorum | <!-- List of zookeeper servers to talk to --> |
yarn.resourcemanager.zk-address | <!-- Zookeeper server to talk to --> |
yarn.timeline-service.address | <!-- Timeline service fqdn address --> |
yarn.timeline-service.webapp.address | <!-- Timeline service webapp fqdn address --> |
yarn.timeline-service.webapp.https.address | <!-- Timeline service https webapp fqdn address --> |
-
-
Using
Ambari Web
>Services
>Service Actions
, start YARN. -
Using
Ambari Web
>Services
>Service Actions
, start MapReduce2. -
Using
Ambari Web
>Services
>Service Actions,
start HBase and ensure the service check passes. -
Using
Ambari Web
>Services
>Service Actions
,
start the Hive service. -
Upgrade Oozie.
-
Perform the following preparation steps on each Oozie server host:
-
Copy configurations from oozie-conf-bak to the /etc/oozie/conf directory on each Oozie server and client.
-
Create the /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22 directory.
mkdir /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22
-
Copy the JDBC jar of your Oozie database to both /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22 and /usr/hdp/2.2.x.x-<$version>/oozie/libtools. For example, if you are using MySQL, copy your mysql-connector-java.jar.
-
Copy these files to the /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22 directory:
cp /usr/lib/hadoop/lib/hadoop-lzo*.jar /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22; cp /usr/share/HDP-oozie/ext-2.2.zip /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22; cp /usr/share/HDP-oozie/ext-2.2.zip /usr/hdp/2.2.x.x-<$version>/oozie/libext
-
Grant read/write access to the Oozie user.
chmod -R 777 /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22
-
-
Upgrade steps:
-
On the Services view, make sure that YARN and MapReduce2 services are running.
-
Make sure that the Oozie service is stopped.
-
In oozie-env.sh, comment out the CATALINA_BASE property. Do the same using Ambari Web UI in Services > Oozie > Configs > Advanced oozie-env.
-
Upgrade Oozie.
At the Oozie server host, as the Oozie service user:
sudo su -l <OOZIE_USER> -c "/usr/hdp/2.2.x.x-<$version>/oozie/bin/ooziedb.sh upgrade -run"
where <OOZIE_USER> is the Oozie service user. For example, oozie.
Make sure that the output contains the string "Oozie DB has been upgraded to Oozie version <OOZIE_Build_Version>".
-
Prepare the Oozie WAR file.
At the Oozie server, as the Oozie user:
sudo su -l <OOZIE_USER> -c "/usr/hdp/2.2.x.x-<$version>/oozie/bin/oozie-setup.sh prepare-war -d /usr/hdp/2.2.x.x-<$version>/oozie/libext-upgrade22"
where <OOZIE_USER> is the Oozie service user. For example, oozie.
Make sure that the output contains the string "New Oozie WAR file added".
-
Using Ambari Web, choose Services > Oozie > Configs, expand oozie-log4j, then add the following property:
log4j.appender.oozie.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - SERVER[${oozie.instance.id}] %m%n
where ${oozie.instance.id} is determined by Oozie, automatically.
-
Using Ambari Web, choose Services > Oozie > Configs, expand Advanced oozie-site, then edit the following properties:
-
In oozie.service.coord.push.check.requeue.interval, replace the existing property value with the following one:
30000
-
In oozie.service.SchemaService.wf.ext.schemas, append (using copy/paste) to the existing property value the following string, if it is not already present:
shell-action-0.1.xsd,shell-action-0.2.xsd,shell-action-0.3.xsd,email-action-0.1.xsd,email-action-0.2.xsd,hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,sqoop-action-0.4.xsd,ssh-action-0.1.xsd,ssh-action-0.2.xsd,distcp-action-0.1.xsd,distcp-action-0.2.xsd,oozie-sla-0.1.xsd,oozie-sla-0.2.xsd
-
In oozie.service.URIHandlerService.uri.handlers, append to the existing property value the following string, if it is not already present:
org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler
-
In
oozie.services
, make sure all the following properties are present:org.apache.oozie.service.SchedulerService, org.apache.oozie.service.InstrumentationService, org.apache.oozie.service.MemoryLocksService, org.apache.oozie.service.UUIDService, org.apache.oozie.service.ELService, org.apache.oozie.service.AuthorizationService, org.apache.oozie.service.UserGroupInformationService, org.apache.oozie.service.HadoopAccessorService, org.apache.oozie.service.JobsConcurrencyService, org.apache.oozie.service.URIHandlerService, org.apache.oozie.service.DagXLogInfoService, org.apache.oozie.service.SchemaService, org.apache.oozie.service.LiteWorkflowAppService, org.apache.oozie.service.JPAService, org.apache.oozie.service.StoreService, org.apache.oozie.service.CoordinatorStoreService, org.apache.oozie.service.SLAStoreService, org.apache.oozie.service.DBLiteWorkflowStoreService, org.apache.oozie.service.CallbackService, org.apache.oozie.service.ActionService, org.apache.oozie.service.ShareLibService, org.apache.oozie.service.CallableQueueService, org.apache.oozie.service.ActionCheckerService, org.apache.oozie.service.RecoveryService, org.apache.oozie.service.PurgeService, org.apache.oozie.service.CoordinatorEngineService, org.apache.oozie.service.BundleEngineService, org.apache.oozie.service.DagEngineService, org.apache.oozie.service.CoordMaterializeTriggerService, org.apache.oozie.service.StatusTransitService, org.apache.oozie.service.PauseTransitService, org.apache.oozie.service.GroupsService, org.apache.oozie.service.ProxyUserService, org.apache.oozie.service.XLogStreamingService, org.apache.oozie.service.JvmPauseMonitorService
-
Add the oozie.services.coord.check.maximum.frequency property with the following property value:
false
If you set this property to true, Oozie rejects any coordinators with a frequency faster than 5 minutes. Disabling this check or submitting coordinators with frequencies faster than 5 minutes is not recommended: doing so can cause unintended behavior and additional system stress.
-
Add the oozie.service.AuthorizationService.security.enabled property with the following property value:
false
Specifies whether security (user name/admin role) is enabled or not. If disabled, any user can manage the Oozie system and manage any job.
-
Add the oozie.service.HadoopAccessorService.kerberos.enabled property with the following property value:
false
Indicates if Oozie is configured to use Kerberos.
-
Add the oozie.authentication.simple.anonymous.allowed property with the following property value:
true
Indicates if anonymous requests are allowed. This setting is meaningful only when using 'simple' authentication.
-
In oozie.services.ext, append to the existing property value the following string, if it is not already present:
org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService
-
After modifying all properties on the Oozie Configs page, choose Save to update oozie-site.xml, using the updated configurations.
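Several of the oozie-site edits above follow the same pattern: append entries to a comma-separated property value only if they are not already present. The sketch below shows one way to compute the new value before pasting it into Ambari Web. The function name `append_missing` is illustrative, and the sketch does not read or write any Ambari configuration.

```python
def append_missing(existing_value, required_items):
    """Append required items that are absent from a comma-separated value."""
    current = [item.strip() for item in existing_value.split(",") if item.strip()]
    seen = set(current)
    for item in required_items:
        if item not in seen:
            current.append(item)
            seen.add(item)
    return ",".join(current)


# Example with the URI handler values listed above.
existing = "org.apache.oozie.dependency.FSURIHandler"
required = [
    "org.apache.oozie.dependency.FSURIHandler",
    "org.apache.oozie.dependency.HCatURIHandler",
]
print(append_missing(existing, required))
```

Existing entries keep their order, and only the missing ones are added at the end, which matches the "append if not already present" instruction.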
-
-
Replace the content of /user/oozie/share in HDFS. On the Oozie server host:
-
Extract the Oozie sharelib into a tmp folder.
mkdir -p /tmp/oozie_tmp; cp /usr/hdp/2.2.x.x-<$version>/oozie/oozie-sharelib.tar.gz /tmp/oozie_tmp; cd /tmp/oozie_tmp; tar xzvf oozie-sharelib.tar.gz;
-
Back up the /user/oozie/share folder in HDFS and then delete it. If you have any custom files in this folder, back them up separately and then add them to the /share folder after updating it.
mkdir /tmp/oozie_tmp/oozie_share_backup; chmod 777 /tmp/oozie_tmp/oozie_share_backup;
su -l <HDFS_USER> -c "hdfs dfs -copyToLocal /user/oozie/share /tmp/oozie_tmp/oozie_share_backup"; su -l <HDFS_USER> -c "hdfs dfs -rm -r /user/oozie/share";
where <HDFS_USER> is the HDFS service user. For example, hdfs.
-
Add the latest share libs that you extracted in step 1. After you have added the files, modify ownership and acl.
su -l <HDFS_USER> -c "hdfs dfs -copyFromLocal /tmp/oozie_tmp/share /user/oozie/."; su -l <HDFS_USER> -c "hdfs dfs -chown -R <OOZIE_USER>:<HADOOP_GROUP> /user/oozie"; su -l <HDFS_USER> -c "hdfs dfs -chmod -R 755 /user/oozie";
where <HDFS_USER> is the HDFS service user. For example, hdfs.
-
-
Add the Falcon Service, using
Ambari Web > Services > Actions > +Add Service
. Without Falcon, Oozie will fail. -
Use the
Ambari Web UI
>Services
view to start the Oozie service. Make sure that ServiceCheck passes for Oozie.
-
-
-
Update WebHCat.
-
Expand Advanced > webhcat-site.xml. Check if templeton.hive.properties is set correctly.
-
On each WebHCat host, update the Pig and Hive tar bundles, by updating the following files:
-
/apps/webhcat/pig.tar.gz
-
/apps/webhcat/hive.tar.gz
- For example, to update a *.tar.gz file:
-
Move the file to a local directory.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -copyToLocal /apps/webhcat/*.tar.gz <local_backup_dir>"
-
Remove the old file.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -rm /apps/webhcat/*.tar.gz"
-
Copy the new file.
su -l <HCAT_USER> -c "hdfs --config /etc/hadoop/conf dfs -copyFromLocal /usr/hdp/2.2.x.x-<$version>/hive/hive.tar.gz /apps/webhcat/"; su -l <HCAT_USER> -c "hdfs --config /etc/hadoop/conf dfs -copyFromLocal /usr/hdp/2.2.x.x-<$version>/pig/pig.tar.gz /apps/webhcat/";
where <HCAT_USER> is the HCatalog service user. For example, hcat.
-
-
On each WebHCat host, update the /apps/webhcat/hadoop-streaming.jar file.
-
Move the file to a local directory.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -copyToLocal /apps/webhcat/hadoop-streaming*.jar <local_backup_dir>"
-
Remove the old file.
su -l <HCAT_USER> -c "hadoop --config /etc/hadoop/conf fs -rm /apps/webhcat/hadoop-streaming*.jar"
-
Copy the new hadoop-streaming.jar file.
su -l <HCAT_USER> -c "hdfs --config /etc/hadoop/conf dfs -copyFromLocal /usr/hdp/2.2.x.x-<$version>/hadoop-mapreduce/hadoop-streaming*.jar /apps/webhcat"
where <HCAT_USER> is the HCatalog service user. For example, hcat.
-
-
-
Prepare Tez for work. Add the Tez service to your cluster using the Ambari Web UI, if Tez was not installed earlier.
Configure Tez.
cd /var/lib/ambari-server/resources/scripts/; ./configs.sh set localhost <your-cluster-name> cluster-env "tez_tar_source" "/usr/hdp/current/tez-client/lib/tez.tar.gz"; ./configs.sh set localhost <your-cluster-name> cluster-env "tez_tar_destination_folder" "hdfs:///hdp/apps/{{ hdp_stack_version }}/tez/"
If you use Tez as the Hive execution engine, and if the variable hive.server2.enable.doAs is set to true, you must create a scratch directory on the NameNode host for the username that will run the HiveServer2 service. For example, use the following commands:
sudo su -c "hdfs dfs -mkdir /tmp/hive-<username>"
sudo su -c "hdfs dfs -chmod 777 /tmp/hive-<username>"
where <username> is the name of the user that runs the HiveServer2 service.
-
Using the
Ambari Web UI> Services > Hive
, start the Hive service. -
Using
Ambari Web > Services
, re-start the remaining services. -
The upgrade is now fully functional but not yet finalized. Using the finalize command removes the previous version of the NameNode and DataNode storage directories. The upgrade must be finalized before another upgrade can be performed.
To finalize the upgrade, execute the following command once, on the primary NameNode host in your HDP cluster:
sudo su -l <HDFS_USER> -c "hdfs dfsadmin -finalizeUpgrade"
Automated HDP Stack Upgrade: HDP 2.2.0 to 2.2.4
Ambari 2.0 has the capability to perform an automated cluster upgrade for maintenance
and patch releases for the Stack. This capability is available for HDP 2.2 Stack only.
If you have a cluster running HDP 2.2, you can perform Stack upgrades to later maintenance
and patch releases. For example: you can upgrade from the GA release of HDP 2.2 (which
is HDP 2.2.0.0) to the first maintenance release of HDP 2.2 (which is HDP 2.2.4.2).
This section describes the steps to perform an upgrade from HDP 2.2.0 to HDP 2.2.4.
Prerequisites
To perform an automated cluster upgrade from Ambari, your cluster must meet the following prerequisites:
Item | Requirement | Description |
---|---|---|
Cluster | Stack Version | Must be running HDP 2.2 Stack. This capability is not available for HDP 2.0 or 2.1 Stacks. |
Version | Target Version | All hosts must have the target version installed. See the Register Version and Install Version sections for more information. |
HDFS | NameNode HA | NameNode HA must be enabled and working properly. For more information, see Configuring NameNode High Availability in the Ambari User’s Guide. |
HDFS | Decommission | No components should be in decommissioning or decommissioned state. |
YARN | YARN WPR | Work Preserving Restart must be configured. |
Hosts | Heartbeats | All Ambari Agents must be heartbeating to Ambari Server. Any hosts that are not heartbeating must be in Maintenance Mode. |
Hosts | Maintenance Mode | Any hosts in Maintenance Mode must not be hosting any Service master components. |
Services | Services Up | All Services must be started. |
Services | Maintenance Mode | No Services can be in Maintenance Mode. |
If you do not meet the upgrade prerequisite requirements listed above, you can consider a Manual Upgrade of the cluster.
Preparing to Upgrade
Registering a New Version
- Register the HDP 2.2.4.2 Version
-
Log in to Ambari.
-
Browse to
Admin > Stack and Versions
. -
Click on the Versions tab. Click Manage Versions.
-
Proceed to register a new version by clicking
+ Register Version
. -
Enter a two-digit version number. For example, enter 4.2 (which makes the version name HDP-2.2.4.2).
-
Select one or more OS families and enter the respective Base URLs.
-
Click
Save
. -
You can click “Install On...MyCluster”, or you can browse back to Admin > Stack and Versions. You will see the version currently running (HDP 2.2.0.0) and the version you just registered (HDP 2.2.4.2). Proceed to Install a New Version on All Hosts.
Installing a New Version on All Hosts
- Install HDP 2.2.4.2 on All Hosts
-
Log in to Ambari.
-
Browse to
Admin > Stack and Versions
. -
Click on the
Versions
tab. -
Click
Install Packages
and click OK to confirm. -
The Install version operation will start and the new version will be installed on all hosts.
-
You can browse to Hosts and to each host > Versions tab to see the new version is installed. Proceed to Perform Upgrade.
Performing an Upgrade
- Perform the Upgrade to HDP 2.2.4.2
-
Log in to Ambari.
-
Browse to
Admin > Stack and Versions
. -
Click on the
Versions
tab. -
Click
Perform Upgrade
.
Manual HDP Stack Upgrade: HDP 2.2.0 to 2.2.4
The following sections describe the steps involved with performing a manual Stack upgrade:
Registering a New Version
- Register the HDP 2.2.4.2 Version
-
Log in to Ambari.
-
Browse to
Admin > Stack and Versions
. -
Click on the Versions tab. You will see the version currently running, HDP-2.2.0.0-2041.
-
Click
Manage Versions
. -
Proceed to register a new version by clicking
+ Register Version
. -
Enter a two-digit version number. For example, enter 4.2 (which makes the version name HDP-2.2.4.2).
-
Select one or more OS families and enter the repository Base URLs for that OS.
-
Click
Save
. -
Click Go to Dashboard and browse back to Admin > Stack and Versions > Versions. You will see the currently running version, HDP-2.2.0.0-2041, and the version you just registered, HDP-2.2.4.2. Proceed to Install a New Version on All Hosts.
Installing a New Version on All Hosts
- Install HDP 2.2.4.2 on All Hosts
-
Log in to Ambari.
-
Browse to
Admin > Stack and Versions
. -
Click on the
Versions
tab. -
Click
Install Packages
and click OK to confirm. -
The Install version operation will start and the new version will be installed on all hosts.
-
You can browse to Hosts and to each host > Versions tab to see the new version is installed. Proceed to Perform Manual Upgrade.
Performing a Manual Upgrade
- Perform the Manual Upgrade to HDP 2.2.4.2
-
Log in to Ambari.
-
Browse to
Admin > Stack and Versions
. -
Click on the
Versions
tab. -
Under the newly registered and installed version, HDP-2.2.4.2, the actual software repository version appears in parentheses (Ambari determined this repository version during the install). For example, in the picture below the display name is HDP-2.2.4.2 and the repository version is 2.2.4.2-2. Record this repository version. You will use it later in the manual upgrade process.
-
Stop all services from Ambari. On the Services tab, in the Service navigation area Actions button, select
Stop All
to stop all services. -
Go to the command line on each host and move the current HDP version to the newly installed version using the hdp-select utility and repository version number (obtained in Step 4).
hdp-select set all {repository-version}
For example:
hdp-select set all 2.2.4.2-2
-
Restart all services from Ambari. One by one, browse to each Service in Ambari Web, and in the Service Actions menu select Restart All. Do not select Start All. You must use Restart All. For example, browse to Ambari Web > Services > HDFS and select Restart All.
-
During a manual upgrade, all components must advertise the version that they are on. This is typically done by restarting an entire Service. However, client-only services (for example, Pig, Tez, and Slider) do not have a Restart command. Instead, they need an API call that triggers the same behavior. For each installed client-only service, issue an Ambari REST API call that causes the hosts running these clients to advertise their version:
curl -X POST -u username:password -H 'X-Requested-By:ambari' http://ambari.server:8080/api/v1/clusters/MyCluster/requests -d '{ "RequestInfo": { "command":"RESTART", "context":"Restart all components for TEZ_CLIENT", "operation_level": { "level":"SERVICE", "cluster_name":"MyCluster", "service_name":"TEZ" } }, "Requests/resource_filters": [{ "service_name":"TEZ", "component_name":"TEZ_CLIENT", "hosts":"c6401.ambari.apache.org,c6402.ambari.apache.org"}] }'
Replace the Ambari Server username and password, the Ambari Server hostname, your cluster name, the service name and component name (see the following table), and the list of hosts in your cluster that are running the client.
Service | service_name | component_name |
---|---|---|
Tez | TEZ | TEZ_CLIENT |
Pig | PIG | PIG |
Slider | SLIDER | SLIDER |
Sqoop | SQOOP | SQOOP |
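Because the request body differs only in the service name, component name, and host list, it can be generated from the table above. The following sketch builds the JSON body for one client-only service; it does not send the request. The dictionary `CLIENT_ONLY` and the function `restart_payload` are hypothetical names, and the cluster and host names are the sample values used in this section.

```python
import json

# service_name and component_name pairs, copied from the table above.
CLIENT_ONLY = {
    "Tez": ("TEZ", "TEZ_CLIENT"),
    "Pig": ("PIG", "PIG"),
    "Slider": ("SLIDER", "SLIDER"),
    "Sqoop": ("SQOOP", "SQOOP"),
}


def restart_payload(cluster, service, hosts):
    """Build the RESTART request body for one client-only service."""
    service_name, component_name = CLIENT_ONLY[service]
    return {
        "RequestInfo": {
            "command": "RESTART",
            "context": "Restart all components for " + component_name,
            "operation_level": {
                "level": "SERVICE",
                "cluster_name": cluster,
                "service_name": service_name,
            },
        },
        "Requests/resource_filters": [{
            "service_name": service_name,
            "component_name": component_name,
            "hosts": ",".join(hosts),
        }],
    }


body = restart_payload("MyCluster", "Pig", ["c6401.ambari.apache.org"])
print(json.dumps(body))
```

The printed JSON can be passed to curl as the request body for the `requests` endpoint shown above, once per client-only service in your cluster.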
-
After all the services are confirmed to be started and healthy, go to the command line on the Ambari Server and run the following to finalize the upgrade, which will move the current version to the new version.
ambari-server set-current --cluster-name=MyCluster --version-display-name=HDP-2.2.4.2
Ambari Admin login: admin
Ambari Admin password: *****
-
If the
ambari-server set-current
command is not successful, try restarting the Ambari Server and waiting for all agents to re-register before trying again.
Administering Ambari
Apache Ambari is a system to help you provision, manage and monitor Hadoop clusters.
This guide is intended for Cluster Operators and System Administrators responsible
for installing and maintaining Ambari and the Hadoop clusters managed by Ambari. Installing
Ambari creates a default user with "Ambari Admin" privilege, with the following username/password:
admin/admin
.
When you sign into Ambari as Ambari Admin, you can:
For specific information about provisioning an HDP cluster, see Install, Configure, and Deploy an HDP Cluster.
Terms and Definitions
The following basic terms help describe the key concepts associated with Ambari Administration.
Term | Definition |
---|---|
Ambari Admin | Specific privilege granted to a user that enables the user to administer Ambari. The default user |
Account | User name, password and privileges. |
Cluster | Installation of a Hadoop cluster, based on a particular Stack, that is managed by Ambari. |
Group | Unique group of users in Ambari. |
Group Type | Local and LDAP. Local groups are maintained in the Ambari database. LDAP groups are imported (and synchronized) with an external LDAP (if configured). |
Permissions | Represents the permission that can be granted to a principal (user or group) on a particular resource. For example, cluster resources support Operator and Read-Only permissions. |
Principal | User or group that can be authenticated by Ambari. |
Privilege | Represents the mapping of a principal to a permission and a resource. For example: the user |
Resource | Represents the resource available and managed in Ambari. Ambari supports two types of resources: cluster and view. An Ambari Admin assigns permissions for a resource for users and groups. |
User | Unique user in Ambari. |
User Type | Local and LDAP. Local users are maintained in the Ambari database and authentication is performed against the Ambari database. LDAP users are imported (and synchronized) with an external LDAP (if configured). |
Version | Represents a Stack version, which includes a set of repositories to install that version on a cluster. For more information about Stack versions, see Managing Stack and Versions. |
View | Defines a user interface component that is available to Ambari. |
Logging in to Ambari
After installing Ambari, you can log in to Ambari as follows:
-
Enter the following URL in a web browser:
http://<your.ambari.server>:8080
where <your.ambari.server> is the hostname for your Ambari server machine and 8080 is the default HTTP port.
-
Enter the user account credentials for the default administrative user automatically created during install:
username/password = admin/admin
-
The Ambari Administration web page displays. From this page you can Manage Users and Groups, Manage Views, Manage Stack and Versions, and Create a Cluster.
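The same host, port, and credentials used to log in through the browser can also be used against the Ambari REST API that Ambari Web calls. The following is a minimal sketch that builds, but does not send, a basic-auth request for the cluster list. The `/api/v1/clusters` path, the `X-Requested-By` header, and the helper name `build_clusters_request` do not come from this section; treat them as assumptions to verify against your Ambari version.

```python
import base64
import urllib.request


def build_clusters_request(server, user, password, port=8080):
    """Build (but do not send) a GET request for the list of clusters.

    The endpoint path and the X-Requested-By header are assumptions,
    not values stated in this guide.
    """
    url = f"http://{server}:{port}/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("X-Requested-By", "ambari")
    return request


# To send the request against a live server (not done here):
# with urllib.request.urlopen(build_clusters_request("your.ambari.server", "admin", "admin")) as resp:
#     print(resp.read().decode("utf-8"))
```

If you change the default admin password as described later in this guide, pass the new password instead of the default.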
About the Ambari Administration Interface
When you log in to the Ambari Administration interface with "Ambari Admin" privilege, a landing page displays links to the operations available. The same operations are also available from the left menu for clusters, views, users, and groups.

-
Clusters displays a link to a cluster (if created) and links to manage access permissions for that cluster. See Creating and Managing a Cluster for more information.
-
User and Group Management provides the ability to create and edit users and groups. See Managing Users and Groups for more information.
-
Views lets you create and edit instances of deployed Views and manage access permissions for those instances. See Managing Views for more information.
-
Versions provides the ability to manage the Stack versions that are available for the clusters. See Managing Stack and Versions for more information.
Changing the Administrator Account Password
During install and setup, the Cluster Install Wizard automatically creates a default
user with "Ambari Admin" privilege. You can change the password for this user (or
other Local users in the system) from the Ambari Administration interface. Changing
the password for the default admin user creates a unique administrator credential
for your system.
To change the password for the default admin account:
-
Browse to the Users section.
-
Select the admin user.
-
Click the Change Password button.
-
Enter the current admin password and the new password twice.
-
Click OK to save the new password.
Ambari Admin Tasks
An "Ambari Admin" has administrator (or super-user) privilege. When logged into Ambari with the "Ambari Admin" privilege, you can:
For more information about creating Ambari users locally and importing Ambari LDAP users, see Managing Users and Groups.
Creating a Cluster
As an Ambari Admin, you can launch the Cluster Install Wizard and create a cluster. To create a cluster, from the Ambari Administration interface:
-
Click Install Cluster. The Cluster Install Wizard displays.
-
Follow the steps in the wizard to install your cluster.
For more information about prerequisites and system requirements, see Installing HDP using Ambari.
Setting Cluster Permissions
After you create a cluster, users with Ambari Admin privileges automatically get Operator
permission on the cluster. By default, no other users have access to the cluster. You can
grant permissions on the cluster to other users and groups from the Ambari Administration
interface.
Ambari manages the following permissions for a cluster: Operator and Read-Only. Users and
groups with Operator permission are granted access to the cluster. Operator permission
provides full control of the following service operations:
-
Start
-
Stop
-
Restart
-
Add New
and of the following configuration operations:
-
Modify
-
Revert
Users and groups with Read-Only permission can only view, not modify, services and configurations.
Users with Ambari Admin privileges are implicitly granted Operator permission. In addition,
Ambari Admin users have access to the Ambari Administration interface, which allows them
to control permissions for the cluster.
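The permission rules described above can be summarized in a small lookup. This is only an illustrative sketch of the rules as stated in this section, not Ambari's implementation; the action names (`start`, `add_new`, and so on) and the function names are hypothetical labels chosen for the example.

```python
OPERATOR = "Operator"
READ_ONLY = "Read-Only"

# Hypothetical action labels for the operations listed in this section.
SERVICE_ACTIONS = {"start", "stop", "restart", "add_new"}
CONFIG_ACTIONS = {"modify", "revert"}

ALLOWED = {
    OPERATOR: SERVICE_ACTIONS | CONFIG_ACTIONS | {"view"},
    READ_ONLY: {"view"},
}


def effective_permission(is_ambari_admin, granted=None):
    """Ambari Admins implicitly hold Operator; others hold what was granted."""
    return OPERATOR if is_ambari_admin else granted


def can(action, is_ambari_admin=False, granted=None):
    """Return True if the action is allowed under the effective permission."""
    permission = effective_permission(is_ambari_admin, granted)
    return permission is not None and action in ALLOWED[permission]
```

A user with no granted permission and no Ambari Admin flag is denied every action, which matches the default of no access for new users.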
To modify user and group permissions for a cluster:
-
As an Ambari Admin, access the Ambari Administration interface.
-
Click Permissions, displayed under the cluster name.
-
The form showing the Operator and Read-Only permissions, with the users and groups mapped to each, is displayed.
-
Modify the users and groups mapped to each permission and save.
For more information about managing users and groups, see Managing Users and Groups.
Viewing the Cluster Dashboard
After you have created a cluster, select Clusters > Go to Dashboard
to open the Dashboard view. For more information about using Ambari to monitor and
manage your cluster, see Monitoring and Managing your HDP Cluster with Ambari.
Renaming a Cluster
A user with Ambari Admin privileges can rename a cluster using the Ambari Administration
interface.
To rename a cluster:
-
In Clusters, click the Rename Cluster icon, next to the cluster name.
The cluster name becomes editable.
-
Enter alphanumeric characters as a cluster name.
-
Click the check mark.
-
Confirm.
Managing Users and Groups
An "Ambari Admin" can create and manage users and groups available to Ambari. An Ambari Admin can also import user and group information into Ambari from external LDAP systems. This section describes the specific tasks you perform when managing users and groups in Ambari.
Users and Groups Overview
Ambari supports two types of users and groups: Local and LDAP. The following topics describe how Ambari Administration supports managing Local and LDAP users and groups.
Local and LDAP User and Group Types
Local users are stored in and authenticate against the Ambari database. LDAP users
have basic account information stored in the Ambari database. Unlike Local users,
LDAP users authenticate against an external LDAP system.
Local groups are stored in the Ambari database. LDAP groups have basic information
stored in the Ambari database, including group membership information. Unlike Local
groups, LDAP groups are imported and synchronized from an external LDAP system.
To use LDAP users and groups with Ambari, you must configure Ambari to authenticate
against an external LDAP system. For more information about running ambari-server
setup-ldap, see Configure Ambari to use LDAP Server.
A new Ambari user or group, created either locally or by synchronizing against LDAP,
is granted no privileges by default. You, as an Ambari Admin, must explicitly grant
each user permissions to access clusters or views.
Ambari Admin Privileges
As an Ambari Admin, you can create new users, delete users, change user passwords and edit user settings. You can control certain privileges for Local and LDAP users. The following table lists the privileges available and those not available to the Ambari Admin for Local and LDAP Ambari users.
Ambari Administrator Privileges for Ambari Local and LDAP Users
Administrator User Privilege | Local User | LDAP User |
---|---|---|
Change Password | Available | Not Available |
Set Ambari Admin Flag | Available | Available |
Change Group Membership | Available | Not Available |
Delete User | Available | Not Available |
Set Active / Inactive | Available | Available |
Creating a Local User
To create a local user:
-
Browse to Users.
-
Click Create Local User.
-
Enter a unique user name.
-
Enter a password, then confirm that password.
-
Click Save.
Setting User Status
User status indicates whether the user is active and should be allowed to log into Ambari or should be inactive and denied the ability to log in. By setting the Status flag as Active or Inactive, you can effectively "disable" user account access to Ambari while preserving the user account information related to permissions.
To set user Status:
-
On the Ambari Administration interface, browse to Users.
-
Click the user name of the user to modify.
-
Click the Status control to toggle between Active or Inactive.
-
Choose OK to confirm the change. The change is saved immediately.
Setting the Ambari Admin Flag
You can elevate one or more users to have Ambari administrative privileges, by setting
the Ambari Admin flag. You must be logged in as an account that is an Ambari Admin
to set or remove the Ambari Admin flag.
To set the Ambari Admin Flag:
-
Browse to the Users section.
-
Click the user name you wish to modify.
-
Click on the Ambari Admin control.
-
Switch to Yes to set, or to No to remove, the Ambari Admin flag.
Changing the Password for a Local User
An Ambari Administrator can change local user passwords. LDAP passwords are not managed
by Ambari since LDAP users authenticate to external LDAP. Therefore, LDAP user passwords
cannot be changed from Ambari.
To change the password for a local user:
-
Browse to the user.
-
Click Change password.
-
Enter your administrator password to confirm that you have the privileges required to change a local user password.
-
Enter a password, then confirm that password.
-
Click Save.
Deleting a Local User
Deleting a local user removes the user account from the system, including all privileges associated with the user. You can reuse the name of a local user that has been deleted. To delete a local user:
-
Browse to the User.
-
Click Delete User.
-
Confirm.
Creating a Local Group
To create a local group:
-
Browse to Groups.
-
Click Create Local Group.
-
Enter a unique group name.
-
Click Save.
Managing Group Membership
You can manage group membership of Local groups by adding or removing users from groups.
Adding a User to a Group
To add a user to a group:
-
Browse to Groups.
-
Click a name in the Group Name list.
-
Choose the Local Members control to edit the member list.
-
In the empty space, type the first character in an existing user name.
-
From the list of available user names, choose a user name.
-
Click the check mark to save the current, displayed members as group members.
Modifying Group Membership
To modify Local group membership:
-
In the Ambari Administration interface, browse to Groups.
-
Click the name of the Group to modify.
-
Choose the Local Members control to edit the member list.
-
Click in the Local Members text area to modify the current membership.
-
Click the X to remove a user.
-
To save your changes, click the check mark. To discard your changes, click the x.
Deleting a Local Group
Deleting a local group removes all privileges associated with the group. To delete a local group:
-
Browse to the Group.
-
Click Delete Group.
-
Confirm. The group is deleted and the associated group membership information is removed.
Managing Views
The Ambari Views Framework offers a systematic way to plug in UI capabilities to surface
custom visualization, management and monitoring features in Ambari Web. The development
and use of Views allows you to extend and customize Ambari Web to meet your specific
needs.
A View extends Ambari to let third parties plug in new resource types along with APIs,
providers, and UIs to support them. A View is deployed into the Ambari Server, and
Ambari Admins can create View instances and set access privileges for users
and groups.
The following sections cover the basics of Views and how to deploy and manage View
instances in Ambari:
Terminology
The following are Views terms and concepts you should be familiar with:
Term | Description |
---|---|
Views Framework | The core framework that is used to develop a View. This is very similar to a Java Web App. |
View Definition | Describes the View resources and core View properties such as name, version and any necessary configuration properties. On deployment, the View definition is read by Ambari. |
View Package | Packages the View client and server assets (and dependencies) into a bundle that is ready to deploy into Ambari. |
View Deployment | Deploying a View into Ambari. This makes the View available to Ambari Admins for creating instances. |
View Name | Unique identifier for a View. A View can have one or more versions. The name is defined in the View Definition (created by the View Developer) that is built into the View Package. |
View Version | Specific version of a View. Multiple versions of a View (uniquely identified by View name) can be deployed into Ambari. |
View Instance | Instantiation of a specific View version. Instances are created and configured by Ambari Admins and must have a unique View instance name. |
View Instance Name | Unique identifier of a specific instance of a View. |
Framework Services | View context, instance data, configuration properties and events are available from the Views Framework. |
Basic Concepts
Views are basically Web applications that can be “plugged into” Ambari. Just like a typical web application, a View can include server-side resources and client-side assets. Server-side resources, which are written in Java, can integrate with external systems (such as cluster services) and expose REST end-points that are used by the view. Client-side assets, such as HTML/JavaScript/CSS, provide the UI for the view that is rendered in the Ambari Web interface.
Ambari Views Framework
Ambari exposes the Views Framework as the basis for View development. The Framework provides the following:
-
Method for describing and packaging a View
-
Method for deploying a View
-
Framework services for a View to integrate with Ambari
-
Method for managing View versions, instances, and permissions
The Views Framework is separate from Views themselves. The Framework is a core feature
of Ambari, and Views build on that Framework. Although Ambari does include some Views
out-of-the-box, the core feature is the Framework itself, which enables the development,
deployment and creation of Views.
The development and delivery of a View follows this process flow:
-
Develop the View (similar to how you would build a Web application)
-
Package the View (similar to a WAR)
-
Deploy the View into Ambari (using the Ambari Administration interface)
-
Create and configure instances of the View (performed by Ambari Admins)
Considering the above, it is important to understand the different personas involved. The following table describes the three personas:
Persona | Description |
---|---|
View Developer | Person who builds the front-end and back-end of a View and uses the Framework services available during development. The Developer creates the View, resulting in a View Package that is delivered to an Ambari Admin. |
Ambari Admin | Ambari user that has Ambari Admin privilege and uses the Views Management section of the Ambari Administration interface to create and manage instances of Views. The Ambari Admin also deploys the View Packages delivered by the View Developer. |
View User | Ambari user that has access to one or more Views in Ambari Web. Basically, this is the end user. |
After Views are developed, each View is identified by a unique View name. Each View can have one or more View versions. Each View name + version combination is deployed as a single View package. Once a View package is deployed, the Ambari Admin can create View instances, where each instance is identified by a unique View instance name. The Ambari Admin can then set access permissions for each View instance.
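The relationship between View names, versions, instances, and permissions can be pictured with a toy in-memory model. This is a sketch for illustration only, not how Ambari stores Views; the class `ViewRegistry` and its method names are hypothetical.

```python
class ViewRegistry:
    """Toy model of deployed View packages and their instances."""

    def __init__(self):
        self.packages = set()   # deployed (view name, version) pairs
        self.instances = {}     # (view name, instance name) -> details

    def deploy(self, view_name, version):
        # Each name + version combination is deployed as one package.
        self.packages.add((view_name, version))

    def create_instance(self, view_name, version, instance_name):
        if (view_name, version) not in self.packages:
            raise LookupError(f"{view_name} {version} is not deployed")
        key = (view_name, instance_name)
        if key in self.instances:
            raise ValueError(f"instance name {instance_name!r} already used for {view_name}")
        # A new instance starts with no permissions set.
        self.instances[key] = {"version": version, "use": set()}

    def grant_use(self, view_name, instance_name, principal):
        self.instances[(view_name, instance_name)]["use"].add(principal)

    def can_use(self, view_name, instance_name, principal):
        return principal in self.instances[(view_name, instance_name)]["use"]
```

The model keys instances by View name and instance name, so two versions of the same View cannot share an instance name.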
Ambari Views Versions and Instances
Deploying a View
Deploying a View involves obtaining the View Package and making the View available
to the Ambari Server. Each View deployed has a unique name. Multiple versions of a
View can be deployed at the same time. You can configure multiple versions of a View
for your users, depending on their roles, and deploy these versions at the same time.
For more information about building Views, see the Apache Ambari Wiki page.
-
Obtain the View package. For example, files-0.1.0.jar.
-
On the Ambari Server host, browse to the views directory.
cd /var/lib/ambari-server/resources/views
-
Copy the View package into place.
-
Restart Ambari Server.
ambari-server restart
-
The View is extracted, registered with Ambari, and displays in the Ambari Administration interface as available to create instances.
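The copy step above can be scripted. The following sketch stages a View package into the views directory and leaves the `ambari-server restart` step to you. The helper name `stage_view_package` is hypothetical, and the `.jar` suffix check is an assumption based on the example package name.

```python
import shutil
from pathlib import Path

# Default views directory named in the steps above.
DEFAULT_VIEWS_DIR = Path("/var/lib/ambari-server/resources/views")


def stage_view_package(package, views_dir=DEFAULT_VIEWS_DIR):
    """Copy a View package into the views directory and return the new path.

    Ambari Server must still be restarted afterwards to pick up the View.
    """
    package = Path(package)
    views_dir = Path(views_dir)
    if package.suffix != ".jar":  # assumption: View packages are .jar files
        raise ValueError(f"not a .jar View package: {package}")
    if not views_dir.is_dir():
        raise FileNotFoundError(f"views directory not found: {views_dir}")
    target = views_dir / package.name
    shutil.copy2(package, target)
    return target
```

Run it as a user with write access to the views directory, then restart Ambari Server as described in the steps.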
Creating View Instances
To create a View instance:
-
Browse to a View and expand.
-
Click the “Create Instance” button.
-
Provide the following information:
Item | Required | Description |
---|---|---|
View Version | Yes | Select the version of the View to instantiate. |
Instance Name | Yes | Must be unique for a given View. |
Display Label | Yes | Readable display name used for the View instance when shown in Ambari Web. |
Description | Yes | Readable description used for the View instance when shown in Ambari Web. |
Visible | No | Designates whether the View is visible or not visible to the end-user in Ambari web. Use this property to temporarily hide a view in Ambari Web from users. |
Properties | Maybe | Depends on the View. If the View requires certain configuration properties, you are prompted to provide the required information. |
Setting View Permissions
After a view instance has been created, an Ambari Admin can set which users and groups
can access the view by setting the Use permission. By default, after view instance
creation, no permissions are set on a view.
To set permissions on a view:
-
Browse to a view and expand. For example, browse to the Slider or Jobs view.
-
Click on the view instance you want to modify.
-
In the Permissions section, click the Users or Groups control.
-
Modify the user and group lists as appropriate.
-
Click the check mark to save changes.
Additional Information
To learn more about developing views and the views framework itself, refer to the following resources:
Resource | Description | Link |
---|---|---|
Views Wiki | Learn about the Views Framework and Framework services available to views developers. | https://cwiki.apache.org/confluence/display/AMBARI/Views |
Views API | Covers the Views REST API and associated framework Java classes. | https://github.com/apache/ambari/blob/trunk/ambari-views/docs/index.md |
Views Examples | Code for example views that cover different areas of the framework and framework services. | https://github.com/apache/ambari/tree/trunk/ambari-views/examples |
View Contributions | Views that are being developed and contributed to the Ambari community.[4] | |
Ambari Security Guide
Ambari and Hadoop have many advanced security options. This guide provides information on configuring Ambari and Hadoop for strong authentication with Kerberos, as well as other security options.
Configuring Ambari and Hadoop for Kerberos
This topic describes how to configure Kerberos for strong authentication for Hadoop users and hosts in an Ambari-managed cluster.
Kerberos Overview
Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop. Users need to be able to reliably “identify” themselves and then have that identity propagated throughout the Hadoop cluster. Once this is done, those users can access resources (such as files or directories) or interact with the cluster (like running MapReduce jobs). Besides users, Hadoop cluster resources themselves (such as Hosts and Services) need to authenticate with each other to prevent potentially malicious systems or daemons from “posing as” trusted components of the cluster to gain access to data.
Hadoop uses Kerberos as the basis for strong authentication and identity propagation for both users and services. Kerberos is a third-party authentication mechanism, in which users and services rely on a third party - the Kerberos server - to authenticate each to the other. The Kerberos server itself is known as the Key Distribution Center, or KDC. At a high level, it has three parts:
-
A database of the users and services (known as principals) that it knows about and their respective Kerberos passwords
-
An Authentication Server (AS) which performs the initial authentication and issues a Ticket Granting Ticket (TGT)
-
A Ticket Granting Server (TGS) that issues subsequent service tickets based on the initial TGT
A user principal requests authentication from the AS. The AS returns a TGT that is encrypted using the user principal's Kerberos password, which is known only to the user principal and the AS. The user principal decrypts the TGT locally using its Kerberos password, and from that point forward, until the ticket expires, the user principal can use the TGT to get service tickets from the TGS. Service tickets are what allow a principal to access various services.
Because cluster resources (hosts or services) cannot provide a password each time to decrypt the TGT, they use a special file, called a keytab, which contains the resource principal's authentication credentials. The set of hosts, users, and services over which the Kerberos server has control is called a realm.
Terminology
Term | Description |
---|---|
Key Distribution Center, or KDC | The trusted source for authentication in a Kerberos-enabled environment. |
Kerberos KDC Server | The machine, or server, that serves as the Key Distribution Center (KDC). |
Kerberos Client | Any machine in the cluster that authenticates against the KDC. |
Principal | The unique name of a user or service that authenticates against the KDC. |
Keytab | A file that includes one or more principals and their keys. |
Realm | The Kerberos network that includes a KDC and a number of Clients. |
KDC Admin Account | An administrative account used by Ambari to create principals and generate keytabs in the KDC. |
Hadoop and Kerberos Principals
Each service and sub-service in Hadoop must have its own principal. A principal name in a given realm consists of a primary name and an instance name; in this case, the instance name is the FQDN of the host that runs that service. Because services do not log in with a password to acquire their tickets, their principal's authentication credentials are stored in a keytab file, which is extracted from the Kerberos database and stored locally in a secured directory with the service principal on the service component host.

Principals and Keytabs
Principal and Keytab Naming Conventions
Asset | Convention | Example |
---|---|---|
Principals | $service_component_name/$FQDN@EXAMPLE.COM | nn/c6401.ambari.apache.org@EXAMPLE.COM |
Keytabs | $service_component_abbreviation.service.keytab | /etc/security/keytabs/nn.service.keytab |
Notice in the preceding example the primary name for each service principal. These primary names, such as nn or hive for example, represent the NameNode or Hive service, respectively. Each primary name has appended to it the instance name, the FQDN of the host on which it runs. This convention provides a unique principal name for services that run on multiple hosts, like DataNodes and NodeManagers. Adding the host name serves to distinguish, for example, a request from DataNode A from a request from DataNode B. This is important for the following reasons:
-
Compromised Kerberos credentials for one DataNode do not automatically lead to compromised Kerberos credentials for all DataNodes.
-
If multiple DataNodes have exactly the same principal and are simultaneously connecting to the NameNode, and if the Kerberos authenticators being sent happen to have the same timestamp, then the authentication is rejected as a replay request.
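The naming conventions in the table above are simple string templates. As a small sketch (the function names are hypothetical, and `/etc/security/keytabs` is taken from the table's example path), they can be expressed as:

```python
def service_principal(component, fqdn, realm):
    """Build a principal as $service_component_name/$FQDN@REALM."""
    return f"{component}/{fqdn}@{realm}"


def keytab_path(component, keytab_dir="/etc/security/keytabs"):
    """Build a keytab path as <dir>/$service_component_abbreviation.service.keytab."""
    return f"{keytab_dir}/{component}.service.keytab"


def principals_for_hosts(component, fqdns, realm):
    """One principal per host, so each host has distinct credentials."""
    return [service_principal(component, fqdn, realm) for fqdn in fqdns]
```

Because the FQDN is part of the name, calling `principals_for_hosts` for a component that runs on many hosts yields one distinct principal per host.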
Installing and Configuring the KDC
Ambari is able to configure Kerberos in the cluster to work with an existing MIT KDC, or existing Active Directory installation. This section describes the steps necessary to prepare for this integration.
Use an Existing MIT KDC
To use an existing MIT KDC for the cluster, you must prepare the following:
-
Ambari Server and cluster hosts have network access to both the KDC and KDC admin hosts.
-
KDC administrative credentials are on-hand.
Proceed with Enabling Kerberos Security in Ambari.
Use an Existing Active Directory Domain
To use an existing Active Directory domain for the cluster, you must prepare the following:
-
Ambari Server and cluster hosts have network access to, and are able to resolve the DNS names of, the Domain Controllers.
-
Active Directory secure LDAP (LDAPS) connectivity has been configured.
-
Active Directory User container for principals has been created and is on-hand. For example, "OU=Hadoop,OU=People,dc=apache,dc=org"
-
Active Directory administrative credentials with delegated control of “Create, delete, and manage user accounts” on the previously mentioned User container are on-hand.
Proceed with Enabling Kerberos Security in Ambari.
(Optional) Install a new MIT KDC
The following gives a very high-level description of the KDC installation process. For more information, see the documentation for your specific operating system, such as the RHEL, CentOS, or SLES documentation.
Install the KDC Server
-
Install a new version of the KDC server:
RHEL/CentOS/Oracle Linux 6
yum install krb5-server krb5-libs krb5-auth-dialog krb5-workstation
SLES 11
zypper install krb5 krb5-server krb5-client
Ubuntu 12
apt-get install krb5-kdc krb5-admin-server
-
Using a text editor, open the KDC server configuration file, located by default here:
/etc/krb5.conf
-
Change the [realms] section of this file by replacing the default “kerberos.example.com” setting for the kdc and admin_server properties with the Fully Qualified Domain Name of the KDC server host. In the following example, “kerberos.example.com” has been replaced with “my.kdc.server”.
[realms]
  EXAMPLE.COM = {
    kdc = my.kdc.server
    admin_server = my.kdc.server
  }
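If you prepare this file for several environments, the [realms] stanza can be generated rather than edited by hand. The following is a minimal sketch that renders only the two properties discussed above; the function name `render_realms` is hypothetical, and a real krb5.conf normally contains other sections that this sketch does not produce.

```python
def render_realms(realm, kdc_host, admin_host=None):
    """Render a [realms] stanza with kdc and admin_server entries.

    If admin_host is omitted, the KDC host is used for both properties,
    as in the example above.
    """
    admin_host = admin_host or kdc_host
    lines = [
        "[realms]",
        f"  {realm} = {{",
        f"    kdc = {kdc_host}",
        f"    admin_server = {admin_host}",
        "  }",
    ]
    return "\n".join(lines)
```

Calling `render_realms("EXAMPLE.COM", "my.kdc.server")` produces the same stanza as the example above.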
Create the Kerberos Database
-
Use the utility kdb5_util to create the Kerberos database.
RHEL/CentOS/Oracle Linux 6
kdb5_util create -s
SLES 11
kdb5_util create -s
Ubuntu 12
kdb5_util create -s
Start the KDC
-
Start the KDC server and the KDC admin server.
RHEL/CentOS/Oracle Linux 6
/etc/rc.d/init.d/krb5kdc start
/etc/rc.d/init.d/kadmin start
SLES 11
rckrb5kdc start
rckadmind start
Ubuntu 12
service krb5-kdc start
service krb5-admin-server start
Create a Kerberos Admin
Kerberos principals can be created either on the KDC machine itself or through the
network, using an “admin” principal. The following instructions assume you are using
the KDC machine and using the kadmin.local command line administration utility. Using
kadmin.local on the KDC machine allows you to create principals without needing to
create a separate "admin" principal before you start.
-
Create a KDC admin.
RHEL/CentOS/Oracle Linux 6
kadmin.local -q "addprinc admin/admin"
SLES 11
kadmin.local -q "addprinc admin/admin"
Ubuntu 12
kadmin.local -q "addprinc admin/admin"
-
Confirm that this admin principal has permissions in the KDC ACL.
For example, on RHEL/CentOS, check that the /var/kerberos/krb5kdc/kadm5.acl file has an entry that allows the */admin principal to administer the KDC for your specific realm. In this case, for the EXAMPLE.COM realm: */admin@EXAMPLE.COM *. When using a realm other than EXAMPLE.COM, ensure there is an entry for the realm you are using. If the entry is not present, principal creation will fail. After editing kadm5.acl, you must restart the kadmind process.
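The ACL check described above can be automated before running the wizard. This sketch looks only for the exact form of entry shown in this section (`*/admin@REALM *`); other valid kadm5.acl entries that grant equivalent rights are not recognized. The function name `acl_allows_admin` is hypothetical.

```python
def acl_allows_admin(acl_text, realm):
    """Return True if the ACL text has a '*/admin@REALM *' entry.

    Only the entry form shown in this guide is checked; this is not a
    full kadm5.acl parser.
    """
    wanted = f"*/admin@{realm}"
    for raw in acl_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] == wanted and fields[1] == "*":
            return True
    return False


def check_acl_file(path, realm):
    """Read an ACL file (for example /var/kerberos/krb5kdc/kadm5.acl) and check it."""
    with open(path, encoding="utf-8") as handle:
        return acl_allows_admin(handle.read(), realm)
```

A `False` result for your realm means principal creation is likely to fail until the entry is added and kadmind is restarted.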
Enabling Kerberos Security in Ambari
Ambari provides a wizard to help with enabling Kerberos in the cluster. This section provides information on preparing Ambari before running the wizard, and the steps to run the wizard.
Installing the JCE
Before enabling Kerberos in the cluster, you must deploy the Java Cryptography Extension (JCE) security policy files on the Ambari Server and on all hosts in the cluster. Depending on your choice of JDK and if your Ambari Server has Internet Access, Ambari has a few options and actions for you to pursue.
JCE Options and Actions
Scenario | Action |
---|---|
If you have Internet Access and selected Oracle JDK 1.6 or Oracle JDK 1.7 during Ambari Server setup. | Ambari automatically downloaded the JCE policy files (that match the JDK) and installed the JCE onto the Ambari Server. |
If you have Internet Access and selected Custom JDK during Ambari Server setup. | The JCE has not been downloaded or installed on the Ambari Server or the hosts in the cluster. |
If you do not have Internet Access and selected Custom JDK during Ambari Server setup. | The JCE has not been downloaded or installed on the Ambari Server or the hosts in the cluster. |
If you have a previous Ambari install and upgraded to Ambari 2.0.0. | The JCE has not been downloaded or installed on the Ambari Server or the hosts in the cluster. |
Distribute and Install the JCE
-
On the Ambari Server, obtain the JCE policy file appropriate for the JDK version in your cluster.
-
For Oracle JDK 1.6:
http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html
-
For Oracle JDK 1.7:
http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html
-
Save the policy file archive in a temporary location.
-
On Ambari Server and on each host in the cluster, add the unlimited security policy JCE jars to
$JAVA_HOME/jre/lib/security/.
For example, run the following to extract the policy jars into the JDK installed on your host:
unzip -o -j -q UnlimitedJCEPolicyJDK7.zip -d /usr/jdk64/jdk1.7.0_67/jre/lib/security/
-
Restart Ambari Server.
-
Proceed to Running the Kerberos Wizard.
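The extraction step above can be sketched in Python for hosts where scripting is easier than running unzip by hand. Unlike `unzip -o -j`, which extracts every file in the archive, this sketch copies only the two policy jars and fails if either is missing. The jar names `local_policy.jar` and `US_export_policy.jar` are the names used in Oracle's unlimited-strength policy archives; the function name `install_jce_policy` is hypothetical.

```python
import zipfile
from pathlib import Path

# Policy jars expected in Oracle's unlimited-strength JCE archive.
POLICY_JARS = {"local_policy.jar", "US_export_policy.jar"}


def install_jce_policy(archive, security_dir):
    """Extract the policy jars, without archive subdirectories, into security_dir.

    Existing jars of the same name are overwritten.
    """
    security_dir = Path(security_dir)
    installed = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = Path(info.filename).name
            if info.is_dir() or name not in POLICY_JARS:
                continue
            (security_dir / name).write_bytes(zf.read(info))
            installed.append(name)
    missing = POLICY_JARS - set(installed)
    if missing:
        raise ValueError(f"archive is missing: {sorted(missing)}")
    return installed
```

Point `security_dir` at `$JAVA_HOME/jre/lib/security/` for the JDK in use on each host, then restart Ambari Server as described in the steps.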
Running the Kerberos Wizard
The Kerberos Wizard prompts for information related to the KDC, the KDC Admin Account and the Service and Ambari principals. Once provided, Ambari will automatically create principals, generate keytabs and distribute keytabs to the hosts in the cluster. The services will be configured for Kerberos and the service components are restarted to authenticate against the KDC.
Launching the Kerberos Wizard
-
Be sure you've Installed and Configured your KDC and have prepared the JCE on each host in the cluster.
-
Log in to Ambari Web and browse to Admin > Kerberos.
-
Click “Enable Kerberos” to launch the wizard.
-
Select the type of KDC you are using and confirm you have met the prerequisites.
-
Provide information about the KDC and admin account.
-
Proceed with the install.
(Optional) To manage your Kerberos client krb5.conf manually (and not have Ambari manage the krb5.conf), expand the Advanced krb5-conf section and uncheck the "Manage" option.
(Optional) If you need to customize the attributes for the principals Ambari will create, see Customizing the Attribute Template for more information.
-
Ambari will install Kerberos clients on the hosts and test access to the KDC by testing that Ambari can create a principal, generate a keytab and distribute that keytab.
-
Customize the Kerberos identities used by Hadoop and proceed to kerberize the cluster.
-
After principals have been created and keytabs have been generated and distributed, Ambari updates the cluster configurations, then starts and tests the Services in the cluster.
Customizing the Attribute Template
Depending on your KDC policies, you can customize the attributes that Ambari sets
when creating principals. On the Configure Kerberos step of the wizard, in the Advanced kerberos-env section, you have access to the Ambari Attribute Template. This template (which is
based on the Apache Velocity templating syntax) can be modified to adjust which attributes are set on the principals
and how those attribute values are derived.
The following table lists the set of computed attribute variables available if you
choose to modify the template:
Attribute Variables | Example |
---|---|
$normalized_principal | nn/c6401.ambari.apache.org@EXAMPLE.COM |
$principal_name | nn/c6401.ambari.apache.org |
$principal_primary | nn |
$principal_digest | [[MD5 hash of the $normalized_principal]] |
$principal_instance | c6401.ambari.apache.org |
$realm | EXAMPLE.COM |
$password | [[password]] |
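To see how the variables in the table relate to each other, the following sketch derives them from a normalized principal. It only illustrates the table and is not Ambari's code; the function name `attribute_variables` is hypothetical, and rendering the MD5 digest as a hexadecimal string is an assumption.

```python
import hashlib


def attribute_variables(normalized_principal, password=""):
    """Derive the template variables from a principal such as primary/instance@REALM."""
    name, realm = normalized_principal.rsplit("@", 1)
    primary, _, instance = name.partition("/")
    digest = hashlib.md5(normalized_principal.encode("utf-8")).hexdigest()
    return {
        "normalized_principal": normalized_principal,
        "principal_name": name,
        "principal_primary": primary,
        "principal_instance": instance,
        "principal_digest": digest,  # assumption: hex-encoded MD5
        "realm": realm,
        "password": password,
    }
```

For a principal with no instance part, `principal_instance` is an empty string in this sketch.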
Kerberos Client Packages
As part of enabling Kerberos, Ambari installs the Kerberos clients on the cluster hosts. Depending on your operating system, the following packages are installed:
Packages installed by Ambari for the Kerberos Client
Operating System | Packages |
---|---|
RHEL/CentOS/Oracle Linux 5 + 6 | krb5-workstation |
SLES 11 | krb5-client |
Ubuntu 12 | krb5-user, krb5-config |
Post-Kerberos Wizard User/Group Mapping
If you have chosen to use existing MIT or Active Directory Kerberos infrastructure
with your cluster, it is important to tell the cluster how to map usernames from those
existing systems to principals within the cluster. This is required to properly translate
username syntaxes from existing systems to Hadoop to ensure usernames can be mapped
successfully.
Hadoop uses a rule-based system to create mappings between service principals and
their related UNIX usernames. The rules are specified using the configuration property
hadoop.security.auth_to_local as part of core-site.
The default rule is simply named DEFAULT. It translates all principals in your default
domain to their first component. For example, myusername@EXAMPLE.COM and myusername/admin@EXAMPLE.COM
both become myusername, assuming your default domain is EXAMPLE.COM. In this case,
EXAMPLE.COM represents the Kerberos realm, or Active Directory Domain that is being
used.
Creating Auth-to-Local Rules
To accommodate more complex translations, you can create a hierarchical set of rules to add to the default. Each rule is divided into three parts: base, filter, and substitution.
-
The Base
The base begins with the number of components in the principal name (excluding the realm), followed by a colon, and the pattern for building the username from the sections of the principal name. In the pattern section $0 translates to the realm, $1 translates to the first component and $2 to the second component.
For example:
[1:$1@$0] translates myusername@EXAMPLE.COM to myusername@EXAMPLE.COM
[2:$1] translates myusername/admin@EXAMPLE.COM to myusername
[2:$1%$2] translates myusername/admin@EXAMPLE.COM to myusername%admin
-
The Filter
The filter consists of a regular expression (regex) in parentheses. It must match the generated string for the rule to apply.
For example:
(.*%admin) matches any string that ends in %admin
(.*@SOME.DOMAIN) matches any string that ends in @SOME.DOMAIN
-
The Substitution
The substitution is a sed rule that translates a regex into a fixed string.
For example:
s/@ACME\.COM// removes the first instance of @ACME.COM
s/@[A-Z]*\.COM// removes the first instance of @ followed by a name followed by .COM
s/X/Y/g replaces all of the X's in the name with Y
Examples
-
If your default realm was EXAMPLE.COM, but you also wanted to take all principals from ACME.COM that had a single component, such as joe@ACME.COM, the following rules would do this:
RULE:[1:$1@$0](.*@ACME.COM)s/@.*//
DEFAULT
-
To translate names with a second component, you could use these rules:
RULE:[1:$1@$0](.*@ACME.COM)s/@.*//
RULE:[2:$1@$0](.*@ACME.COM)s/@.*//
DEFAULT
-
To treat all principals from EXAMPLE.COM with the extension /admin as admin, your rules would look like this:
RULE:[2:$1%$2@$0](.*%admin@EXAMPLE.COM)s/.*/admin/
DEFAULT
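To make the base, filter, and substitution steps concrete, the following Python sketch evaluates a small rule list against a principal. It is a simplified model written for this guide, not Hadoop's implementation: the names RULE_PATTERN, apply_rule, and map_principal are hypothetical, and only the rule features described above are handled.

```python
import re

RULE_PATTERN = re.compile(
    r"RULE:\[(\d+):([^\]]*)\]"      # base: component count and format string
    r"(?:\((.*)\))?"                # optional filter regex in parentheses
    r"(?:s/([^/]*)/([^/]*)/(g?))?"  # optional sed-style substitution
)


def apply_rule(rule, principal, default_realm):
    """Return the mapped user name, or None if the rule does not apply."""
    name, _, realm = principal.partition("@")
    components = name.split("/")
    if rule == "DEFAULT":
        # DEFAULT keeps the first component of principals in the default realm.
        return components[0] if realm == default_realm else None
    match = RULE_PATTERN.fullmatch(rule)
    if match is None:
        raise ValueError(f"unsupported rule: {rule}")
    count, fmt, rule_filter, pattern, replacement, flag = match.groups()
    if int(count) != len(components):
        return None
    # Base: $0 is the realm, $1..$n are the components of the principal name.
    parts = [realm] + components
    base = re.sub(r"\$(\d)", lambda m: parts[int(m.group(1))], fmt)
    # Filter: the whole generated string must match for the rule to apply.
    if rule_filter is not None and not re.fullmatch(rule_filter, base):
        return None
    # Substitution: replace the first match, or every match with the g flag.
    if pattern is not None:
        base = re.sub(pattern, replacement, base, count=0 if flag else 1)
    return base


def map_principal(rules, principal, default_realm):
    """Apply the rules in order and return the first result."""
    for rule in rules:
        result = apply_rule(rule, principal, default_realm)
        if result is not None:
            return result
    return None


rules = [
    "RULE:[1:$1@$0](.*@ACME.COM)s/@.*//",
    "RULE:[2:$1@$0](.*@ACME.COM)s/@.*//",
    "DEFAULT",
]
for principal in ("joe@ACME.COM", "joe/admin@ACME.COM", "myusername/admin@EXAMPLE.COM"):
    print(principal, "->", map_principal(rules, principal, "EXAMPLE.COM"))
```

Rules are tried in order and the first one that applies wins, which is why DEFAULT is listed last.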
After your mapping rules have been configured and are in place, Hadoop uses those rules to map principals to UNIX users. By default, Hadoop uses the UNIX shell to resolve a user’s UID, GID, and list of associated groups for secure operation on every node in the cluster. This is because in a kerberized cluster, individual tasks run as the user who submitted the application. In this case, the user’s identity is propagated all the way down to local JVM processes to ensure tasks are run as the user who submitted them.
For this reason, typical enterprise customers choose to use technologies such as PAM, SSSD, Centrify, or other solutions to integrate with a corporate directory. As Linux is commonly used in the enterprise, there is most likely an existing enterprise solution that has been adopted for your organization. The assumption going forward is that such a solution has been integrated successfully, so that you can log in to each individual DataNode using SSH with LDAP credentials, and typing id returns a UID, GID, and list of associated groups.
Advanced Security Options for Ambari
This section describes several security options for an Ambari-monitored-and-managed Hadoop cluster.
Configuring Ambari for LDAP or Active Directory Authentication
By default Ambari uses an internal database as the user store for authentication and authorization. If you want to configure LDAP or Active Directory (AD) external authentication, you need to collect the following information and run a setup command.
Also, you must synchronize your LDAP users and groups into the Ambari DB to be able to manage authorization and permissions against those users and groups.
Setting Up LDAP User Authentication
The following table details the properties and values you need to know to set up LDAP authentication.
Ambari Server LDAP Properties
Property | Values | Description |
---|---|---|
authentication.ldap.primaryUrl | server:port | The hostname and port for the LDAP or AD server. Example: my.ldap.server:389 |
authentication.ldap.secondaryUrl | server:port | The hostname and port for the secondary LDAP or AD server. Example: my.secondary.ldap.server:389 This is an optional value. |
authentication.ldap.useSSL | true or false | If true, use SSL when connecting to the LDAP or AD server. |
authentication.ldap.usernameAttribute | [LDAP attribute] | The attribute for username. Example: uid |
authentication.ldap.baseDn | [Distinguished Name] | The root Distinguished Name to search in the directory for users. Example: ou=people,dc=hadoop,dc=apache,dc=org |
authentication.ldap.referral | [Referral method] | Determines if LDAP referrals should be followed, or ignored. |
authentication.ldap.bindAnonymously | true or false | If true, bind to the LDAP or AD server anonymously |
authentication.ldap.managerDn | [Full Distinguished Name] | If Bind anonymous is set to false, the Distinguished Name (“DN”) for the manager. Example: uid=hdfs,ou=people,dc=hadoop,dc=apache,dc=org |
authentication.ldap.managerPassword | [password] | If Bind anonymous is set to false, the password for the manager |
authentication.ldap.userObjectClass | [LDAP Object Class] | The object class that is used for users. Example: organizationalPerson |
authentication.ldap.groupObjectClass | [LDAP Object Class] | The object class that is used for groups. Example: groupOfUniqueNames |
authentication.ldap.groupMembershipAttr | [LDAP attribute] | The attribute for group membership. Example: uniqueMember |
authentication.ldap.groupNamingAttr | [LDAP attribute] | The attribute for group name. |
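Before running the setup command, it can help to sanity-check the values you have collected. The sketch below is a local pre-check written for illustration, not something Ambari provides: the function name check_ldap_properties is hypothetical, and treating a missing bindAnonymously value as false is an assumption. It verifies only two things stated in the table: the primary URL has the form server:port, and the manager DN and password are present when anonymous bind is off.

```python
def check_ldap_properties(props):
    """Return a list of problems found in a dict of collected LDAP values."""
    problems = []
    # The table gives the URL format as server:port, for example my.ldap.server:389.
    url = props.get("authentication.ldap.primaryUrl", "")
    host, sep, port = url.rpartition(":")
    if not (sep and host and port.isdigit()):
        problems.append("authentication.ldap.primaryUrl must look like server:port")
    # Assumption: a missing bindAnonymously value is treated as false.
    anonymous = props.get("authentication.ldap.bindAnonymously", "false").lower() == "true"
    if not anonymous:
        for key in ("authentication.ldap.managerDn", "authentication.ldap.managerPassword"):
            if not props.get(key):
                problems.append(f"{key} is required when bindAnonymously is false")
    return problems


collected = {
    "authentication.ldap.primaryUrl": "my.ldap.server:389",
    "authentication.ldap.bindAnonymously": "false",
    "authentication.ldap.managerDn": "uid=hdfs,ou=people,dc=hadoop,dc=apache,dc=org",
}
for problem in check_ldap_properties(collected):
    print(problem)
```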
Configure Ambari to use LDAP Server
-
mkdir /etc/ambari-server/keys
where the keys directory does not exist, but should be created.
-
$JAVA_HOME/bin/keytool -import -trustcacerts -alias root -file $PATH_TO_YOUR_LDAPS_CERT -keystore /etc/ambari-server/keys/ldaps-keystore.jks
-
Set a password when prompted. You will use this during ambari-server setup-ldap.
ambari-server setup-ldap
-
At the Primary URL* prompt, enter the server URL and port you collected above. Prompts marked with an asterisk are required values.
-
At the Secondary URL* prompt, enter the secondary server URL and port. This value is optional.
-
At the Use SSL* prompt, enter your selection. If using LDAPS, enter true.
-
At the User object class* prompt, enter the object class that is used for users.
-
At the User name attribute* prompt, enter your selection. The default value is uid.
-
At the Group object class* prompt, enter the object class that is used for groups.
-
At the Group name attribute* prompt, enter the attribute for group name.
-
At the Group member attribute* prompt, enter the attribute for group membership.
-
At the Distinguished name attribute* prompt, enter the attribute that is used for the distinguished name.
-
At the Base DN* prompt, enter your selection.
-
At the Referral method* prompt, enter follow or ignore to follow or ignore LDAP referrals.
-
At the Bind anonymously* prompt, enter your selection.
-
At the Manager DN* prompt, enter your selection if you have set Bind anonymously to false.
-
At the Enter the Manager Password* prompt, enter the password for your LDAP manager DN.
-
If you set Use SSL* = true in step 3, the following prompt appears: Do you want to provide custom TrustStore for Ambari?
Consider the following options and respond as appropriate.
-
More secure option: If using a self-signed certificate that you do not want imported to the existing JDK keystore, enter y.
For example, you want this certificate used only by Ambari, not by any other applications run by JDK on the same host.
If you choose this option, additional prompts appear. Respond to the additional prompts as follows:
-
At the TrustStore type prompt, enter jks.
-
At the Path to TrustStore file prompt, enter /keys/ldaps-keystore.jks (or the actual path to your keystore file).
-
At the Password for TrustStore prompt, enter the password that you defined for the keystore.
-
Less secure option: If using a self-signed certificate that you want to import and store in the existing, default JDK keystore, enter n.
-
Convert the SSL certificate to X.509 format, if necessary, by executing the following command:
openssl x509 -in slapd.pem -out <slapd.crt>
Where <slapd.crt> is the path to the X.509 certificate.
-
Import the SSL certificate to the existing keystore, for example the default jre certificates storage, using the following instruction:
/usr/jdk64/jdk1.7.0_45/bin/keytool -import -trustcacerts -file slapd.crt -keystore /usr/jdk64/jdk1.7.0_45/jre/lib/security/cacerts
In this example, Ambari is set up to use JDK 1.7, so the certificate must be imported into the JDK 7 keystore.
-
Review your settings and if they are correct, select y.
-
Start or restart the Server
ambari-server restart
The users you have just imported are initially granted the Ambari User privilege. Ambari Users can read metrics, view service status and configuration, and browse job information. For these new users to be able to start or stop services, modify configurations, and run smoke tests, they need to be Admins. To make this change, as an Ambari Admin, use Manage Ambari > Users > Edit. For instructions, see Managing Users and Groups.
Example Active Directory Configuration
Directory Server implementations use specific object classes and attributes for storing
identities. This example shows only those properties that are specific to Active Directory.
Run ambari-server setup-ldap and provide the following information about your Domain.
Prompt | Example AD Values |
---|---|
User object class* (posixAccount) | user |
User name attribute* (uid) | cn |
Group object class* (posixGroup) | group |
Group member attribute* (memberUid) | member |
Synchronizing LDAP Users and Groups
Run the LDAP synchronize command and answer the prompts to initiate the sync:
ambari-server sync-ldap [option]
The utility provides three options for synchronization:
-
Specific set of users and groups, or
-
Synchronize the existing users and groups in Ambari with LDAP, or
-
All users and groups
Review log files for failed synchronization attempts at /var/log/ambari-server/ambari-server.log on the Ambari Server host.
Specific Set of Users and Groups
ambari-server sync-ldap --users users.txt --groups groups.txt
Use this option to synchronize a specific set of users and groups from LDAP into Ambari. Provide the command a text file of comma-separated users and groups. The comma separated entries in each of these files should be based off of the values in LDAP of the attributes chosen during setup. The "User name attribute" should be used for the users.txt file, and the "Group name attribute" should be used for the groups.txt file. This command will find, import, and synchronize the matching LDAP entities with Ambari.
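The input files are plain comma-separated lists. As a minimal sketch, assuming hypothetical user and group names and a throwaway working directory, the following Python snippet writes users.txt and groups.txt in that shape. Whether a trailing newline is acceptable is not stated above, so this example does not write one.

```python
import os
import tempfile


def write_sync_file(path, names):
    """Write the names as one comma-separated line and return that line."""
    line = ",".join(name.strip() for name in names if name.strip())
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(line)
    return line


# Hypothetical names: use the values of your "User name attribute" and
# "Group name attribute" as they appear in LDAP.
workdir = tempfile.mkdtemp()
users_path = os.path.join(workdir, "users.txt")
groups_path = os.path.join(workdir, "groups.txt")
users_line = write_sync_file(users_path, ["alice", "bob"])
groups_line = write_sync_file(groups_path, ["hadoop-admins", "analysts"])
print(users_line)
print(groups_line)
```

The resulting files can then be passed to the command with --users and --groups.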
Existing Users and Groups
ambari-server sync-ldap --existing
After you have performed a synchronization of a specific set of users and groups, you use this option to synchronize only those entities that are in Ambari with LDAP.
Users will be removed from Ambari if they no longer exist in LDAP, and group membership
in Ambari will be updated to match LDAP.
All Users and Groups
ambari-server sync-ldap --all
This will import all entities with matching LDAP user and group object classes into
Ambari.
Optional: Encrypt Database and LDAP Passwords
By default the passwords to access the Ambari database and the LDAP server are stored in a plain text configuration file. To have those passwords encrypted, you need to run a special setup command.
Ambari Server should not be running when you do this: either make the edits before you start Ambari Server the first time or bring the server down to make the edits.
-
On the Ambari Server, run the special setup command and answer the prompts:
ambari-server setup-security
-
Select Option 2: Choose one of the following options:
-
[1] Enable HTTPS for Ambari server.
-
[2] Encrypt passwords stored in ambari.properties file.
-
[3] Setup Ambari kerberos JAAS configuration.
-
Provide a master key for encrypting the passwords. You are prompted to enter the key twice for accuracy.
If your passwords are encrypted, you need access to the master key to start Ambari Server.
-
You have three options for maintaining the master key:
-
Persist it to a file on the server by pressing y at the prompt.
-
Create an environment variable AMBARI_SECURITY_MASTER_KEY and set it to the key.
-
Provide the key manually at the prompt on server start up.
-
Start or restart the Server
ambari-server restart
-
Reset Encryption
There may be situations in which you want to:
-
Change the current master key, either because the key has been forgotten or because you want to change the current key as a part of a security routine.
Ambari Server should not be running when you do this.
Remove Encryption Entirely
To reset Ambari database and LDAP passwords to a completely unencrypted state:
-
On the Ambari host, open /etc/ambari-server/conf/ambari.properties with a text editor and set this property:
security.passwords.encryption.enabled=false
-
Delete /var/lib/ambari-server/keys/credentials.jceks
-
Delete /var/lib/ambari-server/keys/master
-
You must now reset the database password and, if necessary, the LDAP password. Run ambari-server setup and ambari-server setup-ldap again.
Change the Current Master Key
To change the master key:
-
If you know the current master key or if the current master key has been persisted:
-
Re-run the encryption setup command and follow the prompts.
ambari-server setup-security
-
Select Option 2: Choose one of the following options:
-
[1] Enable HTTPS for Ambari server.
-
[2] Encrypt passwords stored in ambari.properties file.
-
[3] Setup Ambari kerberos JAAS configuration.
-
Enter the current master key when prompted if necessary (if it is not persisted or set as an environment variable).
-
At the Do you want to reset Master Key prompt, enter yes.
-
At the prompt, enter the new master key and confirm.
-
If you do not know the current master key:
Optional: Set Up SSL for Ambari
Set Up HTTPS for Ambari Server
If you want to limit access to the Ambari Server to HTTPS connections, you need to provide a certificate. While it is possible to use a self-signed certificate for initial trials, it is not suitable for production environments. After your certificate is in place, you must run a special setup command.
Ambari Server should not be running when you do this. Either make these changes before you start Ambari the first time, or bring the server down before running the setup command.
-
Log into the Ambari Server host.
-
Locate your certificate. If you want to create a temporary self-signed certificate, use this as an example:
openssl genrsa -out $wserver.key 2048
openssl req -new -key $wserver.key -out $wserver.csr
openssl x509 -req -days 365 -in $wserver.csr -signkey $wserver.key -out $wserver.crt
Where $wserver is the Ambari Server host name.
The certificate you use must be PEM-encoded, not DER-encoded. If you attempt to use a DER-encoded certificate, you see the following error:
unable to load certificate
140109766494024:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:698:Expecting: TRUSTED CERTIFICATE
You can convert a DER-encoded certificate to a PEM-encoded certificate using the following command:
openssl x509 -in cert.crt -inform der -outform pem -out cert.pem
where cert.crt is the DER-encoded certificate and cert.pem is the resulting PEM-encoded certificate.
-
Run the special setup command and answer the prompts:
ambari-server setup-security
-
Select 1 for Enable HTTPS for Ambari server.
-
Respond y to Do you want to configure HTTPS ?
-
Select the port you want to use for SSL. The default port number is 8443.
-
Provide the path to your certificate and your private key. For example, put your certificate and private key in /etc/ambari-server/certs with root as the owner or the non-root user you designated during Ambari Server setup for the ambari-server daemon.
-
Provide the password for the private key.
-
Start or restart the Server
ambari-server restart
-
Optional: Set Up Kerberos for Ambari Server
When a cluster is enabled for Kerberos, the component REST endpoints (such as the
YARN ATS component) require SPNEGO authentication.
Depending on the Services in your cluster, Ambari Web needs access to these APIs.
As well, views such as the Jobs View and the Tez View need access to ATS. Therefore, the Ambari Server requires a Kerberos principal in
order to authenticate via SPNEGO against these APIs. This section describes how to
configure Ambari Server with a Kerberos principal and keytab to allow views to authenticate
via SPNEGO against cluster components.
-
Create a principal in your KDC for the Ambari Server. For example, using kadmin:
addprinc -randkey ambari-server@EXAMPLE.COM
-
Generate a keytab for that principal.
xst -k ambari.server.keytab ambari-server@EXAMPLE.COM
-
Place that keytab on the Ambari Server host.
/etc/security/keytabs/ambari.server.keytab
-
Stop the ambari server.
ambari-server stop
-
Run the setup-security command.
ambari-server setup-security
-
Select 3 for Setup Ambari kerberos JAAS configuration.
-
Enter the Kerberos principal name for the Ambari Server you set up earlier.
-
Enter the path to the keytab for the Ambari principal.
-
Restart Ambari Server.
ambari-server restart
Optional: Set Up Two-Way SSL Between Ambari Server and Ambari Agents
Two-way SSL provides a way to encrypt communication between Ambari Server and Ambari Agents. By default Ambari ships with Two-way SSL disabled. To enable Two-way SSL:
Ambari Server should not be running when you do this: either make the edits before you start Ambari Server the first time or bring the server down to make the edits.
-
On the Ambari Server host, open /etc/ambari-server/conf/ambari.properties with a text editor.
-
Add the following property:
security.server.two_way_ssl = true
-
Start or restart the Ambari Server.
ambari-server restart
The Agent certificates are downloaded automatically during Agent Registration.
Optional: Configure Ciphers and Protocols for Ambari Server
Ambari provides control of ciphers and protocols that are exposed via Ambari Server.
-
To disable specific ciphers, you can optionally add a list of the following format to ambari.properties. If you specify multiple ciphers, separate each cipher using a vertical bar |.
security.server.disabled.ciphers=TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA
-
To disable specific protocols, you can optionally add a list of the following format to ambari.properties. If you specify multiple protocols, separate each protocol using a vertical bar |.
security.server.disabled.protocols=SSL|SSLv2|SSLv3
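The two properties share the same vertical-bar-separated format. As a small sketch, the following Python helper (disabled_list_property is a hypothetical name, and the second cipher suite is included only to show the separator) assembles the lines to add to ambari.properties.

```python
def disabled_list_property(name, values):
    """Join the values with a vertical bar, as the property format requires."""
    return f"{name}={'|'.join(values)}"


cipher_line = disabled_list_property(
    "security.server.disabled.ciphers",
    # The second entry is illustrative; list the ciphers you actually want disabled.
    ["TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA", "TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA"],
)
protocol_line = disabled_list_property(
    "security.server.disabled.protocols", ["SSL", "SSLv2", "SSLv3"]
)
print(cipher_line)
print(protocol_line)
```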
Troubleshooting Ambari Deployments
Introduction: Troubleshooting Ambari Issues
The first step in troubleshooting any problem in an Ambari-deployed Hadoop cluster is reviewing the Ambari log files.
Find a recommended solution to a troubleshooting problem in one of the following sections:
Reviewing Ambari Log Files
Find files that log activity on an Ambari host in the following locations:
-
Ambari Server logs
<your.Ambari.server.host>/var/log/ambari-server/ambari-server.log
-
Ambari Agent logs
<your.Ambari.agent.host>/var/log/ambari-agent/ambari-agent.log
-
Ambari Action logs
<your.Ambari.agent.host>/var/lib/ambari-agent/data/
This location contains logs for all tasks executed on an Ambari agent host. Each log name includes:
-
command-N.json - the command file corresponding to a specific task.
-
output-N.txt - the output from the command execution.
-
errors-N.txt - error messages.
-
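Because the three files for a task share the same number N, it is often convenient to group them by task before reading them. The following Python sketch does that for a list of file names; the function name group_by_task is hypothetical, and the pattern assumes only the command-N.json, output-N.txt, and errors-N.txt naming described above.

```python
import re
from collections import defaultdict

# Matches the command-N, output-N, and errors-N file names described above.
LOG_NAME = re.compile(r"(command|output|errors)-(\d+)\.(?:json|txt)")


def group_by_task(filenames):
    """Map each task number N to the log files that belong to it."""
    tasks = defaultdict(dict)
    for filename in filenames:
        match = LOG_NAME.fullmatch(filename)
        if match:
            kind, task_id = match.groups()
            tasks[int(task_id)][kind] = filename
    return dict(tasks)


sample = ["command-12.json", "output-12.txt", "errors-12.txt", "output-7.txt", "notes.txt"]
for task_id, files in sorted(group_by_task(sample).items()):
    print(task_id, files)
```

On an agent host, the same function could be given os.listdir("/var/lib/ambari-agent/data") instead of the sample list.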
Resolving Ambari Installer Problems
Try the recommended solution for each of the following problems:
Problem: Browser crashed before Install Wizard completes
Your browser crashes or you accidentally close your browser before the Install Wizard completes.
Solution
The response to a browser closure depends on where you are in the process:
-
The browser closes before you press the Deploy button.
Re-launch the same browser and continue the install process. Using a different browser forces you to re-start the entire process.
-
The browser closes after you press Deploy, while or after the Install, Start, and Test screen opens.
Re-launch the same browser and continue the process, or log in again, using a different browser. When the Install, Start, and Test screen displays, proceed.
Problem: Install Wizard reports that the cluster install has failed
The Install, Start, and Test screen reports that the cluster install has failed.
Solution
The response to a report of install failure depends on the cause of the failure:
-
The failure is due to intermittent network connection errors during software package installs.
Use the Retry button on the Install, Start, and Test screen.
-
The failure is due to misconfiguration or other setup errors.
-
Use the left navigation bar to go back to the appropriate screen. For example, Customize Services.
-
Make your changes.
-
Continue in the normal way.
-
The failure occurs during the start/test sequence.
-
Click Next and Complete, then proceed to the Monitoring Dashboard.
-
Use the Services View to make your changes.
-
Re-start the service using Service Actions.
-
-
The failure is due to something else.
-
Open an SSH connection to the Ambari Server host.
-
Clear the database. At the command line, type:
ambari-server reset
-
Clear your browser cache.
-
Re-run the Install Wizard.
-
Problem: Ambari Agents May Fail to Register with Ambari Server.
When deploying HDP using Ambari 1.4.x or later on RHEL/CentOS 6.5, click the “Failed” link on the Confirm Hosts page in the Cluster Install wizard to display the Agent logs. The following log entry indicates the SSL connection between the Agent and Server failed during registration:
INFO 2014-04-02 04:25:22,669 NetUtil.py:55 - Failed to connect to https://{ambari-server}:8440/cert/ca due to [Errno 1] _ssl.c:492: error:100AE081:elliptic curve routines:EC_GROUP_new_by_curve_name:unknown group
For more detailed information about this OpenSSL issue, see https://bugzilla.redhat.com/show_bug.cgi?id=1025598
Solution:
In certain recent Linux distributions, such as RHEL/CentOS/Oracle Linux 6.x, the installed OpenSSL library (1.0.1 build 15) is affected by the OpenSSL issue referenced above. Upgrade the OpenSSL library on the affected hosts:
-
Check the OpenSSL library version installed on your host(s):
rpm -qa | grep openssl
openssl-1.0.1e-15.el6.x86_64
-
If the output reads openssl-1.0.1e-15.x86_64 (1.0.1 build 15), you must upgrade the OpenSSL library. To upgrade the OpenSSL library, run the following command:
yum upgrade openssl
-
Verify you have the newer version of OpenSSL (1.0.1 build 16):
rpm -qa | grep openssl
openssl-1.0.1e-16.el6.x86_64
-
Restart Ambari Agent(s) and click Retry -> Failed in the wizard user interface.
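When many hosts need checking, the version test in the steps above can be scripted. The sketch below parses a package string as printed by rpm -qa and flags the build named in this section. The function name needs_openssl_upgrade is hypothetical, and treating only 1.0.1e build 15 as affected is an assumption that follows the text here, not a statement about other builds.

```python
import re


def needs_openssl_upgrade(package):
    """Return True if the rpm package string is OpenSSL 1.0.1e build 15."""
    match = re.fullmatch(r"openssl-(\d+\.\d+\.\d+[a-z]?)-(\d+)\..*", package)
    if match is None:
        # Not an openssl package string in the expected form.
        return False
    version, build = match.group(1), int(match.group(2))
    # Assumption: only the build named in this section is flagged.
    return version == "1.0.1e" and build == 15


for package in ("openssl-1.0.1e-15.el6.x86_64", "openssl-1.0.1e-16.el6.x86_64"):
    print(package, needs_openssl_upgrade(package))
```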
Problem: The “yum install ambari-server” Command Fails
You are unable to get the initial install command to run.
Solution:
You may have incompatible versions of some software components in your environment. See Meet Minimum System Requirements in Installing HDP Using Ambari for more information, then make any necessary changes.
Problem: HDFS Smoke Test Fails
If your DataNodes are incorrectly configured, the smoke tests fail and you get this error message in the DataNode logs:
DisallowedDataNodeException
org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException
Solution:
-
Make sure that reverse DNS look-up is properly configured for all nodes in your cluster.
-
Make sure you have the correct FQDNs when specifying the hosts for your cluster. Do not use IP addresses - they are not supported.
-
Restart the installation process.
Problem: yum Fails on Free Disk Space Check
If you boot your Hadoop DataNodes with/as a ramdisk, you must disable the free space check for yum before doing the install. If you do not disable the free space check, yum will fail with the following error:
Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install unzip' returned 1. Error Downloading
Packages: unzip-6.0-1.el6.x86_64: Insufficient space in download directory /var/cache/yum/x86_64/6/base/packages
* free 0
* needed 149 k
Solution:
To disable the free space check, update the DataNode image with the following directive in /etc/yum.conf:
diskspacecheck=0
Problem: A service with a customized service user is not appearing properly in Ambari Web
You are unable to monitor or manage a service in Ambari Web when you have created
a customized service user name with a hyphen, for example, hdfs-user.
Solution
Hyphenated service user names are not supported. You must re-run the Ambari Install Wizard and create a different name.
Resolving Cluster Deployment Problems
Try the recommended solution for each of the following problems:
Problem: Trouble Starting Ambari on System Reboot
If you reboot your cluster, you must restart the Ambari Server and all the Ambari Agents manually.
Solution:
Log in to each machine in your cluster separately:
-
On the Ambari Server host machine:
ambari-server start
-
On each host in your cluster:
ambari-agent start
Problem: Metrics and Host information display incorrectly in Ambari Web
Charts appear incorrectly or not at all, or Host health status is displayed incorrectly.
Solution:
The clocks on all the hosts in your cluster and on the machine from which you browse to Ambari Web must be in sync with each other. The easiest way to ensure this is to enable NTP.
Problem: On SUSE 11 Ambari Agent crashes within the first 24 hours
SUSE 11 ships with Python version 2.6.0-8.12.2 which contains a known defect that causes this crash.
Solution:
Upgrade to Python version 2.6.8-0.15.1.
Problem: Attempting to Start HBase REST server causes either REST server or Ambari Web to fail
As an option you can start the HBase REST server manually after the install process is complete. It can be started on any host that has the HBase Master or the Region Server installed. If you install the REST server on the same host as the Ambari server, the http ports will conflict.
Solution
In starting the REST server, use the -p option to set a custom port.
Use the following command to start the REST server.
/usr/lib/hbase/bin/hbase-daemon.sh start rest -p <custom_port_number>
Problem: Multiple Ambari Agent processes are running, causing re-register
On a cluster host, ps aux | grep ambari-agent shows more than one agent process running. This causes Ambari Server to get incorrect
ids from the host and forces Agent to restart and re-register.
Solution
On the affected host, kill the processes and restart.
-
Kill the Agent processes and remove the Agent PID files found here: /var/run/ambari-agent/ambari-agent.pid.
-
Restart the Agent process:
ambari-agent start
Problem: Some graphs do not show a complete hour of data until the cluster has been running for an hour
When you start a cluster for the first time, some graphs, such as Services View > HDFS and Services View > MapReduce, do not plot a complete hour of data. Instead, they show data only for the length
of time the service has been running. Other graphs display the run of a complete hour.
Solution
Let the cluster run. After an hour all graphs will show a complete hour of data.
Problem: Ambari stops MySQL database during deployment, causing Ambari Server to crash.
The Hive Service uses MySQL Server by default. If you choose the MySQL Server on the Ambari Server host as the managed server for Hive, Ambari stops this database during deployment and crashes.
Solution
If you plan to use the default MySQL Server setup for Hive and use MySQL Server for Ambari - make sure that the two MySQL Server instances are different.
If you plan to use the same MySQL Server for Hive and Ambari - make sure to choose the existing database option for Hive.
Problem: Cluster Install Fails with Groupmod Error
The cluster fails to install with an error related to running groupmod. This can occur in environments where groups are managed in LDAP, and not on local
Linux machines.
You may see an error message similar to the following one:
Fail: Execution of 'groupmod hadoop' returned 10. groupmod: group 'hadoop' does not exist in /etc/group
Solution
When installing the cluster using the Cluster Installer Wizard, at the Customize Services step, select the Misc tab and choose the Skip group modifications during install option.
Problem: Host registration fails during Agent bootstrap on SLES due to timeout.
When using SLES and performing host registration using SSH, the Agent bootstrap may
fail due to timeout when running the setupAgent.py script. The host on which the timeout occurs will show the following process hanging:
c6401.ambari.apache.org:/etc/ # ps -ef | grep zypper
root 18318 18317 5 03:15 pts/1 00:00:00 zypper -q search -s --match-exact ambari-agent
Solution
-
If you have a repository registered that is prompting to accept keys, via user interaction, you may see the hang and timeout. In this case, run zypper refresh and confirm all repository keys are accepted for the zypper command to work without user interaction.
-
Another alternative is to perform manual Agent setup and not use SSH for host registration. This option does not require that Ambari call zypper without user interaction.
Problem: Host Check Fails if Transparent Huge Pages (THP) is not disabled.
When installing Ambari on RHEL/CentOS 6 using the Cluster Installer Wizard at the Host Checks step, one or more host checks may fail if you have not disabled Transparent Huge Pages on all hosts.
Host Checks will warn you when a failure occurs.
Solution
Disable THP. On all hosts,
-
Add the following commands to your /etc/rc.local file:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
  echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
  echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
fi
-
To confirm, reboot the host then run the following command:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
Resolving General Problems
Problem: During Enable Kerberos, the Check Kerberos operation fails.
When enabling Kerberos using the wizard, the Check Kerberos operation fails. In /var/log/ambari-server/ambari-server.log, you see a message:
02:45:44,490 WARN [qtp567239306-238] MITKerberosOperationHandler:384 - Failed to execute kadmin:
Solution 1:
Check that NTP is running and confirm your hosts and the KDC times are in sync. A time skew as little as 5 minutes can cause Kerberos authentication to fail.
Solution 2: (on RHEL/CentOS/Oracle Linux)
Check that the Kerberos Admin principal being used has the necessary KDC ACL rights
as set in /var/kerberos/krb5kdc/kadm5.acl.
Problem: Hive developers may encounter an exception error message during Hive Service Check
MySQL is the default database used by the Hive metastore. Depending on several factors, such as the version and configuration of MySQL, a Hive developer may see an exception message similar to the following one:
An exception was thrown while adding/validating classes) : Specified key was too long;
max key length is 767 bytes
Solution
Administrators can resolve this issue by altering the Hive metastore database to use
the Latin1 character set, as shown in the following example:
mysql> ALTER DATABASE <metastore.database.name> character set latin1;
Problem: API calls for PUT, POST, DELETE respond with a "400 - Bad Request"
When attempting to perform a REST API call, you receive a 400 error response. REST API calls require the "X-Requested-By" header.
Solution
Starting with Ambari 1.4.2, you must include the "X-Requested-By" header with the REST API calls.
For example, if using curl, include the -H "X-Requested-By: ambari" option.
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-host>:8080/api/v1/hosts/host1
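To avoid forgetting the header, the curl call can be wrapped in a small function. The sketch below is illustrative only: ambari_api, AMBARI_URL, AMBARI_CREDS, and DRY_RUN are hypothetical names, and the default URL is a placeholder. With DRY_RUN=1 the function prints the command rather than contacting a server.

```shell
# Hypothetical wrapper (assumed names, not an Ambari tool).
AMBARI_URL=${AMBARI_URL:-http://ambari.example.com:8080}
AMBARI_CREDS=${AMBARI_CREDS:-admin:admin}

ambari_api() {
    method=$1
    path=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command that would be run, without calling the server.
        echo "curl -u $AMBARI_CREDS -H 'X-Requested-By: ambari' -X $method $AMBARI_URL$path"
    else
        # Every request carries the X-Requested-By header.
        curl -u "$AMBARI_CREDS" -H "X-Requested-By: ambari" -X "$method" "$AMBARI_URL$path"
    fi
}

# Show the DELETE call from the example above without executing it.
DRY_RUN=1 ambari_api DELETE /api/v1/hosts/host1
```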
Ambari Reference Guide
Ambari Reference Topics
For more information about using Ambari 2.0, see the following topics:
Installing Ambari Agents Manually
Involves two steps:
Download the Ambari Repo
Select the OS family running on your installation host.
RHEL/CentOS/Oracle Linux 6
On a server host that has Internet access, use a command line editor to perform the following steps:
-
Log in to your host as root. For example, type:
ssh <username>@<fqdn>
sudo su -
where <username> is your user name and <fqdn> is the fully qualified domain name of your server host.
-
Download the Ambari repository file to a directory on your installation host.
wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.0.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
-
Confirm that the repository is configured by checking the repo list.
yum repolist
You should see values similar to the following for Ambari repositories in the list. Version values vary, depending on the installation.
repo id | repo name | status |
---|---|---|
AMBARI.2.0.0-2.x | Ambari 2.x | 8 |
base | CentOS-6 - Base | 6,518 |
extras | CentOS-6 - Extras | 37 |
updates | CentOS-6 - Updates | 785 |
-
Install the Ambari bits. This also installs the default PostgreSQL Ambari database.
yum install ambari-server
-
Enter y when prompted to confirm transaction and dependency checks. A successful installation displays output similar to the following:
Installing : postgresql-libs-8.4.20-1.el6_5.x86_64 1/4
Installing : postgresql-8.4.20-1.el6_5.x86_64 2/4
Installing : postgresql-server-8.4.20-1.el6_5.x86_64 3/4
Installing : ambari-server-2.0.0-147.noarch 4/4
Verifying : postgresql-server-8.4.20-1.el6_5.x86_64 1/4
Verifying : postgresql-libs-8.4.20-1.el6_5.x86_64 2/4
Verifying : postgresql-8.4.20-1.el6_5.x86_64 3/4
Verifying : ambari-server-2.0.0-147.noarch 4/4
Installed : ambari-server.noarch 0:2.0.0-59
Dependency
Installed : postgresql.x86_64 0:8.4.20-1.el6_5
postgresql-libs.x86_64 0:8.4.20-1.el6_5
postgresql-server.x86_64 0:8.4.20-1.el6_5
Complete!
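The repository check in the steps above can also be scripted. The following is a minimal sketch, assuming the repo id begins with "AMBARI" as in the sample listing; has_ambari_repo is a hypothetical helper that reads listing text from standard input.

```shell
# Hypothetical check: succeeds when the text on stdin has a line that
# starts with "ambari" (matched case-insensitively).
has_ambari_repo() {
    grep -qi '^ambari'
}

# Sample text is used here; on a real host you would pipe in the output
# of: yum repolist
printf 'AMBARI.2.0.0-2.x  Ambari 2.x  8\nbase  CentOS-6 - Base  6,518\n' \
    | has_ambari_repo && echo "Ambari repo configured"
```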
SLES 11
On a server host that has Internet access, use a command line editor to perform the following steps:
-
Log in to your host as root. For example, type:
ssh <username>@<fqdn>
sudo su -
where <username> is your user name and <fqdn> is the fully qualified domain name of your server host.
-
Download the Ambari repository file to a directory on your installation host.
wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.0.0/ambari.repo -O /etc/zypp/repos.d/ambari.repo
-
Confirm the downloaded repository is configured by checking the repo list.
zypper repos
You should see the Ambari repositories in the list. Version values vary, depending on the installation.
Alias | Name | Enabled | Refresh |
---|---|---|---|
AMBARI.2.0.0-2.x | Ambari 2.x | Yes | No |
http-demeter.uni-regensburg.de-c997c8f9 | SUSE-Linux-Enterprise-Software-Development-Kit-11-SP1 11.1.1-1.57 | Yes | Yes |
opensuse | OpenSuse | Yes | Yes |
-
Install the Ambari bits. This also installs PostgreSQL.
zypper install ambari-server
-
Enter y when prompted to confirm transaction and dependency checks. A successful installation displays output similar to the following:
Retrieving package postgresql-libs-8.3.5-1.12.x86_64 (1/4), 172.0 KiB (571.0 KiB unpacked)
Retrieving: postgresql-libs-8.3.5-1.12.x86_64.rpm [done (47.3 KiB/s)]
Installing: postgresql-libs-8.3.5-1.12 [done]
Retrieving package postgresql-8.3.5-1.12.x86_64 (2/4), 1.0 MiB (4.2 MiB unpacked)
Retrieving: postgresql-8.3.5-1.12.x86_64.rpm [done (148.8 KiB/s)]
Installing: postgresql-8.3.5-1.12 [done]
Retrieving package postgresql-server-8.3.5-1.12.x86_64 (3/4), 3.0 MiB (12.6 MiB unpacked)
Retrieving: postgresql-server-8.3.5-1.12.x86_64.rpm [done (452.5 KiB/s)]
Installing: postgresql-server-8.3.5-1.12 [done]
Updating etc/sysconfig/postgresql...
Retrieving package ambari-server-2.0.0-59.noarch (4/4), 99.0 MiB (126.3 MiB unpacked)
Retrieving: ambari-server-2.0.0-59.noarch.rpm [done (3.0 MiB/s)]
Installing: ambari-server-2.0.0-59 [done]
ambari-server 0:off 1:off 2:off 3:on 4:off 5:on 6:off
UBUNTU 12
On a server host that has Internet access, use a command line editor to perform the following steps:
-
Log in to your host as root. For example, type:
ssh <username>@<fqdn>
sudo su -
where <username> is your user name and <fqdn> is the fully qualified domain name of your server host.
-
Download the Ambari repository file to a directory on your installation host.
wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu12/2.x/updates/2.0.0/ambari.list -O /etc/apt/sources.list.d/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
-
Confirm that Ambari packages downloaded successfully by checking the package name list.
apt-cache pkgnames
You should see the Ambari packages in the list. Version values vary, depending on the installation.
Alias | Name |
---|---|
AMBARI-dev-2.x | Ambari 2.x |
-
Install the Ambari bits. This also installs PostgreSQL.
apt-get install ambari-server
RHEL/CentOS/ORACLE Linux 5 (DEPRECATED)
On a server host that has Internet access, use a command line editor to perform the following steps:
-
Log in to your host as root. For example, type:
ssh <username>@<fqdn>
sudo su -
where <username> is your user name and <fqdn> is the fully qualified domain name of your server host.
-
Download the Ambari repository file to a directory on your installation host.
wget -nv http://public-repo-1.hortonworks.com/ambari/centos5/2.x/updates/2.0.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
-
Confirm the repository is configured by checking the repo list.
yum repolist
You should see listed values similar to the following:
repo id | repo name | status |
---|---|---|
AMBARI.2.0.0-2.x | Ambari 2.x | 5 |
base | CentOS-5 - Base | 3,667 |
epel | Extra Packages for Enterprise Linux 5 - x86_64 | 7,614 |
puppet | Puppet | 433 |
updates | CentOS-5 - Updates | 118 |
-
Install the Ambari bits. This also installs PostgreSQL.
yum install ambari-server
Install the Ambari Agents Manually
Use the instructions specific to the OS family running on your agent hosts.
RHEL/CentOS/Oracle Linux 6
-
Install the Ambari Agent on every host in your cluster.
yum install ambari-agent
-
Using a text editor, configure the Ambari Agent by editing the ambari-agent.ini file as shown in the following example:
vi /etc/ambari-agent/conf/ambari-agent.ini
[server]
hostname=<your.ambari.server.hostname>
url_port=8440
secured_url_port=8441
-
Start the agent on every host in your cluster.
ambari-agent start
The agent registers with the Server on start.
SLES 11
-
Install the Ambari Agent on every host in your cluster.
zypper install ambari-agent
-
Configure the Ambari Agent by editing the ambari-agent.ini file as shown in the following example:
vi /etc/ambari-agent/conf/ambari-agent.ini
[server]
hostname=<your.ambari.server.hostname>
url_port=8440
secured_url_port=8441
-
Start the agent on every host in your cluster.
ambari-agent start
The agent registers with the Server on start.
UBUNTU 12
-
Install the Ambari Agent on every host in your cluster.
apt-get install ambari-agent
-
Configure the Ambari Agent by editing the ambari-agent.ini file as shown in the following example:
vi /etc/ambari-agent/conf/ambari-agent.ini
[server]
hostname=<your.ambari.server.hostname>
url_port=8440
secured_url_port=8441
-
Start the agent on every host in your cluster.
ambari-agent start
The agent registers with the Server on start.
RHEL/CentOS/Oracle Linux 5 (DEPRECATED)
-
Install the Ambari Agent on every host in your cluster.
yum install ambari-agent
-
Using a text editor, configure the Ambari Agent by editing the ambari-agent.ini file as shown in the following example:
vi /etc/ambari-agent/conf/ambari-agent.ini
[server]
hostname=<your.ambari.server.hostname>
url_port=8440
secured_url_port=8441
-
Start the agent on every host in your cluster.
ambari-agent start
The agent registers with the Server on start.
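When many hosts need the same edit, the hostname change can be scripted instead of made in vi. The following is a minimal sketch under stated assumptions: set_agent_server is a hypothetical helper, the server name is a placeholder, and the demonstration edits a temporary copy rather than the real /etc/ambari-agent/conf/ambari-agent.ini.

```shell
# Hypothetical helper: rewrite the "hostname=" line of an agent config.
# Lines such as "hostname_script=" are not matched by the pattern.
set_agent_server() {
    ini_file=$1
    server=$2
    # Write to a temporary file first, then replace the original.
    sed "s/^hostname=.*/hostname=$server/" "$ini_file" > "$ini_file.tmp" \
        && mv "$ini_file.tmp" "$ini_file"
}

# Demonstration on a throwaway file.
sample=$(mktemp)
printf '[server]\nhostname=localhost\nurl_port=8440\nsecured_url_port=8441\n' > "$sample"
set_agent_server "$sample" ambari.example.com
cat "$sample"
rm -f "$sample"
```

The agent must still be started, or restarted, after the file changes.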
Configuring Ambari for Non-Root
In most secure environments, restricting access to and limiting services that run as root is a hard requirement. For these environments, Ambari can be configured to operate without direct root access. Both Ambari Server and Ambari Agent components allow for non-root operation, and the following sections will walk you through the process.
How to Configure Ambari Server for Non-Root
You can configure the Ambari Server to run as a non-root user. During the ambari-server setup process, when prompted to Customize user account for ambari-server daemon?, choose y. The setup process prompts you for the appropriate, non-root user to run the Ambari Server as; for example: ambari.
How to Configure an Ambari Agent for Non-Root
You can configure the Ambari Agent to run as a non-privileged user as well. That user requires specific sudo access in order to su to Hadoop service accounts and perform specific privileged commands. Configuring Ambari Agents to run as non-root requires that you manually install agents on all nodes in the cluster. For these details, see Installing Ambari Agents Manually. After installing each agent, you must configure the agent to run as the desired, non-root user. This example uses the ambari user.
Change the run_as_user property in the /etc/ambari-agent/conf/ambari-agent.ini file, as illustrated below:
run_as_user=ambari
Once this change has been made, the ambari-agent must be restarted to begin running
as the non-root user.
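Before restarting agents across the cluster, it can help to confirm the property was actually written. This is a sketch only: get_ini_value is a hypothetical helper, and the sample file (including its [agent] section header) is an assumption standing in for the real ambari-agent.ini.

```shell
# Hypothetical helper: print the value of the first "key=value" line.
get_ini_value() {
    ini_file=$1
    key=$2
    sed -n "s/^$key=//p" "$ini_file" | head -n 1
}

# Demonstration on a throwaway file; the real file is
# /etc/ambari-agent/conf/ambari-agent.ini.
sample=$(mktemp)
printf '[agent]\nrun_as_user=ambari\n' > "$sample"
if [ "$(get_ini_value "$sample" run_as_user)" = "ambari" ]; then
    echo "agent is configured to run as ambari"
fi
rm -f "$sample"
```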
The non-root functionality relies on sudo to run specific commands that require elevated privileges as defined in the Sudoer Configuration. The sudo configuration is split into four sections: Customizable Users, Non-Customizable Users, Commands, and Sudo Defaults.
Sudoer Configuration
The Customizable Users, Non-Customizable Users, Commands, and Sudo Defaults sections cover how sudo should be configured to enable Ambari to run as a non-root user. Each of the sections includes the specific sudo entries that should be placed in /etc/sudoers by running the visudo command.
Customizable Users
This section contains the su commands and corresponding Hadoop service accounts that are configurable on install:
# Ambari Customizable Users
ambari ALL=(ALL) NOPASSWD:SETENV: /bin/su hdfs *, /bin/su zookeeper *, /bin/su knox *,/bin/su falcon *,/bin/su flume *,/bin/su hbase *,/bin/su hive *, /bin/su hcat *,/bin/su kafka *,/bin/su mapred *,/bin/su oozie *,/bin/su sqoop *,/bin/su storm *,/bin/su tez *,/bin/su yarn *,/bin/su ams *, /bin/su ambari-qa *, /bin/su spark *, /bin/su ranger *
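If you customized the service account names, the entry has to list your names instead of the defaults. The sketch below shows one way to generate such a line; su_rule is a hypothetical helper, and its output should still be reviewed and added through visudo.

```shell
# Hypothetical generator for a "Customizable Users" style entry.
# $1 is the agent user; the remaining arguments are service accounts.
su_rule() {
    agent_user=$1
    shift
    rule=""
    for account in "$@"; do
        # Join entries with ", " and skip the separator before the first.
        rule="${rule:+$rule, }/bin/su $account *"
    done
    echo "$agent_user ALL=(ALL) NOPASSWD:SETENV: $rule"
}

su_rule ambari hdfs yarn hive
```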
Non-Customizable Users
This section contains the su commands for the system accounts that cannot be modified:
# Ambari Non-Customizable Users
ambari ALL=(ALL) NOPASSWD:SETENV: /bin/su mysql *
Commands
This section contains the specific commands that must be issued for standard agent operations:
# Ambari Commands
ambari ALL=(ALL) NOPASSWD:SETENV: /usr/bin/yum,/usr/bin/zypper,/usr/bin/apt-get, /bin/mkdir, /bin/ln,/bin/chown, /bin/chmod, /bin/chgrp, /usr/sbin/groupadd, /usr/sbin/groupmod,/usr/sbin/useradd, /usr/sbin/usermod, /bin/cp, /bin/sed, /bin/mv, /bin/rm, /bin/kill,/usr/bin/unzip, /bin/tar, /usr/bin/hdp-select, /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh,/usr/lib/hadoop/bin/hadoop-daemon.sh, /usr/lib/hadoop/sbin/hadoop-daemon.sh, /usr/sbin/service mysql *,/sbin/service mysqld *, /sbin/service mysql *, /sbin/chkconfig gmond off,/sbin/chkconfig gmetad off, /etc/init.d/httpd *, /sbin/service hdp-gmetad start, /sbin/service hdp-gmond start, /usr/bin/tee, /usr/sbin/gmond, /usr/sbin/update-rc.d ganglia-monitor *, /usr/sbin/update-rc.d gmetad *, /etc/init.d/apache2 *, /usr/sbin/service hdp-gmond *, /usr/sbin/service hdp-gmetad *, /usr/bin/test, /bin/touch, /usr/bin/stat, /usr/sbin/setenforce, /usr/hdp/current/ranger-admin/setup.sh, /usr/hdp/current/ranger-usersync/setup.sh, /usr/bin/ranger-usersync-start, /usr/bin/ranger-usersync-stop
To re-iterate, you must do this sudo configuration on every node in the cluster. To ensure that the configuration has been done properly, you can su to the ambari user and run sudo -l. There, you can double check that there are no warnings, and that the configuration output matches what was just applied.
Sudo Defaults
Some versions of sudo have a default configuration that prevents sudo from being invoked from a non-interactive shell. In order for the agent to run its commands non-interactively, some defaults need to be overridden.
Defaults exempt_group = ambari
Defaults !env_reset,env_delete-=PATH
Defaults: ambari !requiretty
To re-iterate, this sudo configuration must be done on every node in the cluster. To ensure that the configuration has been done properly, you can su to the ambari user and run sudo -l. There, you can double-check that there are no warnings, and that the configuration output matches what was just applied.
Customizing HDP Services
Defining Service Users and Groups for an HDP 2.x Stack
The individual services in Hadoop run under the ownership of their respective Unix accounts. These accounts are known as service users. These service users belong to a special Unix group. "Smoke Test" is a service user dedicated specifically for running smoke tests on components during installation using the Services View of the Ambari Web GUI. You can also run service checks as the "Smoke Test" user on-demand after installation. You can customize any of these users and groups using the Misc tab during the Customize Services installation step.
If you choose to customize names, Ambari checks to see if these custom accounts already exist. If they do not exist, Ambari creates them. The default accounts are always created during installation whether or not custom accounts are specified. These default accounts are not used and can be removed post-install.
Service Users
Service* | Component | Default User Account |
---|---|---|
Ambari Metrics | Metrics Collector, Metrics Monitor | ams |
Falcon | Falcon Server | falcon (Falcon is available with HDP 2.1 or 2.2 Stack.) |
Flume | Flume Agents | flume |
HBase | MasterServer, RegionServer | hbase |
HDFS | NameNode, SecondaryNameNode, DataNode | hdfs |
Hive | Hive Metastore, HiveServer2 | hive |
Kafka | Kafka Broker | kafka |
Knox | Knox Gateway | knox |
MapReduce2 | HistoryServer | mapred |
Oozie | Oozie Server | oozie |
PostgreSQL | PostgreSQL (with Ambari Server) | postgres (Created as part of installing the default PostgreSQL database with Ambari Server. If you are not using the Ambari PostgreSQL database, this user is not needed.) |
Ranger | Ranger Admin, Ranger Usersync | ranger (Ranger is available with HDP 2.2 Stack.) |
Spark | Spark History Server | spark (Spark is available with HDP 2.2 Stack.) |
Sqoop | Sqoop | sqoop |
Storm | Masters (Nimbus, DRPC Server, Storm REST API Server, Storm UI Server), Slaves (Supervisors, Logviewers) | storm (Storm is available with HDP 2.1 or 2.2 Stack.) |
Tez | Tez clients | tez (Tez is available with HDP 2.1 or 2.2 Stack.) |
WebHCat | WebHCat Server | hcat |
YARN | NodeManager, ResourceManager | yarn |
ZooKeeper | ZooKeeper | zookeeper |
*For all components, the Smoke Test user performs smoke tests against cluster services as part of the install process. It also can perform these on-demand, from the Ambari Web UI. The default user account for the smoke test user is ambari-qa.
Service Groups
Service | Components | Default Group Account |
---|---|---|
All | All | hadoop |
Knox | Knox Gateway | knox |
Ranger | Ranger Admin, Ranger Usersync | ranger |
Spark | Spark History Server | spark |
Setting Properties That Depend on Service Usernames/Groups
Some properties must be set to match specific service user names or service groups. If you have set up non-default, customized service user names for the HDFS or HBase service or the Hadoop group name, you must edit the following properties, using Services > Service.Name > Configs > Advanced:
HDFS Settings: Advanced
Property Name | Value |
---|---|
dfs.permissions.superusergroup | The same as the HDFS username. The default is "hdfs". |
dfs.cluster.administrators | A single space followed by the HDFS username. |
dfs.block.local-path-access.user | The HBase username. The default is "hbase". |
MapReduce Settings: Advanced
Property Name | Value |
---|---|
mapreduce.cluster.administrators | A single space followed by the Hadoop group name. |
Configuring Storm for Supervision
If you have installed a cluster with HDP 2.2 Stack that includes the Storm service, you can configure the Storm components to operate under supervision. This section describes those steps:
-
Stop all Storm components.
Using Ambari Web, browse to Services > Storm > Service Actions, choose Stop. Wait until the Storm service stop completes.
-
Stop Ambari Server.
ambari-server stop
-
Change Supervisor and Nimbus command scripts in the Stack definition.
On Ambari Server host, run:
sed -ir "s/scripts\/supervisor.py/scripts\/supervisor_prod.py/g" /var/lib/ambari-server/resources/common-services/STORM/0.9.1.2.1/metainfo.xml
sed -ir "s/scripts\/nimbus.py/scripts\/nimbus_prod.py/g" /var/lib/ambari-server/resources/common-services/STORM/0.9.1.2.1/metainfo.xml
-
Install supervisord on all Nimbus and Supervisor hosts.
-
Install EPEL repository.
yum install epel-release -y
-
Install supervisor package for supervisord.
yum install supervisor -y
-
Enable supervisord on autostart.
chkconfig supervisord on
-
Change supervisord configuration file permissions.
chmod 600 /etc/supervisord.conf
-
-
Configure supervisord to supervise Nimbus Server and Supervisors by appending the following to /etc/supervisord.conf on all Supervisor hosts and Nimbus hosts accordingly.
[program:storm-nimbus]
command=env PATH=$PATH:/bin:/usr/bin/:/usr/jdk64/jdk1.7.0_67/bin/ JAVA_HOME=/usr/jdk64/jdk1.7.0_67 /usr/hdp/current/storm-nimbus/bin/storm nimbus
user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/nimbus.out
logfile_maxbytes=20MB
logfile_backups=10
[program:storm-supervisor]
command=env PATH=$PATH:/bin:/usr/bin/:/usr/jdk64/jdk1.7.0_67/bin/ JAVA_HOME=/usr/jdk64/jdk1.7.0_67 /usr/hdp/current/storm-supervisor/bin/storm supervisor
user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/supervisor.out
logfile_maxbytes=20MB
logfile_backups=10
-
Start Supervisord service on all Supervisor and Nimbus hosts.
service supervisord start
-
Start Ambari Server.
ambari-server start
-
Start all the other Storm components.
Using Ambari Web, browse to Services > Storm > Service Actions, choose Start.
Using Custom Host Names
You can customize the agent registration host name and the public host name used for each host in Ambari. Use this capability when "hostname" does not return the public network host name for your machines.
How to Customize the name of a host
-
At the Install Options step in the Cluster Installer wizard, select Perform Manual Registration for Ambari Agents.
-
Install the Ambari Agents manually on each host, as described in Install the Ambari Agents Manually.
-
To echo the customized name of the host to which the Ambari agent registers, for every host, create a script like the following example, named /var/lib/ambari-agent/hostname.sh. Be sure to chmod the script so it is executable by the Agent.
#!/bin/sh
echo <ambari_hostname>
where <ambari_hostname> is the host name to use for Agent registration.
-
Open /etc/ambari-agent/conf/ambari-agent.ini on every host, using a text editor.
-
Add to the [agent] section the following line:
hostname_script=/var/lib/ambari-agent/hostname.sh
where /var/lib/ambari-agent/hostname.sh is the name of your custom echo script.
-
To generate a public host name for every host, create a script like the following example, named /var/lib/ambari-agent/public_hostname.sh, to show the name for that host in the UI. Be sure to chmod the script so it is executable by the Agent.
#!/bin/sh
<hostname> -f
where <hostname> is the host name to use for Agent registration.
-
Open /etc/ambari-agent/conf/ambari-agent.ini on every host, using a text editor.
-
Add to the [agent] section the following line:
public_hostname_script=/var/lib/ambari-agent/public_hostname.sh
-
If applicable, add the host names to /etc/hosts on every host.
-
Restart the Agent on every host for these changes to take effect.
ambari-agent restart
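Creating the echo script and setting its permissions can be combined into one step. The following is a minimal sketch, assuming a fixed name per host: write_hostname_script is a hypothetical helper, node1.example.com is a placeholder, and the demonstration writes to a temporary directory instead of /var/lib/ambari-agent.

```shell
# Hypothetical helper: write a script that echoes a fixed host name
# and mark it executable.
write_hostname_script() {
    script_path=$1
    name=$2
    printf '#!/bin/sh\necho %s\n' "$name" > "$script_path"
    chmod 755 "$script_path"
}

# Demonstration in a throwaway directory.
demo_dir=$(mktemp -d)
write_hostname_script "$demo_dir/hostname.sh" node1.example.com
sh "$demo_dir/hostname.sh"
rm -rf "$demo_dir"
```

Mode 755 is one reasonable choice; any mode that lets the Agent user execute the script satisfies the requirement above.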
Moving the Ambari Server
To transfer an Ambari Server that uses the default PostgreSQL database to a new host, use the following instructions:
-
Back up all current data - from the original Ambari Server and MapReduce databases.
-
Update all Agents - to point to the new Ambari Server.
-
Install the New Server - on a new host and populate databases with information from original Server.
Back up Current Data
-
Stop the original Ambari Server.
ambari-server stop
-
Create a directory to hold the database backups.
cd /tmp
mkdir dbdumps
cd dbdumps/
-
Create the database backups.
pg_dump -U <AMBARI.SERVER.USERNAME> ambari > ambari.sql
Password: <AMBARI.SERVER.PASSWORD>
pg_dump -U <MAPRED.USERNAME> ambarirca > ambarirca.sql
Password: <MAPRED.PASSWORD>
where <AMBARI.SERVER.USERNAME>, <MAPRED.USERNAME>, <AMBARI.SERVER.PASSWORD>, and <MAPRED.PASSWORD> are the user names and passwords that you set up during installation. Default values are: ambari-server/bigdata and mapred/mapred.
Update Agents
-
On each agent host, stop the agent.
ambari-agent stop
-
Remove old agent certificates.
rm /var/lib/ambari-agent/keys/*
-
Using a text editor, edit /etc/ambari-agent/conf/ambari-agent.ini to point to the new host.
[server]
hostname=<NEW FULLY.QUALIFIED.DOMAIN.NAME>
url_port=8440
secured_url_port=8441
Install the New Server and Populate the Databases
-
Install the Server on the new host.
-
Stop the Server so that you can copy the old database data to the new Server.
ambari-server stop
-
Restart the PostgreSQL instance.
service postgresql restart
-
Open the PostgreSQL interactive terminal.
su - postgres
psql
-
Using the interactive terminal, drop the databases created by the fresh install.
drop database ambari;
drop database ambarirca;
-
Check to make sure the databases have been dropped.
\list
The databases should not be listed.
-
Create new databases to hold the transferred data.
create database ambari;
create database ambarirca;
-
Exit the interactive terminal.
^d
-
Copy the saved data from Back up Current Data to the new Server.
cd /tmp
scp -i <ssh-key> root@<original.Ambari.Server>:/tmp/dbdumps/*.sql /tmp
psql -d ambari -f /tmp/ambari.sql
psql -d ambarirca -f /tmp/ambarirca.sql
-
Start the new Server.
<exit to root>
ambari-server start
-
On each Agent host, start the Agent.
ambari-agent start
-
Open Ambari Web. Point your browser to:
<new.Ambari.Server>:8080
-
Go to Services > MapReduce and use the Management Header to Stop and Start the MapReduce service.
-
Start other services as necessary.
The new Server is ready to use.
Configuring LZO Compression
LZO is a lossless data compression library that favors speed over compression ratio. Ambari neither installs nor enables LZO compression by default. To enable LZO compression in your HDP cluster, you must Configure core-site.xml for LZO.
Optionally, you can implement LZO to optimize Hive queries in your cluster for speed. For more information about using LZO compression with Hive, see Running Compression with Hive Queries.
Configure core-site.xml for LZO
-
Browse to Ambari Web > Services > HDFS > Configs, then expand Advanced core-site.
-
Find the io.compression.codecs property key.
-
Append to the io.compression.codecs property key, the following value:
com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec
-
Add a description of the config modification, then choose Save.
-
Expand the Custom core-site.xml section.
-
Select Add Property.
-
Add to Custom core-site.xml the following property key and value:
Property Key | Property Value |
---|---|
io.compression.codec.lzo.class | com.hadoop.compression.lzo.LzoCodec |
-
Choose Save.
-
Add a description of the config modification, then choose Save.
-
Restart the HDFS, MapReduce2 and YARN services.
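The append step in the procedure above treats io.compression.codecs as a comma-separated list. The following sketch shows that list handling in shell, including a guard against adding a codec twice. append_codec is a hypothetical helper, and the starting value is an assumption for illustration; your cluster's current value may differ.

```shell
# Hypothetical helper: append a codec to a comma-separated list unless
# it is already present.
append_codec() {
    current=$1
    codec=$2
    case ",$current," in
        *",$codec,"*) echo "$current" ;;         # already in the list
        ,,)           echo "$codec" ;;           # list was empty
        *)            echo "$current,$codec" ;;
    esac
}

# Assumed starting value, for illustration only.
codecs="org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec"
codecs=$(append_codec "$codecs" com.hadoop.compression.lzo.LzoCodec)
codecs=$(append_codec "$codecs" com.hadoop.compression.lzo.LzopCodec)
echo "$codecs"
```

The resulting string is what you would paste into the property field in Ambari Web.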
Running Compression with Hive Queries
Running Compression with Hive Queries requires creating LZO files. To create LZO files, use one of the following procedures:
Create LZO Files
-
Create LZO files as the output of the Hive query.
-
Use lzop command utility or your custom Java to generate lzo.index for the .lzo files.
Hive Query Parameters
Prefix the query string with these parameters:
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true
For example:
hive -e "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;SET
hive.exec.compress.output=true;SET mapreduce.output.fileoutputformat.compress=true;"
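Prefixing every query by hand is error-prone, so the three SET statements can be kept in one place. The sketch below only builds the string; with_lzo_output is a hypothetical helper and the SELECT statement is a placeholder query.

```shell
# Hypothetical helper: prefix a Hive query with the three LZO output
# settings listed above.
with_lzo_output() {
    query=$1
    printf '%s\n' "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;SET hive.exec.compress.output=true;SET mapreduce.output.fileoutputformat.compress=true;$query"
}

# Print the full string; to run it you would pass it to: hive -e
with_lzo_output "SELECT 1;"
```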
Write Custom Java to Create LZO Files
-
Create text files as the output of the Hive query.
-
Write custom Java code to
-
convert Hive query generated text files to .lzo files
-
generate lzo.index files for the .lzo files
-
Hive Query Parameters
Prefix the query string with these parameters:
SET hive.exec.compress.output=false
SET mapreduce.output.fileoutputformat.compress=false
For example:
hive -e "SET hive.exec.compress.output=false;SET mapreduce.output.fileoutputformat.compress=false;<query-string>"
Using Non-Default Databases
Use the following instructions to prepare a non-default database for Ambari, Hive/HCatalog, or Oozie. You must complete these instructions before you set up the Ambari Server by running ambari-server setup.
Using Non-Default Databases - Ambari
The following sections describe how to use Ambari with an existing database, other than the embedded PostgreSQL database instance that Ambari Server uses by default.
Using Ambari with Oracle
To set up Oracle for use with Ambari:
-
On the Ambari Server host, install the appropriate JDBC .jar file.
-
Download the Oracle JDBC (OJDBC) driver from http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html.
-
Select Oracle Database 11g Release 2 - ojdbc6.jar.
-
Copy the .jar file to the Java share directory.
cp ojdbc6.jar /usr/share/java
-
Make sure the .jar file has the appropriate permissions - 644.
-
-
Create a user for Ambari and grant that user appropriate permissions.
For example, using the Oracle database admin utility, run the following commands:
# sqlplus sys/root as sysdba
CREATE USER <AMBARIUSER> IDENTIFIED BY <AMBARIPASSWORD> default tablespace "USERS" temporary tablespace "TEMP";
GRANT unlimited tablespace to <AMBARIUSER>;
GRANT create session to <AMBARIUSER>;
GRANT create TABLE to <AMBARIUSER>;
GRANT create SEQUENCE to <AMBARIUSER>;
QUIT;
Where <AMBARIUSER> is the Ambari user name and <AMBARIPASSWORD> is the Ambari user password.
-
Load the Ambari Server database schema.
-
You must pre-load the Ambari database schema into your Oracle database using the schema script.
sqlplus <AMBARIUSER>/<AMBARIPASSWORD> < Ambari-DDL-Oracle-CREATE.sql
-
Find the Ambari-DDL-Oracle-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after you have installed Ambari Server.
-
-
When setting up the Ambari Server, select Advanced Database Configuration > Option [2] Oracle and respond to the prompts using the username/password credentials you created in step 2.
Using Ambari with MySQL
To set up MySQL for use with Ambari:
-
On the Ambari Server host, install the connector.
-
Install the connector.
RHEL/CentOS/Oracle Linux
yum install mysql-connector-java
SLES
zypper install mysql-connector-java
UBUNTU
apt-get install mysql-connector-java
-
Confirm that the .jar file is in the Java share directory.
ls /usr/share/java/mysql-connector-java.jar
-
Make sure the .jar file has the appropriate permissions - 644.
-
-
Create a user for Ambari and grant it permissions.
-
For example, using the MySQL database admin utility:
# mysql -u root -p
CREATE USER '<AMBARIUSER>'@'%' IDENTIFIED BY '<AMBARIPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<AMBARIUSER>'@'%';
CREATE USER '<AMBARIUSER>'@'localhost' IDENTIFIED BY '<AMBARIPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<AMBARIUSER>'@'localhost';
CREATE USER '<AMBARIUSER>'@'<AMBARISERVERFQDN>' IDENTIFIED BY '<AMBARIPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<AMBARIUSER>'@'<AMBARISERVERFQDN>';
FLUSH PRIVILEGES;
-
Where <AMBARIUSER> is the Ambari user name, <AMBARIPASSWORD> is the Ambari user password, and <AMBARISERVERFQDN> is the Fully Qualified Domain Name of the Ambari Server host.
-
-
Load the Ambari Server database schema.
-
You must pre-load the Ambari database schema into your MySQL database using the schema script.
mysql -u <AMBARIUSER> -p
CREATE DATABASE <AMBARIDATABASE>;
USE <AMBARIDATABASE>;
SOURCE Ambari-DDL-MySQL-CREATE.sql;
-
Where <AMBARIUSER> is the Ambari user name and <AMBARIDATABASE> is the Ambari database name.
Find the Ambari-DDL-MySQL-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after you have installed Ambari Server.
-
-
When setting up the Ambari Server, select Advanced Database Configuration > Option [3] MySQL and enter the credentials you defined in Step 2 for user name, password, and database name.
Using Ambari with PostgreSQL
To set up PostgreSQL for use with Ambari:
-
Create a user for Ambari and grant it permissions.
-
Using the PostgreSQL database admin utility:
# sudo -u postgres psql
CREATE DATABASE <AMBARIDATABASE>;
CREATE USER <AMBARIUSER> WITH PASSWORD '<AMBARIPASSWORD>';
GRANT ALL PRIVILEGES ON DATABASE <AMBARIDATABASE> TO <AMBARIUSER>;
\connect <AMBARIDATABASE>;
CREATE SCHEMA <AMBARISCHEMA> AUTHORIZATION <AMBARIUSER>;
ALTER SCHEMA <AMBARISCHEMA> OWNER TO <AMBARIUSER>;
ALTER ROLE <AMBARIUSER> SET search_path to '<AMBARISCHEMA>', 'public';
-
Where <AMBARIUSER> is the Ambari user name, <AMBARIPASSWORD> is the Ambari user password, <AMBARIDATABASE> is the Ambari database name, and <AMBARISCHEMA> is the Ambari schema name.
-
-
Load the Ambari Server database schema.
-
You must pre-load the Ambari database schema into your PostgreSQL database using the schema script.
# psql -U <AMBARIUSER> -d <AMBARIDATABASE>
\connect <AMBARIDATABASE>;
\i Ambari-DDL-Postgres-CREATE.sql;
-
Find the Ambari-DDL-Postgres-CREATE.sql file in the /var/lib/ambari-server/resources/ directory of the Ambari Server host after you have installed Ambari Server.
-
-
When setting up the Ambari Server, select Advanced Database Configuration > Option [4] PostgreSQL and enter the credentials you defined in Step 2 for user name, password, and database name.
Troubleshooting Ambari
Use these topics to help troubleshoot any issues you might have installing Ambari with an existing Oracle database.
Problem: Ambari Server Fails to Start: No Driver
Check /var/log/ambari-server/ambari-server.log for the following error:
Exception Description: Configuration error. Class [oracle.jdbc.driver.OracleDriver] not found.
The Oracle JDBC.jar file cannot be found.
Solution
Make sure the file is in the appropriate directory on the Ambari server and re-run ambari-server setup. Review the load database procedure appropriate for your database type in Using Non-Default Databases - Ambari.
Problem: Ambari Server Fails to Start: No Connection
Check /var/log/ambari-server/ambari-server.log for the following error:
The Network Adapter could not establish the connection Error Code: 17002
Ambari Server cannot connect to the database.
Solution
Confirm that the database host is reachable from the Ambari Server and is correctly configured by reading /etc/ambari-server/conf/ambari.properties.
server.jdbc.url=jdbc:oracle:thin:@oracle.database.hostname:1521/ambaridb
server.jdbc.rca.url=jdbc:oracle:thin:@oracle.database.hostname:1521/ambari
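To test reachability you first need the host and port out of the JDBC URL. This sketch extracts them from a thin-driver URL of the form shown above, using only shell parameter expansion. jdbc_host_port is a hypothetical helper and assumes the "@host:port/service" layout.

```shell
# Hypothetical helper: print "host port" from an Oracle thin JDBC URL
# of the form jdbc:oracle:thin:@host:port/service.
jdbc_host_port() {
    url=$1
    rest=${url#*@}       # drop everything up to and including "@"
    rest=${rest%%/*}     # drop "/service" and anything after it
    echo "${rest%%:*} ${rest##*:}"
}

jdbc_host_port "jdbc:oracle:thin:@oracle.database.hostname:1521/ambaridb"
```

The two values can then be passed to whatever TCP connectivity check you normally use from the Ambari Server host.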
Problem: Ambari Server Fails to Start: Bad Username
Check /var/log/ambari-server/ambari-server.log for the following error:
Internal Exception: java.sql.SQLException: ORA-01017: invalid username/password; logon denied
You are using an invalid username/password.
Solution
Confirm the user account is set up in the database and has the correct privileges. See Step 2 of the setup procedure for your database type in Using Non-Default Databases - Ambari.
Problem: Ambari Server Fails to Start: No Schema
Check /var/log/ambari-server/ambari-server.log for the following error:
Internal Exception: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
The schema has not been loaded.
Solution
Confirm you have loaded the database schema. Review the load database schema procedure appropriate for your database type in Using Non-Default Databases - Ambari.
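The four problems above are all identified the same way: by searching ambari-server.log for a known error string. The following is a minimal sketch of that check, assuming a bash shell on the Ambari Server host. The function name diagnose_ambari_log is hypothetical and is not part of Ambari; the search strings are taken from the error messages quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: diagnose_ambari_log is a hypothetical helper, not an Ambari tool.
# It searches a log file for the four error signatures quoted above.
diagnose_ambari_log() {
  local log="${1:-/var/log/ambari-server/ambari-server.log}"
  local matched=1   # exit status: becomes 0 once any signature is found

  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  if grep -qF 'OracleDriver] not found' "$log"; then
    echo "No Driver: stage the Oracle JDBC .jar file and re-run ambari-server setup"
    matched=0
  fi
  if grep -qF 'Error Code: 17002' "$log"; then
    echo "No Connection: check the database host and ambari.properties"
    matched=0
  fi
  if grep -qF 'invalid username/password' "$log"; then
    echo "Bad Username: check the database user account and its privileges"
    matched=0
  fi
  if grep -qF 'table or view does not exist' "$log"; then
    echo "No Schema: load the Ambari database schema"
    matched=0
  fi
  return "$matched"
}
```

Calling diagnose_ambari_log with no argument reads the default log path. The function returns 0 when at least one known signature is present, 1 when none is found, and 2 when the log cannot be read.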
Using Non-Default Databases - Hive
The following sections describe how to use Hive with an existing database, other than the MySQL database instance that Ambari installs by default.
Using Hive with Oracle
To set up Oracle for use with Hive:
-
On the Ambari Server host, stage the appropriate JDBC driver file for later deployment.
-
Download the Oracle JDBC (OJDBC) driver from http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html.
-
Select Oracle Database 11g Release 2 - ojdbc6.jar and download the file.
-
Make sure the .jar file has the appropriate permissions - 644.
-
Execute the following command, adding the path to the downloaded .jar file:
ambari-server setup --jdbc-db=oracle --jdbc-driver=/path/to/downloaded/ojdbc6.jar
-
-
Create a user for Hive and grant it permissions.
-
Using the Oracle database admin utility:
# sqlplus sys/root as sysdba
CREATE USER <HIVEUSER> IDENTIFIED BY <HIVEPASSWORD>;
GRANT SELECT_CATALOG_ROLE TO <HIVEUSER>;
GRANT CONNECT, RESOURCE TO <HIVEUSER>;
QUIT;
-
Where <HIVEUSER> is the Hive user name and <HIVEPASSWORD> is the Hive user password.
-
-
Load the Hive database schema.
-
For a HDP 2.2 Stack
-
For a HDP 2.1 Stack
You must pre-load the Hive database schema into your Oracle database using the schema script, as follows:
sqlplus <HIVEUSER>/<HIVEPASSWORD> < hive-schema-0.13.0.oracle.sql
Find the hive-schema-0.13.0.oracle.sql file in the /var/lib/ambari-server/resources/stacks/HDP/2.1/services/HIVE/etc/ directory of the Ambari Server host after you have installed Ambari Server.
-
For a HDP 2.0 Stack
You must pre-load the Hive database schema into your Oracle database using the schema script, as follows:
sqlplus <HIVEUSER>/<HIVEPASSWORD> < hive-schema-0.12.0.oracle.sql
Find the hive-schema-0.12.0.oracle.sql file in the /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/HIVE/etc/ directory of the Ambari Server host after you have installed Ambari Server.
-
For a HDP 1.3 Stack
You must pre-load the Hive database schema into your Oracle database using the schema script, as follows:
sqlplus <HIVEUSER>/<HIVEPASSWORD> < hive-schema-0.10.0.oracle.sql
Find the hive-schema-0.10.0.oracle.sql file in the /var/lib/ambari-server/resources/stacks/HDP/1.3.2/services/HIVE/etc/ directory of the Ambari Server host after you have installed Ambari Server.
-
Using Hive with MySQL
To set up MySQL for use with Hive:
-
On the Ambari Server host, stage the appropriate MySQL connector for later deployment.
-
Install the connector.
RHEL/CentOS/Oracle Linux
yum install mysql-connector-java*
SLES
zypper install mysql-connector-java*
Ubuntu
apt-get install mysql-connector-java*
-
Confirm that mysql-connector-java.jar is in the Java share directory.
ls /usr/share/java/mysql-connector-java.jar
-
Make sure the .jar file has the appropriate permissions - 644.
-
Execute the following command:
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
-
-
Create a user for Hive and grant it permissions.
-
Using the MySQL database admin utility:
# mysql -u root -p
CREATE USER '<HIVEUSER>'@'localhost' IDENTIFIED BY '<HIVEPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<HIVEUSER>'@'localhost';
CREATE USER '<HIVEUSER>'@'%' IDENTIFIED BY '<HIVEPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<HIVEUSER>'@'%';
CREATE USER '<HIVEUSER>'@'<HIVEMETASTOREFQDN>' IDENTIFIED BY '<HIVEPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<HIVEUSER>'@'<HIVEMETASTOREFQDN>';
FLUSH PRIVILEGES;
-
Where <HIVEUSER> is the Hive user name, <HIVEPASSWORD> is the Hive user password and <HIVEMETASTOREFQDN> is the Fully Qualified Domain Name of the Hive Metastore host.
-
-
Create the Hive database.
The Hive database must be created before loading the Hive database schema.
# mysql -u root -p
CREATE DATABASE <HIVEDATABASE>;
Where <HIVEDATABASE> is the Hive database name.
-
Load the Hive database schema.
-
For a HDP 2.2 Stack:
-
For a HDP 2.1 Stack:
You must pre-load the Hive database schema into your MySQL database using the schema script, as follows:
mysql -u root -p <HIVEDATABASE> < hive-schema-0.13.0.mysql.sql
Find the hive-schema-0.13.0.mysql.sql file in the /var/lib/ambari-server/resources/stacks/HDP/2.1/services/HIVE/etc/ directory of the Ambari Server host after you have installed Ambari Server.
-
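The user-creation step in the MySQL procedure above repeats the same CREATE USER and GRANT pair for three host patterns: localhost, the % wildcard, and the Hive Metastore host. As an illustration only, the following bash sketch generates those statements from three arguments. The function name hive_mysql_grants is hypothetical, and the sketch assumes the user name and password contain no single quotes.

```shell
#!/usr/bin/env bash
# Sketch only: hive_mysql_grants is a hypothetical helper that prints the SQL
# statements from the procedure above. It does not connect to MySQL.
hive_mysql_grants() {
  # usage: hive_mysql_grants HIVEUSER HIVEPASSWORD HIVEMETASTOREFQDN
  local user="${1:-}" pass="${2:-}" fqdn="${3:-}" host
  if [ -z "$user" ] || [ -z "$pass" ] || [ -z "$fqdn" ]; then
    echo "usage: hive_mysql_grants HIVEUSER HIVEPASSWORD HIVEMETASTOREFQDN" >&2
    return 1
  fi
  # One CREATE USER / GRANT pair per host pattern named in the procedure.
  for host in localhost % "$fqdn"; do
    printf "CREATE USER '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$pass"
    printf "GRANT ALL PRIVILEGES ON *.* TO '%s'@'%s';\n" "$user" "$host"
  done
  printf 'FLUSH PRIVILEGES;\n'
}
```

The output can be reviewed first and then piped to the client, for example hive_mysql_grants hive secret metastore.example.com | mysql -u root -p, where the user name, password, and host name shown are placeholders.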
Using Hive with PostgreSQL
To set up PostgreSQL for use with Hive:
-
On the Ambari Server host, stage the appropriate PostgreSQL connector for later deployment.
-
Install the connector.
RHEL/CentOS/Oracle Linux
yum install postgresql-jdbc*
SLES
zypper install -y postgresql-jdbc
-
Copy the connector .jar file to the Java share directory.
cp /usr/share/pgsql/postgresql-*.jdbc3.jar /usr/share/java/postgresql-jdbc.jar
-
Confirm that the .jar file is in the Java share directory.
ls /usr/share/java/postgresql-jdbc.jar
-
Change the access mode of the .jar file to 644.
chmod 644 /usr/share/java/postgresql-jdbc.jar
-
Execute the following command:
ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar
-
-
Create a user for Hive and grant it permissions.
-
Using the PostgreSQL database admin utility:
echo "CREATE DATABASE <HIVEDATABASE>;" | psql -U postgres
echo "CREATE USER <HIVEUSER> WITH PASSWORD '<HIVEPASSWORD>';" | psql -U postgres
echo "GRANT ALL PRIVILEGES ON DATABASE <HIVEDATABASE> TO <HIVEUSER>;" | psql -U postgres
-
Where <HIVEUSER> is the Hive user name, <HIVEPASSWORD> is the Hive user password and <HIVEDATABASE> is the Hive database name.
-
-
Load the Hive database schema.
-
For a HDP 2.2 Stack:
-
For a HDP 2.1 Stack:
You must pre-load the Hive database schema into your PostgreSQL database using the schema script, as follows:
# psql -U <HIVEUSER> -d <HIVEDATABASE>
\connect <HIVEDATABASE>;
\i hive-schema-0.13.0.postgres.sql;
Find the hive-schema-0.13.0.postgres.sql file in the /var/lib/ambari-server/resources/stacks/HDP/2.1/services/HIVE/etc/ directory of the Ambari Server host after you have installed Ambari Server.
-
For a HDP 2.0 Stack:
You must pre-load the Hive database schema into your PostgreSQL database using the schema script, as follows:
# sudo -u postgres psql
\connect <HIVEDATABASE>;
\i hive-schema-0.12.0.postgres.sql;
Find the hive-schema-0.12.0.postgres.sql file in the /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/HIVE/etc/ directory of the Ambari Server host after you have installed Ambari Server.
-
For a HDP 1.3 Stack:
You must pre-load the Hive database schema into your PostgreSQL database using the schema script, as follows:
# sudo -u postgres psql
\connect <HIVEDATABASE>;
\i hive-schema-0.10.0.postgres.sql;
Find the hive-schema-0.10.0.postgres.sql file in the /var/lib/ambari-server/resources/stacks/HDP/1.3.2/services/HIVE/etc/ directory of the Ambari Server host after you have installed Ambari Server.
-
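The Oracle and PostgreSQL procedures above pair each HDP Stack version with a stack directory and a Hive schema version. The following bash sketch encodes that mapping so the schema script path can be looked up rather than typed. The function name hive_schema_path is hypothetical; the paths come from the procedures above, and HDP 2.2 is rejected because no script is listed for it.

```shell
#!/usr/bin/env bash
# Sketch only: hive_schema_path is a hypothetical helper, not an Ambari tool.
# It prints the schema script path listed above for a stack and database type.
hive_schema_path() {
  # usage: hive_schema_path STACK_VERSION DB_TYPE   (DB_TYPE: oracle | postgres)
  local stack="${1:-}" db="${2:-}" dir ver
  case "$stack" in
    2.1) dir="2.1";   ver="0.13.0" ;;
    2.0) dir="2.0.6"; ver="0.12.0" ;;
    1.3) dir="1.3.2"; ver="0.10.0" ;;
    *) echo "no schema script listed for HDP stack '$stack'" >&2; return 1 ;;
  esac
  case "$db" in
    oracle|postgres) ;;
    *) echo "unsupported database type '$db'" >&2; return 1 ;;
  esac
  echo "/var/lib/ambari-server/resources/stacks/HDP/${dir}/services/HIVE/etc/hive-schema-${ver}.${db}.sql"
}
```

For example, hive_schema_path 2.0 postgres prints the path of hive-schema-0.12.0.postgres.sql under the HDP/2.0.6 stack directory. MySQL is left out because the procedure above lists a MySQL script only for the HDP 2.1 Stack.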
Troubleshooting Hive
Use these entries to help you troubleshoot any issues you might have installing Hive with non-default databases.
Problem: Hive Metastore Install Fails Using Oracle
Check the install log:
cp /usr/share/java/${jdbc_jar_name} ${target}] has failures: true
The Oracle JDBC .jar file cannot be found.
Solution
Make sure the file is in the appropriate directory on the Hive Metastore server and click Retry.
Problem: Install Warning when "Hive Check Execute" Fails Using Oracle
Check the install log:
java.sql.SQLSyntaxErrorException: ORA-01754:
a table may contain only one column of type LONG
The Hive Metastore schema was not properly loaded into the database.
Solution
Ignore the warning, and complete the install. Check your database to confirm the Hive Metastore schema is loaded. In the Ambari Web GUI, browse to Services > Hive. Choose Service Actions > Service Check to check that the schema is correctly in place.
Problem: Hive Check Execute may fail after completing an Ambari upgrade to version 1.4.2
For secure and non-secure clusters, with Hive security authorization enabled, the Hive service check may fail. Hive security authorization may not be configured properly.
Solution
Two workarounds are possible. Using Ambari Web, in Hive > Configs > Advanced:
-
Disable hive.security.authorization by setting the hive.security.authorization.enabled value to false.
or
-
Properly configure Hive security authorization. For example, set the following properties:
For more information about configuring Hive security, see Metastore Server Security in Hive Authorization and the HCatalog document Storage Based Authorization.
Hive Security Authorization Settings
Property | Value
---|---
hive.security.authorization.manager | org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
hive.security.metastore.authorization.manager | org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
hive.security.authenticator.manager | org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator
Using Non-Default Databases - Oozie
The following sections describe how to use Oozie with an existing database, other than the Derby database instance that Ambari installs by default.
Using Oozie with Oracle
To set up Oracle for use with Oozie:
-
On the Ambari Server host, stage the appropriate JDBC driver file for later deployment.
-
Download the Oracle JDBC (OJDBC) driver from http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html.
-
Select Oracle Database 11g Release 2 - ojdbc6.jar.
-
Make sure the .jar file has the appropriate permissions - 644.
-
Execute the following command, adding the path to the downloaded .jar file:
ambari-server setup --jdbc-db=oracle --jdbc-driver=/path/to/downloaded/ojdbc6.jar
-
-
Create a user for Oozie and grant it permissions.
Using the Oracle database admin utility, run the following commands:
# sqlplus sys/root as sysdba
CREATE USER <OOZIEUSER> IDENTIFIED BY <OOZIEPASSWORD>;
GRANT ALL PRIVILEGES TO <OOZIEUSER>;
GRANT CONNECT, RESOURCE TO <OOZIEUSER>;
QUIT;
Where <OOZIEUSER> is the Oozie user name and <OOZIEPASSWORD> is the Oozie user password.
Using Oozie with MySQL
To set up MySQL for use with Oozie:
-
On the Ambari Server host, stage the appropriate MySQL connector for later deployment.
-
Install the connector.
RHEL/CentOS/Oracle Linux
yum install mysql-connector-java*
SLES
zypper install mysql-connector-java*
UBUNTU
apt-get install mysql-connector-java*
-
Confirm that mysql-connector-java.jar is in the Java share directory.
ls /usr/share/java/mysql-connector-java.jar
-
Make sure the .jar file has the appropriate permissions - 644.
-
Execute the following command:
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
-
-
Create a user for Oozie and grant it permissions.
-
Using the MySQL database admin utility:
# mysql -u root -p
CREATE USER '<OOZIEUSER>'@'%' IDENTIFIED BY '<OOZIEPASSWORD>';
GRANT ALL PRIVILEGES ON *.* TO '<OOZIEUSER>'@'%';
FLUSH PRIVILEGES;
-
Where <OOZIEUSER> is the Oozie user name and <OOZIEPASSWORD> is the Oozie user password.
-
-
Create the Oozie database.
-
The Oozie database must be created in advance.
# mysql -u root -p
CREATE DATABASE <OOZIEDATABASE>;
-
Where <OOZIEDATABASE> is the Oozie database name.
-
Using Oozie with PostgreSQL
To set up PostgreSQL for use with Oozie:
-
On the Ambari Server host, stage the appropriate PostgreSQL connector for later deployment.
-
Install the connector.
RHEL/CentOS/Oracle Linux
yum install postgresql-jdbc
SLES
zypper install -y postgresql-jdbc
UBUNTU
apt-get install -y postgresql-jdbc
-
Copy the connector .jar file to the Java share directory.
cp /usr/share/pgsql/postgresql-*.jdbc3.jar /usr/share/java/postgresql-jdbc.jar
-
Confirm that .jar is in the Java share directory.
ls /usr/share/java/postgresql-jdbc.jar
-
Change the access mode of the .jar file to 644.
chmod 644 /usr/share/java/postgresql-jdbc.jar
-
Execute the following command:
ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar
-
-
Create a user for Oozie and grant it permissions.
-
Using the PostgreSQL database admin utility:
echo "CREATE DATABASE <OOZIEDATABASE>;" | psql -U postgres
echo "CREATE USER <OOZIEUSER> WITH PASSWORD '<OOZIEPASSWORD>';" | psql -U postgres
echo "GRANT ALL PRIVILEGES ON DATABASE <OOZIEDATABASE> TO <OOZIEUSER>;" | psql -U postgres
-
Where <OOZIEUSER> is the Oozie user name, <OOZIEPASSWORD> is the Oozie user password and <OOZIEDATABASE> is the Oozie database name.
-
Troubleshooting Oozie
Use these entries to help you troubleshoot any issues you might have installing Oozie with non-default databases.
Problem: Oozie Server Install Fails Using MySQL
Check the install log:
cp /usr/share/java/mysql-connector-java.jar
usr/lib/oozie/libext/mysql-connector-java.jar
has failures: true
The MySQL JDBC .jar file cannot be found.
Solution
Make sure the file is in the appropriate directory on the Oozie server and click Retry.
Problem: Oozie Server Install Fails Using Oracle or MySQL
Check the install log:
Exec[exec cd /var/tmp/oozie &&
/usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie.sql -run ]
has failures: true
Oozie was unable to connect to the database or was unable to successfully setup the schema for Oozie.
Solution
Check the database connection settings provided during the Customize Services step in the install wizard by browsing back to Customize Services > Oozie. After confirming and adjusting your database settings, proceed forward with the install wizard.
If the Install Oozie Server wizard continues to fail, get more information by connecting directly to the Oozie server and executing the following command as <OOZIEUSER>:
su oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie.sql -run
Setting up an Internet Proxy Server for Ambari
If you plan to use the public repositories for installing the Stack, Ambari Server must have Internet access to confirm access to the repositories and validate the repositories. If your machine requires use of a proxy server for Internet access, you must configure Ambari Server to use the proxy server.
How To Set Up an Internet Proxy Server for Ambari
-
On the Ambari Server host, add proxy settings to the following script: /var/lib/ambari-server/ambari-env.sh.
-Dhttp.proxyHost=<yourProxyHost> -Dhttp.proxyPort=<yourProxyPort>
-
Optionally, to prevent some host names from accessing the proxy server, define the list of excluded hosts, as follows:
-Dhttp.nonProxyHosts=<pipe|separated|list|of|hosts>
-
If your proxy server requires authentication, add the user name and password, as follows:
-Dhttp.proxyUser=<username> -Dhttp.proxyPassword=<password>
-
Restart the Ambari Server to pick up this change.
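The proxy options above are ordinary Java system properties, so they can be assembled as a single string before being added to ambari-env.sh. The following bash sketch builds that string from the values described in the steps. The function name proxy_jvm_args is hypothetical, and the sketch only prints the options; it does not edit ambari-env.sh.

```shell
#!/usr/bin/env bash
# Sketch only: proxy_jvm_args is a hypothetical helper that prints the JVM
# options described in the steps above. It does not modify any file.
proxy_jvm_args() {
  # usage: proxy_jvm_args HOST PORT [NON_PROXY_HOSTS] [USER] [PASSWORD]
  local host="${1:-}" port="${2:-}" non_proxy="${3:-}" user="${4:-}" pass="${5:-}"
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "usage: proxy_jvm_args HOST PORT [NON_PROXY_HOSTS] [USER] [PASSWORD]" >&2
    return 1
  fi
  local args="-Dhttp.proxyHost=${host} -Dhttp.proxyPort=${port}"
  # Optional: pipe-separated list of hosts that bypass the proxy.
  if [ -n "$non_proxy" ]; then
    args="${args} -Dhttp.nonProxyHosts=${non_proxy}"
  fi
  # Optional: credentials for a proxy that requires authentication.
  if [ -n "$user" ]; then
    args="${args} -Dhttp.proxyUser=${user} -Dhttp.proxyPassword=${pass}"
  fi
  printf '%s\n' "$args"
}
```

Because the excluded-host list uses the pipe character, the value must be quoted wherever it passes through a shell, as in proxy_jvm_args proxy.example.com 3128 'localhost|*.example.com'. The host names here are placeholders. How the resulting string is attached to the Java command line depends on the contents of ambari-env.sh in your installation.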
Configuring Network Port Numbers
This chapter lists port number assignments required to maintain communication between Ambari Server, Ambari Agents, and Ambari Web.
-
Optional: Changing the Default Ambari Server Port
For more information about configuring port numbers for Stack components, see Configuring Ports in the HDP Stack documentation.
Default Network Port Numbers - Ambari
The following table lists the default ports used by Ambari Server and Ambari Agent services.
Service | Servers | Default Ports Used | Protocol | Description | Need End User Access? | Configuration Parameters
---|---|---|---|---|---|---
Ambari Server | Ambari Server host | 8080 (see Optional: Changing the Default Ambari Server Port for instructions on changing the default port) | http (see Optional: Set Up HTTPS for Ambari Server for instructions on enabling HTTPS) | Interface to Ambari Web and Ambari REST API | No |
Ambari Server | Ambari Server host | 8440 | https | Handshake Port for Ambari Agents to Ambari Server | No |
Ambari Server | Ambari Server host | 8441 | https | Registration and Heartbeat Port for Ambari Agents to Ambari Server | No |
Ambari Agent | All hosts running Ambari Agents | 8670 (you can change the Ambari Agent ping port in the Ambari Agent configuration) | tcp | Ping port used for alerts to check the health of the Ambari Agent | No |
Optional: Changing the Default Ambari Server Port
By default, Ambari Server uses port 8080 to access the Ambari Web UI and the REST API. To change the port number, you must edit the Ambari properties file.
Ambari Server should not be running when you change port numbers. Edit ambari.properties
before you start Ambari Server the first time or stop Ambari Server before editing
properties.
-
On the Ambari Server host, open /etc/ambari-server/conf/ambari.properties with a text editor.
-
Add the client API port property and set it to your desired port value:
client.api.port=<port_number>
-
Start or re-start the Ambari Server. Ambari Server now accesses Ambari Web via the newly configured port:
http://<your.ambari.server>:<port_number>
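The edit in the steps above amounts to making sure ambari.properties contains exactly one client.api.port line. The following bash sketch shows one way to do that for any key in a Java-style properties file. The function name set_property is hypothetical, and the sketch assumes simple key=value lines with no spaces around the equals sign. Stop Ambari Server before running anything like it against the real file.

```shell
#!/usr/bin/env bash
# Sketch only: set_property is a hypothetical helper, not an Ambari tool.
# It replaces (or adds) a single key=value line in a properties file.
set_property() {
  # usage: set_property FILE KEY VALUE
  local file="${1:-}" key="${2:-}" value="${3:-}" tmp
  if [ ! -f "$file" ] || [ -z "$key" ]; then
    echo "usage: set_property FILE KEY VALUE" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  # Keep every line whose key differs, then append the new definition.
  awk -F= -v k="$key" '$1 != k' "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the original file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, set_property /etc/ambari-server/conf/ambari.properties client.api.port 8081 would leave a single client.api.port=8081 line in the file; the port value 8081 is only an illustration.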
Changing the JDK Version on an Existing Cluster
During your initial Ambari Server Setup, you selected the JDK to use or provided a path to a custom JDK already installed on your hosts. After setting up your cluster, you may change the JDK version using the following procedure.
How to change the JDK Version for an Existing Cluster
-
Re-run Ambari Server Setup.
ambari-server setup
-
At the prompt to change the JDK, enter y.
Do you want to change Oracle JDK [y/n] (n)? y
-
At the prompt to choose a JDK, enter 1 to change the JDK to v1.7.
[1] - Oracle JDK 1.7
[2] - Oracle JDK 1.6
[3] - Custom JDK
Enter choice: 1
If you choose Oracle JDK 1.7 or Oracle JDK 1.6, the JDK you choose downloads and installs automatically.
-
If you choose Custom JDK, verify or add the custom JDK path on all hosts in the cluster.
-
After setup completes, you must restart each component for the new JDK to be used by the Hadoop services.
Using the Ambari Web UI, do the following tasks:
-
Restart each component
-
Restart each host
-
Restart all services
-
For more information about managing services in your cluster, see Managing Services.
Using Ambari Blueprints
Overview: Ambari Blueprints
Ambari Blueprints provide an API to perform cluster installations. You can build a reusable “blueprint” that defines which Stack to use, how Service Components should be laid out across a cluster and what configurations to set.

After setting up a blueprint, you can call the API to instantiate the cluster by providing the list of hosts to use. The Ambari Blueprint framework promotes reusability and facilitates automating cluster installations without UI interaction.
Learn more about Ambari Blueprints API on the Ambari Wiki.
Configuring HDP Stack Repositories for Red Hat Satellite
As part of installing HDP Stack with Ambari, HDP.repo
and HDP-UTILS.repo
files are generated and distributed to the cluster hosts based on the Base URL user
input from the Cluster Install Wizard during the Select Stack step. In cases where
you are using Red Hat Satellite to manage your Linux infrastructure, you can disable
the repositories defined in the HDP Stack .repo files and instead leverage Red Hat
Satellite.
How To Configure HDP Stack Repositories for Red Hat Satellite
To disable the repositories defined in the HDP Stack .repo files:
-
Before starting the Ambari Server and installing a cluster, on the Ambari Server browse to the Stacks definition directory.
cd /var/lib/ambari-server/resources/stacks/
-
Browse to the install hook directory:
For HDP 2.0 or HDP 2.1 Stack
cd HDP/2.0.6/hooks/before-INSTALL/templates
For HDP 1.3 Stack
cd HDP/1.3.2/hooks/before-INSTALL/templates
-
Modify the .repo template file.
vi repo_suse_rhel.j2
-
Set the enabled property to 0 to disable the repository.
enabled=0
-
Save and exit.
-
Start the Ambari Server and proceed with your install.
The .repo files will still be generated and distributed during cluster install but the repositories defined in the .repo files will not be enabled.
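The manual edit in the steps above can also be scripted. The following bash sketch rewrites the enabled line in a .repo template to enabled=0. The function name disable_repo_template is hypothetical, and the sketch assumes the template already contains a line beginning with enabled=, which should be confirmed by inspecting the file first.

```shell
#!/usr/bin/env bash
# Sketch only: disable_repo_template is a hypothetical helper, not an Ambari tool.
# It sets every existing "enabled=" line in the given template to enabled=0.
disable_repo_template() {
  # usage: disable_repo_template TEMPLATE_FILE
  local file="${1:-}" tmp
  if [ ! -f "$file" ]; then
    echo "usage: disable_repo_template TEMPLATE_FILE" >&2
    return 1
  fi
  # Refuse to guess where the property belongs if it is not already present.
  if ! grep -q '^enabled[[:space:]]*=' "$file"; then
    echo "no enabled= line found in $file" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  sed 's/^enabled[[:space:]]*=.*/enabled=0/' "$file" > "$tmp"
  # Write back in place so the original file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Run from the templates directory named in the steps above, the call would be disable_repo_template repo_suse_rhel.j2. All other lines in the template are left unchanged.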
Tuning Ambari Performance
For clusters larger than 200 nodes, calculate and set a larger task cache size on the Ambari server.
How To Tune Ambari Performance
For clusters larger than 200 nodes:
-
Calculate the new, larger cache size, using the following relationship:
ecCacheSizeValue=60*<cluster_size>
where <cluster_size> is the number of nodes in the cluster.
-
On the Ambari Server host, in /etc/ambari-server/conf/ambari.properties, add the following property and value:
server.ecCacheSize=<ecCacheSizeValue>
where <ecCacheSizeValue> is the value calculated previously, based on the number of nodes in the cluster.
-
Restart Ambari Server.
ambari-server restart
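As a worked example of the relationship above, a 250-node cluster gives 60 * 250 = 15000. The following bash sketch performs the same calculation and prints the resulting property line. The function name ec_cache_size is hypothetical; only the factor of 60 and the property name come from the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: ec_cache_size is a hypothetical helper, not an Ambari tool.
# It applies the relationship ecCacheSizeValue = 60 * <cluster_size>.
ec_cache_size() {
  # usage: ec_cache_size CLUSTER_SIZE
  local nodes="${1:-}"
  case "$nodes" in
    ''|*[!0-9]*)
      echo "cluster size must be a non-negative integer" >&2
      return 1
      ;;
  esac
  # 10# forces base 10 so a value such as 0250 is not read as octal.
  echo $(( 60 * 10#$nodes ))
}

# Print the property line for a 250-node cluster.
echo "server.ecCacheSize=$(ec_cache_size 250)"
```

The last line prints server.ecCacheSize=15000, which is the line to add to ambari.properties for a cluster of that size.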
Using Ambari Views
Ambari includes the Ambari Views Framework, which allows for developers to create UI components that “plug into” the Ambari Web interface. Ambari includes a built-in set of Views that are pre-deployed. This section describes the views that are included with Ambari and their configuration.
View | Description | HDP Stacks | Required Services
---|---|---|---
Tez | View information related to Tez jobs that are executing on the cluster. | HDP 2.2 or later | HDFS, YARN, Tez, Hive, Pig
Slider | A tool to help deploy and manage Slider-based applications. | HDP 2.1 or later | HDFS, YARN
Jobs | A visualization tool for Hive queries that execute on the Tez engine. | HDP 2.1 or later | HDFS, YARN, Tez, Hive
Learning More About Views
You can learn more about the Views Framework at the following resources:
Resource |
URL |
---|---|
Administering Views |
Ambari Administration Guide - Managing Views |
Ambari Project Wiki |
|
Example Views |
https://github.com/apache/ambari/tree/trunk/ambari-views/examples |
View Contributions |
Tez View
Tez is a general, next-generation execution engine like MapReduce that can efficiently execute jobs from multiple applications such as Apache Hive and Apache Pig. When you run a job such as a Hive query or Pig script using Tez, you can use the Tez View to track and debug the execution of that job. Topics in this section describe how to configure, deploy and use the Tez View for jobs running in your cluster.
Configuring Tez in Your Cluster
In your cluster, confirm the following configurations are set:
Configuration | Property | Comments
---|---|---
yarn-site.xml | yarn.resourcemanager.system-metrics-publisher.enabled | Enables the generic history service in the timeline server. Verify that this property is set to true.
yarn-site.xml | yarn.timeline-service.enabled | Enables the timeline server for logging details.
yarn-site.xml | yarn.timeline-service.webapp.address | Value must be the IP:PORT on which the timeline server is running.
Deploying the Tez View
To deploy the Tez View, you must first configure Ambari for Tez, and then configure Tez to make use of the Tez View in Ambari.
-
Configure Ambari for Tez.
-
From the Ambari Administration interface, browse to the Views section.
-
Click to expand the Tez view and click Create Instance.
-
Enter the instance name, the display name and description.
-
Enter the configuration properties for your cluster.
Property | Description | Example
---|---|---
YARN Timeline Server URL (required) | The URL to the YARN Application Timeline Server, used to provide Tez information. Typically this is the yarn.timeline-service.webapp.address property in the yarn-site.xml configuration. URL must be accessible from Ambari Server host. | http://yarn.timeline-service.hostname:8188
YARN ResourceManager URL (required) | The URL to the YARN ResourceManager, used to provide YARN Application data. Typically this is the yarn.resourcemanager.webapp.address property in the yarn-site.xml configuration. URL must be accessible from Ambari Server host. | http://yarn.timeline-service.hostname:8088
-
Save the View.
-
-
Configure Tez to make use of the Tez View in Ambari:
-
From Ambari > Admin, open the Tez View, then choose "Go To Instance".
-
Copy the URL for the Tez View from your web browser's address bar.
-
Select Services > Tez > Configs.
-
In custom tez-site, add the following property:
Key: tez.tez-ui.history-url.base
Value: <Tez View URL>
where <Tez View URL> is the URL you copied from the browser session for the open Tez View.
-
Restart Tez.
-
Restart Hive.
-
For more information about managing Ambari Views, see Managing Views in the Ambari Administration Guide.
Hive SQL on Tez - DAG, Vertex and Task
In Hive, a user query written in SQL is compiled and converted for execution into a Tez execution graph, or more precisely a Directed Acyclic Graph (DAG). A DAG is a collection of Vertices, where each Vertex executes a part, or fragment, of the user query. The directed connections between Vertices determine the order in which they are executed. For example, the Vertex that reads a table must run before a filter can be applied to the rows of that table.
Suppose that a Vertex reads a user table. This table can be very large and distributed across multiple machines and multiple racks, so the read is performed by running many tasks in parallel. Here is a simplified example, using a sample query, that shows the execution of a SQL query in Hive.

Executing a SQL query in Hive
The Tez View tool lets you more easily understand and debug any submitted Tez job. Examples of Tez jobs include a Hive query or Pig script executed using the Tez execution engine. Specifically, the Tez View helps you do the following tasks:
Identify the Tez DAG for your job
- The Tez View displays a list of jobs sorted by time, latest first. You can search for a job using the following fields:
-
DagID
-
User
-
Start Time
-
Job Status
Tez Job Status Descriptions
Status | Description
---|---
Submitted | The DAG has been submitted to Tez but has not started running yet.
Running | The DAG is currently running.
Succeeded | The DAG completed successfully.
Failed | The DAG failed to complete successfully.
Killed | The DAG was stopped manually.
Error | An internal error occurred when executing the DAG.

The Tez View is the primary entry point for finding a Tez job. At this point, no other UI links to the Tez View. To select columns shown in the Tez View, choose the wheel icon, select field names, then choose OK.

Better Understand How your Job is being Executed
This is the primary use case that was not available earlier. Users were not able to get insight into how their tasks are running. This allows the user to identify the complexity and progress of a running job.
- The View Tab shows the following:
-
The DAG graphical view
-
All Vertices
-
Tasks per Vertex on top right of Vertex
-
Failed Vertex displays red to provide visual contrast with successful vertices that display green
-
Details of timelines are available on mouse-over on a Vertex

The View Tab provides a launching place to further investigate the Vertices that have failures or are taking time.
Identify the Cause of a Failed Job
Previously, a Tez task that failed gave an error code such as 1. Someone familiar with Tez error logs had to log in and review the logs to find why a particular task failed. The Tez View exposes errors in a way that you can more easily find and report.
- When a Tez task fails, you must be able to:
-
Identify the reason for task failure
-
Capture the reason for task failure
When a Tez task fails, the Tez Detail Tab shows the failure as follows:

See Details of Failing Tasks
Multiple task failures may occur. The Tez Tasks Tab lets you see all tasks that failed and examine the reason and logs for each failure. Logs for genuine failures, but not for killed tasks, are available to download from the Tez Tasks Tab.

Identify the Cause of a Slow-Performing Job
The Tez View shows counters at the Vertex and Task levels that let you understand why a certain task is performing more slowly than expected.
Counters at Vertex and Task Level
Counters are available at the DAG, Vertex, and Task levels. Counters help you understand the task size better and find any anomalies. Elapsed time is one of the primary counters to look at.
DAG-level Counters

Vertex-level Counters

Task-level Counters

Monitor Task Progress for a Job
The Tez View shows task progress as an increasing count of completed tasks against total tasks. This allows you to identify hung tasks and get insight into long-running tasks.
Using the Jobs View
The Jobs view provides a visualization for Hive queries that have executed on the Tez engine.
Deploying the Jobs View
Refer to the Ambari Administration guide for general information about Managing Views.
-
From the Ambari Administration interface, browse to the Views section.
-
Click to expand the Jobs view and click Create Instance.
-
Enter the instance name, the display name and description.
-
Enter the configuration properties for your cluster.
Property | Description | Example
---|---|---
yarn.ats.url (required) | The URL to the YARN Application Timeline Server, used to provide Tez information. Typically this is the yarn.timeline-service.webapp.address property in the yarn-site.xml configuration. URL must be accessible from Ambari Server host. | http://yarn.timeline-service.hostname:8188
yarn.resourcemanager.url (required) | The URL to the YARN ResourceManager, used to provide YARN Application data. Typically this is the yarn.resourcemanager.webapp.address property in the yarn-site.xml configuration. URL must be accessible from Ambari Server host. | http://yarn.timeline-service.hostname:8088
-
Save the view.
Using the Slider View
Slider is a framework for deploying and managing long-running applications on YARN. When applications are packaged using Slider for YARN, the Slider View can be used to help deploy and manage those applications from Ambari.
Deploying the Slider View
Refer to the Ambari Administration guide for general information about Managing Views.
-
From the Ambari Administration interface, browse to the Views section.
-
Click to expand the Slider view and click Create Instance.
-
Enter the instance name, the display name and description.
-
Enter the configuration properties for your cluster.
Property | Description | Example
---|---|---
Ambari Server URL (required) | The Ambari REST URL to the cluster resource. | http://ambari.server:8080/api/v1/clusters/MyCluster
Ambari Server Username (required) | The username to connect to Ambari. Must be an Ambari Admin user. | admin
Ambari Server Password (required) | The password for the Ambari user. | password
Slider User | The user to deploy Slider applications as. By default, the applications will be deployed as the “yarn” service account user. To use the current logged-in Ambari user, enter ${username}. | joe.user or ${username}
Kerberos Principal | The Kerberos principal for Ambari views. This principal identifies the process in which the view runs. Only required if your cluster is configured for Kerberos. Be sure to configure the view principal as a proxy user in core-site. | view-principal@EXAMPLE.CO
Kerberos Keytab | The Kerberos keytab for Ambari views. Only required if your cluster is configured for Kerberos. | /path/to/keytab/view-principal.headless.keytab
-
Save the view.
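The Ambari Server URL property in the Slider View settings above follows a fixed pattern: the server address followed by /api/v1/clusters/ and the cluster name. The following bash sketch builds that value from its parts. The function name ambari_cluster_url is hypothetical, and the sketch assumes Ambari Server is reached over plain http, as in the example value above.

```shell
#!/usr/bin/env bash
# Sketch only: ambari_cluster_url is a hypothetical helper, not an Ambari tool.
# It prints the REST URL of a cluster resource in the form used above.
ambari_cluster_url() {
  # usage: ambari_cluster_url HOST PORT CLUSTER_NAME
  local host="${1:-}" port="${2:-}" cluster="${3:-}"
  if [ -z "$host" ] || [ -z "$port" ] || [ -z "$cluster" ]; then
    echo "usage: ambari_cluster_url HOST PORT CLUSTER_NAME" >&2
    return 1
  fi
  printf 'http://%s:%s/api/v1/clusters/%s\n' "$host" "$port" "$cluster"
}
```

For example, ambari_cluster_url ambari.server 8080 MyCluster prints the example value shown in the settings above. Before saving the view, one way to confirm the URL and credentials is to request it with a tool such as curl, for example curl -u admin:password followed by the printed URL, substituting the Ambari Admin user name and password for your cluster.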