Tutorial: How to Set Up HDCloud for AWS (BASIC Template)
This tutorial will help you set up Hortonworks Data Cloud for AWS using the basic template, and create an HDP cluster. The tutorial assumes no prior experience with AWS.
In this tutorial, we will set up Hortonworks Data Cloud on AWS, including:
- Meeting the prerequisites
- Subscribing to HDCloud services on AWS Marketplace
- Launching the cloud controller using the basic template
- Exploring AWS resources created
- Accessing the cloud controller UI
- Creating a cluster
- Working with the cloud dashboard
- Opening additional ports
- Cleaning up to avoid further charges
If you encounter errors while performing the steps, refer to the Troubleshooting documentation.
Meet the Prerequisites
Create an AWS Account
In order to launch HDCloud for AWS, you need to have an AWS account. You can set one up at https://aws.amazon.com/. Creating an AWS account is free, but you need to add a credit card that will be charged once you start running AWS services. Alternatively, you may want to contact your IT department to find out if your company has an account to which you can be added.
Select an AWS Region
Next, decide in which region you would like to launch the cloud controller and clusters. The following regions are supported:
- US East (N. Virginia)
- US West (Oregon)
- EU Central (Frankfurt)
- EU West (Dublin)
- Asia Pacific (Tokyo)
In this tutorial, I will be using the US West (Oregon) region. You may want to pick the region nearest to your location, unless you have other constraints. For example, if you have data in Amazon S3 that you will later want to access from your cluster, place your clusters in the same region as that data.
Create an SSH Key Pair
Once you’ve decided which region to use, you need to create an SSH key pair in that region. To do that:
- Navigate to the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
- Check the region listed in the top right corner to make sure that you are in the correct region.
- In the left pane, find NETWORK AND SECURITY and click on Key Pairs.
- Click on Create Key Pair to create a new key pair.
- Your private key file will be automatically downloaded onto your computer.
Make sure to save it in a secure location. You will need it to SSH to the cluster nodes. You may want to change access settings for the file using:
chmod 400 my-key-pair.pem
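If you would rather script the key pair step than click through the console, here is a minimal Python sketch. It assumes the boto3 AWS SDK is installed and that your AWS credentials are already configured; the helper names save_private_key and create_key_pair are my own, not part of HDCloud. The first helper writes the key and applies the same owner-read-only permission as the chmod command above.

```python
import os
import stat


def save_private_key(key_material: str, path: str) -> int:
    """Write a private key so that only its owner can read it (like chmod 400)."""
    # O_EXCL makes the call fail instead of overwriting an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key_material)
    os.chmod(path, stat.S_IRUSR)  # 0o400: owner read, nothing else
    return stat.S_IMODE(os.stat(path).st_mode)


def create_key_pair(region: str, key_name: str, path: str) -> int:
    """Create an EC2 key pair in the given region and save its private key."""
    # Assumption: boto3 is installed. It is imported lazily so that
    # save_private_key can be used without it.
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_key_pair(KeyName=key_name)
    return save_private_key(response["KeyMaterial"], path)
```

For example, create_key_pair("us-west-2", "my-key-pair", "my-key-pair.pem") would create the key in the US West (Oregon) region used in this tutorial. The key name and file name are placeholders.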
Now that you've met all the prerequisites, you can subscribe to HDCloud services on AWS Marketplace. Let's get started!
Subscribe to HDCloud Services
In order to run HDCloud for AWS, you need to subscribe to two AWS Marketplace services. You can access them by searching the AWS Marketplace at https://aws.amazon.com/marketplace/ or by clicking on these links:
- Hortonworks Data Cloud - Controller Service (allows you to launch HDCloud)
- Hortonworks Data Cloud - HDP Services (allows you to create clusters)
For each of the services, you need to:
- Open the listing.
- Click ACCEPT SOFTWARE TERMS.
This will add these two services to Your Software. You are all set to launch the cloud controller!
Launch the Cloud Controller
You'll first navigate to the Hortonworks Data Cloud - Controller Service AWS Marketplace listing and then launch the cloud controller using the BASIC CloudFormation template.
Controller Service AWS Marketplace Listing
Navigate to the Hortonworks Data Cloud - Controller Service listing page:
The only setting that you need to review and change is the Region, which should be the same as the region that you chose in the prerequisites.
Click on Launch with CloudFormation Console and you will be redirected to the Create stack form in the CloudFormation console.
On the Select Template page, your template link is already provided, so just click Next.
On the Specify Details page, provide the details required:
- Stack name: If you want to, you can change it to a shorter name of your choice.
- Controller Instance Type: I recommend that you keep the default. If you pick an instance type that is not powerful enough, you will run into issues.
- Email Address and Admin Password: Provide a valid email address and create a password. Make sure to remember or write down these credentials, as you will later use them to log in to the cloud controller UI.
- SSH KeyName: This is the SSH key pair that you created as a prerequisite. If you can’t see it as an option in the dropdown, check the top right corner to make sure that you are using the same region.
- Remote Access: This should be a range of IP addresses that can reach the cloud controller. You can use the tool at http://www.ipaddressguide.com/cidr#range to calculate a valid CIDR range that includes your public IP address. Or, if you are just experimenting, you can enter “0.0.0.0/0”, which will allow access from all IP addresses. Never use “0.0.0.0/0” for a production cluster or a long-running cluster.
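If you prefer to compute the Remote Access value yourself, the following sketch uses only Python's standard ipaddress module. The function names are illustrative, and the addresses in the usage note come from the reserved documentation range, so substitute your own public IP address.

```python
import ipaddress
from typing import List


def single_host_cidr(ip: str) -> str:
    """Return the narrowest CIDR range, covering exactly one address."""
    address = ipaddress.ip_address(ip)
    return str(ipaddress.ip_network(f"{address}/{address.max_prefixlen}"))


def range_to_cidrs(first_ip: str, last_ip: str) -> List[str]:
    """Return the CIDR blocks that exactly cover first_ip through last_ip."""
    first = ipaddress.ip_address(first_ip)
    last = ipaddress.ip_address(last_ip)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


def is_open_to_world(cidr: str) -> bool:
    """True for ranges such as 0.0.0.0/0 that admit every address."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0
```

For instance, single_host_cidr("203.0.113.25") gives "203.0.113.25/32", which admits only that one address, and is_open_to_world("0.0.0.0/0") is True, which is the setting to avoid outside of short experiments.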
The parameters listed in SmartSense Configuration are optional. Enter your SmartSense ID and opt in to SmartSense telemetry if you would like to use flex support.
After you've entered all required values, click Next.
(Optional step) On the Options page, under Advanced, you can change the setting for Rollback on Failure. By default, this is set to Yes, which means that all of the AWS resources will be deleted if launching the stack fails, and you will avoid being charged for the resources. Change the setting to No if you want to keep the resources for troubleshooting after a failure.
Finally, on the Review page, review the information provided, select the “I acknowledge that AWS CloudFormation might create IAM resources” checkbox, and then click CREATE.
Refresh the CloudFormation console. You will see the status of your stack as CREATE_IN_PROGRESS. If everything goes well, after about 15 minutes the status will change to CREATE_COMPLETE, at which point you will be able to proceed to the next step. Meanwhile, let's explore AWS dashboards.
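Instead of refreshing the console, you could poll the stack status from a script. The sketch below is one possible approach, not an HDCloud feature. It assumes boto3 is installed with credentials configured, and the polling interval and 30-minute timeout are my own choices. The polling loop takes a status-reading function so that it can be tried without touching AWS.

```python
import time
from typing import Callable


def wait_for_stack(get_status: Callable[[], str],
                   poll_seconds: float = 30.0,
                   timeout_seconds: float = 1800.0) -> str:
    """Poll until the stack leaves CREATE_IN_PROGRESS or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status != "CREATE_IN_PROGRESS":
            # CREATE_COMPLETE means success; any other value needs a closer look.
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("stack is still CREATE_IN_PROGRESS")
        time.sleep(poll_seconds)


def stack_status_reader(stack_name: str, region: str) -> Callable[[], str]:
    """Build a status reader backed by CloudFormation's DescribeStacks call."""
    # Assumption: boto3 is installed; imported lazily so wait_for_stack
    # can be exercised on its own.
    import boto3

    client = boto3.client("cloudformation", region_name=region)

    def read() -> str:
        stacks = client.describe_stacks(StackName=stack_name)["Stacks"]
        return stacks[0]["StackStatus"]

    return read
```

A call such as wait_for_stack(stack_status_reader("my-stack", "us-west-2")) would block until the stack finishes, where "my-stack" stands for the stack name you entered on the Specify Details page. boto3 also ships a built-in stack_create_complete waiter that may be simpler if you do not need custom handling.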
Explore AWS Resources Created
While your cloud controller is being launched, you can click on the Events and Resources tabs to see what AWS resources are being launched on your behalf:
- A new VPC, subnet, route table, and Internet gateway were created.
- A new EC2 instance was created to run the cloud controller. A new security group was created and access rules were defined on the security group.
- New IAM roles were created.
Once the stack status changes to CREATE_COMPLETE, you can access the cloud controller UI.
Access the Cloud Controller UI
To access the cloud controller UI, select the stack that you created earlier, click on Outputs, and click on the CloudURL:
Even though your browser will tell you that the connection is unsafe, proceed to the UI and log in with the credentials (email address and password) that you provided in the CloudFormation template.
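The CloudURL value can also be read from the stack outputs in a script. This is a sketch under the assumption that boto3 is installed and credentials are configured. The function names are mine, and the only name taken from this tutorial is the CloudURL output key.

```python
from typing import Dict, List, Optional


def find_output(outputs: List[Dict[str, str]], key: str) -> Optional[str]:
    """Return the value of the stack output with the given key, if present."""
    for item in outputs:
        if item.get("OutputKey") == key:
            return item.get("OutputValue")
    return None


def controller_url(stack_name: str, region: str) -> Optional[str]:
    """Look up the CloudURL output of the cloud controller stack."""
    # Assumption: boto3 is installed; imported lazily so find_output
    # works without it.
    import boto3

    client = boto3.client("cloudformation", region_name=region)
    stack = client.describe_stacks(StackName=stack_name)["Stacks"][0]
    # Outputs may be absent until the stack has finished creating.
    return find_output(stack.get("Outputs", []), "CloudURL")
```

If the function returns None, the stack probably has not reached CREATE_COMPLETE yet, or the stack name or region is wrong.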
After logging in, you will see the dashboard:
Now that your cloud controller is up and running, you can create your first cluster.
Create a Cluster
On the dashboard, click CREATE CLUSTER to display the form:
The only parameters that you are required to enter are Cluster Name, password, and confirm password. All other fields are pre-populated and you can keep the defaults. Here is a brief explanation for each of the parameters:
- Cluster Name: Enter a name for your cluster.
- HDP Version: Select HDP version 2.5 or 2.6. For each version, a set of preconfigured cluster types is available. I'm going to keep the default HDP version.
- Cluster Type: Select from the configurations available for the HDP version that you selected. I'm going to keep the default cluster type.
- Master Instance Type, Worker Instance Type, and Compute Instance Type: I recommend that you keep the defaults. If you pick instance types that are not powerful enough, you will run into issues.
- (Worker) Instance Count: This determines the number of worker nodes.
- (Compute) Instance Count: This determines the number of compute nodes. If you check Use Spot Instances, spot instances will be used instead of on-demand instances.
- SSH Key Name: Your SSH key name should be pre-populated.
- Remote Access: Same as with the cloud controller, this must be a valid CIDR range that will allow you to connect to the cluster.
- Cluster User: Enter credentials that you want to use for your cluster. These are different from the cloud controller credentials; you will use them to log in to the Ambari web UI.
- Protected Gateway Access: I recommend that you keep the defaults. If you uncheck the two options that are pre-checked, you won’t be able to access the Ambari web UI. Checking the third option will give you access to additional cluster UIs.
Optionally, in each of the sections you can click on SHOW ADVANCED OPTIONS to display additional options. If you are interested in learning about these options, refer to the Create Cluster documentation.
Click on CREATE CLUSTER. You have an option to:
- Receive an email notification when your cluster is ready
- Save the cluster as a template
- View CLI JSON that can be used for creating a cluster via HDCloud CLI
Click on YES, CREATE A CLUSTER.
Now you will see a cluster tile appear on the dashboard:
Click on the tile to see the cluster details.
In the EVENT HISTORY log, you can see that a new stack is being launched in the CloudFormation console, then EC2 instances are started to run your cluster nodes, and an Ambari cluster is built. As you can see in the screenshot below, it took 15 minutes to build my 4-node cluster.
Once your cluster is ready, its status will change to RUNNING:
Congratulations! You've just created your first cluster!
Get Started with the Cloud Dashboard
Let’s explore a few shortcuts that you should be aware of when working with HDCloud.
Click on the icon to copy complete SSH information for a specific node:
If you are using a Mac, you can paste it into your terminal and - assuming that your private key is available on your computer - you should be able to access your cluster.
If you are using Windows and need to set up your SSH, refer to http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html.
Next, click on the Ambari link to open the Ambari Web UI in a browser:
Log in to the Ambari web UI using the credentials that you specified when creating your cluster. The default user is admin, so unless you changed it, log in with that username.
Click on CLUSTER ACTIONS > Resize and resize your cluster by adding one node:
You can also explore the tabs where your cluster settings are available:
Explore Menu Options
Click on the menu icon to see other capabilities available in the cloud controller UI:
- CLUSTERS: This is where you are right now.
- CLUSTER TEMPLATES: When creating a cluster, you have an option to save a cluster template. The CLUSTER TEMPLATES page allows you to manage saved cluster templates.
- SHARED SERVICES: This page allows you to create and manage Hive and Druid metastores. To get started with RDS as a Hive metastore, see this tutorial.
- HISTORY: This page allows you to generate a historical report including all clusters associated with this cloud controller.
- DOWNLOAD CLI: Allows you to download the HDCloud CLI that you can use to create more clusters. To get started with the CLI, refer to this tutorial.
Open Additional Ports
After creating a cluster, you may notice that certain ports are not opened by default, so you may need to manually open these ports by editing the inbound access on the security group. I will show you how to open YARN Resource Manager UI (8088) and Hive UI (10502) ports by manually editing the inbound access on the master node security group.
On AWS, from the Services menu, select EC2 to navigate to the EC2 console.
In the left pane, in the INSTANCES section, click on Instances:
Note: If you can’t see your instances, check the top right corner to make sure that you are in the correct region.
Identify the instance corresponding to your master node. Its name should end in -1-master. Next, select that instance. This will allow you to see the Description tab, which includes the link to the security group configuration:
Click on the security group URL to open the Security Group section.
Select the Inbound tab:
Check whether 8088 and 10502 appear in the Port Range column. If not, click Edit, then Add Rule, and add a new Custom TCP Rule for port 8088 with source “0.0.0.0/0”. Do the same for port 10502, then click Save. As with Remote Access, “0.0.0.0/0” allows access from all IP addresses, so prefer a narrower CIDR range for anything other than a short-lived test cluster.
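The same inbound rules can be added from a script. The sketch below builds one TCP rule per port and passes them to EC2's authorize_security_group_ingress call. It assumes boto3 is installed with credentials configured. The helper names are my own, and the security group ID and source range in the usage note are placeholders that you must replace with your own.

```python
from typing import Any, Dict, List


def tcp_ingress_rules(ports: List[int], cidr: str) -> List[Dict[str, Any]]:
    """Build one single-port TCP rule per port, all limited to the given CIDR."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in ports
    ]


def open_ports(group_id: str, region: str, ports: List[int], cidr: str) -> None:
    """Add the rules to the master node security group."""
    # Assumption: boto3 is installed; imported lazily so tcp_ingress_rules
    # can be used on its own.
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=tcp_ingress_rules(ports, cidr),
    )
```

A call would look like open_ports("sg-...", "us-west-2", [8088, 10502], "203.0.113.25/32"), with the group ID copied from the Description tab. Expect the call to raise an error if an identical rule already exists, so check the Inbound tab first.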
Clean Up
Once you no longer need your cluster, you can terminate it by clicking on CLUSTER ACTIONS > TERMINATE:
This will delete all the EC2 instances that were used to run cluster nodes.
After deleting the cluster, you can delete the cloud controller. From the CloudFormation dashboard, delete the stack corresponding to the cloud controller:
If you try deleting the cloud controller before terminating all the clusters associated with it, you will run into errors.
To avoid unnecessary charges to your AWS account, always make sure that the stacks corresponding to the cluster and the cloud controller were successfully deleted in the CloudFormation console and that the EC2 instances running the cloud controller and cluster nodes were deleted in the EC2 console.
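To double-check the cleanup from a script, you could list whatever is still alive in the region. This is a sketch, assuming boto3 is installed with credentials configured. The function names are illustrative, and the listing covers every stack and instance in the region, not only those created by HDCloud, so review the results before deleting anything.

```python
from typing import Any, Dict, List


def remaining_stacks(stacks: List[Dict[str, Any]]) -> List[str]:
    """Names of stacks that have not reached DELETE_COMPLETE."""
    return [s["StackName"] for s in stacks
            if s.get("StackStatus") != "DELETE_COMPLETE"]


def live_instance_ids(reservations: List[Dict[str, Any]]) -> List[str]:
    """IDs of EC2 instances that are not terminated."""
    return [
        instance["InstanceId"]
        for reservation in reservations
        for instance in reservation.get("Instances", [])
        if instance["State"]["Name"] != "terminated"
    ]


def leftovers(region: str) -> Dict[str, List[str]]:
    """Collect stacks and instances that still exist in the region."""
    # Assumption: boto3 is installed; imported lazily so the two pure
    # helpers above work without it.
    import boto3

    cfn = boto3.client("cloudformation", region_name=region)
    ec2 = boto3.client("ec2", region_name=region)
    stacks: List[Dict[str, Any]] = []
    for page in cfn.get_paginator("describe_stacks").paginate():
        stacks.extend(page["Stacks"])
    reservations: List[Dict[str, Any]] = []
    for page in ec2.get_paginator("describe_instances").paginate():
        reservations.extend(page["Reservations"])
    return {
        "stacks": remaining_stacks(stacks),
        "instances": live_instance_ids(reservations),
    }
```

If both lists come back empty for the region you used, nothing in that region should still be generating EC2 or CloudFormation charges from this tutorial.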