Troubleshooting

Note

If the issue that you are experiencing is not listed below, check the Known Issues section in the Release Notes.

CloudFormation Template Validation Errors

When launching the cloud controller, after clicking Create, you might get a "Stack validation error" or a "Template validation error" if any required CloudFormation template parameter is missing or invalid.

The following template validation errors can occur, for the reasons given:

Error: Stack [stack_name] already exists
Reason: You chose a Stack name that already exists.

Error: Parameter KeyName failed to satisfy constraint: must be the name of an existing EC2 KeyPair
Reason: You did not select an existing SSH Key Name. In order for an existing key pair to be available in this form, the key pair must exist in the same AWS region. If the key pair that you have previously created is not available in the form, make sure that you created it in the same region in which you're trying to launch the cloud controller. If you need to create a key pair, refer to the AWS documentation.

Error: Parameter RemoteLocation failed to satisfy constraint: must be a single IP address or an IP address range in CIDR notation (for example 203.0.113.5/32)
Reason: You did not enter a valid CIDR IP for the Remote Access parameter. For example:
  • 10.0.0.0/24 will allow access from 10.0.0.0 through 10.0.0.255.
  • 0.0.0.0/0 will allow access from all IP addresses.

Error: Parameter EmailAddress failed to satisfy constraint: must be a valid e-mail address
Reason: You did not enter a valid Email Address.

Error: Parameter AdminPassword failed to satisfy constraint: must be at least 8 characters containing letters, numbers and symbols, except '|'.
Reason: You did not enter a valid Admin Password that meets the constraints. Note that the '|' character is not allowed.

Error: Parameter SmartSenseId failed to satisfy constraint: Should be empty or a valid SmartSense subscription id like ‘A-00000000-C-00000000’!
Reason: You did not enter a correct SmartSense ID. A valid SmartSense ID uses the following format: ‘A-00000000-C-00000000’.

Error: Requires capabilities : [CAPABILITY_IAM]
Reason: You did not check the "I acknowledge that AWS CloudFormation might create IAM resources." checkbox on the Review page.
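
Some of these values can be checked locally before you submit the form. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the SmartSense ID pattern is an assumption inferred from the sample ID in the error message; the template's actual constraint may differ.

```python
import ipaddress
import re

# Assumed pattern, inferred only from the sample ID 'A-00000000-C-00000000'.
SMARTSENSE_ID = re.compile(r"A-\d{8}-C-\d{8}")


def cidr_range(value):
    """Return the (first, last) addresses covered by a CIDR block.

    Raises ValueError if the value is not a valid address or CIDR range.
    """
    network = ipaddress.ip_network(value, strict=False)
    return str(network[0]), str(network[-1])


def smartsense_id_ok(value):
    """Accept an empty value or one matching the assumed ID pattern."""
    return value == "" or SMARTSENSE_ID.fullmatch(value) is not None
```

For example, cidr_range("10.0.0.0/24") returns the pair 10.0.0.0 and 10.0.0.255, which matches the range described above for the Remote Access parameter.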

Untrusted Connection

Symptom: The stack was created with the CREATE_COMPLETE status; however, when you attempt to access the cloud controller URL, your browser warns you about an "Untrusted connection".

Solution: This is normal behavior. Follow the steps described in First Time Access and SSL.

Cloud Controller UI is Not Loading

Symptom: The stack was created with the CREATE_COMPLETE status; however, you are unable to load the cloud controller UI in your browser. The page appears to be loading, yet it remains blank.

Solution: Follow these steps to verify that the cloud controller's security group allows connections from your IP address:

  1. In the AWS CloudFormation console, select the stack and click on the Parameters tab.
  2. Check the value provided for the RemoteLocation parameter and make sure that your public IP address is within this range. This value determines the range of public IP addresses that can access the cloud controller. If your public IP address is not within this range (specified using the CIDR notation), you will not be able to connect to the cloud controller.

    If you don't know your public IP address, the easiest way to find it is to ask Google or another search engine "What's my IP address".
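
The range check in step 2 can also be done programmatically. This is a minimal sketch using the Python standard library; the function name and the sample addresses are illustrative and do not come from the product.

```python
import ipaddress


def ip_allowed(public_ip, remote_location):
    """Return True if public_ip falls inside the RemoteLocation CIDR range."""
    address = ipaddress.ip_address(public_ip)
    network = ipaddress.ip_network(remote_location, strict=False)
    return address in network
```

Call it with your public IP address and the RemoteLocation value shown on the Parameters tab. A result of False means that the security group will not accept connections from your address.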

Follow these steps to reconfigure remote access to the cloud controller by manually editing its security group access rules:

  1. Click on the Resources tab.
  2. Find a resource with a Logical ID of CloudbreakSecurityGroup and copy its Physical ID. You will need this ID to find this security group and then change its settings.
  3. Navigate to the EC2 Dashboard.
  4. In the left pane, find NETWORK & SECURITY and click on Security Groups.
  5. Find the security group identified earlier. The Physical ID that you copied earlier should be listed in the Group ID column.
  6. Select the group identified in the previous step, and click on the Inbound tab.
  7. Check the Source for different connection types, making sure that your public IP address falls within the range specified. If it doesn't, edit the Source values.
  8. Now you should be able to connect to the cloud controller UI.

Errors in the CloudFormation Console

The AWS CloudFormation log is available from the Events tab in the CloudFormation Console.

You can also use the Resources tab to check the status of all resources.

The following sections list commonly reported problems related to launching the cloud controller, and steps to resolve them.

Account Verification

Symptom: Status CREATE_FAILED for Type: AWS::EC2::Instance with an error related to account verification.

Solution: These are AWS account verification issues that are between you (or your organization) and AWS, and are not related to the Hortonworks Data Cloud. You must contact AWS support to resolve the issue.

User is Not Authorized

Symptom: Your user is not authorized to create certain resources. For example:

Status: CREATE_FAILED
Type: AWS::Lambda::Function
Status reason: User: arn:aws:iam::ID:user/name is not authorized to perform: lambda:CreateFunction

Solution: This means that your IAM user cannot create IAM roles. This type of error may occur if your account is part of a larger corporate account.

To modify these permissions:

  1. Navigate to the IAM Console.
  2. From the left pane, select Users.
  3. Click on your user.
  4. In the Permissions tab, click on Attach Policy to attach a required policy.

In a corporate environment, you may have to contact your AWS account administrator to get required permissions.
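
When you request permissions from an administrator, it helps to state exactly which action was denied. The sketch below extracts the user and the denied action from a status reason. The regular expression is an assumption based only on the sample message shown above, and the function name is illustrative.

```python
import re

# Assumed message shape, taken from the sample status reason above.
DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>[\w-]+:[\w*]+)"
)


def denied_action(status_reason):
    """Return (principal, action) from a status reason, or None if no match."""
    match = DENIED.search(status_reason)
    if match is None:
        return None
    return match.group("principal"), match.group("action")
```

For the sample message above, the function returns the user ARN and lambda:CreateFunction.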

Maximum Number of Resources Has Been Reached

Symptom: CREATE_FAILED for a given resource with a reason such as The maximum number of VPCs has been reached.

Solution: You have reached the limit for a given resource. For example, by default you can create up to 5 VPCs per region. To learn more, refer to the AWS Service Limits documentation.

  1. Review your resources and see if you can reallocate or reuse any of the existing resources:
    • If you have reached the limit for the allowed number of VPCs, you can launch the cloud controller into an existing VPC.
    • If you have reached the limit for a certain instance type, you may be able to use a different instance type.
  2. You can request a quota increase by contacting AWS support. You will need to provide a reason for your request.

SSH Key's Length

Symptom: Stack creation fails with the following error:

Status: CREATE_FAILED
Type: AWS::CloudFormation::WaitCondition
Status reason: WaitCondition received failed message: 'Failed to create HDC credential. Check if your SSH key's length is at least 2048 and check the startup logs for troubleshooting.' for uniqueId: cbd-init

Solution: Upload or generate a new SSH key pair that is at least 2048 bits.
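
If you have the public half of the key pair in OpenSSH format, you can check its length before uploading it. The following sketch assumes an RSA key on a single "ssh-rsa ..." line and reads the modulus size from the key blob, which consists of three length-prefixed fields: the key type, the public exponent, and the modulus. The function name is illustrative.

```python
import base64
import struct


def rsa_public_key_bits(public_key_line):
    """Return the modulus size in bits of an OpenSSH 'ssh-rsa' public key line."""
    parts = public_key_line.split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        raise ValueError("expected an 'ssh-rsa ...' public key line")
    blob = base64.b64decode(parts[1])

    # The blob is a sequence of fields, each preceded by a 4-byte big-endian length.
    fields = []
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack(">I", blob[offset:offset + 4])
        offset += 4
        fields.append(blob[offset:offset + length])
        offset += length

    if len(fields) != 3 or fields[0] != b"ssh-rsa":
        raise ValueError("malformed ssh-rsa key blob")

    # Third field is the modulus; a leading zero byte does not affect its value.
    return int.from_bytes(fields[2], "big").bit_length()
```

Pass the contents of your public key file to the function; a result below 2048 means that the key is too short for the cloud controller.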

You Must Opt-in to SmartSense Telemetry

Symptom: Stack creation fails with the following error:

Status: CREATE_FAILED
Type: Custom::PrecheckDeployment
Status reason: Failed to create resource. You must opt-in to SmartSense telemetry when entering your existing SmartSenseID!

Solution: Since you entered your SmartSense ID, you must also opt in to SmartSense telemetry by selecting "I have read and opt in to SmartSense telemetry" in the Telemetry Opt In field. This option is available in the SmartSense Configuration section, right below the SmartSenseID.

Selected Subnet Belongs to a Different VPC

Symptom: You may encounter this error when using an advanced template:

Status: CREATE_FAILED
Type: Custom::ValidateNetwork
Status reason: Failed to create the resource. The selected subnet belongs to a different VPC.

Solution: This means that you selected a subnet that is not within the VPC selected. You must launch a new cloud controller, making sure that in the Network Configuration section you select a Subnet ID that is within the VPC that you specified in the VPC ID parameter.

RDS Connection Timeout

Symptom: You may encounter this error when using an advanced template:

Status: CREATE_FAILED
Type: AWS::CloudFormation::WaitCondition
Status reason: WaitCondition received failed message: 'RDS connectivity issue: psql: could not connect to server: Connection timed out Is the server running on host "myrds.cgkoo2dvo324.us-west-2.rds.amazonaws.com" (172.31.23.6) and accepting TCP/IP connections on port 5432?' for uniqueId: cbd-init

Solution: This means that the cloud controller cannot connect to the RDS. The inbound access rules for the security group of your RDS instance don't allow the cloud controller's public IP address to access the RDS instance. There are two ways to solve this problem:

  1. One solution is to create your RDS instance and the cloud controller in the same VPC. This configuration puts the RDS and cloud controller in the same security group, allowing them to communicate without restrictions.
  2. If your RDS instance is in a different VPC and security group than the cloud controller: since you don't know the public IP address of the cloud controller in advance, the only way to ensure that the cloud controller can reach the RDS is by initially setting the inbound access on the RDS's security group to 0.0.0.0/0 (i.e., allow access from anywhere). Once the cloud controller is up and running, you can restrict the access rules to only allow inbound access from the public IP address of the EC2 instance running the cloud controller.
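
To confirm whether the security group rules allow the connection, you can test TCP reachability of the RDS endpoint from the machine that needs access. This is a minimal sketch using the Python standard library. The function name is illustrative; port 5432 is the one shown in the error message above. A successful TCP connection shows only that the network path is open, not that the database credentials are correct.

```python
import socket


def can_reach(host, port=5432, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False
```

Run it on the EC2 instance running the cloud controller, passing your RDS endpoint host name. A result of False after the timeout is consistent with the "Connection timed out" error above.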

RDS Database Doesn't Exist

Symptom: You may encounter this error when using an advanced template:

Status: CREATE_FAILED
Type: AWS::CloudFormation::WaitCondition
Status reason: WaitCondition received failed message: 'RDS connectivity issue: psql: FATAL: database "metastoredb" does not exist' for uniqueId: cbd-init

Solution: This means that the Database Name that you specified is incorrect. On the RDS Dashboard, navigate to the Configuration Details for your selected RDS instance and double-check the DB Name parameter.

RDS Authentication Failed

Symptom: You may encounter this error when using an advanced template:

Status: CREATE_FAILED
Type: AWS::CloudFormation::WaitCondition
Status reason: WaitCondition received failed message: 'RDS connectivity issue: psql: FATAL: password authentication failed for user "wrongusername" FATAL: password authentication failed for user "wrongusername"' for uniqueId: cbd-init

Solution: This means that the RDS credentials that you provided in the CloudFormation template are incorrect. Launch a new CloudFormation template, making sure that you specify correct credentials this time. You can find your RDS Username in the Configuration Details on the RDS Dashboard.

Cloud Controller is Stuck in CREATE_IN_PROGRESS

Symptom: You may encounter this error when using an advanced template. Your cloud controller is stuck in the CREATE_IN_PROGRESS status. Your EC2 instance was created successfully, but other resources are stuck.

Solution: This typically means that your VPC is misconfigured.

  1. Try accessing the EC2 instance using SSH.
  2. If you are unable to access it, you should check your VPC configuration and reconfigure your VPC.

Symptom: You may encounter an error related to the initializing script. For example:

Status: CREATE_FAILED
Type: AWS::CloudFormation::WaitCondition
Status reason: WaitCondition received failed message: 'ERROR: command 'tar -xz -C /bin cbd' exited with status: 2 line: 1' for uniqueId: cbd-init

Solution: This means that something went wrong with the initializing script. If your resources were not rolled back, you can SSH to the EC2 instance running the cloud controller and check the cloudbreak logs such as /var/log/cbd-quick-start.log and /var/log/cbd-init-output.log.
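
When these logs are long, a small script can narrow them down. The sketch below returns the most recent lines of a log file that contain a search string. It assumes that relevant lines contain the text ERROR, which is a guess rather than a documented log format; the function name and defaults are illustrative.

```python
from pathlib import Path


def matching_lines(path, needle="ERROR", limit=20):
    """Return up to the last `limit` lines of the file that contain `needle`.

    Returns an empty list if the file does not exist.
    """
    log = Path(path)
    if not log.is_file():
        return []
    with log.open(errors="replace") as handle:
        hits = [line.rstrip("\n") for line in handle if needle in line]
    return hits[-limit:]
```

On the EC2 instance, you could call it with /var/log/cbd-quick-start.log or /var/log/cbd-init-output.log and print each returned line.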

Deleting Cloud Controller Fails

Symptom: Deleting a stack fails with a DELETE_FAILED error.

Solution: Make sure that:

If creating the stack fails and the error in the AWS CloudFormation log does not provide enough information, SSH to the EC2 instance running the cloud controller and:

  • Check the /var/log/cbd-quick-start.log.
  • Run the `docker logs cbreak_cloudbreak_1` command.

These logs should provide more information than the CloudFormation Events tab.

Errors When Creating a Cluster

If creating your cluster fails, you may be able to diagnose the problem based on an error message printed in the EVENT HISTORY in the cloud controller UI.

Common Errors

Common errors are related to:

In addition, you may encounter the following errors when using the CLI:

If you are not able to diagnose the problem based on the EVENT HISTORY log, SSH to the cloud controller instance and check the /var/lib/cloudbreak-deployment/cbreak.log.

Errors When Registering Hive Metastore

You may experience the following issues when registering a Hive metastore:

When Testing Hive Metastore Connection, the Page is Stuck in the Loading State

Symptom: If you attempt to test the connection with a newly added Hive metastore, the page may freeze in the Loading... state. This means that the cloud controller cannot connect to the RDS database.

Solution: Make sure that the security group configuration for the RDS instance allows the instance to communicate with the cloud controller.

Errors When Accessing Ambari

You may experience the following issues when using the Ambari web UI:

502 Bad Gateway

Symptom: If you attempt to access the Ambari web UI before the cluster master node is fully initialized, you will receive a 502 Bad Gateway error.

Solution: This is normal behavior. Once the cluster status changes to RUNNING, you can access the Ambari web UI.

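If AWS infrastructure is deployed successfully but Ambari cannot be started or agents cannot join the cluster, SSH to the running AWS master node and/or worker nodes and check these logs:

  • The logs for Salt in /var/log/salt/. Typically, /var/log/salt/minion contains most information.
  • The standard Ambari logs in /var/log/ambari-server or /var/log/ambari-agent.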

If deployment fails when Ambari is up and running, and it has already started installing HDP services, then HDCloud will display a message similar to: “Check the Ambari UI to see the failures”. Log in to the Ambari web UI. Errors for specific components or hosts are displayed in red.

Logs

Depending on the deployment stage in which you encounter a problem, check one of the following logs:

Log: In the AWS CloudFormation Console, select the stack. Next, click on the Events tab.
When to check: If creating the stack fails, you will see it in this log. For troubleshooting tips, refer to the Errors in the CloudFormation Console documentation.

Log: SSH to the EC2 instance running the cloud controller and:
  • Check the /var/log/cbd-quick-start.log.
  • Run the `docker logs cbreak_cloudbreak_1` command.
When to check: If creating the stack fails and the error in the AWS CloudFormation log does not provide enough information.

Log: In the cloud controller UI, on the cloud controller dashboard, click on the cluster's corresponding tile to navigate to the cluster details page. This page includes cluster EVENT HISTORY.
When to check: If cluster create fails, an error message will be visible in the EVENT HISTORY. If the error is AWS-related, you may be able to diagnose and fix it based on the message written to the EVENT HISTORY. For troubleshooting tips, refer to the Errors When Creating a Cluster documentation.

Log: SSH to the cloud controller instance and check the /var/lib/cloudbreak-deployment/cbreak.log.
When to check: If cluster create fails, check this file to see what happened. It contains aggregated logs collected from the cbd docker containers (cloudbreak, autoscale, authentication, and so on).

Log: SSH to the running AWS master node and/or worker nodes and view:
  • The logs for Salt in /var/log/salt/. Typically, /var/log/salt/minion contains most information.
  • The standard Ambari logs in /var/log/ambari-server or /var/log/ambari-agent.
When to check: If AWS infrastructure is deployed successfully but Ambari cannot be started or agents cannot join the cluster, check these logs.

Log: Log in to the Ambari web UI. Errors for specific components or hosts are displayed in red.
When to check: If deployment fails when Ambari is up and running, and it has already started installing HDP services, then HDCloud will display a message similar to: “Check the Ambari UI to see the failures”.

Troubleshooting Amazon S3

Refer to the Troubleshooting Amazon S3 documentation for information on troubleshooting the Amazon S3 integration.