Working with Data Lakes (TP)

Data lake blueprints

When creating a data lake, you can choose one of the available blueprints.

The following data lake blueprints are provided by default in Cloudbreak:

Blueprint: HDP 3.1 Data Lake: Apache Ranger, Apache Hive Metastore
Description: Includes Apache Ranger and allows all clusters attached to a data lake to connect to the same Hive Metastore.
Note: Hive Metastore has been removed from the HDP 3.x data lake blueprints, but setting up an external database allows all clusters attached to a data lake to connect to the same Hive Metastore.
Node count: Includes a single master host group and must include a single node.

Blueprint: HDP 2.6 Data Lake: Apache Ranger, Apache Atlas, Apache Hive Metastore
Description: Includes Apache Ranger, Apache Atlas, and Apache Hive Metastore.
Node count: Includes a single master host group and must include a single node.

Blueprint: HDP 2.6 Data Lake: Apache Ranger, Apache Hive Metastore HA
Description: Includes Apache Ranger and Apache Hive Metastore in HA mode. Automatic and manual recovery options are available for this type of data lake.
Node count: Includes two master host groups. We recommend either 3 or 5 nodes total for this type of cluster. By default the node count is 3.

Depending on your use case, select one of these blueprints.
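
The node-count guidance in the blueprint descriptions above can be sketched as a small lookup, which may help when scripting a pre-flight check before creating a data lake. This is a minimal illustration only: the names BlueprintRule, RULES, and check_node_count are hypothetical and are not part of Cloudbreak or its CLI. The sketch assumes that "must include a single node" is a hard requirement and that the 3-or-5 guidance for the HA blueprint is a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlueprintRule:
    """Node-count guidance for one data lake blueprint (illustrative only)."""
    master_host_groups: int
    node_counts: tuple  # required or recommended node counts
    strict: bool        # True: count is required; False: count is a recommendation


# Keys are the blueprint names listed above; the structure itself is hypothetical.
RULES = {
    "HDP 3.1 Data Lake: Apache Ranger, Apache Hive Metastore":
        BlueprintRule(master_host_groups=1, node_counts=(1,), strict=True),
    "HDP 2.6 Data Lake: Apache Ranger, Apache Atlas, Apache Hive Metastore":
        BlueprintRule(master_host_groups=1, node_counts=(1,), strict=True),
    "HDP 2.6 Data Lake: Apache Ranger, Apache Hive Metastore HA":
        BlueprintRule(master_host_groups=2, node_counts=(3, 5), strict=False),
}


def check_node_count(blueprint: str, node_count: int) -> str:
    """Return 'ok', 'warning', or 'error' for a planned node count."""
    rule = RULES[blueprint]
    if node_count in rule.node_counts:
        return "ok"
    # A mismatch is an error for single-node blueprints and only a
    # warning for the HA blueprint, where 3 or 5 is a recommendation.
    return "error" if rule.strict else "warning"
```

For example, under these assumptions a planned count of 4 for the HA blueprint yields "warning", while a count of 2 for either single-node blueprint yields "error".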