DPS Platform Administration

Chapter 4. Hortonworks DataPlane Service Terminology

You should be familiar with the terminology and concepts used in Hortonworks DataPlane Service (DPS) and in the services that interface with the DPS infrastructure.

DPS Core

DPS Core is a UI service platform from which you can manage and monitor various services on multiple Hortonworks Data Platform (HDP) Hadoop clusters. You can install DPS Core on a host in an HDP cluster or on a host that is remote to the cluster.

Hortonworks DataPlane Service (DPS)

The family of components that includes the DPS Core service platform and all services that plug into it.

service

An autonomous component in the DPS environment. DPS Core is a service, as is each component that is enabled and managed through DPS Core, such as Data Lifecycle Manager (DLM). DPS Core and each of its plugin services must be installed as Docker containers.

cluster

A typical HDP Hadoop cluster. See the Cluster Planning guide for details.

The cluster hosts the various Systems of Record (SoRs) for metadata (Apache Hive, Apache Atlas, Apache Ranger, HDFS, and so on) that DPS Core and associated plugin services rely on. In an on-premises environment, a cluster often equates to a data center. However, a single data center can contain multiple HDP Hadoop clusters.

Data Lifecycle Manager (DLM) Service

DLM is a UI service that is enabled through DPS Core. From the DLM UI you can create and manage replication and disaster recovery policies and jobs.

DLM Engine

The backend replication engine required for Data Lifecycle Manager, also referred to as the Beacon engine. The DLM Engine must be installed as a management pack on each cluster that is to be used in data replication jobs. The engine stores information about the clusters and policies involved in replication in a configured database.

data center

The facility that contains the computer, server, and storage systems and associated infrastructure, such as routers and switches. Corporate data is stored, managed, and distributed from the data center. In an on-premises environment, a data center is often composed of a single Hadoop cluster.

Data Steward Studio (DSS) Service

DSS is a UI service that is enabled through DPS Core.

DSS Profiler Agent

The agent that runs the DSS profilers, which enable the data steward to gather and view information about data distribution. For example, you can view the distribution between males and females in a column named “Gender”, or the min, max, mean, and null values in a column named “avg_income”. The profilers run at regularly scheduled intervals and generate profiled data on each run. Profilers work with data sourced from Apache Ranger Audit Logs, Apache Atlas Metadata Store, and Apache Hive.
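To make the kind of output a profiler produces more concrete, the following is a minimal sketch in plain Python of the two statistics mentioned above. It is not the DSS profiler implementation and does not use any DSS API. The sample rows and the function names profile_categorical and profile_numeric are hypothetical; only the column names “Gender” and “avg_income” come from the example in the text.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample rows; only the column names come from the text above.
rows = [
    {"Gender": "F", "avg_income": 52000.0},
    {"Gender": "M", "avg_income": 48000.0},
    {"Gender": "F", "avg_income": None},
    {"Gender": "M", "avg_income": 61000.0},
]


def profile_categorical(rows, column):
    """Count how often each non-null value appears in a column."""
    return Counter(r[column] for r in rows if r[column] is not None)


def profile_numeric(rows, column):
    """Return min, max, mean, and null count for a numeric column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "mean": mean(values) if values else None,
        "nulls": len(rows) - len(values),
    }


gender_profile = profile_categorical(rows, "Gender")
income_profile = profile_numeric(rows, "avg_income")
print(gender_profile)
print(income_profile)
```

Null values are excluded from the min, max, and mean and are counted separately, so a column with missing data still yields usable statistics.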

asset collection

A list of assets that have been grouped and assigned unique search criteria by a data steward for purposes of management, administration, and so forth. An asset collection lets you organize a data lake by applying uniform policies and providing access to a limited set of users.

The content of an asset collection is a static list that can be modified only by a user. Therefore, adding new assets to a collection must be done manually.
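The static behavior described above can be pictured with a short sketch. This is an illustration only, not how DSS stores collections: the AssetCollection class, the catalog list, and the asset names are all hypothetical. It shows that a collection built from a search at one point in time does not pick up assets that appear later unless a user adds them.

```python
from dataclasses import dataclass, field


@dataclass
class AssetCollection:
    """Hypothetical model of a static, user-maintained list of assets."""
    name: str
    assets: list = field(default_factory=list)

    def add(self, asset):
        """Manually add an asset; duplicates are ignored."""
        if asset not in self.assets:
            self.assets.append(asset)


# Hypothetical catalog of asset names at the time the collection is created.
catalog = ["sales.orders", "sales.customers", "hr.employees"]

# The search result is copied into the collection once, as a snapshot.
collection = AssetCollection(
    "sales-data", [a for a in catalog if a.startswith("sales.")]
)

# A new asset that matches the same criteria appears later in the catalog.
catalog.append("sales.refunds")
before_manual_add = list(collection.assets)

# The collection changes only when a user adds the asset explicitly.
collection.add("sales.refunds")
print(before_manual_add)
print(collection.assets)
```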

data asset

A specific instance of a data type, including the related attributes and metadata, that is typically managed as a single unit by components like Atlas and Ranger. A data asset could include a specific instance of an Apache Hive database, table, or column; an HBase namespace, table, or column family; an individual file or a collection of files, and so forth. An asset can belong to only one asset collection. Data assets are also known as “entities” in Atlas.

data lake

A trusted and governed data repository that stores, processes, and provides access to many kinds of enterprise data to support data discovery, data preparation, analytics, insights, and predictive analytics. In the context of Hortonworks DPS, a data lake can be realized in practice with an Apache Ambari-managed Hadoop cluster that runs Apache Atlas for metadata and governance services, and Apache Knox and Apache Ranger for security services.