Data Steward Studio Administration

DSS Terminology

For a complete list of DataPlane terminology, see: Hortonworks DataPlane Service Terminology.

Hortonworks DataPlane Service (DPS)

The family of components that includes the Core service platform and all services that plug into it.

Data Center

The facility that contains the computer, server, and storage systems and associated infrastructure, such as routers and switches. Corporate data is stored, managed, and distributed from the data center. In an on-premises environment, a data center hosts one or more Hadoop clusters.


Enables the data steward to gather and view information about relevant characteristics of data, such as shape, distribution, quality, and sensitivity, that are important for understanding and using the data effectively. For example, view the distribution between males and females in a column named “Gender”, or the min/max/mean/null values in a column named “avg_income”. Profiled data is generated periodically by the profilers, which run at regularly scheduled intervals. Works with data sourced from Apache Ranger Audit Logs, Apache Atlas Metadata Store, and Hive.
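The kind of per-column statistics mentioned above can be illustrated with a minimal Python sketch. This is not the DSS profiler implementation; the function names profile_numeric_column and value_distribution are hypothetical, and the sample values are made up for illustration only.

```python
from collections import Counter
from statistics import mean


def profile_numeric_column(values):
    """Return min, max, mean, and null count for one numeric column.

    Hypothetical helper for illustration; None stands in for a null value.
    """
    non_null = [v for v in values if v is not None]
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
        "nulls": len(values) - len(non_null),
    }


def value_distribution(values):
    """Count how often each distinct value occurs in a categorical column."""
    return dict(Counter(values))


# Illustrative data only, loosely modeled on the "avg_income" and "Gender" columns.
avg_income = [10, 20, None, 30]
gender = ["M", "F", "F", None]

income_profile = profile_numeric_column(avg_income)
gender_counts = value_distribution(gender)
```

In this sketch a profile is computed once over an in-memory list; a scheduled profiler would presumably recompute such statistics at each run against the current table contents.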

Data Lake

A trusted and governed data repository that stores, processes, and provides access to many kinds of enterprise data to support data discovery, data preparation, analytics, insights, and predictive analytics. In the context of Hortonworks DPS, a data lake can be realized in practice with an Apache Ambari managed Hadoop cluster that runs Apache Atlas for metadata and governance services, and Apache Knox and Apache Ranger for security services.

Data Asset

A data asset is a physical asset located in the Hadoop ecosystem, such as a Hive table, that contains business or technical data. A data asset could be a specific instance of an Apache Hive database, table, or column. An asset can belong to only one asset collection. Data assets are equivalent to “entities” in Apache Atlas.

Asset Collection

Asset collections allow users of DSS to manage and govern various kinds of data objects as a single unit through a unified interface. Asset collections help organize and curate information about many assets based on many facets. These facets include data content and metadata (such as size, schema, tags, and alterations), lineage, and impact on processes and downstream objects. Asset collections also display the security and governance policies that apply to the assets.

The content of an asset collection is a static list that only a user can modify, so new assets must be added to a collection manually.
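The idea of a static, manually maintained list can be pictured with a small Python sketch. The AssetCollection class, its methods, and the asset names below are hypothetical and do not correspond to any DSS API; the sketch only shows that membership changes when a user explicitly adds or removes an asset, never automatically.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssetCollection:
    """Hypothetical model of an asset collection as a static list of asset names."""

    name: str
    assets: List[str] = field(default_factory=list)

    def add_asset(self, asset_name: str) -> None:
        # Membership changes only through an explicit user action like this one.
        if asset_name not in self.assets:
            self.assets.append(asset_name)

    def remove_asset(self, asset_name: str) -> None:
        # Removing an asset that is not in the list is treated as a no-op.
        if asset_name in self.assets:
            self.assets.remove(asset_name)


# Illustrative names only.
collection = AssetCollection(name="customer_data")
collection.add_asset("sales.customers")
collection.add_asset("sales.orders")
collection.add_asset("sales.orders")  # adding a duplicate has no effect
collection.remove_asset("sales.customers")
```

A new Hive table created after the collection was built would not appear in it until someone calls the equivalent of add_asset for that table.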