Data Governance

Atlas System Types

This section describes the available pre-defined Atlas system types (super types).

  • Referenceable – This type represents all entities that can be searched for using a unique qualifiedName attribute.

  • Asset – This type contains attributes such as name, description, and owner. The name attribute is required (multiplicity = required), but the others are optional.

    The purpose of Referenceable and Asset is to give modelers a way to enforce consistency when defining and querying entities of their own types. Having this fixed set of attributes allows applications and user interfaces to make convention-based assumptions about which default attributes they can expect from a type.

  • Infrastructure – This type extends Referenceable and Asset, and is typically used as a common super type for infrastructure metadata objects such as clusters and hosts.

  • DataSet – This type extends Referenceable and Asset. Conceptually, it represents a type that stores data. In Atlas, Hive tables, Sqoop RDBMS tables, and similar types all extend DataSet. Types that extend DataSet can be expected to have a schema, in the sense that they have an attribute that defines the attributes of that dataset (for example, the columns attribute in a hive_table). Entity types that extend DataSet also participate in data transformation, and Atlas can capture this transformation via lineage (or provenance) graphs.

  • Process – This type extends Referenceable and Asset. Conceptually, it can be used to represent any data transformation operation. For example, an ETL process that transforms a Hive table with raw data to another Hive table that stores some aggregate can be a specific type that extends the Process type. A Process type has two specific attributes: inputs and outputs. Both inputs and outputs are arrays of DataSet entities. Thus an instance of a Process type can use these inputs and outputs to capture how the lineage of a DataSet evolves.
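
The relationships between these super types can be illustrated with a small, self-contained Python sketch. This is only a conceptual model of the hierarchy described above, not the Atlas API: the Python class layout, the HiveTable stand-in for hive_table, the upstream helper, and all sample qualified names are assumptions made for illustration. The qualifiedName attribute is written as qualified_name to follow Python naming conventions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Referenceable:
    # Unique key by which an entity can be searched for.
    qualified_name: str


@dataclass
class Asset:
    name: str                          # required
    description: Optional[str] = None  # optional
    owner: Optional[str] = None        # optional


# Bases are listed as (Asset, Referenceable) so that the required
# fields come before the optional ones in the generated __init__.
@dataclass
class Infrastructure(Asset, Referenceable):
    pass


@dataclass
class DataSet(Asset, Referenceable):
    pass


@dataclass
class Process(Asset, Referenceable):
    # Both attributes are arrays of DataSet entities.
    inputs: List[DataSet] = field(default_factory=list)
    outputs: List[DataSet] = field(default_factory=list)


@dataclass
class HiveTable(DataSet):
    # Hypothetical stand-in for hive_table; columns plays the schema role.
    columns: List[str] = field(default_factory=list)


def upstream(target: DataSet, processes: List[Process]) -> List[str]:
    """Return qualified names of every DataSet that feeds into target,
    directly or through a chain of processes (hypothetical helper)."""
    seen: List[str] = []
    frontier = [target.qualified_name]
    while frontier:
        current = frontier.pop()
        for proc in processes:
            if any(d.qualified_name == current for d in proc.outputs):
                for src in proc.inputs:
                    if src.qualified_name not in seen:
                        seen.append(src.qualified_name)
                        frontier.append(src.qualified_name)
    return seen


# Sample entities (all names are made up for this example).
raw = HiveTable("default.raw_logs@cluster1", "raw_logs", columns=["ts", "msg"])
agg = HiveTable("default.daily_agg@cluster1", "daily_agg", columns=["day", "count"])
report = HiveTable("default.report@cluster1", "report", columns=["day", "total"])

etl_aggregate = Process("etl.aggregate@cluster1", "aggregate",
                        inputs=[raw], outputs=[agg])
etl_report = Process("etl.report@cluster1", "build_report",
                     inputs=[agg], outputs=[report])

print(upstream(report, [etl_aggregate, etl_report]))
# prints ['default.daily_agg@cluster1', 'default.raw_logs@cluster1']
```

Walking from the outputs of each Process back to its inputs is what lets a lineage graph be reconstructed for any DataSet; here the report table traces back through the aggregate table to the raw table.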