Data Governance

Apache Atlas Architecture

The following image shows the Atlas components.

Atlas components can be grouped under the following categories:

  • Core

  • Integration

  • Metadata Sources

  • Applications

Core

This category contains the components that implement the core of Atlas functionality, including:

Type System: Atlas allows you to define a model for metadata objects. This model is composed of definitions called "types". "Entities" are instances of types that represent the actual metadata objects. All metadata objects managed by Atlas (such as Hive tables) are modeled using types, and represented as entities.

Because modeling in Atlas is generic, data stewards and integrators can define both technical metadata and business metadata. It is also possible to use Atlas to define rich relationships between technical and business metadata.
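
The relationship between types and entities can be sketched in a few lines of Python. This is only an illustration of the idea, not the Atlas API: the class names `TypeDef` and `Entity`, the attribute lists, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypeDef:
    """A hypothetical type definition: a name plus the attributes it allows."""
    name: str
    attributes: tuple


@dataclass
class Entity:
    """An instance of a type, holding the actual metadata values."""
    type_def: TypeDef
    values: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject attributes that the type does not declare.
        unknown = set(self.values) - set(self.type_def.attributes)
        if unknown:
            raise ValueError(
                f"unknown attributes for {self.type_def.name}: {sorted(unknown)}"
            )


# A technical type and a business type, defined with the same mechanism.
hive_table = TypeDef("hive_table", ("name", "owner", "db"))
business_term = TypeDef("business_term", ("name", "description"))

sales = Entity(hive_table, {"name": "sales_fact", "owner": "etl", "db": "reporting"})
revenue = Entity(business_term, {"name": "Revenue", "description": "Recognized income"})
```

Both kinds of metadata go through the same two classes, which is the sense in which the modeling is generic.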

Ingest / Export: The Ingest component allows metadata to be added to Atlas. Similarly, the Export component raises the metadata changes detected by Atlas as events. Consumers can use these change events to react to metadata changes in real time.

Graph Engine: Internally, Atlas represents metadata objects using a Graph model. This approach provides flexibility and supports rich relationships between metadata objects. The Graph Engine is responsible for translating between the types and entities of the Type System and the underlying Graph model. In addition to managing the Graph objects, the Graph Engine creates the appropriate indices for the metadata objects so that they can be searched efficiently.
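
As a rough illustration of this translation, the following Python sketch stores each entity as a vertex, each relationship as a labeled edge, and maintains a property index for lookups. The `ToyGraph` class, its method names, and the `__type` property key are hypothetical and are not taken from Atlas or Titan.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory stand-in for the graph model and its index."""

    def __init__(self):
        self.vertices = {}             # vertex id -> property dict
        self.edges = []                # (from id, label, to id)
        self.index = defaultdict(set)  # (property, value) -> vertex ids

    def add_entity(self, entity_id, type_name, attributes):
        # One entity becomes one vertex carrying its type name and attributes.
        props = {"__type": type_name, **attributes}
        self.vertices[entity_id] = props
        # Index every property so lookups avoid scanning all vertices.
        for key, value in props.items():
            self.index[(key, value)].add(entity_id)

    def relate(self, from_id, label, to_id):
        # A relationship between entities becomes a labeled edge.
        self.edges.append((from_id, label, to_id))

    def find(self, key, value):
        # Answer the search from the index, not from the vertex store.
        return sorted(self.index.get((key, value), set()))


graph = ToyGraph()
graph.add_entity("t1", "hive_table", {"name": "sales_fact"})
graph.add_entity("c1", "hive_column", {"name": "amount"})
graph.relate("t1", "columns", "c1")
```

In this sketch, `graph.find("__type", "hive_table")` is answered from the index alone, which mirrors the reason for keeping indices alongside the graph objects.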

Titan: Currently, Atlas uses the Titan Graph Database to store the metadata objects. Titan is used as a library within Atlas and relies on two stores. The Metadata store holds the metadata objects, and the Index store holds indices of the metadata properties to enable efficient search. By default, the Metadata store is configured to use HBase and the Index store is configured to use Solr. It is also possible to use BerkeleyDB as the Metadata store and Elasticsearch as the Index store by building Atlas with the corresponding profiles.

Integration

You can manage metadata in Atlas using the following methods:

API: All functionality of Atlas is exposed to end users via a REST API that allows types and entities to be created, updated, and deleted. It is also the primary mechanism to query and discover the types and entities managed by Atlas.
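
A minimal sketch of calling the REST API from Python follows. It only builds the requests and does not send them. The host, the port 21000, the `/api/atlas/v2` prefix, the `/entity` and `/entity/guid/{guid}` paths, and the request body shape are assumptions based on the Atlas v2 REST API and may differ in your Atlas version; the helper function names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL; adjust the host, port, and API version for your cluster.
ATLAS_BASE = "http://localhost:21000/api/atlas/v2"


def build_create_entity_request(type_name, attributes, base=ATLAS_BASE):
    """Build (but do not send) a POST request that would create one entity."""
    body = {"entity": {"typeName": type_name, "attributes": attributes}}
    return urllib.request.Request(
        url=f"{base}/entity",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get_entity_request(guid, base=ATLAS_BASE):
    """Build (but do not send) a GET request that would fetch an entity by GUID."""
    return urllib.request.Request(url=f"{base}/entity/guid/{guid}", method="GET")


request = build_create_entity_request(
    "hive_table",
    {"name": "sales_fact", "qualifiedName": "reporting.sales_fact@cluster"},
)
```

To actually send a request, pass it to `urllib.request.urlopen` and add whatever authentication your Atlas server requires.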

Messaging: In addition to the API, you can integrate with Atlas using a messaging interface that is based on Kafka. This interface is useful both for communicating metadata objects to Atlas and for transmitting metadata change events from Atlas to applications. It is particularly useful if you want a more loosely coupled integration with Atlas, which can allow for better scalability and reliability. Atlas uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata notification events. The hooks and Atlas write their events to different Kafka topics.
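
The two-topic flow can be sketched with an in-memory stand-in for Kafka. Nothing here talks to a real broker: `ToyBroker`, the helper functions, and the message fields are hypothetical, and the topic names `ATLAS_HOOK` and `ATLAS_ENTITIES` are the names commonly documented for Atlas, which you should confirm for your installation.

```python
import json
from collections import defaultdict, deque

# Topic names as commonly documented for Atlas; treat them as assumptions.
HOOK_TOPIC = "ATLAS_HOOK"
ENTITIES_TOPIC = "ATLAS_ENTITIES"


class ToyBroker:
    """Hypothetical in-memory stand-in for a Kafka broker."""

    def __init__(self):
        self.topics = defaultdict(deque)

    def publish(self, topic, message):
        self.topics[topic].append(json.dumps(message))

    def poll(self, topic):
        # Return the oldest unread message, or None if the topic is empty.
        queue = self.topics[topic]
        return json.loads(queue.popleft()) if queue else None


def hook_reports_table(broker, table_name):
    # A hook (for example, in Hive) writes new metadata to the hook topic.
    broker.publish(HOOK_TOPIC, {"op": "ENTITY_CREATE", "name": table_name})


def atlas_processes_one(broker):
    # Atlas consumes one hook message, then announces the change to consumers.
    message = broker.poll(HOOK_TOPIC)
    if message is not None:
        broker.publish(
            ENTITIES_TOPIC, {"change": message["op"], "name": message["name"]}
        )


broker = ToyBroker()
hook_reports_table(broker, "sales_fact")
atlas_processes_one(broker)
event = broker.poll(ENTITIES_TOPIC)
```

The hook never calls Atlas directly and the downstream consumer never calls the hook, which is the loose coupling described above.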

Metadata Sources

Currently, Atlas supports ingesting and managing metadata from the following sources:

  • Hive

  • Sqoop

  • Storm/Kafka (limited support)

  • Falcon (limited support)

As a result of this integration:

  • There are metadata models that Atlas defines natively to represent objects of these components.

  • Atlas provides mechanisms to ingest metadata objects from these components (in real time, or in batch mode in some cases).

Applications

Atlas Admin UI: This component is a web-based application that allows data stewards and data scientists to discover and annotate metadata. Its most important features are a search interface and a SQL-like query language that can be used to query the metadata types and objects managed by Atlas. The Admin UI is built using the Atlas REST API.
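
Because the Admin UI is built on the REST API, a query typed into its search box can also be issued as a URL. The sketch below only builds such a URL. The `/search/dsl` path with a `query` parameter is an assumption based on the Atlas v2 REST API, the query text follows the general shape of the SQL-like language and may need adjusting for your version, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_dsl_search_url(query, base="http://localhost:21000/api/atlas/v2"):
    """Return a search URL with the query text safely URL-encoded."""
    return f"{base}/search/dsl?{urlencode({'query': query})}"


url = build_dsl_search_url('hive_table where name = "sales_fact"')
```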

Ranger Tag-based Policies: Atlas provides data governance capabilities and serves as a common metadata store that is designed to exchange metadata both within and outside of the Hadoop stack. Ranger provides a centralized user interface that can be used to define, administer, and manage security policies consistently across all the components of the Hadoop stack. The Atlas-Ranger integration unites the data classification and metadata store capabilities of Atlas with the security enforcement of Ranger.
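
The idea of a tag-based policy can be sketched as follows: the access decision depends on the classification attached to an entity, not on the entity's name. This is a deliberately simplified illustration and not Ranger's policy model; the function, the `PII` tag, the group names, and the rule that untagged entities are unrestricted are all assumptions made for the example.

```python
def is_access_allowed(user_groups, entity_tags, tag_policies):
    """Allow access unless a tag on the entity is restricted to other groups."""
    for tag in entity_tags:
        allowed_groups = tag_policies.get(tag)
        # Tags without a policy do not restrict access in this sketch.
        if allowed_groups is not None and not (set(user_groups) & allowed_groups):
            return False
    return True


# Hypothetical policy: only the "compliance" group may read PII-tagged data.
tag_policies = {"PII": {"compliance"}}

# Hypothetical classifications, as they might be recorded in the metadata store.
entity_tags = {"sales_fact": {"PII"}, "page_views": set()}
```

With this setup, tagging another table as `PII` restricts it immediately, without writing a new policy for that table.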

Business Taxonomy: The metadata objects ingested into Atlas from metadata sources are primarily a form of technical metadata. To enhance discoverability and governance capabilities, Atlas includes a Business Taxonomy interface that allows users to define a hierarchical set of business terms that represent their business domain, and then associate these terms with the metadata entities that Atlas manages. The Business Taxonomy is included in the Atlas Admin UI, and integrates with Atlas using the REST API.
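
A hierarchical set of terms and their association with entities can be sketched as a small tree. The `ToyTaxonomy` class, the dotted path format, and the sample terms are hypothetical and are not the Atlas taxonomy API; the sketch also assumes that term names are unique across the hierarchy.

```python
class ToyTaxonomy:
    """Hypothetical hierarchy of business terms with entity assignments."""

    def __init__(self):
        self.parent = {}       # term -> parent term (None for a top-level term)
        self.assignments = {}  # entity name -> set of assigned terms

    def add_term(self, term, parent=None):
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent term: {parent}")
        self.parent[term] = parent

    def path(self, term):
        # Walk up to the top-level term and join the names from the top down.
        parts = []
        while term is not None:
            parts.append(term)
            term = self.parent[term]
        return ".".join(reversed(parts))

    def assign(self, entity_name, term):
        if term not in self.parent:
            raise KeyError(f"unknown term: {term}")
        self.assignments.setdefault(entity_name, set()).add(term)

    def entities_under(self, term):
        """Entities assigned to the term or to any term below it."""
        root = self.path(term)
        return sorted(
            name
            for name, terms in self.assignments.items()
            if any(
                self.path(t) == root or self.path(t).startswith(root + ".")
                for t in terms
            )
        )


taxonomy = ToyTaxonomy()
taxonomy.add_term("Finance")
taxonomy.add_term("Revenue", parent="Finance")
taxonomy.add_term("Marketing")
taxonomy.assign("sales_fact", "Revenue")
taxonomy.assign("page_views", "Marketing")
```

Searching under `Finance` finds `sales_fact` even though it was assigned to the narrower term `Revenue`, which is what makes a hierarchy useful for discovery.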