Data Governance

Cataloging Atlas Metadata: Traits and Business Taxonomy

As discussed previously, metadata is added to Atlas as entities (instances) of types (model definitions). The models are usually defined by whoever best understands the metadata. For example, the Hive data types are typically defined by someone with a good understanding of Hive.

Data discovery and governance can be enhanced when metadata use cases are expanded to include business terminology and processes, rather than just technical metadata. This business cataloging can be performed by data stewards or data scientists who act as a bridge between technical and business metadata.

Business metadata can be cataloged using a common business terminology, even when the underlying metadata is not closely related from a technical standpoint. A common business taxonomy enables you to build applications that apply the same governance policies to similar metadata regardless of its source. Atlas search capabilities also make it easy to find similar business metadata.

For example, in the finance industry, all data sets that deal with “credit” as a concept can be cataloged as such, irrespective of whether they originate from Hive, HBase, or any other data store. Once they are cataloged in this way, credit-related policies can be applied to all data assets (entities) cataloged with this concept.
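The idea in the "credit" example can be illustrated with a small, self-contained sketch. It does not use the Atlas API: the `Entity` class and the `catalog` and `entities_for` helpers are hypothetical names introduced only to show how one business concept can span entities from different sources.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a cataloged data asset (not the Atlas model)."""
    name: str
    source: str  # originating data store, e.g. "hive" or "hbase"
    concepts: set = field(default_factory=set)


def catalog(entity, concept):
    """Associate a business concept with an entity."""
    entity.concepts.add(concept)


def entities_for(entities, concept):
    """Return every entity cataloged with the concept, whatever its source."""
    return [e for e in entities if concept in e.concepts]


entities = [
    Entity("credit_scores", "hive"),
    Entity("loan_applications", "hbase"),
    Entity("web_clicks", "hive"),
]

# Catalog two technically unrelated assets under the same business concept.
catalog(entities[0], "credit")
catalog(entities[1], "credit")

# A credit-related policy would then apply to exactly this set.
governed = entities_for(entities, "credit")
print([(e.name, e.source) for e in governed])
```

Selecting by concept rather than by source is what would let a single policy cover both the Hive and the HBase asset in this sketch.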

Atlas provides two ways of cataloging metadata: Traits and Business Taxonomy. Loosely speaking, Traits are a free-form way of cataloging or annotating metadata (think of how tags are added to documents in a document management system). Business Taxonomy, by contrast, is a more clearly defined and controlled vocabulary whose terms have specific meanings in a domain and are uniformly understood within a given context.


In the Atlas UI and elsewhere, traits are sometimes referred to as "tags". This document uses the term "traits", as that is the terminology used in the Atlas API.
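The difference between free-form traits and a controlled Business Taxonomy can be sketched as follows. This is a conceptual illustration only, not Atlas code: the `TAXONOMY` dictionary, its term names, the parent/child hierarchy, and the helper functions are all assumptions made for the example.

```python
# Hypothetical controlled vocabulary: term -> parent term (None for a root).
TAXONOMY = {
    "Finance": None,
    "Finance.Credit": "Finance",
    "Finance.Credit.Scoring": "Finance.Credit",
}


def add_trait(annotations, label):
    """Free-form annotation: any label is accepted."""
    annotations.setdefault("traits", set()).add(label)


def add_term(annotations, term):
    """Controlled annotation: the term must already exist in the vocabulary."""
    if term not in TAXONOMY:
        raise ValueError(f"unknown taxonomy term: {term}")
    annotations.setdefault("terms", set()).add(term)


def ancestors(term):
    """Walk up the assumed hierarchy from a term to its root."""
    chain = []
    parent = TAXONOMY[term]
    while parent is not None:
        chain.append(parent)
        parent = TAXONOMY[parent]
    return chain


table = {}
add_trait(table, "needs-review")            # accepted: traits are free-form
add_term(table, "Finance.Credit.Scoring")   # accepted: a defined term
```

In this sketch, calling `add_term(table, "Credit-ish")` would raise a `ValueError`, whereas `add_trait` accepts any label. That contrast is the point: a controlled vocabulary trades flexibility for terms that mean the same thing to everyone who uses them.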