Data Governance

Atlas Attributes

We have noted that attributes are defined inside composite metatypes such as Class and Struct, and that attributes have a name and a metatype value. However, attributes in Atlas have additional properties that express further type-system concepts, such as constraints, indexing, and cardinality.

An attribute has the following properties:

name: string,
typeName: string,
constraintDefs: list<AtlasConstraintDef>,
isIndexable: boolean,
isUnique: boolean,
isOptional: boolean,
cardinality: enum
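As a rough illustration, the property list above can be mirrored in Python as plain data classes that parse the JSON shape used by the attribute definitions in this section. The names AttributeDef, ConstraintDef, Cardinality, and parse_attribute_def are hypothetical and are not part of the Atlas API, and the defaults applied to missing keys are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Cardinality(Enum):
    """Possible values of the cardinality property."""
    SINGLE = "SINGLE"
    LIST = "LIST"
    SET = "SET"


@dataclass
class ConstraintDef:
    """Hypothetical stand-in for one entry of constraintDefs."""
    constraint_type: str
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AttributeDef:
    """Hypothetical stand-in for an Atlas attribute definition."""
    name: str
    type_name: str
    constraint_defs: List[ConstraintDef] = field(default_factory=list)
    # The defaults below are assumptions for this sketch, not Atlas's own.
    is_indexable: bool = False
    is_unique: bool = False
    is_optional: bool = True
    cardinality: Cardinality = Cardinality.SINGLE


def parse_attribute_def(raw: Dict[str, Any]) -> AttributeDef:
    """Build an AttributeDef from a JSON-style dict; missing keys use the defaults."""
    return AttributeDef(
        name=raw["name"],
        type_name=raw["typeName"],
        constraint_defs=[
            ConstraintDef(c["type"], c.get("params", {}))
            for c in raw.get("constraintDefs", [])
        ],
        is_indexable=raw.get("isIndexable", False),
        is_unique=raw.get("isUnique", False),
        is_optional=raw.get("isOptional", True),
        cardinality=Cardinality(raw.get("cardinality", "SINGLE")),
    )


# Usage: the "table" attribute of hive_column from this section.
table_attr = parse_attribute_def({
    "name": "table",
    "typeName": "hive_table",
    "cardinality": "SINGLE",
    "constraintDefs": [{"type": "foreignKey", "params": {"onDelete": "cascade"}}],
    "isIndexable": False,
    "isOptional": True,
    "isUnique": False,
})
print(table_attr.constraint_defs[0].params["onDelete"])  # cascade
```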

  • name – The attribute name.

  • typeName – The metatype name of the attribute (native, collection, or composite).

  • constraintDefs – This list captures an aspect of modeling: if we want to impose custom constraints on the attributes of a type, we specify those constraints in this field. Take the hive_table type as an example. A Hive column is a dependent attribute of a Hive table and does not have a lifecycle of its own. Therefore we can impose a constraint on hive_column entities so that whenever a hive_table entity is deleted, all hive_column entities contained in that hive_table entity are deleted as well.

    Let’s examine the attribute definitions in types hive_table and hive_column.

    The columns attribute in type hive_table:

    {
    	"name": "columns",
    	"typeName": "array<hive_column>",
    	"cardinality": "SINGLE",
    	"constraintDefs": [
    		{
    			"type": "mappedFromRef",
    			"params": {
    				"refAttribute": "table"
    			}
    		}
    	],
    	"isIndexable": false,
    	"isOptional": true,
    	"isUnique": false
    }

    And the corresponding table attribute in type hive_column:

    {
    	"name": "table",
    	"typeName": "hive_table",
    	"cardinality": "SINGLE",
    	"constraintDefs": [
    		{
    			"type": "foreignKey",
    			"params": {
    				"onDelete": "cascade"
    			}
    		}
    	],
    	"isIndexable": false,
    	"isOptional": true,
    	"isUnique": false
    }

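    "type": "foreignKey" indicates that column entities are tied to a particular hive_table entity. The action "onDelete": "cascade" specifies that when a hive_table entity is deleted, all of its hive_column entities are deleted as well. On the hive_table side, the mappedFromRef constraint with "refAttribute": "table" indicates that the columns attribute is derived from the table attribute of each hive_column.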

  • isIndexable – This flag indicates whether this attribute should be indexed. Indexing lets look-ups that use the attribute value as a predicate be performed more efficiently.

  • isUnique

    • This flag is also related to indexing. If an attribute is specified as unique, a special index is created for the attribute in Titan that allows for equality-based lookups.

    • Any attribute with a true value for this flag is treated as a primary key that distinguishes the entity from other entities. Therefore, care should be taken to ensure that this attribute models a property that is unique in the real world.

    For example, consider the name attribute of a hive_table. In isolation, a name is not a unique attribute for a hive_table, because tables with the same name can exist in multiple databases. Even the pair (database name, table name) is not unique if Atlas is storing metadata of Hive tables from multiple clusters. Only the combination of cluster location, database name, and table name can be deemed unique in the physical world.

  • isOptional – Indicates whether a value is optional or required.

  • cardinality – Indicates whether this attribute is a singleton or could be multi-valued. Possible values are SINGLE, LIST and SET.
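The constraint and uniqueness rules described in the list above can be modeled with a toy in-memory store. This is only a sketch of the semantics (cascade delete via foreignKey, columns derived via mappedFromRef, and a unique composite key of cluster, database, and table); it is not how Atlas implements them, since Atlas persists entities in a graph. The HiveStore class, its methods, and the tuple key are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical composite key (cluster, database, table): neither the table
# name alone nor (database, table) is unique across clusters.
TableKey = Tuple[str, str, str]


@dataclass
class Column:
    name: str
    table: TableKey  # plays the role of hive_column.table (the foreignKey side)


class HiveStore:
    """Toy store modeling uniqueness, mappedFromRef, and cascade delete."""

    def __init__(self) -> None:
        self.tables: Dict[TableKey, dict] = {}
        self.columns: List[Column] = []

    def add_table(self, cluster: str, db: str, name: str) -> TableKey:
        key = (cluster, db, name)
        if key in self.tables:
            # Unique key: a second entity with the same key is rejected.
            raise ValueError(f"duplicate table {key}")
        self.tables[key] = {"name": name}
        return key

    def add_column(self, table: TableKey, name: str) -> None:
        if table not in self.tables:
            raise KeyError(f"unknown table {table}")
        self.columns.append(Column(name, table))

    def columns_of(self, table: TableKey) -> List[str]:
        # mappedFromRef: hive_table.columns is derived from the table
        # attribute of each column rather than stored separately.
        return [c.name for c in self.columns if c.table == table]

    def delete_table(self, table: TableKey) -> None:
        # foreignKey with onDelete=cascade: deleting the table also deletes
        # every column that references it.
        del self.tables[table]
        self.columns = [c for c in self.columns if c.table != table]


store = HiveStore()
t1 = store.add_table("cl1", "sales", "orders")
t2 = store.add_table("cl2", "sales", "orders")  # same names, other cluster
store.add_column(t1, "id")
store.add_column(t1, "amount")
store.add_column(t2, "id")
store.delete_table(t1)
print(store.columns_of(t1))  # []
print(store.columns_of(t2))  # ['id']
```

Keying on the full triple is what allows the two "sales.orders" tables to coexist, while deleting one of them removes only the columns that reference it.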

With this information, let us look at the complete definition of one hive_table attribute: the "db" attribute, which represents the database to which the Hive table belongs:

    {
    	"name": "db",
    	"typeName": "hive_db",
    	"cardinality": "SINGLE",
    	"isIndexable": true,
    	"isOptional": false,
    	"isUnique": false
    }

Note the false value for isOptional: a hive_table entity cannot be submitted without a reference to its database.
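To make the effect of isOptional concrete, the following sketch rejects an entity whose attributes omit a required (non-optional) attribute. It is hypothetical: HIVE_TABLE_ATTRS, missing_required, validate_entity, and the placeholder db reference are illustrative only and do not reproduce Atlas's actual validation logic or error messages.

```python
from typing import Any, Dict, List

# Hypothetical subset of hive_table attribute definitions, keyed by name.
# Only the isOptional flag matters for this check.
HIVE_TABLE_ATTRS: Dict[str, Dict[str, Any]] = {
    "db": {"typeName": "hive_db", "isOptional": False},
    "columns": {"typeName": "array<hive_column>", "isOptional": True},
}


def missing_required(attrs: Dict[str, Any],
                     defs: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return names of non-optional attributes that are absent or None."""
    return [name for name, d in defs.items()
            if not d.get("isOptional", True) and attrs.get(name) is None]


def validate_entity(attrs: Dict[str, Any]) -> None:
    """Raise if a hive_table entity lacks any required attribute."""
    missing = missing_required(attrs, HIVE_TABLE_ATTRS)
    if missing:
        raise ValueError(f"missing required attributes: {missing}")


# Placeholder reference to a hive_db entity (hypothetical shape).
validate_entity({"db": {"name": "sales"}})  # accepted
print(missing_required({"columns": []}, HIVE_TABLE_ATTRS))  # ['db']
```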

From this description and these examples, you can see that attribute definitions can be used to specify modeling behavior (constraints, indexing, and so on) that the Atlas system then enforces.