Data Governance

Consuming Entity Changes from Atlas

For every entity that Atlas adds, updates (including association and disassociation of Classifications), or deletes, an event is raised from Atlas into the Kafka topic ATLAS_ENTITIES. Applications can consume these events and build functionality that is based on metadata changes.

An excellent example of such an application is Apache Ranger’s Tag Based policy management (http://hortonworks.com/hadoop-tutorial/tag-based-policies-atlas-ranger/).

This section describes the message formats for the events that Atlas publishes. Any standard Kafka consumer that is compatible with the version of the Kafka broker Atlas uses can consume these events.

The messages written to ATLAS_ENTITIES are referred to as EntityNotification messages in the Atlas source code. There are five types of these events.
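As a rough illustration of how a consumer might route these messages, the following Python sketch decodes one record value read from ATLAS_ENTITIES and dispatches on its operationType. The function names parse_notification and dispatch, and the handlers dictionary, are hypothetical names introduced for this sketch and are not part of Atlas. Reading records from Kafka is assumed to happen elsewhere, with whichever consumer client you use.

```python
import json

# The five operation types described in this section.
OPERATION_TYPES = {
    "ENTITY_CREATE",
    "ENTITY_UPDATE",
    "ENTITY_DELETE",
    "CLASSIFICATION_ADD",
    "CLASSIFICATION_DELETE",
}


def parse_notification(raw):
    """Decode one ATLAS_ENTITIES record value into (operation_type, entity).

    `raw` is the record value as bytes or str. Raises ValueError for an
    operation type that this sketch does not know about.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    envelope = json.loads(raw)
    message = envelope["message"]
    operation_type = message["operationType"]
    if operation_type not in OPERATION_TYPES:
        raise ValueError("unknown operationType: %s" % operation_type)
    return operation_type, message["entity"]


def dispatch(raw, handlers):
    """Call handlers[operation_type](entity) when a handler is registered.

    Returns the handler's result, or None when no handler is registered.
    """
    operation_type, entity = parse_notification(raw)
    handler = handlers.get(operation_type)
    return handler(entity) if handler else None
```

A consumer loop would call dispatch once per record, passing a dictionary that maps only the operation types it cares about to handler functions.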

ENTITY_CREATE Message

An ENTITY_CREATE message is sent when an entity is created in the Atlas metadata store. An ENTITY_CREATE message has the following format:

{
	"version": {
		"version": version_string
	},
	"message": {
		"entity": entity_definition_structure,
		"operationType": "ENTITY_CREATE",
		"Classifications": []
	}
}

The message structure is very similar to the ENTITY_CREATE message in Publishing Entity Changes to Atlas, but with the following differences:

  • version – This structure has one field version, which is of the form major.minor.revision. This has been introduced to allow Atlas to evolve message formats while still allowing compatibility with older messages. In the 0.7-incubating release, the supported version number is 1.0.0. The version number can be used by components to determine if the message is compatible with the structure they can decode.

  • message – This structure contains the details of the entity.

    • entity – This is a single entity that is created. The structure of the entity is exactly the same as the EntityDefinition structure described in Important Atlas API Datatypes. This is a critical difference from the ENTITY_CREATE message in the previous publishing section, in that notifications from Atlas always contain only one entity at a time, and not an array. The other key difference is that because these are entities created by Atlas, the IDs assigned will be the actual GUIDs of the entities.

    • operationType – The type of this message is ENTITY_CREATE.

    • Classifications – This field is empty for this operation.
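The version check mentioned above could look something like the following Python sketch. The compatibility policy used here (accept any message whose major version matches the supported one) is an assumption made for illustration; the text above only states that the version has the form major.minor.revision and that 1.0.0 is the supported version in the 0.7-incubating release. The names parse_version and is_compatible are hypothetical.

```python
# Version stated above for the 0.7-incubating release.
SUPPORTED_VERSION = (1, 0, 0)


def parse_version(version_string):
    """Split a 'major.minor.revision' string into a tuple of three ints."""
    parts = version_string.split(".")
    if len(parts) != 3:
        raise ValueError("expected major.minor.revision, got %r" % version_string)
    return tuple(int(part) for part in parts)


def is_compatible(envelope, supported=SUPPORTED_VERSION):
    """Return True when the message's major version matches `supported`.

    Assumed policy for this sketch: only the major number decides
    compatibility. Adjust this to whatever your component can decode.
    """
    received = parse_version(envelope["version"]["version"])
    return received[0] == supported[0]
```

A component would typically run this check before reading the message structure, and skip or log messages it cannot decode.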

Example :

When an hbase_table is created, hbase_column and hbase_column_family entities are also created. In the API calls and messages described so far, all of these entities were created together. However, as described above, Atlas sends a separate message for each entity. The message for one hbase_column entity is shown below.

{
	"version": {
		"version": "1.0.0"
	},
	"message": {
		"entity": {
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": {
				"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
				"id": "9027517b-1644-4f64-bf2c-7b6b49ae9ef2",
				"version": 0,
				"typeName": "hbase_column",
				"state": "ACTIVE"
			},
			"typeName": "hbase_column",
			"values": {
				"name": "cssnsi",
				"qualifiedName": "default.webtable.anchor.cssnsi@cluster1",
				"owner": "crawler",
				"type": "string"
			},
			"ClassificationNames": [],
			"Classifications": {}
		},
		"operationType": "ENTITY_CREATE",
		"Classifications": []
	}
}

ENTITY_UPDATE Message

This message is sent when an entity is updated by Atlas. The format of this message is as follows:

{
	"version": {
		"version": version_string
	},
	"message": {
		"entity": entity_definition_structure,
		"operationType": "ENTITY_UPDATE",
		"Classifications": []
	}
}

This structure is similar to the ENTITY_CREATE message described above. Note that Atlas does not currently indicate which part of the entity has changed. Instead, the entity field contains the complete definition, including unchanged attributes.

Example :

Previously in Update a Subset of Entity Attributes we updated the hbase_table to set the isEnabled flag to false. This operation results in an ENTITY_UPDATE event as shown below. The details of all of the columnFamilies, etc. are omitted for the sake of brevity.

{
	"version": {
		"version": "1.0.0"
	},
	"message": {
		"entity": {
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": {
				"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
				"id": "de9c64bd-f7fc-4b63-96fa-52879b651efe",
				"version": 0,
				"typeName": "hbase_table",
				"state": "ACTIVE"
			},
			"typeName": "hbase_table",
			"values": {
				"columnFamilies": [...],
				"name": "webtable",
				"description": "Table that stores crawled information",
				"qualifiedName": "default.webtable@cluster1",
				"isEnabled": false,
				"namespace": {...}
			},
			"ClassificationNames": [],
			"Classifications": {}
		},
		"operationType": "ENTITY_UPDATE",
		"Classifications": []
	}
}
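Because an ENTITY_UPDATE message carries the complete entity rather than a description of what changed, a consumer that needs the changed attributes has to work them out itself. One possible approach, sketched below in Python, is to keep the last seen values for each GUID and compare them with the incoming values. The EntityCache class and the changed_attributes function are hypothetical helpers for this sketch, and the in-memory dictionary stands in for whatever store a real application would use.

```python
def changed_attributes(previous_values, current_values):
    """Return {attribute: (old, new)} for every attribute that differs.

    An attribute that is missing on one side is reported with None there.
    """
    changes = {}
    for name in set(previous_values) | set(current_values):
        old = previous_values.get(name)
        new = current_values.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


class EntityCache:
    """Hypothetical in-memory cache of the last seen values, keyed by GUID."""

    def __init__(self):
        self._values_by_guid = {}

    def apply_update(self, entity):
        """Record the entity's values and return what changed since last time."""
        guid = entity["id"]["id"]
        previous = self._values_by_guid.get(guid, {})
        current = entity["values"]
        self._values_by_guid[guid] = dict(current)
        return changed_attributes(previous, current)
```

The first message seen for a GUID reports every attribute as changed, since the cache has no earlier state to compare against.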

ENTITY_DELETE Message

This message is sent when an entity is deleted by Atlas. This message has the following structure:

{
	"version": {
		"version": version_string
	},
	"message": {
		"entity": {
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": id_structure containing GUID of the deleted entity,
			"typeName": typeName,
			"values": empty_map,
			"ClassificationNames": empty_list,
			"Classifications": empty_map
		},
		"operationType": "ENTITY_DELETE",
		"Classifications": []
	}
}

The message structure is similar to the ENTITY_CREATE and ENTITY_UPDATE messages above. The key difference is that the values attribute does not contain any data.

You should also note that the deletion of an entity can result in multiple ENTITY_DELETE messages. This is because when an entity is deleted, any entities referred to in composite attributes of the entity are deleted as well, and these also trigger individual messages.

Example :

In the following example, when the hbase_table is deleted, the entities referred to in its composite attribute columnFamilies are deleted as well. This example shows the ENTITY_DELETE message for one such hbase_column_family entity.

{
	"version": {
		"version": "1.0.0"
	},
	"message": {
		"entity": {
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": {
				"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
				"id": "eef88491-8333-4538-8e39-5af1f56de9c5",
				"version": 0,
				"typeName": "hbase_column_family",
				"state": "ACTIVE"
			},
			"typeName": "hbase_column_family",
			"values": {},
			"ClassificationNames": [],
			"Classifications": {}
		},
		"operationType": "ENTITY_DELETE",
		"Classifications": []
	}
}
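Since the values map is empty and each deleted entity arrives in its own message, a consumer can only identify the deleted entity by the GUID in the id structure. The Python sketch below shows one way to apply such messages to a local store keyed by GUID. The apply_delete function and the store dictionary are hypothetical and exist only for this illustration.

```python
def apply_delete(store, entity):
    """Remove the deleted entity from `store`, a dict keyed by GUID.

    Returns True if the GUID was present and has been removed, False
    otherwise. A missing GUID is not treated as an error, because the
    consumer may never have seen the entity being deleted.
    """
    guid = entity["id"]["id"]
    if guid in store:
        del store[guid]
        return True
    return False
```

Deleting a table would then result in one call per ENTITY_DELETE message: one for the table itself and one for each entity removed along with it.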

CLASSIFICATION_ADD Message

This message is sent when a Classification instance is associated with an entity. The format of this message is as follows:

{
	"version": {
		"version": version_string
	},
	"message": {
		"entity": entity_definition_structure,
		"operationType": "CLASSIFICATION_ADD",
		"Classifications": [{
			"typeName": Classification_name,
			"values": {
				Classification_attribute: value,
				...
			}
		}]
	}
}

The message structure is similar to an ENTITY_CREATE message.

  • entity – Contains the entity definition to which the Classification instance is added.

  • Classifications – An array containing information about the associated Classifications. Each element is a structure with the following fields:

    • typeName – The name of the Classification being added.

    • values – A map whose keys are the attributes defined in the Classification definition, and whose values are the values set in the Classification instance that is associated with the entity.

Example :

When the Retainable Classification is associated with an hbase_column_family, the following CLASSIFICATION_ADD message is generated:

{
	"version": {
		"version": "1.0.0"
	},
	"message": {
		"entity": {
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": {
				"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
				"id": "7d4575d1-97f9-4f70-aa15-d7c3aaba3352",
				"version": 0,
				"typeName": "hbase_column_family",
				"state": "ACTIVE"
			},
			"typeName": "hbase_column_family",
			"values": {
				"name": "contents",
				"inMemory": false,
				"description": "The contents column family that stores the crawled content",
				"versions": 1,
				"compression": "lzo",
				"blockSize": 1024,
				"qualifiedName": "default.webtable.contents@cluster2",
				"columns": [{
					"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
					"id": "340d841a-8682-4ff9-9b8d-797a7aa387e2",
					"version": 0,
					"typeName": "hbase_column",
					"state": "ACTIVE"
				}],
				"owner": "crawler"
			},
			"ClassificationNames": ["Retainable"],
			"Classifications": {
				"Retainable": {
					"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
					"typeName": "Retainable",
					"values": {
						"retentionPeriod": 100
					}
				}
			}
		},
		"operationType": "CLASSIFICATION_ADD",
		"Classifications": [{
			"typeName": "Retainable",
			"values": {
				"retentionPeriod": 100
			}
		}]
	}
}

Note that the Classifications attribute contains the Classification instance details.
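A consumer that reacts to Classifications, for example to apply tag-based policies, mostly needs the Classification names and their attribute values. The Python sketch below pulls them out of the message structure of a CLASSIFICATION_ADD message, using the field names shown in the format above. The function name added_classifications is hypothetical.

```python
def added_classifications(message):
    """Map each Classification name in `message` to its attribute values.

    `message` is the "message" structure of a CLASSIFICATION_ADD event.
    A Classification without a "values" field maps to an empty dict.
    """
    return {
        classification["typeName"]: classification.get("values", {})
        for classification in message["Classifications"]
    }
```

For the example above, this would yield a single entry for Retainable carrying its retentionPeriod value.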

CLASSIFICATION_DELETE Message

This message is sent when a Classification instance is disassociated from an entity. The format of this message is as follows:

{
	"version": {
		"version": version_string
	},
	"message": {
		"entity": entity_definition_structure,
		"operationType": "CLASSIFICATION_DELETE",
		"Classifications": []
	}
}

The message structure is similar to an ENTITY_CREATE message.

  • entity – Contains the entity definition from which the Classification instance is disassociated.
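In the format shown above, the Classifications array is empty, so the message does not name the Classification that was removed. One possible way for a consumer to work this out, sketched below in Python, is to compare the Classification names it previously recorded for the entity with the ClassificationNames list in the incoming entity definition. This relies on two assumptions that are not stated in this section: that ClassificationNames lists the Classifications still associated with the entity after the removal, and that the consumer has kept the earlier names itself. The function name removed_classifications is hypothetical.

```python
def removed_classifications(previous_names, entity):
    """Return the names in `previous_names` that `entity` no longer lists.

    Assumes (not confirmed by the message format description) that
    entity["ClassificationNames"] holds the Classifications remaining
    after the disassociation. The result is sorted for stable output.
    """
    remaining = set(entity.get("ClassificationNames", []))
    return sorted(set(previous_names) - remaining)
```

If the comparison returns an empty list, the consumer has no earlier state for the entity and cannot tell which Classification was removed from this message alone.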