Data Governance

Publishing Entity Changes to Atlas

Metadata sources can communicate the following forms of entity changes to Atlas: creation, updates, and deletions of entities. These messages are referred to as HookNotification messages in the Atlas source code. There are four types of these messages, described in the following sections. Sources publish these messages to the ATLAS_HOOK Kafka topic, and the Atlas server picks them up and processes them. Messages should be published using Kafka's String encoding. Any Kafka producer client compatible with the Kafka broker version can be used for this purpose.
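As a minimal sketch of the String encoding, assuming the notification is held as a Python dict, a producer only needs to serialize it to JSON text and encode that text as UTF-8, which is the default charset of Kafka's StringSerializer. The helper names encode_notification and decode_notification are hypothetical and not part of Atlas.

```python
import json

ATLAS_HOOK_TOPIC = "ATLAS_HOOK"  # topic that the Atlas server consumes


def encode_notification(notification: dict) -> bytes:
    """Serialize a hook message to the bytes carried by a String-encoded Kafka record.

    Kafka's StringSerializer writes UTF-8 by default, so this is JSON text
    encoded as UTF-8. Compact separators keep the payload small.
    """
    return json.dumps(notification, separators=(",", ":")).encode("utf-8")


def decode_notification(payload: bytes) -> dict:
    """Inverse of encode_notification; handy for checking what a producer sent."""
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    message = {
        "version": {"version": "1.0.0"},
        "message": {
            "typeName": "hbase_table",
            "attribute": "qualifiedName",
            "attributeValue": "default.webtable@cluster3",
            "type": "ENTITY_DELETE",
            "user": "integ_user",
        },
    }
    payload = encode_notification(message)
    assert decode_notification(payload) == message
    print(f"{len(payload)} bytes ready for topic {ATLAS_HOOK_TOPIC}")
```

With a Python client such as kafka-python, the payload could then be sent with producer.send(ATLAS_HOOK_TOPIC, value=payload). Any compatible producer works the same way, because Atlas only sees the encoded string.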

ENTITY_CREATE Message

ENTITY_CREATE notification messages are used to add one or more entities to Atlas. An ENTITY_CREATE message has the following format:

{
	"version": {
		"version": version_string
	},
	"message": {
		"entities": [array of entity_definition_structure],
		"type": "ENTITY_CREATE",
		"user": user_name
	}
}

Attribute Definitions:

  • version – This structure has a single field, version, of the form major.minor.revision. It was introduced to allow Atlas to evolve message formats while remaining compatible with older messages. In the 0.7-incubating release, the supported version number is 1.0.0.

  • message – This structure contains the details of the message.

    • entities – This is an array of entities that must be added to Atlas. Each element in the array is an EntityDefinition structure that is defined in Important Atlas API Datatypes.

    • type – The type of this message is ENTITY_CREATE.

    • user – This is the name of the user on whose behalf the entity is being added. Typically it will be the service through which metadata is generated.
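Because every hook message shares the same version and message envelope, a producer can build the envelope once and reuse it for each message type. The following is a sketch under that assumption; the function names hook_notification and entity_create are hypothetical, and SUPPORTED_VERSION reflects the 1.0.0 version stated above.

```python
from typing import Any, Dict, List

SUPPORTED_VERSION = "1.0.0"  # format version supported in 0.7-incubating


def hook_notification(msg_type: str, user: str, **fields: Any) -> Dict[str, Any]:
    """Wrap message-specific fields in the version/message envelope."""
    message: Dict[str, Any] = dict(fields)
    message["type"] = msg_type
    message["user"] = user  # the service on whose behalf the change is made
    return {"version": {"version": SUPPORTED_VERSION}, "message": message}


def entity_create(entities: List[Dict[str, Any]], user: str) -> Dict[str, Any]:
    """Build an ENTITY_CREATE message for one or more entity definitions."""
    if not entities:
        raise ValueError("ENTITY_CREATE must carry at least one entity")
    return hook_notification("ENTITY_CREATE", user, entities=list(entities))


if __name__ == "__main__":
    msg = entity_create([{"typeName": "hbase_namespace"}], user="integ_user")
    print(msg["message"]["type"], len(msg["message"]["entities"]))
```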

Example:

The following example adds an hbase_namespace entity to Atlas. Note that the entities array has a single element, and that element's structure matches the entity definition of an hbase_namespace entity.

{
	"version": {
		"version": "1.0.0"
	},
	"message": {
		"entities": [{
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": {
				"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
				"id": "-1467290565135246000",
				"version": 0,
				"typeName": "hbase_namespace",
				"state": "ACTIVE"
			},
			"typeName": "hbase_namespace",
			"values": {
				"qualifiedName": "default@cluster3",
				"owner": "hbase_admin",
				"description": "Default HBase namespace",
				"name": "default"
			},
			"traitNames": [],
			"traits": {}
		}],
		"type": "ENTITY_CREATE",
		"user": "integ_user"
	}
}
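The entity definition in this example can be generated rather than written by hand. This sketch assumes that any negative ID that is unique within a message is acceptable as a placeholder (the counter-based scheme below is an illustration, not how Atlas or its hooks generate IDs), and that the classification fields are serialized as traitNames and traits, the field names used by the Atlas 0.x InstanceSerialization format. The helpers temp_id, make_id, and make_reference are hypothetical.

```python
import itertools
from typing import Any, Dict, Optional

REF_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference"
ID_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"

_next_placeholder = itertools.count(1)


def temp_id() -> str:
    """Return a fresh negative placeholder ID (assumed scheme; Atlas assigns the GUID)."""
    return str(-next(_next_placeholder))


def make_id(type_name: str, id_value: str) -> Dict[str, Any]:
    """Build the _Id structure used in entity definitions and in references."""
    return {"jsonClass": ID_CLASS, "id": id_value, "version": 0,
            "typeName": type_name, "state": "ACTIVE"}


def make_reference(type_name: str, values: Dict[str, Any],
                   id_value: Optional[str] = None) -> Dict[str, Any]:
    """Build an entity definition shaped like the hbase_namespace example."""
    return {
        "jsonClass": REF_CLASS,
        "id": make_id(type_name, id_value if id_value is not None else temp_id()),
        "typeName": type_name,
        "values": dict(values),
        # Classification fields; key names assumed from the Atlas 0.x format.
        "traitNames": [],
        "traits": {},
    }


if __name__ == "__main__":
    namespace = make_reference("hbase_namespace", {
        "qualifiedName": "default@cluster3",
        "owner": "hbase_admin",
        "description": "Default HBase namespace",
        "name": "default",
    })
    print(namespace["id"]["id"], namespace["values"]["qualifiedName"])
```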

ENTITY_FULL_UPDATE Message

There is one important difference between the API and the Messaging modes of communication. The API uses two-way communication that allows Atlas to communicate results back to the caller, while Messaging communication is one-way, and there is no notification from Atlas to the system generating the messages.

Consider how hbase_table entities are added using the API. When referring to the hbase_namespace that a table belongs to, we could use the GUID of the previously added hbase_namespace entity, retrieved either from the value returned by the create request or by looking it up with a query. Neither of these synchronous, two-way approaches is available in the messaging mode. While it is still possible to make API calls, doing so defeats the purpose of decoupling the metadata sources from Atlas.

To address this situation, Atlas provides the ENTITY_FULL_UPDATE message, in which you give an entity definition in full but mark it as an update request. Atlas uses the entity's unique attribute to check whether the entity already exists in the metadata store. If it does, the entity's attributes are updated with the values from the request; otherwise, the entity is created.

Thus, to add an hbase_table entity and refer to an hbase_namespace entity in one of the attributes, you do not need to fetch the GUID using the API. You can simply include all of these entity definitions in an ENTITY_FULL_UPDATE message and Atlas handles this automatically.
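The create-or-update behavior described above can be modeled with a toy in-memory store. This illustrates the semantics only and is not Atlas's implementation: it assumes qualifiedName is the unique attribute of both HBase types, and it merges request values into an existing entity. ToyMetadataStore and UNIQUE_ATTRIBUTE are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping; in Atlas the unique attribute comes from the type definitions.
UNIQUE_ATTRIBUTE = {"hbase_namespace": "qualifiedName", "hbase_table": "qualifiedName"}


class ToyMetadataStore:
    """In-memory illustration of ENTITY_FULL_UPDATE semantics (not Atlas code)."""

    def __init__(self) -> None:
        self._entities: Dict[Tuple[str, Any], Dict[str, Any]] = {}

    def full_update(self, entities: List[Dict[str, Any]]) -> List[str]:
        """Create or update each entity, keyed by its type's unique attribute."""
        outcomes: List[str] = []
        for entity in entities:
            type_name = entity["typeName"]
            key = (type_name, entity["values"][UNIQUE_ATTRIBUTE[type_name]])
            if key in self._entities:
                self._entities[key].update(entity["values"])
                outcomes.append("updated")
            else:
                self._entities[key] = dict(entity["values"])
                outcomes.append("created")
        return outcomes

    def get(self, type_name: str, unique_value: Any) -> Dict[str, Any]:
        return self._entities[(type_name, unique_value)]


if __name__ == "__main__":
    store = ToyMetadataStore()
    ns = {"typeName": "hbase_namespace",
          "values": {"qualifiedName": "default@cluster3", "owner": "hbase_admin"}}
    print(store.full_update([ns]))  # first message creates the namespace
    print(store.full_update([ns]))  # resending the same definition updates it
```

The point of the model is that the producer never needs a GUID: the unique attribute alone decides between creating and updating.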

The structure of an ENTITY_FULL_UPDATE message is as follows:

{
	"version": {
		"version": version_string
	},
	"message": {
		"entities": [array of entity_definition_structure],
		"type": "ENTITY_FULL_UPDATE",
		"user": user_name
	}
}

This structure is identical to the ENTITY_CREATE structure, except that the type is ENTITY_FULL_UPDATE.

Example:

In the following example we create an hbase_table entity along with hbase_column_family and hbase_column entities. To refer to the namespace, we include the hbase_namespace entity again at the beginning of the entities array. The structure is shown below, but the details of the columns, column families, and so on are omitted for brevity; they follow the same structure as described in the Atlas Entities API.

{
	"version": {
		"version": "1.0.0"
	},
	"message": {
		"entities": [{
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": {
				"id": "-1467290566519456000",
                                               ...
			},
			"typeName": "hbase_namespace",
			"values": {
				"qualifiedName": "default@cluster3",
                                               ...
			},
			"traitNames": [],
			"traits": {}
		}, {
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": {
				"id": "-1467290566519491000",
                                               ...
			},
		}, …, {
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": {
				"id": "-1467290566519615000",
                                               ...
			},
			"typeName": "hbase_table",
			"values": {
				"qualifiedName": "default.webtable@cluster3",
                                               ...
				"namespace": {
					"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
					"id": "-1467290566519456000",
					"version": 0,
					"typeName": "hbase_namespace",
					"state": "ACTIVE"
				}
			},
			"traitNames": [],
			"traits": {}
		}],
		"type": "ENTITY_FULL_UPDATE",
		"user": "integ_user"
	}
}

  • Note that the ID of the namespace entity (first in the array) is set to a negative number rather than the real GUID, even though the entity might already exist in Atlas.

  • Note also that the namespace attribute of the table entity refers to the namespace by the same negative ID (-1467290566519456000).

  • Atlas uses "qualifiedName": "default@cluster3" to look up the namespace entity for updating, because qualifiedName is defined as the unique attribute of the hbase_namespace type.
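The pattern in these notes, a single temporary ID shared by the namespace entity and the table's namespace attribute, can be sketched as follows. Only the attributes shown in the example are included; column families, columns, and other attributes are omitted, as above. The helper names and the counter-based temporary IDs are assumptions for illustration, and the traitNames and traits keys assume the Atlas 0.x serialization format.

```python
import itertools
from typing import Any, Dict

REF_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference"
ID_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"
_placeholders = itertools.count(1)


def make_id(type_name: str, id_value: str) -> Dict[str, Any]:
    return {"jsonClass": ID_CLASS, "id": id_value, "version": 0,
            "typeName": type_name, "state": "ACTIVE"}


def make_reference(type_name: str, id_value: str, values: Dict[str, Any]) -> Dict[str, Any]:
    return {"jsonClass": REF_CLASS, "id": make_id(type_name, id_value),
            "typeName": type_name, "values": values,
            "traitNames": [], "traits": {}}  # key names assumed (Atlas 0.x)


def table_full_update(namespace_qn: str, table_qn: str, user: str) -> Dict[str, Any]:
    """Build an ENTITY_FULL_UPDATE carrying a namespace and a table that refers to it."""
    ns_id = str(-next(_placeholders))     # placeholder, not the real GUID
    table_id = str(-next(_placeholders))
    namespace = make_reference("hbase_namespace", ns_id, {"qualifiedName": namespace_qn})
    table = make_reference("hbase_table", table_id, {
        "qualifiedName": table_qn,
        # Same negative ID as the namespace entity earlier in the array.
        "namespace": make_id("hbase_namespace", ns_id),
    })
    return {"version": {"version": "1.0.0"},
            "message": {"entities": [namespace, table],
                        "type": "ENTITY_FULL_UPDATE", "user": user}}


if __name__ == "__main__":
    msg = table_full_update("default@cluster3", "default.webtable@cluster3", "integ_user")
    print([e["typeName"] for e in msg["message"]["entities"]])
```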

ENTITY_PARTIAL_UPDATE Message

When the entity being updated has already been added to Atlas, you can send a partial update message. This message has the following structure:

{
	"version": {
		"version": "1.0.0"
	},
	"message": {
		"typeName": type_name,
		"attribute": unique_attribute_name,
		"attributeValue": unique_attribute_value,
		"entity": {
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": {
				"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
				"id": temp_id,
				"version": 0,
				"typeName": type_name,
				"state": "ACTIVE"
			},
			"typeName": type_name,
			"values": {
				updated_attribute_name: updated_attribute_value
			},
			"traitNames": [],
			"traits": {}
		},
		"type": "ENTITY_PARTIAL_UPDATE",
		"user": user_name
	}
}

The structure is very similar to the ENTITY_CREATE and ENTITY_FULL_UPDATE messages, with the following differences:

  • typeName – The name of the type being updated.

  • attribute – The unique attribute name of the entity being updated.

  • attributeValue – The value of the unique attribute.

  • entity – This is a partial EntityDefinition structure with the following fields:

    • id – This is a typical ID structure as seen in an EntityDefinition, but the ID value can be a temporary value and not the actual GUID.

    • values – This is a map from the names of the attributes being updated to their new values.

Using the typeName, attribute and attributeValue, Atlas can locate the entity that needs to be updated. These parameters are similar to the API parameters described in Update a Subset of Entity Attributes.
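Given those three lookup parameters and the changed attributes, a partial update message can be assembled as sketched below. The function name is hypothetical, the placeholder ID defaults to an arbitrary negative value (an assumption, since the ID only needs to be temporary), and the traitNames and traits keys assume the Atlas 0.x serialization format.

```python
import json
from typing import Any, Dict

REF_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference"
ID_CLASS = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"


def entity_partial_update(type_name: str, attribute: str, attribute_value: Any,
                          updated_values: Dict[str, Any], user: str,
                          placeholder_id: str = "-1") -> Dict[str, Any]:
    """Build an ENTITY_PARTIAL_UPDATE message that changes only updated_values."""
    return {
        "version": {"version": "1.0.0"},
        "message": {
            # Atlas locates the target entity through these three fields.
            "typeName": type_name,
            "attribute": attribute,
            "attributeValue": attribute_value,
            "entity": {
                "jsonClass": REF_CLASS,
                "id": {"jsonClass": ID_CLASS, "id": placeholder_id, "version": 0,
                       "typeName": type_name, "state": "ACTIVE"},
                "typeName": type_name,
                "values": dict(updated_values),  # only the attributes being changed
                "traitNames": [],  # key names assumed from the Atlas 0.x format
                "traits": {},
            },
            "type": "ENTITY_PARTIAL_UPDATE",
            "user": user,
        },
    }


if __name__ == "__main__":
    msg = entity_partial_update("hbase_table", "qualifiedName",
                                "default.webtable@cluster3",
                                {"isEnabled": False}, user="integ_user")
    print(json.dumps(msg, indent=2))  # Python False is written as JSON false
```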

Example:

To update the hbase_table entity whose qualifiedName is default.webtable@cluster3 and set its isEnabled attribute to false, send an ENTITY_PARTIAL_UPDATE message such as the following:

{
	"version": {
		"version": "1.0.0"
	},
	"message": {
		"typeName": "hbase_table",
		"attribute": "qualifiedName",
		"entity": {
			"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
			"id": {
				"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
				"id": "-1467290566551498000",
				"version": 0,
				"typeName": "hbase_table",
				"state": "ACTIVE"
			},
			"typeName": "hbase_table",
			"values": {
				"isEnabled": false
			},
			"traitNames": [],
			"traits": {}
		},
		"attributeValue": "default.webtable@cluster3",
		"type": "ENTITY_PARTIAL_UPDATE",
		"user": "integ_user"
	}
}

ENTITY_DELETE Message

You can use ENTITY_DELETE to delete an entity. This message has the following structure:

{
	"version": {
		"version": version_string
	},
	"message": {
		"typeName": type_name,
		"attribute": unique_attribute_name,
		"attributeValue": unique_attribute_value,
		"type": "ENTITY_DELETE",
		"user": user_name
	}
}

The message structure is a subset of the ENTITY_PARTIAL_UPDATE structure.

  • typeName – The type name of the entity being deleted.

  • attribute – The name of the unique attribute of the entity's type.

  • attributeValue – The value of that unique attribute for the entity being deleted.

Note that these three attributes form the key through which Atlas can identify an entity to delete, similar to how it can reference an entity for a partial update.
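Since these three fields are all that identify the entity, a delete message needs no entity definition at all. A minimal sketch, with a hypothetical function name:

```python
from typing import Any, Dict


def entity_delete(type_name: str, attribute: str, attribute_value: str,
                  user: str) -> Dict[str, Any]:
    """Build an ENTITY_DELETE message keyed by a unique attribute."""
    return {
        "version": {"version": "1.0.0"},
        "message": {
            "typeName": type_name,
            "attribute": attribute,            # name of the unique attribute
            "attributeValue": attribute_value,  # its value for the target entity
            "type": "ENTITY_DELETE",
            "user": user,
        },
    }


if __name__ == "__main__":
    print(entity_delete("hbase_table", "qualifiedName",
                        "default.webtable@cluster3", "integ_user"))
```

Called with "hbase_table", "qualifiedName", "default.webtable@cluster3", and "integ_user", it produces the same structure as the example that follows.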

Example:

The following message can be used to delete an hbase_table entity with a qualifiedName of default.webtable@cluster3.

{
	"version": {
		"version": "1.0.0"
	},
	"message": {
		"typeName": "hbase_table",
		"attribute": "qualifiedName",
		"attributeValue": "default.webtable@cluster3",
		"type": "ENTITY_DELETE",
		"user": "integ_user"
	}
}
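Because messaging is one-way and Atlas sends nothing back to the producer, a malformed message can go unnoticed. A producer might therefore run a pre-flight check before publishing. The sketch below checks only the fields listed in this section for each message type; validate_notification and REQUIRED_FIELDS are hypothetical names, and Atlas itself may apply different or additional checks.

```python
import json
from typing import List

# Message-specific fields per type, as listed in the sections above.
REQUIRED_FIELDS = {
    "ENTITY_CREATE": ("entities",),
    "ENTITY_FULL_UPDATE": ("entities",),
    "ENTITY_PARTIAL_UPDATE": ("typeName", "attribute", "attributeValue", "entity"),
    "ENTITY_DELETE": ("typeName", "attribute", "attributeValue"),
}


def validate_notification(raw: str) -> List[str]:
    """Return the problems found in a String-encoded hook message (empty if none)."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]

    problems: List[str] = []
    version_obj = doc.get("version")
    version = version_obj.get("version") if isinstance(version_obj, dict) else None
    parts = version.split(".") if isinstance(version, str) else []
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        problems.append(f"version {version!r} is not of the form major.minor.revision")

    message = doc.get("message")
    if not isinstance(message, dict):
        problems.append("missing message object")
        return problems
    msg_type = message.get("type")
    if msg_type not in REQUIRED_FIELDS:
        problems.append(f"unknown type {msg_type!r}")
        return problems
    if "user" not in message:
        problems.append("missing user")
    for field in REQUIRED_FIELDS[msg_type]:
        if field not in message:
            problems.append(f"{msg_type} missing {field}")
    return problems


if __name__ == "__main__":
    print(validate_notification('{"version":{"version":"1.0"},"message":{"type":"ENTITY_DELETE"}}'))
```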