Add External Data Sources to Unity Catalog Lineage
My lineage shows only the Delta streaming table, but I also need to include what sits upstream of it: Kafka, fed by the IoT device, and the tractor that sends measurements from the field. How can I do that?
Source: SunnyData / Hubert Dudek
Unity Catalog was built for engineers, but as it expands to business users, we need to give them a way to document the lineage of data within the company, including our tractor.
You can now add your own lineage to Unity Catalog: business users can do it manually through the UI, and technical users can do it programmatically.
How to add external lineage manually
Before Anything: Set Permissions
First, a metastore admin needs to grant the “CREATE EXTERNAL METADATA” permission. Without it, no one can add external lineage; this applies to both the manual and the programmatic approach.
Step 1: Add external entities
We will add our external entities, Kafka and Tractor. In the catalog, we can find the button “external data”:
Step 2: Add external metadata
And in the external data section, there is an option “external metadata”:
Here, we can add a new entity and view its JSON representation. We can choose from predefined system types or select OTHER:
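The JSON representation of an entity can also be sketched in code. The snippet below is a minimal illustration, assuming the same field names as the API request body used later in this post (name, description, system_type, entity_type, columns). The ExternalEntity class and its to_payload method are hypothetical helpers for illustration, not part of the Databricks SDK.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ExternalEntity:
    """Hypothetical helper describing one external metadata object."""

    name: str
    system_type: str = "OTHER"  # a predefined system type such as KAFKA, or OTHER
    entity_type: str = ""
    description: str = ""
    columns: list[str] = field(default_factory=list)

    def to_payload(self) -> dict:
        """Return a dict with the same keys as the API request body."""
        return {
            "name": self.name,
            "description": self.description,
            "system_type": self.system_type,
            "entity_type": self.entity_type,
            "columns": list(self.columns),
        }


# The two external entities from this post.
tractor = ExternalEntity(name="tractor", entity_type="tractor", columns=["measurement"])
iot_reading = ExternalEntity(name="iot_reading", system_type="KAFKA", entity_type="IoT")

# Pretty-print the JSON form of the tractor entity.
print(json.dumps(tractor.to_payload(), indent=2))
```

Keeping the entity as a small data object makes it easy to review what will be registered before anything is sent anywhere.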
Step 3: Create relationships between external metadata
Now, from an external metadata object, we can add relationships through a simple UI. A relationship can also link one external metadata object to another:
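Each relationship is a directed edge from a source to a target. As a rough sketch, assuming the source/target body shape used by the API request later in this post, a chain of external entities can be expressed as plain data. The lineage_edges function is a hypothetical helper, and it only covers edges between external metadata objects.

```python
def lineage_edges(chain):
    """Turn an ordered list of external metadata names into edge payloads.

    Each consecutive pair (upstream, downstream) becomes one source/target edge.
    """
    edges = []
    for source, target in zip(chain, chain[1:]):
        edges.append(
            {
                "source": {"external_metadata": {"name": source}},
                "target": {"external_metadata": {"name": target}},
                "properties": {},
                "columns": [],
            }
        )
    return edges


# The tractor sends measurements that land in the Kafka entity iot_reading.
edges = lineage_edges(["tractor", "iot_reading"])
for edge in edges:
    print(
        edge["source"]["external_metadata"]["name"],
        "->",
        edge["target"]["external_metadata"]["name"],
    )
# prints: tractor -> iot_reading
```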
Finally, if we go to the table in Unity Catalog, we can see that the external lineage is included!
How to add external lineage programmatically
Although this is mainly a UI feature, it would be nice to manage it with SQL or Databricks Asset Bundles (DABs) and keep the external metadata in a YAML file. I hope that will come, but for now we can create it using the API/SDK:
# create external entity
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.api_client.do(
    method="POST",
    path="/api/2.0/lineage-tracking/external-metadata",
    body={
        "name": "tractor",
        "description": "",
        "system_type": "OTHER",
        "entity_type": "tractor",
        "columns": ["measurement"],
    },
)
"""
------ response -------
{'name': 'tractor',
 'system_type': 'OTHER',
 'entity_type': 'tractor',
 'description': '',
 'columns': ['measurement'],
 'owner': 'hubert.dudek@databrickster.com',
 'metastore_id': 'c52b2c4f-ed68-40e2-a43d-c3c787c4b7a8',
 'create_time': '2025-07-05T10:12:32.176Z',
 'created_by': 'hubert.dudek@databrickster.com',
 'update_time': '2025-07-05T10:12:32.176Z',
 'updated_by': 'hubert.dudek@databrickster.com',
 'id': '2c2c9528-b9aa-427b-9197-c3ef2834ab58',
 'securable_type': 'EXTERNAL_METADATA',
 'securable_kind': 'EXTERNAL_METADATA_STANDARD'}
"""

# create external lineage
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.api_client.do(
    method="POST",
    path="/api/2.0/lineage-tracking/external-lineage",
    body={
        "source": {"external_metadata": {"name": "tractor"}},
        "target": {"external_metadata": {"name": "iot_reading"}},
        "properties": {},
        "columns": [],
    },
)
"""
------ response -------
{'id': '04e0f0cd-7b04-e28e-0b92-f58c6294877b',
 'source': {'external_metadata': {'name': 'tractor'}},
 'target': {'external_metadata': {'name': 'iot_reading'}}}
"""

# list external lineage
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.api_client.do(
    method="GET",
    path="/api/2.0/lineage-tracking/external-lineage",
    query={
        "object_info": {"external_metadata": {"name": "tractor"}},
        "lineage_direction": "DOWNSTREAM",
    },
)
"""
------ response -------
{'external_lineage_relationships': [
    {'external_metadata_info': {'name': 'iot_reading',
                                'system_type': 'KAFKA',
                                'entity_type': 'IoT',
                                'event_time': '2025-07-05T10:14:47Z'},
     'external_lineage_info': {'id': '04e0f0cd-7b04-e28e-0b92-f58c6294877b'}}]}
"""
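Until a SQL or bundle-based option exists, one way to approximate "external metadata as a file" is to keep the definitions as data and replay them against the API. The following is only a sketch under assumptions: the definition layout (entities, edges) and the apply_definitions and dry_run functions are hypothetical, JSON stands in for YAML to stay within the standard library, and the request shapes are copied from the calls above. The sender is passed in as a callable, so the logic can be checked without a workspace.

```python
import json

# Hypothetical definition document; in practice it could live in version control.
DEFINITIONS = json.loads("""
{
  "entities": [
    {"name": "tractor", "description": "", "system_type": "OTHER",
     "entity_type": "tractor", "columns": ["measurement"]}
  ],
  "edges": [
    {"source": "tractor", "target": "iot_reading"}
  ]
}
""")

BASE = "/api/2.0/lineage-tracking"


def apply_definitions(definitions, do):
    """Issue one POST per entity, then one POST per edge, via the `do` callable."""
    responses = []
    for entity in definitions.get("entities", []):
        responses.append(do(method="POST", path=f"{BASE}/external-metadata", body=entity))
    for edge in definitions.get("edges", []):
        body = {
            "source": {"external_metadata": {"name": edge["source"]}},
            "target": {"external_metadata": {"name": edge["target"]}},
            "properties": {},
            "columns": [],
        }
        responses.append(do(method="POST", path=f"{BASE}/external-lineage", body=body))
    return responses


def dry_run(method, path, body):
    """Stand-in sender that records the request instead of calling the API."""
    return {"method": method, "path": path, "body": body}


planned = apply_definitions(DEFINITIONS, dry_run)
for request in planned:
    print(request["method"], request["path"])
```

To send the requests for real, pass WorkspaceClient().api_client.do as the do argument instead of dry_run. The sketch does not check whether an entity or relationship already exists, so it is best treated as a starting point rather than a finished deployment tool.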