Add External Data Sources to Unity Catalog Lineage

My lineage shows only the Delta streaming table, but I also need to include the Kafka stream from the IoT device and the tractor that sends measurements from the field. How can I do that?

Source: SunnyData / Hubert Dudek

Unity Catalog was built for engineers, but as it expands to business users, we need to give them a way to document the lineage of data within the company, including our tractor.

Now you can add your own lineage to Unity Catalog: manually through the UI, which suits business users, or programmatically, which suits technical users.

How to add external lineage manually

Before Anything: Set Permissions
First things first: the metastore admin needs to grant the “CREATE EXTERNAL METADATA” permission. Without it, no one can add external lineage; this applies to both the manual and the programmatic approach.

Step 1: Add external entities
We will add our external entities, Kafka and Tractor. In the catalog, there is an “external data” button.

Step 2: Add external metadata
In the external data section, there is an “external metadata” option.

Here, we can add a new entity and view its JSON representation. For the system type, we can choose one of the predefined types or select OTHER.

Step 3: Create relationships between external metadata
Now, in external metadata, we can add relationships through a simple UI. We can also add a relationship between two external metadata entities.

Finally, if we go to the table in Unity Catalog, we can see that the external lineage is included.

How to add external lineage programmatically

Although this is mainly a UI feature, it would be nice to manage it through SQL or DABs (Databricks Asset Bundles) and keep the external metadata in a YAML file. I hope that will come, but for now, we can create it using the API/SDK:

# create external entity

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

w.api_client.do(
    method="POST",
    path="/api/2.0/lineage-tracking/external-metadata",
    body={
        "name": "tractor",
        "description": "",
        "system_type": "OTHER",
        "entity_type": "tractor",
        "columns": ["measurement"]
    }
)

"""
------ response -------
{'name': 'tractor',
 'system_type': 'OTHER',
 'entity_type': 'tractor',
 'description': '',
 'columns': ['measurement'],
 'owner': 'hubert.dudek@databrickster.com',
 'metastore_id': 'c52b2c4f-ed68-40e2-a43d-c3c787c4b7a8',
 'create_time': '2025-07-05T10:12:32.176Z',
 'created_by': 'hubert.dudek@databrickster.com',
 'update_time': '2025-07-05T10:12:32.176Z',
 'updated_by': 'hubert.dudek@databrickster.com',
 'id': '2c2c9528-b9aa-427b-9197-c3ef2834ab58',
 'securable_type': 'EXTERNAL_METADATA',
 'securable_kind': 'EXTERNAL_METADATA_STANDARD'}
 """
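The lineage call that follows points at an `iot_reading` entity, so that entity has to exist before the relationship is created (for example, added earlier through the UI). Here is a minimal sketch of registering it with the same endpoint. The `KAFKA` system type and `IoT` entity type are taken from the list response at the end of this post; the helper names `build_external_metadata` and `register_external_metadata` are hypothetical, and the client is anything with the same `do()` method as `w.api_client`.

```python
from typing import Any, Optional

EXTERNAL_METADATA_PATH = "/api/2.0/lineage-tracking/external-metadata"


def build_external_metadata(
    name: str,
    system_type: str,
    entity_type: str,
    columns: Optional[list] = None,
    description: str = "",
) -> dict:
    """Build the request body for one external metadata entity."""
    body = {
        "name": name,
        "description": description,
        "system_type": system_type,
        "entity_type": entity_type,
    }
    # Only send columns when we actually know them.
    if columns:
        body["columns"] = list(columns)
    return body


def register_external_metadata(api_client: Any, payload: dict) -> Any:
    """POST the payload using the same do() call as the snippet above."""
    return api_client.do(method="POST", path=EXTERNAL_METADATA_PATH, body=payload)


# The Kafka entity referenced by the lineage relationship.
iot_reading = build_external_metadata("iot_reading", "KAFKA", "IoT")
```

With the SDK installed, this would be called as `register_external_metadata(WorkspaceClient().api_client, iot_reading)`.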
 
# create external lineage

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

w.api_client.do(
    method="POST",
    path="/api/2.0/lineage-tracking/external-lineage",
    body={
        "source": {"external_metadata": {"name": "tractor"}},
        "target": {"external_metadata": {"name": "iot_reading"}},
        "properties": {},
        "columns": [],
    },
)
"""
------ response -------
{'id': '04e0f0cd-7b04-e28e-0b92-f58c6294877b',
 'source': {'external_metadata': {'name': 'tractor'}},
 'target': {'external_metadata': {'name': 'iot_reading'}}}
 """
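The call above links two external entities. To reach the goal from the beginning of this post, the Kafka entity also needs a relationship to the Delta streaming table. Here is a sketch of how that request body might be built. Please treat the `{"table": {"name": ...}}` object as an assumption that is not shown in the responses here and should be checked against the API reference; the table name `main.iot.field_measurements` and the helper names are hypothetical.

```python
def external(name: str) -> dict:
    """Reference an external metadata entity by name (shape used above)."""
    return {"external_metadata": {"name": name}}


def table(full_name: str) -> dict:
    """Reference a Unity Catalog table by its three-level name.

    ASSUMPTION: the lineage API accepts a {"table": {"name": ...}} object.
    """
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected a name like catalog.schema.table")
    return {"table": {"name": full_name}}


def lineage_body(source: dict, target: dict) -> dict:
    """Build a request body with the same keys as the call above."""
    return {"source": source, "target": target, "properties": {}, "columns": []}


# Hypothetical table name; replace it with your own streaming table.
body = lineage_body(external("iot_reading"), table("main.iot.field_measurements"))

# With the SDK, the body would be sent exactly like the call above:
# w.api_client.do(method="POST",
#                 path="/api/2.0/lineage-tracking/external-lineage",
#                 body=body)
```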
 
# list external lineage

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

w.api_client.do(
    method="GET",
    path="/api/2.0/lineage-tracking/external-lineage",
    query={
        "object_info": {"external_metadata": {"name": "tractor"}},
        "lineage_direction": "DOWNSTREAM"}
)
"""
------ response -------
{'external_lineage_relationships': [{'external_metadata_info': {'name': 'iot_reading',
    'system_type': 'KAFKA',
    'entity_type': 'IoT',
    'event_time': '2025-07-05T10:14:47Z'},
   'external_lineage_info': {'id': '04e0f0cd-7b04-e28e-0b92-f58c6294877b'}}]}
"""
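The list response is nested, so a small helper makes it easier to read. This is a minimal sketch that assumes the response shape shown above; the function name `downstream_entities` is hypothetical, and relationships that point at something other than external metadata are simply skipped.

```python
def downstream_entities(response: dict) -> list:
    """Return (name, system_type, lineage_id) for each external entity."""
    result = []
    for rel in response.get("external_lineage_relationships", []):
        info = rel.get("external_metadata_info")
        if info is None:
            # Not an external metadata entity; skipped in this sketch.
            continue
        lineage_id = rel.get("external_lineage_info", {}).get("id")
        result.append((info.get("name"), info.get("system_type"), lineage_id))
    return result


# The response from the call above, copied as-is.
sample = {
    "external_lineage_relationships": [
        {
            "external_metadata_info": {
                "name": "iot_reading",
                "system_type": "KAFKA",
                "entity_type": "IoT",
                "event_time": "2025-07-05T10:14:47Z",
            },
            "external_lineage_info": {"id": "04e0f0cd-7b04-e28e-0b92-f58c6294877b"},
        }
    ]
}

print(downstream_entities(sample))
# → [('iot_reading', 'KAFKA', '04e0f0cd-7b04-e28e-0b92-f58c6294877b')]
```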
Hubert Dudek

Databricks MVP | Advisor to Databricks Product Board and Technical advisor to SunnyData

https://www.linkedin.com/in/hubertdudek/