Recursive CTE: The beauty of SQL Self-Referencing Queries

Self-Referencing Queries (Loops in SQL): Recursive CTEs enable a SQL query to refer to itself (similar to a loop in SQL), building results incrementally until a specified condition is met. This means you can iteratively solve problems (such as finding connected data and calculating running values) all within a single query, eliminating the need for manual looping.

Generate Sequences & Patterns:

They enable iterative logic in SQL for tasks like generating sequences or cumulative patterns. For example, a recursive query can produce incremental strings: "a", "ab", "abc", etc., up to "abcdefghij" by repeatedly adding letters, just taking the previous string from itself and adding another:

Source: SunnyData / Hubert Dudek

Similarly, you could generate numeric sequences or other patterns entirely in SQL, which was challenging to do without writing custom code.

WITH RECURSIVE recursive_cte(col) MAX RECURSION LEVEL 10 -- recursive limit, kind of timeout as if more than 10 it will fail
AS (
  SELECT
    'a'
  UNION ALL
    -- get the previous result of the recursive CTE and add another character
  SELECT
    col || CHAR(ASCII(SUBSTR(col, -1)) + 1)
  FROM
    recursive_cte -- it is recursive cte, not a table, it is self-referencing
  WHERE
    LENGTH(col) < 10
)
SELECT
  col
FROM
  recursive_cte;

Traverse Graphs and Networks:

Recursive CTEs are perfect for exploring graph-like relationships such as routes, links, or dependencies. For instance, starting from "New York", a recursive CTE can find all reachable cities (direct or through other cities) and even build the full travel path – e.g., New York → Washington → Raleigh. This is an example from the Databricks documentation, which I found amazing. So we have a table like this:

-- Example data ---------------------------------------------------------------
CREATE OR REPLACE TEMPORARY VIEW routes (origin, destination) AS VALUES
  ('New York', 'Washington'),
  ('New York', 'Boston'),
  ('Boston',   'New York'),
  ('Washington','Boston'),
  ('Washington','Raleigh');

Source: SunnyData / Hubert Dudek

Which, in fact, represents that hierarchy (and we consider that we can not go back to the same city):

Source: SunnyData / Hubert Dudek

So, we need to take another hop to calculate the route and exclude already visited cities:

Source: SunnyData / Hubert Dudek

-- Recursive CTE --------------------------------------------------------------
WITH RECURSIVE recursive_cte AS (
  /* anchor = New York = starting city */
  SELECT
    'New York' AS current_city,
    ARRAY('New York') AS path,
    0 AS hops -- start 0 hops
  UNION ALL
    /* step = follow every outgoing edge that does not revisit a city */
  SELECT
    recursive_cte.destination AS current_city,
    concat(recursive_cte.path, array(r.destination)),
    recursive_cte.hops + 1
  FROM
    routes r
    JOIN recursive_cte  -- it is recursive cte not a table, it is self-referencing
        ON r.origin = recursive_cte.current_city -- we join with next city which we can reach from city
    AND NOT array_contains(recursive_cte.path, r.destination) -- we exclude already visited cities
)
SELECT
  *
FROM
  recursive_cte
ORDER BY
  hops,
  current_city;

Handle Hierarchies with Ease:

You can effortlessly query hierarchical or tree-structured data (like organizational charts or category trees) without complicated, multi-join queries. For example, starting from a department "A", a recursive query can retrieve "A" and all its sub-departments (B, C, D, F) in a single attempt. This naturally reflects the hierarchy in the results. In that example, we will calculate how far the department is from the executives of the company:

Source: SunnyData / Hubert Dudek

CREATE OR REPLACE TEMPORARY VIEW department_hierarchy (department, parent) AS
VALUES
  ('Company', NULL),
  ('Sales', 'Company'),
  ('North America Sales', 'Sales'),
  ('California Sales', 'North America Sales'),
  ('Florida Sales', 'North America Sales'),
  ('EMEA Sales', 'Sales'),
  ('France Sales', 'EMEA Sales');

-- Recursive CTE --------------------------------------------------------------
WITH RECURSIVE recursive_cte AS (
  -- anchor row(s): the department we start from
  SELECT
    1 AS level,
    department,
    parent
  FROM
    department_hierarchy
  WHERE
    department = 'Sales'
  UNION ALL
    -- recursive step: get direct children of the previous level
  SELECT
    recursive_cte.level + 1 AS level,
    d.department,
    d.parent
  FROM
    department_hierarchy d
    JOIN recursive_cte -- it is recursive cte not table, it is self-referencing
      ON d.parent = sd.department
)
SELECT
  level,
  department,
  parent
FROM
  recursive_cte
ORDER BY
  level,
  department;

Summary

Recursive CTEs enable a query to repeatedly reference its own growing result set, row by row, while still joining to any other tables at each step.

And here is the official current syntax for CTE available from runtime 17 and SQL warehouse 2025.20:

WITH [ RECURSIVE ] common_table_expression [, ...]

common_table_expression
  view_identifier [ ( column_identifier [, ...] ) ] [ recursion_limit ] [ AS ] ( query | recursive_query )

recursion_limit
  MAX RECURSION LEVEL maxLevel

recursive_query
  base_case_query UNION ALL step_case_query
Hubert Dudek

Databricks MVP | Advisor to Databricks Product Board and Technical advisor to SunnyData

https://www.linkedin.com/in/hubertdudek/
Next
Next

Managing Data Changes with SCDs in Databricks