New Databricks INSERT Features: INSERT REPLACE ON and INSERT REPLACE USING

Databricks

Aug 15

Databricks SQL continues to evolve with powerful new capabilities. Two exciting additions to the INSERT command are now available: INSERT REPLACE ON for replacing records that match specific conditions, and INSERT REPLACE USING for overwriting entire physical partitions. Let's explore these features in detail.

INSERT REPLACE ON: Conditional Record Replacement

With INSERT REPLACE ON, you can specify a condition to target which rows should be replaced. The process works by first deleting all rows that match your expression (comparing source and target data), then inserting the new rows from your INSERT statement.

Use Case: Email Data Cleanup

Here's a practical example using the trim and lower functions with a condition to match and fix email addresses. We'll also use the <=> spaceship operator to safely handle NULL values, ensuring records with NULLs are properly caught and replaced.

This approach will:

Catch and replace JOE@EXAMPLE.COM
Handle NULL values (thanks to the <=> operator)
Insert jane@example.com as a new record (wasn't in the target before)
Leave patrick@example.com unchanged (doesn't match the condition)

  
    CREATE TABLE contacts(email STRING, state STRING);

INSERT INTO contacts
  VALUES
    ('JOE@EXAMPLE.COM', 'Old'),
    ('patrick@example.com', 'Old'),
    (NULL, 'Old');

CREATE TEMP VIEW contacts_fixes AS
SELECT
  *
FROM
  VALUES 
  ('joe@example.com', 'Replacement'), -- will replace JOE@EXAMPLE.COM
  ('jane@example.com', 'New'), -- will be added as new record
  (NULL, 'Replacement') -- will replace NULL using <=> operator safe for NULLs
  AS s (email, state);

INSERT INTO
  TABLE contacts AS t -- target table for replacement
  REPLACE ON lower(trim(t.email)) <=> lower(trim(s.email)) ( -- <=> spaceship operator safe for NULL, functions allowed in expression
    SELECT
      *
    FROM
      contacts_fixes -- SELECT statement for replacement data
  );

SELECT * FROM contacts;
  

The results show that contacts matching the expression were successfully replaced:

REPLACE ON deletes records matching the condition and then inserts new ones. In the previous example, no deletions occurred, so let's look at a deletion scenario.

Use Case: Event Data Management

When new events arrive, you might want to delete all older events. Here's how to accomplish this with REPLACE ON:

  
    CREATE TABLE events (date DATE, payload STRING);
CREATE TABLE new_events (date DATE, payload STRING);

INSERT INTO events VALUES 
(DATE'2025-08-10','old1'), 
(DATE'2025-08-20','old2'), 
(DATE'2025-08-30', 'KEEP'); -- marked as KEEP for demonstration - won't be deleted

INSERT INTO new_events VALUES 
(DATE'2025-08-25', 'new'); -- inserting newer data; all older records will be deleted

INSERT INTO TABLE events AS target
REPLACE ON target.date < source.date
(SELECT * FROM new_events) AS source;

SELECT * FROM events ORDER BY date;
  

The results show that old1 and old2 were deleted because they were older than the new event date:

INSERT REPLACE USING: Partition-Level Replacement

In the past, partition overwrites were often handled in problematic ways, including operations outside of Delta Lake that caused integration issues with schema and history management.

Now we have a best practice solution: INSERT REPLACE USING (field) ensures that partitions specified in the field are completely overwritten. The field values must correspond to the partitions you want to replace.

Since this performs folder-level overwrites, it applies to classic partitions, not liquid partitions.

Let's create a partitioned table to demonstrate:

  
    CREATE TABLE students (name STRING, country STRING)
PARTITIONED BY (country);

-- Insert initial data into US, UK, and IT partitions
INSERT INTO students VALUES
('Dylan', 'US'),
('Doug', 'UK'),
('Julia', 'IT');
  

You can see the partitioned structure in the storage account:

Now let's create a view with data that will replace entire partitions:

  
    -- New data to replace the UK partition and add an FR partition
CREATE OR REPLACE TEMP VIEW new_students AS SELECT * FROM VALUES
('Jennie', 'UK'), -- will replace all data in the UK partition
('Peter', 'FR') AS s(name, country); -- will create a new FR partition
  

First, let's validate which partitions will be affected:

SELECT country FROM new_students;

Now perform the dynamic partition overwrite using the country key:

  
    INSERT INTO TABLE students
REPLACE USING (country)
SELECT * FROM new_students;
  

Looking at the storage after the operation, we can see that the FR partition was created:

Examining the UK folder, we see a new parquet file that overwrites the old one. Since this is a Delta table and we maintain Delta history, the old parquet file remains alongside the new one:

Finally, let's verify the data shows the UK partition is overwritten, FR is added, and the rest remains unchanged:

Version Availability

INSERT REPLACE USING: Available in Databricks Runtime 16.3+
INSERT REPLACE ON: Available in Databricks Runtime 17.1+

These new INSERT features provide powerful, Delta-native ways to manage data updates and partition overwrites, eliminating the need for complex workarounds while maintaining data integrity and history.

#Databricksdelta lakedelta live tablesData Engineering