Managed Iceberg Tables
So you're staring at that new Iceberg icon next to managed tables in Databricks. Should you click it? Well, that depends on whether you can live without CDC and whether you're ready to embrace manifest files. Let me save you some headaches.
Source: SunnyData / Hubert Dudek
Iceberg Survival Guide
You can now see the Iceberg icon next to managed tables, which means you can create a managed table as either Delta or Iceberg. I expect the formats to unify soon, but to avoid headaches in the meantime, you need to know a few things:
There is no CDC in Iceberg
You need to understand the manifest concept
Liquid partitioning is supported, but it needs some additional table properties
Maintenance: OPTIMIZE and VACUUM in UC use the same commands as for Delta
Let's create our first managed Iceberg table.
CREATE TABLE IF NOT EXISTS hub.default.iceberg_orders (
  order_id    BIGINT,
  customer_id BIGINT,
  order_ts    TIMESTAMP,
  total_amt   DECIMAL(12,2)
) USING ICEBERG;

INSERT INTO hub.default.iceberg_orders (order_id, customer_id, order_ts, total_amt)
VALUES (1, 12345, '2025-06-29 00:00:00', 100.00);
Let's find the table in the catalog and check its location. Notice that its properties are completely different from those of a Delta table.
After inserting the example row, we can inspect the storage account and see that the structure differs from Delta's. The manifest files are shown below.
MANIFEST
metadata.json (one per table version):
Stores the schema and partition spec for that version
Pointer to the manifest-list file
Every commit can change the schema safely (amazing schema evolution)
snap-<id>.avro (manifest list):
One tiny Avro file per snapshot
Lists the manifest files that belong to this snapshot
<uuid>-m*.avro (manifest files):
Many per snapshot, but tiny (a few MB)
Each row = one data or delete file, with stats and partition values
The suffix tells the file "flavor"
m0.avro → rows added
m1.avro → position delete files (rows to ignore - deleted)
m2.avro → equality delete files (values to ignore - deleted)
The actual data, meanwhile, lives in Parquet files.
The chart below can help us understand that concept:
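As a plain-text sketch of the same idea (with illustrative file names, not the exact ones Databricks generates), the table's directory looks roughly like this:

```
iceberg_orders/
├── data/
│   └── 00000-0-<uuid>.parquet   -- actual rows (Parquet)
└── metadata/
    ├── v1.metadata.json         -- schema + pointer to the manifest list
    ├── snap-<id>-<uuid>.avro    -- manifest list for one snapshot
    └── <uuid>-m0.avro           -- manifest file (rows added)
```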
CDC
Iceberg has many benefits, including good support for schema evolution and fewer files to scan. However, its most significant disadvantage is the lack of Change Data Capture. CDC is becoming the market standard and is used, for example, to incrementally update Materialized Views. Keep this in mind, as its absence can hurt the performance of your materialized views.
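For contrast, this is the Delta-side feature you would be giving up. A minimal sketch, using a hypothetical hub.default.delta_orders table:

```sql
-- Delta only: enable the Change Data Feed (no Iceberg equivalent here)
CREATE OR REPLACE TABLE hub.default.delta_orders (
  order_id  BIGINT,
  total_amt DECIMAL(12,2)
) USING DELTA
TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

-- Read row-level changes (inserts, updates, deletes) starting from version 1
SELECT * FROM table_changes('hub.default.delta_orders', 1);
```

This row-level change feed is what incremental materialized view refreshes rely on.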
Iceberg maintenance commands
I was pleasantly surprised that instead of complicated Iceberg catalog commands like CALL system.rewrite_data_files(table => 'hub.default.ice_orders', options => map('min-input-files', '5')), we can use the standard Delta commands (better described in this scenario as UC commands) for things like OPTIMIZE and VACUUM.
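For example, on the iceberg_orders table created earlier, maintenance looks exactly as it does for Delta:

```sql
-- Compact small data files
OPTIMIZE hub.default.iceberg_orders;

-- Remove files no longer referenced by the table
VACUUM hub.default.iceberg_orders;
```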
CLUSTER BY - liquid partitioning
For Iceberg tables, you must explicitly turn off deletion vectors and row IDs when you use CLUSTER BY:
TBLPROPERTIES (
  'write.delete-vector.enabled' = 'false', -- Disable deletion vectors
  'write.row-id.enabled' = 'false'         -- Disable row IDs
)
TABLE OPTIONS
Table properties are different for Iceberg tables; the example below includes some of the most important ones, including the two mentioned above.
CREATE OR REPLACE TABLE hub.default.ice_orders (
  order_id    BIGINT,
  customer_id BIGINT,
  order_ts    TIMESTAMP,
  total_amt   DECIMAL(12,2)
) USING ICEBERG
CLUSTER BY (customer_id)
TBLPROPERTIES (
  'write.target-file-size-bytes' = '536870912',         -- Target file size for written files (512 MB)
  'write.manifest.target-file-size-bytes' = '16777216', -- Target file size for manifest files (16 MB)
  'history.expire.min-snapshots-to-keep' = '1',         -- Minimum number of snapshots to keep when expiring history
  'commit.retry.num-retries' = '3',                     -- Number of retries for commit operations
  'commit.retry.total-timeout-ms' = '60000',            -- Total timeout for commit retries (60 seconds)
  'read.split.target-size' = '134217728',               -- Target size for read splits (128 MB)
  'read.split.open-file-cost' = '4194304',              -- Cost of opening a file for read splits (4 MB)
  'write.sort.order' = 'order_id,customer_id',          -- Sort order for written data
  'metadata.previous-versions-max' = '5',               -- Maximum number of previous metadata versions to keep
  'object.tagging.enabled' = 'false',                   -- Enable or disable object tagging
  'write.delete-vector.enabled' = 'false',              -- Disable deletion vectors (required for CLUSTER BY)
  'write.row-id.enabled' = 'false'                      -- Disable row IDs (required for CLUSTER BY)
);
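You don't have to get all of these right at creation time; properties can be adjusted later with a standard ALTER TABLE. A minimal sketch:

```sql
-- Lower the target data file size to 256 MB on the existing table
ALTER TABLE hub.default.ice_orders
SET TBLPROPERTIES ('write.target-file-size-bytes' = '268435456');
```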
Bottom Line
Here's the deal: Choose Iceberg if you need bulletproof schema evolution and can live without CDC. Stick with Delta if your materialized views depend on change tracking. The manifest architecture is clever, but that CDC gap is real.