Managed Iceberg Tables
So you're staring at that new Iceberg icon next to managed tables in Databricks. Should you click it? Well, that depends on whether you can live without CDC and whether you're ready to embrace manifest files. Let me save you some headaches.
Source: SunnyData / Hubert Dudek
Iceberg Survival Guide
You can now see the Iceberg icon next to managed tables, which means you can create a managed table as either Delta or Iceberg. I expect the formats to unify soon, but to avoid headaches in the meantime, you need to know a few things:
There is no CDC in Iceberg
You need to understand the manifest concept
Liquid partitioning is supported, but it needs some additional table properties
Maintenance: OPTIMIZE and VACUUM in UC use the same commands as for Delta
Let's create our first managed Iceberg table.
CREATE TABLE IF NOT EXISTS hub.default.iceberg_orders (
  order_id    BIGINT,
  customer_id BIGINT,
  order_ts    TIMESTAMP,
  total_amt   DECIMAL(12,2)
) USING ICEBERG;

INSERT INTO hub.default.iceberg_orders (order_id, customer_id, order_ts, total_amt)
VALUES (1, 12345, '2025-06-29 00:00:00', 100.00);
Let's find the table in the catalog and check its location. Notice that its properties are completely different from those of a Delta table.
After inserting the example row, we can inspect the storage account and see that the structure differs from Delta's. The manifest files are shown below.
MANIFEST
metadata.json (one per table version):
Stores the schema and partition spec for that version
Pointer to the manifest-list file
Every commit can change the schema safely (amazing schema evolution)
snap-<id>.avro (manifest list):
One tiny Avro file per snapshot
Lists the manifest files that belong to this snapshot
<uuid>-m*.avro (manifest files):
Many per snapshot, but tiny (a few MB)
Each row = one data or delete file, with stats and partition values
The suffix tells the file "flavor"
m0.avro → rows added
m1.avro → position delete files (rows to ignore - deleted)
m2.avro → equality delete files (values to ignore - deleted)
The actual data, meanwhile, lives in Parquet files.
The chart below can help us understand that concept:
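As a plain-text sketch of the same idea (with illustrative file names, not the exact ones Databricks generates), the table's directory looks roughly like this:

```
iceberg_orders/
├── data/
│   └── 00000-0-<uuid>.parquet   -- actual rows (Parquet)
└── metadata/
    ├── v1.metadata.json         -- schema + pointer to the manifest list
    ├── snap-<id>-<uuid>.avro    -- manifest list for one snapshot
    └── <uuid>-m0.avro           -- manifest file (rows added)
```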
CDC
Iceberg has many benefits, including good support for schema evolution and fewer files to scan. However, its most significant disadvantage is the lack of Change Data Capture. CDC is becoming the market standard and is used, for example, to incrementally update Materialized Views. Keep this in mind, as its absence can hurt the performance of your materialized views.
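For contrast, this is the Delta-side feature you would be giving up. A minimal sketch, using a hypothetical hub.default.delta_orders table:

```sql
-- Delta only: enable the Change Data Feed (no Iceberg equivalent here)
CREATE OR REPLACE TABLE hub.default.delta_orders (
  order_id  BIGINT,
  total_amt DECIMAL(12,2)
) USING DELTA
TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

-- Read row-level changes (inserts, updates, deletes) starting from version 1
SELECT * FROM table_changes('hub.default.delta_orders', 1);
```

This row-level change feed is what incremental materialized view refreshes rely on.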
Iceberg maintenance commands
I was pleasantly surprised that instead of complicated Iceberg catalog commands like CALL system.rewrite_data_files(table => 'hub.default.ice_orders', options => map('min-input-files', '5')), we can use the standard Delta commands (better described in this scenario as UC commands) for things like OPTIMIZE and VACUUM.
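For example, on the iceberg_orders table created earlier, maintenance looks exactly as it does for Delta:

```sql
-- Compact small data files
OPTIMIZE hub.default.iceberg_orders;

-- Remove files no longer referenced by the table
VACUUM hub.default.iceberg_orders;
```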
CLUSTER BY - liquid partitioning
For Iceberg tables, you must explicitly turn off deletion vectors and row IDs when you use CLUSTER BY:
TBLPROPERTIES (
  'write.delete-vector.enabled' = 'false', -- Disable deletion vectors
  'write.row-id.enabled' = 'false'         -- Disable row IDs
)
TABLE OPTIONS
Table properties are different for Iceberg tables; the example below includes some of the most important ones, including the two mentioned above.
CREATE OR REPLACE TABLE hub.default.ice_orders (
  order_id    BIGINT,
  customer_id BIGINT,
  order_ts    TIMESTAMP,
  total_amt   DECIMAL(12,2)
) USING ICEBERG
CLUSTER BY (customer_id)
TBLPROPERTIES (
  'write.target-file-size-bytes' = '536870912',         -- Target file size for written files (512 MB)
  'write.manifest.target-file-size-bytes' = '16777216', -- Target file size for manifest files (16 MB)
  'history.expire.min-snapshots-to-keep' = '1',         -- Minimum number of snapshots to keep when expiring history
  'commit.retry.num-retries' = '3',                     -- Number of retries for commit operations
  'commit.retry.total-timeout-ms' = '60000',            -- Total timeout for commit retries (60 seconds)
  'read.split.target-size' = '134217728',               -- Target size for read splits (128 MB)
  'read.split.open-file-cost' = '4194304',              -- Cost of opening a file for read splits (4 MB)
  'write.sort.order' = 'order_id,customer_id',          -- Sort order for written data
  'metadata.previous-versions-max' = '5',               -- Maximum number of previous metadata versions to keep
  'object.tagging.enabled' = 'false',                   -- Enable or disable object tagging
  'write.delete-vector.enabled' = 'false',              -- Disable deletion vectors (required for CLUSTER BY)
  'write.row-id.enabled' = 'false'                      -- Disable row IDs (required for CLUSTER BY)
);
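You don't have to get all of these right at creation time; properties can be adjusted later with a standard ALTER TABLE. A minimal sketch:

```sql
-- Lower the target data file size to 256 MB on the existing table
ALTER TABLE hub.default.ice_orders
SET TBLPROPERTIES ('write.target-file-size-bytes' = '268435456');
```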
Bottom Line
Here's the deal: Choose Iceberg if you need bulletproof schema evolution and can live without CDC. Stick with Delta if your materialized views depend on change tracking. The manifest architecture is clever, but that CDC gap is real.