Resources and insights
Our Blog
Explore insights and practical tips on mastering the Databricks Data Intelligence Platform and the full spectrum of today's modern data ecosystem.
Most teams that move to Databricks get the hard part right. They migrate the processing engine, rebuild the transformation logic, and stand up Unity Catalog. Then they leave Azure Data Factory running in the background: connected to everything, owned by nobody, and quietly accumulating cost and complexity. That is the gap this entry addresses.
Explore More Content
Genie Code Analysis: Two Weeks Later
Databricks Genie Code hit a 77.1% task success rate in production data science workflows — more than double what general-purpose coding agents achieve. But that performance is entirely conditional on the quality of your Unity Catalog metadata. SunnyData's two-week evaluation breaks down what works, what doesn't, and the governance layer you need before you go live.
5 Databricks Patterns That Look Fine Until They Aren't
Five common Databricks coding patterns, including undocumented API calls, manual SparkSession instantiation, and hardcoded Spark configs, pass code review but fail silently in serverless environments or during platform migrations. For each anti-pattern, this post explains why it breaks and shows the correct native Databricks approach using Databricks Asset Bundles (DABs), the Databricks SDK, and dynamic job parameters.
Databricks Lakewatch: The Future of Agentic SIEM
Databricks Lakewatch replaces the traditional SIEM model with an Open Security Lakehouse — storing 100% of telemetry in open formats at up to 80% lower TCO. AI agents reason across years of unified data to detect and respond at machine speed, closing the visibility gap that legacy SIEMs were structurally forced to create. Early customers include Adobe and Dropbox, with broader availability following Private Preview.
Lakeflow Connect Free Tier: $35/Day Back in Your Budget
Databricks' permanent Lakeflow Connect free tier delivers 100 DBUs per workspace per day, covering up to 100 million records of ingestion at no additional compute cost. For enterprise teams running multiple workspaces, the savings add up: at roughly $35 per workspace per day, an estate of about 20 workspaces would come to over $255,000 in avoided annual costs. This post breaks down the economics, architecture, and what it means for teams still paying a third-party ETL tax.
The Lakehouse Finally Has Real Transactions
Learn how Databricks multi-statement transactions use Unity Catalog catalog-managed commits to guarantee atomic updates across multiple Delta tables — with a step-by-step walkthrough.
Why Your Databricks Upgrade Is Incomplete If You're Still Running ADF
Still running ADF after moving to Databricks? Here's why it happens, what it's costing your governance story, and how Lakeflow Jobs closes the gap.
Your Databricks Stack Is Modern. Your Orchestration Isn't.
Most Databricks migrations modernize the processing engine and leave Azure Data Factory running untouched. This post explains why that gap is a compounding business risk, and maps out the three practical paths to migrating orchestration to Lakeflow Jobs, including the on-prem push pattern that removes the most common blocker.
Prioritize AI Quality by Establishing a Data Quality Pillar
AI quality isn't just a model problem — it starts with your data. This guide outlines six executive-grade requirements for establishing a data quality pillar in Databricks, and explains how agentic monitoring can help organizations scale quality across their entire data estate.
Deduplicating Data on the Databricks Lakehouse: Making joins, BI, and AI queries “safe by default.”
Learn 5 proven deduplication strategies for the Databricks Lakehouse. Prevent duplicate data from breaking AI queries, BI dashboards, and analytics. Includes code examples.
Deploy Your Databricks Dashboards to Production
Stop deploying Databricks dashboards manually. Learn how to use Git, Asset Bundles, and CI/CD for reliable, reproducible dashboard deployments across environments.
The Nightmare of Initial Load (And How to Tame It)
Initial data loads don't have to be nightmares. Discover the split Bronze table pattern that separates historical backfills from incremental streaming.
You Pay for the Complexity of Your Move From On-Prem to Cloud
Moving data from on-prem to cloud shouldn't require 5+ systems. Discover why complexity costs you money and how Zerobus Ingest simplifies data pipelines.
Temp Tables Are Here, and They're Going to Change How You Use SQL
Learn how temporary tables in Databricks SQL warehouses enable materialized data, DML operations, and session-scoped ETL workflows. Complete with practical examples.
95% of GenAI projects fail. How to become part of the 5%
MIT reports 95% of GenAI investments produce zero returns. Learn the 5 failure modes keeping AI projects stuck in pilot limbo and how to ship production AI.
Hidden Magic Commands in Databricks Notebooks
Discover 12 powerful Databricks notebook magic commands beyond %sql and %python. Learn shortcuts for file operations, performance testing, and debugging.