Building the Data Foundation for a Digital Banking Ecosystem
SunnyData partnered with a leading digital financial services provider to migrate its overloaded IBM DataStage infrastructure to a modern Lakehouse built with Databricks on AWS. We implemented governed data pipelines, CI/CD automation, and an AI-powered executive agent, all while keeping a high-volume operation running without disruption.
Key Metrics
Industry: Banking & Financial Services
Solution: Data Platform Migration, Data Engineering, CI/CD & IaC, GenAI Business Agent (RAG)
Platform: Delta Lake, Unity Catalog, Databricks Lakeflow, Databricks Apps, MLflow
Cloud: AWS
Business Context
The client is a leading digital financial ecosystem for businesses and fintechs in its region, processing hundreds of millions of monthly transactions. It provides the banking backbone for a majority of the region's digital wallets, impacting tens of millions of users daily.
The Problem
The client's original data infrastructure, built on IBM DataStage and DB2 Warehouse, supplemented by SQL Server and MariaDB, had reached its operational limits. IBM compute resources were running near full contracted capacity, creating saturation risk and constraining growth. Mid-to-high complexity ETL jobs were taking between 3 and 8 hours to run, delaying data availability for business teams. With a large and fast-growing active data footprint and no path to historical versioning or lineage, the infrastructure was becoming both a bottleneck and a liability.
Beyond performance, the platform had significant governance gaps:
Access controls and permissions were distributed without centralized auditing
Metadata was fragmented
Data quality policies were not consolidated
This made it difficult to build reusable analytical models and effectively blocked any path to AI and ML in production.
The Solution
To tackle these issues, SunnyData migrated the client's entire data ecosystem to a unified Lakehouse platform built with Databricks on AWS. The program followed an iterative, secure, and governed approach, and it addressed both the immediate operational pain points and the client's long-term strategic ambitions.
Data Platform Migration & Engineering Foundation
The migration covered several hundred ETL jobs, which were identified and classified with Lakebridge and then migrated progressively from IBM DataStage to Databricks Lakeflow across multiple primary data sources. Rather than replicating existing jobs one-to-one, the team used the migration as an opportunity to refactor processes using Databricks-native frameworks (Lakeflow, DLT, dbt), implementing distributed execution, automated observability, and CI/CD automation via Databricks Asset Bundles.
Storage was modernized from DB2 to Delta Lake, enabling open formats, complete data lineage, historical versioning, and meaningful cost reduction compared to IBM's fixed-license model. Unity Catalog provides the governance backbone: centralized access controls, audit trails, data quality policies, and lineage tracking. This governance foundation was needed both to meet the client's regulatory requirements and to enable trustworthy self-service analytics.
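To make "historical versioning" concrete, the following is a toy, in-memory sketch of a versioned table: every commit is appended to a log, and any earlier version can be rebuilt by replaying the log up to that point. It is an illustration of the idea only, not Delta Lake's implementation or API; the class name VersionedTable and the account data are hypothetical. On Delta Lake itself, the comparable capabilities are reading a table as of a version or timestamp and inspecting the table history.

```python
class VersionedTable:
    """Toy append-only commit log (hypothetical, not the Delta Lake API)."""

    def __init__(self):
        # One entry per commit: (operation, upserted rows by key, deleted keys).
        self._log = []

    def commit(self, operation, upserts=None, deletes=()):
        """Append a commit and return its version number."""
        self._log.append((operation, dict(upserts or {}), tuple(deletes)))
        return len(self._log) - 1

    def read(self, version=None):
        """Rebuild the table as of `version` (latest if omitted) by replaying the log."""
        last = len(self._log) - 1 if version is None else version
        if not 0 <= last < len(self._log):
            raise ValueError(f"unknown version {version}")
        state = {}
        for _, upserts, deletes in self._log[: last + 1]:
            state.update(upserts)
            for key in deletes:
                state.pop(key, None)
        return state

    def history(self):
        """Return (version, operation) pairs, oldest first."""
        return [(v, op) for v, (op, _, _) in enumerate(self._log)]


# Invented sample data: balances keyed by account id.
accounts = VersionedTable()
accounts.commit("WRITE", {"acc-1": 100, "acc-2": 250})
accounts.commit("UPDATE", {"acc-1": 80})
accounts.commit("DELETE", deletes=["acc-2"])

print(accounts.read(0))   # the table as first written
print(accounts.read())    # the latest state
print(accounts.history())
```

Because nothing is overwritten in place, an auditor can ask what the table looked like at any earlier commit, which is the property a regulated environment typically needs from historical versioning.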
The architecture follows a medallion model and was designed to be Data Mesh-compliant from the outset: assets are mapped to business domains with clear ownership, so that individual units can eventually operate independently without creating a fragile monolith.
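As a rough illustration of the medallion idea, the sketch below moves a handful of records through three layers in plain Python: raw rows as landed (bronze), validated and de-duplicated rows (silver), and a per-domain aggregate (gold). The field names, domains, and amounts are invented for the example, and the client's actual pipelines run on Lakeflow rather than on in-memory lists.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation

# Bronze: hypothetical raw events, kept exactly as they arrived (all strings).
BRONZE = [
    {"txn_id": "t1", "domain": "wallets", "amount": "120.50", "status": "OK"},
    {"txn_id": "t2", "domain": "wallets", "amount": "n/a", "status": "OK"},     # malformed amount
    {"txn_id": "t1", "domain": "wallets", "amount": "120.50", "status": "OK"},  # duplicate of t1
    {"txn_id": "t3", "domain": "lending", "amount": "75.00", "status": "FAILED"},
    {"txn_id": "t4", "domain": "lending", "amount": "300.00", "status": "OK"},
]


def to_silver(bronze_rows):
    """Silver: drop duplicates and rows with unparseable amounts, cast amounts to Decimal."""
    seen, silver = set(), []
    for row in bronze_rows:
        if row["txn_id"] in seen:
            continue
        try:
            amount = Decimal(row["amount"])
        except InvalidOperation:
            continue  # a real pipeline would typically quarantine the row instead
        seen.add(row["txn_id"])
        silver.append({**row, "amount": amount})
    return silver


def to_gold(silver_rows):
    """Gold: total value of successful transactions per business domain."""
    totals = defaultdict(Decimal)
    for row in silver_rows:
        if row["status"] == "OK":
            totals[row["domain"]] += row["amount"]
    return dict(totals)


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)
```

Keying the gold output by business domain mirrors the Data Mesh intent described above: each domain's aggregate can be owned and served by the team responsible for it.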
AI Business Agent
Once the foundation was in place, SunnyData also built a GenAI-powered business agent deployed on Databricks Apps. The system ingests the client's monthly reporting materials (management reports, board presentations, and supporting Excel files) into a RAG pipeline backed by S3, enabling senior executives to query financial performance data in natural language. The MVP launched for a small group of directors and senior leaders. A phased roadmap extends access to middle management and eventually to the full organization, with Unity Catalog managing role-based permissions at each stage.
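The retrieval step of such an agent can be sketched in a few lines. The example below is a deliberately simplified stand-in: it ranks report chunks by bag-of-words cosine similarity instead of the embeddings and vector search a production RAG pipeline would normally use, and it models permissions with a hypothetical ROLE_LEVEL table rather than Unity Catalog. The chunk texts, file names, and role names are all invented, and no language model is called; the sketch stops at building the prompt.

```python
import math
import re
from collections import Counter

# Hypothetical report chunks; sources and wording are invented for illustration.
CHUNKS = [
    {"source": "board_deck.pptx", "audience": "director",
     "text": "Net revenue grew this month driven by wallet transaction fees."},
    {"source": "mgmt_report.pdf", "audience": "manager",
     "text": "Operating costs were flat; headcount unchanged versus prior month."},
    {"source": "kpis.xlsx", "audience": "director",
     "text": "Loan portfolio delinquency rate declined slightly this month."},
]

# Hypothetical role hierarchy: a role can read chunks at or below its own level.
ROLE_LEVEL = {"manager": 1, "director": 2}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, role, k=2):
    """Return up to k permitted chunks, most similar to the question first."""
    query = Counter(tokenize(question))
    scored = []
    for chunk in CHUNKS:
        if ROLE_LEVEL[chunk["audience"]] > ROLE_LEVEL[role]:
            continue  # permission filter runs before ranking
        score = cosine(query, Counter(tokenize(chunk["text"])))
        if score > 0:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question, role):
    """Assemble the grounded prompt that would be sent to a language model."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieve(question, role))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How did the loan delinquency rate change?", "director"))
```

Filtering by role before ranking means restricted content never enters the prompt for a user who is not entitled to see it, which is the behavior a phased rollout from directors to wider audiences would depend on.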
The Results
The migration to Databricks delivered an immediate and measurable shift in how the client's data infrastructure performs. ETL jobs that previously took 3 to 8 hours to complete now run in under an hour, so business teams get reliable, ready-to-use data much sooner.
Beyond speed, the move from IBM's fixed-capacity model to Databricks on AWS transformed the client's cost structure from a constrained, overloaded licensing arrangement to a transparent model where compute scales with demand and powers down automatically when idle.
The governance picture changed just as significantly: where access controls were previously distributed and unaudited, Unity Catalog now provides centralized lineage, role-based access, and the audit trail required to meet regulatory standards.
On the AI side, directors and senior managers who previously spent hours manually reviewing monthly management reports can now get answers in seconds by querying metrics and business highlights in plain language through the AI agent.
Looking Ahead
With a clean, governed, and modeled data foundation in place, the client is positioned to move fast on the initiatives that matter most. Wholesale banking analytics for recently acquired loan portfolios are being built directly on Databricks from day one. The architecture is being prepared for Kafka integration to enable near-real-time ingestion, and legacy reporting is migrating to Databricks Dashboards for a unified, auditable layer.
Most importantly, the client now has the infrastructure to do what wasn't previously possible: deploy AI and ML in production, from predictive analytics to advanced customer segmentation, on a platform built to support it.