Supporting Predictive Maintenance and Customer Insights on Databricks through IoT-enabled devices.

SunnyData partnered with a global manufacturing enterprise to modernize the data platform that powers its IoT-enabled filtration monitoring service. By replacing a fragile legacy Azure architecture with the Databricks Intelligence Platform, we implemented a scalable streaming Lakehouse capable of supporting predictive maintenance analytics, real-time alerts, and customer-facing insights.

The new platform consolidated multiple Azure-native services into a unified architecture, enabling advanced analytics while reducing operational complexity and expected cloud infrastructure costs by approximately 55%.

Client name withheld to protect confidentiality


Key Metrics


INDUSTRY: Manufacturing

SOLUTION: Predictive Maintenance, Alarms, and Customer Insights based on data from IoT devices embedded in products.

PLATFORM USE CASE: Delta Lake, Streaming, Data Science, Machine Learning, ETL


The client is a vertically integrated filtration manufacturing company engaged in the production and marketing of air filters used across a variety of sectors, including commercial and industrial applications (engines, exhausts, transmissions, vents in private vehicles, hydraulics), aerospace (helicopters, planes), chemical, alternative energy (windmills), and pharmaceuticals. The company generates multi-billion-dollar annual revenue and is headquartered in the US.

The company had a strategic initiative to enable all of its products (either retrofit or “first fit”) to use IoT devices by the end of 2025; these devices stream data back to the organization to support several customer initiatives. The IoT-enabled devices also allow the company to sell a subscription service that monitors industrial dust collectors and sends real-time data and maintenance alerts directly to facility management teams.

Streaming data enables predictive analytics using ML/AI to help customers better plan for outages, maintenance, and replacements, and to understand their products’ performance metrics.

The IoT data and associated analytics enable value-added services beyond the physical products themselves, contributing to the client's top-line revenue.

The Databricks solution reduces operational costs (compared to native Azure), lowers maintenance overhead, and minimizes data replication and data movement.

Client challenges

The client's legacy platform for ingesting, storing, and analyzing IoT data was built on native Azure services:

  • Azure Stream Analytics and Azure Functions: used to consume and process IoT messages from Event Hubs, write them into data stores such as Cosmos DB and SQL Server, and send alarms/notifications to users.

  • Azure Cosmos DB: a data store for both operational and analytical workloads.

Some of the challenges associated with the legacy platform were: 

  • Stream Analytics was fragile and broke often

  • High cost of the overall solution

  • Limited ability to support advanced machine learning use cases

  • Need for a scalable solution with fewer data hops for ETL and data science workloads

Solution Implementation

The client team supporting the subscription product resides within the Filtration business unit and was new to the Databricks platform. The team had to migrate its existing jobs from the native Azure stack (Azure Stream Analytics and Azure Functions for processing streaming data, and Cosmos DB for reporting).

The project ran in three phases with support from SunnyData resources, who helped architect, design, and migrate existing data pipelines and code.

Phase 1 

Ingest, cleanse, and enrich fact-level IoT data streaming from devices using streaming pipelines. Binary data from devices was converted to JSON format as it moved from the Bronze layer to the Silver layer.
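
As an illustration of the binary-to-JSON step, here is a minimal Python sketch. The wire format, field names, and the `decode_payload` helper are all hypothetical, since the actual device payload layout is not described here; in a streaming pipeline, logic like this would typically run once per message between Bronze and Silver.

```python
import json
import struct

# Hypothetical wire format (an assumption, not the client's real layout):
# little-endian uint32 device id, uint32 epoch seconds,
# float32 differential pressure, float32 airflow.
PAYLOAD_FORMAT = "<IIff"


def decode_payload(raw: bytes) -> str:
    """Decode one binary device message into a JSON string."""
    device_id, event_ts, pressure, airflow = struct.unpack(PAYLOAD_FORMAT, raw)
    record = {
        "device_id": device_id,
        "event_ts": event_ts,
        # float32 values are rounded to avoid noisy trailing digits
        "differential_pressure": round(pressure, 3),
        "airflow": round(airflow, 3),
    }
    return json.dumps(record, sort_keys=True)


# Build a sample message in the assumed format and decode it.
sample = struct.pack(PAYLOAD_FORMAT, 42, 1720000000, 1.5, 250.0)
print(decode_payload(sample))
```

Keeping the decoder a pure function of the raw bytes makes it easy to unit test independently of the streaming framework.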

Phase 2 

Build data pipelines that produce aggregates to meet business needs across various consumption layers in Silver and Gold.
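
To make the idea of an aggregate concrete, the sketch below computes hourly averages per device in plain Python. The input shape, the one-hour window, and the `hourly_averages` helper are assumptions chosen for illustration; the actual aggregates and their business definitions are not specified here.

```python
from collections import defaultdict
from statistics import mean


def hourly_averages(readings):
    """Average readings per (device_id, hour) bucket.

    `readings` is an iterable of (device_id, epoch_seconds, value) tuples,
    a hypothetical shape for cleansed Silver-layer rows.
    """
    buckets = defaultdict(list)
    for device_id, ts, value in readings:
        hour_start = ts - ts % 3600  # truncate the timestamp to the hour
        buckets[(device_id, hour_start)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Two readings fall in the hour starting at 7200, one in the hour at 10800.
silver_rows = [
    (42, 7200, 1.0),
    (42, 7260, 3.0),
    (42, 10800, 5.0),
]
gold = hourly_averages(silver_rows)
print(gold)
```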

Customers access data through standard reports via the customer portal or perform custom reporting. Data is accessed through a Databricks SQL endpoint.

Phase 3

Support DevOps and CI/CD processes aligned with standard SDLC practices to automatically promote data engineering pipelines to production. Production support was provided to stabilize and optimize operations.

The end-to-end implementation and migration took 10 weeks and was partially funded by DB Accelerate.

With the go-live of new products and migration of existing devices to the Databricks Intelligence Platform, the client was in a position to meet their business timelines for enabling all devices with IoT by the end of 2025.

Key Benefits Achieved

Post go-live, Databricks consumption increased from approximately $3K/month to ~$28K/month starting in July 2024.

Business Benefits

Streaming IoT data on the Databricks Lakehouse enables predictive analytics using ML/AI to help customers better plan for outages, maintenance, and replacements, and understand performance metrics of their products.

Support Efficient Maintenance and Operation: The operational status of dust collectors is available from a single web-based dashboard. Potential issues are identified before requiring larger, more time-intensive corrective action.

Reduce Unplanned Downtime: Key parameters are continuously monitored, enabling proactive troubleshooting and flagging of maintenance needs. Alerts are sent when pre-set thresholds are breached or when equipment operates outside the pre-set parameters.
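
A threshold-based alert check of this kind could look roughly like the following Python sketch. The `Threshold` class, the `check_reading` helper, the parameter name, and the limits are hypothetical and stand in for whatever parameters and pre-set values the service actually monitors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Allowed operating range for one monitored parameter (illustrative)."""
    parameter: str
    low: float
    high: float


def check_reading(reading: dict, thresholds: list) -> list:
    """Return an alert message for each parameter outside its range."""
    alerts = []
    for t in thresholds:
        value = reading.get(t.parameter)
        if value is None:
            continue  # parameter missing from this reading; nothing to check
        if value < t.low or value > t.high:
            alerts.append(f"{t.parameter}={value} outside [{t.low}, {t.high}]")
    return alerts


# Hypothetical limits for a dust collector's differential pressure.
limits = [Threshold("differential_pressure", 0.5, 6.0)]
print(check_reading({"differential_pressure": 7.2}, limits))
print(check_reading({"differential_pressure": 3.0}, limits))
```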

Financial Benefits

The Databricks Intelligence Platform replaced several native Azure technologies (Azure Functions, Stream Analytics, Cosmos DB), creating a more stable and forward-looking platform that positions the client for future use cases (Delta Sharing, AutoML, GenAI).

Cost Reduction from Azure: The Databricks platform is expected to reduce the client's total Azure cloud computing cost by 55%.

Go-live of the Databricks platform enabled alignment with the corporate vision and timeline of making all retrofit and new devices IoT-enabled.

Technical and Operational Benefits

Data Science use cases: Data Science teams can now run their Python-based workloads directly on the Databricks platform, compared to the previous process of requesting and pulling data from Snowflake (their EDW) and running their processes on the Azure platform. 

Single platform to serve multiple use cases: In addition to supporting operational analytics and the customer portal, the Lakehouse supports some initial data science use cases. It is also primed to support future advanced machine learning and GenAI use cases, as well as data sharing with partners and customers through Delta Sharing, all within the same platform and without moving data around the enterprise.

Streamlined DevOps and CI/CD processes aligned to Enterprise standards: Clean separation of dev, QA, UAT, Demo, and production environments, with automatic promotion of software assets on code merge and validation with automated unit tests. Leverages Azure DevOps for the CI/CD pipeline and Databricks Asset Bundles for defining workloads in the Databricks Intelligence Platform, adhering to best practices such as using Service Principals and Serverless Compute where applicable.

Data Sharing with partners and customers: The IoT data that is conformed and ready for sharing can be shared directly with OEM partners, dealers, and direct customers in the future through Delta Sharing.

Post Go Live: Optimization and onboarding of new use cases. 

Post go-live, SunnyData continues to:

  • Review, onboard, and architect new use cases as they come to the subscription services team

  • Monitor and stabilize the current production environment

