Engineering posts about Data Lake
Curated summaries and key learnings for engineers working with Data Lake.
How World Bank Group uses Databricks to eradicate poverty through shared knowledge
The World Bank Group has developed a unified data and AI platform on Databricks to integrate structured operational data with unstructured documents, thereby eliminating manual research bottlenecks....
Scaling for MHHS: how Octopus Energy achieved a 50x cost reduction in margin data engineering
The article discusses the significant data engineering challenges faced by Octopus Energy as the UK transitions to a Market-wide Half-Hourly Settlement (MHHS) model, which increases the frequency of...
Unlock seamless and cost-effective marketing campaigns with Lakebase
The article discusses the implementation and benefits of Lakebase, an architecture that combines the advantages of transactional databases with the flexibility of data lakes. It highlights the...
How to Build Real-Time Fraud Detection using Spark Real-Time Mode and Lakebase
This article discusses the implementation of a real-time fraud detection system leveraging Apache Spark's Real-Time Mode (RTM) and Lakebase on the Databricks platform. It highlights the challenges of...
Scaling Airbnb’s identity graph with a unified knowledge graph infrastructure
The article outlines Airbnb's shift from a PaaS model to an internally managed knowledge graph infrastructure, focusing on the identity graph that captures user relationships. It details the...
Announcing the Databricks analytics engineer learning pathway
The Databricks Analytics Engineer Learning Pathway is designed to equip SQL practitioners with the skills necessary to transform raw data into governed, AI-ready semantic models and metrics. The...
Backstage with Lakebase, part 2
In this second part of the series, the article discusses the integration of Backstage with Databricks Lakebase, emphasizing the transformation of database management from a complex, multi-service...
Expanded interoperability with Unity Catalog Open APIs
The article elaborates on the advancements brought by Unity Catalog's Open APIs, which enhance interoperability in data management by allowing enterprises to maintain a single copy of data while...
Clinical operations intelligence belongs on the Lakehouse
The article presents the Site Feasibility Workbench, an open-source application designed to enhance clinical operations intelligence by integrating data, models, and applications within a single...
The Rosetta Stone of CPS: Claroty’s AI-powered library
The article presents Claroty's AI-Powered CPS Library, a groundbreaking solution designed to address the identity crisis in Cyber-Physical Systems (CPS). It highlights the challenges faced by...
Data quality is the AI strategy
The article emphasizes the critical role of data quality in leveraging AI effectively within healthcare systems. It highlights NYU Langone Health's strategic approach to data management, where the...
How CFOs in consulting can recover margin with Databricks
The article outlines the financial challenges faced by consulting firms, particularly in managing data across disparate systems, which leads to inefficiencies and margin pressures. It emphasizes the...
The Rise of Sports Intelligence: How the Lakehouse Turns Tracking Data into Competitive Advantage
The article explores the transformative impact of the Databricks Data Intelligence Platform on professional sports through the integration of vast amounts of tracking and biomechanical data. It...
Amazon Redshift introduces AWS Graviton-based RG instances with an integrated data lake query engine
Amazon Redshift has launched RG instances powered by AWS Graviton, enhancing performance for data warehouse workloads and integrating a data lake query engine. This new instance type offers up to...
Migrating Data Ingestion Systems at Meta Scale
The article outlines the comprehensive migration of Meta's data ingestion system, which was essential for maintaining the efficiency and reliability of their social graph data processing. It details...
Growth Analytics Is What Comes After Growth Hacking
The article explores the evolution of growth analytics as a critical component in modern user acquisition strategies. It highlights the shift from tactical growth hacking to a more analytical...
Why telecom churn prediction misses the intervention window
The article explores the challenges faced by telecommunications companies in effectively predicting and intervening in customer churn. Despite the sophistication of churn propensity models,...
Operating room utilization is hiding in your scheduling data
The article highlights the critical importance of operating room (OR) utilization in healthcare systems, emphasizing that underutilized ORs represent significant revenue losses and unmet patient...
Energy trading analytics in a real-time market
The article highlights the challenges faced in energy trading analytics due to the fast-paced nature of price changes and the limitations of traditional batch processing methods. It emphasizes the...
Peril Predicts: Precision Payouts for a Volatile World
The article explores the implementation of parametric insurance, which automates payouts based on predefined conditions triggered by objective event data. It highlights the role of modern catastrophe...