Speedrunning the Lakehouse
The lakehouse architecture has become a foundational design for modern data and AI workloads. Its flexibility, however, comes at a cost: users and system developers must navigate multiple APIs, conflicting abstractions, and overlapping execution models. What if we started from scratch, with simplicity in mind? In this talk, we discuss the technical challenges of building a “Function-as-a-Service” (FaaS) lakehouse: if workloads were “just” chained functions, users and developers could easily reason about the full data lifecycle!
We argue that existing FaaS platforms were never designed for data-intensive workflows. To address this, we built a new system from the ground up using object storage and open formats. Re-purposing lessons from OpenLambda, we deploy functions up to 15× faster than AWS Lambda. By extending Apache Iceberg’s isolation with Git-like primitives, we support multi-language transactions with formal correctness proofs. Finally, we show how ephemeral functions, Arrow-native caching, and decoupled catalogs can simulate a full warehouse.
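To make the idea of chained functions with Git-like isolation concrete, here is a minimal sketch in plain Python. It is not the system described in the talk: the `Catalog` class, `run_pipeline`, and the branch and table names are all hypothetical, tables are plain lists of dicts rather than Arrow or Iceberg data, and the merge assumes nothing else writes to `main` while the run is in flight.

```python
from typing import Callable, Dict, List, Tuple

Table = List[dict]                 # toy stand-in for a real columnar table
Step = Callable[[Table], Table]    # a pipeline step is just a function


class Catalog:
    """Hypothetical toy catalog: each branch maps table names to snapshots."""

    def __init__(self) -> None:
        self.branches: Dict[str, Dict[str, Table]] = {"main": {}}

    def create_branch(self, name: str, source: str = "main") -> None:
        # Copy only the name -> snapshot mapping; snapshots are shared.
        self.branches[name] = dict(self.branches[source])

    def merge(self, name: str, into: str = "main") -> None:
        # Publish every table written on the branch in one step.
        # Assumes no concurrent writes to the target branch.
        self.branches[into] = dict(self.branches.pop(name))

    def drop_branch(self, name: str) -> None:
        self.branches.pop(name, None)


def run_pipeline(
    catalog: Catalog,
    source: str,
    steps: List[Tuple[str, Step]],
    branch: str = "run",
) -> None:
    """Chain the steps on a temporary branch; merge only if all succeed."""
    catalog.create_branch(branch)
    try:
        data = catalog.branches[branch][source]
        for output_name, step in steps:
            data = step(data)
            catalog.branches[branch][output_name] = data
    except Exception:
        # A failed step leaves main untouched: no partial results leak out.
        catalog.drop_branch(branch)
        raise
    catalog.merge(branch)


def drop_negative(rows: Table) -> Table:
    return [r for r in rows if r["km"] >= 0]


def total_km(rows: Table) -> Table:
    return [{"total_km": sum(r["km"] for r in rows)}]


catalog = Catalog()
catalog.branches["main"]["trips"] = [{"km": 3}, {"km": -1}, {"km": 10}]
run_pipeline(catalog, "trips", [("clean_trips", drop_negative), ("summary", total_km)])
print(catalog.branches["main"]["summary"])
```

In this sketch, either every output table of a run becomes visible on `main` or none does, which is the all-or-nothing property the branch-and-merge analogy is meant to convey.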
We conclude by emphasizing the role of user-facing APIs for adoption in real-world settings, and sharing late-breaking results from our ongoing research.
00:00 - Intro