Speedrunning the Lakehouse & AMA with Jacopo Tagliabue

Speedrunning the Lakehouse

The lakehouse architecture has become a foundational design for modern data and AI workloads, largely thanks to its flexibility. But that flexibility comes at a cost: users and system developers must navigate multiple APIs, conflicting abstractions, and overlapping execution models. What if we started from scratch, with simplicity in mind? In this talk, we discuss the technical challenges of building a “Function-as-a-Service” (FaaS) lakehouse: if workloads were “just” chained functions, users and developers could easily reason about the full data lifecycle!
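
To make the “just chained functions” idea concrete, here is a minimal sketch in plain Python. It is not Bauplan's API or the system described in the talk: the `Table` alias, the `clean` and `aggregate` steps, and the `run_pipeline` runner are hypothetical names, and tables are modeled as lists of dicts rather than Arrow or Iceberg tables.

```python
from typing import Callable

# Hypothetical model: a "table" is a list of row dicts, standing in for an Arrow/Iceberg table.
Table = list[dict]


def clean(raw: Table) -> Table:
    """Hypothetical step 1: drop rows with a missing amount."""
    return [row for row in raw if row.get("amount") is not None]


def aggregate(cleaned: Table) -> Table:
    """Hypothetical step 2: total amount per customer, sorted by customer."""
    totals: dict[str, float] = {}
    for row in cleaned:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]


def run_pipeline(source: Table, steps: list[Callable[[Table], Table]]) -> Table:
    """Feed each function the previous function's output, in order."""
    data = source
    for step in steps:
        data = step(data)
    return data


raw = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": None},
    {"customer": "a", "amount": 5.0},
    {"customer": "b", "amount": 2.5},
]
result = run_pipeline(raw, [clean, aggregate])
print(result)  # [{'customer': 'a', 'total': 15.0}, {'customer': 'b', 'total': 2.5}]
```

In this sketch each step sees only its input and returns a new table. The whole data lifecycle is therefore the composition of the steps, and the runner, not the user, decides where and when each function executes.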

We argue that existing FaaS platforms were never designed for data-intensive workflows. To address this, we built a new system from the ground up using object storage and open formats. Repurposing lessons from OpenLambda, we deploy functions up to 15× faster than AWS Lambda. By extending Apache Iceberg’s isolation with Git-like primitives, we support multi-language transactions with formal correctness proofs. Finally, we show how ephemeral functions, Arrow-native caching, and decoupled catalogs can simulate a full warehouse.
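
The intuition behind Git-like primitives on top of table snapshots can be sketched with a toy catalog. This is a hypothetical model, not Iceberg's or Bauplan's API: `Commit`, `Catalog`, `create_branch`, `write`, `merge`, and `run_transactional` are illustrative names, snapshot ids are plain strings standing in for Iceberg snapshots, and merges are fast-forward only (a simplifying assumption). A multi-step pipeline writes to a throwaway branch, so `main` changes all at once or not at all.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Commit:
    """Immutable catalog state: (table name, snapshot id) pairs plus a parent link."""
    tables: tuple = ()
    parent: Optional["Commit"] = None

    def lookup(self) -> dict:
        return dict(self.tables)


class Catalog:
    """Toy Git-like catalog: branches are movable pointers to immutable commits."""

    def __init__(self) -> None:
        self.branches: dict = {"main": Commit()}

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # Zero-copy: the new branch points at the same commit; no table data is copied.
        self.branches[name] = self.branches[from_branch]

    def write(self, branch: str, table: str, snapshot_id: str) -> None:
        # Each write adds a new commit on this branch only; other branches are unaffected.
        head = self.branches[branch]
        tables = head.lookup()
        tables[table] = snapshot_id
        self.branches[branch] = Commit(tuple(sorted(tables.items())), parent=head)

    def merge(self, source: str, target: str = "main") -> None:
        # Fast-forward only: succeed if the target head is an ancestor of the source head.
        source_head, target_head = self.branches[source], self.branches[target]
        node = source_head
        while node is not None and node is not target_head:
            node = node.parent
        if node is None:
            raise RuntimeError(f"{target} moved since {source} was created")
        self.branches[target] = source_head


Step = Callable[[Catalog, str], None]


def run_transactional(catalog: Catalog, steps: list, branch: str = "tx_branch") -> bool:
    """Run every step on a temporary branch; publish to main only if all succeed."""
    catalog.create_branch(branch)
    try:
        for step in steps:
            step(catalog, branch)
    except Exception:
        del catalog.branches[branch]  # discard partial work; main never saw it
        return False
    catalog.merge(branch)
    del catalog.branches[branch]
    return True


def failing(c: Catalog, b: str) -> None:
    raise ValueError("step failed")


ok_steps = [lambda c, b: c.write(b, "clean", "snap-1"), lambda c, b: c.write(b, "agg", "snap-2")]
bad_steps = [lambda c, b: c.write(b, "clean", "snap-3"), failing]

cat = Catalog()
print(run_transactional(cat, ok_steps))   # True
print(cat.branches["main"].lookup())      # {'agg': 'snap-2', 'clean': 'snap-1'}
print(run_transactional(cat, bad_steps))  # False
print(cat.branches["main"].lookup())      # {'agg': 'snap-2', 'clean': 'snap-1'}
```

The failed run leaves `main` untouched because its partial writes only ever existed on the discarded branch. A real system would also need conflict-aware (non-fast-forward) merges and concurrency control, which this sketch deliberately omits.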

We conclude by emphasizing the role of user-facing APIs in driving adoption in real-world settings, and by sharing late-breaking results from our ongoing research.

00:00 - Intro
