An AI Agent Built My Entire Data Pipeline. Here's How I Kept It From Breaking Production

Aldrin from Bauplan closed the Data Valentine Challenge with a demo where he didn't write a single line of pipeline code. Claude Code did all of it — importing satellite telemetry into a lakehouse, building an ingestion pipeline, adding data validations, and upgrading the workflow from naive to Write-Audit-Publish — while Bauplan's transactional branches kept production data safe the entire time.

The setup: a lakehouse running Iceberg on S3, with a git-for-data catalog that supports branching and merging at the metadata level. Every pipeline runs on a staging branch. Nothing touches main until it passes validation and gets explicitly merged. The AI agent works in its own branch, makes its mistakes there, and production never sees them.
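To make the branching idea concrete, here is a minimal in-memory sketch. It is a toy model, not Bauplan's API: `ToyCatalog`, its methods, and the branch and table names are all hypothetical. The assumption it illustrates is that a branch is just a mapping from table names to immutable snapshots, so branching and merging only move pointers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """In-memory stand-in for a branching data catalog (hypothetical, not Bauplan's API)."""

    # Each branch maps table names to an immutable snapshot (a tuple of rows).
    branches: dict = field(default_factory=lambda: {"main": {}})

    def create_branch(self, name: str, from_ref: str = "main") -> None:
        # Branching copies only the table -> snapshot mapping, not the rows.
        self.branches[name] = dict(self.branches[from_ref])

    def write(self, branch: str, table: str, rows: list) -> None:
        # Mirrors the rule described above: nothing writes to main directly.
        if branch == "main":
            raise PermissionError("direct writes to main are blocked; use a branch")
        self.branches[branch][table] = tuple(rows)

    def merge(self, source: str, into: str = "main") -> None:
        # Publishing is a metadata update: the target now points at the
        # source branch's snapshots.
        self.branches[into].update(self.branches[source])


catalog = ToyCatalog()
catalog.create_branch("agent.staging")
catalog.write("agent.staging", "silver_telemetry", [{"sat_id": "A1", "temp_c": 21.5}])
main_before_merge = dict(catalog.branches["main"])  # main has not changed yet
catalog.merge("agent.staging")
```

Until `merge` is called, anything the agent writes on `agent.staging` is invisible to readers of `main`, which is the property that makes mistakes on a branch recoverable.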

The demo followed a three-act narrative based on a real Intella case study (satellite fleet telemetry for anomaly detection):

*Act 1: Naive pipeline.* The agent built an ingestion workflow: import raw telemetry into the bronze layer, run a simple pass-through pipeline to the silver layer, merge to main. No validation. The data landed, but it included duplicates and numeric columns typed as strings. The anomaly detection use case would silently break on this data.
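A small illustration of why string-typed numbers fail silently rather than loudly. The rows, column names, and threshold below are invented for the example and are not taken from the demo dataset.

```python
# Hypothetical telemetry rows as they might land after a pass-through load:
# one exact duplicate, and temperature stored as text.
rows = [
    {"sat_id": "A1", "ts": "2024-01-01T00:00:00", "temp_c": "9.5"},
    {"sat_id": "A1", "ts": "2024-01-01T00:01:00", "temp_c": "100.2"},
    {"sat_id": "A1", "ts": "2024-01-01T00:01:00", "temp_c": "100.2"},  # duplicate
]

# String comparison is lexicographic, so "9.5" ranks above "100.2".
hottest_as_text = max(r["temp_c"] for r in rows)
hottest_as_number = max(float(r["temp_c"]) for r in rows)

# A threshold check on the raw strings raises no error; it just returns
# the wrong rows.
alerts_as_text = [r for r in rows if r["temp_c"] > "50"]
alerts_as_number = [r for r in rows if float(r["temp_c"]) > 50]
```

On the text column the only "alert" is the 9.5 reading, while the numeric version flags the 100.2 reading twice because of the duplicate row. Neither problem raises an exception, which is why a pipeline without validation can look healthy.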

*Act 2: Validation pipeline.* The agent wrote a separate validation pipeline using Bauplan's expectations framework. It checked for nulls, confirmed numeric compatibility, and tested uniqueness. The uniqueness check failed: the silver table contained duplicate rows. The problem was now visible, but the fix wasn't in place yet.
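The three checks can be sketched in plain Python. This is not Bauplan's expectations framework; the function names, the sample rows, and the choice of `(sat_id, ts)` as the uniqueness key are assumptions made for illustration.

```python
def expect_no_nulls(rows, column):
    """True if every row has a non-null value in `column`."""
    return all(row.get(column) is not None for row in rows)


def expect_numeric(rows, column):
    """True if every value in `column` can be cast to float."""
    def castable(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    return all(castable(row.get(column)) for row in rows)


def expect_unique(rows, key_columns):
    """True if no two rows share the same key."""
    keys = [tuple(row[c] for c in key_columns) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical silver table containing one duplicated reading.
silver = [
    {"sat_id": "A1", "ts": "t0", "temp_c": "21.5"},
    {"sat_id": "A1", "ts": "t0", "temp_c": "21.5"},
    {"sat_id": "B2", "ts": "t0", "temp_c": "19.0"},
]

report = {
    "no_nulls": expect_no_nulls(silver, "temp_c"),
    "numeric": expect_numeric(silver, "temp_c"),
    "unique": expect_unique(silver, ["sat_id", "ts"]),
}
```

With this sample, the null and numeric checks pass and only `unique` fails, mirroring the outcome described above: validation surfaces the problem but does nothing to fix it.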

*Act 3: WAP workflow.* The agent integrated validation directly into the ingestion pipeline. Expectations ran inline. Bad rows got filtered before reaching the silver layer. The commit-branch script checked that silver tables had a non-zero number of valid rows before merging. After the upgrade, the row count dropped by half (duplicates gone), all expectations passed, and the merge to main went through cleanly.
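A compact sketch of the Write-Audit-Publish flow in plain Python, under the assumption that "publish" simply means handing back the staged rows for a merge. The function name, parameters, and sample data are hypothetical, and the real demo ran these steps against Bauplan branches rather than in-memory lists.

```python
def write_audit_publish(raw_rows, key_columns, numeric_column):
    """Stage cleaned rows, audit them, and publish only if every audit passes."""
    # WRITE: build the candidate table in staging, dropping bad rows inline.
    staged, seen = [], set()
    for row in raw_rows:
        key = tuple(row.get(c) for c in key_columns)
        try:
            value = float(row[numeric_column])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric reading
        if None in key or key in seen:
            continue  # incomplete key or duplicate
        seen.add(key)
        staged.append({**row, numeric_column: value})

    # AUDIT: re-check the staged table before anything reaches main.
    keys = [tuple(r[c] for c in key_columns) for r in staged]
    audit = {
        "non_empty": len(staged) > 0,
        "unique": len(keys) == len(set(keys)),
    }

    # PUBLISH: only a fully passing audit yields rows to merge.
    published = staged if all(audit.values()) else None
    return published, audit


# Hypothetical raw telemetry in which every reading arrives twice.
raw = [
    {"sat_id": "A1", "ts": "t0", "temp_c": "21.5"},
    {"sat_id": "A1", "ts": "t0", "temp_c": "21.5"},
    {"sat_id": "B2", "ts": "t0", "temp_c": "19.0"},
    {"sat_id": "B2", "ts": "t0", "temp_c": "19.0"},
]
published, audit = write_audit_publish(raw, ["sat_id", "ts"], "temp_c")
```

Here four raw rows become two published rows with numeric temperatures. If every row were filtered out, the `non_empty` audit would fail and nothing would be published, which corresponds to the non-zero row check performed before merging.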

Along the way, the agent hallucinated a namespace decorator that didn't exist, tried to write directly to main (Bauplan blocked it), and defaulted to Pandas when PyArrow was preferred. Every mistake was recoverable because it happened on a branch. Aldrin's takeaway on prompting AI agents: "It's better to explicitly say 'don't use Pandas' rather than just encouraging other libraries."

What you'll learn:

How transactional branches create safe sandboxes for AI-generated pipelines
The Write-Audit-Publish pattern for data quality in a lakehouse
How to structure a CLAUDE.md file and narration steps so an AI agent can build pipelines from scratch
Why Bauplan blocks writes to main unless you use dry-run or a staging branch

Host: Recce
Guest: Aldrin, Founding Engineer at Bauplan
Case study: Intella satellite fleet telemetry (anomaly detection)

