Aldrin from Bauplan closed the Data Valentine Challenge with a demo where he didn't write a single line of pipeline code. Claude Code did all of it — importing satellite telemetry into a lakehouse, building an ingestion pipeline, adding data validations, and upgrading the workflow from naive to Write-Audit-Publish — while Bauplan's transactional branches kept production data safe the entire time.
The setup: a lakehouse running Iceberg on S3, with a git-for-data catalog that supports branching and merging at the metadata level. Every pipeline runs on a staging branch. Nothing touches main until it passes validation and gets explicitly merged. The AI agent works in its own branch, makes its mistakes there, and production never sees them.
The demo followed a three-act narrative based on a real Intella case study (satellite fleet telemetry for anomaly detection):
*Act 1 — Naive pipeline.* The agent built an ingestion workflow: import raw telemetry into the bronze layer, run a simple pass-through pipeline to the silver layer, merge to main. No validation. The data landed, but it included duplicates and string-typed numeric columns. The anomaly detection use case would break on this data silently.
*Act 2 — Validation pipeline.* The agent wrote a separate validation pipeline using Bauplan's expectations framework. It checked for nulls, confirmed numeric compatibility, and tested uniqueness. The uniqueness check failed — duplicate rows in the silver table. The problem was visible, but the fix wasn't in place yet.
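To make the three checks concrete, here is a minimal sketch in plain Python. It does not use Bauplan's expectations framework, whose API is not shown in this description. The function names, column names (`satellite_id`, `ts`, `temperature`), and sample rows are all made up for illustration.

```python
from collections import Counter

def expect_no_nulls(rows, column):
    """True if no row has a missing or None value in `column`."""
    return all(row.get(column) is not None for row in rows)

def expect_numeric_compatible(rows, column):
    """True if every value in `column` can be parsed as a float."""
    for row in rows:
        try:
            float(row[column])
        except (TypeError, ValueError, KeyError):
            return False
    return True

def expect_unique(rows, key_columns):
    """True if no combination of `key_columns` appears more than once."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return all(n == 1 for n in counts.values())

# Hypothetical silver rows: string-typed numerics plus one duplicate reading.
silver = [
    {"satellite_id": "sat-1", "ts": "2024-01-01T00:00:00", "temperature": "21.5"},
    {"satellite_id": "sat-1", "ts": "2024-01-01T00:00:00", "temperature": "21.5"},
    {"satellite_id": "sat-2", "ts": "2024-01-01T00:00:00", "temperature": "19.0"},
]

results = {
    "no_nulls": expect_no_nulls(silver, "temperature"),
    "numeric": expect_numeric_compatible(silver, "temperature"),
    "unique": expect_unique(silver, ["satellite_id", "ts"]),
}
print(results)
```

On this sample the null and numeric checks pass and the uniqueness check fails, which mirrors the Act 2 outcome: the problem is detected but nothing has removed the duplicates yet.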
*Act 3 — WAP workflow.* The agent integrated validation directly into the ingestion pipeline. Expectations ran inline. Bad rows got filtered before reaching the silver layer. The commit-branch script checked that silver tables had non-zero valid rows before merging. After the upgrade: row count dropped by half (duplicates gone), all expectations passed, and the merge to main went through clean.
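The Write-Audit-Publish flow can be sketched with an in-memory dictionary standing in for the catalog. This is an illustration under stated assumptions, not Bauplan's API: the `catalog` dict, the `clean` and `write_audit_publish` helpers, the branch name `staging`, and the column names are all hypothetical.

```python
def clean(rows, key_columns, numeric_column):
    """Drop duplicate keys and rows whose numeric column cannot be parsed."""
    seen = set()
    valid = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        try:
            value = float(row[numeric_column])
        except (TypeError, ValueError):
            continue  # filter out rows with non-numeric values
        if key in seen:
            continue  # filter out duplicates
        seen.add(key)
        valid.append({**row, numeric_column: value})
    return valid

def write_audit_publish(catalog, raw_rows, branch="staging"):
    """Write to a branch, audit it, and copy to main only if the audit passes."""
    # Write: results land on the staging branch, never on main directly.
    catalog[branch] = {"silver": clean(raw_rows, ["satellite_id", "ts"], "temperature")}

    # Audit: require a non-empty table with unique keys.
    staged = catalog[branch]["silver"]
    keys = [(r["satellite_id"], r["ts"]) for r in staged]
    passed = len(staged) > 0 and len(keys) == len(set(keys))

    # Publish: main changes only after the audit succeeds.
    if passed:
        catalog["main"] = dict(catalog[branch])
    return passed

# Hypothetical raw telemetry in which every reading arrives twice.
raw = [
    {"satellite_id": "sat-1", "ts": "2024-01-01T00:00:00", "temperature": "21.5"},
    {"satellite_id": "sat-1", "ts": "2024-01-01T00:00:00", "temperature": "21.5"},
    {"satellite_id": "sat-2", "ts": "2024-01-01T00:00:00", "temperature": "19.0"},
    {"satellite_id": "sat-2", "ts": "2024-01-01T00:00:00", "temperature": "19.0"},
]

catalog = {"main": {"silver": []}}
published = write_audit_publish(catalog, raw)
print(published, len(catalog["main"]["silver"]))
```

In a real lakehouse the publish step would be a metadata-level merge rather than a copy, but the ordering is the point: if the audit fails, `main` is left exactly as it was.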
Along the way, the agent hallucinated a namespace decorator that didn't exist, tried to write directly to main (Bauplan blocked it), and defaulted to Pandas when PyArrow was preferred. Every mistake was recoverable because it happened on a branch. Aldrin's takeaway on prompting AI agents: "It's better to explicitly say 'don't use Pandas' rather than just encouraging other libraries."
What you'll learn:
Host: Recce
Guest: Aldrin, Founding Engineer at Bauplan
Case study: Intella satellite fleet telemetry (anomaly detection)
00:00 - Intro