How Does Bauplan Work?

April 3, 2026

Bauplan provides a tight, tool-callable loop with a small API surface: write to an isolated branch, run and validate, then publish to main with an atomic merge. If a change fails validation or causes issues later, roll back to a previous commit.
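That loop can be sketched with a toy in-memory catalog. This is an illustration of the branch, run-and-validate, merge, rollback cycle, not the real Bauplan API; all names here are made up for the sketch.

```python
# Toy stand-in for a branching data catalog, illustrating the loop:
# write to an isolated branch -> publish atomically -> roll back if needed.

class Catalog:
    def __init__(self):
        self.branches = {"main": {}}   # branch name -> {table: rows}
        self.history = []              # prior states of main, for rollback

    def create_branch(self, name, from_branch="main"):
        # Start the branch from the parent's current state.
        self.branches[name] = dict(self.branches[from_branch])

    def write(self, branch, table, rows):
        self.branches[branch][table] = rows  # isolated until merge

    def merge(self, branch, into="main"):
        # Atomic publish: record a rollback point, then update main in one step.
        self.history.append(dict(self.branches[into]))
        self.branches[into].update(self.branches[branch])

    def rollback(self, into="main"):
        self.branches[into] = self.history.pop()

cat = Catalog()
cat.create_branch("agent-1")
cat.write("agent-1", "orders_clean", [{"id": 1, "amount": 42}])
assert "orders_clean" not in cat.branches["main"]   # write is isolated
cat.merge("agent-1")
assert "orders_clean" in cat.branches["main"]       # published atomically
cat.rollback()
assert "orders_clean" not in cat.branches["main"]   # back to the prior commit
```

The small API surface is the point: an agent only needs these few verbs to make, verify, and undo changes.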

Bauplan helps you with:

Branch-First Data Execution

Every change runs on an isolated data branch. The platform creates zero-copy branches over Apache Iceberg tables in your object storage. Writes remain isolated until an explicit merge. Multiple agents or engineers can work in parallel on separate branches without affecting production or each other.
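Why is branching cheap? Because a branch is a pointer to an immutable snapshot, not a copy of the data. A minimal model of that idea (the real mechanism is Iceberg metadata in object storage; this is a simplified sketch):

```python
# Toy model of zero-copy branching: a branch is just a reference to an
# immutable snapshot, so creating a branch copies metadata, never data.

snapshots = {}   # snapshot id -> {table: data}, immutable once stored
refs = {}        # branch name -> snapshot id

def commit(data):
    sid = len(snapshots)
    snapshots[sid] = data
    return sid

refs["main"] = commit({"orders": ["row1", "row2"]})

# Branching is O(1): record the same snapshot id under a new name.
refs["feature"] = refs["main"]

# A write on the branch creates a *new* snapshot; main's pointer is untouched.
new_data = dict(snapshots[refs["feature"]])
new_data["orders_dedup"] = ["row1"]
refs["feature"] = commit(new_data)

assert snapshots[refs["main"]] == {"orders": ["row1", "row2"]}
assert "orders_dedup" in snapshots[refs["feature"]]
```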

Transactional Pipeline Execution

Every pipeline run is treated as a single deployable unit. When a run succeeds, all output tables are published as a coherent, atomic update. When a run fails, production remains unchanged. Intermediate state is preserved on the branch for inspection and debugging. Every operation that results in a write to the data catalog creates a new immutable reference (Ref) capturing the code, inputs, outputs, and environment.
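The all-or-nothing run and the immutable Ref can be sketched together. The function and field names below are illustrative assumptions, not Bauplan's internals:

```python
import hashlib
import json

# Sketch: a run stages every output on the branch; any failure aborts the
# whole run, and each successful write mints an immutable, content-addressed Ref.

def run_pipeline(steps, branch_state):
    staged = dict(branch_state)
    for step in steps:
        out_table, rows = step(staged)  # an exception here aborts the run
        staged[out_table] = rows
    return staged  # all outputs, published as one coherent unit

def make_ref(code, inputs, outputs, environment):
    payload = json.dumps(
        {"code": code, "inputs": inputs, "outputs": outputs, "env": environment},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def clean(state):
    return "orders_clean", [r for r in state["orders"] if r["amount"] > 0]

branch = {"orders": [{"amount": 10}, {"amount": -1}]}
result = run_pipeline([clean], branch)
ref = make_ref("clean@v1", ["orders"], ["orders_clean"], {"python": "3.12"})

assert result["orders_clean"] == [{"amount": 10}]
assert len(ref) == 64  # a stable hash identifying exactly this write
```

Because the Ref is derived from code, inputs, outputs, and environment, any later inspection can pin down exactly what produced a given table.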

Serverless Compute

Engineers write pipeline logic in Python (and SQL where it makes sense) in their existing repository and IDE. When you run bauplan run, the platform handles environment setup, execution, data movement, and tracking. Functions execute in ephemeral containers on a Function-as-a-Service (FaaS) layer. Data passes between functions through zero-copy Arrow tables (within hosts) or over Arrow Flight (across hosts). There is no Spark cluster, no Kubernetes configuration, and no infrastructure to manage.
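The declarative model behind `bauplan run` can be sketched as functions that name their inputs, with a tiny runner resolving execution order. Plain Python lists stand in for the Arrow tables that the real platform moves zero-copy between containers; the decorator and runner here are invented for the sketch.

```python
import inspect

# Minimal sketch of a declarative pipeline: each function's parameter names
# declare its inputs (source tables or other functions' outputs).

REGISTRY = {}

def model(fn):
    REGISTRY[fn.__name__] = fn
    return fn

@model
def orders(raw):          # depends on the source table "raw"
    return [r for r in raw if r["ok"]]

@model
def totals(orders):       # depends on the model "orders"
    return sum(r["amount"] for r in orders)

def run(sources):
    results = dict(sources)
    pending = dict(REGISTRY)
    while pending:
        for name, fn in list(pending.items()):
            deps = inspect.signature(fn).parameters
            if all(d in results for d in deps):
                results[name] = fn(*[results[d] for d in deps])
                del pending[name]
    return results

out = run({"raw": [{"ok": True, "amount": 5}, {"ok": False, "amount": 9}]})
assert out["totals"] == 5
```

The engineer writes only the decorated functions; scheduling, environments, and data movement are the platform's job.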

Code-First Control Surface

All operations are accessible through a typed Python SDK, a CLI, and an MCP server. The system exposes a small set of composable primitives: branch, run, query, inspect, and merge. Agents and engineers interact with the platform through the same structured API. There is no required GUI.
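One way to picture that small, typed surface is as an interface over the five primitives. The method names and signatures below are assumptions mirroring the prose, not the SDK's actual API:

```python
from typing import Any, Protocol, runtime_checkable

# Hypothetical typed surface over the five primitives; the real SDK's
# method names and signatures may differ.

@runtime_checkable
class DataPlatform(Protocol):
    def branch(self, name: str, from_ref: str) -> str: ...
    def run(self, project_dir: str, ref: str) -> dict[str, Any]: ...
    def query(self, sql: str, ref: str) -> list[dict[str, Any]]: ...
    def inspect(self, table: str, ref: str) -> dict[str, Any]: ...
    def merge(self, source_ref: str, into: str) -> str: ...

class FakePlatform:
    """In-memory stub, e.g. for testing agent logic without a backend."""
    def branch(self, name, from_ref): return name
    def run(self, project_dir, ref): return {}
    def query(self, sql, ref): return []
    def inspect(self, table, ref): return {}
    def merge(self, source_ref, into): return into

assert isinstance(FakePlatform(), DataPlatform)
```

Because agents and engineers share one structured API, anything scripted by a human is callable by an agent, and vice versa.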

Agent Skills

Agent Skills are reusable, declarative workflow templates that guide LLMs through multi-step data engineering tasks. Skills encode best practices for tasks that are otherwise easy to get wrong: creating new data pipelines, ingesting data safely using Write-Audit-Publish, exploring large datasets, investigating failed runs, and performing root-cause analysis. Each Skill defines the intent, constraints, and expected steps of a workflow while operating on the same underlying Bauplan primitives. Skills are distributed as a Claude Code plugin.
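A Skill being declarative means it can be represented as plain data: intent, constraints, and ordered steps over the same primitives. The schema below is an illustrative guess at that shape (here for Write-Audit-Publish ingestion), not the plugin's actual format:

```python
from dataclasses import dataclass, field

# Hypothetical Skill schema: declarative data an LLM can follow, with each
# step grounded in one of the platform primitives.

@dataclass(frozen=True)
class Skill:
    name: str
    intent: str
    constraints: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)

ingest_wap = Skill(
    name="safe-ingestion",
    intent="Load new files into a table without exposing bad data to main.",
    constraints=[
        "Never write directly to main",
        "Merge only after all audits pass",
    ],
    steps=[
        "branch: create an ingestion branch from main",
        "run: load the files into the branch (Write)",
        "query: run data-quality checks on the branch (Audit)",
        "merge: publish the branch to main atomically (Publish)",
    ],
)

assert ingest_wap.steps[0].startswith("branch")
```

Encoding the workflow as data rather than prose lets the same Skill constrain different models the same way.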