
Build data pipelines from your repo with AI coding assistants. Bauplan turns data changes into branch-isolated runs and atomic publishes, all exposed as simple APIs that your IDE, CLI, and code reviews can reason about.

Define transformations and runtime environments in code. Run from your IDE using the SDK and CLI, the same surface your AI assistant can call.
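For example, a transformation and its runtime environment live side by side in code. A minimal sketch using the Python SDK's decorators; the table name, package pin, and cleaning logic are illustrative:

```python
import bauplan


@bauplan.model(materialization_strategy='REPLACE')
@bauplan.python('3.11', pip={'pandas': '2.2.0'})  # interpreter and packages declared in code
def clean_orders(
    # the parent table, injected by Bauplan as an Arrow table; 'orders' is a placeholder
    orders=bauplan.Model('orders'),
):
    # the declared pandas version is available inside this function's environment
    df = orders.to_pandas()
    df = df.dropna(subset=['order_id'])  # illustrative cleaning step
    return df  # the returned frame is written back as a table
```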

Every run happens on an isolated, zero-copy data branch. Publish with atomic data merges, keeping production unchanged and preserving artifacts for inspection and reruns.
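In code, the whole publish path is three calls. A sketch using the Python SDK's Client; the branch name and project directory are placeholders:

```python
import bauplan

client = bauplan.Client()

# work happens on a zero-copy branch; main is never touched
client.create_branch('myname.backfill_2024', from_ref='main')

# execute the pipeline in this project directory against the branch
client.run(project_dir='.', ref='myname.backfill_2024')

# atomic publish: merge the branch back into main
client.merge_branch(source_ref='myname.backfill_2024', into_branch='main')
```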

Bauplan runs production-grade workloads without the overhead of traditional platforms. Deploy with single-tenant, PrivateLink, and BYOC options. Your data stays in object storage; no data movement is needed. SOC 2 Type 2 compliant, with built-in isolation and access controls.




Create a zero-copy branch instantly. Use branches for development, experimentation, and safe backfills while main stays protected.
A pipeline run behaves like a database transaction. Outputs merge on success. On failure, production stays unchanged.
Revert a bad publish in seconds by rolling back to the last known-good commit, with full history of what changed.
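Reverting follows the same ref-based model. A sketch, assuming the SDK's branch@commit ref syntax and a revert_table helper as in recent versions; the table name and commit hash are placeholders:

```python
import bauplan

client = bauplan.Client()

# inspect the table as it was at a known-good commit
good = client.query(
    'SELECT count(*) AS n FROM orders',
    ref='main@a1b2c3d4',  # placeholder commit hash
)

# point production back at that state in one call
client.revert_table(
    table='orders',
    source_ref='main@a1b2c3d4',
    into_branch='main',
    replace=True,
)
```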

Write transformations as Python functions and SQL. Bauplan handles execution, scaling, and table I/O. No configs, containers, or runtime plumbing.
The workflow is code-addressable end to end: branch, run, validate, publish. You can drive it from your IDE or let an assistant execute it safely.
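Validation fits the same model: expectations are decorated functions that run with the pipeline and block the publish on failure. A sketch, assuming the SDK's expectation decorator and standard expectations module; the model and column names are placeholders:

```python
import bauplan
from bauplan.standard_expectations import expect_column_no_nulls


@bauplan.expectation()
@bauplan.python('3.11')
def test_orders_have_ids(
    data=bauplan.Model('clean_orders'),  # placeholder model name
):
    # fail the run, and therefore the publish, if any order_id is null
    return expect_column_no_nulls(data, 'order_id')
```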

Write every step of your pipeline as code. Version everything — business logic, data, environments — just like software.

A few predictable primitives (branch, query, run, commit, merge, inspect) give agents a reliable loop for iteration, validation, and publishing.
Branch by default, protect prod. Each write produces immutable references that capture code, inputs, outputs, and environment, so rollbacks and replay are straightforward.
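Put together, the loop an agent runs is a handful of lines. A sketch with the Python SDK; the branch name and the validation check are illustrative:

```python
import bauplan

client = bauplan.Client()
branch = 'agent.fix_orders'

client.create_branch(branch, from_ref='main')  # branch
client.run(project_dir='.', ref=branch)        # run

# inspect results on the branch before production sees anything
n = client.query(
    'SELECT count(*) AS n FROM clean_orders', ref=branch
).to_pandas()['n'][0]

if n > 0:
    client.merge_branch(source_ref=branch, into_branch='main')  # publish
else:
    client.delete_branch(branch)  # discard; main is unchanged
```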



Great! Bauplan is built to be fully interoperable. All tables produced with Bauplan are persisted as Iceberg tables in your S3, making them accessible to any engine or catalog that supports Iceberg. Our clients use Bauplan alongside Databricks, Snowflake, Trino, AWS Athena, AWS Glue, Kafka, SageMaker, and more.

Bauplan consolidates pipeline execution and data versioning into one workflow: branch, run, validate, merge. You can keep S3 and your orchestrator; you remove a lot of cluster complexity and glue. For example, an Airflow DAG that spins up an EMR cluster, submits Spark steps, then runs an AWS Glue crawler to refresh the Glue Data Catalog before triggering downstream jobs becomes: Airflow triggers a Bauplan run on an isolated branch that writes Iceberg tables directly to S3.
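Concretely, the DAG can shrink to a single task that drives Bauplan. A sketch using Airflow's TaskFlow API; the DAG id, branch name, and project path are placeholders:

```python
from datetime import datetime

import bauplan
from airflow.decorators import dag, task


@dag(schedule='@daily', start_date=datetime(2024, 1, 1), catchup=False)
def orders_pipeline():
    @task
    def run_bauplan():
        client = bauplan.Client()
        branch = 'airflow.orders_daily'
        client.create_branch(branch, from_ref='main')
        # the run writes Iceberg tables directly to S3 on the isolated branch;
        # no EMR cluster to spin up, no Glue crawler to refresh the catalog
        client.run(project_dir='/opt/pipelines/orders', ref=branch)
        client.merge_branch(source_ref=branch, into_branch='main')
        client.delete_branch(branch)  # clean up so the next daily run can recreate it

    run_bauplan()


orders_pipeline()
```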

Your data stays in your own S3 bucket at all times. Bauplan processes it securely using either AWS PrivateLink (connecting your S3 to your dedicated single-tenant environment) or entirely within your own VPC with Bring Your Own Cloud (BYOC).

No. Bauplan is just Python (and SQL for queries). That's why your AI assistant can immediately write Bauplan code with no trouble.

Bauplan allows you to use Git abstractions like branches, commits, and merges to work with your data. You can create data branches on your data lake to isolate changes safely and enable experimentation without affecting production, and use commits to time-travel to previous versions of your data, code, and environments in one line of code. All this while ensuring transactional consistency and integrity across branches, updates, merges, and queries.