Run AI-assisted data work
safely in production

Use AI coding assistants from your repo and IDE to write, inspect, and debug real data pipelines. Bauplan turns those changes into isolated, transactional runs with built-in rollback.

Trusted by Mediaset, Scops.ai, Moffin, Trust & Will, and Intella.
01

AI agents can iterate on code, but not on your data

Code is local and reversible. Data pipelines are not.

Pipelines mutate shared state and failures leave production inconsistent. Traditional data platforms assume slow, manual change.

Bauplan is the execution layer built for fast, AI-generated iteration in production.

02

The missing execution layer that lets AI
work safely on production data

Work with data the same way you work with code

Everything in Bauplan is code, versioned in your repository and executed from your IDE. AI-generated changes run exactly as written, with no hidden state or manual steps.

Bring your AI coding assistant: we provide the safe execution layer.

Works with Copilot, Windsurf, Gemini Code Assist, OpenAI Codex, and Cursor.

Git-style safety for AI agents on production data.

Let AI agents work directly on production data without risk. Runs are isolated, publishes are atomic, and failed or bad changes can be rolled back immediately. Tests and expectations can gate publication before anything reaches your production tables.
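The safety loop described above can be sketched with the SDK calls shown later on this page (`create_branch`, `run`, `merge_branch`). This is an illustrative sketch, not the canonical implementation: the branch and project names are made up, and `delete_branch` is assumed as the rollback primitive.

```python
def should_publish(job_status: str) -> bool:
    """Gate publication on run status; a failed run or expectation blocks the merge."""
    return job_status.lower() == "success"

def run_safely(client, branch="fritz.safety_demo", project_dir="./my_project"):
    """Branch -> run in isolation -> publish atomically or roll back.

    `client` is a bauplan.Client(); nothing touches "main" until the merge.
    """
    client.create_branch(branch=branch, from_ref="main")
    state = client.run(project_dir=project_dir, ref=branch)
    if should_publish(state.job_status):
        # Atomic publish: readers of "main" see all new tables or none.
        client.merge_branch(source_ref=branch, into_branch="main")
        return True
    # Rollback is just discarding the branch (delete_branch assumed here).
    client.delete_branch(branch=branch)
    return False
```

Because the client is passed in, the decision logic can be exercised without touching any real data.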

03

Use cases

Automate data engineering workflows with simple agent skills,
from assessing the feasibility of a request to building and maintaining production pipelines at scale.

Data transformation pipelines

Build and iterate on data pipelines the same way you build software. Agents and engineers write transformations in code, run them against real data in isolation, inspect results, and iterate quickly before publishing a correct version to production.

Safe data ingestion

Ingest new data without breaking downstream systems. Data is written to an isolated branch, validated with quality checks, and published atomically only if it passes, so bad files or schema issues never reach production tables.
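One way to picture that gate as code, assuming a write-audit-publish flow on a branch. The quality checks below are illustrative stand-ins for Bauplan expectations; the branch and merge calls mirror the SDK examples on this page, and `delete_branch` is assumed.

```python
def batch_passes_checks(rows: list[dict]) -> bool:
    """Illustrative quality gate: non-empty batch, no null ids, no duplicate ids."""
    if not rows:
        return False
    ids = [r.get("id") for r in rows]
    return all(i is not None for i in ids) and len(ids) == len(set(ids))

def ingest_safely(client, rows, branch="fritz.ingest"):
    """Write to an isolated branch; merge atomically only if the batch passes."""
    client.create_branch(branch=branch, from_ref="main")
    # ... write `rows` to a table on `branch` here ...
    if batch_passes_checks(rows):
        client.merge_branch(source_ref=branch, into_branch="main")
        return True
    client.delete_branch(branch=branch)  # bad data never reaches production tables
    return False
```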

Debug & fix pipelines

Use your AI agent to diagnose import and pipeline issues, replay failed runs against the exact state that produced them, and propose a fix in a separate data branch. Review and merge when ready. Production stays untouched until you say so.

Data exploration and discovery

Inspect schemas, sample rows, and run profiling queries on isolated branches, so engineers and AI agents can understand what is actually in the data before making or proposing changes.

04

Integrations

Integrate with your storage, warehouses, and developer workflow.
Read more about Bauplan integrations in our docs.

05

A whole data platform in your repo

Branch, inspect and merge data like code

Bauplan models the state of your data as branches and commits. Create branches, run changes, inspect history, and merge only when tests pass.

import bauplan
import pandas as pd

client = bauplan.Client()

dev_branch = client.create_branch(
    branch="fritz.dev",
    from_ref="main",
)

tables = client.get_tables(ref=dev_branch)
print(tables)

preview = client.query("SELECT * FROM my_table LIMIT 5", ref=dev_branch)
print(preview.to_pandas())

client.merge_branch(source_ref=dev_branch, into_branch="main")

import bauplan
from bauplan.standard_expectations import expect_column_no_nulls

@bauplan.model()
@bauplan.python(pip={"pandas": "2.2.0"})
def clean_data(data=bauplan.Model("my_data")):
    df = data.to_pandas()
    return df.dropna()

@bauplan.expectation()
@bauplan.python("3.10")
def test_clean_data(data=bauplan.Model("clean_data", columns=["id"])):
    return expect_column_no_nulls(data, "id")

Native Python execution. No infrastructure to manage.

Pipelines are ordinary Python and SQL functions. Declare environments and quality checks in code. Execution is managed by the platform.

One control loop for humans and agents

A few predictable primitives for developers and AI agents. Every workflow follows the same loop: branch → run → inspect → merge.

import bauplan

client = bauplan.Client()
dev_branch = "fritz.dev_branch"

client.create_branch(branch=dev_branch, from_ref="main")

state = client.run(project_dir="./my_project", ref=dev_branch)

if state.job_status.lower() != "success":
    raise Exception(f"{state.job_id} failed: {state.job_status}")

client.merge_branch(source_ref=dev_branch, into_branch="main")
06

FAQs

What if I already have Databricks or Snowflake?

Great! Bauplan is built to be fully interoperable. All tables produced with Bauplan are persisted as Iceberg tables in your S3, making them accessible to any engine or catalog that supports Iceberg. Our clients use Bauplan together with Databricks, Snowflake, Trino, AWS Athena, AWS Glue, Kafka, SageMaker, and more.

What does Bauplan replace in my AWS data stack?

Bauplan consolidates pipeline execution and data versioning into one workflow: branch, run, validate, merge. You can keep S3 and your orchestrator; you remove a lot of cluster complexity and glue. For example, an Airflow DAG that spins up an EMR cluster, submits Spark steps, then runs an AWS Glue crawler to refresh the Glue Data Catalog before triggering downstream jobs becomes: Airflow triggers a Bauplan run on an isolated branch that writes Iceberg tables directly to S3.
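Under that setup, the Airflow task reduces to a thin wrapper around the branch → run → merge loop shown elsewhere on this page. The sketch below is what a `PythonOperator` would invoke; the DAG wiring, branch name, and project directory are illustrative.

```python
def run_failed(job_status: str) -> bool:
    """True when a Bauplan run did not complete successfully."""
    return job_status.lower() != "success"

def run_bauplan_on_branch(client, branch: str, project_dir: str = "./my_project") -> str:
    """Airflow-callable sketch: branch, run in isolation, merge on success.

    `client` is a bauplan.Client(); the run writes Iceberg tables on the branch.
    """
    client.create_branch(branch=branch, from_ref="main")
    state = client.run(project_dir=project_dir, ref=branch)
    if run_failed(state.job_status):
        # Raising fails the Airflow task; "main" is untouched.
        raise RuntimeError(f"{state.job_id} failed: {state.job_status}")
    client.merge_branch(source_ref=branch, into_branch="main")
    return state.job_id
```

Downstream tasks can then read the merged Iceberg tables directly from S3, with no EMR cluster or Glue crawler in the path.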

How do you keep my data secure?

Your data stays in your own S3 bucket at all times. Bauplan processes it securely using either Private Link (connecting your S3 to your dedicated single-tenant environment) or entirely within your own VPC using Bring Your Own Cloud (BYOC).

Do I need to learn a new data framework or DSL?

No. Bauplan is just Python (and SQL for queries). That's why your AI assistant can write correct Bauplan code out of the box.

What does Git-for-Data mean?

Bauplan lets you use Git abstractions like branches, commits, and merges to work with your data. You can create data branches on your data lake to isolate changes and experiment safely without affecting production, and use commits to time-travel to previous versions of your data, code, and environments in one line of code. All this while ensuring transactional consistency and integrity across branches, updates, merges, and queries.