The lakehouse for AI-native data engineering

Build data pipelines from your repo with AI coding assistants. Bauplan turns data changes into branch-isolated runs and atomic publishes, all exposed as simple APIs that your IDE, CLI, and code reviews can reason about.

Trusted by Mediaset, scops.ai, Moffin, Trust & Will, and Intella.
01

Everything is code, including the data plane.
Your data platform is your repo.

API-first data infrastructure for IDE-first workflows

Define transformations and runtime environments in code. Run from your IDE using the SDK and CLI with the same surface your AI assistant can call.
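
For example, a minimal sketch of querying a table from your IDE with the SDK (the table name and query are illustrative):

import bauplan

client = bauplan.Client()

# Run a SQL query against a branch and get the results as a DataFrame
df = client.query('SELECT COUNT(*) AS n FROM my_table', ref='main').to_pandas()
print(df)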

Safe, fast iteration on production data

Every run happens on an isolated, zero-copy data branch. Publish with atomic data merges, keeping production unchanged and preserving artifacts for inspection and reruns.

Production ready with no migration

Bauplan runs production-grade workloads without the overhead of traditional platforms. Deploy with single-tenant, Private Link, and BYOC options. Your data stays in object storage, with no data movement needed. SOC 2 Type 2 compliant, with built-in isolation and access controls.

02

Integrations

Read more about Bauplan integrations in our docs.
03

Built With Bauplan

Examples from the field. Real data applications built with Bauplan.
SEE ALL EXAMPLES
Prefect · Pandas · Iceberg

Iceberg Lakehouse and WAP

Orchestrated Write-Audit-Publish pattern for ingesting Parquet files into Iceberg tables.

Chris White, CTO @Prefect
RAG · Pinecone · OpenAI

RAG system with Pinecone and OpenAI

Build a RAG system with Pinecone and OpenAI over Stack Overflow data.

Ciro, CEO @bauplan
PyArrow · Pandas · DuckDB

Data Quality and Expectations

Implement data quality checks using expectations.

Jacopo, CTO @bauplan
PDF · OpenAI · Pandas

PDF analysis with OpenAI

Analyze PDFs using Bauplan for data preparation and OpenAI's GPT for text analysis.

Patrick Chia, Founding Eng
DuckDB · Prefect · Streamlit

Near Real-time Analytics

Build a near real-time analytics pipeline with the WAP pattern and visualize metrics with Streamlit.

Sam Jafari, Dir. Data and AI
dbt · CI/CD · marts

dbt-style Pipelines with CI/CD and Version Control

dbt workflows vs. Bauplan pipelines with branching, testing, and CI/CD.

Yuki Kakegawa, Staff Data Eng
04

A whole data platform in your repo

Git-style control for data changes

Branch your data instantly

Create a zero-copy branch instantly. Use branches for development, experimentation, and safe backfills while main stays protected.

Transactional and declarative pipelines

A pipeline run behaves like a database transaction. Outputs merge on success. On failure, production stays unchanged.

Roll back anytime

Revert a bad publish in seconds by rolling back to the last known-good commit, with a full history of what changed (see the sketch after the snippet below).

import bauplan

client = bauplan.Client()

# Create a new branch
my_b = client.create_branch(
    branch='import_branch',
    from_ref='main'
)

# Create a new table in the branch
new_table = client.create_table(
    table='your_table_name',
    search_uri='s3://bucket/*.parquet',
    branch=my_b
)
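
To make the rollback step concrete, a minimal sketch, assuming commits expose a ref you can branch from (the commit selection and attribute names are illustrative; check the SDK docs):

import bauplan

client = bauplan.Client()

# List recent commits on main and pick the last known-good one
# (the selection criterion here is illustrative)
commits = list(client.get_commits(ref='main'))
last_good = commits[1]

# Recreate that state on a recovery branch
# (assumes from_ref accepts a commit ref)
fix_b = client.create_branch('rollback_branch', from_ref=last_good.ref)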

Declarative environments, zero infrastructure

import bauplan

# Define the Python env, with pinned package versions, in code
@bauplan.model()
@bauplan.python(pip={'pandas': '2.2.0'})
def clean_data(data=bauplan.Model('my_data')):
    import pandas as pd

    # Materialize the input table as a DataFrame and drop null rows
    df = data.to_pandas()
    return df.dropna()

Pythonic and fully managed

Write transformations as Python functions and SQL. Bauplan handles execution, scaling, and table I/O. No configs, containers, or runtime plumbing.
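
As an illustration, a minimal sketch of chaining models by name ('clean_data' is the model defined above; the column names are hypothetical):

import bauplan

@bauplan.model()
@bauplan.python(pip={'pandas': '2.2.0'})
def top_customers(data=bauplan.Model('clean_data')):
    import pandas as pd

    # The aggregation and column names are illustrative
    df = data.to_pandas()
    return df.groupby('customer_id', as_index=False)['amount'].sum()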

Run from your IDE or your AI assistant

The workflow is code-addressable end to end: branch, run, validate, publish. You can drive it from your IDE or let an assistant execute it safely.

Code-first, tool-callable by design

A programmable data lakehouse

Write every step of your pipeline as code. Version everything — business logic, data, environments — just like software.

A small, typed API surface

A few predictable primitives (branch, query, run, commit, merge, inspect) give agents a reliable loop for iteration, validation, and publishing.

Safer defaults for AI-assisted work

Branch by default, protect prod. Each write produces immutable references that capture code, inputs, outputs, and environment, so rollbacks and replay are straightforward.

import bauplan

client = bauplan.Client()

# Create a development branch
_b = client.create_branch('my_b', from_ref='main')

# Run the pipeline on it
client.run('./my_project', ref=_b)

# Inspect recent commits
for commit in client.get_commits(ref=_b):
    print(commit.message)

# Merge changes into main
client.merge_branch(_b, into_branch='main')
05

Latest from our blog

AI-first data engineering, Git-for-data semantics, and serverless execution over Iceberg
READ ALL POSTS
06

FAQs

What if I already have Databricks or Snowflake?

Great! Bauplan is built to be fully interoperable. All tables produced with Bauplan are persisted as Iceberg tables in your S3, making them accessible to any engine and catalog that supports Iceberg. Our clients use Bauplan together with Databricks, Snowflake, Trino, AWS Athena and AWS Glue, Kafka, SageMaker, etc.

What does Bauplan replace in my AWS data stack?

Bauplan consolidates pipeline execution and data versioning into one workflow: branch, run, validate, merge. You can keep S3 and your orchestrator; you remove a lot of cluster complexity and glue. For example, an Airflow DAG that spins up an EMR cluster, submits Spark steps, then runs an AWS Glue crawler to refresh the Glue Data Catalog before triggering downstream jobs becomes: Airflow triggers a Bauplan run on an isolated branch that writes Iceberg tables directly to S3.
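
As a hedged sketch of that pattern, using Airflow's TaskFlow API (the DAG id, project path, and branch name are illustrative):

from airflow.decorators import dag, task
import pendulum

@dag(schedule='@daily', start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def bauplan_daily_run():
    @task
    def run_on_branch():
        import bauplan
        client = bauplan.Client()
        # Run the pipeline on an isolated branch, then publish atomically
        b = client.create_branch('airflow_run', from_ref='main')
        client.run('./my_project', ref=b)
        client.merge_branch(b, into_branch='main')
    run_on_branch()

bauplan_daily_run()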

How do you keep my data secure?

Your data stays in your own S3 bucket at all times. Bauplan processes it securely using either Private Link (connecting your S3 to your dedicated single-tenant environment) or entirely within your own VPC using Bring Your Own Cloud (BYOC).

Do I need to learn a new data framework or DSL?

No. Bauplan is just Python (and SQL for queries). That's why your AI assistant can write correct Bauplan code immediately.

What does Git-for-Data mean?

Bauplan lets you use Git abstractions, like branches, commits, and merges, to work with your data. You can create data branches on your data lake to isolate changes safely and enable experimentation without affecting production, and use commits to time-travel to previous versions of your data, code, and environments in one line of code. All this while ensuring transactional consistency and integrity across branches, updates, merges, and queries. Learn more in our docs.
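
For instance, a minimal sketch of time travel with the SDK (the commit selection and ref attribute are assumptions; check the SDK docs for supported ref formats):

import bauplan

client = bauplan.Client()

# Pick an earlier commit on main and query a table as of that ref
commits = list(client.get_commits(ref='main'))
old_ref = commits[1].ref  # illustrative choice of commit

table = client.query('SELECT COUNT(*) AS n FROM my_table', ref=old_ref)
print(table.to_pandas())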