Launching Bauplan MCP Server: the First Step towards the Agentic Lakehouse

Your data lakehouse, built like software

Bauplan is a cloud-native lakehouse platform for engineering teams who treat data like software.
Ship pipelines without managing infrastructure, using a specialized Python runtime, Git-for-Data built on Apache Iceberg, and just a few simple APIs.

The data platform your team would build
…if they had the time

Built for speed and simplicity

Bauplan replaces Spark, Kubernetes, metadata catalogs, and custom platform glue with one cohesive system. Your team just writes Python and SQL. We handle execution, isolation, data versioning, and scale.

Experiment freely and roll back instantly with Git-for-Data

Use data branches to experiment, validate, and test your workloads in isolation. Every change is versioned, auditable, and reversible. No mistake is ever final.

Production ready with no migration

Bauplan runs production-grade workloads without the overhead of traditional platforms. Deploy with single-tenant, PrivateLink, and BYOC options. Your data stays in your object storage; no data movement is needed.
SOC 2 Type 2 compliant, with built-in isolation and access controls.

Built With Bauplan

Examples from the field. Real data applications built with Bauplan.
SEE ALL EXAMPLES

Iceberg Lakehouse and WAP (Prefect, pandas, Iceberg)
An orchestrated Write-Audit-Publish pattern for ingesting Parquet files into Iceberg tables.
Chris White, CTO @Prefect

RAG system with Pinecone and OpenAI (RAG, Pinecone, OpenAI)
Build a RAG system with Pinecone and OpenAI over StackOverflow data.
Ciro, CEO @bauplan

Data Quality and Expectations (PyArrow, pandas, DuckDB)
Implement data quality checks using expectations.
Jacopo, CTO @bauplan

PDF analysis with OpenAI (PDF, OpenAI, pandas)
Analyze PDFs using Bauplan for data preparation and OpenAI’s GPT for text analysis.
Patrick Chia, Founding Engineer

Near Real-time Analytics (DuckDB, Prefect, Streamlit)
Build a near real-time analytics pipeline with the WAP pattern and visualize metrics with Streamlit.
Sam Jafari, Director of Data and AI

dbt-style Pipelines with CI/CD and Version Control (dbt, CI/CD, marts)
dbt workflows vs. Bauplan pipelines, with branching, testing, and CI/CD.
Yuki Kakegawa, Staff Data Engineer

A whole data platform in your code

Like Git for your data systems

Branch your data instantly

Create data branches in seconds. Power sandboxing, write-audit-publish workflows, and safe experimentation at scale.

Safe, declarative data pipelines

Test and run pipelines in isolated branches. Automate validations, merge with confidence, and roll back anytime.

import bauplan

client = bauplan.Client()

# Create a new branch
my_b = client.create_branch(
    branch='import_branch',
    from_ref='main'
)

# Create a new table in the branch
new_table = client.create_table(
    table='your_table_name',
    search_uri='s3://bucket/*.parquet',
    branch=my_b
)

Zero infrastructure, full control

import bauplan

# Define the Python env with package versions in code
@bauplan.model()
@bauplan.python(pip={'pandas': '2.2.0'})
def clean_data(data=bauplan.Model('my_data')):
    import pandas as pd
    df = data.to_pandas()
    df_cleaned = df.dropna()
    return df_cleaned

Pythonic and fully managed

Write modular functions in Python or SQL. Bauplan handles execution, scaling, and table I/O. No configs, containers, or runtime plumbing.

Run everything from your IDE

Declare your infrastructure directly in code. No Dockerfiles, no divergence between dev and prod. What you test is what you ship.

Code-first, designed for automation

A programmable data lakehouse

Write every step of your pipeline as code. Version everything — business logic, data, environments — just like software.

From commit to CI/CD, reproducible by default

Each run is tied to a commit. Validate before merging. Everything is deterministic, traceable, and rollback-ready.

Built for developers…and AI agents

Modular, versioned pipelines as code: isolated, reproducible, and infrastructure-free. Built for developers. Turns out, perfect for agents too.

import bauplan

client = bauplan.Client()

# Create a development branch
_b = client.create_branch('my_b', from_ref='main')

# Run the pipeline on it
client.run('./my_project', ref=_b)

# Inspect recent commits
for commit in client.get_commits(ref=_b):
    print(commit.message)

# Merge changes into main
client.merge_branch(_b, into_branch='main')

Latest From Our Blog

Python, SQL, Iceberg, version control, and superb DevEx.
READ ALL POSTS

FAQs

What if I already have Databricks or Snowflake?

Great! Bauplan is built to be fully interoperable. All tables produced with Bauplan are persisted as Iceberg tables in your S3 bucket, making them accessible to any engine or catalog that supports Iceberg. Our customers use Bauplan together with Databricks, Snowflake, Trino, AWS Athena, AWS Glue, Kafka, SageMaker, and more.
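
For example, because the tables are plain Iceberg, another engine can read them without Bauplan in the loop. Here is a minimal sketch using pyiceberg against an AWS Glue catalog; the catalog configuration and table name are hypothetical:

from pyiceberg.catalog import load_catalog

# Load the catalog that tracks your Iceberg tables
# (assumption: tables are registered in AWS Glue, with AWS credentials in the environment)
catalog = load_catalog('glue', **{'type': 'glue'})

# Read a Bauplan-produced table with any Iceberg-aware client
table = catalog.load_table('my_db.my_bauplan_table')
df = table.scan().to_pandas()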

What does Bauplan replace in my AWS data stack?

Bauplan simplifies your AWS setup by consolidating EMR, Spark, Kubernetes, and Athena into simple serverless functions running on S3 branches. You continue using S3, and optionally Glue or Airflow, while the rest of your stack becomes simpler.
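
As a sketch of what this looks like in practice, an interactive query that might otherwise run on Athena can be issued straight from Python against your lake; the table name is hypothetical, and we assume the client's query method returns an Arrow table:

import bauplan

client = bauplan.Client()

# Run an interactive query against a branch; no cluster to provision
rows = client.query(
    query='SELECT category, COUNT(*) AS n FROM my_table GROUP BY 1',
    ref='main'
)
print(rows.to_pandas().head())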

How do you keep my data secure?

Your data stays in your own S3 bucket at all times. Bauplan processes it securely using either PrivateLink (connecting your S3 to your dedicated single-tenant environment) or entirely within your own VPC with Bring Your Own Cloud (BYOC).

Do I need to learn a new data framework or Domain-Specific Language (DSL)?

No. Bauplan provides a lightweight Python framework, not a DSL. We want Bauplan to fit naturally into well-established engineering workflows: functions, modular code, tests, CI/CD. We believe there are enough custom frameworks, DataFrame APIs, and DSLs; the world does not need another one.

What does Git-for-Data mean?

Bauplan lets you use Git abstractions like branches, commits, and merges to work with your data. You can create data branches on your data lake to isolate changes safely and experiment without affecting production, and use commits to time-travel to previous versions of your data, code, and environments in one line of code. All this while ensuring transactional consistency and integrity across branches, updates, merges, and queries.
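
As an illustration, here is a hedged sketch of time travel with the SDK. It assumes that commit objects returned by get_commits expose a ref that can be queried like a branch; the table name is hypothetical:

import bauplan

client = bauplan.Client()

# Fetch recent commits on main
commits = list(client.get_commits(ref='main'))

# Assumption: a commit's ref can be used anywhere a branch ref is accepted,
# letting you read the table exactly as it was at that commit
previous = commits[1]
old_rows = client.query(
    query='SELECT COUNT(*) AS n FROM my_table',
    ref=previous.ref
)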