
Write, Audit, Publish.

Ship Data Safely to Production. Every data change tested. Every deployment validated. Zero production incidents.

01

What is Write-Audit-Publish?

Write-Audit-Publish (WAP) is the gold standard for reliable and secure data pipelines:
create isolated data branches, run quality checks automatically, and merge only when validated. All with simple Python commands.

1. Write

Changes are written to an isolated branch of your data lake, not to production

2. Audit

Automated tests are performed to validate data quality and business logic

3. Publish

Only data that passes the tests is merged into the production data lake

Data pipelines fail in mysterious ways

A type mismatch, a missing column, or a bad Parquet file upstream can silently poison dashboards, models, and APIs. By the time anyone notices, it’s 2 AM, and the incident is already burning hours of triage while eroding trust with stakeholders.

Bauplan gives you WAP out of the box: one line to branch, one line to test, and one line to merge. Every change runs in isolation, gets audited automatically, and only lands in production when it’s proven safe.
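
In code, the whole loop is three calls. A minimal sketch (the branch name and project path are illustrative; the examples below show the same flow in context):

import bauplan

client = bauplan.Client()

# Write: isolate changes on a zero-copy branch
client.create_branch(branch="my_feature", from_ref="main")

# Audit: run the pipeline and its expectations on the branch
client.run(project_dir="./my_project", ref="my_feature")

# Publish: merge into production only once the audits pass
client.merge_branch("my_feature", into_branch="main")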

[Infographic: the Write-Audit-Publish pattern in Bauplan]
02

Why teams choose Bauplan for WAP

Zero production incidents

Changes are tested in complete isolation. Bad data never reaches your dashboards, ML models, or downstream consumers. Your data SLAs stay intact.

Deploy 10x faster

No more 2 AM rollbacks or weekend firefighting. Developers work with familiar git-like workflows: branch, test, merge. What took months to implement now takes hours.

Automated data quality

Define data quality tests and expectations using Bauplan's standard expectations library, or write your own custom tests. Every merge automatically validates data quality before changes reach production.

Instant rollbacks

Every change is versioned. Made a mistake? One command rolls back to any previous state. Everything in your data lakehouse now has an undo button.
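
What a rollback might look like, as a hedged sketch: it assumes from_ref accepts a commit reference and that commit objects expose a usable ref attribute, neither of which is shown above, so check the SDK reference for the exact fields:

import bauplan

client = bauplan.Client()

# Walk the commit history on main and pick the last known-good state
# (assumption: commit objects expose a usable ref; field names may differ)
commits = list(client.get_commits(ref="main"))
good_ref = commits[1].ref  # the commit just before the bad change

# Restore that state on a branch and merge it back into main
# (assumption: from_ref accepts a commit reference, not only a branch name)
client.create_branch(branch="rollback", from_ref=good_ref)
client.merge_branch("rollback", into_branch="main")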

03

How Bauplan makes WAP simple

Zero-copy branches in one line

Create unlimited test environments without storage costs.
Each branch is a zero-copy view of your entire data lake and uses no additional storage.

import bauplan


# Initialize a Bauplan client
client = bauplan.Client()

# Name of the branch you want to create
branch = "import_branch"

# create a new isolated data branch
client.create_branch(
    branch=branch,       # new branch name
    from_ref="main"      # create branch from "main"
)

# Confirm success
print(f"✅ '{branch}' created.")

Automated data quality tests

Run quality checks with Bauplan's standard expectations library, custom Python logic, or SQL assertions. Bauplan checks the quality of your data at runtime, before the data asset is created, so you don’t waste time and compute.

import bauplan
from bauplan.standard_expectations import expect_column_no_nulls

@bauplan.expectation()
@bauplan.python("3.11")
def test_nulls(data=bauplan.Model("your_table")):
    # Column that must contain no null values
    col_to_check = "timestamp"

    is_passed = expect_column_no_nulls(data, col_to_check)
    assert is_passed, f"❌ nulls found in {col_to_check}"

    return is_passed

Atomic merges with built-in validation

Merges are transactional and gated by expectations. Either all changes apply with validated quality, or none do. Production stays consistent even if merges fail.

import bauplan

client = bauplan.Client()

# Create a development branch
_b = client.create_branch('my_b', from_ref='main')

# Run the pipeline on it
client.run('./my_project', ref=_b)

# Inspect recent commits
for commit in client.get_commits(ref=_b):
    print(commit.message)

# Merge changes into main
client.merge_branch(_b, into_branch='main')

import bauplan
client = bauplan.Client()
# 1) Create a zero-copy branch
branch = client.create_branch(branch="import_branch", from_ref="main")
try:
    # 2) Create a new Iceberg table and import the data
    client.create_table(table="new_table", search_uri="s3://...", branch=branch)
    client.import_data(table="new_table", search_uri="s3://...", branch=branch)
    
    # 3) Run data quality tests
    run_state = client.run(project_dir="your/pipeline/dir", ref=branch)
    if run_state.job_status.lower() == "failed":
        raise Exception(f"{run_state.job_id} failed: {run_state.job_status}")
    
    # 4) Merge only if the run + expectations passed
    client.merge_branch(branch, into_branch="main")
    print(f"{branch} has been merged into main.")
except bauplan.exceptions.BauplanError as e:
    raise Exception(f"Something went wrong on {branch}: {e}")
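
After a successful merge the work branch is no longer needed; as a short follow-up sketch, assuming the client's delete_branch method (verify the exact name in the SDK reference):

# Clean up the temporary branch once it has been merged
# (assumption: the SDK exposes delete_branch; verify in your version)
client.delete_branch(branch=branch)
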
04

Integrations

Prefect
Orchestra
DBOS
Apache Airflow
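
As an example of how the pattern slots into an orchestrator, here is a minimal sketch of a Prefect flow wrapping the WAP loop (the branch name and project path are illustrative assumptions):

import bauplan
from prefect import flow, task

client = bauplan.Client()

@task
def wap_cycle(branch_name: str):
    # Write: create an isolated zero-copy branch
    client.create_branch(branch=branch_name, from_ref="main")
    # Audit: run the pipeline and its expectations on the branch
    state = client.run(project_dir="./my_project", ref=branch_name)
    # Publish: merge into main only if the run did not fail
    if state.job_status.lower() != "failed":
        client.merge_branch(branch_name, into_branch="main")

@flow
def nightly_wap():
    wap_cycle("nightly_import")

if __name__ == "__main__":
    nightly_wap()
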
05

Built with Bauplan

Iceberg Lakehouse and WAP (Prefect, Pandas, Iceberg)
Orchestrated Write-Audit-Publish pattern for ingesting Parquet files into Iceberg tables.
Chris White, CTO @Prefect

RAG system with Pinecone and OpenAI (RAG, Pinecone, OpenAI)
Build a RAG system with Pinecone and OpenAI over StackOverflow data.
Ciro, CEO @bauplan

Data Quality and Expectations (PyArrow, Pandas, DuckDB)
Implement data quality checks using expectations.
Jacopo, CTO @bauplan

PDF analysis with OpenAI (PDF, OpenAI, Pandas)
Analyze PDFs using Bauplan for data preparation and OpenAI’s GPT for text analysis.
Patrick Chia, Founding Engineer

Near Real-time Analytics (DuckDB, Prefect, Streamlit)
Build a near real-time analytics pipeline with the WAP pattern and visualize metrics with Streamlit.
Sam Jafari, Director of Data and AI

dbt-style Pipelines with CI/CD and Version Control (dbt, CI/CD, marts)
dbt workflows vs. Bauplan pipelines with branching, testing, and CI/CD.
Yuki Kakegawa, Staff Data Engineer