Medallion Architecture

Build your medallion architecture in pure Python. No vendor lock-in, no infrastructure management.

Quote from Davide, Senior Data Scientist at Mediaset, explaining how Bauplan’s medallion architecture eliminated outages during traffic spikes and improved system efficiency.
01

What is the Medallion Architecture?

The medallion architecture is the industry-standard pattern for organizing your data in a reliable and scalable way by moving it through three successive layers of validation and transformation in the lakehouse.

Bronze

Raw, immutable data: your source of truth for audits and reprocessing.

Silver

Cleaned, flattened, validated, and standardized data, used to build downstream pipelines.

Gold

Business-ready metrics, aggregates, and ML features optimized for consumption.
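
To make the flow concrete, here is a minimal, self-contained sketch of the three layers expressed as DuckDB SQL inside Python (the table names and data are made up for illustration):

import duckdb

con = duckdb.connect()

# Bronze: raw, immutable landing data, kept exactly as it arrived
con.execute("""
    CREATE TABLE bronze_events AS
    SELECT * FROM (VALUES
        (1, 'click', '2022-03-01'),
        (2, 'view',  NULL)
    ) AS t(user_id, event, event_date)
""")

# Silver: cleaned and validated, with types standardized
con.execute("""
    CREATE TABLE silver_events AS
    SELECT user_id, event, CAST(event_date AS DATE) AS event_date
    FROM bronze_events
    WHERE event_date IS NOT NULL
""")

# Gold: a business-ready aggregate, optimized for consumption
print(con.execute(
    "SELECT event, COUNT(*) AS n FROM silver_events GROUP BY event"
).fetchall())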

This separation of layers delivers clear benefits for you and your team:

Quality: Data quality issues are caught early, before they cascade.

Performance: Heavy transformations run once in Silver, so Gold only serves lightweight, targeted aggregations.

Governance: Complete audit trail with documented transformation rules.

Team velocity: Engineers own Bronze/Silver, analysts and business users work directly with Gold.

Medallion architecture diagram
02

Why Teams Choose Bauplan for their Medallion Architecture

Pure Python, No Heavyweight Stack

Build the entire medallion flow in pure Python and SQL using Pandas, Polars and DuckDB. No Spark, no JVM, no cluster overhead. Bauplan handles execution with Arrow under the hood, so you focus purely on transformations and logic.
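
As a concrete sketch, here is what a single Bauplan model step can look like: a plain Python function that receives the upstream table and returns an Arrow table (the table and column names here are illustrative):

import bauplan


@bauplan.model()
@bauplan.python("3.11", pip={"pandas": "2.2.0"})
def clean_orders(data=bauplan.Model("raw_orders")):
    # Runs as plain Python: no Spark session, no JVM, no cluster to manage
    import pyarrow as pa

    # The input arrives as an Arrow-backed table; work on it with pandas
    df = data.to_pandas()
    # Drop incomplete rows and deduplicate on the business key
    df = df.dropna(subset=["order_id"]).drop_duplicates(subset=["order_id"])
    return pa.Table.from_pandas(df)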

Data Quality Built In

Define data quality tests as code right next to your pipeline steps. Checks like null detection and uniqueness run directly on in-memory tables, failing fast if issues appear. No extra pipelines, no waiting to validate data, no extra compute wasted.
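
As a sketch, a uniqueness check can be written directly against the in-memory table, in the same style as the null check shown later on this page (the table and column names are illustrative, and the input is assumed to behave like a pyarrow Table):

import bauplan


@bauplan.expectation()
@bauplan.python("3.11")
def test_unique_order_ids(data=bauplan.Model("silver_orders")):
    import pyarrow.compute as pc

    # The column is unique iff its distinct count equals the row count
    is_passed = len(pc.unique(data.column("order_id"))) == data.num_rows
    assert is_passed, "❌ duplicate order_id values found"
    return is_passed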

Integrated Versioning and Orchestration

Bauplan unifies your Iceberg lakehouse, data branching, and execution in one simple system. Spin up branches for medallion layers, run DAGs in isolation, and merge back seamlessly: no catalog wrangling, no custom glue code.
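
In practice, the whole branch-run-merge loop is a few client calls (the branch and project names below are illustrative):

import bauplan

client = bauplan.Client()

# Work in isolation: branch off main, run the DAG there, merge back on success
client.create_branch(branch="dev.silver_refactor", from_ref="main")
run_state = client.run(project_dir="./my_pipeline", ref="dev.silver_refactor")
if run_state.job_status.lower() != "failed":
    client.merge_branch(source_ref="dev.silver_refactor", into_branch="main")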

Namespace Separation for Medallion Layers

Bronze, Silver, and Gold datasets live in distinct namespaces, making lineage and promotion clear. Combined with branch-test-merge, this ensures production remains stable and reproducible as data progresses through each stage.
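
Because every table is addressable by a fully qualified, namespaced name, promotion and lineage stay explicit. A hedged query sketch (the table name is illustrative):

import bauplan

client = bauplan.Client()

# Read a Gold table by its namespaced name; results come back as an Arrow table
table = client.query("SELECT * FROM gold.daily_metrics LIMIT 10", ref="main")
print(table.to_pandas())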

03

How Bauplan Makes Medallion Simple

Import data into the bronze layer

Raw data lands in the Bronze layer. Bauplan handles ingestion into Iceberg tables on isolated branches, so you can safely load from S3 without touching production.

import bauplan


def import_data_in_bronze_layer(table_name, import_branch, source_s3):
    client = bauplan.Client()
    # Assumes `import_branch` already exists, e.g. created with
    # client.create_branch(branch=import_branch, from_ref="main")
    try:
        # 1) Create the empty Iceberg table, then import the S3 files into it
        client.create_table(
            table=table_name, 
            search_uri=source_s3, 
            branch=import_branch, 
            namespace="bronze"
        )
        client.import_data(
            table=table_name, 
            search_uri=source_s3, 
            branch=import_branch, 
            namespace="bronze"
        )
    except bauplan.exceptions.BauplanError as e:
        raise Exception(f"🔴 Bronze import failed: {e}")
    
    # 2) Merge into main
    if not client.merge_branch(source_ref=import_branch, into_branch="main"):
        raise Exception("🔴 Merge failed.")
    print(f"✅ Bronze '{import_branch}' merged into main.")

Transform and check for data quality

Silver tables are created with Python models and SQL. Transformations live alongside expectations, so data quality checks run as part of the pipeline. Fail fast, fix early.

import bauplan
from bauplan.standard_expectations import expect_column_no_nulls

@bauplan.model()
@bauplan.python('3.11', pip={'polars': '1.33.1'})
def silver_table(data=bauplan.Model('bronze_table')):
    import polars as pl
    
    from datetime import datetime, timezone

    # Convert input data into Polars and keep only events from 2022 onwards
    # (a plain timezone-aware datetime compares cleanly against a Datetime column)
    df = data.to_polars()
    time_filter_utc = datetime(2022, 1, 1, tzinfo=timezone.utc)
    df = df.filter(pl.col("timestamp") >= time_filter_utc)
   
    return df.to_arrow()


@bauplan.expectation()
@bauplan.python("3.11")
def test_nulls(data=bauplan.Model('bronze_table')):
    col_to_check = "timestamp"    
    # Run expectation: Check for nulls and fail if nulls are found
    is_passed = expect_column_no_nulls(data, col_to_check)
    assert is_passed, f"❌ nulls found in {col_to_check}"
    
    return is_passed

Validate tables and promote to Gold

Validated data moves into the Gold layer. Aggregations, joins, and business-ready tables are promoted only after tests succeed, keeping production consistent and reliable.

import bauplan


def promote_to_gold(
    client: bauplan.Client,
    pipeline_dir: str,
    branch: str,
):
    # 1) Run the pipeline on a separate branch
    run_state = client.run(
        project_dir=pipeline_dir, 
        ref=branch, 
        namespace="gold"
    )
    # 2) Check for failures
    if run_state.job_status.lower() == "failed":
        raise Exception(f"{run_state.job_id} failed: {run_state.job_status}")
        
    # 3) Merge the silver branch into main (publish to gold)
    assert client.merge_branch(source_ref=branch, into_branch="main"), (
        "Something went wrong while merging into main."
    )
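
Putting it together, a hedged usage sketch of the function above (the directory and branch names are illustrative):

import bauplan

client = bauplan.Client()
promote_to_gold(
    client=client,
    pipeline_dir="./gold_pipeline",
    branch="dev.gold_release",
)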
04

Integrations

Prefect
Orchestra
DBOS
Apache Airflow
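
For example, a hedged sketch of driving a Bauplan run from a Prefect flow (the project and branch names are illustrative):

import bauplan
from prefect import flow, task


@task
def run_medallion(branch: str) -> str:
    # Run the Bauplan DAG on an isolated branch and report its status
    client = bauplan.Client()
    run_state = client.run(project_dir="./medallion_pipeline", ref=branch)
    return run_state.job_status


@flow
def nightly_refresh():
    status = run_medallion("dev.nightly_refresh")
    if status.lower() == "failed":
        raise RuntimeError("Bauplan run failed")


if __name__ == "__main__":
    nightly_refresh()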
05

Case Study: Mediaset

Challenge: Europe's second-largest broadcaster serves 65M daily viewers. Breaking news events overloaded their Spark-based medallion architecture, causing dashboard failures during critical moments.
Solution: Rebuilt the medallion architecture with Bauplan:

Bronze

Raw event ingestion from 50+ sources

Silver

Standardized events with business validation

Gold

Real-time metrics and aggregations

Results:

Dashboard response: 45 seconds → 2 seconds

New data source integration: 2 weeks → 2 hours

Infrastructure cost reduction: 85%

System stability during peak events: Achieved

06

Built with Bauplan

Iceberg Lakehouse and WAP (Prefect, Pandas, Iceberg)
Orchestrated Write-Audit-Publish pattern for ingesting parquet files into Iceberg tables.
Chris White, CTO @Prefect

RAG system with Pinecone and OpenAI (RAG, Pinecone, OpenAI)
Build a RAG system with Pinecone and OpenAI over StackOverflow data.
Ciro, CEO @bauplan

Data Quality and Expectations (PyArrow, Pandas, DuckDB)
Implement data quality checks using expectations.
Jacopo, CTO @bauplan

PDF analysis with OpenAI (PDF, OpenAI, Pandas)
Analyze PDFs using Bauplan for data preparation and OpenAI's GPT for text analysis.
Patrick Chia, Founding Eng

Near Real-time Analytics (DuckDB, Prefect, Streamlit)
Build a near real-time analytics pipeline with the WAP pattern and visualize metrics with Streamlit.
Sam Jafari, Dir. Data and AI

dbt-style Pipelines with CI/CD and Version Control (dbt, CI/CD, marts)
dbt workflows vs. Bauplan pipelines with branching, testing, and CI/CD.
Yuki Kakegawa, Staff Data Eng