Frequently Asked Questions

March 3, 2026

Is Bauplan an orchestrator like Airflow?

No. Bauplan provides an optimized data runtime and data-management layer, while Airflow is an orchestrator. They solve different problems and sit at different layers of the stack. Airflow handles scheduling, retry logic, and fan-out; Bauplan handles execution, isolation, and transactional publication of data changes. Teams using Bauplan keep their orchestrator for scheduling and workflow coordination. Bauplan integrates with Airflow, Prefect, Dagster, Temporal, and others: you call Bauplan functions and DAGs as orchestrator tasks, and Bauplan's runtime executes them.
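That split can be sketched in a few lines of Python: the orchestrator owns scheduling and retries of a plain callable, and the callable only triggers a Bauplan run and reports its outcome. The `run_bauplan_project` helper below is a hypothetical stand-in for a Bauplan SDK call, not the real client API:

```python
# Illustrative sketch of the division of labor: the orchestrator
# schedules and retries a plain Python callable; Bauplan executes
# the actual pipeline. `run_bauplan_project` is a hypothetical
# stand-in for a Bauplan SDK call, not the real client API.

def run_bauplan_project(project_dir: str, ref: str) -> dict:
    """Stand-in for triggering a Bauplan run; returns a fake run summary."""
    return {"project": project_dir, "ref": ref, "state": "SUCCESS"}

def bauplan_task() -> str:
    """The callable an Airflow/Prefect/Dagster task would wrap.

    Raising on failure hands control back to the orchestrator's retry
    logic, keeping scheduling concerns out of the pipeline itself.
    """
    result = run_bauplan_project(project_dir="./pipelines", ref="main")
    if result["state"] != "SUCCESS":
        raise RuntimeError(f"Bauplan run failed: {result}")
    return result["state"]

print(bauplan_task())
```

The design point is that the orchestrator never touches data: it only sees a callable that succeeds or raises.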

Does Bauplan replace dbt?

It can, but it does not have to. Bauplan sits at a lower level of the stack, closer to Databricks, Snowflake, or BigQuery than to dbt. It provides the execution runtime, the data versioning layer, and the transactional guarantees. dbt compiles SQL templates and relies on a warehouse to execute them; Bauplan runs Python and SQL pipelines on its own serverless compute with branch isolation and atomic publication. Some teams migrate dbt pipelines to Bauplan entirely (including through fully AI-driven migrations). Others use both, running dbt on top of Bauplan as the execution substrate.

Does Bauplan work with my existing data lake?

Yes. Bauplan runs on Apache Iceberg tables stored in your object storage (S3, GCS). It connects to your existing Iceberg catalog or provides its own. Existing tables remain where they are. Every table produced by Bauplan is accessible to any engine that supports Apache Iceberg: Snowflake, Databricks, Trino, Athena, and others.

Does Bauplan work with my existing data warehouse?

Yes. Bauplan integrates with Snowflake, BigQuery, and other Iceberg-capable warehouses through catalog federation. Snowflake connects to Bauplan's Iceberg REST catalog and creates externally managed Iceberg tables that point to your Bauplan-managed data in S3. Snowflake users query the data with standard SQL. No data is copied. Your existing ingestion tools (Fivetran, Airbyte) continue to land data where they always have. Your BI tools (Metabase, Looker) continue to query from the warehouse. Bauplan adds the execution and change-management layer underneath.

Does Bauplan run Spark under the hood?

Bauplan does not run Spark. It uses a serverless execution model: ephemeral containers optimized for large-scale data workloads. PySpark functions are supported for migration convenience, but they execute on a single node rather than a distributed cluster.

How many new tools or frameworks do I need to learn?

Zero. Bauplan uses standard Python and SQL. Pipelines are written in familiar Python with standard libraries. There is no proprietary DSL, no new dataframe API, and no complex framework to learn.

What is the pricing model?

Bauplan has an agent-friendly pricing model. Instead of charging purely per usage, which becomes unpredictable when agents rather than human developers drive compute, Bauplan uses flat monthly pricing based on reserved memory capacity.

Plans range from $700/month (Mini, 10GB memory) through $1,500/month (Small, 25GB) and $2,500/month (Medium, 50GB) to custom Enterprise pricing for 100GB+ workloads.

All plans include unlimited queries, unlimited users, and unlimited agents. There are no per-query fees, no warehouse credits, and no platform surcharges.
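To see why flat pricing matters once agents drive query volume, here is a quick back-of-the-envelope comparison. The flat figure is the Mini plan above; the per-query rate is a hypothetical number chosen only for illustration, not any vendor's actual price:

```python
# Back-of-the-envelope comparison of flat vs. usage-based pricing.
# FLAT_MONTHLY comes from the Mini plan above; the per-query rate
# is a hypothetical figure for illustration, not a real vendor price.

FLAT_MONTHLY = 700.0            # Mini plan: $700/month, 10GB memory
HYPOTHETICAL_PER_QUERY = 0.02   # assumed $/query under usage-based billing

def usage_based_cost(queries_per_day: int, days: int = 30) -> float:
    """Monthly bill if every query were metered at the assumed rate."""
    return queries_per_day * days * HYPOTHETICAL_PER_QUERY

# A small human team vs. a fleet of agents issuing queries:
for qpd in (100, 5_000, 50_000):
    usage = usage_based_cost(qpd)
    print(f"{qpd:>6} queries/day -> usage-based ${usage:>9,.2f} vs flat ${FLAT_MONTHLY:,.2f}")
```

At human-scale volumes metered billing looks cheap; once agents multiply query counts by orders of magnitude, a flat reserved-capacity bill stays constant while a metered one does not.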

Enterprise plans include BYOC deployment and dedicated technical support. All tiers include SOC2 Type II compliance and GDPR compliance. A 14-day free trial is available.

Is Bauplan open source?

Bauplan is built on open-source foundations (Apache Iceberg, Apache DataFusion, Apache Arrow) and contributes to these communities. The platform itself is a commercial product.

What does the name Bauplan mean?

Bauplan is a term from evolutionary biology meaning "structural plan." It refers to the fundamental body plan shared by all members of a group. In data engineering, it represents the idea that safe, repeatable execution should be the fundamental architecture of every data workflow.