Bauplan is a cloud-native lakehouse platform for engineering teams who treat data like software.
Ship pipelines without managing infrastructure, using a specialized Python runtime, Git-for-Data built on Apache Iceberg, and just a few simple APIs.
Bauplan replaces Spark, Kubernetes, metadata catalogs, and custom platform glue with one cohesive system. Your team just writes Python and SQL. We handle execution, isolation, data versioning, and scale.
Use data branches to experiment, validate, and test your workloads in isolation. Every change is versioned, auditable, and reversible. No mistake is ever final.
Bauplan runs production-grade workloads without the overhead of traditional platforms. Deploy with single-tenant, Private Link, and BYOC options. Your data stays in object storage, with no data movement needed.
SOC 2 Type 2 compliant with built-in isolation and access controls.
Create data branches in seconds. Power sandboxing, write-audit-publish workflows, and safe experimentation at scale.
Test and run pipelines in isolated branches. Automate validations, merge with confidence, and roll back anytime.
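To make the write-audit-publish idea concrete, here is a minimal sketch in plain Python. It does not use Bauplan's API: ToyCatalog, audit, and write_audit_publish are hypothetical names invented for illustration, and the "branches" are in-memory dictionaries rather than Iceberg tables. It only shows the shape of the workflow: write to an isolated branch, validate there, and merge into main only if the checks pass.

```python
from copy import deepcopy


class ToyCatalog:
    """Hypothetical in-memory stand-in for a branch-aware catalog (not Bauplan's API)."""

    def __init__(self):
        # Each branch maps table names to lists of row dicts.
        self.branches = {"main": {}}

    def create_branch(self, name, from_ref="main"):
        # A real lakehouse would copy metadata pointers; this toy copies the data itself.
        self.branches[name] = deepcopy(self.branches[from_ref])

    def write(self, branch, table, rows):
        self.branches[branch][table] = list(rows)

    def merge(self, source, into="main"):
        # Simplified merge: the source branch's tables overwrite the target's.
        self.branches[into].update(deepcopy(self.branches[source]))

    def delete_branch(self, name):
        del self.branches[name]


def audit(rows):
    """Example check: every row has a non-null id and a non-negative amount."""
    return all(r.get("id") is not None and r.get("amount", 0) >= 0 for r in rows)


def write_audit_publish(catalog, table, rows, branch="staging"):
    catalog.create_branch(branch)                 # isolate the change
    catalog.write(branch, table, rows)            # write
    ok = audit(catalog.branches[branch][table])   # audit
    if ok:
        catalog.merge(branch)                     # publish only if the audit passed
    catalog.delete_branch(branch)                 # clean up either way
    return ok


catalog = ToyCatalog()
good = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]
bad = [{"id": None, "amount": 3.0}]

published = write_audit_publish(catalog, "orders", good)
rejected = write_audit_publish(catalog, "orders", bad)
```

In this sketch the bad batch never reaches main: the failed audit means the staging branch is dropped without a merge, so main still holds the last good version of the table.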
Write modular functions in Python or SQL. Bauplan handles execution, scaling, and table I/O. No configs, containers, or runtime plumbing.
Declare your infrastructure directly in code. No Dockerfiles, no divergence between dev and prod. What you test is what you ship.
Write every step of your pipeline as code. Version everything — business logic, data, environments — just like software.
Each run is tied to a commit. Validate before merging. Everything is deterministic, traceable, and rollback-ready.
Modular, versioned pipelines as code: isolated, reproducible, and infrastructure-free. Built for developers. Turns out, perfect for agents too.
Bauplan is built to be fully interoperable. All tables produced with Bauplan are persisted as Iceberg tables in your S3, making them accessible to any engine and catalog that supports Iceberg. Our clients use Bauplan together with Databricks, Snowflake, Trino, AWS Athena, AWS Glue, Kafka, SageMaker, and others.
Bauplan simplifies your AWS setup by replacing EMR, Spark, Kubernetes, and Athena with simple serverless functions running on S3 branches. You continue using S3, and optionally Glue or Airflow, while the rest of your stack becomes simpler.
Your data stays in your own S3 bucket at all times. Bauplan processes it securely using either Private Link (connecting your S3 to your dedicated single-tenant environment) or entirely within your own VPC using Bring Your Own Cloud (BYOC).
No. Bauplan provides a lightweight Python framework, not a DSL. We want Bauplan to fit naturally into well-established engineering workflows: functions, modular code, tests, CI/CD. We believe that there are enough custom frameworks, DataFrame APIs and DSLs. The world does not need another one.
Bauplan lets you use Git abstractions such as branches, commits, and merges to work with your data. You can create data branches on your data lake to isolate changes and experiment without affecting production, and use commits to time-travel to previous versions of your data, code, and environments in one line of code. All of this comes with transactional consistency and integrity across branches, updates, merges, and queries.
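The Git-style model described here can be illustrated with a small, self-contained Python sketch. This is not Bauplan's API or implementation: ToyRepo and its methods are hypothetical, tables are plain lists held in memory, and merging is limited to fast-forwards. The point is only to show how immutable commits plus movable branch pointers give you isolation and time travel.

```python
import hashlib
import json


class ToyRepo:
    """Hypothetical model of commits and branches over table snapshots (not Bauplan's API)."""

    def __init__(self):
        self.commits = {}  # commit id -> {"parent": id or None, "tables": dict}
        self.refs = {}     # branch name -> commit id
        self.refs["main"] = self._commit(None, {})

    def _commit(self, parent, tables):
        # Derive the commit id from its content, so a commit is never modified in place.
        payload = json.dumps({"parent": parent, "tables": tables}, sort_keys=True)
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = {"parent": parent, "tables": tables}
        return commit_id

    def branch(self, name, from_branch="main"):
        # Creating a branch only copies a pointer, not the data.
        self.refs[name] = self.refs[from_branch]

    def write(self, branch, table, rows):
        parent = self.refs[branch]
        tables = dict(self.commits[parent]["tables"])
        tables[table] = rows
        self.refs[branch] = self._commit(parent, tables)
        return self.refs[branch]

    def read(self, ref, table):
        # ref may be a branch name or a commit id (time travel).
        commit_id = self.refs.get(ref, ref)
        return self.commits[commit_id]["tables"].get(table)

    def _is_ancestor(self, ancestor, commit_id):
        while commit_id is not None:
            if commit_id == ancestor:
                return True
            commit_id = self.commits[commit_id]["parent"]
        return False

    def merge(self, source, into="main"):
        # Toy simplification: only fast-forward merges are supported.
        if not self._is_ancestor(self.refs[into], self.refs[source]):
            raise ValueError("toy repo only supports fast-forward merges")
        self.refs[into] = self.refs[source]


repo = ToyRepo()
v1 = repo.write("main", "users", [{"id": 1}])
repo.branch("dev")
v2 = repo.write("dev", "users", [{"id": 1}, {"id": 2}])

main_before = repo.read("main", "users")  # main is untouched by the write on dev
repo.merge("dev")
main_after = repo.read("main", "users")   # main now sees the merged change
old = repo.read(v1, "users")              # reading by commit id returns the earlier version
```

Because every write produces a new commit and never rewrites an old one, rolling back in this model amounts to pointing a branch at an earlier commit id.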