Bauplan's architecture is organized around three layers:
Control Plane (multi-tenant). Handles metadata, planning, and orchestration; no user data reaches this layer. It parses Python decorators and SQL statements to build a logical DAG of functions and their package dependencies, resolves the Iceberg catalog to determine the exact state of tables, branches, and commits, and compiles an optimized physical execution plan from them.
Data Plane (managed or BYOC). Executes functions securely within isolated, tenant-specific environments. The serverless runtime spins up ephemeral containers for each function invocation, optimized for fast startup and minimal overhead. Data moves between functions through zero-copy Arrow tables and Arrow Flight.
Storage Layer (user-owned). Bauplan works directly with your existing object storage: all data is stored in open formats (Parquet) in your own S3 or S3-compatible buckets. Iceberg tables provide schema evolution, time travel, and transactional updates. An immutable metadata layer tracks every operation, providing versioning, reproducibility, and full auditability.
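The control-plane step of turning decorated functions into a logical DAG can be illustrated with a small, stdlib-only Python sketch. Everything in it is hypothetical: the model decorator, REGISTRY, and build_plan are illustrative names, not the bauplan SDK's API, and a real planner also parses SQL, resolves the catalog, and optimizes the physical plan. The sketch shows only the core idea: decorators record each function's declared inputs and package dependencies without executing it, and a topological sort yields an order in which every function runs after its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical registry: function name -> declared inputs and pip packages.
REGISTRY: dict = {}


def model(inputs=(), pip=None):
    """Hypothetical decorator: records metadata about a function without
    running it, so a planner can inspect the pipeline before execution."""
    def register(fn):
        REGISTRY[fn.__name__] = {
            "fn": fn,
            "inputs": tuple(inputs),
            "pip": dict(pip or {}),
        }
        return fn
    return register


@model(inputs=["raw_trips"], pip={"polars": "1.0.0"})
def clean_trips(raw_trips):
    ...


@model(inputs=["clean_trips"], pip={"pandas": "2.2.0"})
def daily_stats(clean_trips):
    ...


@model(inputs=["clean_trips", "daily_stats"])
def report(clean_trips, daily_stats):
    ...


def build_plan(registry):
    """Return (function name, pip dependencies) pairs in a valid execution order.

    Inputs that are not registered functions (e.g. "raw_trips") are treated
    as source tables, so they add no edges to the DAG."""
    graph = {
        name: {dep for dep in spec["inputs"] if dep in registry}
        for name, spec in registry.items()
    }
    order = TopologicalSorter(graph).static_order()
    return [(name, registry[name]["pip"]) for name in order]


for step, deps in build_plan(REGISTRY):
    print(step, deps)
# clean_trips {'polars': '1.0.0'}
# daily_stats {'pandas': '2.2.0'}
# report {}
```

If the declared inputs form a cycle, TopologicalSorter raises graphlib.CycleError, which is the kind of error a planner can report before anything runs.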
Bauplan offers two deployment models:
Single-tenant Private Link. Bauplan manages a dedicated, SOC 2-compliant cloud environment linked to your object storage. Data stays in your cloud, with no S3 egress costs.
Bring Your Own Cloud (BYOC). The entire Bauplan runtime deploys within your existing VPC, under your control, with no external data traffic.
Bauplan is built on Apache Iceberg for table management and versioning. It uses Apache DataFusion for SQL planning and execution, and Apache Arrow for in-memory data exchange. The compute layer is a custom serverless runtime built on FaaS infrastructure.
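To make the versioning vocabulary concrete (snapshots, branches, commits, time travel), here is a toy, in-memory Python model. It is a conceptual sketch only, assuming a simplified table that is just a set of data file names: it does not follow Iceberg's actual metadata layout (metadata files, manifest lists, manifests), and Table, commit, create_branch, and history are hypothetical names rather than Iceberg or Bauplan APIs. What it does capture is the key property: each commit writes a new immutable snapshot and only moves a branch pointer, so older states stay readable and the parent chain forms an audit trail.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Snapshot:
    """An immutable table state: the live data files plus lineage."""
    snapshot_id: int
    parent_id: Optional[int]
    files: FrozenSet[str]
    operation: str


@dataclass
class Table:
    """Toy versioned table: an append-only snapshot log plus named branch refs."""
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    refs: Dict[str, Optional[int]] = field(default_factory=lambda: {"main": None})

    def commit(self, branch, add=(), remove=(), operation="append"):
        parent_id = self.refs[branch]
        base = self.snapshots[parent_id].files if parent_id is not None else frozenset()
        new_id = len(self.snapshots) + 1  # log is append-only, so ids never collide
        new_files = (base - set(remove)) | set(add)
        self.snapshots[new_id] = Snapshot(new_id, parent_id, new_files, operation)
        self.refs[branch] = new_id  # only the ref moves; old snapshots are untouched
        return new_id

    def create_branch(self, name, from_branch="main"):
        # A branch is just a new pointer to an existing snapshot: no data is copied.
        self.refs[name] = self.refs[from_branch]

    def files(self, branch="main", snapshot_id=None):
        """Read a branch's current state, or time-travel to a given snapshot."""
        sid = snapshot_id if snapshot_id is not None else self.refs[branch]
        return self.snapshots[sid].files if sid is not None else frozenset()

    def history(self, branch="main"):
        """Follow parent pointers from a branch head to build its audit trail."""
        out, sid = [], self.refs[branch]
        while sid is not None:
            snap = self.snapshots[sid]
            out.append((snap.snapshot_id, snap.operation))
            sid = snap.parent_id
        return out


t = Table()
s1 = t.commit("main", add=["a.parquet"])
t.create_branch("dev")
t.commit("dev", add=["b.parquet"])
t.commit("dev", add=["c.parquet"], remove=["a.parquet"], operation="overwrite")
print(sorted(t.files("dev")))             # ['b.parquet', 'c.parquet']
print(sorted(t.files("main")))            # ['a.parquet']
print(sorted(t.files(snapshot_id=s1)))    # ['a.parquet']
print(t.history("dev"))                   # [(3, 'overwrite'), (2, 'append'), (1, 'append')]
```

Because changes on dev never touch main's snapshot, work on a branch is isolated until something moves main's pointer, and any past snapshot id remains a reproducible read.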
Engineers interact with Bauplan through the Python SDK (the bauplan package on PyPI), the CLI, or the MCP server. Pipelines are written in Python and can use any Python library (Polars, pandas, scikit-learn, etc.) through declared dependencies. SQL is supported for exploration and verification. Bauplan supports ingestion from CSV, Parquet, and JSONL files stored in S3.
Bauplan contributes to the open-source projects it is built on (Apache Iceberg, Apache DataFusion, Apache Arrow). The platform itself is a commercial product.