Bring your data and code; we do the rest.
Branch. Create sandboxed branches of your data lake to develop pipelines without disrupting your production applications.
Run. Build complex SQL and Python pipelines, without dealing with containers, compute clusters and infrastructure.
Query. Explore your data and power your data applications with complex queries, all on the same runtime.
Merge. Integrate all your data workflows with your orchestration and CI/CD. A sketch of the full branch-run-query-merge loop follows below.
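To make the loop concrete, here is a minimal Python sketch. The client name (`lakehouse_client`), the method names, and the ref syntax are illustrative assumptions, not a documented API:

```python
# Hypothetical client and method names, shown for illustration only.
import lakehouse_client as lh

client = lh.Client()

# Branch: fork the production data lake with zero data copy.
client.create_branch("feature.new_orders", from_ref="main")

# Run: execute the pipeline in this directory against the branch.
client.run(project_dir="./orders_pipeline", ref="feature.new_orders")

# Query: validate results on the branch before touching production.
rows = client.query(
    "SELECT COUNT(*) AS n FROM orders_clean",
    ref="feature.new_orders",
).to_pandas()
print(rows)

# Merge: promote the branch into production once checks pass.
client.merge_branch(source_ref="feature.new_orders", into_branch="main")
```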
Data lake version control
Instant branching of your data lake
Enable teams to develop new pipelines and create new tables, while maintaining data integrity and system performance. Move fast, don’t break things.
Make everything reproducible
Keep track of every change in both your data and your code, and program all your workflows with a few lines of Python: every issue can be reproduced, and every incident can be rolled back.
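As a hedged sketch of what those few lines of Python can look like, assume a client that records every run against an immutable commit of the lake; all names below are hypothetical:

```python
# Hypothetical API: every run is pinned to an immutable commit of the data lake.
import lakehouse_client as lh

client = lh.Client()

run = client.run(project_dir="./orders_pipeline", ref="main")
print(run.ref)  # e.g. "main@a1b2c3": re-run against this ref to reproduce an issue

# Roll back: if the run corrupted a table, point the branch back at the parent commit.
if run.failed:
    client.revert(branch="main", to_ref=run.parent_ref)
```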
Avoid lock-in
Keep your data in object storage and use Iceberg tables for seamless integration with query engines and other systems. Your code is fully abstracted from the infrastructure, eliminating the need for refactoring.
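Because the tables are plain Iceberg on object storage, any Iceberg-aware engine can read them directly. For example, with the open-source pyiceberg library (the catalog endpoint, warehouse path, and table name below are placeholders):

```python
# Read the same tables from outside the platform with pyiceberg.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lake",
    **{
        "uri": "https://catalog.example.com",     # placeholder catalog endpoint
        "warehouse": "s3://my-bucket/warehouse",  # placeholder object storage location
    },
)

table = catalog.load_table("analytics.orders_clean")
df = table.scan().to_pandas()  # no proprietary driver or runtime needed
print(df.head())
```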
Serverless runtime
10x better developer experience
Deploy data pipelines in the cloud in seconds, straight from code. No special skills required, and no need to deal with containerization, compute provisioning, or cluster configuration ever again. Just SQL and Python.
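As an illustration of pipelines-from-code, assume a decorator-based SDK where each Python function is one pipeline step; `pipeline_sdk`, `model`, and `ref` are hypothetical names, not the real API:

```python
# Hypothetical decorator API: each function is one step of the pipeline,
# and the platform handles containers, compute, and scheduling.
import pipeline_sdk as sdk  # hypothetical SDK name


@sdk.model()
def orders_clean(raw=sdk.ref("raw_orders")):
    # Plain pandas: drop cancelled orders and normalize amounts to USD.
    df = raw[raw["status"] != "cancelled"].copy()
    df["amount_usd"] = df["amount"] * df["fx_rate"]
    return df
```

Deploying would then be a single command against a branch, e.g. `pipeline run --ref feature.new_orders` in this hypothetical CLI.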
No environment management
Define containers and environment requirements directly in code for each workload function. Never worry about environment maintenance or backward compatibility again.
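A sketch of what per-function environment declarations could look like, continuing the hypothetical SDK above; the `python` decorator and its parameters are assumptions:

```python
# Hypothetical: pin the interpreter and pip dependencies per function,
# so each step runs in its own managed environment.
import pipeline_sdk as sdk  # hypothetical SDK name


@sdk.model()
@sdk.python(version="3.11", pip={"pandas": "2.2.0", "scikit-learn": "1.4.2"})
def churn_scores(users=sdk.ref("users_features")):
    # scikit-learn is resolved inside this step's isolated environment.
    from sklearn.linear_model import LogisticRegression

    features = users[["logins", "tickets"]]
    model = LogisticRegression().fit(features, users["churned"])
    users["churn_score"] = model.predict_proba(features)[:, 1]
    return users
```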
Interactive SQL analytics
Explore data and build real-time analytics applications. Use one compute engine for both pipelines and synchronous queries.
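For instance, the same hypothetical client used for pipeline runs could serve a synchronous query to power a dashboard endpoint:

```python
# Hypothetical: the same client that runs pipelines also serves interactive queries.
import lakehouse_client as lh

client = lh.Client()

# Synchronous query against production, suitable for a real-time application.
top_customers = client.query(
    """
    SELECT customer_id, SUM(amount_usd) AS revenue
    FROM orders_clean
    GROUP BY customer_id
    ORDER BY revenue DESC
    LIMIT 10
    """,
    ref="main",
).to_pandas()
print(top_customers)
```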