Fireside chat: Rethinking the Semantic Layer | June 16



How to Make Your Data Science Reproducible (and Why You Should Care)

As the Lakehouse architecture becomes more widespread, ensuring the reproducibility of data workloads over data lakes has emerged as a crucial concern for data engineers. However, reproducibility remains hard to achieve. Large data pipelines make testing and iteration slow, while the intertwining of business logic and data management complicates debugging and makes errors more likely. In this paper, we highlight recent advancements made at Bauplan in addressing this challenge. We introduce a system that decouples compute from data management by pairing a cloud runtime with Nessie, an open-source catalog with Git semantics. We demonstrate that the system offers time-travel and branching semantics on top of object storage, as well as full pipeline reproducibility with a few CLI commands.
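To make "Git semantics on top of object storage" concrete, here is a minimal in-memory sketch of the general idea. It assumes a catalog whose commits are immutable maps from table names to snapshot locations, and whose branches are just named pointers to commits. This is not Nessie's API or Bauplan's implementation. The class and method names (`ToyCatalog`, `create_branch`, `commit`, `read`, `merge`) and the `s3://bucket/...` paths are hypothetical placeholders, and merging is limited to fast-forward for brevity.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """An immutable catalog state: table name -> snapshot location (hypothetical layout)."""
    id: str
    parent: Optional[str]
    tables: dict
    message: str


class ToyCatalog:
    """Hypothetical Git-style catalog for illustration only; NOT Nessie's API."""

    def __init__(self) -> None:
        root = Commit(id=uuid.uuid4().hex, parent=None, tables={}, message="init")
        self.commits: dict[str, Commit] = {root.id: root}
        self.branches: dict[str, str] = {"main": root.id}

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # A branch is only a named pointer to a commit, so creating one copies no data.
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        self.branches[name] = self.branches[from_branch]

    def commit(self, branch: str, table: str, snapshot: str, message: str) -> str:
        # Writes never mutate old commits: append a new commit and move the branch head.
        head = self.commits[self.branches[branch]]
        new = Commit(
            id=uuid.uuid4().hex,
            parent=head.id,
            tables={**head.tables, table: snapshot},
            message=message,
        )
        self.commits[new.id] = new
        self.branches[branch] = new.id
        return new.id

    def read(self, table: str, ref: str = "main") -> Optional[str]:
        # `ref` may be a branch name or a commit id; reading an old commit id is time travel.
        commit_id = self.branches.get(ref, ref)
        return self.commits[commit_id].tables.get(table)

    def _ancestors(self, commit_id: str) -> set[str]:
        # Walk parent pointers back to the root commit.
        seen: set[str] = set()
        cur: Optional[str] = commit_id
        while cur is not None:
            seen.add(cur)
            cur = self.commits[cur].parent
        return seen

    def merge(self, source: str, target: str = "main") -> str:
        # Fast-forward only in this sketch: the target head must be an ancestor of the source head.
        src, tgt = self.branches[source], self.branches[target]
        if tgt not in self._ancestors(src):
            raise ValueError("target has diverged; this sketch supports fast-forward merges only")
        self.branches[target] = src
        return src


if __name__ == "__main__":
    cat = ToyCatalog()
    v1 = cat.commit("main", "orders", "s3://bucket/orders/snap-001", "load orders")
    cat.create_branch("dev")  # isolated sandbox for re-running a pipeline
    cat.commit("dev", "orders", "s3://bucket/orders/snap-002", "re-run pipeline")
    print(cat.read("orders", "main"))  # s3://bucket/orders/snap-001
    print(cat.read("orders", "dev"))   # s3://bucket/orders/snap-002
    cat.merge("dev", "main")
    print(cat.read("orders", "main"))  # s3://bucket/orders/snap-002
    print(cat.read("orders", v1))      # s3://bucket/orders/snap-001 (time travel)
```

The design choice this illustrates is that branching and time travel cost only metadata: the data snapshots in object storage are never copied or overwritten. As a result, any past pipeline run can be re-read by pointing at its commit id.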

Ciro Greco, former VP of AI at Coveo. Ph.D. in Linguistics and Cognitive Neuroscience from Milano-Bicocca. Ciro worked as a visiting scholar at MIT and as a postdoctoral fellow at Ghent University. In 2017, he founded Tooso.ai, a San Francisco-based startup specializing in Information Retrieval and Natural Language Processing. Tooso was acquired by Coveo in 2019, and since then Ciro has been helping Coveo with DataOps and MLOps along its turbulent road to IPO. He is currently "building something new" at Bauplan.

