In this workshop, we discuss our “Post-Modern Data Stack”, that is, a deconstruction of the MLOps stack we previously shared with the community. In particular, we join the “modern data stack” (Snowflake + dbt) with “modern MLOps” practices, using Metaflow to bridge the gap between data, training, and inference in a purely serverless fashion.
As usual, we refuse to work with a toy stack or toy data: leveraging our huge data release from last year, we walk through a real-world recommendation pipeline, going from raw data to a live endpoint serving predictions.
// Bio
Educated in several acronyms across the globe (UNISR, SFI, MIT), Jacopo Tagliabue was co-founder of Tooso, an A.I. company acquired by Coveo in 2019. Jacopo is currently the Director of A.I. at Coveo, shipping models to hundreds of customers and millions of users.
When not busy building products, Jacopo teaches MLSys at NYU and explores topics at the intersection of language, reasoning, and learning (with research work presented at NAACL, RecSys, ACL, SIGIR). In previous lives, he managed to get a Ph.D., do sciency things for a pro basketball team, and simulate a pre-Columbian civilization.
00:00 - Introduction to Jacopo Tagliabue
05:04 - You're not Google, and that's OK!
07:52 - Glamorizing ML
09:05 - Glamorizing = misinterpreting
10:46 - Anyone can do great ML!
12:53 - The Principles for ML at RS
14:07 - You don't need a bigger boat
19:47 - Pre-requisites
21:05 - Snowflake
24:57 - dbt
32:50 - Inline SQL as opposed to within dbt
34:05 - Turn into a macro
38:37 - Artifacts in remote environment
39:38 - End step to tear down
47:57 - If Metaflow fails
55:53 - Integrations per specific products
57:42 - Self annotation metaphor to serialize and store
1:03:53 - Wrap up