Abstract
Python is among the most widely used and fastest-growing programming languages today, driven largely by AI and ML.
Yet ML infrastructure remains fragmented: engineers struggle with low-level file management, manual versioning, and inefficient data movement. With PyIceberg, we can build a solid foundation for ML engineers, with table abstractions and ACID transactions over object storage.
With time travel, versioning, and zero-copy operations, Iceberg provides the right primitives for dataset iteration, model versioning, and reproducibility in ML workflows. While PyIceberg provides a simple way to work with Iceberg tables in Python, running Python in the cloud with support for large datasets, optimized compute, and minimal DevOps remains a challenge.
In this talk, we’ll show how an optimized Python runtime integrated with Iceberg unlocks seamless pipeline development.
Speaker

Jacopo Tagliabue
Founder and CTO at Bauplan
Throughout his career, Jacopo has been fortunate enough to collaborate with incredible folks in industry and academia (e.g. Netflix, NVIDIA, Stanford, University of Wisconsin-Madison) and to publish contributions in a variety of fields: Information Retrieval (RecSys, SIGIR), Data Science (KDD), Artificial Intelligence and NLP (ICML, NAACL), Data Management (SIGMOD, VLDB), and Computer Systems (Middleware). While building his new company, he teaches ML Systems at NYU, which is mostly notable because it is the only job he has ever had that his parents understand.