GM of AI, Executive Director, PyTorch, Linux Foundation
Matt White is the Executive Director of the PyTorch Foundation and GM of AI at the Linux Foundation. He is also the Director of the Generative AI Commons. Matt has nearly 30 years of experience in applied research and standards in AI and data in telecom, media and gaming industries...
Wednesday May 7, 2025 09:30 - 10:00 CEST Station 5
GM of AI, Executive Director, PyTorch, Linux Foundation
Matt White is the Executive Director of the PyTorch Foundation and GM of AI at the Linux Foundation. He is also the Director of the Generative AI Commons. Matt has nearly 30 years of experience in applied research and standards in AI and data in telecom, media and gaming industries...
Wednesday May 7, 2025 10:30 - 10:50 CEST STATION F, 5 Parv. Alan Turing, 75013 Paris, France
As AI continues to push the boundaries of perception and decision-making, robotics emerges as one of its most exciting and demanding playgrounds. In this talk, we’ll explore how the intersection of machine learning and robotics opens up powerful avenues for interaction, manipulation, and embodied intelligence. We will emphasize the critical role of real-world experimentation and data collection in bridging the gap between simulation and deployment. Interestingly, tasks traditionally viewed as complex, like locomotion, have seen significant progress, while seemingly simple behaviors—such as dexterous manipulation—remain open challenges. By grounding AI systems in physical environments, we gain deeper insight into their capabilities and limitations, and identify new directions for research at the intersection of learning, control, and embodiment.
TorchCodec is a new PyTorch library for decoding video and audio data into tensors, on CPU and on CUDA GPUs. It aims to be fast, easy to install, easy to use, and well integrated into the PyTorch ecosystem. In this talk, we’ll present the various decoding capabilities of TorchCodec, how to sample video frames, and we’ll describe more advanced use-cases like streaming videos from the cloud.
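As a small illustration of the kind of decoding TorchCodec enables, here is a minimal sketch; the file path is illustrative and the API details follow recent releases, so check the library docs for your version.

```python
# Minimal sketch: decode video frames into tensors with TorchCodec.
# The file path is illustrative; pass device="cuda" to decode on GPU.
from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder("my_video.mp4")
print(decoder.metadata)   # duration, frame rate, resolution, ...
frame = decoder[0]        # a uint8 tensor of shape (C, H, W)
```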
Nicolas is a software engineer in the PyTorch team at Meta, where he mainly contributes to the torchvision library. Prior to that, Nicolas was a research scientist at Columbia University, where he became part of the scikit-learn core development team. Nicolas holds a PhD in machine...
Wednesday May 7, 2025 11:10 - 11:30 CEST STATION F, 5 Parv. Alan Turing, 75013 Paris, France
vLLM has become the community-standard engine for low-latency LLM inference, achieving a 10× increase in usage in 2024 and surpassing 100,000 daily installs by January 2025. Supported by hundreds of contributors and productized through Red Hat AI, vLLM provides a vendor-neutral solution for serving cutting-edge models at scale. This talk outlines a practical blueprint for scaling LLM inference using vLLM, integrating both system-level techniques and model-level optimizations.
We begin by addressing the challenges of deploying LLMs with chain-of-thought reasoning in production. Building on vLLM’s engine architecture, we show how multi-accelerator deployments that combine tensor parallelism, paged-attention scheduling, and prefill–decode disaggregation let a single node efficiently drive multiple AI accelerators, improving throughput without compromising latency.
The second optimization layer focuses on quantization. Based on over 500,000 evaluations across language and vision-language models, we examine the accuracy–speed trade-offs of weight and activation quantization. We introduce new pathways that significantly reduce memory usage while maintaining model quality. Attendees will leave with data-driven insights and ready-to-use configurations for deploying state-of-the-art quantized models in scalable enterprise inference pipelines.
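As a rough illustration of the deployment knobs discussed above, the hedged sketch below shows offline inference with vLLM; the model name, parallelism degree, and quantization setting are illustrative, not the talk’s exact configuration.

```python
# Hedged sketch: vLLM inference sharded across 4 accelerators with optional FP8 quantization.
# Model id and settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any supported checkpoint
    tensor_parallel_size=4,                     # shard weights across 4 GPUs
    quantization="fp8",                         # optional weight/activation quantization
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain paged attention in one paragraph."], params)
print(outputs[0].outputs[0].text)
```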
This presentation explores the development of Llama 4, a state-of-the-art foundation model designed to excel across a wide range of tasks. We will discuss its key features, including long-context and multimodal understanding. We will also examine Llama 4's potential uses in agentic settings, such as autonomous decision-making and human-AI collaboration, through real-world examples and case studies.
Christian Keller is a Product Manager at Meta AI leading product for PyTorch. He works on enabling AI at scale for the PyTorch community and billions of Meta AI users. Prior to this, Christian was an entrepreneur with a dual machine learning engineer and business background. He has...
Wednesday May 7, 2025 14:00 - 14:20 CEST STATION F, 5 Parv. Alan Turing, 75013 Paris, France
This presentation looks at effective strategies for using Common Crawl's web archive in large-scale research applications, specifically for AI and other ML applications. We will discuss practical approaches to processing and filtering Common Crawl’s datasets, with a focus on how to overcome computational challenges and optimize data pipelines. We will also discuss some of the challenges that users might encounter related to the multilingual and heterogeneous nature of Common Crawl’s data. The talk will cover best practices for data filtering, pre-processing, and storage, to ensure the quality and relevance of extracted information for research tasks. Additionally, we will briefly discuss the ranking mechanism used to determine whether a URL is crawled, and demonstrate how to use the Web Graph as a framework for further research.
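To make this concrete, the hedged sketch below streams a single WARC file with the warcio library; the file URL is a placeholder (real paths come from each crawl's warc.paths.gz listing), and production pipelines would parallelize this across many files.

```python
# Hedged sketch: stream one Common Crawl WARC file and iterate its response records.
# The URL is a placeholder; real paths are listed in each crawl's warc.paths.gz.
import requests
from warcio.archiveiterator import ArchiveIterator

warc_url = "https://data.commoncrawl.org/crawl-data/CC-MAIN-2024-10/.../example.warc.gz"  # placeholder
with requests.get(warc_url, stream=True) as resp:
    for record in ArchiveIterator(resp.raw):
        if record.rec_type == "response":
            page_url = record.rec_headers.get_header("WARC-Target-URI")
            html = record.content_stream().read()
            # filtering, language identification, and text extraction would go here
```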
Pedro is a senior research scientist at the Common Crawl Foundation. He holds a PhD in computer science and Natural Language Processing from Sorbonne Université. Pedro’s research has mainly focused on how data quality impacts ML models’ performance and how to improve these models...
Wednesday May 7, 2025 14:20 - 14:40 CEST STATION F, 5 Parv. Alan Turing, 75013 Paris, France
Training large language models (LLMs) demands more than just raw compute—it requires infrastructure, strategy, and a deep understanding of parallelism. What begins as a single-GPU prototype must eventually scale across thousands of devices, each step introducing new complexity.
This talk dives into the practicalities of ultra-scale training. We'll explore how 5D parallelism—spanning data, tensor, pipeline, context, and expert dimensions—makes it possible to stretch a single training run across massive GPU clusters. Along the way, we’ll cover performance tuning, communication patterns, and architecture choices that impact throughput and hardware efficiency.
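As a small, hedged illustration of how such parallel groups are composed (not the talk's exact recipe), PyTorch's DeviceMesh can lay out data- and tensor-parallel dimensions for a job launched with torchrun; the sizes below are illustrative for 16 GPUs.

```python
# Conceptual sketch: a 2D mesh combining 4-way data parallelism with 4-way tensor
# parallelism (16 ranks total). Run under torchrun; sizes are illustrative.
from torch.distributed.device_mesh import init_device_mesh

mesh = init_device_mesh("cuda", (4, 4), mesh_dim_names=("dp", "tp"))
dp_group = mesh["dp"].get_group()   # gradients are all-reduced within this group
tp_group = mesh["tp"].get_group()   # sharded matmuls communicate within this group
```

Pipeline, context, and expert dimensions extend the same idea to additional mesh axes.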
A key reference for this session is the Ultra-Scale Playbook, which distills best practices and hard-earned lessons from real-world LLM scaling efforts. We’ll walk through highlights of the playbook, tying them into case studies, benchmarks, and hands-on recommendations.
Scaling isn’t just about size—it’s about doing more with what you have. This session offers a comprehensive look at what it really takes to train state-of-the-art models at scale, designed for engineers, researchers, and practitioners ready to move beyond “it fits on one GPU” toward infrastructure that powers trillion-parameter models—efficiently, and at speed.
Post-training techniques have become essential as demand for Reasoning AI systems explodes. This talk provides a practical overview of how to enhance the reasoning capabilities of open-weight models—using Mistral as a working example. We’ll explore the full pipeline: sourcing high-quality reasoning datasets, selecting the right model checkpoints, and using tools that extend the functionality of PyTorch like NVIDIA NeMo and TensorRT-LLM. Whether you’re working on chatbots, agents, or task-specific models, you’ll leave with a clear understanding of the tools and workflows to take advantage of open models.
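As a minimal, hedged sketch of the starting point for such post-training (not the talk's NeMo/TensorRT-LLM pipeline), one supervised fine-tuning step on an open-weight checkpoint with Hugging Face transformers looks roughly like this; the model id and training text are illustrative.

```python
# Hedged sketch: one supervised fine-tuning step on an open Mistral checkpoint.
# Model id and training text are illustrative; real pipelines add data loading,
# optimizers, and distributed training on top of this.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

batch = tok("Question: 2+2? Reasoning: adding two and two gives 4.", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # causal-LM loss over the sequence
out.loss.backward()
```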
Deep Learning (DL) is driving unprecedented progress across Artificial Intelligence domains, including natural language processing, vision, speech, and multimodal applications. Sustaining this rapid pace of the AI revolution, however, requires practical solutions to the extreme demands that scaling places on the compute, memory, communication, and storage components of modern computing hardware. To address this challenge, we created a deep learning optimization library called DeepSpeed to make distributed model training efficient, effective, and easy on commodity hardware. This talk will focus on DeepSpeed optimizations for improving compute, communication, and I/O of extreme-scale model training.
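A minimal sketch of how a training script adopts DeepSpeed is shown below; the ZeRO stage, precision, and batch-size values are illustrative settings, not prescriptions.

```python
# Hedged sketch: wrapping a model with DeepSpeed. Config values are illustrative.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},   # partition optimizer state and gradients
    "bf16": {"enabled": True},
}
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
# engine.backward(loss) and engine.step() replace loss.backward() / optimizer.step()
```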
Mamba layers are efficient alternatives to standard attention: their training complexity is linear in sequence length, while inference is sequence-length-independent and only requires a small cache. I will discuss a selection of IBM's ongoing work in advancing the state of Mamba training in PyTorch, including: context-parallel training for long-sequence data, Mamba + mixture-of-experts support with expert parallelism, torch-native associative scan ops, and improved DTensor op support.
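For intuition, the recurrence underlying these layers can be written as a plain sequential loop, as in the hedged sketch below; production kernels replace the Python loop with a parallel associative scan, which is what the torch-native scan ops mentioned above target.

```python
# Conceptual sketch of the linear recurrence h_t = a_t * h_{t-1} + b_t * x_t
# behind Mamba-style layers, written as a sequential loop. Real implementations
# fuse and parallelize this with an associative scan kernel.
import torch

def linear_scan(a: torch.Tensor, bx: torch.Tensor) -> torch.Tensor:
    # a, bx: (batch, seq_len, dim); returns all hidden states, same shape
    h = torch.zeros_like(bx[:, 0])
    states = []
    for t in range(bx.shape[1]):
        h = a[:, t] * h + bx[:, t]
        states.append(h)
    return torch.stack(states, dim=1)
```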
Modern GPUs like Hopper and Blackwell are fast, but only after careful optimization. Thunder compiles “education-style” PyTorch models into optimized, distributed PyTorch code. Through a composable plugin system, Thunder lets developers layer in kernel fusion, low-precision operations, memory optimizations, and flexible parallelism strategies, to achieve performance and scale while leaving the original PyTorch code unchanged. This talk will cover how Thunder bridges the gap between ease-of-use and peak performance, and enables teams to easily write custom code transformations to scale models efficiently, reduce GPU waste, and stay in control of their stack.
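In practice the entry point is small, as in the hedged sketch below; the model is a stand-in and the trace-inspection call reflects recent Thunder releases.

```python
# Hedged sketch: compile an unchanged PyTorch module with Lightning Thunder.
import torch
import thunder

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.GELU(), torch.nn.Linear(512, 512)
)
compiled = thunder.jit(model)             # same call interface as the original module
out = compiled(torch.randn(8, 512))       # first call traces and optimizes the model
print(thunder.last_traces(compiled)[-1])  # inspect the final transformed trace
```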
Parsing errors, unexpected outputs. If you've felt the frustration of trying to wrangle LLMs into producing consistently formatted results, you've likely built complex post-processing pipelines and elaborate prompting schemes. What if there were a way to guarantee structured outputs without these workarounds? Enter structured outputs.
In this talk, we'll explore how model outputs can be precisely constrained using formal specifications (e.g. JSON Schema), why this dramatically improves reliability, and how it reduces sensitivity to prompt engineering. We'll demonstrate advanced use cases using our open source library Outlines, which adds structured output support to the `transformers`, `vllm`, and other inference libraries.
By the end of the session, you'll understand how to implement these techniques in your applications today, enabling your models to generate flawless JSON with minimal latency overhead compared to unconstrained generation.
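As a hedged sketch of the approach (the API shown follows the Outlines 0.x releases, and the model id is illustrative), constrained JSON generation against a Pydantic schema looks like this:

```python
# Hedged sketch: constrain generation to a JSON schema with Outlines (0.x-style API).
from pydantic import BaseModel
import outlines

class Flight(BaseModel):
    origin: str
    destination: str
    price_eur: float

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")  # illustrative
generator = outlines.generate.json(model, Flight)
result = generator("Extract the flight: Paris to Berlin for 129 euros.")
print(result)  # a validated Flight instance, guaranteed to parse
```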
Multilingual language models seem to be getting better, but how do we know? In general, language model evaluation is made more uncertain by automatic evaluations which correlate poorly with human ratings, low-quality datasets, and a lack of reproducibility. But for languages other than high-resource languages like English and Mandarin Chinese, these problems are even more consequential. We provide a set of best practices for using existing evaluations. Given the limited number of evaluations for many languages, we highlight languages and tasks that need more benchmarks and outline key considerations for developing new multilingual benchmarks.
The HuggingFace Transformers library is a flagship example of what makes PyTorch special: a dynamic, readable, and hackable framework that scales from quick experiments to production-ready architectures. It began as an implementation of BERT, moved to a "one model, one file" setup—ideal for iteration—and grew into a modular codebase now defining 315+ models. Transformers has become a reference implementation for the field: a source of truth for model architectures, behaviors, and pretraining conventions. Its evolution reflects PyTorch’s own: grounded in Pythonic values, but pragmatic enough to diverge when needed.
PyTorch’s ecosystem has replaced entire toolchains. Scaling models has become simpler: torch.compile brings compiler-level speedups with minimal code changes, and new abstractions like DTensor offer serious performance gains without the low-level complexity.
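For reference, the one-line change mentioned above looks like the sketch below; the module and shapes are stand-ins.

```python
# Minimal illustration of torch.compile; the module and input shapes are illustrative.
import torch

model = torch.nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
compiled = torch.compile(model)           # compiler-level speedups, one-line change
out = compiled(torch.randn(4, 128, 256))  # (batch, seq, d_model)
```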
Both PyTorch and Transformers inherit Python’s spirit—clarity, flexibility, expressiveness—without being bound by it. PyTorch leans on ATen and C++ kernels under the hood; Transformers increasingly relies on optimized community kernels and hardware-aware implementations from the Hub.
Modularity and readability didn’t just improve maintainability—they grew the community. Lowering the barrier to entry encourages experimentation, contributions, and faster innovation. This talk tracks that journey—from how PyTorch enabled Transformers, to how the virtuous cycle of design, performance, and pragmatism continues to shape the tools driving modern AI.