PyTorch Day France
7 May 2025 | Paris, France


vLLM has become the community-standard engine for low-latency LLM inference, achieving a 10× increase in usage in 2024 and surpassing 100,000 daily installs by January 2025. Supported by hundreds of contributors and productized through Red Hat AI, vLLM provides a vendor-neutral solution for serving cutting-edge models at scale. This talk outlines a practical blueprint for scaling LLM inference using vLLM, integrating both system-level techniques and model-level optimizations.

We begin by addressing the challenges of deploying LLMs with chain-of-thought reasoning in production. Leveraging vLLM’s engine architecture, we show how multi-accelerator deployments that combine tensor parallelism, paged-attention scheduling, and prefill–decode disaggregation allow a single node to drive multiple AI accelerators efficiently, increasing throughput without compromising latency.
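
As a rough illustration of the multi-accelerator setup described above, the sketch below uses vLLM’s offline Python API to shard a model across several GPUs with tensor parallelism. The model name and parallelism degree are placeholders for illustration, not the configuration discussed in the talk.

```python
# Minimal sketch: tensor-parallel inference with vLLM across multiple accelerators.
# Model name and parallelism degree are placeholders, not the talk's configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder checkpoint
    tensor_parallel_size=4,        # shard the model across 4 accelerators
    gpu_memory_utilization=0.90,   # leave headroom for the paged KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain paged attention in one paragraph."], params)
print(outputs[0].outputs[0].text)
```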

The second optimization layer focuses on quantization. Based on over 500,000 evaluations across language and vision-language models, we examine the accuracy–speed trade-offs of weight and activation quantization. We introduce new pathways that significantly reduce memory usage while maintaining model quality. Attendees will leave with data-driven insights and ready-to-use configurations for deploying state-of-the-art quantized models in scalable enterprise inference pipelines.
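
As a hedged example of the kind of ready-to-use configuration mentioned above, the sketch below loads a model through vLLM’s built-in FP8 quantization path. The checkpoint name, quantization scheme, and parallelism degree are illustrative assumptions, not the specific configurations evaluated by the speakers.

```python
# Minimal sketch: serving a quantized model with vLLM.
# FP8 on-the-fly quantization requires supported hardware; names below are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder checkpoint
    quantization="fp8",            # weight/activation quantization scheme (assumption)
    tensor_parallel_size=2,        # placeholder parallelism degree
)

outputs = llm.generate(
    ["Summarize the trade-offs of FP8 quantization."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```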
Wednesday May 7, 2025 11:30 - 11:50 CEST
STATION F 5 Parv. Alan Turing, 75013 Paris, France