How PyTorch Powers Training & Inference – Efficient Fine-tuning and Inference of Large Language Models

In this talk, we will discuss fine-tuning LLMs and deploying them for local inference. First, we will explain why memory-efficient fine-tuning matters and cover a couple of common architectural and algorithmic techniques that enable fine-tuning on consumer-grade hardware. The second half of the talk will address the challenges of deploying such large models on-device and some of the techniques, such as quantization, that make deployment possible.
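The abstract does not name the specific fine-tuning techniques covered, so the following is only an illustrative sketch of one widely used memory-efficient method, low-rank adaptation (LoRA): the pretrained weights stay frozen and only a small low-rank update is trained, which shrinks gradient and optimizer-state memory. The LoRALinear class and its r and alpha hyperparameters are assumptions for illustration, not code from the talk.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with A (r x in) and B (out x r)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Only lora_a and lora_b receive gradients, so gradient and
        # optimizer-state memory cover a tiny fraction of the weights.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(4096, 4096))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable params: {trainable} / {total}")  # ~0.4% of the layer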
