In this talk, we will discuss fine-tuning and deploying LLMs for local inference. The first half covers why memory-efficient fine-tuning matters and a few common architectural and algorithmic techniques that make fine-tuning feasible on consumer-grade hardware. The second half covers the challenges of running such large models on-device and some of the techniques, such as quantization, that make deployment possible.