In this talk, we will discuss fine-tuning and deploying LLMs for local inference. The first half covers why memory-efficient fine-tuning matters and a few common architectural and algorithmic techniques that make fine-tuning feasible on consumer-grade hardware. The second half covers the challenges of running such large models on-device and some of the techniques, such as quantization, that make deployment possible.