Optimizing and scaling LLM inference is crucial for enabling large-scale product applications at reasonable cost. This presentation will introduce the key parallelism techniques that enable larger model sizes and longer context windows, and show how they in turn shape inference system design. We will also discuss the practical challenges of deploying these complex serving paradigms across our internal cloud and data centers of heterogeneous hardware, including the multi-faceted trade-offs required under large-scale, dynamic real-world workloads.