Inference for single-model applications has, in recent years, become a multi-stage process, combining online Feature Stores with specialized model-hosting runtimes like ONNX Runtime and Triton. More complex AI applications, such as chatbots, mixed-mode recommenders, and search engines, fold "candidate retrieval" steps into the mix. As these applications grow more sophisticated, they typically require all of feature hydration, semantic encoding/embedding, candidate selection, and candidate re-ranking (not to mention pre/post-processing or "format massaging" between any of these steps). While composable DAG engines and DSLs have been developed for batch and streaming applications, allowing those pipelines to become arbitrarily deep, in the on-demand/realtime world, open-source and open-specification DAG engines are few and far between, even though they are sorely needed.
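To make the shape of such a realtime inference DAG concrete, here is a minimal Python sketch of the coordination an on-demand DAG engine would provide. The scheduler and the four stage functions (`hydrate_features`, `encode_query`, `retrieve_candidates`, `rerank`) are hypothetical stand-ins rather than any particular engine's API; in a real deployment each stage would wrap a network call to a feature store, an embedding model, an ANN index, or a ranking model.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable

# A stage receives a context dict (the request plus its upstream outputs)
# and asynchronously produces one output.
StageFn = Callable[[dict], Awaitable[object]]

@dataclass
class Node:
    name: str
    fn: StageFn
    deps: list[str] = field(default_factory=list)

async def run_dag(nodes: list[Node], request: dict) -> dict:
    """Run every node as soon as all of its dependencies have finished,
    so stages with no edge between them execute concurrently."""
    loop = asyncio.get_running_loop()
    results: dict[str, asyncio.Future] = {n.name: loop.create_future() for n in nodes}

    async def run_node(node: Node) -> None:
        # Wait for every upstream stage, then compute this stage's output.
        upstream = {dep: await results[dep] for dep in node.deps}
        results[node.name].set_result(await node.fn({"request": request, **upstream}))

    # Launch all nodes at once; awaiting upstream futures enforces ordering.
    await asyncio.gather(*(run_node(n) for n in nodes))
    return {name: fut.result() for name, fut in results.items()}

# Hypothetical stage implementations -- stand-ins for calls to an online
# feature store, an embedding model, an ANN index, and a ranking model.
async def hydrate_features(ctx): return {"user_age_bucket": 3}
async def encode_query(ctx): return [0.1, 0.9, -0.2]
async def retrieve_candidates(ctx): return ["item_c", "item_a", "item_b"]
async def rerank(ctx): return sorted(ctx["retrieve"])

dag = [
    Node("hydrate", hydrate_features),
    Node("encode", encode_query),
    Node("retrieve", retrieve_candidates, deps=["hydrate", "encode"]),
    Node("rerank", rerank, deps=["retrieve"]),
]

print(asyncio.run(run_dag(dag, {"user_id": 42, "query": "running shoes"})))
```

The payoff of the DAG formulation shows up in the first two stages: feature hydration and query encoding share no edge, so their calls overlap rather than running back to back, which is the per-request latency win a purpose-built realtime DAG engine generalizes to arbitrarily deep pipelines.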