SEPTEMBER 28, 2022

The Future of Inference for Complex AI Applications

Inference for single-model applications has, in recent years, become a multi-stage process, combining online feature stores with specialized model-hosting runtimes like ONNX Runtime and Triton. More complex AI applications, such as chatbots, mixed-mode recommenders, and search engines, fold “candidate retrieval” steps into the mix. As these applications grow more sophisticated, they typically require all of: feature hydration, semantic encoding/embedding, candidate selection, and candidate re-ranking (not to mention any pre/post-processing or “format massaging” between these steps). For batch and streaming applications, composable DAG engines and DSLs have been developed that allow pipelines to become arbitrarily deep; in the on-demand/realtime world, however, open-source or open-specification DAG engines are few and far between, even though they are sorely needed.
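
To make the shape of such a pipeline concrete, here is a minimal sketch of wiring those stages into a realtime DAG. Everything in it is illustrative: the InferenceDAG executor and the stage functions (hydrate_features, embed, retrieve_candidates, rerank) are hypothetical stand-ins for a feature-store lookup, an encoder served via something like ONNX Runtime or Triton, an approximate-nearest-neighbor index, and a ranking model.

```python
# A minimal sketch of composing the stages described above into an
# on-demand inference DAG. All stage implementations are hypothetical
# stand-ins; a real system would call out to a feature store, an
# embedding model, an ANN index, and a ranking model.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Node:
    """One stage in the inference DAG: a named function plus its upstream deps."""
    name: str
    fn: Callable[..., object]
    deps: List[str] = field(default_factory=list)


class InferenceDAG:
    """Tiny synchronous DAG executor; real engines add async I/O, batching, etc."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, name: str, fn: Callable[..., object],
            deps: Sequence[str] = ()) -> None:
        self.nodes[name] = Node(name, fn, list(deps))

    def run(self, request: dict) -> Dict[str, object]:
        # Memoized depth-first resolution: each node runs once, after its deps.
        results: Dict[str, object] = {"request": request}

        def resolve(name: str) -> object:
            if name not in results:
                node = self.nodes[name]
                results[name] = node.fn(*[resolve(d) for d in node.deps])
            return results[name]

        for name in self.nodes:
            resolve(name)
        return results


# Hypothetical stage implementations, in the order the post describes.
def hydrate_features(request: dict) -> dict:
    # Would normally query an online feature store keyed by user/item IDs.
    return {"user_id": request["user_id"], "recent_clicks": [101, 205]}

def embed(features: dict) -> list:
    # Would normally invoke an encoder model hosted in a model runtime.
    return [0.12, -0.53, 0.88]

def retrieve_candidates(query_vec: list) -> list:
    # Would normally hit an approximate-nearest-neighbor index.
    return [{"item_id": 7, "score": 0.91}, {"item_id": 3, "score": 0.77}]

def rerank(candidates: list, features: dict) -> list:
    # Would normally score candidates with a heavier ranking model.
    return sorted(candidates, key=lambda c: c["score"], reverse=True)


dag = InferenceDAG()
dag.add("features", hydrate_features, deps=["request"])
dag.add("embedding", embed, deps=["features"])
dag.add("candidates", retrieve_candidates, deps=["embedding"])
dag.add("ranked", rerank, deps=["candidates", "features"])

print(dag.run({"user_id": 42})["ranked"])
```

A production-grade realtime DAG engine would layer asynchronous execution, request batching, timeouts, and per-stage observability on top of this core pattern, which is precisely the kind of open-source/open-specification tooling the paragraph above argues is missing.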
