In the past couple of years, the landscape of image and video generation has transformed dramatically. Despite this phenomenal progress, rigorous and holistic evaluation of generative models continues to lag behind. This is primarily due to the multifaceted and highly subjective nature of the task: a generated image or video should be evaluated not only on overall visual quality and aesthetics, but also on its alignment with the input prompt, its originality, its freedom from stereotypical biases, and several other factors.
In this talk, I’ll give an overview of current metrics, their shortcomings, and the research community's rapid progress toward more rigorous evaluation.