Using Weighted Sampling to Understand the Prevalence of Spam

To fight spam effectively, we need an unbiased estimate of how much bad content exists in the ecosystem and where it resides. In this presentation we discuss sampling schemes for finding the small percentage of bad content that users view, in both user-generated content and commercially motivated content such as ads and sponsored posts. These methods use ML-derived classifiers to weight the sampling, which increases the volume of bad content in the samples. With more bad content in hand, we can segment it further and measure the prevalence of bad material within particular segments or under particular policies.
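
As a rough illustration of the idea, the sketch below draws a sample with probability proportional to a classifier score and then reweights each labelled item by the inverse of its selection probability so that the prevalence estimate stays unbiased. This is not the presenters' actual method: the abstract does not specify an estimator, so a standard Hansen-Hurwitz estimator for with-replacement sampling is assumed here. The function names, the `floor` parameter, and the synthetic population are all hypothetical.

```python
import random


def weighted_sample(scores, n, floor=0.05, rng=None):
    """Draw n indices with replacement, with probability proportional to
    (score + floor). The floor (an assumed value) keeps every item's
    selection probability above zero, which the estimator needs."""
    rng = rng or random.Random()
    weights = [s + floor for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]  # per-draw selection probabilities
    idx = rng.choices(range(len(scores)), weights=weights, k=n)
    return idx, probs


def estimate_prevalence(idx, probs, labels):
    """Hansen-Hurwitz estimate of the fraction of bad items.

    Each sampled item contributes label / (N * p_i), where N is the
    population size and p_i its per-draw selection probability."""
    population_size = len(probs)
    total = sum(labels[i] / (population_size * probs[i]) for i in idx)
    return total / len(idx)


def demo(seed=0, population_size=20_000, n=2_000):
    """Synthetic check: about 2% of items are bad, and a made-up
    classifier tends to score bad items higher than good ones."""
    rng = random.Random(seed)
    is_bad = [rng.random() < 0.02 for _ in range(population_size)]
    scores = [
        rng.betavariate(5, 2) if bad else rng.betavariate(1, 20)
        for bad in is_bad
    ]
    idx, probs = weighted_sample(scores, n, rng=rng)
    # Only sampled items are labelled; this lookup stands in for human review.
    labels = {i: int(is_bad[i]) for i in idx}
    estimate = estimate_prevalence(idx, probs, labels)
    true_prevalence = sum(is_bad) / population_size
    sample_bad_rate = sum(labels[i] for i in idx) / n
    return true_prevalence, estimate, sample_bad_rate


if __name__ == "__main__":
    true_prevalence, estimate, sample_bad_rate = demo()
    print(f"true prevalence:        {true_prevalence:.4f}")
    print(f"weighted estimate:      {estimate:.4f}")
    print(f"bad fraction in sample: {sample_bad_rate:.4f}")
```

In this sketch the sample contains a much larger share of bad items than the population does, which is what leaves enough bad items to break down by segment or policy, while the inverse-probability weights correct for the oversampling. The floor is a design choice: without it, an item the classifier scores at zero could never be sampled, and any bad content among such items would be invisible to the estimate.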
