Spam Fighting 2016

Using Weighted Sampling to Understand the Prevalence of Spam

To fight spam effectively, we need an unbiased estimate of how much bad content exists in the ecosystem and where it resides. In this presentation we discuss sampling schemes for identifying the small percentage of viewed content that is bad, across both user-generated content and commercially motivated content such as ads and sponsored posts. These methods use ML-derived classifiers to weight the sampling, which increases the volume of bad content in the samples. With more bad content in hand, we can segment it further and measure the prevalence of bad material within particular segments, or as identified by particular policies.
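The abstract does not say which estimator the speakers use, so the following is only a minimal sketch of one standard way to combine classifier-weighted sampling with an unbiased prevalence estimate. It assumes sampling with replacement, with probability proportional to the classifier score, and corrects for the oversampling with inverse-probability weights (the Hansen-Hurwitz estimator). The function names, the `floor` parameter, and the synthetic population are all hypothetical; in practice the labels would come from human review of the sampled items.

```python
import random

def weighted_sample(scores, k, rng, floor=0.01):
    """Draw k indices with replacement, with probability proportional to
    max(score, floor). The floor (an assumption of this sketch) keeps every
    item's selection probability above zero, which the estimator requires."""
    weights = [max(s, floor) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]  # per-draw selection probability
    picks = rng.choices(range(len(scores)), weights=weights, k=k)
    return picks, probs

def estimate_prevalence(picks, probs, labels, n_items):
    """Hansen-Hurwitz estimate of the fraction of bad items: each reviewed
    item is weighted by the inverse of its expected share of the draws."""
    return sum(labels[i] / (n_items * probs[i]) for i in picks) / len(picks)

# Synthetic population (hypothetical): 1% of items are bad, and the
# classifier misses one bad item in five by giving it a low score.
N = 20000
labels = [1 if i % 100 == 0 else 0 for i in range(N)]
scores = [0.9 if (labels[i] and i % 500 != 0) else 0.05 for i in range(N)]

rng = random.Random(0)
picks, probs = weighted_sample(scores, k=5000, rng=rng)

true_prevalence = sum(labels) / N
sample_bad_fraction = sum(labels[i] for i in picks) / len(picks)
estimate = estimate_prevalence(picks, probs, labels, N)

print(f"true prevalence:        {true_prevalence:.4f}")
print(f"bad fraction in sample: {sample_bad_fraction:.4f}")
print(f"weighted estimate:      {estimate:.4f}")
```

In this sketch the raw bad fraction in the sample is far higher than the true prevalence, which is the point of weighting: reviewers see many more bad items to segment. The inverse-probability weights then undo that enrichment, so the estimate remains unbiased even though the classifier misses some bad items.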
