NOVEMBER 03, 2016

Using Weighted Sampling to Understand the Prevalence of Spam

David Radburn-Smith

Facebook

Emanuel Strauss

Facebook

TOPIC: Fighting Abuse and Security

@SCALE SERIES: Fighting Abuse @Scale

TYPE: video

YEAR: 2016

TAGS: spamfighting

To fight spam effectively, we need an unbiased estimate of how much bad content exists in the ecosystem and where it resides. In this presentation we discuss sampling schemes for identifying the small percentage of bad content that is viewed, covering both user-generated content and commercially motivated content such as ads and sponsored posts. These methods use ML-derived classifiers to weight the sampling, which increases the volume of bad content in the samples. With more bad content in hand, we can segment it further and measure the prevalence of bad material within specific segments or as identified by specific policies.
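The abstract does not say which estimator the speakers use. As a rough illustration of the general idea, the sketch below draws items with probability proportional to a classifier score, then divides each sampled label by its draw probability so that the oversampling of bad content does not bias the prevalence estimate (a Hansen-Hurwitz style estimator, swapped in here as an assumption). The population, the score distributions, and the names `weighted_sample`, `estimate_prevalence`, and `simulate` are all hypothetical.

```python
import random


def weighted_sample(scores, n, rng):
    """Draw n indices with replacement, with probability proportional to score."""
    total = sum(scores)
    probs = [s / total for s in scores]  # per-draw selection probability
    idx = rng.choices(range(len(scores)), weights=probs, k=n)
    return idx, probs


def estimate_prevalence(labels, idx, probs):
    """Inverse-probability-weighted estimate of the fraction of bad items.

    Each sampled label is divided by its draw probability, which undoes
    the deliberate oversampling of high-score items.
    """
    population_size = len(probs)
    n = len(idx)
    return sum(labels[i] / probs[i] for i in idx) / (n * population_size)


def simulate(seed=0, population_size=20_000, n=5_000):
    """Synthetic demo; every number here is made up for illustration."""
    rng = random.Random(seed)
    # Assumption: roughly 1% of items are bad (label 1).
    labels = [1 if rng.random() < 0.01 else 0 for _ in range(population_size)]
    # Assumption: the classifier scores bad items higher than good ones.
    # Scores stay strictly positive so every item can be drawn.
    scores = [
        rng.uniform(0.3, 1.0) if y else rng.uniform(0.01, 0.2) for y in labels
    ]
    idx, probs = weighted_sample(scores, n, rng)
    true_prev = sum(labels) / population_size
    est = estimate_prevalence(labels, idx, probs)
    # Raw share of bad items in the sample, before reweighting.
    sample_bad_rate = sum(labels[i] for i in idx) / n
    return true_prev, est, sample_bad_rate


if __name__ == "__main__":
    true_prev, est, sample_bad_rate = simulate()
    print(f"true prevalence:      {true_prev:.4f}")
    print(f"weighted estimate:    {est:.4f}")
    print(f"bad share in sample:  {sample_bad_rate:.4f}")
```

In this toy setup the sample contains a much larger share of bad items than the population does, which is what makes finer segmentation practical, while the reweighted estimate stays close to the true prevalence. In practice the labels would come from human review of the sampled items, not from a known ground truth.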
