Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling
arXiv:2602.18518v1 Announce Type: new Abstract: Content safety teams need metrics that reflect what users actually experience, not only what is reported. We study prevalence: the …
Attila Dobi, Aravindh Manickavasagam, Benjamin Thompson, Xiaohan Yang, Faisal Farooq
5 views