From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models
arXiv:2603.04828v1 Announce Type: new Abstract: Pre-training data detection for LLMs is essential for addressing copyright concerns and mitigating benchmark contamination. Existing methods mainly focus on …
Ruiqi Zhang, Lingxiang Wang, Hainan Zhang, Zhiming Zheng, Yanyan Lan
3 views