No Memorization, No Detection: Output Distribution-Based Contamination Detection in Small Language Models
arXiv:2603.03203v1 Announce Type: new Abstract: CDD, or Contamination Detection via output Distribution, identifies data contamination by measuring the peakedness of a model's sampled outputs. We study the conditions under which this approach succeeds and fails on small language models ranging from 70M to 410M parameters. Using controlled contamination experiments on GSM8K, HumanEval, and MATH, we find that CDD's effectiveness depends critically on whether fine-tuning produces verbatim memorization. With low-rank adaptation, models can learn from contaminated data without memorizing it, and CDD performs at chance level even when the data is verifiably contaminated. Only when fine-tuning capacity is sufficient to induce memorization does CDD recover strong detection accuracy. Our results characterize a memorization threshold that governs detectability and highlight a practical consideration: parameter-efficient fine-tuning can produce contamination that output-distribution methods do not detect. Our code is available at https://github.com/Sela-Omer/Contamination-Detection-Small-LM
Executive Summary
The study 'No Memorization, No Detection: Output Distribution-Based Contamination Detection in Small Language Models' examines the limitations of Contamination Detection via output Distribution (CDD) in identifying data contamination in small language models (70M to 410M parameters). The authors run controlled contamination experiments on GSM8K, HumanEval, and MATH to investigate the conditions under which CDD succeeds and fails. They find that CDD's effectiveness depends on whether fine-tuning produces verbatim memorization: with low-rank adaptation, models can learn from contaminated data without memorizing it, and CDD performs at chance level. The study characterizes a memorization threshold that governs detectability and cautions that parameter-efficient fine-tuning can introduce contamination that output-distribution methods fail to detect.
Key Points
- ▸ CDD's effectiveness depends on whether fine-tuning produces verbatim memorization
- ▸ Low-rank adaptation can produce contamination that CDD fails to detect
- ▸ A memorization threshold governs detectability
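The peakedness signal behind CDD can be illustrated with a minimal sketch. This is not the paper's exact statistic; it uses exact-match frequency among sampled completions as a simplified proxy for output-distribution peakedness, and the `threshold` value and example strings are hypothetical.

```python
from collections import Counter

def peakedness(samples):
    """Fraction of sampled outputs that match the most common output.

    A value near 1.0 means the sampled output distribution is highly
    peaked (a possible sign of verbatim memorization); a value near
    1/len(samples) means every sample is distinct.
    """
    if not samples:
        raise ValueError("need at least one sample")
    most_common_count = Counter(samples).most_common(1)[0][1]
    return most_common_count / len(samples)

def flag_contaminated(samples, threshold=0.8):
    """Flag a prompt as likely contaminated if sampling is too peaked."""
    return peakedness(samples) >= threshold

# Hypothetical completions sampled at nonzero temperature:
memorized = ["The answer is 42."] * 9 + ["The answer is 41."]
diverse = [f"Answer candidate {i}" for i in range(10)]
print(flag_contaminated(memorized))  # True  (peakedness 0.9)
print(flag_contaminated(diverse))    # False (peakedness 0.1)
```

The paper's core finding, in these terms: a LoRA-tuned model that learned from contaminated data without memorizing it would look like the `diverse` case, so a peakedness test never fires even though contamination is real.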
Merits
Strength
The study provides a comprehensive analysis of CDD's limitations and the conditions under which it succeeds and fails. The controlled contamination experiments, spanning multiple model sizes and benchmarks, are well-suited to isolating the relationships between fine-tuning capacity, memorization, and detectability.
Demerits
Limitation
The study focuses on small language models (70M to 410M parameters), so its findings may not generalize to larger models or other domains. Additionally, the assessment is limited to a single detection method (CDD), leaving open whether other contamination detectors share the same blind spot.
Expert Commentary
The study provides a critical examination of CDD's limitations and highlights the importance of understanding the relationships between fine-tuning, memorization, and detectability. The findings have significant implications for the development and auditing of language models: if parameter-efficient fine-tuning can contaminate a model without leaving a memorization signature, output-distribution methods alone cannot certify a model as contamination-free. However, the study's focus on small language models and a single detection method may limit its generalizability. Future research should investigate these limitations and explore alternative approaches to contamination detection.
Recommendations
- ✓ Further research is needed to investigate the generalizability of the study's findings to larger models and other domains.
- ✓ Alternative approaches to contamination detection should be explored, particularly those that do not rely on output distribution-based methods.