Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization
arXiv:2603.16105v1 Announce Type: new Abstract: Post-training model compression is essential for enhancing the portability of Large Language Models (LLMs) while preserving their performance. While several compression approaches have been proposed, less emphasis has been placed on selecting the most suitable set of data (the so-called \emph{calibration data}) for finding the compressed model configuration. The choice of calibration data is a critical step in preserving model capabilities both intra- and inter-tasks. In this work, we address the challenge of identifying high-performance calibration sets for both pruning and quantization by analyzing intrinsic data properties rather than model-specific signals. We introduce \texttt{\textbf{ZipCal}}, a model-agnostic data curation strategy that maximizes lexical diversity based on Zipfian power laws. Experiments demonstrate that our method consistently outperforms standard uniform random sampling across various pruning benchmarks. Notably, it also performs on par, in terms of downstream performance, with a state-of-the-art method that relies on model perplexity. The latter becomes prohibitively expensive at large-scale models and datasets, while \texttt{\textbf{ZipCal}} is on average $\sim$240$\times$ faster due to its tractable linear complexity\footnote{We make the code and the experiments available at https://anonymous.4open.science/r/zipcal-71CD/.}.
Executive Summary
This article presents ZipCal, a model-agnostic data curation strategy that leverages Zipfian power laws to maximize lexical diversity in calibration sets, improving post-training pruning and quantization of Large Language Models (LLMs). The authors show that ZipCal consistently outperforms standard uniform random sampling and matches the downstream performance of a state-of-the-art, perplexity-based selection method at a far lower computational cost (reportedly about 240x faster on average). The approach addresses a critical but often overlooked step in post-training model compression: the selection of calibration data. Because ZipCal relies only on intrinsic data properties and runs in linear time, it remains tractable for large-scale models and datasets.
Key Points
- ▸ ZipCal is a model-agnostic data curation strategy that utilizes Zipfian power laws to maximize lexical diversity.
- ▸ ZipCal outperforms standard uniform random sampling and approaches the performance of a state-of-the-art method that relies on model perplexity.
- ▸ ZipCal runs in linear time, making it tractable for large-scale models and datasets where perplexity-based selection becomes prohibitively expensive.
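The paper's exact algorithm is not reproduced in this digest, but the core idea can be illustrated with a minimal sketch: view token frequencies through Zipf's law (frequency proportional to rank^(-s)) and favor calibration samples that maximize lexical diversity. The function names `zipf_exponent` and `select_calibration`, and the greedy type-coverage heuristic, are illustrative assumptions, not ZipCal's actual procedure:

```python
import math
from collections import Counter

def zipf_exponent(texts):
    """Estimate the Zipf exponent s of the token frequency distribution
    by least-squares regression of log(frequency) on log(rank)."""
    counts = Counter(tok for t in texts for tok in t.split())
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    # Zipf's law: freq ~ rank^(-s), so the log-log slope is -s.
    return -slope

def select_calibration(pool, k):
    """Greedily pick k samples that each add the most unseen token
    types -- a simple proxy for maximizing lexical diversity.
    Each pass over the pool is linear in its size."""
    selected, seen = [], set()
    remaining = list(pool)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda t: len(set(t.split()) - seen))
        selected.append(best)
        seen |= set(best.split())
        remaining.remove(best)
    return selected
```

Note that both functions touch only the raw text, never the model, which is what makes a strategy in this spirit model-agnostic and cheap compared with perplexity-based scoring.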
Merits
Strength
The proposed method addresses a critical gap in post-training model compression, with a focus on selecting high-performance calibration data for pruning and quantization tasks.
Scalability
ZipCal's linear time complexity makes it practical for large-scale models and datasets, unlike perplexity-based selection methods, which become prohibitively expensive at that scale; the authors report an average speedup of roughly 240x.
Demerits
Limitation
The proposed method may not be directly applicable to other domains or tasks beyond pruning and quantization, requiring further adaptation and evaluation.
Expert Commentary
The article presents a well-structured and well-motivated approach to the challenge of selecting high-performance calibration data for post-training model compression. Applying Zipfian power laws to maximize lexical diversity is a novel and insightful use of the concept. The experimental results demonstrate ZipCal's effectiveness, and its linear time complexity makes it a promising solution for large-scale models and datasets. Further evaluation is needed, however, to establish whether the method transfers to domains and tasks beyond the pruning and quantization benchmarks studied. The work also touches on the broader role of model compression in enabling AI deployment, which carries implications for policy makers and regulators.
Recommendations
- ✓ Further evaluation and adaptation of the proposed method are necessary to ensure its applicability to other domains and tasks.
- ✓ Policy makers and regulators seeking to promote the adoption of AI technologies should note the role of efficient model compression in facilitating the deployment of AI models.