ROSE: Reordered SparseGPT for More Accurate One-Shot Large Language Models Pruning
arXiv:2603.05878v1 Announce Type: new Abstract: Pruning is widely recognized as an effective method for reducing the parameters of large language models (LLMs), potentially leading to more efficient deployment and inference. One classic and prominent path of LLM one-shot pruning is to leverage second-order gradients (i.e., Hessian), represented by the pioneering work SparseGPT. However, the predefined left-to-right pruning order in SparseGPT leads to suboptimal performance when the weights exhibit columnar patterns. This paper studies the effect of pruning order under the SparseGPT framework. The analyses lead us to propose ROSE, a reordered SparseGPT method that prioritizes weights with larger potential pruning errors to be pruned earlier. ROSE first performs pre-pruning to identify candidate weights for removal, and estimates both column and block pruning loss. Subsequently, two-level reordering is performed: columns within each block are reordered in descending order of column loss, while blocks are reordered based on block loss. We introduce the relative range of block loss as a metric to identify columnar layers, enabling adaptive reordering across the entire model. Substantial empirical results on prevalent LLMs (LLaMA2-7B/13B/70B, LLaMA3-8B, Mistral-7B) demonstrate that ROSE surpasses the original SparseGPT and other counterpart pruning methods. Our code is available at https://github.com/mingluo-su/ROSE.
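To make the two-level reordering concrete, the sketch below shows one plausible implementation of the step the abstract describes: sort columns within each block by descending column loss, then sort whole blocks by descending block loss. This is an illustrative reconstruction, not the authors' code; the function name, the use of per-column losses as input, and the choice of summed column loss as the block loss are all assumptions.

```python
# Hypothetical sketch of ROSE's two-level reordering, assuming per-column
# pruning losses have already been estimated during pre-pruning.
import numpy as np

def two_level_reorder(col_loss: np.ndarray, block_size: int) -> np.ndarray:
    """Return a column permutation: columns are sorted by descending loss
    within each block, then blocks are sorted by descending block loss."""
    n_cols = col_loss.shape[0]
    assert n_cols % block_size == 0, "columns must divide evenly into blocks"
    n_blocks = n_cols // block_size

    order = np.arange(n_cols).reshape(n_blocks, block_size)
    losses = col_loss.reshape(n_blocks, block_size)

    # Level 1: within each block, put the highest-loss columns first.
    within = np.argsort(-losses, axis=1)
    order = np.take_along_axis(order, within, axis=1)
    losses = np.take_along_axis(losses, within, axis=1)

    # Level 2: reorder blocks by block loss (here assumed to be the
    # sum of its column losses), highest first.
    block_order = np.argsort(-losses.sum(axis=1))
    return order[block_order].reshape(-1)
```

Pruning would then proceed over the permuted columns instead of strictly left to right, so that high-error weights are handled while the Hessian-based error compensation still has the most remaining weights to distribute error onto.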
Executive Summary
The article introduces ROSE, a reordering mechanism for SparseGPT-style one-shot pruning of large language models, addressing a key limitation of the conventional left-to-right pruning order. By prioritizing weights with higher potential pruning errors early in the process—using pre-pruning to estimate column and block losses and applying a two-level reordering scheme—ROSE improves pruning accuracy. Empirical validation across multiple LLMs demonstrates superior performance relative to SparseGPT and other contemporary methods. This work advances the field by introducing a data-driven, adaptive reordering framework that better accommodates columnar weight patterns, enabling more efficient deployment with less degradation of model quality.
Key Points
- ▸ ROSE reorders pruning order to prioritize high-error weights earlier
- ▸ Two-level reordering (columns within blocks, blocks overall) improves accuracy
- ▸ Empirical results show ROSE outperforms SparseGPT on major LLMs
Merits
Innovation
ROSE introduces a novel, empirically validated adaptive reordering mechanism that directly addresses a known bottleneck in SparseGPT pruning.
Demerits
Scope Limitation
The study focuses on specific LLMs and may not generalize to all architectures or training paradigms without further validation.
Expert Commentary
The paper presents a compelling advancement in the field of LLM pruning by recognizing and rectifying a fundamental flaw in the original SparseGPT approach: the assumption that a fixed left-to-right pruning order is optimal. The authors’ insight into the impact of columnar weight patterns—and their solution via a reordering mechanism that dynamically prioritizes error-prone weights—is both technically sound and practically significant. The use of the relative range of block loss as a diagnostic metric for identifying columnar layers is particularly elegant and demonstrates a sophisticated understanding of the underlying mathematical structure. Moreover, the empirical validation across diverse model sizes (from 7B to 70B parameters) strengthens the generalizability claims. This work not only fills a gap in the literature but also sets a new benchmark for how pruning order is conceptualized in one-shot methods. The open-source code availability further enhances reproducibility and impact.
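The commentary above highlights the relative range of block loss as the diagnostic that decides whether a layer is columnar and should be reordered. The abstract does not give the exact formula, so the sketch below assumes one common definition of relative range—the spread of block losses normalized by their mean—and a hypothetical threshold; both are illustrative assumptions, not the paper's specification.

```python
# Hypothetical sketch of the columnar-layer test. The normalization by the
# mean and the threshold value are assumptions; the paper defines the exact
# metric.
import numpy as np

def relative_range(block_loss: np.ndarray) -> float:
    """Spread of per-block pruning losses, normalized by their mean."""
    return float((block_loss.max() - block_loss.min()) / block_loss.mean())

def should_reorder(block_loss: np.ndarray, threshold: float = 1.0) -> bool:
    """Treat a layer as columnar (and reorder it) only when its block
    losses vary widely; uniform layers keep the default order."""
    return relative_range(block_loss) > threshold
```

A per-layer gate like this is what makes the reordering adaptive across the whole model: layers with near-uniform block losses gain little from reordering, so they can be pruned in the original order at no extra cost.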
Recommendations
- ✓ Adopt ROSE as a default reordering strategy in academic and industry one-shot pruning projects for LLMs.
- ✓ Extend ROSE to evaluate performance on additional architectures (e.g., Transformer variants, quantized models) to validate broader applicability.