ActTail: Global Activation Sparsity in Large Language Models

Wenwen Hou, Xinyuan Song, Shiwei Liu

arXiv:2603.12272v1. Abstract: Activation sparsity is a promising approach for accelerating large language model (LLM) inference by reducing computation and memory movement. However, existing activation sparsity methods typically apply uniform sparsity across projections, ignoring the heterogeneous statistical properties of Transformer weights and thereby amplifying performance degradation. In this paper, we propose ActTail, a TopK magnitude-based activation sparsity method with global activation sparsity allocation grounded in Heavy-Tailed Self-Regularization (HT-SR) theory. Specifically, we capture this heterogeneity via the heavy-tail exponent computed from each projection's empirical spectral density (ESD), which is used as a quantitative indicator to assign projection-specific sparsity budgets. Importantly, we provide a theoretical analysis that establishes an explicit relationship between the activation sparsity ratio and the heavy-tail exponent under the HT-SR regime, offering principled guidance for sparsity allocation beyond heuristic design. Experiments on LLaMA and Mistral models show that our method improves both perplexity and downstream task performance at high sparsity compared to uniform allocation. At 80% sparsity, perplexity is reduced by 21.8% on LLaMA-2-7B, 40.1% on LLaMA-2-13B, and 9.4% on Mistral-7B.
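As background for the abstract's key quantity, the heavy-tail exponent of a projection's empirical spectral density can be estimated by fitting a power law to the upper tail of the eigenvalues of WᵀW. The sketch below uses a Hill-style estimator; the paper's exact fitting procedure is not given here, so the `tail_frac` cutoff and the Hill formula are illustrative assumptions.

```python
import numpy as np

def heavy_tail_exponent(W, tail_frac=0.5):
    """Estimate the heavy-tail exponent alpha of a weight matrix's ESD.

    Sketch only: computes the eigenvalues of W^T W, takes the upper
    `tail_frac` fraction as the power-law tail, and applies the Hill
    estimator. ActTail's actual fitting window and estimator may differ.
    """
    eigs = np.linalg.eigvalsh(W.T @ W)          # ESD support (eigenvalues)
    eigs = np.sort(eigs[eigs > 1e-12])          # drop numerical zeros
    n_tail = max(2, int(len(eigs) * tail_frac)) # size of the fitted tail
    tail = eigs[-n_tail:]
    lam_min = tail[0]                           # lower edge of the tail
    # Hill estimator for the power-law exponent of the tail
    alpha = 1.0 + n_tail / np.sum(np.log(tail / lam_min))
    return alpha

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)) / 16.0      # stand-in weight matrix
print(heavy_tail_exponent(W))
```

In HT-SR analyses, smaller alpha indicates a heavier tail; comparing alpha across projections is what exposes the heterogeneity the abstract refers to.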

Executive Summary

ActTail is an activation sparsity method that accelerates large language model inference by grounding sparsity allocation in Heavy-Tailed Self-Regularization (HT-SR) theory. Rather than applying one uniform sparsity ratio to every projection, it uses the heavy-tail exponent computed from each projection's empirical spectral density to assign projection-specific sparsity budgets, reflecting the heterogeneous statistical properties of Transformer weights. In experiments on LLaMA and Mistral models, this allocation improves both perplexity and downstream task performance over uniform allocation. A supporting theoretical analysis relates the activation sparsity ratio to the heavy-tail exponent, giving principled guidance for allocation where existing methods rely on heuristics that amplify performance degradation. These contributions have practical implications for the acceleration of large language models.

Key Points

  • ActTail proposes a TopK magnitude-based activation sparsity method with global activation sparsity allocation grounded in HT-SR theory.
  • The approach captures the heterogeneous statistical properties of Transformer weights for improved sparsity allocation.
  • Experiments demonstrate improved perplexity and downstream task performance compared to uniform allocation.
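To make the first key point concrete, here is a minimal sketch of TopK magnitude-based activation sparsity: for a given sparsity ratio, keep only the largest-magnitude activations and zero the rest. This is a generic illustration of the mechanism, not ActTail's implementation, which additionally varies the ratio per projection.

```python
import numpy as np

def topk_activation_sparsify(x, sparsity):
    """Zero all but the top-k activations by magnitude.

    x: 1-D activation vector; sparsity: fraction of entries to zero.
    Generic sketch of TopK magnitude-based activation sparsity.
    """
    k = max(1, int(round(len(x) * (1.0 - sparsity))))
    keep = np.argsort(np.abs(x))[-k:]   # indices of the k largest |x|
    mask = np.zeros_like(x)
    mask[keep] = 1.0
    return x * mask

x = np.array([0.1, -2.0, 0.05, 1.5, -0.3])
out = topk_activation_sparsify(x, sparsity=0.6)  # keeps the 2 largest-magnitude entries
```

Because the surviving activations select which weight rows are actually needed, this masking is what reduces both computation and memory movement at inference time.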

Merits

Strength in Theoretical Analysis

ActTail provides a principled theoretical framework for sparsity allocation, offering an explicit relationship between the activation sparsity ratio and the heavy-tail exponent.
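One way to picture what "global allocation guided by the heavy-tail exponent" could look like is the heuristic sketch below: per-projection budgets are weighted by the inverse exponent and rescaled so their mean hits a global sparsity target. The inverse-alpha weighting and its direction are illustrative assumptions, not the explicit relationship derived in the paper.

```python
def allocate_sparsity(alphas, global_sparsity):
    """Assign per-projection sparsity budgets from heavy-tail exponents.

    alphas: heavy-tail exponent per projection; global_sparsity: target
    mean sparsity across projections. Heuristic sketch: smaller alpha
    (heavier tail) receives a larger budget, scaled so the mean budget
    matches the global target. ActTail's actual rule comes from its
    theoretical analysis and may differ.
    """
    inv = [1.0 / a for a in alphas]
    mean_inv = sum(inv) / len(inv)
    return [min(0.99, global_sparsity * w / mean_inv) for w in inv]

budgets = allocate_sparsity([2.0, 2.5, 3.0], global_sparsity=0.6)
```

The benefit of this global view is that projections which tolerate aggressive pruning absorb more of the budget, so a high overall sparsity need not degrade the most sensitive projections uniformly.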

Improved Performance

ActTail demonstrates improved perplexity and downstream task performance compared to uniform allocation in experiments on LLaMA and Mistral models.

Demerits

Limited Generalizability

The method's performance and applicability may be limited to specific language models and datasets, requiring further evaluation and adaptation.

Expert Commentary

ActTail represents a crucial step forward in the development of efficient large language models, addressing the pressing need for acceleration in NLP applications. By leveraging HT-SR theory and capturing the heterogeneous properties of Transformer weights, ActTail's sparsity allocation method offers a principled approach to reducing computation and memory movement. The method's improved performance and theoretical foundations make it an attractive solution for industry and academia alike. As the field continues to evolve, ActTail's contributions will likely inspire further research and innovation in efficient neural network design and large language model acceleration.

Recommendations

  • Future research should focus on adapting ActTail to other neural network architectures and exploring its application in edge computing and IoT scenarios.
  • Industry and academia should collaborate to develop and deploy ActTail and similar methods, promoting the adoption of more efficient large language models in real-world applications.
