AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
arXiv:2602.21233v1 Announce Type: cross Abstract: This technical report introduces AngelSlim, a comprehensive and versatile toolkit for large model compression developed by the Tencent Hunyuan team. By consolidating cutting-edge algorithms, including quantization, speculative decoding, token pruning, and distillation, AngelSlim provides a unified pipeline that streamlines the transition from model compression to industrial-scale deployment. To facilitate efficient acceleration, we integrate state-of-the-art FP8 and INT8 Post-Training Quantization (PTQ) algorithms alongside pioneering research in ultra-low-bit regimes, featuring HY-1.8B-int2 as the first industrially viable 2-bit large model. Beyond quantization, we propose a training-aligned speculative decoding framework compatible with multimodal architectures and modern inference engines, achieving 1.8x to 2.0x throughput gains without compromising output correctness. Furthermore, we develop a training-free sparse attention framework that reduces Time-to-First-Token (TTFT) in long-context scenarios by decoupling sparse kernels from model architectures through a hybrid of static patterns and dynamic token selection. For multimodal models, AngelSlim incorporates specialized pruning strategies, namely IDPruner for optimizing vision tokens via Maximal Marginal Relevance and Samp for adaptive audio token merging and pruning. By integrating these compression strategies from low-level implementations, AngelSlim enables algorithm-focused research and tool-assisted deployment.
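The abstract says IDPruner selects vision tokens via Maximal Marginal Relevance but does not publish the implementation. As a rough, illustrative sketch of the general MMR idea (all names here, including `mmr_select` and the lambda weighting, are assumptions, not AngelSlim's API): tokens are picked greedily to balance relevance to a query embedding against redundancy with tokens already kept.

```python
import numpy as np

def mmr_select(tokens: np.ndarray, query: np.ndarray, keep: int, lam: float = 0.7):
    """Greedy Maximal Marginal Relevance selection (illustrative sketch).

    Repeatedly picks the token embedding maximizing
        lam * relevance(token, query) - (1 - lam) * max_similarity(token, selected),
    so high-relevance but mutually redundant tokens are not all kept.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    relevance = [cos(t, query) for t in tokens]
    selected = []                      # indices of kept tokens
    candidates = list(range(len(tokens)))
    while candidates and len(selected) < keep:
        def mmr_score(i):
            # Redundancy = similarity to the closest already-selected token.
            redundancy = max((cos(tokens[i], tokens[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low `lam`, a near-duplicate of an already-selected token is skipped in favor of a less relevant but novel one, which is the pruning behavior MMR is meant to provide.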
Executive Summary
The article introduces AngelSlim, a comprehensive toolkit for large model compression developed by the Tencent Hunyuan team. AngelSlim consolidates cutting-edge algorithms, including quantization, speculative decoding, token pruning, and distillation, to streamline the transition from model compression to industrial-scale deployment. The toolkit reports concrete efficiency gains, including 1.8x to 2.0x throughput improvements from speculative decoding and reduced time-to-first-token in long-context scenarios. With its unified pipeline and state-of-the-art algorithms, AngelSlim offers a practical foundation for compressing and deploying large models across a range of applications.
Key Points
- ▸ AngelSlim is a comprehensive toolkit for large model compression
- ▸ It consolidates cutting-edge algorithms, including quantization, speculative decoding, token pruning, and distillation
- ▸ The toolkit achieves significant efficiency gains and facilitates the deployment of large models
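One of the key points, speculative decoding, underlies the reported 1.8x to 2.0x throughput gains. The paper's training-aligned framework is not reproduced here; the following is only a minimal greedy sketch of the general draft-then-verify loop (the function name and callable signatures are invented for illustration): a cheap draft model proposes a block of tokens, and the expensive target model accepts the longest matching prefix, appending its own token at the first mismatch so output equals pure target decoding.

```python
from typing import Callable, List

def speculative_decode_greedy(
    draft_next: Callable[[List[int]], int],   # cheap draft model: context -> next token
    target_next: Callable[[List[int]], int],  # expensive target model: context -> next token
    context: List[int],
    k: int = 4,
    steps: int = 8,
) -> List[int]:
    """Greedy speculative decoding sketch: each round, the draft proposes k
    tokens and the target keeps the matching prefix plus one corrected token."""
    out = list(context)
    for _ in range(steps):
        # Draft model speculates k tokens ahead.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target model verifies the proposal token by token.
        for t in proposal:
            true_t = target_next(out)
            if true_t == t:
                out.append(t)        # accepted: identical to target's own output
            else:
                out.append(true_t)   # rejected: take the target's token and re-draft
                break
    return out
```

Because every emitted token is checked against the target model, the output never deviates from what the target alone would produce, which matches the abstract's claim of acceleration "without compromising output correctness".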
Merits
Unified Pipeline
AngelSlim provides a unified pipeline that streamlines the transition from model compression to industrial-scale deployment, making it easier to deploy large models
State-of-the-art Algorithms
The toolkit integrates state-of-the-art FP8 and INT8 Post-Training Quantization (PTQ) algorithms, as well as pioneering research in ultra-low-bit regimes
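The report does not spell out its PTQ algorithms in this summary; as a baseline illustration of what symmetric per-tensor INT8 post-training quantization computes (function names are illustrative, not AngelSlim's API), a weight tensor is mapped to int8 by a single scale derived from its maximum absolute value, and dequantized by multiplying back.

```python
import numpy as np

def quantize_int8_symmetric(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: one scale maps max|w| to 127."""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor; error per element is at most scale/2."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and measure reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_int8_symmetric(w)
w_hat = dequantize(q, s)
max_err = float(np.max(np.abs(w - w_hat)))
```

Production PTQ methods (including the FP8 paths and 2-bit regimes mentioned above) add calibration data, per-channel scales, and error-compensation schemes on top of this basic round-to-nearest core.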
Demerits
Complexity
The toolkit's complexity may make it challenging for users without extensive technical expertise to fully utilize its features
Limited Customizability
The toolkit's unified pipeline may limit the ability of users to customize the compression process to their specific needs
Expert Commentary
The introduction of AngelSlim represents a significant advancement in the field of model compression: its unified pipeline and state-of-the-art algorithms provide a streamlined, efficient path from compressed models to deployment. However, the toolkit's complexity and constrained customizability may slow its adoption among some users. As the field of AI continues to evolve, toolkits like AngelSlim will play a crucial role in deploying large models and driving innovation in AI research and development. Further research is needed to fully explore AngelSlim's potential across application domains.
Recommendations
- ✓ Further research is needed to fully explore the potential of AngelSlim and its applications in various fields
- ✓ The development of more user-friendly interfaces and customization options could increase the adoption of AngelSlim among a wider range of users