Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats
arXiv:2602.12635v1 Announce Type: new
Abstract: As LLMs scale, low-bit floating-point formats like MXFP and NVFP4 offer new opportunities for precision and efficiency. In this work, we evaluate HiFloat (HiF8 and HiF4), a family of formats tailored for Ascend NPUs. Through rigorous comparison across weight-activation and KV-cache tasks, we provide three key insights: (1) INT8 suits narrow-range data, while floating-point formats excel with high-variance data; (2) in 4-bit regimes, HiF4's hierarchical scaling prevents the accuracy collapse seen in integer formats; and (3) HiFloat is fully compatible with state-of-the-art post-training quantization frameworks. Overall, HiFloat provides a solution for high-efficiency LLM inference on NPUs.
Executive Summary
The article 'Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats' examines the efficacy of the HiFloat formats (HiF8 and HiF4) for low-bit floating-point inference on Ascend NPUs. The study compares these formats with traditional integer formats like INT8 across weight-activation and KV-cache quantization tasks, highlighting the strength of floating-point formats on high-variance data and the ability of HiF4 to prevent the accuracy collapse that integer formats suffer in 4-bit regimes. The research also emphasizes HiFloat's compatibility with existing post-training quantization frameworks, positioning it as a viable solution for efficient LLM inference on NPUs.
Key Points
- ▸ INT8 is suitable for narrow-range data, while floating-point formats excel with high-variance data.
- ▸ HiF4's hierarchical scaling prevents accuracy collapse in 4-bit regimes, unlike integer formats.
- ▸ HiFloat is compatible with state-of-the-art post-training quantization frameworks.
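The first key point has a simple numerical intuition: an integer format uses one fixed step size across the whole tensor, so a few outliers stretch the scale and waste precision everywhere, whereas a floating-point format carries a per-value exponent and keeps its error roughly *relative*. The sketch below illustrates this with a toy float quantizer (per-value exponent, 3-bit mantissa); it is not the actual HiF8 encoding, whose details are not given in this summary, and the data distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quant_int8(x):
    # Symmetric per-tensor INT8: one fixed step size for the whole tensor,
    # so the step size is dictated by the largest outlier.
    scale = np.abs(x).max() / 127
    return np.clip(np.round(x / scale), -127, 127) * scale

def quant_fp(x, mantissa_bits=3):
    # Toy float quantizer (NOT the real HiF8 format): keep each value's
    # exponent exactly and round the mantissa to `mantissa_bits` bits,
    # so the quantization error is relative rather than absolute.
    m, e = np.frexp(x)
    step = 2.0 ** -mantissa_bits
    return np.ldexp(np.round(m / step) * step, e)

def rel_mse(x, q):
    # Quantization error normalized by signal power.
    return np.mean((x - q) ** 2) / np.mean(x ** 2)

narrow = rng.normal(0.0, 1.0, 100_000)                  # narrow-range data
heavy = narrow * np.exp(rng.normal(0.0, 2.0, 100_000))  # heavy-tailed, outlier-rich data

print("narrow: int8", rel_mse(narrow, quant_int8(narrow)),
      "| fp", rel_mse(narrow, quant_fp(narrow)))
print("heavy : int8", rel_mse(heavy, quant_int8(heavy)),
      "| fp", rel_mse(heavy, quant_fp(heavy)))
```

On the narrow-range tensor the uniform INT8 grid is finer than a 3-bit mantissa, so INT8 wins; on the heavy-tailed tensor the outliers inflate the INT8 step size and the float quantizer's relative error comes out far lower, mirroring insight (1).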
Merits
Comprehensive Evaluation
The study provides a thorough comparison of HiFloat formats against traditional integer formats, offering valuable insights into their performance across different tasks.
Practical Relevance
The findings are directly applicable to the deployment of large language models (LLMs) on Ascend NPUs, addressing the need for efficient and accurate inference.
Technical Rigor
The research is methodologically sound, employing rigorous evaluation techniques to validate the performance of HiFloat formats.
Demerits
Limited Scope
The study focuses primarily on Ascend NPUs, which may limit the generalizability of the findings to other hardware platforms.
Potential Bias
The evaluation is conducted by the developers of HiFloat, which could introduce a bias in favor of the formats being studied.
Complexity
The hierarchical scaling mechanism of HiF4, while effective, adds complexity to the implementation and may require additional computational resources.
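To make the complexity trade-off concrete, the sketch below contrasts flat per-tensor INT4 with a generic two-level ("hierarchical") scaling scheme: a coarse tensor-level scale plus a fine per-block scale stored as an 8-bit multiple of it. The exact HiF4 scheme is not specified in this summary, so this is an illustrative stand-in for the general idea that local scales confine outlier damage to individual blocks at the cost of extra scale bookkeeping.

```python
import numpy as np

def quant_int4_flat(x):
    # Flat per-tensor INT4: a single scale for all values; one outlier
    # stretches the grid for the entire tensor.
    scale = np.abs(x).max() / 7
    return np.clip(np.round(x / scale), -7, 7) * scale

def quant_int4_hier(x, block=32):
    # Hierarchical scaling sketch (not the actual HiF4 scheme):
    # level 1 is one coarse tensor-wide scale; level 2 is a per-block
    # scale stored as an 8-bit integer multiple of the coarse scale.
    xb = x.reshape(-1, block)
    block_amax = np.abs(xb).max(axis=1, keepdims=True)
    tensor_scale = block_amax.max() / 255                 # coarse level
    block_scale = np.clip(np.round(block_amax / tensor_scale), 1, 255) * tensor_scale
    step = block_scale / 7                                # fine level, per block
    q = np.clip(np.round(xb / step), -7, 7) * step
    return q.reshape(x.shape)

def rel_mse(x, q):
    return np.mean((x - q) ** 2) / np.mean(x ** 2)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, 4096)
w[::512] *= 50.0  # inject a few large outliers, as seen in LLM weights

print("flat INT4 rel. MSE:", rel_mse(w, quant_int4_flat(w)))
print("hierarchical rel. MSE:", rel_mse(w, quant_int4_hier(w)))
```

The extra per-block scales are precisely the implementation cost the review flags: each 32-value block carries its own 8-bit scale (plus the shared tensor scale), and the dequantization path must apply two multiplications instead of one.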
Expert Commentary
The article presents a significant advancement in low-bit inference on NPUs, particularly for Ascend hardware. The comprehensive evaluation of HiFloat formats provides a robust framework for understanding their advantages over traditional integer formats, and the findings are noteworthy for their practical implications, addressing the critical need for efficient and accurate inference in large language models. However, the focus on Ascend NPUs limits the generalizability of the results, and the potential bias introduced by the formats' developers conducting the evaluation warrants independent validation. The hierarchical scaling mechanism of HiF4, while effective, adds implementation complexity that may not be practical for all applications. Overall, the study contributes valuable insights to ongoing efforts in model compression and hardware-specific optimization, paving the way for more efficient deployment of large language models.
Recommendations
- ✓ Further independent studies should be conducted to validate the performance of HiFloat formats across different hardware platforms.
- ✓ Researchers should explore the scalability and implementation complexity of HiF4's hierarchical scaling mechanism to assess its feasibility in various applications.