
Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression


Akira Sakai, Yuma Ichikawa

arXiv:2602.17063v1 Announce Type: new Abstract: Sub-bit model compression seeks storage below one bit per weight; as magnitudes are aggressively compressed, the sign bit becomes a fixed-cost bottleneck. Across Transformers, CNNs, and MLPs, learned sign matrices resist low-rank approximation and are spectrally indistinguishable from an i.i.d. Rademacher baseline. Despite this apparent randomness, most weights retain their initialization signs; flips primarily occur via rare near-zero boundary crossings, suggesting that sign-pattern randomness is largely inherited from initialization. We formalize this behavior with sign lock-in theory, a stopping-time analysis of sign flips under SGD noise. Under bounded updates and a rare re-entry condition into a small neighborhood around zero, the number of effective sign flips exhibits a geometric tail. Building on this mechanism, we introduce a gap-based initialization and a lightweight outward-drift regularizer, reducing the effective flip rate to approximately $10^{-3}$ with only about a one-point increase in perplexity.
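The abstract's central empirical claim — that most weights keep their initialization signs because flips require a rare drift across zero — can be illustrated with a minimal NumPy sketch. This is a toy simulation with assumed noise scales, not the authors' code: each weight is treated as a scalar receiving small noisy SGD-like updates, and we measure how many weights retain their initial sign.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration (assumed parameters, not the paper's setup): each
# weight receives small bounded noisy updates; a sign flip requires the
# cumulative update to carry the weight across zero.
n_weights, n_steps = 10_000, 1_000
w0 = rng.normal(0.0, 0.1, size=n_weights)        # random initialization
w = w0.copy()
for _ in range(n_steps):
    w += rng.normal(0.0, 0.001, size=n_weights)  # bounded noisy updates

retained = float(np.mean(np.sign(w) == np.sign(w0)))
print(f"fraction of signs retained: {retained:.3f}")
```

Because the cumulative update scale (about 0.03 here) is small relative to typical |w0| (about 0.1), only weights initialized near zero ever cross the boundary, so the large majority of signs survive — the "lock-in" the paper formalizes.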

Executive Summary

This article introduces the concept of sign lock-in: the signs of randomly initialized weights in neural networks persist through training and become a fixed-cost bottleneck for sub-bit model compression. The authors develop a stopping-time theory of this phenomenon and propose two techniques, a gap-based initialization and an outward-drift regularizer, which reduce the effective sign flip rate to approximately $10^{-3}$ at a cost of about one perplexity point. The study has significant implications for building more storage-efficient neural network models.

Key Points

  • Sign lock-in theory explains the persistence of weight signs in neural networks
  • Sign-pattern randomness is largely inherited from initialization
  • Gap-based initialization and outward-drift regularizer reduce effective sign flip rate
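The abstract names the two proposed techniques but does not spell out their form, so the sketch below is an assumed implementation, not the authors' method: `gap_init`, `outward_drift_grad`, and all parameter values (`std`, `gap`, `strength`) are hypothetical. The idea in both cases is the same — keep weights out of a small band around zero, where sign flips happen.

```python
import numpy as np

def gap_init(shape, std=0.1, gap=0.02, seed=0):
    """Hypothetical gap-based initialization: resample any weight whose
    magnitude falls inside the (-gap, gap) band, so no weight starts
    next to the sign boundary at zero."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, std, size=shape)
    inside = np.abs(w) < gap
    while inside.any():
        w[inside] = rng.normal(0.0, std, size=int(inside.sum()))
        inside = np.abs(w) < gap
    return w

def outward_drift_grad(w, strength=1e-4, gap=0.02):
    """Gradient of a hypothetical outward-drift penalty: inside the gap
    band, push each weight away from zero along its current sign.
    Applied as w -= lr * (task_grad + outward_drift_grad(w)), which
    moves in-band weights outward by lr * strength per step."""
    inside = np.abs(w) < gap
    return np.where(inside, -strength * np.sign(w), 0.0)

w = gap_init((4, 8))
print("smallest magnitude after gap init:", np.abs(w).min())
```

The regularizer is deliberately lightweight: it is zero outside the gap band, so it leaves well-separated weights untouched and only acts on the rare weights at risk of a boundary crossing.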

Merits

Theoretical Framework

The article provides a rigorous theoretical framework to understand the sign lock-in phenomenon, which can be applied to various neural network architectures.
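The geometric-tail result can be made concrete with a toy stopping-time model. The parameters below are assumed for illustration and are not from the paper: each time a weight re-enters the near-zero neighborhood, it either escapes permanently (probability `q`) or crosses zero once more and flips sign.

```python
import numpy as np

rng = np.random.default_rng(1)

q = 0.7  # hypothetical per-visit escape probability

def flips_until_lockin(rng):
    """Count sign flips before the weight escapes the near-zero
    neighborhood for good: each visit flips with probability 1 - q."""
    n = 0
    while rng.random() >= q:  # with probability 1 - q, another flip
        n += 1
    return n

counts = np.array([flips_until_lockin(rng) for _ in range(10_000)])

# Under this model P(N >= k) = (1 - q)**k, a geometric tail:
# each additional flip is (1 - q) times less likely, and
# E[N] = (1 - q) / q.
print("mean flips:", counts.mean())
print("P(N >= 1):", (counts >= 1).mean(), "vs (1 - q) =", 1 - q)
```

This is the sense in which the flip count has a geometric tail: under bounded updates and rare re-entry into the zero neighborhood, large flip counts are exponentially unlikely.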

Demerits

Limited Experimental Evaluation

The article could benefit from more extensive experimental evaluations to demonstrate the effectiveness of the proposed techniques across different models and tasks.

Expert Commentary

The paper makes a significant contribution to the understanding of neural network training dynamics, particularly in the context of model compression. The proposed sign lock-in theory and accompanying techniques have the potential to improve the storage efficiency of neural networks, making them more suitable for a wide range of deployment settings. However, further research is needed to fully explore the implications of sign lock-in and to develop more effective techniques for reducing the effective sign flip rate.

Recommendations

  • Further experimental evaluations to demonstrate the effectiveness of the proposed techniques
  • Investigation of the applicability of sign lock-in theory to other neural network architectures and tasks
