Academic

NativeTernary: A Self-Delimiting Binary Encoding with Unary Run-Length Hierarchy Markers for Ternary Neural Network Weights, Structured Data, and General Computing Infrastructure

arXiv:2604.03336v1 Announce Type: new Abstract: BitNet b1.58 (Ma et al., 2024) demonstrates that large language models can operate entirely on ternary weights {-1, 0, +1}, yet no native binary wire format exists for such models. NativeTernary closes this gap. We present NativeTernary, a binary encoding scheme that partitions the 2-bit pair space into three data symbols representing ternary values -- either balanced {-1, 0, +1} or unsigned {0, 1, 2} -- and a reserved structural delimiter. The central contribution is the use of unary run-length encoding to represent semantic hierarchy depth: a sequence of N consecutive delimiter pairs denotes a boundary of level N, encoding character, word, sentence, paragraph, and topic boundaries at cost 2, 4, 6, 8, and 10 bits respectively -- proportional to boundary rarity. The choice of which 2-bit pair serves as the delimiter is a design parameter: {11} is the primary embodiment, offering simple OR-gate detection; {00} is an alternative embodiment

M
Maharshi Savdhariya
· · 1 min read · 15 views

arXiv:2604.03336v1 Announce Type: new Abstract: BitNet b1.58 (Ma et al., 2024) demonstrates that large language models can operate entirely on ternary weights {-1, 0, +1}, yet no native binary wire format exists for such models. NativeTernary closes this gap. We present NativeTernary, a binary encoding scheme that partitions the 2-bit pair space into three data symbols representing ternary values -- either balanced {-1, 0, +1} or unsigned {0, 1, 2} -- and a reserved structural delimiter. The central contribution is the use of unary run-length encoding to represent semantic hierarchy depth: a sequence of N consecutive delimiter pairs denotes a boundary of level N, encoding character, word, sentence, paragraph, and topic boundaries at cost 2, 4, 6, 8, and 10 bits respectively -- proportional to boundary rarity. The choice of which 2-bit pair serves as the delimiter is a design parameter: {11} is the primary embodiment, offering simple OR-gate detection; {00} is an alternative embodiment optimised for ultra-low-power CMOS systems, minimising switching activity. All four bit-pair choices are covered by the patent claims. We present three encoding variants: (1) the primary scheme with {11} as sole delimiter; (2) a dual-starter variant where both {10} and {11} initiate distinct symbol namespaces; and (3) an analysis of unsigned versus balanced ternary data mappings. We describe a path toward ternary-native general computing infrastructure requiring no hardware changes, and outline applications spanning ternary neural network weight storage, hierarchical natural language encoding, edge computing, IoT and satellite telemetry, industrial sensors, automotive systems, medical devices, gaming, and financial tick data. The decoder is a 10-line stateless state machine resilient to bitstream corruption.

Executive Summary

This article presents NativeTernary, a binary encoding scheme that enables the efficient representation of ternary neural network weights and structured data. By utilizing unary run-length encoding, NativeTernary achieves a high degree of compression while preserving semantic hierarchy depth. The scheme is adaptable to various applications, including edge computing, IoT, and satellite telemetry. With its flexible delimiter choice and decoder resilience to bitstream corruption, NativeTernary offers a promising solution for ternary-native general computing infrastructure. The authors demonstrate the scheme's feasibility and potential for widespread adoption, highlighting its applications in various industries.

Key Points

  • NativeTernary is a binary encoding scheme that partitions the 2-bit pair space into three data symbols and a reserved structural delimiter.
  • The scheme utilizes unary run-length encoding to represent semantic hierarchy depth.
  • NativeTernary offers flexible delimiter choice and decoder resilience to bitstream corruption.
  • The scheme is adaptable to various applications, including edge computing, IoT, and satellite telemetry.

Merits

Efficient compression

NativeTernary achieves high compression ratios while preserving semantic hierarchy depth.

Flexibility

The scheme offers a flexible delimiter choice, making it adaptable to various applications.

Resilience

The decoder is resilient to bitstream corruption, ensuring reliable decoding.

Demerits

Patent implications

The article mentions patent claims, which may limit the scheme's widespread adoption and raise intellectual property concerns.

Limited experimentation

The article focuses on theoretical descriptions and simulations, with limited experimental validation of the scheme's performance and scalability.

Expert Commentary

NativeTernary presents a promising solution for the efficient representation and storage of ternary neural network weights and structured data. While the scheme's flexibility and resilience to bitstream corruption are notable strengths, the patent implications and limited experimentation raise concerns about its widespread adoption. Theoretical descriptions and simulations provide a solid foundation, but experimental validation is necessary to fully demonstrate the scheme's performance and scalability. The authors' vision for ternary-native general computing infrastructure is ambitious, highlighting the potential for NativeTernary to transform various industries, including edge computing, IoT, and satellite telemetry.

Recommendations

  • Further experimentation and validation of NativeTernary's performance and scalability are necessary to demonstrate its feasibility and potential for widespread adoption.
  • The authors should engage with industry stakeholders and standardization bodies to ensure the scheme's adoption and integration into existing systems and protocols.

Sources

Original: arXiv - cs.LG