
CAWN: Continuous Acoustic Wave Networks for Autoregressive Language Modeling


Dejan Čugalj, Aleksandar Jevremovic

arXiv:2604.04250v1 Announce Type: new Abstract: Modern Large Language Models (LLMs) rely on Transformer self-attention, which scales quadratically with sequence length. Recent linear-time alternatives, like State Space Models (SSMs), often suffer from signal degradation over extended contexts. We introduce the Continuous Acoustic Wave Network (CAWN), a fully continuous sequence-mixing architecture. Instead of discrete matrix-based attention, CAWN projects hidden states into multi-headed complex-domain phasors, achieving sequence mixing through a causal, $O(L)$ Phase Accumulation mechanism. To prevent signal degradation over ultra-long contexts, we introduce a dual-gated Selective Phase Resonance mechanism incorporating Frequency-Dependent Retention, Hard-Threshold Gating via Straight-Through Estimation, and a Temporal Syntax Cache to capture short-term local dependencies. We also replace standard dense linear projections with Depth-wise Harmonic Convolutions for optimal spatial frequency mixing, augmented by Block Attention Residuals for depth-wise state routing. Scaled to a 150M-parameter model, CAWN utilizes custom Triton kernels for hardware-efficient, true-complex phase accumulation in float32. Trained via a continuous streaming loop on a 100-Billion-token corpus, the prototype is evaluated at a 5-Billion-token milestone. Empirical evaluations via a Targeted Semantic Retrieval protocol demonstrate robust vocabulary acquisition and extended explicitly learned contextual denoising. By leveraging $O(1)$ state-passing via chunked prefill, the model retrieves targeted information across 2,000,000 tokens while strictly plateauing at 8.72 GB of Peak VRAM, empirically overcoming the $O(L^2)$ context memory wall.
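The abstract describes sequence mixing via causal, O(L) Phase Accumulation over complex-domain phasors, but gives no implementation details. A minimal sketch of what such a mechanism might look like is shown below; the function name, the per-channel frequency vector `omega`, and the rotate/accumulate/counter-rotate structure are all assumptions, not the paper's actual formulation.

```python
import numpy as np

def phase_accumulation(x, omega):
    """Sketch of causal O(L) sequence mixing via cumulative phase rotation.

    x:     (L, d) real-valued hidden states
    omega: (d,)   per-channel angular frequencies (assumed learned)

    Each position t is lifted to a complex phasor rotated by the
    accumulated phase t * omega; a causal prefix sum then mixes every
    earlier position into position t in a single linear-time pass.
    """
    L, d = x.shape
    t = np.arange(L)[:, None]                  # (L, 1) time indices
    phasor = np.exp(1j * t * omega[None, :])   # (L, d) rotation per step
    # Rotate states into the complex domain, accumulate causally,
    # then counter-rotate back into each position's local frame.
    acc = np.cumsum(x * phasor, axis=0)        # causal prefix sum: O(L)
    return np.real(acc * np.conj(phasor))
```

Because the mixing is a prefix sum, the output at position t depends only on positions 0..t, which is what makes the mechanism causal and autoregressive-friendly.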

Executive Summary

The article introduces the Continuous Acoustic Wave Network (CAWN), an autoregressive language-modeling architecture that addresses the quadratic scaling of Transformer self-attention. By combining a causal, linear-time Phase Accumulation mechanism with a dual-gated Selective Phase Resonance mechanism, CAWN performs efficient sequence mixing while preventing signal degradation in ultra-long contexts. A 150M-parameter prototype, trained on a 100-billion-token corpus and evaluated at a 5-billion-token milestone, demonstrates robust vocabulary acquisition, extended contextual denoising, and targeted retrieval across 2,000,000 tokens at a flat 8.72 GB of peak VRAM. If these results hold at larger scales, the approach could substantially reduce the memory cost of long-context language modeling.

Key Points

  • CAWN introduces a novel Phase Accumulation mechanism for linear-time sequence mixing
  • Dual-gated Selective Phase Resonance mechanism prevents signal degradation in ultra-long contexts
  • Constant peak memory (8.72 GB VRAM over 2,000,000-token contexts) achieved through O(1) state-passing via chunked prefill
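The O(1) state-passing claim rests on the fact that a linear recurrence can resume from a fixed-size carried state instead of re-reading its history. The paper does not publish its chunked-prefill loop, so the following is a hedged sketch under assumed names (`chunked_prefill`, `omega`) of how such a loop could carry a constant-size state between chunks.

```python
import numpy as np

def chunked_prefill(chunks, omega, state=None):
    """Sketch: process a long sequence chunk by chunk, carrying only a
    fixed-size recurrent state so peak memory is O(1) in total length.

    chunks: iterable of (chunk_len, d) arrays
    omega:  (d,) per-channel angular frequencies (assumed learned)
    state:  (running complex accumulator, global time offset)
    """
    d = omega.shape[0]
    acc, t0 = (np.zeros(d, dtype=complex), 0) if state is None else state
    outputs = []
    for x in chunks:
        L = x.shape[0]
        t = (t0 + np.arange(L))[:, None]             # global positions
        phasor = np.exp(1j * t * omega[None, :])
        # Continue the running prefix sum from the previous chunk's state.
        prefix = acc + np.cumsum(x * phasor, axis=0)
        outputs.append(np.real(prefix * np.conj(phasor)))
        acc, t0 = prefix[-1], t0 + L                 # fixed-size carry-over
    return np.concatenate(outputs), (acc, t0)
```

Splitting the input into chunks leaves the result unchanged: only the (d,)-sized accumulator and a time offset cross chunk boundaries, which is why peak memory plateaus regardless of total context length.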

Merits

Strength in addressing Transformer scalability

CAWN's O(L) Phase Accumulation and Selective Phase Resonance mechanisms replace quadratic self-attention with linear-time sequence mixing, making multi-million-token contexts tractable without hitting the O(L²) memory wall.
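One ingredient of the Selective Phase Resonance mechanism named in the abstract is Hard-Threshold Gating trained via Straight-Through Estimation. The paper gives no code, so here is a generic sketch of that standard trick, with all function names hypothetical: the forward pass applies a non-differentiable step function, while the backward pass pretends the step was the identity so gradients can still flow.

```python
import numpy as np

def hard_threshold_gate(scores, tau=0.0):
    """Forward pass: binary gate, 1.0 where score > tau, else 0.0."""
    return (scores > tau).astype(scores.dtype)

def ste_backward(grad_out):
    """Backward pass via the straight-through estimator.

    The step function's true gradient is zero almost everywhere; the STE
    replaces it with the identity, passing upstream gradients through
    unchanged so the gating scores remain trainable.
    """
    return grad_out
```

In an autodiff framework this is usually written in one line as `hard + (soft - soft.detach())` or via a custom backward rule; the two functions above just make the forward/backward asymmetry explicit.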

Robust vocabulary acquisition and contextual denoising

CAWN demonstrates robust vocabulary acquisition and extended contextual denoising at its 5-billion-token training milestone, capabilities that are essential for real-world applications of language models.

Demerits

Limited evaluation scope

CAWN is evaluated only as a 150M-parameter prototype at a 5-billion-token milestone of its 100-billion-token training run, and primarily through a single Targeted Semantic Retrieval protocol; its performance on standard benchmarks and diverse real-world datasets remains unverified.

Computational requirements

CAWN's reliance on custom Triton kernels and true-complex float32 arithmetic ties it to specific hardware and tooling, which may limit widespread adoption and independent reproduction.

Expert Commentary

CAWN is a promising direction for autoregressive language modeling. Its Phase Accumulation and dual-gated Selective Phase Resonance mechanisms deliver linear-time sequence mixing while resisting signal degradation in ultra-long contexts, and the demonstrated retrieval across 2,000,000 tokens at a constant 8.72 GB of peak VRAM directly attacks the O(L²) context memory wall. However, the small model scale, early training milestone, and narrow evaluation protocol are notable limitations that must be addressed before these claims can be generalized.

Recommendations

  • Further evaluation of CAWN on a diverse range of real-world datasets
  • Investigation of the computational requirements of CAWN and potential optimizations for widespread adoption

Sources

Original: arXiv - cs.CL