
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space


Gowrav Vishwakarma, Christopher J. Agostino

arXiv:2604.05030v1 Abstract: We present Phase-Associative Memory (PAM), a recurrent sequence model in which all representations are complex-valued, associations accumulate in a matrix state $S_t \in \mathbb{C}^{d \times d}$ via outer products, and retrieval operates through the conjugate inner product $K_t^* \cdot Q_t / \sqrt{d}$. At $\sim$100M parameters on WikiText-103, PAM reaches validation perplexity 30.0, within $\sim$10\% of a matched transformer (27.1) trained under identical conditions, despite $4\times$ arithmetic overhead from complex computation and no custom kernels. We trace the experimental path from vector-state models, where holographic binding fails due to the $O(1/\sqrt{n})$ capacity degradation of superposed associations, to the matrix state that resolves it. The competitiveness of an architecture whose native operations are complex-valued superposition and conjugate retrieval is consistent with recent empirical evidence that semantic interpretation in both humans and large language models exhibits non-classical contextuality, and we discuss what this implies for the choice of computational formalism in language modeling.

Executive Summary

The paper introduces Phase-Associative Memory (PAM), a novel recurrent sequence model leveraging complex-valued representations to model associations via matrix states and conjugate inner product retrieval. Tested on WikiText-103 with ~100M parameters, PAM achieves a validation perplexity of 30.0—within 10% of a matched transformer (27.1)—despite a 4× arithmetic overhead from complex operations. The authors trace the evolution from vector-state models with holographic binding limitations to matrix-state solutions, arguing that PAM’s performance challenges classical computational formalisms in light of empirical evidence suggesting non-classical contextuality in human and machine language semantics. The work bridges cognitive science, quantum-inspired computation, and deep learning, offering a provocative lens on sequence modeling.
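To make the mechanism concrete, the sketch below is an illustrative reconstruction from the abstract, not the authors' implementation: the dimension, normalization, and unit-modulus phase vectors are assumptions. It accumulates complex key–value associations in a matrix state via outer products and retrieves with the conjugate of a stored key.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

def random_phase_vec(d, rng):
    # Unit-modulus complex vector, components e^{i*theta}, scaled to unit norm.
    theta = rng.uniform(0, 2 * np.pi, d)
    return np.exp(1j * theta) / np.sqrt(d)

# Matrix state: associations accumulate as outer products S_t = S_{t-1} + v_t k_t^*
S = np.zeros((d, d), dtype=np.complex128)
pairs = [(random_phase_vec(d, rng), random_phase_vec(d, rng)) for _ in range(5)]
for k, v in pairs:
    S += np.outer(v, np.conj(k))

# Retrieval: querying with a stored key returns (approximately) its value,
# because the conjugate inner product k_j^* . k_0 concentrates on j = 0.
k0, v0 = pairs[0]
out = S @ k0  # ~ v0 plus crosstalk from the other stored pairs
similarity = np.abs(np.vdot(v0, out)) / (np.linalg.norm(v0) * np.linalg.norm(out))
```

The point of the matrix state is that each association occupies its own rank-one component, so crosstalk between stored pairs stays small rather than compounding inside a single superposed vector.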

Key Points

  • Phase-Associative Memory (PAM) is a recurrent sequence model using complex-valued representations, matrix states (Sₜ ∈ ℂᵈˣᵈ), and conjugate inner product retrieval.
  • PAM achieves competitive perplexity (30.0) on WikiText-103 relative to a transformer (27.1) under identical conditions, despite a 4× computational overhead from complex arithmetic.
  • The paper critiques vector-state holographic binding models (O(1/√n) capacity degradation) and proposes matrix states as a solution, linking PAM’s design to emerging evidence of non-classical contextuality in language processing.
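The O(1/√n) degradation of vector-state binding cited above is easy to reproduce numerically. The sketch below uses a generic holographic-style binding via elementwise complex multiplication, assumed here for illustration rather than taken from the paper: it superposes n bound pairs in a single d-vector and measures how retrieval fidelity decays with n.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 256

def phase_vec(d, rng):
    # Random unit-modulus complex vector (one phase per component).
    return np.exp(1j * rng.uniform(0, 2 * np.pi, d))

def retrieval_similarity(n, d, rng):
    # Superpose n bound pairs in ONE d-vector (elementwise binding),
    # then unbind the first pair with the conjugate of its key.
    keys = [phase_vec(d, rng) for _ in range(n)]
    vals = [phase_vec(d, rng) for _ in range(n)]
    trace = sum(k * v for k, v in zip(keys, vals))
    est = trace * np.conj(keys[0])  # = v_0 + crosstalk from the other n-1 pairs
    return np.abs(np.vdot(vals[0], est)) / (
        np.linalg.norm(vals[0]) * np.linalg.norm(est))

sims = {n: retrieval_similarity(n, d, rng) for n in (1, 4, 16, 64)}
```

The retrieval similarity falls off roughly as 1/√n, which is exactly the capacity wall the matrix state is meant to remove.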

Merits

Theoretical Novelty

PAM introduces a mathematically rigorous framework combining complex-valued associative memory with recurrent sequence modeling, offering a fresh alternative to attention-based architectures.

Empirical Competitiveness

Despite the 4× arithmetic overhead and the absence of custom kernels, PAM comes within ~10% of a matched transformer's perplexity, suggesting that complex-valued superposition and conjugate retrieval are a viable substrate for language modeling rather than a curiosity.

Interdisciplinary Synthesis

The paper bridges cognitive science (non-classical contextuality), quantum-inspired computation, and deep learning, providing a unifying perspective on language modeling.

Demerits

Computational Overhead

The 4× arithmetic overhead from complex operations may limit scalability, particularly in low-latency or resource-constrained environments.
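The source of that overhead is straightforward: one complex matrix multiply decomposes into four real ones. A minimal numpy check (illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 32
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

# (Ar + i Ai)(Br + i Bi) = (Ar Br - Ai Bi) + i(Ar Bi + Ai Br): four real matmuls.
Ar, Ai, Br, Bi = A.real, A.imag, B.real, B.imag
real_part = Ar @ Br - Ai @ Bi
imag_part = Ar @ Bi + Ai @ Br
assert np.allclose(A @ B, real_part + 1j * imag_part)
```

A Gauss-style three-multiplication scheme can trade one matmul for extra additions, and fused custom kernels would recover some of the gap, but the asymptotic overhead relative to a real-valued model of the same width remains.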

Lack of Ablation Studies

The paper does not systematically compare PAM to other associative memory variants or conduct hyperparameter sweeps to isolate the impact of complex-valued operations.

Unclear Generalizability

Performance is demonstrated only on WikiText-103; broader evaluation across languages, domains, and model scales is needed to assess robustness.

Expert Commentary

The authors’ argument for non-classical computational formalisms in language modeling is compelling but warrants cautious scrutiny. While PAM’s performance is impressive, the reliance on complex-valued operations—historically niche in deep learning—introduces computational and theoretical challenges. The paper’s linkage to quantum contextuality is intriguing, but it risks overstating the implications without more direct evidence connecting PAM’s mechanics to cognitive or quantum phenomena. That said, the matrix-state approach to associative memory is a significant advancement over vector-state models, and the competitive perplexity suggests that complex-valued superposition may indeed capture semantic relationships more faithfully than classical embeddings. This work could catalyze a paradigm shift, but further research is needed to disentangle the contributions of complex arithmetic from the architectural innovations. The absence of ablation studies or broader benchmarks is a notable gap, and future work should address scalability and interpretability to fully realize PAM’s potential.

Recommendations

  • Conduct ablation studies to isolate the impact of complex-valued operations vs. architectural innovations (e.g., matrix-state retrieval) on performance.
  • Expand evaluation to diverse datasets, languages, and model scales to assess generalizability and robustness.
  • Explore hybrid architectures combining PAM with transformer components to mitigate computational overhead while retaining performance benefits.
  • Develop theoretical frameworks to connect PAM’s mechanics to cognitive or quantum-inspired models of language, enabling falsifiable predictions.
  • Investigate interpretability techniques for the complex-valued state matrix to enhance transparency and trust in PAM-based systems.

Sources

Original: arXiv - cs.CL