Additive Multi-Step Markov Chains and the Curse of Dimensionality in Large Language Models
arXiv:2603.04412v1 Announce Type: new Abstract: Large-scale language models (LLMs) operate in extremely high-dimensional state spaces, where both token embeddings and their hidden representations create complex dependencies that are not easily reduced to classical Markov structures. In this paper, we explore a theoretically feasible approximation of LLM dynamics using N-order additive Markov chains. Such models allow the conditional probability of the next token to be decomposed into a superposition of contributions from multiple historical depths, reducing the combinatorial explosion typically associated with high-order Markov processes. The main result of the work is the establishment of a correspondence between an additive multi-step chain and a chain with a step-wise memory function. This equivalence allowed the introduction of the concept of information temperature not only for stepwise but also for additive N-order Markov chains.
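The additive decomposition described in the abstract can be written schematically as follows (our notation, not necessarily the paper's exact formulation): the conditional probability of the next token is a superposition of per-depth contributions weighted by a memory function,

```latex
P(x_t = s \mid x_{t-1}, \dots, x_{t-N}) \;=\; \sum_{d=1}^{N} F(d)\, f_d\!\left(s \mid x_{t-d}\right),
```

where $F(d)$ is the step-wise memory function giving the weight of historical depth $d$ and each $f_d$ is a per-depth conditional distribution. Under this form the model stores $N$ pairwise tables of size $V^2$ (for vocabulary size $V$) instead of one table of size $V^{N+1}$, which is the sense in which the combinatorial explosion of a general $N$-order chain is avoided.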

Executive Summary
This article explores the application of additive multi-step Markov chains to large language models, providing a theoretically feasible approximation of their dynamics. By decomposing the conditional probability of the next token into contributions from multiple historical depths, the authors reduce the combinatorial explosion associated with high-order Markov processes. The main result establishes a correspondence between additive multi-step chains and chains with a step-wise memory function, which extends the concept of information temperature from step-wise chains to additive N-order Markov chains.
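A minimal toy sketch of the decomposition described above, assuming a simple construction in which each historical depth contributes its own conditional table and a fixed memory-function weight (the table values and weights here are illustrative, not taken from the paper):

```python
import numpy as np

# Toy additive N-order Markov chain: the next-token distribution is a
# weighted superposition of per-depth conditional tables, so storage
# grows as O(N * V^2) rather than the O(V^(N+1)) of a general N-order chain.

V = 4   # vocabulary size (toy)
N = 3   # order of the chain (memory depth)
rng = np.random.default_rng(0)

# One V x V conditional table per depth d = 1..N, rows normalized
# so each row is a probability distribution over the next token.
tables = rng.random((N, V, V))
tables /= tables.sum(axis=2, keepdims=True)

# Memory function F(d): weight of the contribution from depth d;
# weights sum to 1 so the superposition remains a distribution.
weights = np.array([0.5, 0.3, 0.2])

def next_token_probs(history):
    """P(s | history) = sum over d of F(d) * P_d(s | token at depth d)."""
    probs = np.zeros(V)
    for d in range(1, N + 1):
        probs += weights[d - 1] * tables[d - 1, history[-d]]
    return probs

p = next_token_probs([2, 0, 1])   # most recent token is 1
assert np.isclose(p.sum(), 1.0)   # superposition is a valid distribution
```

The design choice to normalize the memory-function weights is what keeps the additive superposition a proper probability distribution without any per-step renormalization.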
Key Points
- ▸ Introduction of additive multi-step Markov chains for large language models
- ▸ Decomposition of conditional probability into contributions from multiple historical depths
- ▸ Establishment of a correspondence between additive multi-step chains and chains with a step-wise memory function
Merits
Theoretical Foundation
The article provides a solid theoretical foundation for the application of additive multi-step Markov chains to large language models, addressing the curse of dimensionality.
Demerits
Computational Complexity
The implementation of additive multi-step Markov chains may still pose significant computational challenges, particularly for very large language models.
Expert Commentary
The article makes a significant contribution to natural language processing by offering a novel way to address the curse of dimensionality in large language models: approximating their dynamics with additive multi-step Markov chains whose parameter count grows with memory depth far more slowly than that of a general high-order chain. The established equivalence with step-wise memory functions also extends the notion of information temperature to additive N-order chains. Further research is needed, however, to test how faithfully such chains capture the dynamics of actual LLMs and to quantify the computational costs that remain at practical model scales.
Recommendations
- ✓ Further research into the application of additive multi-step Markov chains to large language models
- ✓ Exploration of the potential benefits and challenges of implementing this approach in real-world language model deployments