The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts
arXiv:2602.15843v1 Announce Type: cross Abstract: In "Compress or Route?" (Johnson, 2026), we found that code generation tolerates aggressive prompt compression (r >= 0.6) while chain-of-thought reasoning degrades gradually. That study was limited to HumanEval (164 problems), left the "perplexity paradox" mechanism unvalidated, and provided no adaptive algorithm. This paper addresses all three gaps. First, we validate across six code benchmarks (HumanEval, MBPP, HumanEval+, MultiPL-E) and four reasoning benchmarks (GSM8K, MATH, ARC-Challenge, MMLU-STEM), confirming the compression threshold generalizes across languages and difficulties. Second, we conduct the first per-token perplexity analysis (n=723 tokens), revealing a "perplexity paradox": code syntax tokens are preserved (high perplexity) while numerical values in math problems are pruned despite being task-critical (low perplexity). Signature injection recovers +34 percentage points in pass rate (5.3% to 39.3%; Cohen's h=0.890). Third, we propose TAAC (Task-Aware Adaptive Compression), achieving 22% cost reduction with 96% quality preservation, outperforming fixed-ratio compression by 7%. MBPP validation (n=1,800 trials) confirms systematic variation: 3.6% at r=0.3 to 54.6% at r=1.0.
Executive Summary
This article explores the 'perplexity paradox' in large language models (LLMs): code generation tolerates aggressive prompt compression, while math reasoning degrades. The study validates this phenomenon across multiple benchmarks, showing that high-perplexity code syntax tokens survive compression while low-perplexity numerical values in math problems are pruned despite being task-critical. The authors propose a task-aware adaptive compression algorithm (TAAC) that achieves a 22% cost reduction while preserving 96% of output quality. The findings have implications for building more cost-efficient LLM pipelines.
Key Points
- ▸ The 'perplexity paradox': code generation tolerates aggressive prompt compression (r >= 0.6), while chain-of-thought math reasoning degrades
- ▸ Code syntax tokens carry high perplexity and survive compression, while numerical values in math problems carry low perplexity and are pruned despite being task-critical
- ▸ The proposed task-aware adaptive compression algorithm achieves 22% cost reduction with 96% quality preservation
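The pruning mechanism behind the paradox can be illustrated with a minimal sketch. Perplexity-driven compressors keep the tokens a language model finds most surprising and drop the predictable ones; the token lists and perplexity scores below are illustrative assumptions, not measurements from the paper.

```python
# Minimal sketch of perplexity-based prompt pruning: keep the top fraction
# of tokens by perplexity score, preserving their original order.

def compress(tokens, scores, ratio):
    """Keep the top `ratio` fraction of tokens ranked by perplexity."""
    k = max(1, int(len(tokens) * ratio))
    keep = set(sorted(range(len(tokens)),
                      key=lambda i: scores[i], reverse=True)[:k])
    return [t for i, t in enumerate(tokens) if i in keep]

# Code prompt: rare identifiers score high, so they survive compression.
code_tokens = ["def", "solve", "(", "grid", ")", ":"]
code_scores = [2.1, 8.4, 1.5, 7.9, 1.4, 1.2]

# Math prompt: digits are predictable in context (low perplexity), so
# aggressive compression drops the task-critical numbers.
math_tokens = ["Alice", "has", "37", "apples", "and", "buys", "12", "more"]
math_scores = [6.2, 1.1, 0.8, 5.4, 0.9, 3.3, 0.7, 1.0]

print(compress(code_tokens, code_scores, 0.5))  # → ['def', 'solve', 'grid']
print(compress(math_tokens, math_scores, 0.5))  # "37" and "12" are pruned
```

At r = 0.5 the code prompt keeps its identifiers, while the math prompt loses both numbers, which is exactly the failure mode the paper attributes the reasoning degradation to.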
Merits
Comprehensive validation
The study validates the 'perplexity paradox' across multiple code and reasoning benchmarks, providing strong evidence that the compression threshold generalizes across languages and difficulty levels.
Demerits
Limited scope
The study focuses narrowly on the 'perplexity paradox' and its implications for prompt compression, without exploring broader applications or the limitations of the proposed approach.
Expert Commentary
The 'perplexity paradox' exposes a mismatch between what compression heuristics preserve and what tasks actually need: perplexity is a poor proxy for task-criticality in math prompts, where the most predictable tokens (the numbers) are the ones the answer depends on. The findings argue for task-aware rather than fixed-ratio compression, and the proposed TAAC algorithm is a promising mitigation, trading a 22% cost reduction for only a 4% quality loss. Open questions remain about the underlying mechanism and about how far these results transfer to other models and task families.
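The abstract reports that "signature injection" recovers +34 percentage points of pass rate after aggressive compression. The paper does not spell out the procedure here, so the sketch below is a hypothetical reading: re-inject task-critical tokens (numeric literals, in this case) that compression dropped. The extraction rule and output format are assumptions for illustration, not the authors' implementation.

```python
import re

# Matches integer and decimal literals, including negatives.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def inject_signature(original, compressed):
    """Append any numeric literals lost during compression back onto the
    compressed prompt, so the model still sees the task-critical values."""
    lost = [n for n in NUMBER.findall(original) if n not in compressed]
    if not lost:
        return compressed
    return compressed + "\n[values: " + ", ".join(lost) + "]"

prompt = "Alice has 37 apples and buys 12 more. How many apples?"
compressed = "Alice apples buys more. How many apples?"
print(inject_signature(prompt, compressed))
```

The repaired prompt carries "37" and "12" again at a cost of a few extra tokens, which is consistent with the large quality recovery the paper reports for a small loss of compression.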
Recommendations
- ✓ Further research into the underlying mechanisms and limitations of LLM prompt compression, particularly regarding the 'perplexity paradox'
- ✓ Development of more robust and adaptive compression algorithms that can effectively balance efficiency and quality preservation
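A task-aware compressor in the spirit of TAAC can be sketched in a few lines: choose the retention ratio r from a cheap task classifier instead of using one fixed ratio. The marker list and the specific ratios below are illustrative assumptions, not the authors' algorithm.

```python
# Toy task-aware ratio selection: code-like prompts tolerate aggressive
# compression (the paper's r >= 0.6 threshold), while math/reasoning
# prompts keep nearly everything to protect task-critical tokens.

CODE_MARKERS = ("def ", "return", "class ", "import ", "{", "};")

def choose_ratio(prompt):
    """Return the fraction of tokens to retain for this prompt."""
    is_code = any(m in prompt for m in CODE_MARKERS)
    return 0.6 if is_code else 0.95

print(choose_ratio("def add(a, b): return a + b"))           # → 0.6
print(choose_ratio("Alice has 37 apples and buys 12 more"))  # → 0.95
```

In a production pipeline the heuristic classifier would be replaced by a learned one, and the chosen ratio would feed the perplexity-based compressor; the point of the sketch is only that routing the ratio by task type is enough to avoid applying code-grade compression to math prompts.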