The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts
arXiv:2602.15843v1 Announce Type: cross Abstract: In "Compress or Route?" (Johnson, 2026), we found that code generation tolerates aggressive prompt compression (r >= 0.6) while chain-of-thought reasoning degrades gradually. That study was limited to HumanEval (164 problems), left the "perplexity paradox" mechanism unvalidated, and provided no adaptive algorithm. This paper addresses all three gaps. First, we validate across six code benchmarks (HumanEval, MBPP, HumanEval+, MultiPL-E) and four reasoning benchmarks (GSM8K, MATH, ARC-Challenge, MMLU-STEM), confirming the compression threshold generalizes across languages and difficulties. Second, we conduct the first per-token perplexity analysis (n=723 tokens), revealing a "perplexity paradox": code syntax tokens are preserved (high perplexity) while numerical values in math problems are pruned despite being task-critical (low perplexity). Signature injection recovers +34 percentage points in pass rate (5.3% to 39.3%; Cohen's h=0.890). Third, we propose TAAC (Task-Aware Adaptive Compression), achieving 22% cost reduction with 96% quality preservation, outperforming fixed-ratio compression by 7%. MBPP validation (n=1,800 trials) confirms systematic variation: 3.6% at r=0.3 to 54.6% at r=1.0.
Executive Summary
This article explores the 'perplexity paradox' in large language models (LLMs): code generation tolerates aggressive prompt compression, while math reasoning degrades. The study validates this phenomenon across multiple benchmarks, showing that high-perplexity code syntax tokens survive compression while low-perplexity numerical values in math problems are pruned despite being task-critical. The authors propose a task-aware adaptive compression algorithm (TAAC) that achieves a 22% cost reduction while preserving 96% of output quality. The findings have implications for building more cost-efficient LLM pipelines.
Key Points
- ▸ The 'perplexity paradox': code generation tolerates aggressive prompt compression (r >= 0.6), while chain-of-thought math reasoning degrades
- ▸ Code syntax tokens carry high perplexity and survive compression, while numerical values in math problems carry low perplexity and are pruned despite being task-critical
- ▸ The proposed task-aware adaptive compression algorithm achieves 22% cost reduction with 96% quality preservation
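The pruning mechanism behind the paradox can be illustrated with a minimal sketch. Perplexity-driven compressors keep the tokens a language model finds most surprising and drop the predictable ones; the token lists and perplexity scores below are illustrative assumptions, not measurements from the paper.

```python
# Minimal sketch of perplexity-based prompt pruning: keep the top fraction
# of tokens by perplexity score, preserving their original order.

def compress(tokens, scores, ratio):
    """Keep the top `ratio` fraction of tokens ranked by perplexity."""
    k = max(1, int(len(tokens) * ratio))
    keep = set(sorted(range(len(tokens)),
                      key=lambda i: scores[i], reverse=True)[:k])
    return [t for i, t in enumerate(tokens) if i in keep]

# Code prompt: rare identifiers score high, so they survive compression.
code_tokens = ["def", "solve", "(", "grid", ")", ":"]
code_scores = [2.1, 8.4, 1.5, 7.9, 1.4, 1.2]

# Math prompt: digits are predictable in context (low perplexity), so
# aggressive compression drops the task-critical numbers.
math_tokens = ["Alice", "has", "37", "apples", "and", "buys", "12", "more"]
math_scores = [6.2, 1.1, 0.8, 5.4, 0.9, 3.3, 0.7, 1.0]

print(compress(code_tokens, code_scores, 0.5))  # → ['def', 'solve', 'grid']
print(compress(math_tokens, math_scores, 0.5))  # "37" and "12" are pruned
```

At r = 0.5 the code prompt keeps its identifiers, while the math prompt loses both numbers, which is exactly the failure mode the paper attributes the reasoning degradation to.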
Merits
Comprehensive validation
The study validates the 'perplexity paradox' across multiple code and reasoning benchmarks, providing strong evidence that the compression threshold generalizes across languages and difficulty levels.
Demerits
Limited scope
The study focuses narrowly on the 'perplexity paradox' and its implications for prompt compression, without exploring broader applications or the limitations of the proposed approach.
Expert Commentary
The 'perplexity paradox' exposes a mismatch between what compression heuristics preserve and what tasks actually need: perplexity is a poor proxy for task-criticality in math prompts, where the most predictable tokens (the numbers) are the ones the answer depends on. The findings argue for task-aware rather than fixed-ratio compression, and the proposed TAAC algorithm is a promising mitigation, trading a 22% cost reduction for only a 4% quality loss. Open questions remain about the underlying mechanism and about how far these results transfer to other models and task families.
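The abstract reports that "signature injection" recovers +34 percentage points of pass rate after aggressive compression. The paper does not spell out the procedure here, so the sketch below is a hypothetical reading: re-inject task-critical tokens (numeric literals, in this case) that compression dropped. The extraction rule and output format are assumptions for illustration, not the authors' implementation.

```python
import re

# Matches integer and decimal literals, including negatives.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def inject_signature(original, compressed):
    """Append any numeric literals lost during compression back onto the
    compressed prompt, so the model still sees the task-critical values."""
    lost = [n for n in NUMBER.findall(original) if n not in compressed]
    if not lost:
        return compressed
    return compressed + "\n[values: " + ", ".join(lost) + "]"

prompt = "Alice has 37 apples and buys 12 more. How many apples?"
compressed = "Alice apples buys more. How many apples?"
print(inject_signature(prompt, compressed))
```

The repaired prompt carries "37" and "12" again at a cost of a few extra tokens, which is consistent with the large quality recovery the paper reports for a small loss of compression.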
Recommendations
- ✓ Further research into the underlying mechanisms and limitations of LLM prompt compression, particularly regarding the 'perplexity paradox'
- ✓ Development of more robust and adaptive compression algorithms that can effectively balance efficiency and quality preservation
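A task-aware compressor in the spirit of TAAC can be sketched in a few lines: choose the retention ratio r from a cheap task classifier instead of using one fixed ratio. The marker list and the specific ratios below are illustrative assumptions, not the authors' algorithm.

```python
# Toy task-aware ratio selection: code-like prompts tolerate aggressive
# compression (the paper's r >= 0.6 threshold), while math/reasoning
# prompts keep nearly everything to protect task-critical tokens.

CODE_MARKERS = ("def ", "return", "class ", "import ", "{", "};")

def choose_ratio(prompt):
    """Return the fraction of tokens to retain for this prompt."""
    is_code = any(m in prompt for m in CODE_MARKERS)
    return 0.6 if is_code else 0.95

print(choose_ratio("def add(a, b): return a + b"))           # → 0.6
print(choose_ratio("Alice has 37 apples and buys 12 more"))  # → 0.95
```

In a production pipeline the heuristic classifier would be replaced by a learned one, and the chosen ratio would feed the perplexity-based compressor; the point of the sketch is only that routing the ratio by task type is enough to avoid applying code-grade compression to math prompts.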