Breaking the Correlation Plateau: On the Optimization and Capacity Limits of Attention-Based Regressors
arXiv:2602.17898v1 Announce Type: new Abstract: Attention-based regression models are often trained by jointly optimizing Mean Squared Error (MSE) loss and Pearson correlation coefficient (PCC) loss, emphasizing the magnitude of errors and the order or shape of targets, respectively. A common but poorly understood phenomenon during training is the PCC plateau: PCC stops improving early in training, even as MSE continues to decrease. We provide the first rigorous theoretical analysis of this behavior, revealing fundamental limitations in both optimization dynamics and model capacity. First, regarding the flattened PCC curve, we uncover a critical conflict in which lowering MSE (magnitude matching) can paradoxically suppress the PCC gradient (shape matching). This issue is exacerbated by the softmax attention mechanism, particularly when the data to be aggregated is highly homogeneous. Second, we identify a limitation in model capacity: we derive a PCC improvement limit for any convex aggregator (including softmax attention), showing that the convex hull of the inputs strictly bounds the achievable PCC gain. We demonstrate that data homogeneity intensifies both limitations. Motivated by these insights, we propose the Extrapolative Correlation Attention (ECA), which incorporates novel, theoretically motivated mechanisms to improve PCC optimization and extrapolate beyond the convex hull. Across diverse benchmarks, including a challenging homogeneous-data setting, ECA consistently breaks the PCC plateau, achieving significant improvements in correlation without compromising MSE performance.
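The abstract describes joint MSE+PCC training: the MSE term penalizes error magnitude, while a correlation term rewards matching the shape of the targets. A minimal sketch of such a combined objective follows; the `alpha` weighting and the epsilon guard are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pcc(pred, target, eps=1e-8):
    """Pearson correlation coefficient between two 1-D arrays."""
    p = pred - pred.mean()
    t = target - target.mean()
    return float(np.dot(p, t) / (np.linalg.norm(p) * np.linalg.norm(t) + eps))

def joint_loss(pred, target, alpha=0.5):
    """Weighted sum of MSE (magnitude matching) and 1 - PCC (shape matching).

    alpha trades off the two terms; both vanish for a perfect prediction.
    """
    mse = float(np.mean((pred - target) ** 2))
    return alpha * mse + (1.0 - alpha) * (1.0 - pcc(pred, target))
```

Note that the PCC term is invariant to any positive affine rescaling of the prediction, which is exactly why it captures shape rather than magnitude; the MSE term pins down the scale.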
Executive Summary
The article analyzes the correlation-plateau phenomenon in attention-based regression models, in which the Pearson correlation coefficient (PCC) stops improving even as the Mean Squared Error (MSE) continues to decrease. The authors identify two limitations: an optimization conflict, where lowering MSE can suppress the PCC gradient, and a capacity limit, where the convex hull of the inputs bounds the achievable PCC gain. They propose the Extrapolative Correlation Attention (ECA) mechanism, which improves PCC optimization and extrapolates beyond the convex hull, breaking the plateau and achieving significant improvements in correlation without compromising MSE performance.
Key Points
- ▸ The PCC plateau phenomenon is a common but poorly understood issue in attention-based regression models
- ▸ Optimization dynamics and model capacity limitations contribute to the PCC plateau
- ▸ The proposed ECA mechanism improves PCC optimization and breaks the plateau, achieving significant correlation improvements
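The capacity limitation in the second key point can be seen directly: softmax weights are nonnegative and sum to one, so the attention output is a convex combination of the aggregated values and can never leave their convex hull. A small numerical check with hypothetical scalar values (not the paper's experimental setup):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
values = rng.normal(size=8)  # values to be aggregated

# Whatever the attention scores, the output stays inside [min, max] of
# the values: the convex-hull bound on any softmax aggregation.
for _ in range(1000):
    scores = rng.normal(size=8)
    out = softmax(scores) @ values
    assert values.min() <= out <= values.max()
```

For homogeneous data the values cluster tightly, so this interval shrinks, which is one intuition for why homogeneity intensifies the capacity limitation described above.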
Merits
Rigorous Theoretical Analysis
The article provides the first rigorous theoretical analysis of the PCC plateau phenomenon, offering valuable insights into optimization dynamics and model capacity limitations
Effective Solution
The proposed ECA mechanism demonstrates significant improvements in correlation without compromising MSE performance, making it a valuable contribution to the field
Demerits
Limited Scope
The article focuses primarily on attention-based regression models, which may limit its applicability to other areas of machine learning
Complexity
The theoretical analysis and proposed ECA mechanism may be complex and challenging to implement for some practitioners
Expert Commentary
The article provides a significant contribution to the understanding of attention-based regression models, shedding light on the PCC plateau phenomenon and proposing an effective solution. The authors' rigorous theoretical analysis and thorough experimentation demonstrate the value of their approach, which has the potential to improve the performance of various machine learning models. However, the complexity of the proposed ECA mechanism may require additional development and refinement to facilitate widespread adoption.
Recommendations
- ✓ Further research is needed to explore the applicability of the ECA mechanism to other areas of machine learning and to develop more efficient and scalable implementations
- ✓ Practitioners should consider the article's findings and proposed solution when developing and optimizing attention-based regression models