
Understanding the Dynamics of Demonstration Conflict in In-Context Learning

Difan Jiao, Di Wang, Lijie Hu

arXiv:2603.04464v1 Abstract: In-context learning enables large language models to perform novel tasks through few-shot demonstrations. However, demonstrations can naturally contain noise and conflicting examples, making this capability vulnerable. To understand how models process such conflicts, we study demonstration-dependent tasks requiring models to infer underlying patterns, a process we characterize as rule inference. We find that models suffer substantial performance degradation from a single demonstration with a corrupted rule. This systematic misleading behavior motivates our investigation of how models process conflicting evidence internally. Using linear probes and logit lens analysis, we discover that under corruption models encode both correct and incorrect rules in intermediate layers but develop prediction confidence only in late layers, revealing a two-phase computational structure. We then identify attention heads for each phase underlying the reasoning failures: Vulnerability Heads in early-to-middle layers exhibit positional attention bias with high sensitivity to corruption, while Susceptible Heads in late layers significantly reduce support for correct predictions when exposed to the corrupted evidence. Targeted ablation validates our findings, with masking a small number of identified heads improving performance by over 10%.

Executive Summary

This article examines demonstration conflict in in-context learning, the setting in which large language models perform novel tasks from a few demonstrations supplied in the prompt, without any parameter updates. The researchers find that models suffer substantial performance degradation when a single demonstration carries a corrupted rule, a systematic misleading effect rather than random noise. Using linear probes and logit lens analysis, they uncover a two-phase computational structure: models encode both the correct and the corrupted rule in intermediate layers, but develop prediction confidence only in late layers. Vulnerability Heads and Susceptible Heads are identified as the attention heads responsible for the reasoning failures, and masking a small number of them improves performance by over 10%. The findings clarify how language models weigh conflicting evidence and offer concrete levers for improving the robustness of in-context learning.
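
To make the setup concrete, here is a minimal sketch of a rule-inference prompt with one corrupted demonstration. The task (mapping a word to its uppercase form) and the corruption scheme are hypothetical stand-ins chosen for illustration; the paper's actual tasks and corruption procedure may differ.

```python
# Minimal sketch of demonstration conflict: a few-shot rule-inference prompt in
# which a single demonstration's label violates the underlying rule.
# The task (word -> uppercase) and the corruption are illustrative stand-ins,
# not the exact setup used in the paper.

def build_prompt(pairs, corrupt_index=None):
    """Format (input, output) pairs as few-shot demonstrations.

    If corrupt_index is given, that demonstration's output is replaced with a
    label that breaks the rule, creating a conflicting demonstration.
    """
    lines = []
    for i, (x, y) in enumerate(pairs):
        if i == corrupt_index:
            y = x[::-1]  # corrupted label: reversed and lowercase, violating the rule
        lines.append(f"Input: {x}\nOutput: {y}")
    lines.append("Input: table\nOutput:")  # query the model must complete
    return "\n\n".join(lines)

demos = [("cat", "CAT"), ("house", "HOUSE"), ("river", "RIVER"), ("stone", "STONE")]
clean_prompt = build_prompt(demos)                       # all demonstrations follow the rule
corrupted_prompt = build_prompt(demos, corrupt_index=1)  # one demonstration conflicts
print(corrupted_prompt)
```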

Key Points

  • Language models suffer significant performance degradation when presented with a single corrupted demonstration.
  • The study identified a two-phase computational structure: both correct and incorrect rules are encoded in intermediate layers, while prediction confidence develops only in late layers.
  • Vulnerability Heads (early-to-middle layers, showing positional attention bias and high sensitivity to corruption) and Susceptible Heads (late layers, which drop support for correct predictions under corruption) were identified as the attention heads responsible for the reasoning failures; masking a small number of them improves performance by over 10% (see the ablation sketch after this list).
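
To illustrate the targeted-ablation idea, the sketch below zeroes out a handful of attention heads in GPT-2 via the head_mask argument of Hugging Face's GPT-2 implementation and compares the log-probability of a rule-consistent completion with and without the mask. The model, prompt, and (layer, head) indices are placeholders; identifying the actual Vulnerability and Susceptible Heads requires the paper's own analysis.

```python
# Rough sketch of targeted attention-head ablation via Hugging Face's head_mask.
# The model, prompt, and (layer, head) indices are illustrative placeholders,
# not the heads identified in the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Few-shot prompt with one corrupted demonstration ("dog" -> "bird").
prompt = "Input: cat\nOutput: CAT\n\nInput: dog\nOutput: bird\n\nInput: sun\nOutput:"
target = " SUN"  # completion that follows the intended (uppercase) rule

heads_to_mask = [(3, 5), (10, 2)]  # hypothetical (layer, head) pairs

def target_logprob(head_mask=None):
    """Log-probability of the first target token, optionally with heads masked."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_id = tokenizer(target).input_ids[0]
    with torch.no_grad():
        logits = model(input_ids, head_mask=head_mask).logits
    return torch.log_softmax(logits[0, -1], dim=-1)[target_id].item()

mask = torch.ones(model.config.n_layer, model.config.n_head)
for layer, head in heads_to_mask:
    mask[layer, head] = 0.0  # remove this head's contribution

print("baseline log p(correct):", target_logprob())
print("ablated  log p(correct):", target_logprob(head_mask=mask))
```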

Merits

Strength of Methodology

The study pairs linear probes, which test whether the correct and the corrupted rule are linearly decodable from intermediate activations, with logit lens analysis, which tracks how the model's prediction confidence develops layer by layer, giving a layer-resolved view of how conflicting evidence is processed internally.
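
For readers less familiar with these tools, the sketch below shows the generic logit-lens operation on GPT-2 using Hugging Face Transformers: each layer's hidden state at the final position is pushed through the final layer norm and the unembedding to read off what the model would predict at that depth. A linear probe works analogously, but fits a small classifier (e.g., logistic regression) on the same hidden states instead of reusing the unembedding. This is a general illustration of the technique, not a reproduction of the paper's pipeline.

```python
# Generic logit-lens sketch: project each layer's residual stream through the
# final layer norm and unembedding to see the layer-by-layer "prediction".
# This illustrates the technique in general, not the paper's exact setup.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "Input: cat\nOutput: CAT\n\nInput: sun\nOutput:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model(input_ids, output_hidden_states=True)

# outputs.hidden_states holds (n_layer + 1) tensors of shape [1, seq_len, d_model]
for layer, hidden in enumerate(outputs.hidden_states):
    resid = hidden[0, -1]                                   # residual stream at the last position
    logits = model.lm_head(model.transformer.ln_f(resid))   # logit-lens projection
    top_token = tokenizer.decode(logits.argmax().item())
    print(f"layer {layer:2d}: top prediction = {top_token!r}")
```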

Insights into Language Model Behavior

By localizing the failure to specific attention heads and layers, the findings turn a behavioral weakness of in-context learning into a mechanistic account, and they suggest concrete interventions, such as targeted head masking, for improving model robustness and reliability under noisy demonstrations.

Demerits

Limited Generalizability

The study's findings may not generalize to other types of language models or tasks, limiting the broader applicability of the research.

Simplistic Demonstration Design

The study's use of simplistic demonstration designs may not accurately reflect real-world scenarios, potentially limiting the study's external validity.

Expert Commentary

The study is a meaningful contribution to natural language processing and mechanistic interpretability, tracing a concrete failure mode of in-context learning to identifiable attention heads. Its methodology and results point toward practical interventions, such as targeted head masking, for making language models more robust and reliable. However, the limitations noted above, in particular the synthetic demonstration design and the uncertain generalizability across model families and tasks, should be weighed carefully in follow-up work.

Recommendations

  • Future research should focus on making in-context learning more robust and more explainable under conflicting evidence, for example by testing whether the identified two-phase structure and targeted head-masking interventions hold across model families, scales, and task types.
  • Researchers should consider more complex and diverse demonstration designs to improve the study's external validity and generalizability.
