Training In-Context and In-Weights Mixtures Via Contrastive Context Sampling

arXiv:2604.01601v1. Abstract: We investigate training strategies that co-develop in-context learning (ICL) and in-weights learning (IWL), and the ability to switch between them based on context relevance. Although current LLMs exhibit both modes, standard task-specific fine-tuning often erodes ICL, motivating IC-Train: fine-tuning with in-context examples. Prior work has shown that the emergence of ICL after IC-Train depends on factors such as task diversity and training duration. In this paper we show that the similarity structure between target inputs and context examples also plays an important role. Random contexts lead to loss of ICL and IWL dominance, while contexts containing only similar examples cause ICL to degenerate into copying labels without regard to relevance. To address this, we propose a simple Contrastive-Context scheme which enforces two types of contrast: (1) a mix of similar and random examples within a context, to evolve a correct form of ICL, and (2) varying grades of similarity across contexts, to evolve ICL-IWL mixtures. We present insights on the importance of such contrast through theoretical analysis of a minimal model, and validate them with extensive empirical evaluation on four LLMs and several tasks. Diagnostic probes confirm that contrasted contexts yield stable ICL-IWL mixtures, avoiding collapse into pure ICL, IWL, or copying.

Executive Summary

This paper advances the understanding of training strategies for large language models (LLMs) by examining the interplay between in-context learning (ICL) and in-weights learning (IWL). The authors identify that traditional fine-tuning can degrade ICL, and propose a novel approach, Contrastive-Context training, to mitigate this issue. Their method introduces controlled contrasts in training examples—mixing similar and random contexts and varying similarity grades across contexts—to stabilize the balance between ICL and IWL. Theoretical and empirical evaluations across multiple LLMs and tasks demonstrate that this approach prevents collapse into pure ICL, IWL, or label-copying behaviors. The work contributes to the broader discourse on optimizing LLM training paradigms for robust and adaptive learning.

Key Points

  • The paper highlights the erosion of in-context learning (ICL) during standard fine-tuning and motivates the need for IC-Train to preserve ICL capabilities.
  • The authors demonstrate that the similarity structure between target inputs and context examples critically influences the balance between ICL and in-weights learning (IWL).
  • A novel Contrastive-Context training strategy is proposed, incorporating mixed and graded similarity contrasts to stabilize ICL-IWL mixtures and prevent degenerate behaviors such as label-copying.
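The two contrasts in the last point can be pictured as a simple sampling routine. The sketch below is illustrative only: the names (`contrastive_context`, `similar_frac`) and the choice of cosine similarity over embeddings are assumptions, not the paper's actual implementation. Mixing top-ranked and random pool examples within a context realizes the first contrast; varying `similar_frac` from batch to batch realizes the second.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def contrastive_context(target_emb, pool, k, similar_frac, rng=random):
    """Sample a k-example context for one target input.

    pool: list of (embedding, example) pairs.
    similar_frac: fraction of the context drawn from the examples most
    similar to the target; the remainder is sampled uniformly at random.
    Varying similar_frac across contexts gives graded similarity.
    """
    ranked = sorted(pool, key=lambda p: cosine(target_emb, p[0]), reverse=True)
    n_sim = round(k * similar_frac)
    similar = [ex for _, ex in ranked[:n_sim]]          # similar portion
    rest = [ex for _, ex in ranked[n_sim:]]
    rand = rng.sample(rest, k - n_sim)                  # random portion
    ctx = similar + rand
    rng.shuffle(ctx)                                    # no positional cue
    return ctx
```

A training loop would then draw `similar_frac` itself from a distribution (e.g. uniform over {0, 0.25, 0.5, 0.75, 1}) so the model sees contexts of every similarity grade.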

Merits

Innovative Training Paradigm

The introduction of Contrastive-Context training represents a novel approach to co-developing ICL and IWL, addressing a critical gap in current LLM training methodologies.

Theoretical and Empirical Rigor

The paper combines theoretical analysis with extensive empirical validation across multiple LLMs and tasks, providing robust evidence for the efficacy of the proposed method.

Addressing Collapse Phenomena

The work effectively tackles the problem of training collapse into pure ICL, IWL, or copying behaviors, which is a significant challenge in the field.

Demerits

Scalability and Generalization

The empirical validation is limited to four LLMs and specific tasks, raising questions about the scalability and generalization of Contrastive-Context training to broader or more complex scenarios.

Computational Overhead

The proposed method may introduce additional computational overhead due to the need for carefully curated contrastive examples, which could be a barrier for resource-constrained settings.

Lack of Benchmark Comparisons

The paper does not comprehensively compare Contrastive-Context training against state-of-the-art alternatives, leaving its relative performance against other advanced training strategies unclear.

Expert Commentary

This paper makes a significant contribution to the field by addressing a critical challenge in LLM training: the preservation and co-development of in-context and in-weights learning mechanisms. The authors’ theoretical analysis of a minimal model provides a compelling foundation for their empirical findings, which are robust and well-validated. The introduction of Contrastive-Context training is particularly noteworthy, as it offers a principled approach to avoiding the collapse into degenerate behaviors that plague many current training paradigms. However, the scalability of the method to larger or more diverse models and tasks remains an open question. Additionally, the computational overhead of curating contrastive examples may limit adoption in resource-constrained environments. Nevertheless, the paper’s insights into the role of similarity structure in training contexts are likely to inspire further research in this area, with potential implications for curriculum learning, prompt engineering, and model interpretability. The work underscores the importance of balancing plasticity and stability in LLM training, a theme that resonates with broader debates in AI safety and robustness.

Recommendations

  • Further research should explore the scalability of Contrastive-Context training across a wider range of models, tasks, and domains to validate its generalizability and identify potential edge cases.
  • Developers and practitioners should experiment with Contrastive-Context training in their fine-tuning pipelines, particularly in scenarios where maintaining ICL capabilities is critical, and document its impact on model performance and robustness.
  • Future work could investigate automated or semi-automated methods for generating contrastive examples to reduce the manual curation overhead and improve scalability.
  • The research community should consider integrating similarity structure analysis into benchmarking frameworks for LLM training to better understand its impact on model capabilities across different tasks.
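As a concrete illustration of what such a diagnostic could look like, the sketch below probes context-following versus weights-reliance by deliberately relabeling and randomizing contexts. All names here (`icl_iwl_probe`, `predict`, `flipped_ctx`, `random_ctx`) are hypothetical stand-ins, not the probes used in the paper.

```python
def icl_iwl_probe(predict, targets, flipped_ctx, random_ctx):
    """Crude two-number diagnostic for ICL vs. IWL reliance.

    predict(context, x) -> label; targets: list of (x, true_label);
    flipped_ctx(x) builds a context whose labels are deliberately wrong;
    random_ctx(x) builds an irrelevant context.
    Returns (follow, hold):
    - follow: fraction of predictions that move with the relabeled
      context (high under ICL or label-copying behavior);
    - hold: fraction still correct under an irrelevant context
      (high when in-weights knowledge dominates).
    """
    follow = sum(predict(flipped_ctx(x), x) != y for x, y in targets)
    hold = sum(predict(random_ctx(x), x) == y for x, y in targets)
    n = len(targets)
    return follow / n, hold / n
```

A pure in-weights model would score near (0, 1), a pure label-copier near (1, chance); a healthy ICL-IWL mixture sits between the two extremes.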

Sources

Original: arXiv - cs.LG