CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training
arXiv:2603.06610v1 Abstract: Large language model (LLM) post-training enhances latent skills, unlocks value alignment, improves performance, and enables domain adaptation. Unfortunately, post-training is known to induce forgetting, especially in the ubiquitous use case of leveraging third-party pre-trained models, and forgetting is typically understood as a loss of parametric or factual knowledge. We argue that this accuracy-centric view is insufficient for modern foundation models and instead define forgetting as systematic model drift that degrades behavior and user experience. In this context, we introduce CapTrack, a capability-centric framework for analyzing forgetting in LLMs that combines a behavioral taxonomy with an evaluation suite built on established benchmarks and targeted adaptations. Using CapTrack, we conduct a large-scale empirical study across post-training algorithms, domains, and model families, including models up to 80B parameters. We find that forgetting extends beyond parametric knowledge, with pronounced drift in robustness and default behaviors. Instruction fine-tuning induces the strongest relative drift, while preference optimization is more conservative and can partially recover lost capabilities. Differences across model families persist, and no universal mitigation emerges.
Executive Summary
This study introduces CapTrack, a capability-centric framework for evaluating forgetting induced by post-training in large language models (LLMs). The authors argue that the traditional accuracy-centric view of forgetting is insufficient and instead propose a behavioral taxonomy for assessing model drift that degrades behavior and user experience. They conduct a large-scale empirical analysis across post-training algorithms, domains, and model families, including models up to 80B parameters. The findings reveal that forgetting extends beyond parametric knowledge, with pronounced drift in robustness and default behaviors. These results carry significant implications for the development and deployment of LLMs, highlighting the need for more comprehensive evaluation frameworks and mitigation strategies.
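The paper's scoring pipeline is not reproduced here, but the core measurement, comparing a base model against its post-trained variant per capability category, can be sketched. The following is a minimal Python illustration assuming per-capability benchmark scores are already computed; the capability names are taken from the abstract, while the `DriftReport` structure and the relative-drift formula are our assumptions rather than the authors' exact formulation.

```python
from dataclasses import dataclass

# Capability categories named in the abstract (parametric knowledge,
# robustness, default behaviors); a real taxonomy would be richer.
CAPABILITIES = ["parametric_knowledge", "robustness", "default_behaviors"]


@dataclass
class DriftReport:
    capability: str
    base_score: float
    post_score: float
    relative_drift: float  # negative = forgetting, positive = improvement


def capability_drift(base: dict[str, float],
                     post: dict[str, float]) -> list[DriftReport]:
    """Compare a base model with its post-trained variant per capability.

    Scores are benchmark metrics in [0, 1], aggregated per capability;
    drift is expressed relative to the base model's score.
    """
    reports = []
    for cap in CAPABILITIES:
        b, p = base[cap], post[cap]
        drift = (p - b) / b if b > 0 else 0.0
        reports.append(DriftReport(cap, b, p, drift))
    return reports


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    base = {"parametric_knowledge": 0.62, "robustness": 0.55,
            "default_behaviors": 0.70}
    post = {"parametric_knowledge": 0.60, "robustness": 0.41,
            "default_behaviors": 0.52}
    for r in capability_drift(base, post):
        print(f"{r.capability}: {r.relative_drift:+.1%}")
```

A per-capability relative measure like this makes the paper's headline finding legible: a post-trained model can hold its factual scores nearly flat while robustness and default behaviors drift sharply.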
Key Points
- ▸ CapTrack evaluates post-training forgetting in LLMs through a behavioral taxonomy paired with an evaluation suite built on established benchmarks.
- ▸ Forgetting extends beyond parametric knowledge, with pronounced drift in robustness and default behaviors.
- ▸ Instruction fine-tuning induces the strongest relative drift, while preference optimization is more conservative and can partially recover lost capabilities (compared in the sketch below).
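The contrast drawn in the last point can be made concrete by aggregating per-capability drift per post-training method. The sketch below assumes drift values shaped like the output of `capability_drift` above; the method names mirror the abstract, but all numbers and the -10% flagging threshold are illustrative, not results from the paper.

```python
from statistics import mean

# Hypothetical relative-drift values for two post-training methods,
# keyed by the capability categories used above. Values are made up
# to mimic the abstract's qualitative finding.
drift_by_method = {
    "instruction_fine_tuning": {"parametric_knowledge": -0.03,
                                "robustness": -0.25,
                                "default_behaviors": -0.22},
    "preference_optimization": {"parametric_knowledge": -0.01,
                                "robustness": -0.06,
                                "default_behaviors": +0.02},
}

# Order methods from most to least conservative (mean drift closest to
# zero) and flag capabilities with pronounced degradation.
for method, drifts in sorted(drift_by_method.items(),
                             key=lambda kv: abs(mean(kv[1].values()))):
    flagged = [cap for cap, d in drifts.items() if d < -0.10]
    print(f"{method}: mean drift {mean(drifts.values()):+.1%}, "
          f"pronounced drift in {', '.join(flagged) if flagged else 'none'}")
```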
Merits
Strength in Methodology
The authors develop a comprehensive framework for evaluating forgetting in LLMs, addressing a significant gap in the field.
Strength in Rigor
The study conducts a large-scale empirical analysis across various post-training algorithms, domains, and model families, providing robust findings.
Demerits
Limitation in Generalizability
The findings span several model families and post-training setups but may not transfer to all LLMs or training regimes; the authors themselves report that differences across model families persist and that no universal mitigation emerges, so further research is needed.
Limitation in Scalability
Running the full evaluation suite across post-training algorithms, domains, and model families, including models up to 80B parameters, is computationally intensive, which may limit the framework's practical applicability.
Expert Commentary
The CapTrack framework represents a significant advance in the evaluation of forgetting induced by LLM post-training. By shifting the focus from an accuracy-centric view to a behavioral taxonomy of capability drift, the authors provide a more comprehensive picture of how post-training reshapes model behavior. While the study's findings are robust, the limitations in generalizability and scalability point to the need for further research. As the field evolves, it will be essential to address the impact of forgetting on user experience and default behavior so that LLMs are developed and deployed responsibly.
Recommendations
- ✓ Future research should focus on making capability-centric evaluation frameworks such as CapTrack more scalable and computationally efficient to facilitate widespread adoption.
- ✓ LLM developers and researchers should prioritize the development of more targeted optimization strategies to mitigate forgetting, particularly in instruction fine-tuning and preference optimization.