Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
arXiv:2603.03314v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated remarkable and steadily improving performance across a wide range of tasks. However, LLM performance can be highly sensitive to prompt variations, especially in scenarios with limited openness or strict output-formatting requirements, indicating insufficient robustness. In real-world applications, user prompts provided to LLMs often contain imperfections, which may undermine the quality of the model's responses. To address this issue, previous work has primarily focused on preprocessing prompts, employing external tools or even LLMs to refine prompt formulations in advance. However, these approaches overlook the intrinsic robustness of LLMs, and their reliance on external components introduces additional computational overhead and uncertainty. In this work, we propose a Contrastive Learning-based Inverse Direct Preference Optimization (CoIPO) method that minimizes the discrepancy between the label-aligned logits produced by the model under a clean prompt and its noisy counterpart, and we conduct a detailed analysis grounded in mutual information theory. We augment the FLAN dataset by constructing paired prompts for training, each pair consisting of a clean prompt and its corresponding noisy version. Additionally, to evaluate its effectiveness, we develop NoisyPromptBench, a benchmark derived from and extending the existing PromptBench. Experimental results on NoisyPromptBench demonstrate that our proposed method achieves a significant improvement in average accuracy over current state-of-the-art approaches. The source code of CoIPO, the pair-wise FLAN datasets, and NoisyPromptBench have been released at https://github.com/vegetable-yx/CoIPO.
Executive Summary
This article presents CoIPO, a novel method for enhancing the intrinsic robustness of large language models (LLMs) against prompt noise. CoIPO uses a contrastive learning objective to minimize the discrepancy between the label-aligned logits the model produces under a clean prompt and under its noisy counterpart, improving the model's ability to answer correctly despite imperfect inputs. The authors also develop a benchmark, NoisyPromptBench, and report experiments showing that CoIPO achieves significant gains in average accuracy over state-of-the-art approaches, addressing limitations of existing preprocessing-based robustness techniques. The CoIPO framework and related datasets are publicly available, facilitating further research and adoption.
Key Points
- ▸ CoIPO proposes a contrastive learning-based method to enhance intrinsic robustness of LLMs against prompt noise.
- ▸ The method minimizes the discrepancy between the label-aligned logits produced under clean and noisy prompts, with a supporting analysis grounded in mutual information theory.
- ▸ NoisyPromptBench, a benchmark derived from PromptBench, is developed to evaluate the effectiveness of CoIPO.
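The core training signal described above can be illustrated with a minimal sketch. The paper's actual CoIPO objective (an inverse-DPO formulation) is not reproduced here; the function names and the use of a symmetric KL divergence over label logits are illustrative assumptions, showing only the general idea of penalizing divergence between a model's answer distribution under a clean prompt and under its noisy counterpart.

```python
import math

def softmax(logits):
    """Convert raw logits over candidate labels to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_loss(clean_logits, noisy_logits):
    """Symmetric KL between the label distributions the model produces
    under the clean prompt and under its noisy counterpart. Driving this
    toward zero encourages the model to answer the same way regardless
    of surface-level prompt noise (hypothetical stand-in for CoIPO's loss)."""
    p = softmax(clean_logits)
    q = softmax(noisy_logits)
    return 0.5 * (kl(p, q) + kl(q, p))
```

In a real fine-tuning loop, a term like this would be computed per training pair and combined with the model's ordinary task loss, so robustness is learned by the model itself rather than bolted on at inference time.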
Merits
Robustness Enhancement
CoIPO improves the intrinsic robustness of LLMs against prompt noise, enabling them to generate accurate responses even in the presence of imperfections.
Efficiency
The contrastive learning approach in CoIPO eliminates the need for external tools or preprocessing, reducing computational overhead and uncertainty.
Scalability
Because robustness is learned during training rather than applied per query at inference time, CoIPO scales to large training datasets and carries over to varied applications without added serving cost.
Demerits
Data Requirements
CoIPO requires paired prompts with clean and noisy versions for training, which may be challenging to obtain in certain scenarios.
Model Complexity
The proposed objective adds complexity to the fine-tuning procedure, and optimizing for noise invariance could affect the model's performance on clean inputs.
Expert Commentary
This article makes a significant contribution to the field of language model robustness. CoIPO's approach of building noise resistance into the model itself is a promising direction for designing and deploying language models in real-world applications. While the method requires paired prompts for training, it eliminates the need for external tools and preprocessing, making it more efficient at inference time. The development of NoisyPromptBench provides a valuable benchmark for evaluating CoIPO and other robustness techniques. However, further research is needed to quantify the training complexity CoIPO introduces and to explore its applicability across domains.
Recommendations
- ✓ Researchers should investigate the application of CoIPO in high-stakes domains, such as healthcare and finance, to ensure the development of robust language models.
- ✓ Developers should consider incorporating CoIPO into existing language models to enhance their intrinsic robustness against prompt noise.