Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective
arXiv:2602.15856v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) effectively grounds Large Language Models (LLMs) with external knowledge and is widely applied to Web-related tasks. However, its scalability is hindered by excessive context length and redundant retrievals. Recent research on soft context compression aims to address this by encoding long documents into compact embeddings, yet these methods often underperform non-compressed RAG due to their reliance on auto-encoder-like full compression, which forces the encoder to compress all document information regardless of its relevance to the input query. In this work, we analyze this paradigm and reveal two fundamental limitations: (I) Infeasibility: full compression conflicts with the LLM's downstream generation behavior; and (II) Non-necessity: full compression is unnecessary and dilutes task-relevant information density. Motivated by these insights, we introduce SeleCom, a selector-based soft compression framework for RAG that redefines the encoder's role as a query-conditioned information selector. The selector is decoder-only and is trained on a massive, diverse, and difficulty-graded synthetic QA dataset with curriculum learning. Extensive experiments show that SeleCom significantly outperforms existing soft compression approaches and achieves competitive or superior performance relative to non-compression baselines, while reducing computation and latency by 33.8%~84.6%.
Executive Summary
This article presents a novel soft compression framework, SeleCom, for Retrieval-Augmented Generation (RAG) that addresses the limitations of existing soft compression approaches. By redefining the encoder's role as a query-conditioned information selector, SeleCom significantly outperforms existing methods and achieves competitive or superior performance to non-compression baselines while reducing computation and latency. The framework is trained with a massive, diverse, and difficulty-graded synthetic QA dataset using curriculum learning. The article's findings have important implications for the development of more efficient and effective RAG models, particularly in web-related tasks where scalability is a significant concern.
Key Points
- ▸ RAG's scalability is hindered by excessive context length and redundant retrievals
- ▸ Existing soft compression approaches often underperform non-compressed RAG due to their reliance on auto-encoder-like full-compression
- ▸ SeleCom introduces a selector-based soft compression framework that redefines the encoder's role as query-conditioned information selector
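The core idea behind query-conditioned selection can be illustrated independently of the paper's architecture. The toy sketch below (not SeleCom's actual decoder-based selector, whose details the abstract does not specify) scores each document token embedding against the query embedding and keeps only the top-k, in contrast to full compression, which would try to preserve all 128 embeddings. The function name `query_conditioned_select` and the dot-product scoring are illustrative assumptions.

```python
import numpy as np


def query_conditioned_select(doc_embs: np.ndarray,
                             query_emb: np.ndarray,
                             k: int) -> np.ndarray:
    """Keep only the k document embeddings most relevant to the query.

    doc_embs : (n_tokens, dim) document token embeddings
    query_emb: (dim,) query embedding
    Returns a (k, dim) compressed context, preserving original token order.
    """
    # Relevance score of each document embedding w.r.t. the query.
    scores = doc_embs @ query_emb
    # Indices of the k highest-scoring embeddings, restored to document order.
    top = np.sort(np.argsort(scores)[-k:])
    return doc_embs[top]


rng = np.random.default_rng(0)
doc = rng.normal(size=(128, 16))    # 128 "token" embeddings of dim 16
query = rng.normal(size=16)

compressed = query_conditioned_select(doc, query, k=8)
print(compressed.shape)  # (8, 16): a 16x reduction in context length
```

The point of the sketch is the asymmetry it makes explicit: the compression budget is spent only on query-relevant content, so the retained embeddings have higher task-relevant information density than a full-compression encoding of the same size.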
Merits
Strength
SeleCom's query-conditioned selector framework allows for more efficient and effective compression of context information, leading to improved performance and reduced computation and latency.
Demerits
Limitation
The article's findings are based on a specific synthetic QA dataset, and it is unclear whether SeleCom would perform equally well on other types of datasets or in different task settings.
Expert Commentary
This article makes a significant contribution to the field of language modeling and RAG by introducing a novel soft compression framework that addresses the limitations of existing approaches. SeleCom's query-conditioned selector framework has the potential to improve both the efficiency and the effectiveness of RAG models, particularly in web-related tasks. The findings are supported by extensive, well-designed experiments and have important implications for building more efficient language-model pipelines. However, further research is needed to fully explore the capabilities and limitations of SeleCom and to determine its applicability to other types of datasets and task settings.
Recommendations
- ✓ Future work should probe SeleCom's generalization beyond the synthetic QA training distribution, evaluating it on other dataset types and task settings.
- ✓ Development of more efficient and effective RAG pipelines built on large language models should be prioritized to improve the scalability of web-related tasks.