Quality Over Clicks: Intrinsic Quality-Driven Iterative Reinforcement Learning for Cold-Start E-Commerce Query Suggestion
arXiv:2603.22922v1. Abstract: Existing dialogue systems rely on Query Suggestion (QS) to enhance user engagement. Recent efforts typically employ large language models with Click-Through Rate (CTR) model, yet fail in cold-start scenarios due to their heavy reliance on abundant online click data for effective CTR model training. To bridge this gap, we propose Cold-EQS, an iterative reinforcement learning framework for Cold-Start E-commerce Query Suggestion (EQS). Specifically, we leverage answerability, factuality, and information gain as reward to continuously optimize the quality of suggested queries. To continuously optimize our QS model, we estimate uncertainty for grouped candidate suggested queries to select hard and ambiguous samples from online user queries lacking click signals. In addition, we provide an EQS-Benchmark comprising 16,949 online user queries for offline training and evaluation. Extensive offline and online experiments consistently demonstrate a strong positive correlation between online and offline effectiveness. Both offline and online experimental results demonstrate the superiority of our Cold-EQS, achieving a significant +6.81% improvement in online chatUV.
Executive Summary
The article introduces Cold-EQS, an iterative reinforcement learning framework for cold-start e-commerce query suggestion. Instead of optimizing against a click-through rate (CTR) model, it rewards three intrinsic quality signals: answerability, factuality, and information gain. This removes the dependency on abundant online click data, which is what makes CTR-based approaches fail in cold-start settings. To keep improving the model, the framework estimates uncertainty over grouped candidate suggestions and uses it to select hard and ambiguous samples from online user queries that lack click signals. Offline and online experiments support the approach, with a reported +6.81% improvement in online chatUV. The authors also provide EQS-Benchmark, a set of 16,949 online user queries for offline training and evaluation, which supports reproducibility.
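To make the quality-based reward concrete, here is a minimal Python sketch of how three per-suggestion scores might be combined into one scalar reward. The abstract does not say how the signals are scored or aggregated, so the `QualityScores` container, the `quality_reward` function, the [0, 1] score range, and the weighted-mean aggregation are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityScores:
    """Hypothetical per-suggestion scores, each assumed to lie in [0, 1]."""
    answerability: float
    factuality: float
    information_gain: float


def quality_reward(scores: QualityScores,
                   weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted mean of the three quality signals.

    The weighted mean is an assumed aggregation; the source does not
    specify how the three signals are combined into a reward.
    """
    values = (scores.answerability, scores.factuality, scores.information_gain)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("scores must lie in [0, 1]")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * v for w, v in zip(weights, values)) / total_weight
```

With equal weights, a suggestion that is fully answerable but adds no new information is penalized as much as one that is informative but unanswerable. Whether that symmetry is desirable is a design choice the weights leave open.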
Key Points
- ▸ Shift from CTR to intrinsic quality metrics (answerability, factuality, information gain)
- ▸ Cold-EQS leverages uncertainty estimation to select hard/ambiguous samples without click signals
- ▸ Offline and online experiments report a +6.81% improvement in online chatUV
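The sample-selection idea in the points above can be sketched as follows. The source does not state how uncertainty is estimated, so this example assumes one simple proxy: the spread (population standard deviation) of reward scores across the candidate suggestions generated for a single user query. The function names and the data layout are hypothetical.

```python
import statistics


def group_uncertainty(rewards: list[float]) -> float:
    """Population standard deviation of rewards within one candidate group.

    Assumed proxy for uncertainty: a wide spread means the model's
    candidates for this query vary a lot in quality.
    """
    if len(rewards) < 2:
        return 0.0
    return statistics.pstdev(rewards)


def select_hard_queries(grouped_rewards: dict[str, list[float]], k: int) -> list[str]:
    """Return the k queries whose candidate groups show the highest uncertainty."""
    ranked = sorted(
        grouped_rewards,
        key=lambda query: group_uncertainty(grouped_rewards[query]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical rewards for the candidate suggestions of three user queries.
example = {
    "query_a": [0.9, 0.9, 0.9],   # consistent quality, low uncertainty
    "query_b": [0.1, 0.9, 0.5],   # wide spread, high uncertainty
    "query_c": [0.4, 0.6],        # moderate spread
}
hard = select_hard_queries(example, k=2)
```

No click data enters the selection: only the reward scores of the model's own candidates are used, which is what makes this kind of selection usable in a cold-start setting.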
Merits
Innovative Framework
Cold-EQS introduces a novel paradigm by prioritizing intrinsic content quality over user behavior metrics, offering a sustainable solution for cold-start scenarios where data scarcity is inherent.
Empirical Validation
The consistent positive correlation between offline and online performance across experiments strengthens the credibility of the proposed methodology.
Benchmark Contribution
The provision of a curated benchmark dataset enhances transparency and facilitates broader adoption and replication.
Demerits
Scalability Concern
The reliance on uncertainty estimation for sample selection may introduce computational overhead in large-scale environments with high query volumes.
Limited Generalizability
Results are primarily validated within e-commerce query contexts; applicability to other domains (e.g., healthcare, legal search) remains unproven.
Expert Commentary
Cold-EQS represents a paradigm shift in the design of recommendation and suggestion systems. Historically, CTR-based optimization has dominated because its outcomes are measurable and align with revenue metrics. However, the article rightly identifies a critical flaw: CTR models are inherently reactive and depend on post-hoc user behavior, which makes them inapplicable in cold-start contexts. By pivoting to intrinsic quality indicators, the authors address the root cause, data dependency, rather than its symptoms. The use of uncertainty estimation to sample hard cases is particularly elegant, as it turns a limitation (lack of click data) into an opportunity for targeted refinement. Moreover, the benchmark dataset serves as a foundational resource, enabling empirical validation and comparative analysis in future studies. This work is seminal because it redefines the evaluation criteria for suggestion systems: quality is no longer a secondary consideration but the primary objective. It is a significant advancement that aligns with broader trends toward ethical, user-centric AI design. One potential avenue for future research is to integrate subjective quality assessments (e.g., user surveys) alongside objective metrics to create a hybrid evaluation framework.
Recommendations
- ✓ Platform developers should pilot Cold-EQS in A/B testing environments to measure impact on user engagement and retention.
- ✓ Academic institutions should build domain-specific variants of the benchmark dataset to test cross-sector applicability.
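For teams running the A/B pilot suggested above, a relative lift such as the reported +6.81% is typically computed as the percentage change of the treatment metric over the control metric. The sketch below assumes that reading. The source does not define how chatUV is measured or confirm that the figure is a relative change, and the metric values used here are made up for illustration.

```python
def relative_lift(control: float, treatment: float) -> float:
    """Percentage change of the treatment metric relative to control."""
    if control <= 0:
        raise ValueError("control metric must be positive")
    return (treatment - control) / control * 100.0


# Hypothetical metric values, chosen only to illustrate the arithmetic.
lift = relative_lift(control=1000.0, treatment=1068.1)
```

A point estimate like this says nothing about statistical significance. A real pilot would also need a confidence interval or significance test on the difference.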
Sources
Original: arXiv - cs.CL