User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction
arXiv:2603.20939v1 Announce Type: new Abstract: Large language models are increasingly used as personal assistants, yet most lack a persistent user model, forcing users to repeatedly restate preferences across sessions. We propose Vector-Adapted Retrieval Scoring (VARS), a pipeline-agnostic, frozen-backbone framework that represents each user with long-term and short-term vectors in a shared preference space and uses these vectors to bias retrieval scoring over structured preference memory. The vectors are updated online from weak scalar rewards derived from user feedback, enabling personalization without per-user fine-tuning. We evaluate on MultiSessionCollab, an online multi-session collaboration benchmark with rich user preference profiles, across math and code tasks. Under frozen backbones, the main benefit of user-aware retrieval is improved interaction efficiency rather than large gains in raw task accuracy: our full VARS agent achieves the strongest overall performance, matches a strong Reflection baseline in task success, and reduces timeout rate and user effort. The learned long-term vectors also align with cross-user preference overlap, while short-term vectors capture session-specific adaptation, supporting the interpretability of the dual-vector design. Code, model, and data are available at https://github.com/YurenHao0426/VARS.
Executive Summary
The article 'User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction' proposes Vector-Adapted Retrieval Scoring (VARS), a framework that represents each user with long-term and short-term vectors in a shared preference space. VARS updates these vectors online from weak scalar rewards, enabling personalization without per-user fine-tuning. The authors evaluate VARS on the MultiSessionCollab benchmark, where it improves interaction efficiency and matches a strong Reflection baseline in task success. The results highlight the potential of VARS to enhance conversational LLM agents while keeping the dual-vector design interpretable, and offer a promising remedy for current LLM agents, which typically lack a persistent user model and so force users to restate preferences across sessions.
Key Points
- ▸ The VARS framework proposes a pipeline-agnostic, frozen-backbone approach to represent users with long-term and short-term vectors.
- ▸ The framework updates vectors online from weak scalar rewards, enabling personalization without per-user fine-tuning.
- ▸ On the MultiSessionCollab benchmark, VARS improves interaction efficiency and matches a strong Reflection baseline in task success.
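The dual-vector mechanism summarized above can be sketched in a few lines. This is a minimal, hypothetical illustration: the additive scoring formula, the learning rates, and the reward-weighted update rule are assumptions chosen for exposition, not the paper's published algorithm.

```python
# Hypothetical sketch of VARS-style user modeling: a slow long-term vector
# and a fast short-term vector bias retrieval scoring, and both are nudged
# online by a weak scalar reward. All details are illustrative assumptions.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class UserModel:
    def __init__(self, dim, lr_long=0.01, lr_short=0.1):
        self.long_term = [0.0] * dim   # slow-moving, cross-session preferences
        self.short_term = [0.0] * dim  # fast-moving, session-specific adaptation
        self.lr_long = lr_long
        self.lr_short = lr_short

    def score(self, query_emb, memory_emb, alpha=0.5):
        """Base query-memory similarity plus a user-preference bias term."""
        base = dot(query_emb, memory_emb)
        pref = [l + s for l, s in zip(self.long_term, self.short_term)]
        return base + alpha * dot(pref, memory_emb)

    def update(self, memory_emb, reward):
        """Move both vectors toward (reward > 0) or away from (reward < 0)
        a retrieved memory item, scaled by the weak scalar reward."""
        for i, m in enumerate(memory_emb):
            self.long_term[i] += self.lr_long * reward * m
            self.short_term[i] += self.lr_short * reward * m

    def reset_session(self):
        """Short-term state is per-session; long-term state persists."""
        self.short_term = [0.0] * len(self.short_term)

user = UserModel(dim=3)
before = user.score([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])  # no preference yet
user.update([1.0, 0.0, 0.0], reward=1.0)               # positive feedback
after = user.score([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])   # biased upward
```

The two learning rates capture the intended division of labor: the short-term vector adapts quickly within a session and is reset at session boundaries, while the long-term vector accumulates preferences slowly across sessions.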
Merits
Strength in Personalization
VARS enables personalization without per-user fine-tuning, addressing the limitations of current LLM agents that lack persistent user models.
Improved Interaction Efficiency
The VARS framework reduces timeout rate and user effort, making interactions more efficient and user-friendly.
Interpretability
The dual-vector design supports interpretability, with long-term vectors aligning with cross-user preference overlap and short-term vectors capturing session-specific adaptation.
Demerits
Limited Feedback
The VARS framework relies on weak scalar rewards to update its vectors, which may limit its effectiveness when user feedback is sparse or noisy.
Dependency on Benchmark
Because VARS is evaluated only on the MultiSessionCollab benchmark, its results may not generalize to other domains or scenarios, highlighting the need for broader experimentation.
Expert Commentary
The article 'User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction' makes a meaningful contribution to conversational AI by proposing a novel framework for user preference modeling. By maintaining persistent long-term and short-term user vectors, VARS addresses a key limitation of current LLM agents: the absence of a durable user model. The evaluation on the MultiSessionCollab benchmark shows that VARS improves interaction efficiency and matches a strong Reflection baseline in task success. However, the reliance on weak scalar rewards may limit its effectiveness when user feedback is sparse or noisy, and evaluation across more diverse domains and scenarios is needed to establish the framework's generality.
Recommendations
- ✓ Future work should focus on exploring alternative approaches for updating vectors, such as using more robust feedback mechanisms or incorporating multiple feedback sources.
- ✓ The development of VARS should be extended to other domains and scenarios, including areas such as education and customer service, to evaluate its effectiveness and generalizability.
Sources
Original: arXiv - cs.CL