User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction
arXiv:2603.20939v1 Announce Type: new Abstract: Large language models are increasingly used as personal assistants, yet most lack a persistent user model, forcing users to repeatedly restate preferences across sessions. We propose Vector-Adapted Retrieval Scoring (VARS), a pipeline-agnostic, frozen-backbone framework that represents each user with long-term and short-term vectors in a shared preference space and uses these vectors to bias retrieval scoring over structured preference memory. The vectors are updated online from weak scalar rewards derived from user feedback, enabling personalization without per-user fine-tuning. We evaluate on MultiSessionCollab, an online multi-session collaboration benchmark with rich user preference profiles, across math and code tasks. Under frozen backbones, the main benefit of user-aware retrieval is improved interaction efficiency rather than large gains in raw task accuracy: our full VARS agent achieves the strongest overall performance, matches a strong Reflection baseline in task success, and reduces timeout rate and user effort. The learned long-term vectors also align with cross-user preference overlap, while short-term vectors capture session-specific adaptation, supporting the interpretability of the dual-vector design. Code, model, and data are available at https://github.com/YurenHao0426/VARS.
Executive Summary
The article 'User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction' proposes Vector-Adapted Retrieval Scoring (VARS), a framework that represents each user with long-term and short-term vectors in a shared preference space. VARS updates these vectors online from weak scalar rewards, enabling personalization without per-user fine-tuning. The authors evaluate VARS on the MultiSessionCollab benchmark, where it improves interaction efficiency and matches a strong Reflection baseline in task success. The results highlight the potential of VARS to enhance conversational LLM agents while keeping the dual-vector design interpretable, and offer a promising remedy for current LLM agents, which typically lack a persistent user model and so force users to restate preferences across sessions.
Key Points
- ▸ The VARS framework proposes a pipeline-agnostic, frozen-backbone approach to represent users with long-term and short-term vectors.
- ▸ The framework updates vectors online from weak scalar rewards, enabling personalization without per-user fine-tuning.
- ▸ On the MultiSessionCollab benchmark, VARS improves interaction efficiency and matches a strong Reflection baseline in task success.
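The dual-vector mechanism summarized above can be sketched in a few lines. This is a minimal, hypothetical illustration: the additive scoring formula, the learning rates, and the reward-weighted update rule are assumptions chosen for exposition, not the paper's published algorithm.

```python
# Hypothetical sketch of VARS-style user modeling: a slow long-term vector
# and a fast short-term vector bias retrieval scoring, and both are nudged
# online by a weak scalar reward. All details are illustrative assumptions.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class UserModel:
    def __init__(self, dim, lr_long=0.01, lr_short=0.1):
        self.long_term = [0.0] * dim   # slow-moving, cross-session preferences
        self.short_term = [0.0] * dim  # fast-moving, session-specific adaptation
        self.lr_long = lr_long
        self.lr_short = lr_short

    def score(self, query_emb, memory_emb, alpha=0.5):
        """Base query-memory similarity plus a user-preference bias term."""
        base = dot(query_emb, memory_emb)
        pref = [l + s for l, s in zip(self.long_term, self.short_term)]
        return base + alpha * dot(pref, memory_emb)

    def update(self, memory_emb, reward):
        """Move both vectors toward (reward > 0) or away from (reward < 0)
        a retrieved memory item, scaled by the weak scalar reward."""
        for i, m in enumerate(memory_emb):
            self.long_term[i] += self.lr_long * reward * m
            self.short_term[i] += self.lr_short * reward * m

    def reset_session(self):
        """Short-term state is per-session; long-term state persists."""
        self.short_term = [0.0] * len(self.short_term)

user = UserModel(dim=3)
before = user.score([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])  # no preference yet
user.update([1.0, 0.0, 0.0], reward=1.0)               # positive feedback
after = user.score([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])   # biased upward
```

The two learning rates capture the intended division of labor: the short-term vector adapts quickly within a session and is reset at session boundaries, while the long-term vector accumulates preferences slowly across sessions.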
Merits
Strength in Personalization
VARS enables personalization without per-user fine-tuning, addressing the limitations of current LLM agents that lack persistent user models.
Improved Interaction Efficiency
The VARS framework reduces timeout rate and user effort, making interactions more efficient and user-friendly.
Interpretability
The dual-vector design supports interpretability, with long-term vectors aligning with cross-user preference overlap and short-term vectors capturing session-specific adaptation.
Demerits
Limited Feedback
The VARS framework relies on weak scalar rewards to update its vectors, which may limit its effectiveness when user feedback is sparse or noisy.
Dependency on Benchmark
Because VARS is evaluated only on the MultiSessionCollab benchmark, its results may not generalize to other domains or scenarios, highlighting the need for broader experimentation.
Expert Commentary
The article 'User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction' makes a meaningful contribution to conversational AI by proposing a novel framework for user preference modeling. By maintaining persistent long-term and short-term user vectors, VARS addresses a key limitation of current LLM agents: the absence of a durable user model. The evaluation on the MultiSessionCollab benchmark shows that VARS improves interaction efficiency and matches a strong Reflection baseline in task success. However, the reliance on weak scalar rewards may limit its effectiveness when user feedback is sparse or noisy, and evaluation across more diverse domains and scenarios is needed to establish the framework's generality.
Recommendations
- ✓ Future work should focus on exploring alternative approaches for updating vectors, such as using more robust feedback mechanisms or incorporating multiple feedback sources.
- ✓ The development of VARS should be extended to other domains and scenarios, including areas such as education and customer service, to evaluate its effectiveness and generalizability.
Sources
Original: arXiv - cs.CL