HumanLM: Simulating Users with State Alignment Beats Response Imitation

arXiv:2603.03303v1 (cross-listed). Abstract: Large Language Models (LLMs) are increasingly used to simulate how specific users respond to a given context, enabling more user-centric applications that rely on user feedback. However, existing user simulators mostly imitate surface-level patterns and language styles, which fail to reflect the underlying states of real users (e.g., beliefs and emotions). To address these limitations, we propose a novel training framework, HumanLM, which builds user simulators that accurately reflect real users. Our key insight is that, in addition to generating responses, the model should generate natural-language latent states that align with ground-truth responses through reinforcement learning. These latent states correspond to a set of psychologically grounded state dimensions that drive how real users respond. HumanLM further synthesizes these aligned latent states into responses that accurately represent real users. For extensive evaluation, we develop Humanual, a comprehensive benchmark for simulating real users based on public data. Humanual consists of six large-scale datasets with 26k users and 216k responses in total, spanning diverse tasks such as generating user responses to daily life issues, political blogs, and chat sessions with LLM assistants. Across datasets, HumanLM significantly outperforms alternative approaches, achieving an average relative improvement of 16.3% in alignment scores from an LLM judge. In a real-time simulation study with 111 participants, HumanLM achieves the highest similarity to real user responses and competitive human-likeness scores.
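The abstract describes a two-stage pipeline: verbalize a latent user state, condition the response on it, and use reinforcement learning to reward states whose induced responses match the real user's reply. The sketch below illustrates only the reward shape of such a setup; the paper's actual models, state dimensions, and LLM-judge scoring are not given here, so every name and the token-overlap F1 stand-in for the judge are hypothetical:

```python
from collections import Counter

def overlap_f1(predicted: str, reference: str) -> float:
    """Token-overlap F1: a toy stand-in for an LLM-judge alignment score."""
    pred, ref = predicted.lower().split(), reference.lower().split()
    if not pred or not ref:
        return 0.0
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def alignment_reward(latent_state: str, response: str, ground_truth: str) -> float:
    """Score a (state, response) pair by how well the response matches the
    real user's reply; the latent state is credited only indirectly, because
    the response was generated conditioned on it."""
    return overlap_f1(response, ground_truth)

# Hypothetical training example: state dimensions (beliefs, emotions) are
# verbalized in natural language, and the simulated reply conditioned on
# them is compared against the real user's reply.
state = "belief: skeptical of new tools; emotion: mildly frustrated"
simulated = "I doubt this will help, I've tried similar apps before."
real = "Honestly I doubt this helps, I've tried apps like this before."
assert 0.0 < alignment_reward(state, simulated, real) <= 1.0
```

In an actual RL loop the reward would flow back to the state generator, pushing it toward latent states that make ground-truth-like responses likely rather than toward surface imitation of the response text.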

Executive Summary

This article proposes HumanLM, a training framework for user simulators that generates natural-language latent states (e.g., beliefs and emotions), aligns them with ground-truth responses through reinforcement learning, and synthesizes them into responses that reflect real users. Evaluated on the accompanying Humanual benchmark of six datasets, HumanLM outperforms alternative approaches with an average relative improvement of 16.3% in alignment scores from an LLM judge; in a real-time simulation study with 111 participants, it achieves the highest similarity to real user responses and competitive human-likeness scores. The framework matters for user-centric applications that depend on simulated user feedback, though its potential biases and limitations remain to be explored.

Key Points

  • HumanLM proposes a novel training framework for user simulators that aligns latent states with ground-truth responses
  • The framework outperforms alternative approaches across the six Humanual datasets, with an average relative improvement of 16.3% in alignment scores
  • HumanLM demonstrates high similarity to real user responses and competitive human-likeness scores in a real-time simulation study

Merits

Strength in addressing limitations of existing user simulators

Unlike existing simulators that imitate surface-level patterns and language styles, HumanLM models the psychologically grounded latent states (e.g., beliefs and emotions) that drive how real users respond, and aligns those states with ground-truth responses.

Significant improvement in alignment scores

HumanLM achieves an average relative improvement of 16.3% in alignment scores from an LLM judge, demonstrating its effectiveness in simulating real users.
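The 16.3% figure is a relative improvement over baseline alignment scores. A quick sanity check of the formula, with hypothetical scores (the article does not report the underlying per-dataset values):

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of a score over a baseline, as a fraction."""
    return (new - baseline) / baseline

# Hypothetical example: a baseline alignment score of 0.60 rising to 0.698
# would correspond to roughly the 16.3% average reported in the abstract.
assert abs(relative_improvement(0.698, 0.60) - 0.1633) < 1e-3
```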

High similarity to real user responses

In a real-time simulation study with 111 participants, HumanLM achieves the highest similarity to real user responses among the compared methods, along with competitive human-likeness scores.

Demerits

Potential biases in latent state generation

The latent states are validated only indirectly, through alignment of the resulting responses with ground truth; whether the generated beliefs and emotions actually match real users' internal states is not directly verified, and biases in this generation step are not fully explored.

Limited generalizability to diverse user populations

The Humanual benchmark draws its 26k users and 216k responses from public data on a fixed set of tasks, so it is unclear how well HumanLM generalizes to user populations and domains outside that coverage.

Expert Commentary

The article's central contribution is the shift from imitating response text to aligning psychologically grounded latent states, and the gains in alignment scores and similarity to real responses suggest that state modeling pays off. Two open questions temper the results: latent states are validated only through the downstream responses they induce, and generalization beyond the Humanual tasks and populations is untested. If the approach holds up, faithful user simulators would have practical implications wherever real user feedback is costly to collect, with corresponding considerations for policy and practice.

Recommendations

  • Further research is needed to explore the potential biases and limitations of HumanLM.
  • The development of more sophisticated user simulators should be a priority in the field of human-computer interaction.
