Academic

Many Preferences, Few Policies: Towards Scalable Language Model Personalization

arXiv:2604.04144v1 Announce Type: new Abstract: The holy grail of LLM personalization is a single LLM for each user, perfectly aligned with that user's preferences. However, maintaining a separate LLM per user is impractical due to constraints on compute, memory, and system complexity. We address this challenge by developing a principled method for selecting a small portfolio of LLMs that captures representative behaviors across heterogeneous users. We model user preferences across multiple traits (e.g., safety, humor, brevity) through a multi-dimensional weight vector. Given reward functions across these dimensions, our algorithm PALM (Portfolio of Aligned LLMs) generates a small portfolio of LLMs such that, for any weight vector, the portfolio contains a near-optimal LLM for the corresponding scalarized objective. To the best of our knowledge, this is the first result that provides theoretical guarantees on both the size and approximation quality of LLM portfolios for personalization. It characterizes the trade-off between system cost and personalization, as well as the diversity of LLMs required to cover the landscape of user preferences. We provide empirical results that validate these guarantees and demonstrate greater output diversity over common baselines.

Executive Summary

This article presents a novel approach to language model (LM) personalization that addresses the impracticality of maintaining a separate LM for each user. The proposed method, PALM (Portfolio of Aligned LLMs), selects a small portfolio of LMs that captures representative behaviors across heterogeneous users. User preferences are modeled as a multi-dimensional weight vector over traits such as safety, humor, and brevity, and PALM constructs the portfolio so that, for any weight vector, it contains a near-optimal LM for the corresponding scalarized objective. The authors provide theoretical guarantees on both the size and approximation quality of the portfolio, characterizing the trade-off between system cost and personalization. Empirical results validate these guarantees and demonstrate greater output diversity than common baselines. This work has significant implications for the development of scalable and efficient LM personalization systems.

Key Points

  • PALM addresses the challenge of maintaining a separate LM for each user by developing a principled method for selecting a small portfolio of LMs.
  • The proposed method models user preferences as a multi-dimensional weight vector and constructs the portfolio so that, for any weight vector, it contains a near-optimal LM for the corresponding scalarized objective.
  • The authors provide theoretical guarantees on both the size and approximation quality of LM portfolios, characterizing the trade-off between system cost and personalization.
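The selection step described above can be illustrated with a minimal sketch. This is not the paper's actual algorithm (which concerns how to *build* the portfolio with coverage guarantees); it only shows the serving-time idea the abstract describes: given per-dimension reward estimates for each portfolio member, a user's weight vector scalarizes the multi-objective reward, and the member maximizing that scalarized score is served. The function name, reward values, and dimension labels are illustrative assumptions.

```python
def select_from_portfolio(portfolio_rewards, weights):
    """Return the index of the portfolio model whose scalarized reward
    sum_k w_k * r_k is highest under the user's preference weights.

    portfolio_rewards: list of per-model reward vectors, one entry per
    preference dimension (e.g. safety, humor, brevity).
    weights: the user's (unnormalized) preference weight vector.
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize onto the probability simplex
    # Scalarize each model's multi-dimensional reward with the user's weights.
    scores = [sum(wk * rk for wk, rk in zip(w, r)) for r in portfolio_rewards]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy portfolio: 3 models, 3 hypothetical dimensions (safety, humor, brevity).
rewards = [
    [0.9, 0.2, 0.5],  # safety-focused model
    [0.3, 0.9, 0.4],  # humor-focused model
    [0.5, 0.5, 0.9],  # brevity-focused model
]
print(select_from_portfolio(rewards, [0.7, 0.1, 0.2]))  # safety-heavy user -> 0
```

A user who weights safety heavily is routed to the safety-focused model; the paper's contribution is guaranteeing that a *small* portfolio suffices for this lookup to be near-optimal over all weight vectors.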

Merits

Strength

The proposed method provides theoretical guarantees on both the size and approximation quality of LM portfolios, addressing a significant challenge in LM personalization.

Strength

The empirical results demonstrate greater output diversity over common baselines, highlighting the practical effectiveness of the proposed method.

Demerits

Limitation

The proposed method assumes a multi-dimensional weight vector for modeling user preferences, which may not capture the complexity of real-world user behavior.

Limitation

The empirical results are based on a limited dataset and may not generalize to larger or more diverse user populations.

Expert Commentary

The proposed method, PALM, is a significant contribution to the field of LM personalization, addressing a central obstacle to building efficient and scalable personalization systems. By providing theoretical guarantees on both the size and approximation quality of LM portfolios, PALM offers a principled way to characterize the trade-off between system cost and personalization quality. The empirical results support its practical effectiveness and suggest potential for deployment in a wide range of applications. That said, the reliance on a fixed multi-dimensional weight vector may not capture the full complexity of real-world user behavior, and the evaluation is based on a limited dataset, so the findings may not generalize to larger or more diverse user populations. Despite these limitations, PALM is a valuable contribution, and its framing of the cost-personalization trade-off is relevant to both practical deployment and policy decisions.

Recommendations

  • Further research is needed to explore the applicability of PALM to real-world user populations and to develop more sophisticated models of user behavior.
  • The proposed method should be extended to handle missing or noisy preference data, improving its robustness and scalability.

Sources

Original: arXiv - cs.CL