
Controllable and explainable personality sliders for LLMs at inference time

arXiv:2603.03326v1 Announce Type: cross Abstract: Aligning Large Language Models (LLMs) with specific personas typically relies on expensive and monolithic Supervised Fine-Tuning (SFT) or RLHF. While effective, these methods require training distinct models for every target personality profile. Inference-time activation steering offers a parameter-efficient alternative, yet naive approaches fail to control multiple traits simultaneously due to destructive vector interference. In this work, we propose a modular framework for continuous, multi-dimensional personality control. Our key innovation is Sequential Adaptive Steering (SAS): a method that orthogonalizes steering vectors by training subsequent probes on the residual stream shifted by prior interventions. This approach transforms steering vectors into reusable primitives, allowing users to instantly synthesize complex, high-fidelity personality profiles by simply adjusting coefficients alpha. We validate our framework on the Big F

Florian Hoppe, David Khachaturov, Robert Mullins, Mark Huasong Meng · March 6, 2026

#cs.CL #cs.AI

Executive Summary

The paper proposes a framework for controlling the personality of Large Language Models (LLMs) at inference time, without updating model parameters. Its core method, Sequential Adaptive Steering (SAS), orthogonalizes steering vectors by training each subsequent probe on the residual stream already shifted by prior interventions. The resulting vectors act as reusable primitives: a complex personality profile is synthesized by adjusting the coefficients alpha. SAS is reported to outperform naive baselines in both goal adherence and coherence, which positions it as a parameter-efficient alternative to Supervised Fine-Tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF), both of which require a distinct model for every target profile.
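To make the interference problem concrete, here is a minimal toy sketch in Python. It is not the paper's SAS procedure: SAS is described as obtaining orthogonal vectors by training probes on a shifted residual stream, whereas this sketch swaps in plain Gram-Schmidt projection on hand-picked 2-D vectors, purely to show why overlap between steering vectors matters. All names (v_a, v_b, project_out, trait_a_readout) and all numbers are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def add_scaled(h: Vector, v: Vector, alpha: float) -> Vector:
    """Additive steering: return h + alpha * v."""
    return [x + alpha * y for x, y in zip(h, v)]


def project_out(v: Vector, basis: Vector) -> Vector:
    """Remove from v its component along basis (one Gram-Schmidt step).

    Assumption: this stands in for the orthogonalization that SAS
    achieves by other means; it is not the authors' method.
    """
    coeff = dot(v, basis) / dot(basis, basis)
    return [x - coeff * b for x, b in zip(v, basis)]


# Hypothetical steering directions for two traits.
v_a = [1.0, 0.0]
v_b = [0.6, 0.8]  # overlaps with v_a (cosine similarity 0.6)
v_b_orth = project_out(v_b, v_a)


def trait_a_readout(h: Vector) -> float:
    """Toy measure of trait A: projection of the activation onto v_a."""
    return dot(h, v_a)


h = [0.0, 0.0]  # toy activation with no trait expressed

# Steering only trait B should leave the trait A readout unchanged.
naive_shift = trait_a_readout(add_scaled(h, v_b, 2.0))
orth_shift = trait_a_readout(add_scaled(h, v_b_orth, 2.0))

print(f"trait A readout after naive B steering: {naive_shift:.2f}")
print(f"trait A readout after orthogonalized B steering: {orth_shift:.2f}")
```

In this toy setting, steering trait B with the overlapping vector also moves the trait A readout, while the orthogonalized vector leaves it untouched. That side effect is the kind of interference the abstract attributes to naive multi-trait steering.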

Key Points

▸ The paper proposes a framework for controlling LLM personality at inference time, without updating model parameters
▸ Its core method, Sequential Adaptive Steering (SAS), orthogonalizes steering vectors by training subsequent probes on the residual stream shifted by prior interventions
▸ SAS is reported to outperform naive baselines in both goal adherence and coherence

Merits

Parameter-Efficient Approach

SAS offers a parameter-efficient alternative to traditional SFT or RLHF, reducing the computational costs associated with training distinct models for every target personality profile.

Modular and Reusable Steering Vectors

Because the steering vectors are orthogonalized, they can be combined as reusable primitives: a complex personality profile is synthesized by adjusting the coefficients alpha, with no retraining.
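The "slider" idea can be sketched as a weighted sum of trait vectors added to a hidden state. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function name steer, the trait names, the 4-dimensional toy vectors, and the choice to add the vectors directly to a single hidden state are all hypothetical, and a real system would apply the shift inside a model's residual stream.

```python
from typing import Dict, List

Vector = List[float]


def steer(hidden: Vector,
          vectors: Dict[str, Vector],
          alphas: Dict[str, float]) -> Vector:
    """Return hidden + sum over traits of alpha[trait] * vectors[trait]."""
    out = list(hidden)
    for trait, alpha in alphas.items():
        v = vectors[trait]
        if len(v) != len(out):
            raise ValueError(f"dimension mismatch for trait {trait!r}")
        for j, component in enumerate(v):
            out[j] += alpha * component
    return out


# Hypothetical, already-orthogonal toy steering vectors.
TRAIT_VECTORS: Dict[str, Vector] = {
    "extraversion": [1.0, 0.0, 0.0, 0.0],
    "agreeableness": [0.0, 1.0, 0.0, 0.0],
}

hidden_state = [0.5, 0.5, 0.5, 0.5]

# One "slider" per trait: positive amplifies, negative suppresses.
profile = {"extraversion": 2.0, "agreeableness": -1.0}

steered = steer(hidden_state, TRAIT_VECTORS, profile)
print(steered)
```

Changing the profile only requires new alpha values; the stored vectors are reused as-is, which is the sense in which the summary calls them modular.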

Demerits

Limited Evaluation Scope

The paper evaluates the framework primarily on the Big Five personality traits, so it remains unclear how well the approach generalizes to other personality dimensions or applications.

Lack of Interpretability

The article does not provide a thorough analysis of the interpretability of the steering vectors or the coefficients alpha, which may hinder the understanding and trustworthiness of the framework's outputs.

Expert Commentary

The paper presents a novel and promising approach to controlling LLMs at inference time, but its limitations call for further research. The authors' focus on the Big Five personality traits leaves the framework's generalizability untested, and the lack of interpretability analysis makes its outputs harder to understand and trust. Nonetheless, SAS shows significant potential as a parameter-efficient solution for applications that require nuanced, adjustable language generation.

Recommendations

✓ Future research should evaluate the framework on a broader range of personality dimensions and applications, and examine its potential vulnerability to adversarial attacks.
✓ The authors should provide a more thorough analysis of the interpretability of the steering vectors and the coefficients alpha, so that the framework's behavior is easier to understand and trust.

Sources

arXiv - cs.AI

Controllable and explainable personality sliders for LLMs at inference time


