Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment
arXiv:2604.03867v1
Abstract: Steering vectors have emerged as a lightweight and effective approach for aligning large language models (LLMs) at inference time, enabling modulation over model behaviors by shifting LLM representations towards a target behavior. However, existing methods typically apply steering vectors at a globally fixed layer, implicitly assuming that the optimal intervention layer is invariant across inputs. We argue that this assumption is fundamentally limited, as representations relevant to a target behavior can be encoded at different layers depending on the input. Theoretically, we show that different inputs can require steering at different layers to achieve alignment with a desirable model behavior. We also provide empirical evidence that the optimal steering layer varies substantially across inputs in practice. Motivated by these observations, we introduce Where to Steer (W2S), a framework that adaptively selects the intervention layer conditioned on the input, by learning a mapping from input embeddings to optimal steering layers. Across multiple LLMs and alignment behaviors, W2S consistently outperforms fixed-layer baselines, with improvements in both in-distribution and out-of-distribution settings. Our findings highlight the importance of input-dependent control in LLM alignment and demonstrate that adaptive layer selection is a key design dimension missing in the current methodology of steering vectors.
Executive Summary
This article presents Where to Steer (W2S), a novel framework for input-dependent layer selection when steering large language models (LLMs). Rather than applying a steering vector at a globally fixed layer, W2S learns a mapping from input embeddings to optimal steering layers and selects the intervention layer per input. The authors demonstrate that W2S consistently outperforms fixed-layer baselines across multiple LLMs and alignment behaviors, with improvements in both in-distribution and out-of-distribution settings. This research highlights the importance of input-dependent control in LLM alignment and argues that adaptive layer selection is a key design dimension missing from current steering-vector methodology.
Key Points
- ▸ W2S is a framework that adaptively selects the intervention layer conditioned on the input
- ▸ The optimal steering layer varies substantially across inputs in practice
- ▸ W2S consistently outperforms fixed-layer baselines across multiple LLMs and alignment behaviors
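The mechanics behind these points can be sketched in a few lines. The toy model below stands in for a transformer, with each "layer" a fixed nonlinear map; names such as `apply_model` and `steer_layer` are illustrative and not from the paper. Injecting the same steering vector at different depths changes the final output differently, which is the intuition behind choosing the layer per input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a transformer: each "layer" is a fixed nonlinear map.
DIM, N_LAYERS = 8, 4
layers = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(N_LAYERS)]

def apply_model(x, steering_vec=None, steer_layer=None):
    """Run x through all layers, optionally adding a steering vector to the
    hidden state right after one chosen layer (activation steering)."""
    h = x
    for i, W in enumerate(layers):
        h = np.tanh(W @ h)
        if steering_vec is not None and i == steer_layer:
            h = h + steering_vec  # shift the representation toward the behavior
    return h

x = rng.standard_normal(DIM)
v = rng.standard_normal(DIM)  # e.g. a difference-of-means behavior direction

base = apply_model(x)
# The same vector injected at different depths yields different outputs.
steered = [apply_model(x, steering_vec=v, steer_layer=l) for l in range(N_LAYERS)]
```

Because later layers leave fewer transformations between the intervention and the output, a shift at the final layer passes through unchanged, while earlier shifts are reshaped by every subsequent layer; which depth best elicits the target behavior can therefore differ across inputs.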
Merits
Strength in Theoretical Foundations
The authors provide a solid theoretical foundation for W2S by demonstrating that different inputs can require steering at different layers to achieve alignment with a desirable model behavior.
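One concrete way to realize the learned mapping from input embeddings to steering layers is to treat layer selection as classification over layer indices. The sketch below uses synthetic embeddings, synthetic per-input "best layer" labels, and plain softmax regression; the actual W2S selector architecture and the procedure for labeling the best layer (e.g. an offline per-layer sweep) may differ from this assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training setup: each input has an embedding plus the layer
# index that steered it best (here a synthetic, linearly separable label).
N, DIM, N_LAYERS = 200, 16, 6
X = rng.standard_normal((N, DIM))                    # input embeddings
best_layer = np.where(X[:, 0] > 0, N_LAYERS - 1, 0)  # synthetic labels

# Minimal softmax-regression selector trained by full-batch gradient descent.
W = np.zeros((DIM, N_LAYERS))
for _ in range(300):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(N_LAYERS)[best_layer]
    W -= 0.1 * X.T @ (p - onehot) / N                # cross-entropy gradient

def select_layer(embedding):
    """Predict which layer to steer for one input embedding."""
    return int(np.argmax(embedding @ W))

train_acc = np.mean([select_layer(x) == y for x, y in zip(X, best_layer)])
```

At inference time, `select_layer` adds only one small matrix-vector product before the steering intervention, which is consistent with the paper's framing of steering as a lightweight inference-time method.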
Strength in Empirical Evidence
The authors provide empirical evidence that W2S consistently outperforms fixed-layer baselines across multiple LLMs and alignment behaviors.
Demerits
Limitation in Scalability
The authors do not thoroughly address the scalability of W2S, which may pose challenges in large-scale applications.
Limitation in Generalizability
The authors focus on a specific set of LLMs and alignment behaviors, which may limit how well W2S generalizes to other model families and target behaviors.
Expert Commentary
The article presents a significant contribution to the field of LLM alignment by introducing W2S, a novel framework for input-dependent layer selection. The authors' emphasis on adaptive control and the empirical evidence supporting W2S's effectiveness make this research compelling. However, the limitations in scalability and generalizability should be addressed in future work. Furthermore, the policy implications of this research warrant further exploration.
Recommendations
- ✓ Future research should investigate the scalability of W2S and explore ways to adapt the framework to large-scale applications
- ✓ The authors should expand their analysis to include a broader range of LLMs and alignment behaviors to enhance the generalizability of W2S
Sources
Original: arXiv - cs.LG