Calibrated Adaptation: Bayesian Stiefel Manifold Priors for Reliable Parameter-Efficient Fine-Tuning
arXiv:2602.17809v1 Announce Type: new Abstract: Parameter-efficient fine-tuning methods such as LoRA enable practical adaptation of large language models but provide no principled uncertainty estimates, leading to poorly calibrated predictions and unreliable behavior under domain shift. We introduce Stiefel-Bayes Adapters (SBA), a Bayesian PEFT framework that places a Matrix Langevin prior over orthonormal adapter factors on the Stiefel manifold $\mathrm{St}(n, r)$ and performs approximate posterior inference via tangent space Laplace approximation with geodesic retraction. Unlike Gaussian priors in flat space projected onto orthogonality constraints, our prior on the manifold naturally encodes the inductive bias that adapter subspaces should be well conditioned and orthogonal, while the posterior provides calibrated predictive uncertainty without recalibration. We prove formally that the tangent space approximation strictly avoids the structural variance inflation inherent in projecting from ambient space, establishing a rigorous theoretical advantage for intrinsic manifold inference. Across GLUE and SuperGLUE benchmarks on RoBERTa-large, LLaMA-2-7B, LLaMA-2-13B, Mistral-7B, and Qwen2.5-7B, domain shift evaluations, selective prediction protocols, and an abstractive summarization task, SBA achieves task performance comparable to LoRA and DoRA while reducing Expected Calibration Error by 18 to 34% over deterministic baselines, improving selective prediction AUROC by 12 to 25% under domain shift, and outperforming deep ensembles of five LoRA models on OOD detection at a fraction of the parameter cost. Our results demonstrate that where you place uncertainty, on the right geometric structure, matters more than simply adding any Bayesian treatment to adapters.
Executive Summary
This article proposes Stiefel-Bayes Adapters (SBA), a novel Bayesian framework for parameter-efficient fine-tuning of large language models. SBA addresses poorly calibrated predictions and unreliable behavior under domain shift by placing a Matrix Langevin prior over orthonormal adapter factors on the Stiefel manifold. The approach provides calibrated predictive uncertainty without post-hoc recalibration, and across GLUE/SuperGLUE, domain shift, and selective prediction evaluations it matches LoRA and DoRA on task performance while reducing Expected Calibration Error by 18 to 34% over deterministic baselines. The authors' theoretical analysis shows that intrinsic tangent space inference avoids the structural variance inflation incurred by projecting from ambient space. This work highlights that the geometric structure on which uncertainty is placed matters in Bayesian treatments of adapters, with significant implications for building more reliable and efficient language models.
Key Points
- ▸ Introduction of Stiefel-Bayes Adapters (SBA) as a Bayesian parameter-efficient fine-tuning framework
- ▸ Utilization of Matrix Langevin prior over orthonormal adapter factors on the Stiefel manifold
- ▸ Approximate posterior inference via tangent space Laplace approximation with geodesic retraction
- ▸ Rigorous theoretical analysis demonstrating avoidance of structural variance inflation
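To make the geometric machinery behind these points concrete, the following sketch illustrates the two core Stiefel manifold operations the abstract names: projecting an ambient perturbation onto the tangent space at an orthonormal point, and retracting a tangent vector back onto the manifold. This is a generic NumPy illustration of standard manifold operations (a QR-based retraction is used here as a common stand-in for the paper's geodesic retraction), not the authors' implementation; all function names are our own.

```python
import numpy as np

def tangent_project(X, Z):
    """Project an ambient matrix Z onto the tangent space of the
    Stiefel manifold St(n, r) at X, where X.T @ X = I_r."""
    sym = 0.5 * (X.T @ Z + Z.T @ X)
    return Z - X @ sym

def qr_retract(X, V):
    """Retract a tangent vector V back onto the manifold via a QR
    decomposition, sign-corrected so the map is well defined."""
    Q, R = np.linalg.qr(X + V)
    return Q * np.sign(np.diag(R))  # enforce positive diagonal of R

rng = np.random.default_rng(0)
n, r = 8, 3
X, _ = np.linalg.qr(rng.standard_normal((n, r)))  # a point on St(8, 3)
V = tangent_project(X, 0.1 * rng.standard_normal((n, r)))
Y = qr_retract(X, V)
# Y again has orthonormal columns up to floating-point error
print(np.allclose(Y.T @ Y, np.eye(r)))
```

A tangent-space Laplace approximation in this style would fit a Gaussian over the coefficients of tangent vectors `V` and map posterior samples through the retraction, keeping every sampled adapter factor exactly orthonormal.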
Merits
Strength in Theoretical Foundation
The article provides a rigorous theoretical analysis that demonstrates the advantages of intrinsic manifold inference over ambient space projections, establishing a strong foundation for the proposed framework.
Practical Performance
The experimental results demonstrate that SBA achieves task performance comparable to LoRA and DoRA while reducing Expected Calibration Error by 18 to 34% and improving selective prediction AUROC by 12 to 25% under domain shift.
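Expected Calibration Error, the headline metric here, is the weighted average gap between confidence and accuracy across confidence bins. The sketch below shows the standard binned definition; it is a generic illustration, not code from the paper, and the toy inputs are invented for demonstration.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average |accuracy - mean confidence|
    over equal-width confidence bins, using (lo, hi] binning."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()
            avg_conf = confidences[mask].mean()
            ece += mask.mean() * abs(acc - avg_conf)
    return ece

# toy example: 90% confidence but only 50% accuracy is overconfident
conf = np.array([0.9, 0.9, 0.9, 0.9])
correct = np.array([1.0, 1.0, 0.0, 0.0])
print(round(expected_calibration_error(conf, correct), 2))  # -> 0.4
```

A well-calibrated model drives this gap toward zero, which is the sense in which SBA's 18 to 34% ECE reduction translates into more trustworthy confidence scores.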
Novelty and Originality
The article introduces a novel Bayesian framework that incorporates geometric structure, which is an important contribution to the field of parameter-efficient fine-tuning.
Demerits
Limited Generalizability
The experimental results are limited to a specific set of benchmarks and models, and it is unclear whether SBA will generalize to other domains and tasks.
Computational Complexity
The article mentions that the proposed framework requires significant computational resources, which may be a limitation for large-scale applications.
Expert Commentary
The framework directly targets the calibration gap in parameter-efficient fine-tuning: deterministic adapters such as LoRA provide no uncertainty estimates, and SBA fills that gap with an intrinsic Bayesian treatment on the Stiefel manifold. The theoretical result, that tangent space inference strictly avoids the variance inflation of ambient space projection, is the paper's most distinctive contribution, and the empirical gains in calibration, selective prediction, and OOD detection at a fraction of a deep ensemble's parameter cost support it. The main caveats are the restricted set of benchmarks and models evaluated and the added computational cost of posterior inference. Overall, the findings suggest that the geometric structure on which uncertainty is placed matters more than the mere presence of a Bayesian treatment, with significant implications for building more reliable and efficient language models.
Recommendations
- ✓ Future research should investigate the generalizability of SBA to other domains and tasks.
- ✓ The authors should provide a more detailed analysis of the computational complexity of the proposed framework and explore ways to reduce it.