Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters
arXiv:2604.03388v1 Abstract: When deploying large language models (LLMs) in safety-critical applications, uncertainty quantification (UQ) is essential for self-assessing the reliability of LLM-based decisions. However, such decisions typically suffer from overconfidence, particularly after parameter-efficient fine-tuning (PEFT) on downstream domain-specific tasks with limited data. Existing methods to alleviate this issue either rely on Laplace-approximation-based post-hoc frameworks, which may yield suboptimal calibration depending on the training trajectory, or on variational Bayesian training, which requires multiple complete forward passes through the entire LLM backbone at inference time for Monte Carlo estimation, posing scalability challenges for deployment. To address these limitations, we build on the Bayesian last layer (BLL) model, where an LLM-based deterministic feature extractor is followed by random last-layer parameters for uncertainty reasoning. Since existing low-rank adapters (LoRA) for PEFT have limited expressiveness due to rank collapse, we address this with the Polar-decomposed Low-rank Adapter Representation (PoLAR), an orthogonalized parameterization paired with Riemannian optimization that enables more stable and expressive adaptation. Building on this PoLAR-BLL model, we leverage the variational (V) inference framework to put forth a scalable Bayesian fine-tuning approach that jointly seeks the PoLAR parameters and the approximate posterior of the last-layer parameters via alternating optimization. The resulting PoLAR-VBLL is a flexible framework that integrates architecture-enhanced optimization with scalable Bayesian inference to endow LLMs with well-calibrated UQ. Our empirical results verify the effectiveness of PoLAR-VBLL in terms of generalization and uncertainty estimation on both in-distribution and out-of-distribution data across common-sense reasoning tasks.
Executive Summary
This article presents a scalable approach to variational Bayesian fine-tuning of large language models (LLMs) via orthogonalized low-rank adapters. The proposed method, PoLAR-VBLL, addresses limitations of existing uncertainty quantification (UQ) methods by integrating architecture-enhanced optimization with scalable Bayesian inference. It pairs a Polar-decomposed Low-rank Adapter Representation (PoLAR) with Riemannian optimization, enabling more stable and expressive adaptation than plain LoRA, and jointly learns the PoLAR parameters and the approximate posterior of the last-layer parameters via alternating optimization. Empirical results demonstrate the effectiveness of PoLAR-VBLL for generalization and uncertainty estimation on common-sense reasoning tasks, both in-distribution and out-of-distribution.
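To make the orthogonalized parameterization concrete, here is a minimal numpy sketch of a polar-style low-rank update ΔW = X D Yᵀ with orthonormal factors X and Y obtained via thin QR. This is an illustrative reconstruction, not the authors' implementation; the factor shapes and the core matrix D are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 64, 32, 4  # output dim, input dim, adapter rank (illustrative sizes)

# Orthonormal factors X in St(n, r) and Y in St(m, r) via thin QR;
# D is an r x r core carrying the adapter's magnitude/mixing.
X, _ = np.linalg.qr(rng.standard_normal((n, r)))
Y, _ = np.linalg.qr(rng.standard_normal((m, r)))
D = rng.standard_normal((r, r))

# Low-rank update added to the frozen base weight during fine-tuning.
delta_W = X @ D @ Y.T

# Orthonormal factors stay well-conditioned, which is what counteracts
# the rank collapse observed with plain LoRA factor pairs.
assert np.allclose(X.T @ X, np.eye(r), atol=1e-8)
assert np.linalg.matrix_rank(delta_W) <= r
```

Because X and Y are constrained to have orthonormal columns, updating them requires optimization on the Stiefel manifold rather than plain gradient descent, which is where the Riemannian optimization mentioned in the abstract comes in.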
Key Points
- ▸ The article proposes a novel approach to scalable variational Bayesian fine-tuning of LLMs.
- ▸ PoLAR-VBLL integrates architecture-enhanced optimization with scalable Bayesian inference.
- ▸ The method uses a Polar-decomposed Low-rank Adapter Representation (PoLAR) paired with Riemannian optimization.
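For readers unfamiliar with Riemannian optimization over orthonormal factors, the sketch below shows one generic gradient step on the Stiefel manifold St(n, r): project the Euclidean gradient onto the tangent space at the current point, then retract back onto the manifold via QR. This is a standard textbook recipe, offered only as intuition; the paper's actual metric, retraction, and optimizer may differ.

```python
import numpy as np

def stiefel_step(X, G, lr=0.1):
    """One Riemannian gradient step on St(n, r) (illustrative sketch).

    X: current point with orthonormal columns; G: Euclidean gradient at X.
    """
    # Tangent-space projection: remove the component of G that would
    # push X off the manifold (canonical-metric projection).
    sym = (X.T @ G + G.T @ X) / 2.0
    riem_grad = G - X @ sym
    # QR retraction maps the updated point back onto the manifold.
    Q, R = np.linalg.qr(X - lr * riem_grad)
    # Sign fix so the retraction is continuous in X.
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((16, 3)))
X_new = stiefel_step(X, rng.standard_normal((16, 3)))
assert np.allclose(X_new.T @ X_new, np.eye(3), atol=1e-8)
```

The per-step cost is dominated by the thin QR, which is cheap at typical adapter ranks but still adds overhead relative to an unconstrained optimizer step.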
Merits
Strength
The proposed method addresses limitations in existing UQ methods, enabling more stable and expressive adaptation.
Scalability
PoLAR-VBLL is designed to be scalable, making it suitable for deployment in safety-critical applications.
Flexibility
The method integrates architecture-enhanced optimization with scalable Bayesian inference, making it a flexible framework.
Demerits
Limitation
The article assumes a deterministic feature extractor, which may not be suitable for all applications.
Complexity
The proposed method may be computationally expensive due to the use of Riemannian optimization.
Expert Commentary
The article presents a novel and promising approach to scalable variational Bayesian fine-tuning of LLMs. Pairing PoLAR's orthogonalized parameterization with Riemannian optimization mitigates the rank collapse that limits plain LoRA, while the Bayesian last layer keeps Monte Carlo inference cheap by avoiding repeated passes through the backbone. Two caveats temper these strengths: the framework treats the backbone as a deterministic feature extractor, which confines uncertainty modeling to the last layer and may not suit all applications, and Riemannian updates add per-step overhead relative to standard optimizers. Nevertheless, the findings have significant implications for safety-critical systems built on LLMs, underscoring the need for UQ methods that are both robust and scalable.
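The scalability argument hinges on the Bayesian-last-layer structure: the expensive backbone features are computed once, and only the cheap last-layer weights are sampled for Monte Carlo prediction. The sketch below illustrates this with a Gaussian posterior over the last-layer weight matrix; the shared Cholesky-factor covariance is an assumption for brevity, and the paper's variational family may differ.

```python
import numpy as np

def bll_predict(phi, M, S_chol, n_samples=32, rng=None):
    """Monte Carlo predictive for a Bayesian last layer (illustrative).

    phi: (d,) feature vector from the backbone, computed ONCE.
    M: (k, d) posterior mean of the last-layer weights.
    S_chol: (d, d) Cholesky factor of an assumed shared row covariance.
    Only the last layer is sampled, so no repeated backbone passes.
    """
    rng = rng or np.random.default_rng(0)
    k, d = M.shape
    probs = np.zeros(k)
    for _ in range(n_samples):
        W = M + rng.standard_normal((k, d)) @ S_chol.T  # sample W ~ q(W)
        logits = W @ phi
        e = np.exp(logits - logits.max())  # stable softmax
        probs += e / e.sum()
    return probs / n_samples  # averaged softmax = predictive distribution

rng = np.random.default_rng(2)
d, k = 8, 3
p = bll_predict(rng.standard_normal(d), rng.standard_normal((k, d)),
                0.1 * np.eye(d), rng=rng)
assert np.isclose(p.sum(), 1.0) and (p >= 0).all()
```

Averaging softmax outputs over weight samples spreads probability mass when the posterior is uncertain, which is the mechanism behind the improved calibration the paper reports.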
Recommendations
- ✓ Future research should investigate the applicability of the proposed method to other types of feature extractors, including non-deterministic ones.
- ✓ The development of more efficient optimization algorithms for Riemannian optimization could further improve the scalability of the proposed method.
Sources
Original: arXiv - cs.LG