GIST: Targeted Data Selection for Instruction Tuning via Coupled Optimization Geometry
arXiv:2602.18584v1 Announce Type: new Abstract: Targeted data selection has emerged as a crucial paradigm for efficient instruction tuning, aiming to identify a small yet influential subset of training examples for a specific target task. In practice, influence is often measured through the effect of an example on parameter updates. To make selection scalable, many approaches leverage optimizer statistics (e.g., Adam states) as an axis-aligned surrogate for update geometry (i.e., a diagonal preconditioner), implicitly treating parameters as coordinate-wise independent. We show that this assumption breaks down in parameter-efficient fine-tuning (PEFT) methods such as LoRA. In this setting, the induced optimization geometry exhibits strong cross-parameter coupling with non-trivial off-diagonal interactions, while the task-relevant update directions are confined to a low-dimensional subspace. Motivated by this mismatch, we propose GIST (Gradient Isometric Subspace Transformation), a simple yet principled alternative that replaces axis-aligned scaling with robust subspace alignment. GIST recovers a task-specific subspace from validation gradients via spectral filtering (SVD), projects training gradients into this coupled subspace, and scores examples by their alignment with target directions. Extensive experiments demonstrate that GIST matches or outperforms the state-of-the-art baseline with only 0.29% of the storage and 25% of the computational time under the same selection budget.
Executive Summary
This article proposes GIST, a novel targeted data selection method for instruction tuning, which addresses the limitations of existing approaches that treat parameters as coordinate-wise independent. GIST leverages spectral filtering to recover a task-specific subspace from validation gradients, projecting training gradients into this coupled subspace and scoring examples by their alignment with target directions. Experimental results demonstrate that GIST matches or outperforms state-of-the-art baselines with significantly reduced storage and computational time requirements. The proposed method has the potential to streamline instruction tuning in parameter-efficient fine-tuning (PEFT) methods, such as LoRA, by enabling more efficient targeted data selection.
Key Points
- ▸ GIST addresses the limitations of existing targeted data selection methods by incorporating task-specific subspace alignment.
- ▸ The proposed method leverages spectral filtering to recover a coupled subspace from validation gradients.
- ▸ GIST demonstrates improved efficiency and effectiveness compared to state-of-the-art baselines.
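The pipeline described in the abstract (SVD over validation gradients, projection of training gradients, alignment scoring) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `gist_scores`, the choice of the mean validation gradient as the target direction, and the cosine-alignment score are all assumptions made for concreteness.

```python
import numpy as np

def gist_scores(train_grads, val_grads, rank=8):
    """Hypothetical sketch of GIST-style data selection scoring.

    train_grads: (n_train, d) array of per-example training gradients
    val_grads:   (n_val, d) array of per-example validation gradients
    Returns a score per training example; higher = better aligned.
    """
    # Spectral filtering: recover a task-specific subspace as the span of
    # the top-`rank` right singular vectors of the validation gradients.
    _, _, Vt = np.linalg.svd(val_grads, full_matrices=False)
    V = Vt[:rank].T  # (d, rank) orthonormal basis for the coupled subspace

    # Project training gradients into the recovered subspace.
    proj = train_grads @ V  # (n_train, rank)

    # Assumed target direction: mean validation gradient in the subspace.
    target = (val_grads @ V).mean(axis=0)

    # Score by cosine alignment with the target direction.
    num = proj @ target
    denom = np.linalg.norm(proj, axis=1) * np.linalg.norm(target) + 1e-12
    return num / denom
```

Under this sketch, selection amounts to keeping the top-k examples by score; the SVD touches only the small validation-gradient matrix, which is consistent with the paper's reported storage and compute savings.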
Merits
Strength
GIST's ability to adapt to task-specific optimization geometries enables more accurate and efficient targeted data selection.
Robustness
The proposed method's reliance on spectral filtering and subspace alignment enhances its robustness to variations in optimization geometries.
Demerits
Limitation
The computational overhead of spectral filtering and subspace alignment may be significant for large-scale datasets.
Assumptions
GIST assumes that task-relevant update directions are confined to a low-dimensional subspace recoverable from validation gradients, which may not hold for all tasks or optimization geometries.
Expert Commentary
The article presents a timely and innovative contribution to the field of instruction tuning, addressing a critical limitation of existing targeted data selection methods. By incorporating task-specific subspace alignment, GIST demonstrates improved efficiency and effectiveness, making it a compelling solution for large-scale applications. However, the method's reliance on spectral filtering and subspace alignment may introduce computational overhead, which should be carefully evaluated in real-world deployments. The proposed approach also raises important questions about the role of optimization geometry in instruction tuning and the potential for more adaptive and informed strategies.
Recommendations
- ✓ Future research should investigate the application of GIST to other parameter-efficient fine-tuning methods and its potential for extension to more general optimization geometries.
- ✓ Careful evaluation of the computational overhead and assumptions underlying GIST is essential to ensure its successful deployment in real-world applications.