
Evolving Demonstration Optimization for Chain-of-Thought Feature Transformation


Xinyuan Wang, Kunpeng Liu, Arun Vignesh Malarkkan, Yanjie Fu

arXiv:2603.09987v1 Abstract: Feature Transformation (FT) is a core data-centric AI task that improves feature space quality to advance downstream predictive performance. However, discovering effective transformations remains challenging due to the large space of feature-operator combinations. Existing solutions rely on discrete search or latent generation, but they are frequently limited by sample inefficiency, invalid candidates, and redundant generations with limited coverage. Large Language Models (LLMs) offer strong priors for producing valid transformations, but current LLM-based FT methods typically rely on static demonstrations, resulting in limited diversity, redundant outputs, and weak alignment with downstream objectives. We propose a framework that optimizes context data for LLM-driven FT by evolving trajectory-level experiences in a closed loop. Starting from high-performing feature transformation sequences explored by reinforcement learning, we construct and continuously update an experience library of downstream task-verified transformation trajectories, and use a diversity-aware selector to form prompt contexts that, together with chain-of-thought reasoning, guide transformed feature generation toward higher performance. Experiments on diverse tabular benchmarks show that our method outperforms classical and LLM-based baselines and is more stable than one-shot generation. The framework generalizes across API-based and open-source LLMs and remains robust across downstream evaluators.

Executive Summary

This article proposes a framework for optimizing feature transformation with Large Language Models (LLMs) driven by evolving demonstrations. The framework uses reinforcement learning to explore high-performing feature transformation sequences and builds an experience library of transformation trajectories verified against downstream tasks. A diversity-aware selector forms prompt contexts from this library, and chain-of-thought reasoning guides the generation of transformed features toward higher downstream performance. Experiments on diverse tabular benchmarks show that the proposed method outperforms classical and LLM-based baselines while being more stable than one-shot generation. The framework also generalizes across API-based and open-source LLMs and remains robust across downstream evaluators. This work advances the field of feature transformation and has potential impact on applications where predictive performance is crucial.
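To make the closed loop concrete, here is a minimal sketch of the experience-library step described above: trajectories are scored by a downstream evaluator, and only the best-verified ones are retained as candidate demonstrations. All names and the capacity policy are illustrative assumptions, not the paper's actual API.

```python
import random

def update_library(library, trajectory, score, capacity=50):
    """Insert a downstream-verified trajectory, keeping only the
    top-`capacity` entries ranked by their downstream score."""
    library.append((score, trajectory))
    library.sort(key=lambda item: item[0], reverse=True)  # best first
    del library[capacity:]  # evict the weakest beyond capacity
    return library

# Toy usage: each trajectory is a sequence of (feature, operator) steps,
# and random.random() stands in for a real downstream evaluation score.
library = []
for step in range(5):
    trajectory = [("f1", "log"), ("f2", "square")][: step % 2 + 1]
    score = random.random()
    update_library(library, trajectory, score)

best_score, best_trajectory = library[0]  # highest-scoring demonstration
```

In the full framework this library would be updated continuously as the LLM proposes new transformations, so the demonstration pool evolves rather than staying static.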

Key Points

  • Evolving demonstration optimization framework for LLM-driven feature transformation
  • Reinforcement learning exploration of high-performing feature transformation sequences
  • Diversity-aware selector for context formation and chain-of-thought for guided feature generation
  • Improved performance and stability over classical and LLM-based baselines
  • Generalizability across various LLMs and downstream evaluators

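The "diversity-aware selector" in the key points above could, for example, pick demonstrations by greedy max-min selection over a trajectory distance. The sketch below assumes trajectories are sets of (feature, operator) steps and uses Jaccard distance; the paper's actual selection criterion may differ.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the steps of two trajectories."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def select_diverse(trajectories, k):
    """Greedy max-min selection: seed with the first (e.g. best-scoring)
    trajectory, then repeatedly add the candidate farthest from the
    already-chosen set, so redundant demonstrations are picked last."""
    chosen = [trajectories[0]]
    while len(chosen) < min(k, len(trajectories)):
        best = max(
            (t for t in trajectories if t not in chosen),
            key=lambda t: min(jaccard_distance(t, c) for c in chosen),
        )
        chosen.append(best)
    return chosen

# Toy usage: the second trajectory duplicates the first, so the
# selector skips it in favor of more diverse demonstrations.
demos = [
    [("f1", "log"), ("f2", "sqrt")],
    [("f1", "log"), ("f2", "sqrt")],
    [("f3", "mul"), ("f4", "add")],
    [("f1", "square")],
]
context = select_diverse(demos, k=3)
```

This kind of selection directly targets the redundancy problem the abstract attributes to static demonstrations.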
Merits

Strength in Empirical Evaluation

The article presents a comprehensive empirical evaluation of the proposed framework, demonstrating its superiority over classical and LLM-based baselines on diverse tabular benchmarks.

Robustness and Generalizability

The framework's ability to generalize across various LLMs and downstream evaluators is a significant merit, making it a versatile solution for real-world applications.

Demerits

Theoretical Foundations

The article could benefit from a more detailed theoretical analysis of the proposed framework, particularly in terms of its underlying mechanisms and assumptions.

Scalability and Computational Complexity

The article does not provide a detailed analysis of the computational complexity and scalability of the proposed framework, which may be a concern for large-scale applications.

Expert Commentary

The article presents a novel and innovative approach to feature transformation, leveraging the strengths of LLMs and evolving demonstrations. However, a more detailed theoretical analysis and investigation of scalability and computational complexity are necessary to fully evaluate the framework's potential. Additionally, the article's implications for the development of more robust and explainable AI systems warrant further consideration. Overall, the proposed framework is a significant contribution to the field of feature transformation and has the potential to impact various applications.

Recommendations

  • Further investigation of the framework's theoretical foundations and assumptions
  • Detailed analysis of computational complexity and scalability
  • Exploration of the framework's potential for transfer learning and knowledge transfer across different downstream tasks and evaluators
