
MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

arXiv:2603.02221v1. Abstract: In healthcare tabular predictions, classical models with feature engineering often outperform neural approaches. Recent advances in Large Language Models enable the integration of domain knowledge into feature engineering, offering a promising direction. However, existing approaches typically rely on a broad search over predefined transformations, overlooking downstream model characteristics and feature importance signals. We present MedFeat, a feedback-driven and model-aware feature engineering framework that leverages LLM reasoning with domain knowledge and provides feature explanations based on SHAP values while tracking successful and failed proposals to guide feature discovery. By incorporating model awareness, MedFeat prioritizes informative signals that are difficult for the downstream model to learn directly due to its characteristics. Across a broad range of clinical prediction tasks, MedFeat achieves stable improvements over va

Zizheng Zhang, Yiming Li, Justin Xu, Jinyu Wang, Rui Wang, Lei Song, Jiang Bian, David W Eyre, Jingjing Fu · March 5, 2026

#cs.LG #cs.AI

Executive Summary

This article reviews MedFeat, a feature engineering framework that uses Large Language Models (LLMs) to improve clinical tabular prediction. MedFeat is feedback-driven and model-aware: it combines LLM reasoning with domain knowledge, explains features with SHAP values, and tracks successful and failed proposals to guide feature discovery. According to the authors, MedFeat achieves stable improvements over various baselines and discovers clinically meaningful features that generalize under distribution shift. Its consistent behavior across different clinical prediction tasks and datasets suggests potential for real-world deployment. The authors state that they will release the code and datasets used in their experiments, subject to certain agreements and policies.

Key Points

▸ MedFeat is a model-aware and explainability-driven feature engineering framework that leverages LLMs for clinical tabular predictions.
▸ The framework integrates domain knowledge and provides feature explanations based on SHAP values.
▸ MedFeat achieves stable improvements over various baselines and discovers clinically meaningful features that generalize under distribution shift.

Merits

Strength in Model Awareness

MedFeat's model-aware approach enables it to prioritize informative signals that are difficult for the downstream model to learn directly, resulting in more accurate predictions.
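
A toy illustration of why model awareness can matter, as a minimal sketch rather than the MedFeat procedure: suppose the outcome depends on the ratio of two measurements. A model restricted to a single axis-aligned threshold cannot express that boundary on either raw column, but it can on the engineered ratio. The synthetic data, the column names `a` and `b`, and the helper `best_stump_accuracy` are all assumptions made for this example.

```python
import random

def best_stump_accuracy(column, labels):
    """Best accuracy of a single-threshold rule (either direction) on one column."""
    n = len(labels)
    best = 0.0
    for t in sorted(set(column)):
        correct = sum((v >= t) == bool(y) for v, y in zip(column, labels))
        # Also consider the flipped rule (predict positive when v < t).
        best = max(best, correct / n, (n - correct) / n)
    return best

rng = random.Random(0)
# Synthetic stand-ins for two measurements; not real clinical data.
a = [rng.uniform(1.0, 10.0) for _ in range(400)]
b = [rng.uniform(1.0, 10.0) for _ in range(400)]
labels = [int(x / y > 1.0) for x, y in zip(a, b)]  # outcome driven by the ratio

# Engineered feature: the ratio itself.
ratio = [x / y for x, y in zip(a, b)]

raw_acc = max(best_stump_accuracy(a, labels), best_stump_accuracy(b, labels))
ratio_acc = best_stump_accuracy(ratio, labels)
print(f"best stump on raw columns: {raw_acc:.2f}")
print(f"best stump on ratio:       {ratio_acc:.2f}")
```

The gap between the two scores is the kind of signal a model-aware search could prioritize: the ratio is trivial to compute but awkward for a threshold-based learner to recover from the raw columns.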

Explainability through SHAP Values

SHAP values attribute each prediction to the individual features that produced it. In MedFeat these explanations feed back into feature discovery, and they can also help clinicians judge whether an engineered feature is clinically plausible.
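
To make the idea behind SHAP concrete, the following sketch computes exact Shapley values for one row by enumerating every feature ordering. It is not the `shap` library's API and not the paper's model: the risk function `predict`, the feature names, and the single reference row `baseline` are hypothetical. Practical SHAP implementations use approximations or model-specific algorithms instead of full enumeration, which is only feasible for a handful of features.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row; 'absent' features take the baseline value."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            new = predict(current)
            phi[i] += new - prev       # marginal contribution in this ordering
            prev = new
    return [p / len(orderings) for p in phi]

# Hypothetical risk score with an interaction term; not a model from the paper.
def predict(row):
    age, lactate, creatinine = row
    return 0.02 * age + 0.5 * lactate + 0.3 * lactate * creatinine

x = [70.0, 4.0, 2.0]         # row being explained (made-up values)
baseline = [50.0, 1.0, 1.0]  # reference row (assumed)

phi = shapley_values(predict, x, baseline)
for name, value in zip(["age", "lactate", "creatinine"], phi):
    print(f"{name:>10}: {value:+.3f}")
```

The attributions sum to `predict(x) - predict(baseline)`, and the interaction term is split between `lactate` and `creatinine` rather than assigned to either one alone.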

Robustness and Generalizability

MedFeat's ability to generalize under distribution shift and perform well across different clinical prediction tasks and datasets demonstrates its robustness and potential for real-world deployment.

Demerits

Limited Scope and Diversity of Datasets

The study covers a limited range of clinical prediction tasks and datasets, so its findings may not fully generalize to other healthcare domains.

Dependence on LLMs and Domain Knowledge

MedFeat's reliance on LLMs and domain knowledge may limit its applicability in settings with limited access to such resources or expertise.

Potential for Overfitting and Over-Engineering

The feedback-driven approach of MedFeat may lead to overfitting or over-engineering if not carefully monitored and controlled.
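
One common guard against this risk is to accept a proposed feature only when it improves a held-out score by a clear margin, while logging every accepted and rejected proposal. The sketch below shows that pattern on synthetic data. It is an assumption about how such a loop could be controlled, not the authors' implementation: the hand-written `proposals` stand in for LLM suggestions, and `run_feedback_loop`, `min_gain`, and the single-threshold scorer are hypothetical.

```python
import random

def stump_accuracy(column, labels, threshold):
    """Accuracy of the rule 'predict positive when value >= threshold'."""
    hits = sum((v >= threshold) == bool(y) for v, y in zip(column, labels))
    return hits / len(labels)

def fit_threshold(column, labels):
    """Pick the threshold with the best training accuracy."""
    return max(sorted(set(column)), key=lambda t: stump_accuracy(column, labels, t))

def run_feedback_loop(proposals, train, valid, min_gain=0.05):
    """Accept a feature only if it beats the best held-out score by min_gain."""
    (train_rows, train_y), (valid_rows, valid_y) = train, valid
    positive_rate = sum(valid_y) / len(valid_y)
    best_score = max(positive_rate, 1.0 - positive_rate)  # majority-class baseline
    history = []
    for name, fn in proposals:
        threshold = fit_threshold([fn(r) for r in train_rows], train_y)  # fit on train
        score = stump_accuracy([fn(r) for r in valid_rows], valid_y, threshold)  # judge on held-out
        accepted = score >= best_score + min_gain
        history.append({"feature": name, "valid_acc": score, "accepted": accepted})
        if accepted:
            best_score = score
    return history

def make_split(rng, n):
    """Synthetic rows (a, b, noise) with an outcome driven by a / b."""
    rows = [(rng.uniform(1, 10), rng.uniform(1, 10), rng.uniform(1, 10)) for _ in range(n)]
    labels = [int(a / b > 1.0) for a, b, _ in rows]
    return rows, labels

rng = random.Random(0)
train, valid = make_split(rng, 300), make_split(rng, 200)

# Hand-written stand-ins for proposed features.
proposals = [
    ("a", lambda r: r[0]),
    ("noise", lambda r: r[2]),
    ("a_over_b", lambda r: r[0] / r[1]),
    ("a_minus_b", lambda r: r[0] - r[1]),
]
history = run_feedback_loop(proposals, train, valid)
for entry in history:
    print(entry)
```

Because `a_minus_b` carries the same signal as `a_over_b`, it does not clear the margin once the ratio has been accepted, which keeps redundant features out of the set. The `history` log is the record of successes and failures that a later proposal step could consult.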

Expert Commentary

The article presents a promising approach to feature engineering for clinical tabular prediction, using LLMs to integrate domain knowledge and to provide feature explanations. The results are encouraging, but the limitations of MedFeat, particularly its dependence on LLMs and domain knowledge, need careful consideration. The work also contributes to timely discussions on AI in clinical decision-making, explainability of AI-driven predictions, and generalization under distribution shift. Overall, MedFeat is a valuable addition to the field of clinical tabular prediction, offering insights into how LLMs and feature engineering can be combined for healthcare applications.

Recommendations

✓ Future research should evaluate MedFeat on a broader and more diverse set of datasets to better understand how well it generalizes across healthcare domains.
✓ The development of more robust and interpretable feature engineering frameworks, such as MedFeat, is essential for the effective integration of AI and clinical decision-making in healthcare.

Sources

arXiv - cs.LG

MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

