Language Model Representations for Efficient Few-Shot Tabular Classification
arXiv:2602.15844v1 Announce Type: cross Abstract: The Web is a rich source of structured data in the form of tables, from product catalogs and knowledge bases to scientific datasets. However, the heterogeneity of the structure and semantics of these tables makes it challenging to build a unified method that can effectively leverage the information they contain. Meanwhile, large language models (LLMs) are becoming an increasingly integral component of web infrastructure for tasks like semantic search. This raises a crucial question: can we leverage these already-deployed LLMs to classify structured data in web-native tables (e.g., product catalogs, knowledge base exports, scientific data portals), avoiding the need for specialized models or extensive retraining? This work investigates a lightweight paradigm, $\textbf{Ta}$ble $\textbf{R}$epresentation with $\textbf{L}$anguage Model~($\textbf{TaRL}$), for few-shot tabular classification that directly utilizes semantic embeddings of individual table rows. We first show that naive application of these embeddings underperforms compared to specialized tabular models. We then demonstrate that their potential can be unlocked with two key techniques: removing the common component from all embeddings and calibrating the softmax temperature. We show that a simple meta-learner, trained on handcrafted features, can learn to predict an appropriate temperature. This approach achieves performance comparable to state-of-the-art models in low-data regimes ($k \leq 32$) on semantically rich tables. Our findings demonstrate a viable and efficient semantics-driven pathway to reuse existing LLM infrastructure for Web table understanding.
Executive Summary
The article proposes TaRL, a lightweight approach to few-shot tabular classification built on large language models. It shows that semantic embeddings of individual table rows, applied naively, underperform specialized tabular models, but become competitive with two techniques: removing the common component shared by all embeddings and calibrating the softmax temperature, where a simple meta-learner trained on handcrafted features predicts an appropriate temperature. With these fixes, the method matches state-of-the-art models in low-data regimes, enabling the reuse of existing language model infrastructure for web table understanding without specialized models or extensive retraining.
Key Points
- ▸ Large language models can be leveraged for few-shot tabular classification
- ▸ Semantic embeddings of individual table rows are utilized for classification
- ▸ Removing common components and calibrating softmax temperature improves performance
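The two techniques above can be sketched concretely. The following is a minimal, hypothetical re-implementation of the general idea (not the paper's exact procedure): center the row embeddings by subtracting their shared mean direction, then classify queries against class prototypes with a temperature-scaled softmax. The function names and the fixed temperature value are illustrative assumptions.

```python
import numpy as np

def remove_common_component(embeddings):
    """Subtract the mean (common) component from every row embedding,
    then re-normalize so cosine similarity stays well defined.
    Illustrative sketch; the paper's exact centering step may differ."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.clip(norms, 1e-12, None)

def few_shot_predict(support, support_labels, query, temperature=0.1):
    """Nearest-class-mean classification over k labeled rows ("support")
    with a temperature-scaled softmax over cosine similarities.
    `support`: (k, d) embeddings, `query`: (m, d) embeddings."""
    classes = np.unique(support_labels)
    # Class prototypes: mean of each class's support embeddings, normalized.
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]
    )
    prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = query @ prototypes.T          # cosine similarities (m, n_classes)
    logits = sims / temperature          # temperature calibration
    # Numerically stable softmax over classes.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return classes[probs.argmax(axis=1)], probs
```

In the paper, the temperature is not fixed but predicted by a meta-learner from handcrafted features of the table; the constant used here simply stands in for that prediction.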
Merits
Efficient Use of Existing Infrastructure
The proposed approach enables the reuse of existing language model infrastructure, reducing the need for specialized models or extensive retraining.
Improved Performance in Low-Data Regimes
TaRL achieves comparable performance to state-of-the-art models in low-data regimes, making it a viable solution for scenarios with limited training data.
Demerits
Limited Applicability to Complex Tables
The approach may not perform well on complex tables with diverse structures and semantics, requiring further adaptation or extension.
Dependence on Handcrafted Features
The meta-learner relies on handcrafted features, which may limit its applicability to scenarios with limited domain knowledge or expertise.
Expert Commentary
The article presents a compelling case for using large language models in few-shot tabular classification. By centering the semantic embeddings and calibrating the softmax temperature, the proposed approach achieves strong results in low-data regimes. However, further research is needed to address its limitations, notably the reliance on handcrafted features for the meta-learner and uncertain robustness on tables with complex or semantically sparse structure. Nevertheless, the study contributes meaningfully to the ongoing discussion on table understanding and representation, highlighting the potential of language models in this context. As the field evolves, it will be important to test the approach across diverse domains and tasks and to explore its integration with other machine learning techniques.
Recommendations
- ✓ Further investigation into the applicability of the approach to complex tables and diverse domains
- ✓ Exploration of alternative feature extraction methods to reduce dependence on handcrafted features