ViTaB-A: Evaluating Multimodal Large Language Models on Visual Table Attribution
arXiv:2602.15769v1
Abstract: Multimodal Large Language Models (mLLMs) are often used to answer questions over structured data such as tables in Markdown, JSON, and image form. While these models often give correct answers, users also need to know where those answers come from. In this work, we study structured data attribution (citation): the ability of a model to point to the specific rows and columns that support an answer. We evaluate several mLLMs across different table formats and prompting strategies. Our results show a clear gap between question answering and evidence attribution: while question answering accuracy is moderate, attribution accuracy is much lower across all models, and near random for JSON inputs. We also find that models are more reliable at citing rows than columns, and struggle more with textual formats than with images. Finally, we observe notable differences across model families. Overall, our findings show that current mLLMs are unreliable at providing fine-grained, trustworthy attribution for structured data, which limits their usage in applications requiring transparency and traceability.
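To make the attribution task concrete, the following is a minimal sketch of how such an evaluation might be set up. The prompt wording, the gold-label structure, and the placeholder `query_model` call are assumptions for illustration; they are not the paper's actual benchmark or harness.

```python
# Minimal sketch of a table-attribution evaluation (hypothetical harness;
# the paper's actual prompts and scoring may differ).

markdown_table = """\
| Country | Capital  | Population (M) |
|---------|----------|----------------|
| France  | Paris    | 68.0           |
| Japan   | Tokyo    | 124.5          |
| Brazil  | Brasilia | 203.1          |
"""

question = "What is the capital of Japan?"

# Gold answer plus the 1-indexed row and the column name that support it.
gold = {"answer": "Tokyo", "rows": {2}, "cols": {"Capital"}}

prompt = (
    f"{markdown_table}\n"
    f"Question: {question}\n"
    "Answer the question, then cite the supporting evidence as JSON: "
    '{"answer": ..., "rows": [...], "cols": [...]}'
)

def score(prediction: dict, gold: dict) -> dict:
    """Score answer correctness and row/column attribution separately,
    mirroring the QA-vs-attribution gap the paper reports."""
    return {
        "answer_correct": prediction["answer"].strip() == gold["answer"],
        "rows_correct": set(prediction["rows"]) == gold["rows"],
        "cols_correct": set(prediction["cols"]) == gold["cols"],
    }

# result = score(query_model(prompt), gold)  # query_model is a placeholder
```

Scoring answer correctness and citation correctness separately, as in this sketch, is what makes the reported gap visible: a model can get `answer_correct` while failing both attribution checks.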
Executive Summary
This article evaluates the ability of Multimodal Large Language Models (mLLMs) to attribute their answers to the specific rows and columns of a table that support them. The study reveals a significant gap between question answering accuracy and attribution accuracy, with the latter near random for JSON inputs across all models. The results suggest that current mLLMs are unreliable at providing fine-grained, trustworthy attribution, which limits their usage in applications requiring transparency and traceability. The study also highlights differences in attribution accuracy across model families and table formats, with textual formats proving more challenging than images. These findings have important implications for the development and deployment of mLLMs in applications where attribution is crucial.
Key Points
- ▸ Multimodal Large Language Models (mLLMs) struggle with attribution in structured data
- ▸ Attribution accuracy is near random for JSON inputs across all models
- ▸ Attribution accuracy varies notably across model families and table formats
Merits
Contribution to the field
The study provides valuable insights into the limitations of mLLMs in structured data attribution, highlighting the need for further research and development in this area.
Methodological rigor
The study employs a systematic evaluation approach, assessing multiple mLLMs across different table formats and prompting strategies, which strengthens the robustness of the findings.
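To illustrate what "different table formats" can mean in practice, here is a minimal sketch that serializes one logical table into two of the textual formats named in the abstract, Markdown and JSON. The serialization conventions shown are assumptions; the paper's exact renderings are not specified here.

```python
import json

# One logical table, rendered in two of the textual formats named in the
# abstract. The exact conventions the paper uses are an assumption here.
header = ["Country", "Capital", "Population (M)"]
rows = [["France", "Paris", 68.0], ["Japan", "Tokyo", 124.5]]

# Markdown rendering: rows keep an explicit visual order.
md_lines = ["| " + " | ".join(header) + " |",
            "| " + " | ".join("---" for _ in header) + " |"]
md_lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
markdown = "\n".join(md_lines)

# JSON rendering as a list of records. Explicit row indices are absent,
# which may be one reason attribution is hardest for JSON inputs.
records = [dict(zip(header, row)) for row in rows]
as_json = json.dumps(records, indent=2)

print(markdown)
print(as_json)
```

Holding the table content fixed while varying only the serialization, as above, is what lets format effects (e.g., the near-random attribution on JSON) be isolated from question difficulty.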
Demerits
Limited scope
The study focuses primarily on attribution accuracy, neglecting other important aspects of mLLMs, such as their ability to reason and generate text.
Lack of generalizability
The results may not be generalizable to other types of structured data, such as graphs or networks.
Expert Commentary
The study's findings on the limitations of mLLMs in structured data attribution are significant and warrant further investigation. They underscore the need for techniques that produce transparent and interpretable outputs, such as Explainable AI methods. The implications for developing and deploying mLLMs in applications requiring transparency and traceability are substantial: the limitations of these models must be weighed carefully before they are relied on in critical applications.
Recommendations
- ✓ Future research should focus on developing Explainable AI techniques to improve the transparency and trustworthiness of mLLMs.
- ✓ Developers and deployers of mLLMs should carefully evaluate the limitations of these models in attribution and consider alternative approaches to ensure transparency and traceability in critical applications.