Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing

Anmol Gulati, Sahil Sen, Waqar Sarguroh, Kevin Paul

arXiv:2603.06503v1

Abstract: Recent advances in multimodal Retrieval-Augmented Generation (RAG) enable Large Language Models (LLMs) to analyze enterprise spreadsheet workbooks containing millions of cells, cross-sheet dependencies, and embedded visual artifacts. However, state-of-the-art approaches exclude critical context through single-pass retrieval, lose data resolution through compression, and exceed LLM context windows through naive full-context injection, preventing reliable multi-step reasoning over complex enterprise workbooks. We introduce Beyond Rows to Reasoning (BRTR), a multimodal agentic framework for spreadsheet understanding that replaces single-pass retrieval with an iterative tool-calling loop, supporting end-to-end Excel workflows from complex analysis to structured editing. Supported by over 200 hours of expert human evaluation, BRTR achieves state-of-the-art performance across three frontier spreadsheet understanding benchmarks, surpassing prior methods by 25 percentage points on FRTR-Bench, 7 points on SpreadsheetLLM, and 32 points on FINCH. We evaluate five multimodal embedding models, identifying NVIDIA NeMo Retriever 1B as the top performer for mixed tabular and visual data, and vary nine LLMs. Ablation experiments confirm that the planner, retrieval, and iterative reasoning each contribute substantially, and cost analysis shows GPT-5.2 achieves the best efficiency-accuracy trade-off. Throughout all evaluations, BRTR maintains full auditability through explicit tool-call traces.

Executive Summary

The article introduces Beyond Rows to Reasoning (BRTR), a multimodal agentic framework for spreadsheet understanding and editing. BRTR replaces single-pass retrieval with an iterative tool-calling loop and reports state-of-the-art results on three benchmarks, improving on prior methods by 25 percentage points on FRTR-Bench, 7 on SpreadsheetLLM, and 32 on FINCH. The framework supports end-to-end Excel workflows, from complex analysis to structured editing, and keeps every run auditable through explicit tool-call traces. The results are backed by over 200 hours of expert human evaluation, and ablation experiments indicate that the planner, retrieval, and iterative reasoning each contribute substantially.

Key Points

  • Introduction of BRTR, a multimodal agentic framework for spreadsheet understanding
  • Replacement of single-pass retrieval with an iterative tool-calling loop
  • State-of-the-art performance across three frontier spreadsheet understanding benchmarks
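The difference between single-pass retrieval and an iterative tool-calling loop can be illustrated with a minimal sketch. It is not BRTR's implementation: the toy `WORKBOOK`, the tools `read_cell` and `sum_cells`, and the `scripted_policy` that stands in for the LLM are all hypothetical names invented for this example. The point is only that each tool result feeds the choice of the next call, so a cross-sheet reference found in step one can be followed in step two.

```python
from dataclasses import dataclass
from typing import Any

# Toy workbook (illustrative data only): sheet -> {cell: content}.
WORKBOOK = {
    "Summary": {"B1": "=Sales!B4"},
    "Sales": {"B2": 120, "B3": 80, "B4": "=SUM(B2:B3)"},
}

def read_cell(sheet: str, cell: str) -> Any:
    """Hypothetical tool: return the raw content of one cell."""
    return WORKBOOK[sheet][cell]

def sum_cells(sheet: str, cells: list) -> float:
    """Hypothetical tool: add up numeric cells on one sheet."""
    return sum(WORKBOOK[sheet][c] for c in cells)

TOOLS = {"read_cell": read_cell, "sum_cells": sum_cells}

@dataclass
class Step:
    tool: str
    args: dict
    result: Any

def scripted_policy(trace):
    """Stand-in for the LLM: pick the next tool call from what was seen so far."""
    if len(trace) == 0:
        return ("read_cell", {"sheet": "Summary", "cell": "B1"})
    if len(trace) == 1:
        # The first result was a cross-sheet reference, so follow it.
        return ("read_cell", {"sheet": "Sales", "cell": "B4"})
    if len(trace) == 2:
        # The second result was a SUM formula, so evaluate its range.
        return ("sum_cells", {"sheet": "Sales", "cells": ["B2", "B3"]})
    return None  # enough evidence gathered, stop

def run_agent(policy, max_steps=5):
    """Call tools until the policy stops or the step budget runs out."""
    trace = []
    for _ in range(max_steps):
        decision = policy(trace)
        if decision is None:
            break
        tool, args = decision
        trace.append(Step(tool, args, TOOLS[tool](**args)))
    answer = trace[-1].result if trace else None
    return answer, trace

answer, trace = run_agent(scripted_policy)
for i, step in enumerate(trace, start=1):
    print(i, step.tool, step.args, "->", step.result)
```

A single-pass retriever that fetched only `Summary!B1` would return the formula text and stop; the loop keeps going until it reaches the underlying numbers. The `max_steps` budget is one simple way to bound the cost of iteration.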

Merits

Improved Performance

BRTR achieves state-of-the-art performance across three benchmarks, surpassing prior methods by 25 percentage points on FRTR-Bench, 7 points on SpreadsheetLLM, and 32 points on FINCH.

Full Auditability

BRTR maintains full auditability through explicit tool-call traces, so each answer or edit can be traced back to the sequence of tool calls that produced it.
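As a rough illustration of what an explicit tool-call trace makes possible, the sketch below records each call as one JSON line and then answers an auditor's question: which cells did the agent read? The paper's actual trace format is not described in the abstract, so the field names (`step`, `tool`, `args`, `result`) and the tool names `read_cell` and `write_cell` are assumptions made for this example.

```python
import json

def record_call(log, tool, args, result):
    """Append one tool call to the trace as a JSON line (assumed schema)."""
    entry = {"step": len(log) + 1, "tool": tool, "args": args, "result": result}
    log.append(json.dumps(entry, sort_keys=True))

def cells_read(log):
    """List every (sheet, cell) pair the agent read, in call order."""
    touched = []
    for line in log:
        entry = json.loads(line)
        if entry["tool"] == "read_cell":
            touched.append((entry["args"]["sheet"], entry["args"]["cell"]))
    return touched

# Hypothetical run: two reads followed by one edit.
audit_log = []
record_call(audit_log, "read_cell", {"sheet": "Summary", "cell": "B1"}, "=Sales!B4")
record_call(audit_log, "read_cell", {"sheet": "Sales", "cell": "B4"}, "=SUM(B2:B3)")
record_call(audit_log, "write_cell", {"sheet": "Summary", "cell": "C1", "value": 200}, "ok")

touched = cells_read(audit_log)
print(touched)
```

Storing one self-contained JSON object per call keeps the log append-only and easy to filter after the fact, which is the property an audit trail needs.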

Demerits

Complexity

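The iterative tool-calling loop makes several model and tool calls per query where single-pass retrieval makes one, which may increase latency and cost and could affect scalability. The abstract mentions a cost analysis, which identifies GPT-5.2 as the best efficiency-accuracy trade-off, but gives no overhead figures.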

Expert Commentary

BRTR marks a significant advance in multimodal spreadsheet understanding. By replacing single-pass retrieval with an iterative tool-calling loop, it reports higher benchmark accuracy while keeping each step inspectable through tool-call traces. The added complexity of the loop may, however, pose challenges for scalability and efficiency. Further research is needed to establish how BRTR behaves in real-world settings beyond the three benchmarks. Its implications for data governance and regulatory policy also warrant attention, particularly in industries that rely heavily on spreadsheet analysis.

Recommendations

  • Further evaluation of BRTR in various real-world scenarios to assess its practical applications and limitations
  • Investigation into the potential implications of BRTR on data governance and regulatory policies

Sources