
DataSTORM: Deep Research on Large-Scale Databases using Exploratory Data Analysis and Data Storytelling

Shicheng Liu, Yucheng Jiang, Sajid Farook, Camila Nicollier Sanchez, David Fernando Castro Pena, Monica S. Lam · April 9, 2026

#cs.CL

arXiv:2604.06474v1

Abstract

Deep research with Large Language Model (LLM) agents is emerging as a powerful paradigm for multi-step information discovery, synthesis, and analysis. However, existing approaches primarily focus on unstructured web data, while the challenges of conducting deep research over large-scale structured databases remain relatively underexplored. Unlike web-based research, effective data-centric research requires more than retrieval and summarization and demands iterative hypothesis generation, quantitative reasoning over structured schemas, and convergence toward a coherent analytical narrative. In this paper, we present DataSTORM, an LLM-based agentic system capable of autonomously conducting research across both large-scale structured databases and internet sources. Grounded in principles from Exploratory Data Analysis and Data Storytelling, DataSTORM reframes deep research over structured data as a thesis-driven analytical process: discovering candidate theses from data, validating them through iterative cross-source investigation, and developing them into coherent analytical narratives. We evaluate DataSTORM on InsightBench, where it achieves a new state-of-the-art result with a 19.4% relative improvement in insight-level recall and 7.2% in summary-level score. We further introduce a new dataset built on ACLED, a real-world complex database, and demonstrate that DataSTORM outperforms proprietary systems such as ChatGPT Deep Research across both automated metrics and human evaluations.
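The three stages the abstract names (discover candidate theses, validate them, develop them into a narrative) can be pictured as a simple loop. The sketch below is a hypothetical illustration only, not DataSTORM's implementation: the `Thesis` class, the `discover`, `validate`, and `narrate` functions, and the toy table are all assumptions, and a rule-based check stands in for the LLM calls and cross-source investigation the abstract describes.

```python
from dataclasses import dataclass, field

# Hypothetical toy table of (region, year, event_count); not real ACLED data.
ROWS = [
    ("North", 2022, 120), ("North", 2023, 180),
    ("South", 2022, 90), ("South", 2023, 85),
]

@dataclass
class Thesis:
    statement: str
    region: str
    evidence: list = field(default_factory=list)
    validated: bool = False

def discover(rows):
    """Stage 1 (hypothetical): propose one candidate thesis per region."""
    regions = sorted({region for region, _, _ in rows})
    return [Thesis(f"Events in {reg} rose from 2022 to 2023", reg) for reg in regions]

def validate(thesis, rows):
    """Stage 2 (hypothetical): test the claim against the data. An LLM agent
    would instead issue queries and consult external sources."""
    counts = {year: n for region, year, n in rows if region == thesis.region}
    thesis.evidence.append(counts)
    thesis.validated = counts.get(2023, 0) > counts.get(2022, 0)
    return thesis

def narrate(theses):
    """Stage 3 (hypothetical): keep validated theses and join them into a summary."""
    kept = [t.statement for t in theses if t.validated]
    return " ".join(s + "." for s in kept) if kept else "No candidate thesis survived validation."

candidates = discover(ROWS)
checked = [validate(t, ROWS) for t in candidates]
print(narrate(checked))  # → Events in North rose from 2022 to 2023.
```

The point of the structure is that discovery may over-generate: the South thesis is proposed but dropped because the data contradicts it, so only supported claims reach the narrative.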

Executive Summary

DataSTORM is a significant advance in autonomous deep research over large-scale structured databases, combining LLM agents with principles from Exploratory Data Analysis (EDA) and Data Storytelling. It targets a gap in prior work, which has concentrated on unstructured web data, and treats data-centric research as a thesis-driven process of iterative hypothesis generation, quantitative reasoning, and narrative construction. The authors report a new state-of-the-art result on the InsightBench benchmark and, on a new dataset built on the real-world ACLED database, better results than proprietary systems such as ChatGPT Deep Research in both automated metrics and human evaluations. The work is a step toward more capable autonomous analysis of complex data, with potential applications across many data-intensive fields.

Key Points

▸ DataSTORM is an LLM-based agentic system for autonomous deep research across structured databases and internet sources.
▸ It targets deep research over large-scale structured data, an underexplored problem that differs from web-based information retrieval.
▸ The system is grounded in Exploratory Data Analysis (EDA) and Data Storytelling, framing research as a thesis-driven analytical process.
▸ Its workflow discovers candidate theses from data, validates them through iterative cross-source investigation, and develops them into coherent analytical narratives.
▸ On InsightBench it sets a new state of the art, with a 19.4% relative improvement in insight-level recall and 7.2% in summary-level score; on a new dataset built on ACLED it outperforms proprietary systems such as ChatGPT Deep Research in both automated metrics and human evaluations.
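The InsightBench gains are reported as relative improvements (19.4% in insight-level recall, 7.2% in summary-level score), which are not the same as absolute differences. The helper functions and scores below are hypothetical and chosen only to show the arithmetic; they are not figures from the paper.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline` as a fraction (0.194 means 19.4%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative comparison")
    return (new - baseline) / baseline

def absolute_improvement(new: float, baseline: float) -> float:
    """Absolute gain in the metric's own units (percentage points for percentages)."""
    return new - baseline

# Hypothetical recall scores, NOT taken from the paper.
baseline_recall, new_recall = 0.40, 0.48
print(f"relative: {relative_improvement(new_recall, baseline_recall):.1%}")  # → relative: 20.0%
print(f"absolute: {absolute_improvement(new_recall, baseline_recall):.2f}")  # → absolute: 0.08
```

Because the gains are relative, the absolute change in recall depends on the baseline score, which this summary does not state.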

Merits

Addresses a Critical Gap

Successfully tackles the relatively underexplored challenge of deep research on structured databases, a domain distinct from unstructured web data research.

Robust Methodological Foundation

Integrates established principles of Exploratory Data Analysis and Data Storytelling, providing a structured, thesis-driven analytical framework for LLM agents.
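A common EDA move that can surface candidate theses is to aggregate by group and flag groups that deviate strongly from the overall pattern. The sketch below does this with Python's built-in `sqlite3` on an in-memory table; the `events` table, its columns, and the 1.5× threshold are hypothetical, and the heuristic is an illustration of EDA in general, not a description of how DataSTORM generates theses.

```python
import sqlite3

# Hypothetical schema and rows; a real run would target an existing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, fatalities INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("A", 2), ("A", 3), ("B", 1), ("B", 2), ("C", 20), ("C", 25)],
)

def candidate_theses(conn, ratio=1.5):
    """Flag groups whose mean exceeds `ratio` times the overall mean (illustrative heuristic)."""
    (overall,) = conn.execute("SELECT AVG(fatalities) FROM events").fetchone()
    rows = conn.execute(
        "SELECT country, AVG(fatalities) FROM events GROUP BY country ORDER BY country"
    ).fetchall()
    return [
        f"{country}: mean fatalities {avg:.1f} vs overall {overall:.1f}"
        for country, avg in rows
        if avg > ratio * overall
    ]

for thesis in candidate_theses(conn):
    print(thesis)  # → C: mean fatalities 22.5 vs overall 8.8
```

A flagged group is only a candidate: with two rows per group, the deviation could easily be noise, which is why a validation stage has to follow.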

Demonstrated Superior Performance

Achieves state-of-the-art results on the InsightBench benchmark and outperforms proprietary systems such as ChatGPT Deep Research on a new dataset built on the real-world ACLED database, suggesting practical utility.
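Insight-level recall is, roughly, the fraction of reference insights that a system's output recovers. The source does not describe how InsightBench matches a generated insight to a reference one, so the sketch below substitutes a crude word-overlap (Jaccard) threshold; the function names, the 0.5 threshold, and the example insights are all hypothetical.

```python
def tokens(text: str) -> set[str]:
    """Lower-cased word set with trailing punctuation removed (deliberately crude)."""
    return {w.strip(".,;:").lower() for w in text.split() if w.strip(".,;:")}

def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two sentences, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def insight_recall(generated: list[str], ground_truth: list[str], threshold: float = 0.5) -> float:
    """Fraction of ground-truth insights matched by at least one generated insight."""
    if not ground_truth:
        return 0.0
    hits = sum(any(jaccard(g, t) >= threshold for g in generated) for t in ground_truth)
    return hits / len(ground_truth)

# Hypothetical insights for illustration only.
ground_truth = ["Sales rose in Q3", "Churn is highest in region West"]
generated = ["Sales rose sharply in Q3", "Support tickets doubled"]
print(insight_recall(generated, ground_truth))  # → 0.5
```

Real benchmarks generally use semantic matching rather than word overlap, so this stand-in would undercount paraphrased insights.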

Enhances Analytical Coherence

Focuses on developing 'coherent analytical narratives,' moving beyond mere retrieval and summarization to generate meaningful, synthesized insights.

Demerits

Complexity of Real-World Data

While the ACLED-based dataset is a welcome step toward realism, the noise, schema complexity, and semantic ambiguity of many enterprise-scale databases may still pose challenges that it does not fully capture.
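Noise and ambiguity of this kind are often surfaced by a schema-agnostic profiling pass before any hypothesis is formed. The sketch below computes per-column null rates and distinct counts with Python's `sqlite3`; the `profile` function and the example table are hypothetical, and nothing in the source says DataSTORM performs this step.

```python
import sqlite3

def profile(conn: sqlite3.Connection, table: str) -> dict[str, dict[str, float]]:
    """Per-column null rate and distinct count for one table.

    `table` is interpolated into SQL, so it must come from a trusted source
    (e.g. sqlite_master), never from user input.
    """
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    (total,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    report = {}
    for col in columns:
        nulls, distinct = conn.execute(
            f'SELECT SUM("{col}" IS NULL), COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        report[col] = {"null_rate": (nulls or 0) / total if total else 0.0, "distinct": distinct}
    return report

# Hypothetical data with a missing value and an inconsistently cased actor name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (actor TEXT, fatalities INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("Group X", 3), ("group x", None), (None, 5), ("Group Y", 0)],
)
for column, stats in profile(conn, "events").items():
    print(column, stats)
# → actor {'null_rate': 0.25, 'distinct': 3}
# → fatalities {'null_rate': 0.25, 'distinct': 3}
```

Note that "Group X" and "group x" count as two distinct actors: exactly the kind of entity ambiguity that silently distorts group-by aggregates.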

Interpretability and Trustworthiness

As an agentic LLM system, the 'black box' nature of its reasoning and hypothesis generation process may hinder interpretability and trust, especially in high-stakes analytical contexts.
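One widely used way to make agentic analyses more auditable is to attach provenance to every claim, so a reviewer can re-run the exact query behind each number. The `Claim` class and `supported_claim` helper below are hypothetical illustrations of that idea, not a DataSTORM feature described in the source.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    """A narrative statement plus the exact query and result that support it."""
    text: str
    sql: str
    result: tuple

def supported_claim(conn: sqlite3.Connection, template: str, sql: str) -> Claim:
    """Run `sql`, format its single-row result into `template`, and keep both for audit."""
    row = conn.execute(sql).fetchone()
    return Claim(text=template.format(*row), sql=sql, result=tuple(row))

# Hypothetical table and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (year INTEGER, fatalities INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(2022, 10), (2023, 14), (2023, 6)])

claim = supported_claim(
    conn,
    "Recorded fatalities in 2023 totalled {}.",
    "SELECT SUM(fatalities) FROM events WHERE year = 2023",
)
print(claim.text)  # → Recorded fatalities in 2023 totalled 20.
print(claim.sql)   # a reviewer can re-run this query to check the number
```

Provenance of this kind does not explain why the agent chose a hypothesis, but it does make every quantitative statement in a narrative independently checkable.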

Generalizability Across Domains

Its effectiveness on highly diverse database schemas, data types, and domain-specific reasoning requirements beyond the tested datasets has yet to be thoroughly validated.

Resource Intensity

Deep research involving iterative LLM interactions and cross-source validation is likely computationally intensive, posing scalability challenges for very large or time-sensitive applications.
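The source gives no cost figures for DataSTORM, but a common mitigation for open-ended agent loops is a hard budget on model calls that returns partial results when exhausted. In the sketch below, `CallBudget`, `BudgetExceeded`, `investigate`, and `fake_model_call` are all hypothetical; the last one stands in for a real LLM call.

```python
from dataclasses import dataclass

class BudgetExceeded(RuntimeError):
    """Raised when an investigation tries to exceed its call allowance."""

@dataclass
class CallBudget:
    """Caps the number of (hypothetical) model calls an investigation may make."""
    max_calls: int
    used: int = 0

    def spend(self) -> None:
        if self.used >= self.max_calls:
            raise BudgetExceeded(f"limit of {self.max_calls} calls reached")
        self.used += 1

def fake_model_call(question: str) -> str:
    """Stand-in for an LLM call; returns a canned answer."""
    return f"answer to: {question}"

def investigate(questions: list[str], budget: CallBudget) -> list[str]:
    """Answer questions in order until the budget runs out, then return partial results."""
    answers = []
    for q in questions:
        try:
            budget.spend()
        except BudgetExceeded:
            break
        answers.append(fake_model_call(q))
    return answers

print(investigate(["q1", "q2", "q3"], CallBudget(max_calls=2)))
# → ['answer to: q1', 'answer to: q2']
```

A budget trades completeness for predictability; in time-sensitive settings, ordering questions by expected value before spending would matter as much as the cap itself.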

Expert Commentary

DataSTORM represents a significant conceptual and technical step in applying LLM agents to structured data analysis. Reframing deep research as a 'thesis-driven analytical process' is particularly insightful: it moves beyond information retrieval toward a hypothesis-testing paradigm that mirrors the rigorous demands of scholarly inquiry and expert legal analysis. Grounding the agent in EDA and Data Storytelling gives LLMs much-needed methodological scaffolding for producing coherent, actionable insights. The benchmark results are impressive, but the central innovation lies in an architecture that supports iterative reasoning and cross-source validation. Future work must rigorously test generalizability across heterogeneous enterprise data and, crucially, develop mechanisms that make the generated narratives interpretable and trustworthy, especially where they support causal inferences or policy recommendations. The open challenge is to close the gap between automated insight and conclusions that human experts have validated.
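Cross-source validation, at its simplest, means checking a database-derived figure against an independently reported one. The source does not describe DataSTORM's exact procedure, so the sketch below is a hypothetical illustration: `cross_check`, the 10% tolerance, and the example figures are assumptions. It uses the standard library's `math.isclose`, whose `rel_tol` is measured against the larger of the two magnitudes.

```python
import math
from typing import Optional

def cross_check(db_value: float, external_value: Optional[float], rel_tol: float = 0.1) -> str:
    """Classify a database figure against an external (e.g. web-reported) figure.

    Returns "unverified" when no external figure exists, "corroborated" when the
    two agree within `rel_tol` (relative to the larger magnitude), and
    "conflicting" otherwise.
    """
    if external_value is None:
        return "unverified"
    if math.isclose(db_value, external_value, rel_tol=rel_tol):
        return "corroborated"
    return "conflicting"

# Hypothetical figures for one thesis, e.g. an event count for one region and year.
print(cross_check(1180, 1250))  # → corroborated
print(cross_check(1180, 2000))  # → conflicting
print(cross_check(1180, None))  # → unverified
```

A "conflicting" result need not mean the database is wrong, since sources often count differently; it marks a point the narrative should flag rather than paper over.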

Recommendations

✓ Develop advanced explainability features to provide transparent insights into the LLM agent's reasoning, hypothesis generation, and data synthesis process, enhancing user trust and auditability.
✓ Conduct extensive validation across a wider variety of real-world, highly complex, and domain-diverse structured databases to assess generalizability and identify limitations.
✓ Investigate methods for human-in-the-loop interaction, allowing domain experts to guide, refine, or challenge the hypotheses and narratives DataSTORM generates, so that automated analysis is combined with expert judgment.
✓ Explore mechanisms for integrating uncertainty quantification into the generated narratives, signaling where data is sparse or conclusions are probabilistic, in line with responsible analytical practice.
✓ Improve computational efficiency and scalability for very large datasets and complex multi-step investigations to ensure practical deployability in resource-constrained environments.
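For the uncertainty-quantification recommendation, the percentile bootstrap is one standard, assumption-light way to attach an interval to a statistic reported in a narrative. The `bootstrap_ci` function and the weekly counts below are hypothetical illustrations; the source does not say DataSTORM uses this technique.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap confidence interval for `stat`.

    Resamples `values` with replacement `n_resamples` times and takes the
    alpha/2 and 1 - alpha/2 quantiles of the resampled statistic.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    estimates = sorted(stat(rng.choices(values, k=len(values))) for _ in range(n_resamples))
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return stat(values), (lo, hi)

# Hypothetical weekly event counts for one region.
weekly_counts = [12, 15, 9, 20, 14, 11, 30, 13]
est, (lo, hi) = bootstrap_ci(weekly_counts)
print(f"mean {est:.1f} events/week, 95% interval {lo:.1f} to {hi:.1f}")
```

A narrative could then report the estimate together with its interval and flag wide intervals as signs of sparse data, rather than stating a single number with unwarranted precision.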

Sources

Original: arXiv - cs.CL

DataSTORM: Deep Research on Large-Scale Databases using Exploratory Data Analysis and Data Storytelling

Related Articles

ConstitutionGPT: An AI-Powered Multilingual Legal Assistance System for Indian Citizens

AI Copyright Infringement: Navigating the Legal Risks of AI-Generated Content

The Rhetoric of Machine Learning

Busemann energy-based attention for emotion analysis in Poincaré discs

JCG, PC

HSOLLC Co., Ltd.