DQA: Diagnostic Question Answering for IT Support
arXiv:2604.05350v1 Abstract: Enterprise IT support interactions are fundamentally diagnostic: effective resolution requires iterative evidence gathering from ambiguous user reports to identify an underlying root cause. While retrieval-augmented generation (RAG) provides grounding through historical cases, standard multi-turn RAG systems lack explicit diagnostic state and therefore struggle to accumulate evidence and resolve competing hypotheses across turns. We introduce DQA, a diagnostic question-answering framework that maintains persistent diagnostic state and aggregates retrieved cases at the level of root causes rather than individual documents. DQA combines conversational query rewriting, retrieval aggregation, and state-conditioned response generation to support systematic troubleshooting under enterprise latency and context constraints. We evaluate DQA on 150 anonymized enterprise IT support scenarios using a replay-based protocol. Averaged over three independent runs, DQA achieves a 78.7% success rate under a trajectory-level success criterion, compared to 41.3% for a multi-turn RAG baseline, while reducing average turns from 8.4 to 3.9.
Executive Summary
The paper introduces DQA (Diagnostic Question Answering), a framework that improves enterprise IT support interactions by explicitly modeling diagnostic reasoning. Unlike standard multi-turn Retrieval-Augmented Generation (RAG) systems, DQA maintains a persistent diagnostic state and aggregates retrieved evidence at the root-cause level, enabling more efficient and accurate troubleshooting. Evaluated on 150 anonymized enterprise IT support scenarios, DQA achieved a 78.7% success rate, nearly double the 41.3% of a multi-turn RAG baseline, while cutting the average number of turns from 8.4 to 3.9. The framework combines conversational query rewriting, retrieval aggregation, and state-conditioned response generation to resolve ambiguous user reports systematically within enterprise latency and context constraints.
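The per-turn pipeline described above (query rewriting, retrieval, root-cause aggregation into state, state-conditioned response) can be sketched roughly as follows. This is a hypothetical illustration based only on the summary: the function names, the toy corpus, and the keyword-overlap retriever are stand-ins, not DQA's actual components.

```python
# Hypothetical sketch of one DQA-style support turn. All components are
# illustrative stand-ins; the paper's real system uses learned models.

def rewrite_query(user_msg: str, history: list[str]) -> str:
    # Stand-in: a real system would use an LLM to resolve references
    # ("it still fails") against the conversation history.
    return " ".join(history[-1:] + [user_msg])

def retrieve(query: str) -> list[dict]:
    # Stand-in corpus of historical cases, each tagged with a root cause.
    corpus = [
        {"text": "vpn drops after sleep", "root_cause": "vpn-client", "score": 0.9},
        {"text": "vpn auth fails", "root_cause": "expired-cert", "score": 0.7},
    ]
    return [c for c in corpus if any(w in c["text"] for w in query.split())]

def support_turn(user_msg: str, history: list[str], state: dict):
    query = rewrite_query(user_msg, history)
    # Fold retrieved cases into persistent state, keyed by root cause.
    for case in retrieve(query):
        cause = case["root_cause"]
        state[cause] = state.get(cause, 0.0) + case["score"]
    # State-conditioned response: probe the current best hypothesis.
    best = max(state, key=state.get) if state else None
    return f"Checking hypothesis: {best}", state
```

Because `state` persists across calls, evidence from earlier turns keeps shaping later responses, which is the core difference from a stateless RAG loop.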
Key Points
- ▸ DQA explicitly models diagnostic reasoning by maintaining a persistent diagnostic state, unlike standard multi-turn RAG systems.
- ▸ The framework aggregates retrieved cases at the root-cause level, enabling more targeted and efficient troubleshooting.
- ▸ DQA outperforms a multi-turn RAG baseline by 37.4 percentage points in success rate and reduces the average number of turns by 4.5, highlighting its efficiency in enterprise IT support scenarios.
Merits
Diagnostic State Modeling
DQA's explicit maintenance of diagnostic state allows it to systematically accumulate evidence and resolve competing hypotheses across conversational turns, addressing a critical gap in traditional RAG systems.
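One way to make "accumulate evidence and resolve competing hypotheses" concrete is a state object that sums evidence per root cause and only commits once the leading hypothesis clears the runner-up by a margin. The class, the margin rule, and the weights below are assumptions for illustration, not the paper's formulation.

```python
from typing import Optional

# Hypothetical diagnostic state: accumulates per-cause evidence across
# turns and resolves only when one hypothesis clearly leads.

class DiagnosticState:
    def __init__(self, margin: float = 1.0):
        self.evidence: dict = {}
        self.margin = margin  # required lead before committing to a cause

    def update(self, observations: dict) -> None:
        for cause, weight in observations.items():
            self.evidence[cause] = self.evidence.get(cause, 0.0) + weight

    def resolved(self) -> Optional[str]:
        """Return a root cause once it leads the runner-up by `margin`."""
        if not self.evidence:
            return None
        ranked = sorted(self.evidence.items(), key=lambda kv: -kv[1])
        if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= self.margin:
            return ranked[0][0]
        return None

state = DiagnosticState(margin=1.0)
state.update({"expired-cert": 0.6, "vpn-client": 0.5})  # turn 1: ambiguous
assert state.resolved() is None                          # keep asking questions
state.update({"expired-cert": 1.2})                      # turn 2: new evidence
assert state.resolved() == "expired-cert"                # hypothesis resolved
```

The unresolved case is what would drive the system to ask another diagnostic question rather than guess, which is exactly the behavior a stateless RAG loop cannot reproduce.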
Root-Cause Aggregation
By aggregating retrieved cases at the root-cause level rather than individual documents, DQA achieves more precise and coherent troubleshooting, improving both accuracy and efficiency.
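A minimal sketch of the aggregation step: instead of ranking individual documents, sum retrieval scores per root-cause label, so several weak matches for one cause can outrank a single strong match for another. Summation is an assumed aggregation rule here; the paper's exact scheme may differ.

```python
from collections import defaultdict

# Hypothetical root-cause aggregation over retrieved cases. Each case
# carries an illustrative retrieval score and a root-cause label.

def aggregate_by_root_cause(cases: list) -> dict:
    totals = defaultdict(float)
    for case in cases:
        totals[case["root_cause"]] += case["score"]
    # Return causes ranked by total accumulated score.
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))

retrieved = [
    {"id": "T1", "root_cause": "dns-misconfig", "score": 0.8},
    {"id": "T2", "root_cause": "expired-cert", "score": 0.5},
    {"id": "T3", "root_cause": "expired-cert", "score": 0.45},
]
ranked = aggregate_by_root_cause(retrieved)
# "expired-cert" totals 0.95 and now outranks the single 0.8 match,
# even though no individual expired-cert document is the top hit.
```

This is why root-cause aggregation is more than a reranking trick: it changes which hypothesis the system pursues next.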
Empirical Superiority
The framework's performance metrics—78.7% success rate and reduced average turns—demonstrate its practical superiority over multi-turn RAG baselines in enterprise IT support scenarios.
Scalability and Adaptability
DQA's design is tailored to enterprise latency and context constraints, making it well-suited for real-world deployment in large-scale IT support environments.
Demerits
Limited Generalizability
The evaluation is confined to anonymized enterprise IT support scenarios, leaving uncertainty about DQA's performance in other domains or less structured diagnostic contexts.
Dependency on Historical Data
DQA relies heavily on historical cases for retrieval, which may introduce biases or limitations if the underlying dataset is incomplete or unrepresentative of diverse root causes.
Complexity in Implementation
The framework's multi-component architecture—requiring conversational query rewriting, retrieval aggregation, and state-conditioned generation—may pose challenges in integration and maintenance for some enterprise systems.
Latency in Real-Time Applications
While DQA is designed for enterprise latency constraints, the cumulative overhead of state maintenance and aggregation may still introduce delays in highly time-sensitive IT support scenarios.
Expert Commentary
The Diagnostic Question Answering (DQA) framework is a notable advance in applying AI to enterprise IT support, addressing a longstanding challenge in multi-turn conversational systems: the systematic accumulation and evaluation of diagnostic evidence. By explicitly modeling diagnostic state and aggregating retrieved cases at the root-cause level, DQA overcomes a key limitation of standard RAG systems, which often struggle with ambiguity and competing hypotheses in user reports. The empirical results, particularly the 37.4 percentage point improvement over the baseline, are compelling and suggest DQA could become a reference design for IT support automation.

That said, the framework's reliance on historical data and its implementation complexity may pose barriers for some enterprises. Future work should examine how well DQA generalizes across domains and whether such systems are deployable in real-time, high-stakes environments where latency and interpretability are paramount. The paper also raises important questions about the balance between automation and human expertise, particularly where AI-driven diagnostics require human validation or intervention. Overall, DQA sets a useful benchmark for diagnostic AI systems, with implications for both industry and academia.
Recommendations
- ✓ Enterprises should pilot DQA in controlled IT support environments to validate its performance and feasibility before full-scale deployment, particularly in contexts with high diagnostic ambiguity.
- ✓ Further research should investigate the integration of DQA-like frameworks with human-in-the-loop systems to ensure accountability and mitigate risks in critical diagnostic scenarios.
- ✓ Policymakers and industry leaders should collaborate to develop standards for AI-driven diagnostic systems, focusing on transparency, data privacy, and the ethical use of historical case data.
- ✓ The authors should extend their evaluation to include diverse diagnostic domains (e.g., healthcare, legal) to assess the generalizability of DQA's core principles.
- ✓ Organizations adopting DQA should invest in robust data governance frameworks to ensure compliance with privacy regulations and to mitigate biases in historical case retrieval.
Sources
Original: arXiv - cs.CL