DQA: Diagnostic Question Answering for IT Support
arXiv:2604.05350v1 Abstract: Enterprise IT support interactions are fundamentally diagnostic: effective resolution requires iterative evidence gathering from ambiguous user reports to identify an underlying root cause. While retrieval-augmented generation (RAG) provides grounding through historical cases, standard multi-turn RAG systems lack explicit diagnostic state and therefore struggle to accumulate evidence and resolve competing hypotheses across turns. We introduce DQA, a diagnostic question-answering framework that maintains persistent diagnostic state and aggregates retrieved cases at the level of root causes rather than individual documents. DQA combines conversational query rewriting, retrieval aggregation, and state-conditioned response generation to support systematic troubleshooting under enterprise latency and context constraints. We evaluate DQA on 150 anonymized enterprise IT support scenarios using a replay-based protocol. Averaged over three independent runs, DQA achieves a 78.7% success rate under a trajectory-level success criterion, compared to 41.3% for a multi-turn RAG baseline, while reducing average turns from 8.4 to 3.9.
Executive Summary
The paper introduces DQA (Diagnostic Question Answering), a framework that improves enterprise IT support interactions by explicitly modeling diagnostic reasoning. Unlike standard multi-turn Retrieval-Augmented Generation (RAG) systems, DQA maintains a persistent diagnostic state and aggregates retrieved evidence at the root-cause level, enabling more efficient and accurate troubleshooting. Evaluated on 150 anonymized enterprise IT support scenarios, DQA achieved a 78.7% success rate, nearly double the 41.3% of a multi-turn RAG baseline, while cutting the average number of turns from 8.4 to 3.9. The framework combines conversational query rewriting, retrieval aggregation, and state-conditioned response generation to resolve ambiguous user reports systematically within enterprise latency and context constraints.
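The per-turn pipeline described above (query rewriting, retrieval, root-cause aggregation into state, state-conditioned response) can be sketched roughly as follows. This is a hypothetical illustration based only on the summary: the function names, the toy corpus, and the keyword-overlap retriever are stand-ins, not DQA's actual components.

```python
# Hypothetical sketch of one DQA-style support turn. All components are
# illustrative stand-ins; the paper's real system uses learned models.

def rewrite_query(user_msg: str, history: list[str]) -> str:
    # Stand-in: a real system would use an LLM to resolve references
    # ("it still fails") against the conversation history.
    return " ".join(history[-1:] + [user_msg])

def retrieve(query: str) -> list[dict]:
    # Stand-in corpus of historical cases, each tagged with a root cause.
    corpus = [
        {"text": "vpn drops after sleep", "root_cause": "vpn-client", "score": 0.9},
        {"text": "vpn auth fails", "root_cause": "expired-cert", "score": 0.7},
    ]
    return [c for c in corpus if any(w in c["text"] for w in query.split())]

def support_turn(user_msg: str, history: list[str], state: dict):
    query = rewrite_query(user_msg, history)
    # Fold retrieved cases into persistent state, keyed by root cause.
    for case in retrieve(query):
        cause = case["root_cause"]
        state[cause] = state.get(cause, 0.0) + case["score"]
    # State-conditioned response: probe the current best hypothesis.
    best = max(state, key=state.get) if state else None
    return f"Checking hypothesis: {best}", state
```

Because `state` persists across calls, evidence from earlier turns keeps shaping later responses, which is the core difference from a stateless RAG loop.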
Key Points
- ▸ DQA explicitly models diagnostic reasoning by maintaining a persistent diagnostic state, unlike standard multi-turn RAG systems.
- ▸ The framework aggregates retrieved cases at the root-cause level, enabling more targeted and efficient troubleshooting.
- ▸ DQA outperforms a multi-turn RAG baseline by 37.4 percentage points in success rate and reduces the average number of turns by 4.5, highlighting its efficiency in enterprise IT support scenarios.
Merits
Diagnostic State Modeling
DQA's explicit maintenance of diagnostic state allows it to systematically accumulate evidence and resolve competing hypotheses across conversational turns, addressing a critical gap in traditional RAG systems.
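One way to make "accumulate evidence and resolve competing hypotheses" concrete is a state object that sums evidence per root cause and only commits once the leading hypothesis clears the runner-up by a margin. The class, the margin rule, and the weights below are assumptions for illustration, not the paper's formulation.

```python
from typing import Optional

# Hypothetical diagnostic state: accumulates per-cause evidence across
# turns and resolves only when one hypothesis clearly leads.

class DiagnosticState:
    def __init__(self, margin: float = 1.0):
        self.evidence: dict = {}
        self.margin = margin  # required lead before committing to a cause

    def update(self, observations: dict) -> None:
        for cause, weight in observations.items():
            self.evidence[cause] = self.evidence.get(cause, 0.0) + weight

    def resolved(self) -> Optional[str]:
        """Return a root cause once it leads the runner-up by `margin`."""
        if not self.evidence:
            return None
        ranked = sorted(self.evidence.items(), key=lambda kv: -kv[1])
        if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= self.margin:
            return ranked[0][0]
        return None

state = DiagnosticState(margin=1.0)
state.update({"expired-cert": 0.6, "vpn-client": 0.5})  # turn 1: ambiguous
assert state.resolved() is None                          # keep asking questions
state.update({"expired-cert": 1.2})                      # turn 2: new evidence
assert state.resolved() == "expired-cert"                # hypothesis resolved
```

The unresolved case is what would drive the system to ask another diagnostic question rather than guess, which is exactly the behavior a stateless RAG loop cannot reproduce.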
Root-Cause Aggregation
By aggregating retrieved cases at the root-cause level rather than individual documents, DQA achieves more precise and coherent troubleshooting, improving both accuracy and efficiency.
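A minimal sketch of the aggregation step: instead of ranking individual documents, sum retrieval scores per root-cause label, so several weak matches for one cause can outrank a single strong match for another. Summation is an assumed aggregation rule here; the paper's exact scheme may differ.

```python
from collections import defaultdict

# Hypothetical root-cause aggregation over retrieved cases. Each case
# carries an illustrative retrieval score and a root-cause label.

def aggregate_by_root_cause(cases: list) -> dict:
    totals = defaultdict(float)
    for case in cases:
        totals[case["root_cause"]] += case["score"]
    # Return causes ranked by total accumulated score.
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))

retrieved = [
    {"id": "T1", "root_cause": "dns-misconfig", "score": 0.8},
    {"id": "T2", "root_cause": "expired-cert", "score": 0.5},
    {"id": "T3", "root_cause": "expired-cert", "score": 0.45},
]
ranked = aggregate_by_root_cause(retrieved)
# "expired-cert" totals 0.95 and now outranks the single 0.8 match,
# even though no individual expired-cert document is the top hit.
```

This is why root-cause aggregation is more than a reranking trick: it changes which hypothesis the system pursues next.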
Empirical Superiority
The framework's performance metrics—78.7% success rate and reduced average turns—demonstrate its practical superiority over multi-turn RAG baselines in enterprise IT support scenarios.
Scalability and Adaptability
DQA's design is tailored to enterprise latency and context constraints, making it well-suited for real-world deployment in large-scale IT support environments.
Demerits
Limited Generalizability
The evaluation is confined to anonymized enterprise IT support scenarios, leaving uncertainty about DQA's performance in other domains or less structured diagnostic contexts.
Dependency on Historical Data
DQA relies heavily on historical cases for retrieval, which may introduce biases or limitations if the underlying dataset is incomplete or unrepresentative of diverse root causes.
Complexity in Implementation
The framework's multi-component architecture—requiring conversational query rewriting, retrieval aggregation, and state-conditioned generation—may pose challenges in integration and maintenance for some enterprise systems.
Latency in Real-Time Applications
While DQA is designed for enterprise latency constraints, the cumulative overhead of state maintenance and aggregation may still introduce delays in highly time-sensitive IT support scenarios.
Expert Commentary
The Diagnostic Question Answering (DQA) framework is a notable advance in applying AI to enterprise IT support, addressing a longstanding challenge in multi-turn conversational systems: the systematic accumulation and evaluation of diagnostic evidence. By explicitly modeling diagnostic state and aggregating retrieved cases at the root-cause level, DQA overcomes a key limitation of standard RAG systems, which often struggle with ambiguity and competing hypotheses in user reports. The empirical results, particularly the 37.4 percentage point improvement over the baseline, are compelling and suggest DQA could become a reference design for IT support automation.

That said, the framework's reliance on historical data and its implementation complexity may pose barriers for some enterprises. Future work should examine how well DQA generalizes across domains and whether such systems are deployable in real-time, high-stakes environments where latency and interpretability are paramount. The paper also raises important questions about the balance between automation and human expertise, particularly where AI-driven diagnostics require human validation or intervention. Overall, DQA sets a useful benchmark for diagnostic AI systems, with implications for both industry and academia.
Recommendations
- ✓ Enterprises should pilot DQA in controlled IT support environments to validate its performance and feasibility before full-scale deployment, particularly in contexts with high diagnostic ambiguity.
- ✓ Further research should investigate the integration of DQA-like frameworks with human-in-the-loop systems to ensure accountability and mitigate risks in critical diagnostic scenarios.
- ✓ Policymakers and industry leaders should collaborate to develop standards for AI-driven diagnostic systems, focusing on transparency, data privacy, and the ethical use of historical case data.
- ✓ The authors should extend their evaluation to include diverse diagnostic domains (e.g., healthcare, legal) to assess the generalizability of DQA's core principles.
- ✓ Organizations adopting DQA should invest in robust data governance frameworks to ensure compliance with privacy regulations and to mitigate biases in historical case retrieval.
Sources
Original: arXiv - cs.CL