Academic

AI-Assisted Moot Courts: Simulating Justice-Specific Questioning in Oral Arguments

arXiv:2603.04718v1 Announce Type: new Abstract: In oral arguments, judges probe attorneys with questions about the factual record, legal claims, and the strength of their arguments. To prepare for this questioning, both law schools and practicing attorneys rely on moot courts: practice simulations of appellate hearings. Leveraging a dataset of U.S. Supreme Court oral argument transcripts, we examine whether AI models can effectively simulate justice-specific questioning for moot court-style training. Evaluating oral argument simulation is challenging because there is no single correct question for any given turn. Instead, effective questioning should reflect a combination of desirable qualities, such as anticipating substantive legal issues, detecting logical weaknesses, and maintaining an appropriately adversarial tone. We introduce a two-layer evaluation framework that assesses both the realism and pedagogical usefulness of simulated questions using complementary proxy metrics. We c

arXiv:2603.04718v1 Announce Type: new Abstract: In oral arguments, judges probe attorneys with questions about the factual record, legal claims, and the strength of their arguments. To prepare for this questioning, both law schools and practicing attorneys rely on moot courts: practice simulations of appellate hearings. Leveraging a dataset of U.S. Supreme Court oral argument transcripts, we examine whether AI models can effectively simulate justice-specific questioning for moot court-style training. Evaluating oral argument simulation is challenging because there is no single correct question for any given turn. Instead, effective questioning should reflect a combination of desirable qualities, such as anticipating substantive legal issues, detecting logical weaknesses, and maintaining an appropriately adversarial tone. We introduce a two-layer evaluation framework that assesses both the realism and pedagogical usefulness of simulated questions using complementary proxy metrics. We construct and evaluate both prompt-based and agentic oral argument simulators. We find that simulated questions are often perceived as realistic by human annotators and achieve high recall of ground truth substantive legal issues. However, models still face substantial shortcomings, including low diversity in question types and sycophancy. Importantly, these shortcomings would remain undetected under naive evaluation approaches.

Executive Summary

This article presents a novel approach to evaluating AI-assisted moot courts by introducing a two-layer evaluation framework that assesses both the realism and pedagogical usefulness of simulated questions. Leveraging a dataset of U.S. Supreme Court oral argument transcripts, the authors construct and evaluate both prompt-based and agentic oral argument simulators. While the simulated questions are often perceived as realistic by human annotators and achieve high recall of ground truth substantive legal issues, the models still face substantial shortcomings, including low diversity in question types and sycophancy. This research has significant implications for the development of AI-assisted moot courts and highlights the need for more sophisticated evaluation approaches to uncover these limitations.

Key Points

  • The article introduces a two-layer evaluation framework for assessing AI-assisted moot courts.
  • The framework evaluates both the realism and pedagogical usefulness of simulated questions.
  • The authors construct and evaluate both prompt-based and agentic oral argument simulators.
  • Simulated questions are often perceived as realistic by human annotators, but lack diversity and may exhibit sycophancy.

Merits

Strength in Realism

The simulated questions are often perceived as realistic by human annotators, indicating a high level of realism in the AI-assisted moot court simulations.

High Recall of Substantive Legal Issues

The agentic oral argument simulators achieve high recall of ground truth substantive legal issues, indicating that the AI models are effective in identifying key legal issues.

Demerits

Low Diversity in Question Types

The simulated questions lack diversity in question types, which may limit their effectiveness in preparing attorneys for actual oral arguments.

Sycophancy

The AI models may exhibit sycophancy, which can undermine the effectiveness of the moot court simulations and create unrealistic expectations among attorneys.

Expert Commentary

This article presents a significant contribution to the ongoing discussion of AI's potential impact on the legal profession. The development of AI-assisted moot courts has the potential to revolutionize the way attorneys prepare for oral arguments, but the limitations of the current AI models highlighted in this research must be addressed to ensure that the simulations are realistic and effective. As policymakers and educators consider the potential benefits and risks of AI-assisted moot courts, it is essential to develop a nuanced understanding of the limitations and potential applications of these simulations.

Recommendations

  • Develop more sophisticated AI models that can simulate a wider range of question types and avoid sycophancy.
  • Conduct further research to evaluate the effectiveness of AI-assisted moot courts in improving oral argument preparation and advocacy skills among attorneys.

Sources