
A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification

arXiv:2603.09990v1. Abstract: In business-to-business relations, it is common to establish Non-Disclosure Agreements (NDAs). However, these documents exhibit significant variation in format, structure, and writing style, making manual analysis slow and error-prone. We propose an architecture based on LLMs to automate the segmentation and clause classification within these contracts. We employed two models: LLaMA-3.1-8B-Instruct for NDA segmentation (clause extraction) and a fine-tuned Legal-Roberta-Large for clause classification. In the segmentation task, we achieved a ROUGE F1 of 0.95 ± 0.0036; for classification, we obtained a weighted F1 of 0.85, demonstrating the feasibility and precision of the approach.

Ana Begnini, Matheus Vicente, Leonardo Souza · March 12, 2026

#cs.CL #cs.AI

Executive Summary

This article proposes a two-stage architecture to automate the analysis of Non-Disclosure Agreements (NDAs). In the first stage, a Large Language Model (LLaMA-3.1-8B-Instruct) segments each contract into clauses; in the second, a fine-tuned transformer (Legal-Roberta-Large) classifies those clauses. The authors report a ROUGE F1 score of 0.95 ± 0.0036 for segmentation and a weighted F1 score of 0.85 for classification, which supports the feasibility of the approach. Because NDAs vary widely in format, structure, and writing style, manual analysis is slow and error-prone, and automating it could reduce both review time and mistakes. Given how common NDAs are in business-to-business relations, the potential impact of this research is substantial.

Key Points

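▸ Proposed a two-stage architecture for NDA analysis: LLM-based segmentation followed by transformer-based clause classification
▸ Achieved a ROUGE F1 of 0.95 ± 0.0036 for segmentation and a weighted F1 of 0.85 for classification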
▸ Demonstrated the feasibility of automating contract analysis with LLMs
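
The two-stage flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `analyze_nda`, `toy_segment`, `toy_classify`, and the label names are all hypothetical, and the two toy functions merely stand in for the LLaMA-3.1-8B-Instruct segmenter and the fine-tuned Legal-Roberta-Large classifier so that the example runs without any model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClassifiedClause:
    text: str
    label: str


def analyze_nda(
    document: str,
    segment: Callable[[str], List[str]],
    classify: Callable[[str], str],
) -> List[ClassifiedClause]:
    """Stage 1 splits the contract into clauses; stage 2 labels each clause."""
    clauses = [c.strip() for c in segment(document) if c.strip()]
    return [ClassifiedClause(text=c, label=classify(c)) for c in clauses]


# Hypothetical stand-in for the LLM segmenter (assumption, not from the paper):
# split before numbered headings such as "1. " or "2. " at the start of a line.
def toy_segment(document: str) -> List[str]:
    return re.split(r"\n(?=\d+\.\s)", document)


# Hypothetical stand-in for the fine-tuned classifier; the label names are
# invented for illustration and are not the paper's label set.
def toy_classify(clause: str) -> str:
    lowered = clause.lower()
    if "confidential" in lowered:
        return "confidentiality"
    if "terminat" in lowered:
        return "termination"
    return "other"


nda = (
    "1. The parties shall keep all Confidential Information secret.\n"
    "2. This agreement terminates after two years."
)
results = analyze_nda(nda, toy_segment, toy_classify)
for item in results:
    print(item.label, "|", item.text)
```

Passing the two stages in as callables keeps them independent, so either stand-in could be swapped for a real model call without changing `analyze_nda`.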

Merits

Strength in LLM-based Approach

Pretrained language models bring broad knowledge of language to the task, which helps the system cope with the variation in format, structure, and writing style found across NDAs and supports accuracy in both segmentation and classification.

Precise Results in Segmentation and Classification

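The system achieved a ROUGE F1 of 0.95 for segmentation and a weighted F1 of 0.85 for classification, indicating a high degree of accuracy in both tasks. The abstract reports no direct comparison against manual review, so these scores show feasibility rather than a measured improvement over human analysts.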
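
To make the two reported metrics concrete, here is a small pure-Python sketch of how such scores are typically computed. It rests on assumptions: the abstract does not say which ROUGE variant was used, so unigram ROUGE-1 with whitespace tokenization is assumed here, and the function names, example clauses, and labels are hypothetical. Weighted F1 is the per-class F1 averaged with each class weighted by its number of true examples.

```python
from collections import Counter
from typing import Sequence


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference clause and an extracted clause."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's support."""
    if not y_true:
        return 0.0
    total = 0.0
    for label in set(y_true):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += (tp + fn) * f1  # support = number of true items of this class
    return total / len(y_true)


# Hypothetical data, for illustration only.
seg_score = rouge1_f1(
    "the receiving party shall not disclose confidential information",
    "the receiving party shall not disclose information",
)
clf_score = weighted_f1(
    ["conf", "conf", "conf", "term"],
    ["conf", "conf", "term", "term"],
)
print(round(seg_score, 3), round(clf_score, 3))
```

In the classification example, the "conf" class has F1 0.8 with support 3 and the "term" class has F1 2/3 with support 1, so the weighted score leans toward the more frequent class.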

Demerits

Data Quality and Availability

The success of the system relies heavily on the quality and availability of training data, which may be a challenge in certain industries or jurisdictions where NDAs are not as prevalent.

Interpretability and Transparency

The use of LLMs can make it difficult to understand the reasoning behind the system's decisions, which may be a concern in high-stakes applications where transparency is crucial.

Expert Commentary

This research is a promising application of AI to contract analysis. Using an LLM to segment NDAs and a fine-tuned transformer to classify their clauses could substantially reduce the manual effort of contract review. However, as with any AI application, the limitations deserve careful attention, including data quality, interpretability, and transparency. Ongoing research is needed to address these challenges and ensure that these technologies are used responsibly and effectively.

Recommendations

✓ Further research is needed to develop and refine the system, including the integration of multiple LLMs and the development of more advanced techniques for interpretability and transparency.
✓ The findings of this research should be shared with the broader legal and business communities to raise awareness of the potential benefits and challenges of AI in contract analysis.

Sources

arXiv - cs.AI

A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification


