THIVLVC: Retrieval Augmented Dependency Parsing for Latin
arXiv:2604.05564v1 Announce Type: new Abstract: We describe THIVLVC, a two-stage system for the EvaLatin 2026 Dependency Parsing task. Given a Latin sentence, we retrieve structurally similar entries from the CIRCSE treebank using sentence length and POS n-gram similarity, then prompt a large language model to refine the baseline parse from UDPipe using the retrieved examples and UD annotation guidelines. We submit two configurations: one without retrieval and one with retrieval (RAG). On poetry (Seneca), THIVLVC improves CLAS by +17 points over the UDPipe baseline; on prose (Thomas Aquinas), the gain is +1.5 CLAS. A double-blind error analysis of 300 divergences between our system and the gold standard reveals that, among unanimous annotator decisions, 53.3% favour THIVLVC, showing annotation inconsistencies both within and across treebanks.
Executive Summary
The paper introduces THIVLVC, a two-stage retrieval-augmented dependency parsing system for Latin, built for the EvaLatin 2026 Dependency Parsing task. The system first retrieves structurally similar Latin sentences from the CIRCSE treebank using sentence length and POS n-gram similarity, then feeds these retrieved examples to a large language model (LLM), which refines baseline parses from UDPipe under the guidance of the UD annotation guidelines. Results show substantial CLAS improvements, particularly for poetry (Seneca), with a +17 point gain over UDPipe, and a modest +1.5 point gain for prose (Thomas Aquinas). A double-blind error analysis of 300 divergences from the gold standard finds that, among unanimous annotator decisions, 53.3% favor THIVLVC, while also exposing annotation inconsistencies within and across treebanks. The study highlights the potential of retrieval-augmented methods for low-resource, morphologically complex languages such as Latin.
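The paper only names the two retrieval signals (sentence length and POS n-gram similarity), not how they are combined or scored. The following is a minimal sketch of one plausible realization, assuming Jaccard overlap over POS trigrams and an equal-weight combination with normalized length closeness; the weighting, n-gram order, and similarity measure are illustrative assumptions, not the paper's actual formula.

```python
from collections import Counter

def pos_ngrams(pos_tags, n=3):
    """Extract POS n-grams from a sequence of UPOS tags."""
    return Counter(tuple(pos_tags[i:i + n]) for i in range(len(pos_tags) - n + 1))

def similarity(query_pos, cand_pos, n=3, length_weight=0.5):
    """Combine length closeness with POS n-gram Jaccard overlap.
    The weighting is an illustrative assumption, not the paper's formula."""
    len_sim = 1.0 - abs(len(query_pos) - len(cand_pos)) / max(len(query_pos), len(cand_pos))
    q, c = pos_ngrams(query_pos, n), pos_ngrams(cand_pos, n)
    inter = sum((q & c).values())   # multiset intersection size
    union = sum((q | c).values())   # multiset union size
    ngram_sim = inter / union if union else 0.0
    return length_weight * len_sim + (1 - length_weight) * ngram_sim

def retrieve(query_pos, treebank, k=3):
    """Return the k treebank entries structurally most similar to the query."""
    ranked = sorted(treebank, key=lambda e: similarity(query_pos, e["pos"]), reverse=True)
    return ranked[:k]
```

Scoring on POS sequences rather than word forms is what makes the retrieval "structural": two sentences with disjoint vocabulary but parallel syntax can still rank as close neighbors.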
Key Points
- ▸ THIVLVC employs a two-stage retrieval-augmented parsing system combining structural retrieval from the CIRCSE treebank with LLM-based refinement guided by UD annotation guidelines.
- ▸ Performance gains are substantial for poetry (Seneca, +17 CLAS) but modest for prose (Thomas Aquinas, +1.5 CLAS), suggesting variability in effectiveness across different Latin text genres.
- ▸ In a double-blind error analysis of 300 divergences from the gold standard, over half of the unanimous annotator decisions (53.3%) favor THIVLVC, yet annotation inconsistencies within and across treebanks complicate evaluation.
- ▸ The system demonstrates the potential of retrieval-augmented generation (RAG) in enhancing dependency parsing for historically significant but low-resource languages.
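Since the headline numbers above are CLAS deltas, it may help to recall what the metric measures. A simplified sketch follows, computing labeled attachment F1 over content-word relations only, excluding the functional relations and punctuation per the CoNLL shared-task convention; the official definition additionally handles multiword tokens and language-specific relation subtypes, which are omitted here.

```python
# Functional relations excluded from CLAS, plus punctuation.
FUNCTIONAL = {"aux", "case", "cc", "clf", "cop", "det", "mark", "punct"}

def clas(gold, pred):
    """Simplified CLAS: labeled attachment F1 over content-word relations.
    gold/pred: per-token lists of (head_index, deprel)."""
    def content(rels):
        # Keep only relations whose universal deprel is a content relation.
        return {i: (h, d) for i, (h, d) in enumerate(rels)
                if d.split(":")[0] not in FUNCTIONAL}
    g, p = content(gold), content(pred)
    correct = sum(1 for i in g if i in p and p[i] == g[i])
    prec = correct / len(p) if p else 0.0
    rec = correct / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

Because function words are ignored, CLAS rewards getting the content-word skeleton of the tree right, which is where stylistically free word order, as in Senecan verse, hurts baseline parsers most.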
Merits
Innovative Methodology
The integration of retrieval-augmented generation (RAG) with dependency parsing for Latin represents a novel approach, leveraging structurally similar examples to refine baseline parses generated by established tools like UDPipe.
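The paper does not publish its prompt template, but the refinement stage it describes, conditioning an LLM on retrieved gold parses, the UDPipe baseline, and the annotation guidelines, can be sketched as a prompt builder. Every string below (the instruction wording, the section layout, the function name) is a hypothetical reconstruction, not THIVLVC's actual prompt.

```python
def build_prompt(sentence, baseline_conllu, examples, guidelines_excerpt):
    """Assemble an illustrative parse-refinement prompt.
    examples: list of (text, conllu) pairs retrieved from the treebank."""
    shots = "\n\n".join(f"Sentence: {t}\nGold parse:\n{c}" for t, c in examples)
    return (
        "You are an expert annotator of Latin Universal Dependencies.\n"
        f"Annotation guidelines:\n{guidelines_excerpt}\n\n"
        f"Structurally similar annotated sentences:\n{shots}\n\n"
        f"Sentence to parse: {sentence}\n"
        f"Baseline parse (UDPipe), possibly containing errors:\n{baseline_conllu}\n"
        "Return the corrected parse in CoNLL-U format."
    )
```

Framing the task as error correction of an existing parse, rather than parsing from scratch, lets the LLM keep the baseline's easy decisions and spend its capacity on the attachments the retrieved examples disambiguate.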
Significant Performance Gains
The system achieves notable improvements, particularly in poetry parsing, where a +17 CLAS gain over UDPipe underscores the efficacy of retrieval-augmented refinement in morphologically complex or stylistically distinct texts.
Rigorous Evaluation
The inclusion of a double-blind error analysis provides qualitative insights into the system's performance, revealing both strengths and limitations in handling divergences from the gold standard.
Demerits
Genre-Specific Limitations
The system's modest gain (+1.5 CLAS) for prose (Thomas Aquinas) compared to poetry suggests that retrieval-augmented methods may be less effective for certain Latin text genres, possibly due to stylistic or syntactic differences.
Annotation Inconsistencies
The error analysis highlights annotation inconsistencies within and across treebanks, which complicate the evaluation of parsing systems and may skew performance metrics.
Dependency on Treebank Size
The effectiveness of THIVLVC is contingent on the availability and quality of the CIRCSE treebank for retrieval, raising questions about scalability and applicability to other low-resource historical languages.
Expert Commentary
THIVLVC represents a significant advancement in the application of modern NLP techniques to historical linguistics, particularly for Latin, a language with rich morphological complexity and stylistic diversity. The two-stage retrieval-augmented approach effectively leverages structural similarity and LLM refinement to enhance dependency parsing, achieving impressive gains in poetry parsing. However, the modest improvements for prose highlight the need for genre-specific adaptations and further investigation into the factors influencing retrieval efficacy.

The error analysis, while illuminating, also underscores a critical challenge in historical NLP: inconsistency within and across treebanks. This issue is not merely an academic concern but a practical barrier to robust system evaluation and deployment. Future work should focus on improving treebank consistency and exploring hybrid models that combine retrieval-augmented techniques with traditional parsing methods to address genre-specific limitations. Additionally, the integration of LLMs into parsing workflows raises questions about scalability and computational feasibility, which warrant further exploration.
Recommendations
- ✓ Develop genre-specific retrieval strategies or fine-tune LLMs on domain-specific corpora to address the variability in performance across different Latin text genres.
- ✓ Conduct further research into the causes of annotation inconsistencies within and across treebanks, and establish standardized annotation guidelines to improve the reliability of parsing evaluations.
- ✓ Explore the integration of THIVLVC's methodology with other low-resource languages to assess its generalizability and scalability beyond Latin.
- ✓ Investigate the computational efficiency and resource requirements of retrieval-augmented parsing systems to ensure their practical deployment in resource-constrained environments.
- ✓ Collaborate with treebank curators to enhance the quality and consistency of annotations, thereby improving the foundation for NLP applications in historical linguistics.
Sources
Original: arXiv - cs.CL