How Do Lexical Senses Correspond Between Spoken German and German Sign Language?
arXiv:2602.13790v1 Abstract: Sign language lexicographers construct bilingual dictionaries by establishing word-to-sign mappings, where polysemous and homonymous words corresponding to different signs across contexts are often underrepresented. A usage-based approach examining how word senses map to signs can identify such novel mappings absent from current dictionaries, enriching lexicographic resources. We address this by analyzing German and German Sign Language (Deutsche Gebärdensprache, DGS), manually annotating 1,404 word use-to-sign ID mappings derived from 32 words from the German Word Usage Graph (D-WUG) and 49 signs from the Digital Dictionary of German Sign Language (DW-DGS). We identify three correspondence types: Type 1 (one-to-many), Type 2 (many-to-one), and Type 3 (one-to-one), plus No Match cases. We evaluate computational methods: Exact Match (EM) and Semantic Similarity (SS) using SBERT embeddings. SS substantially outperforms EM overall (88.52% vs. 71.31%), with dramatic gains for Type 1 (+52.1 pp). Our work establishes the first annotated dataset for cross-modal sense correspondence and reveals which correspondence patterns are computationally identifiable. Our code and dataset are made publicly available.
Executive Summary
The article examines how lexical senses correspond between spoken German and German Sign Language (DGS), addressing the challenge that polysemy and homonymy pose for bilingual lexicography. By manually annotating 1,404 word use-to-sign mappings, the study identifies three correspondence types (one-to-many, many-to-one, and one-to-one) plus No Match cases, and evaluates two computational methods for detecting them. Semantic Similarity (SS) outperforms Exact Match (EM) overall (88.52% vs. 71.31%), with a +52.1 pp gain for one-to-many (Type 1) mappings, and the annotated dataset provides a foundation for future research in cross-modal sense correspondence.
Key Points
- ▸ The study examines the mapping of word senses between spoken German and German Sign Language (DGS).
- ▸ Three types of correspondence patterns (Type 1, Type 2, Type 3) and No Match cases are identified.
- ▸ Semantic Similarity (SS) using SBERT embeddings outperforms Exact Match (EM) overall (88.52% vs. 71.31%, a gain of 17.21 pp), with a much larger gain for Type 1 mappings (+52.1 pp).
- ▸ The research provides the first annotated dataset for cross-modal sense correspondence.
- ▸ The study offers insights into the computational identifiability of correspondence patterns.
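The three correspondence types and the No Match case can be illustrated with a minimal Python sketch. The abstract does not state the unit at which the types are assigned, so this sketch assumes they are assigned per word sense, and it assumes a precedence (one-to-many is checked before many-to-one) for senses that would fit both. The function name `classify_senses` and all sense and sign IDs are hypothetical and do not come from D-WUG or DW-DGS.

```python
from collections import defaultdict


def classify_senses(pairs):
    """Label each sense by how it corresponds to signs.

    pairs: iterable of (sense_id, sign_id) tuples; sign_id is None
    when an annotator found no fitting sign (assumed encoding).
    """
    signs_of = defaultdict(set)   # sense -> signs it maps to
    senses_of = defaultdict(set)  # sign  -> senses that map to it
    for sense, sign in pairs:
        signs_of[sense]  # register the sense even if it has no sign
        if sign is not None:
            signs_of[sense].add(sign)
            senses_of[sign].add(sense)

    labels = {}
    for sense, signs in signs_of.items():
        if not signs:
            labels[sense] = "No Match"
        elif len(signs) > 1:
            # Assumed precedence: several signs for one sense wins.
            labels[sense] = "Type 1 (one-to-many)"
        elif len(senses_of[next(iter(signs))]) > 1:
            labels[sense] = "Type 2 (many-to-one)"
        else:
            labels[sense] = "Type 3 (one-to-one)"
    return labels


# Hypothetical annotations for illustration only.
example_pairs = [
    ("sense_a", "SIGN_1"),
    ("sense_a", "SIGN_2"),
    ("sense_b", "SIGN_3"),
    ("sense_c", "SIGN_4"),
    ("sense_d", "SIGN_4"),
    ("sense_e", None),
]
example_labels = classify_senses(example_pairs)
```

Here `sense_a` has two signs, `sense_c` and `sense_d` share one sign, `sense_b` has a sign of its own, and `sense_e` has none, so the four labels each appear at least once.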
Merits
Innovative Approach
The usage-based approach to lexicography is innovative and addresses a significant gap in bilingual dictionaries, particularly for sign languages.
First Annotated Dataset
The creation of the first annotated dataset for cross-modal sense correspondence, with 1,404 word use-to-sign mappings covering 32 words and 49 signs, is a valuable contribution to the field.
Methodological Rigor
The study compares two computational methods, Exact Match and SBERT-based Semantic Similarity, against the manual annotations and reports accuracy both overall and for individual correspondence types.
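The difference between the two methods can be sketched in Python. The abstract does not say which text fields are compared or how a match is decided, so this sketch assumes each sign carries a short text description, that EM is normalised string equality, and that SS picks the candidate with the highest cosine similarity above a threshold. The threshold of 0.5 is an arbitrary assumption. `toy_embed` is a bag-of-words stand-in so the example runs without a model; it is not SBERT, and all texts and IDs are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for an SBERT encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exact_match(query, candidates):
    """Return IDs whose text equals the query after normalisation."""
    q = query.strip().lower()
    return [cid for cid, text in candidates.items()
            if text.strip().lower() == q]


def semantic_match(query, candidates, embed=toy_embed, threshold=0.5):
    """Return (id, score) for the most similar candidate, or None
    if no candidate reaches the (assumed) threshold."""
    q = embed(query)
    best_id, best_score = None, -1.0
    for cid, text in candidates.items():
        score = cosine(q, embed(text))
        if score > best_score:
            best_id, best_score = cid, score
    return (best_id, best_score) if best_score >= threshold else None


# Hypothetical sign descriptions for illustration only.
signs = {
    "SIGN_1": "financial institution",
    "SIGN_2": "edge of a river",
}
em_hits = exact_match("institution for financial services", signs)
ss_hit = semantic_match("institution for financial services", signs)
```

In this toy case EM finds nothing because the strings differ, while the similarity-based lookup still selects `SIGN_1`. A real sentence encoder would return dense vectors rather than word counts, so `cosine` would need adapting before swapping one in.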
Demerits
Limited Scope
The study covers only 32 words and 49 signs, a sample that may not capture the full complexity and diversity of lexical mappings between the two languages.
Manual Annotation
The reliance on manual annotation for creating the dataset is time-consuming and may introduce human bias.
Generalizability
The findings may not generalize to other pairs of spoken and signed languages, since the study covers only German and DGS.
Expert Commentary
The article presents a notable advance in lexicography for sign languages. The usage-based approach and the three correspondence types give concrete insight into how lexical senses map between a spoken and a signed language, and the comparison of Exact Match with Semantic Similarity shows which of those patterns current methods can detect. The release of the first annotated dataset for this task is commendable. However, the small sample of 32 words and 49 signs and the reliance on manual annotation are notable limitations. The findings have practical implications for improving bilingual dictionaries and may inform policy decisions on the inclusion and accessibility of sign languages. Future research should expand the dataset and test whether the findings generalize to other language pairs.
Recommendations
- ✓ Expand the dataset to include a broader range of words and signs to capture the full complexity of lexical mappings.
- ✓ Investigate automated methods for annotating lexical sense correspondences to reduce human bias and improve efficiency.