Character-aware Transformers Learn an Irregular Morphological Pattern Yet None Generalize Like Humans
arXiv:2602.14100v1 Announce Type: new Abstract: Whether neural networks can serve as cognitive models of morphological learning remains an open question. Recent work has shown that encoder-decoder models can acquire irregular patterns, but evidence that they generalize these patterns like humans is mixed. We investigate this using the Spanish \emph{L-shaped morphome}, where only the first-person singular indicative (e.g., \textit{pongo} `I put') shares its stem with all subjunctive forms (e.g., \textit{ponga, pongas}) despite lacking apparent phonological, semantic, or syntactic motivation. We compare five encoder-decoder transformers varying along two dimensions: sequential vs. position-invariant positional encoding, and atomic vs. decomposed tag representations. Positional encoding proves decisive: position-invariant models recover the correct L-shaped paradigm clustering even when L-shaped verbs are scarce in training, whereas sequential positional encoding models only partially capture the pattern. Yet none of the models productively generalize this pattern to novel forms. Position-invariant models generalize the L-shaped stem across subjunctive cells but fail to extend it to the first-person singular indicative, producing a mood-based generalization rather than the L-shaped morphomic pattern. Humans do the opposite, generalizing preferentially to the first-person singular indicative over subjunctive forms. None of the models reproduce the human pattern, highlighting the gap between statistical pattern reproduction and morphological abstraction.
Executive Summary
The article investigates the capacity of neural networks, specifically transformer models, to learn and generalize irregular morphological patterns in language, using the Spanish L-shaped morphome as a case study. The study compares five transformer models with variations in positional encoding and tag representations. The findings indicate that position-invariant models can recover the L-shaped paradigm clustering but fail to generalize the pattern to novel forms as humans do. This highlights a significant gap between statistical pattern reproduction by models and the human ability to abstract morphological rules.
Key Points
- ▸ Transformer models with position-invariant encoding can recover the L-shaped paradigm clustering.
- ▸ None of the models generalize the L-shaped pattern to novel forms as humans do.
- ▸ Positional encoding is a decisive factor in the models' ability to capture morphological patterns.
Merits
Empirical Rigor
The study employs a rigorous experimental design, comparing multiple models with variations in key parameters, providing robust evidence for its conclusions.
Relevance to Cognitive Science
The research addresses a fundamental question in cognitive science and linguistics, contributing to the understanding of morphological learning and generalization.
Demerits
Limited Generalizability
The findings are based on a specific linguistic pattern (Spanish L-shaped morphome), which may not be generalizable to other morphological patterns or languages.
Model Limitations
The study does not explore alternative model architectures or training methods that might achieve human-like generalization, leaving open whether the observed gap is fundamental to this class of models or an artifact of the particular designs tested.
Expert Commentary
The article presents a well-designed and executed study that sheds light on the limitations of transformer models in capturing and generalizing complex morphological patterns. The focus on the Spanish L-shaped morphome is a clever choice: because the pattern lacks apparent phonological, semantic, or syntactic motivation, it allows a nuanced examination of whether models acquire the morphomic abstraction itself or merely its surface statistics. The finding that position-invariant models recover the L-shaped paradigm clustering yet fail to generalize it to novel forms as humans do is particularly insightful; models extend the irregular stem along mood lines (across subjunctive cells), whereas humans preferentially extend it to the first-person singular indicative. This dissociation underscores the gap between statistical pattern reproduction and the human ability to abstract and apply morphological rules. However, the study's conclusions are limited by its focus on a single linguistic pattern. Future research could test whether these findings hold for other morphomic patterns and languages, and investigate architectures or training regimes that might close the gap. The study's implications also extend beyond linguistics, contributing to the broader discourse on the capabilities and limitations of neural networks as cognitive models.
Recommendations
- ✓ Future research should explore the generalizability of these findings to other morphological patterns and languages to provide a more comprehensive understanding of the models' capabilities.
- ✓ Investigate potential model architectures or training methods that could achieve human-like generalization, bridging the gap between statistical pattern reproduction and morphological abstraction.