Multilingual Language Models Encode Script Over Linguistic Structure
arXiv:2604.05090v1 Announce Type: new Abstract: Multilingual language models (LMs) organize representations for typologically and orthographically diverse languages into a shared parameter space, yet the nature of this internal organization remains elusive. In this work, we investigate which linguistic properties - abstract language identity or surface-form cues - shape multilingual representations. Focusing on compact, distilled models where representational trade-offs are explicit, we analyze language-associated units in Llama-3.2-1B and Gemma-2-2B using the Language Activation Probability Entropy (LAPE) metric, and further decompose activations with Sparse Autoencoders. We find that these units are strongly conditioned on orthography: romanization induces near-disjoint representations that align with neither native-script inputs nor English, while word-order shuffling has limited effect on unit identity. Probing shows that typological structure becomes increasingly accessible in deeper layers, while causal interventions indicate that generation is most sensitive to units that are invariant to surface-form perturbations rather than to units identified by typological alignment alone. Overall, our results suggest that multilingual LMs organize representations around surface form, with linguistic abstraction emerging gradually without collapsing into a unified interlingua.
Executive Summary
This study investigates how multilingual language models (LMs) organize internal representations for diverse languages, challenging the assumption that abstract linguistic structure (e.g., syntax or typology) dominates. The authors demonstrate that surface-form cues, particularly orthography, play a pivotal role in shaping representations, even when models are trained on many languages. Using compact models (Llama-3.2-1B and Gemma-2-2B), they apply the Language Activation Probability Entropy (LAPE) metric to identify language-associated units and Sparse Autoencoders to decompose activations. Findings reveal that romanization disrupts representational alignment, while word-order shuffling minimally impacts unit identity. Typological structure becomes increasingly accessible in deeper layers, but generation sensitivity favors units invariant to surface perturbations. The work suggests that multilingual LMs prioritize script-based organization, with linguistic abstraction developing gradually rather than coalescing into a unified interlingua.
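To make the central tool concrete, here is a minimal sketch of the idea behind LAPE: each hidden unit's per-language activation probabilities are normalized and scored by entropy, so low-entropy units fire selectively for few languages. The estimation details follow the original metric's definition, and the probability values below are invented for illustration.

```python
import numpy as np

def lape(activation_probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """activation_probs: (num_units, num_languages) array, where entry
    (i, l) estimates the probability that unit i activates on text in
    language l. Returns one entropy score per unit; LOW entropy means
    the unit is selective for a small set of languages."""
    p = activation_probs / (activation_probs.sum(axis=1, keepdims=True) + eps)
    return -(p * np.log(p + eps)).sum(axis=1)

probs = np.array([
    [0.90, 0.02, 0.03],   # hypothetical unit selective for language 0
    [0.33, 0.34, 0.33],   # hypothetical language-agnostic unit
])
scores = lape(probs)
assert scores[0] < scores[1]  # the selective unit has lower entropy
```

Thresholding these scores (and requiring a minimum activation probability) yields the "language-associated units" that the rest of the analysis manipulates.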
Key Points
- ▸ Multilingual LMs rely heavily on orthographic (script-based) cues for internal representation, overshadowing abstract linguistic structures like syntax or typology.
- ▸ Romanization (transliteration into Latin script) induces near-disjoint representations that align with neither native-script inputs nor English, underscoring how strongly representations are conditioned on surface form.
- ▸ Typological structure becomes more accessible in deeper layers, yet causal interventions show generation is most sensitive to units invariant to surface perturbations, not typological alignment.
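The "near-disjoint" claim in the second point can be quantified as set overlap between the language-associated units recovered under each input condition. The sketch below uses Jaccard similarity with invented unit indices; the paper's actual unit sets and overlap statistic may differ.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two unit sets: 1.0 = identical, 0.0 = disjoint."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical language-associated unit indices for one language:
native_units    = {3, 17, 42, 58}     # native-script inputs
romanized_units = {5, 17, 91, 104}    # same text, transliterated
english_units   = {5, 8, 23, 91}      # English inputs

# Near-disjoint: romanized text shares little with either condition
print(jaccard(native_units, romanized_units))  # 1/7 ≈ 0.143
print(jaccard(romanized_units, english_units))
```

A high native-vs-romanized overlap would have indicated script-invariant language identity; the paper reports the opposite.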
Merits
Methodological Rigor
The use of LAPE and Sparse Autoencoders to decompose activations provides a novel, quantitative lens to probe multilingual representations, offering granular insights into how language-specific units are encoded.
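For readers unfamiliar with the second tool, a sparse autoencoder decomposes a dense activation vector into an overcomplete set of sparsely firing features via a ReLU encoder and linear decoder. The forward pass below is a toy sketch with random stand-in weights, not the paper's trained SAE.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_feat = 16, 64          # hidden size; overcomplete feature count

W_enc = rng.normal(0, 0.1, (d_model, d_feat))
b_enc = np.zeros(d_feat)
W_dec = rng.normal(0, 0.1, (d_feat, d_model))
b_dec = np.zeros(d_model)

def sae(x: np.ndarray):
    """ReLU encoder yields a sparse feature code; linear decoder
    reconstructs the original activation from it. Training (not shown)
    minimizes reconstruction error plus an L1 sparsity penalty."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)
    x_hat = f @ W_dec + b_dec
    return f, x_hat

x = rng.normal(size=d_model)      # stand-in for one LM hidden state
features, recon = sae(x)
assert (features == 0).any()      # ReLU zeroes many pre-activations
assert recon.shape == x.shape
```

Individual nonzero features can then be inspected for language- or script-specific firing patterns, complementing the unit-level LAPE view.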
Theoretical Significance
Challenges prevailing assumptions about interlingual abstraction in multilingual LMs, proposing a paradigm where surface form (orthography) dictates early-stage representation, with linguistic abstraction emerging later.
Empirical Scope
Examines two widely deployed compact models (Llama-3.2-1B and Gemma-2-2B), making the findings relevant both to academic research and to practical deployment in resource-constrained environments.
Demerits
Limited Model Diversity
Focuses exclusively on Llama-3.2-1B and Gemma-2-2B, which may not generalize to larger or differently architected models (e.g., encoder-decoder architectures or models trained on fewer languages).
Static Representation Analysis
Primarily analyzes internal representations rather than dynamic behavior during generation, leaving open questions about how surface-form conditioning impacts real-time multilingual tasks (e.g., translation, question answering).
Causal Intervention Limitations
Causal interventions to assess generation sensitivity rely on proxy metrics (e.g., unit invariance), which may not fully capture the nuanced interplay between surface form and linguistic abstraction in downstream applications.
Expert Commentary
This work makes a compelling case for rethinking how multilingual LMs organize knowledge, shifting the focus from abstract linguistic universals to surface-form cues. The authors’ use of LAPE and Sparse Autoencoders is particularly innovative, as it allows for a fine-grained dissection of representation space without relying solely on probing tasks. Their finding that romanization disrupts alignment—even when models are exposed to multiple scripts—suggests a fundamental limitation in how current architectures handle script diversity. Notably, the observation that generation is most sensitive to units invariant to surface perturbations implies that models may be ‘gaming’ the training objective by exploiting shallow, script-based correlations rather than learning robust linguistic abstractions. This aligns with recent critiques of brittleness in multilingual models (e.g., their poor performance on low-resource languages or dialectal variations), but it also opens avenues for targeted improvements. For instance, incorporating script-invariant training objectives or adversarial examples in non-Latin scripts could push models toward deeper linguistic generalization. The study’s focus on compact models is pragmatic, given the computational constraints of deploying multilingual systems at scale, but future work should extend these analyses to larger models and more diverse language families to validate the generality of these findings.
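The causal interventions discussed above amount to zero-ablating a chosen unit set and measuring the effect on the model's output. The toy example below shows the mechanics with a stand-in linear readout; the paper's actual interventions operate on transformer hidden states during generation.

```python
import numpy as np

def ablate(hidden: np.ndarray, units: list[int]) -> np.ndarray:
    """Zero out the given unit indices in a hidden state, leaving the
    rest untouched (a copy is returned; the input is not mutated)."""
    out = hidden.copy()
    out[units] = 0.0
    return out

W_out = np.eye(4)                         # stand-in output readout
h = np.array([1.0, -2.0, 3.0, 0.5])       # stand-in hidden state

logits = h @ W_out
logits_ablated = ablate(h, [2]) @ W_out

# The silenced unit no longer contributes downstream
assert logits[2] == 3.0 and logits_ablated[2] == 0.0
```

Comparing output degradation when ablating surface-invariant units versus typologically aligned units is what supports the paper's claim that generation depends most on the former.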
Recommendations
- ✓ Expand the analysis to include a broader range of models (e.g., encoder-decoder architectures like T5 or mT5) and languages (e.g., those with non-alphabetic scripts like Chinese, Arabic, or Devanagari) to assess the generality of surface-form conditioning.
- ✓ Investigate mitigation strategies, such as script-aware training (e.g., augmenting data with script variations) or architectural modifications (e.g., cross-script attention mechanisms), to reduce orthography-induced biases.
- ✓ Conduct user studies or task-specific evaluations (e.g., machine translation, cross-lingual retrieval) to measure how surface-form conditioning impacts real-world performance and user trust in multilingual systems.
Sources
Original: arXiv - cs.CL