
Indic-TunedLens: Interpreting Multilingual Models in Indian Languages


Mihir Panchal, Deeksha Varshney, Mamta, Asif Ekbal

arXiv:2602.15038v1 Announce Type: cross Abstract: Multilingual large language models (LLMs) are increasingly deployed in linguistically diverse regions like India, yet most interpretability tools remain tailored to English. Prior work reveals that LLMs often operate in English-centric representation spaces, making cross-lingual interpretability a pressing concern. We introduce Indic-TunedLens, a novel interpretability framework specifically for Indian languages that learns shared affine transformations. Unlike the standard Logit Lens, which directly decodes intermediate activations, Indic-TunedLens adjusts hidden states for each target language, aligning them with the target output distributions to enable more faithful decoding of model representations. We evaluate our framework on 10 Indian languages using the MMLU benchmark and find that it significantly improves over SOTA interpretability methods, especially for morphologically rich, low-resource languages. Our results provide crucial insights into the layer-wise semantic encoding of multilingual transformers. Our model is available at https://huggingface.co/spaces/AnonymousAccountACL/IndicTunedLens. Our code is available at https://github.com/AnonymousAccountACL/IndicTunedLens.
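For readers unfamiliar with the baseline the abstract contrasts against: the standard Logit Lens decodes an intermediate hidden state by applying the model's final normalization and unembedding matrix directly, with no learned adjustment. The sketch below is illustrative only (it is not the paper's code); the shapes, variable names, and random inputs are hypothetical.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Simplified final LayerNorm (no learned scale/shift), as Logit Lens sketches often assume."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def logit_lens(h_l, W_U):
    """Decode an intermediate hidden state h_l (d_model,) straight into
    vocabulary logits via the unembedding matrix W_U (d_model, vocab)."""
    return layer_norm(h_l) @ W_U

# Toy dimensions; a real model would have d_model in the thousands.
rng = np.random.default_rng(0)
d_model, vocab = 8, 16
h_l = rng.normal(size=d_model)          # stand-in for a layer-l residual stream state
W_U = rng.normal(size=(d_model, vocab))  # stand-in for the model's unembedding

logits = logit_lens(h_l, W_U)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                    # softmax over the toy vocabulary
```

The abstract's criticism is that this direct decoding is biased toward the (largely English) output distribution the unembedding was trained for, which motivates inserting a learned per-language correction before decoding.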

Executive Summary

The article introduces Indic-TunedLens, a novel interpretability framework designed for multilingual large language models (LLMs) in Indian languages. The framework learns shared affine transformations to align hidden states with target output distributions, enabling more faithful decoding of model representations. Evaluations on 10 Indian languages demonstrate significant improvements over state-of-the-art (SOTA) interpretability methods, particularly for morphologically rich and low-resource languages. The framework provides valuable insights into layer-wise semantic encoding of multilingual transformers and is made available for public use.

Key Points

  • Introduction of Indic-TunedLens, a novel interpretability framework for multilingual LLMs in Indian languages
  • The framework learns shared affine transformations to align hidden states with target output distributions
  • Evaluations on 10 Indian languages demonstrate significant improvements over SOTA interpretability methods
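The per-language adjustment described above can be pictured as a tuned-lens-style affine probe: an affine map applied to the hidden state before unembedding, with one map (or one shared family of maps) per target language. The paper's exact parameterization is not specified here, so the following is a minimal sketch under that assumption; the class name, language codes, and initialization are hypothetical.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

class AffineLens:
    """Hypothetical tuned-lens-style probe: decode via h' = A @ h + b, then unembed.

    A is initialized near the identity so an untrained lens roughly
    reproduces the plain Logit Lens; A and b would be trained to align
    intermediate states with the target language's output distribution.
    """
    def __init__(self, d_model, vocab, rng):
        self.A = np.eye(d_model) + 0.01 * rng.normal(size=(d_model, d_model))
        self.b = np.zeros(d_model)
        self.W_U = rng.normal(size=(d_model, vocab))  # stand-in unembedding

    def decode(self, h_l):
        return layer_norm(self.A @ h_l + self.b) @ self.W_U

# One lens per target language (illustrative ISO codes, toy dimensions).
rng = np.random.default_rng(1)
d_model, vocab = 8, 16
lenses = {lang: AffineLens(d_model, vocab, rng) for lang in ("hi", "bn", "ta")}

h_l = rng.normal(size=d_model)
logits_hi = lenses["hi"].decode(h_l)  # language-adjusted logits for Hindi
```

In practice such probes are trained layer by layer (e.g. by minimizing KL divergence to the model's final-layer distribution on target-language text); the key design choice the abstract highlights is that the transformations are shared and language-aware rather than fit independently per language.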

Merits

Improved Interpretability

Indic-TunedLens provides more faithful decoding of model representations, enabling a better understanding of how LLMs process Indian languages.

Demerits

Limited Scope

The framework is currently limited to Indian languages and may not be directly applicable to other languages or regions.

Expert Commentary

The introduction of Indic-TunedLens represents a significant step forward in the development of interpretability tools for multilingual LLMs. By learning shared affine transformations, the framework is able to capture the nuances of Indian languages and provide more accurate insights into model representations. However, further research is needed to extend the framework to other languages and regions, and to explore its potential applications in real-world scenarios. The availability of the framework and code for public use is a welcome development, and is likely to facilitate further innovation and collaboration in the field.

Recommendations

  • Future research should focus on extending Indic-TunedLens to other languages and regions, and exploring its potential applications in natural language processing and machine translation
  • The development of language-specific interpretability frameworks like Indic-TunedLens should be prioritized to ensure that AI systems are fair, transparent, and accountable in multilingual societies
