Tracing Pharmacological Knowledge In Large Language Models

arXiv:2603.03407v1 Announce Type: new Abstract: Large language models (LLMs) have shown strong empirical performance across pharmacology and drug discovery tasks, yet the internal mechanisms by which they encode pharmacological knowledge remain poorly understood. In this work, we investigate how drug-group semantics are represented and retrieved within Llama-based biomedical language models using causal and probing-based interpretability methods. We apply activation patching to localize where drug-group information is stored across model layers and token positions, and complement this analysis with linear probes trained on token-level and sum-pooled activations. Our results demonstrate that early layers play a key role in encoding drug-group knowledge, with the strongest causal effects arising from intermediate tokens within the drug-group span rather than the final drug-group token. Linear probing further reveals that pharmacological semantics are distributed across tokens and are already present in the embedding space, with token-level probes performing near chance while sum-pooled representations achieve maximal accuracy. Together, these findings suggest that drug-group semantics in LLMs are not localized to single tokens but instead arise from distributed representations. This study provides the first systematic mechanistic analysis of pharmacological knowledge in LLMs, offering insights into how biomedical semantics are encoded in large language models.

Executive Summary

This study represents a significant advance in understanding how large language models encode pharmacological knowledge. By employing causal and probing-based interpretability methods, specifically activation patching and linear probing, the authors localized the representation of drug-group semantics across model layers. Their findings reveal that drug-group knowledge is not confined to a single token but is instead distributed across tokens, with the strongest causal effects arising from intermediate tokens within the drug-group span. Importantly, linear probing showed that pharmacological semantics are already present in the embedding space, with sum-pooled representations achieving maximal accuracy while token-level probes perform near chance. These results provide a foundational mechanistic framework for interpreting biomedical content in LLMs, offering new insights into the architecture of biomedical language models. The work bridges a critical gap between empirical performance and interpretability in pharmacology-related AI applications.
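The activation-patching procedure described above can be sketched in miniature. The toy model, layer structure, and tensor shapes below are illustrative stand-ins (the paper works with Llama-based biomedical models); the sketch only shows the core mechanic: cache clean-run activations, re-run on a corrupted input, splice one clean activation back in at a chosen (layer, token position), and measure how far the output moves.

```python
# Minimal activation-patching sketch. The model is a toy per-token layer
# stack, not a transformer; names and dimensions are invented for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyModel(nn.Module):
    """Stand-in for a layered model operating on per-token activations."""
    def __init__(self, n_layers: int = 4, d_model: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_layers)
        )

model = ToyModel()

def run_with_cache(x: torch.Tensor):
    """Run the model, caching every layer's output activation."""
    cache = []
    h = x
    for layer in model.layers:
        h = torch.relu(layer(h))
        cache.append(h)
    return h, cache

def run_with_patch(x, layer_idx, token_idx, clean_cache):
    """Re-run on a corrupted input, splicing the clean activation back in
    at one (layer, token) position to measure its causal contribution."""
    h = x
    for i, layer in enumerate(model.layers):
        h = torch.relu(layer(h))
        if i == layer_idx:
            h = h.clone()
            h[token_idx] = clean_cache[layer_idx][token_idx]
    return h

clean = torch.randn(5, 8)    # 5 token positions, d_model = 8
corrupt = torch.randn(5, 8)

clean_out, clean_cache = run_with_cache(clean)
corrupt_out, _ = run_with_cache(corrupt)

# How much does patching (layer 1, token 2) move the corrupted output?
patched_out = run_with_patch(corrupt, layer_idx=1, token_idx=2,
                             clean_cache=clean_cache)
effect = (patched_out - corrupt_out).abs().sum().item()
print(f"causal effect of patching (layer 1, token 2): {effect:.4f}")
```

Sweeping `layer_idx` and `token_idx` over the drug-group span yields the kind of layer-by-position effect map the authors use to argue that intermediate tokens, not the final drug-group token, carry the strongest causal signal.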

Key Points

  • Drug-group semantics are distributed across tokens rather than localized to single tokens.
  • Early layers play a key role in encoding pharmacological knowledge, with intermediate tokens showing stronger causal effects.
  • Sum-pooled representations achieve higher accuracy than token-level probes, indicating a distributed representation mechanism.
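The probing contrast in the last point can be reproduced on synthetic data. The sketch below is not the paper's setup: the "activations", label scheme, and dimensions are invented so that class information is spread thinly across many token positions. A logistic-regression probe on any single token then performs poorly, while a probe on the sum-pooled representation recovers the label, mirroring the reported token-level vs. sum-pooled gap.

```python
# Hedged sketch: distributed signal is weak per token but accumulates
# under sum-pooling. All data here is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_tokens, d = 400, 20, 16

# Each class shifts every token's activation by a small class-specific
# vector; per-token noise dominates, but the shift adds up across tokens.
labels = rng.integers(0, 2, size=n_samples)
class_signal = rng.normal(size=(2, d))
acts = (rng.normal(size=(n_samples, n_tokens, d))
        + 0.1 * class_signal[labels][:, None, :])

def probe_accuracy(X: np.ndarray, y: np.ndarray, n_train: int = 300) -> float:
    """Fit a linear probe on a train split, score on the held-out split."""
    clf = LogisticRegression(max_iter=1000).fit(X[:n_train], y[:n_train])
    return clf.score(X[n_train:], y[n_train:])

token_acc = probe_accuracy(acts[:, 0, :], labels)      # single-token probe
pooled_acc = probe_accuracy(acts.sum(axis=1), labels)  # sum-pooled probe
print(f"token-level: {token_acc:.2f}  sum-pooled: {pooled_acc:.2f}")
```

The design choice being illustrated: sum-pooling grows the class signal linearly in the number of tokens while the noise grows only with its square root, so a linear probe that is near chance per token can approach ceiling on the pooled representation.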

Merits

Strength

The use of both causal and probing-based methods provides a robust, multi-layered interpretability framework that is both theoretically grounded and empirically validated.

Demerits

Limitation

The study focuses primarily on Llama-based models and does not extend its analysis to other architectures or diverse pharmacology datasets, potentially limiting generalizability.

Expert Commentary

The article presents a meticulously designed and executed investigation into the internal mechanisms of LLMs in pharmacology. The combination of activation patching and linear probing represents a methodological tour de force, offering a nuanced view of how pharmacological knowledge is encoded. What is particularly compelling is the implication that distributed representations are central to semantic encoding; this challenges the conventional assumption that knowledge is token-centric and opens new avenues for model auditing and refinement. The findings also align with broader trends in AI interpretability research, which emphasize contextualized rather than token-local knowledge encoding. As the field moves toward more accountable AI systems in healthcare, this work provides a necessary foundation for understanding the 'black box' in biomedical LLMs. While the study's scope is commendable, future work should incorporate comparative analyses across architectures and diverse pharmacology domains to strengthen cross-domain applicability.

Recommendations

  1. Expand the analysis to other LLM architectures (e.g., GPT, T5) and pharmacology subfields to validate robustness.
  2. Develop standardized interpretability benchmarks for biomedical LLMs to enable reproducibility and cross-model comparison.