Meissa: Multi-modal Medical Agentic Intelligence

Yixiong Chen, Xinyi Bai, Yue Pan, Zongwei Zhou, Alan Yuille

arXiv:2603.09018v1

Abstract: Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However, these systems rely almost entirely on frontier models (e.g., GPT), whose API-based deployment incurs high cost, high latency, and privacy risks that conflict with on-premise clinical requirements. We present Meissa, a lightweight 4B-parameter medical MM-LLM that brings agentic capability offline. Instead of imitating static answers, Meissa learns both when to engage external interaction (strategy selection) and how to execute multi-step interaction (strategy execution) by distilling structured trajectories from frontier models. Specifically, we propose: (1) Unified trajectory modeling: trajectories (reasoning and action traces) are represented within a single state-action-observation formalism, allowing one model to generalize across heterogeneous medical environments. (2) Three-tier stratified supervision: the model's own errors trigger progressive escalation from direct reasoning to tool-augmented and multi-agent interaction, explicitly learning difficulty-aware strategy selection. (3) Prospective-retrospective supervision: pairing exploratory forward traces with hindsight-rationalized execution traces enables stable learning of effective interaction policies. Trained on 40K curated trajectories, Meissa matches or exceeds proprietary frontier agents in 10 of 16 evaluation settings across 13 medical benchmarks spanning radiology, pathology, and clinical reasoning. Using over 25x fewer parameters than typical frontier models like Gemini-3, Meissa operates fully offline with 22x lower end-to-end latency compared to API-based deployment. Data, models, and environments are released at https://github.com/Schuture/Meissa.

Executive Summary

Meissa is a 4B-parameter multi-modal medical LLM designed to run agentic workflows offline, addressing the cost, latency, and privacy problems of frontier models served through APIs. By distilling structured trajectories from frontier models, it learns both when to engage external interaction (strategy selection) and how to carry out multi-step interaction (strategy execution). Its three main components are unified trajectory modeling, three-tier stratified supervision, and prospective-retrospective supervision. Trained on 40K curated trajectories, Meissa matches or exceeds proprietary frontier agents in 10 of 16 evaluation settings across 13 medical benchmarks, while using over 25x fewer parameters than typical frontier models and running with 22x lower end-to-end latency than API-based deployment. These results suggest that small on-premise models can be a practical option for clinical decision support where sending patient data to an external API is not acceptable, which could in turn lower deployment costs.

Key Points

  • Meissa is a lightweight 4B-parameter medical MM-LLM that brings agentic capability offline.
  • The system learns when to engage external interaction (strategy selection) and how to execute multi-step interaction (strategy execution) by distilling structured trajectories from frontier models.
  • Meissa matches or exceeds proprietary frontier agents in 10 of 16 evaluation settings across 13 medical benchmarks spanning radiology, pathology, and clinical reasoning.

Merits

Scalability

Meissa runs fully offline, with 22x lower end-to-end latency than API-based deployment and over 25x fewer parameters than typical frontier models.

Efficiency

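Three-tier stratified supervision teaches difficulty-aware strategy selection: the model's own errors trigger escalation from direct reasoning to tool-augmented and then multi-agent interaction. Prospective-retrospective supervision, which pairs exploratory forward traces with hindsight-rationalized execution traces, enables stable learning of effective interaction policies.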
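The three-tier escalation described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the tier names, the exact-match correctness check, and the toy solvers are all assumptions made for illustration. The idea is that each training question is labeled with the first tier, tried in order, that answers it correctly.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical tier names, ordered from cheapest to most expensive.
TIERS = ("direct", "tool", "multi_agent")

Solver = Callable[[str], str]


def is_correct(answer: str, gold: str) -> bool:
    """Assumed check: case-insensitive exact match against the reference."""
    return answer.strip().lower() == gold.strip().lower()


def label_strategy(question: str, gold: str,
                   solvers: Dict[str, Solver]) -> Optional[Tuple[str, str]]:
    """Try tiers in order; an error at one tier triggers the next.

    Returns (tier, answer) for the first tier that is correct,
    or None if every tier fails.
    """
    for tier in TIERS:
        answer = solvers[tier](question)
        if is_correct(answer, gold):
            return tier, answer
    return None


# Toy stand-ins for the three interaction modes (not real models).
toy_solvers = {
    "direct": lambda q: "pneumonia" if "easy" in q else "unsure",
    "tool": lambda q: "effusion" if "medium" in q else "unsure",
    "multi_agent": lambda q: "nodule" if "hard" in q else "unsure",
}

easy_label = label_strategy("easy chest X-ray", "Pneumonia", toy_solvers)
hard_label = label_strategy("hard CT slice", "nodule", toy_solvers)
```

Here `easy_label` is resolved at the direct tier, while `hard_label` is reached only after the two cheaper tiers fail. Labels of this kind are one plausible way to supervise strategy selection.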

Flexibility

Unified trajectory modeling represents reasoning and action traces in a single state-action-observation formalism, which allows one model to generalize across heterogeneous medical environments.
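The single state-action-observation formalism mentioned in the abstract can be illustrated with a small data structure. This is a sketch under stated assumptions, not the released Meissa format: the class names, field names, text layout, and the `segment_lesion` tool name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    state: str        # context the model sees before acting
    action: str       # e.g. a final answer, a tool call, a message to an agent
    observation: str  # what the environment returns ("" if nothing)


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)

    def add(self, state: str, action: str, observation: str = "") -> "Trajectory":
        self.steps.append(Step(state, action, observation))
        return self

    def to_text(self) -> str:
        """Serialize any trajectory, whatever its environment, in one format."""
        lines = [f"TASK: {self.task}"]
        for i, step in enumerate(self.steps, start=1):
            lines.append(f"[{i}] STATE: {step.state}")
            lines.append(f"[{i}] ACTION: {step.action}")
            lines.append(f"[{i}] OBSERVATION: {step.observation}")
        return "\n".join(lines)


# A direct-reasoning trace: one step, no environment feedback.
direct = Trajectory("Is there a fracture?").add("X-ray image", "answer: no")

# A tool-augmented trace; "segment_lesion" is a made-up tool name.
tool_use = (
    Trajectory("Measure the lesion")
    .add("CT slice", "call_tool: segment_lesion", "mask with diameter 12 mm")
    .add("CT slice + mask", "answer: 12 mm")
)
```

Because both traces serialize through the same `to_text` method, a single model can in principle be trained on direct, tool-augmented, and multi-agent examples without a separate format for each environment.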

Demerits

Data Requirements

Meissa was trained on 40K curated trajectories distilled from frontier models, so reproducing or extending it depends on access to those models and on curation effort, which may be a limitation in some settings.

Complexity

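The training pipeline combines three supervision schemes and covers direct reasoning, tool-augmented interaction, and multi-agent interaction, which may make it challenging to implement and maintain in practice.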

Expert Commentary

Meissa is a notable step for multi-modal medical agents: it shows that a 4B-parameter model can learn effective interaction policies and run fully offline, which matters for clinical settings with on-premise requirements. Its development and deployment also raise questions about the fairness, transparency, and regulation of AI systems in healthcare. As the use of AI in healthcare grows, transparent and explainable decision-making systems that put patient care and safety first remain essential.

Recommendations

  • Further research is needed to explore the limitations and potential biases of Meissa and to develop strategies for mitigating these risks.
  • The development and deployment of Meissa should be accompanied by a comprehensive assessment of the system's fairness, transparency, and explainability.
