Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
arXiv:2603.09095v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) can process text presented as images, yet they often perform worse than when the same content is provided as textual tokens. We systematically diagnose this "modality gap" by evaluating seven...
AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents
arXiv:2603.09716v1 Announce Type: new Abstract: Autonomous agent frameworks still struggle to reconcile long-term experiential learning with real-time, context-sensitive decision-making. In practice, this gap appears as static cognition, rigid workflow dependence, and inefficient context usage, which jointly limit adaptability in open-ended...
LDP: An Identity-Aware Protocol for Multi-Agent LLM Systems
arXiv:2603.08852v1 Announce Type: new Abstract: As multi-agent AI systems grow in complexity, the protocols connecting them constrain their capabilities. Current protocols such as A2A and MCP do not expose model-level properties as first-class primitives, ignoring properties fundamental to effective delegation:...
Logics-Parsing-Omni Technical Report
arXiv:2603.09677v1 Announce Type: new Abstract: Addressing the challenges of fragmented task definitions and the heterogeneity of unstructured data in multimodal parsing, this paper proposes the Omni Parsing framework. This framework establishes a Unified Taxonomy covering documents, images, and audio-visual streams,...
AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem
arXiv:2603.08938v1 Announce Type: new Abstract: The rapid emergence of open-source, locally hosted intelligent agents marks a critical inflection point in human-computer interaction. Systems such as OpenClaw demonstrate that Large Language Model (LLM)-based agents can autonomously operate local computing environments, orchestrate...
PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs
arXiv:2603.09943v1 Announce Type: new Abstract: Computational pathology demands both visual pattern recognition and dynamic integration of structured domain knowledge, including taxonomy, grading criteria, and clinical evidence. In practice, diagnostic reasoning requires linking morphological evidence with formal diagnostic and grading criteria....
Logos: An evolvable reasoning engine for rational molecular design
arXiv:2603.09268v1 Announce Type: new Abstract: The discovery and design of functional molecules remain central challenges across chemistry, biology, and materials science. While recent advances in machine learning have accelerated molecular property prediction and candidate generation, existing models tend to excel either...
Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents
arXiv:2603.09203v1 Announce Type: new Abstract: Retrieval-augmented agents can query external evidence, yet their reliability in multi-step reasoning remains limited: noisy retrieval may derail multi-hop question answering, while outcome-only reinforcement learning provides credit signals that are too coarse to optimize intermediate...
Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts
arXiv:2603.09890v1 Announce Type: new Abstract: Large Language Models (LLMs) have emerged as a new paradigm for multi-agent systems. However, existing research on the behaviour of LLM-based multi-agents relies on ad hoc prompts and lacks a principled policy perspective. Different from...
Quantifying the Necessity of Chain of Thought through Opaque Serial Depth
arXiv:2603.09786v1 Announce Type: new Abstract: Large language models (LLMs) tend to externalize their reasoning in their chain of thought, making the chain of thought a good target for monitoring. This is partially an inherent feature of the Transformer architecture: sufficiently...
ALARM: Audio-Language Alignment for Reasoning Models
arXiv:2603.09556v1 Announce Type: new Abstract: Large audio language models (ALMs) extend LLMs with auditory understanding. A common approach freezes the LLM and trains only an adapter on self-generated targets. However, this fails for reasoning LLMs (RLMs) whose built-in chain-of-thought traces...
Understanding the Interplay between LLMs' Utilisation of Parametric and Contextual Knowledge: A keynote at ECIR 2025
arXiv:2603.09654v1 Announce Type: new Abstract: Language Models (LMs) acquire parametric knowledge from their training process, embedding it within their weights. The increasing scalability of LMs, however, poses significant challenges for understanding a model's inner workings and further for updating or...
Beyond Fine-Tuning: Robust Food Entity Linking under Ontology Drift with FoodOntoRAG
arXiv:2603.09758v1 Announce Type: new Abstract: Standardizing food terms from product labels and menus into ontology concepts is a prerequisite for trustworthy dietary assessment and safety reporting. The dominant approach to Named Entity Linking (NEL) in the food and nutrition domains...
Chow-Liu Ordering for Long-Context Reasoning in Chain-of-Agents
arXiv:2603.09835v1 Announce Type: new Abstract: Sequential multi-agent reasoning frameworks such as Chain-of-Agents (CoA) handle long-context queries by decomposing inputs into chunks and processing them sequentially using LLM-based worker agents that read from and update a bounded shared memory. From a...
VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
arXiv:2603.08936v1 Announce Type: cross Abstract: Speech Large Language Models (LLMs) show great promise for speech emotion recognition (SER) via generative interfaces. However, shifting from closed-set classification to open text generation introduces zero-shot stochasticity, making evaluation highly sensitive to prompts. Additionally,...
BiCLIP: Domain Canonicalization via Structured Geometric Transformation
arXiv:2603.08942v1 Announce Type: cross Abstract: Recent advances in vision-language models (VLMs) have demonstrated remarkable zero-shot capabilities, yet adapting these models to specialized domains remains a significant challenge. Building on recent theoretical insights suggesting that independently trained VLMs are related by...
Multi-level meta-reinforcement learning with skill-based curriculum
arXiv:2603.08773v1 Announce Type: new Abstract: We consider problems in sequential decision making with natural multi-level structure, where sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure has remained a longstanding challenge; we describe an efficient...
Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models
arXiv:2603.08859v1 Announce Type: new Abstract: Hybrid sequence models--combining Transformer and state-space model layers--seek to gain the expressive versatility of attention as well as the computational efficiency of state-space model layers. Despite burgeoning interest in hybrid models, we lack a basic...
Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates
arXiv:2603.08914v1 Announce Type: new Abstract: Over-parameterized neural networks incur prohibitive memory and computational costs for resource-constrained deployment. The Strong Lottery Ticket (SLT) hypothesis suggests that randomly initialized networks contain sparse subnetworks achieving competitive accuracy without weight training. Existing SLT methods,...
Sim2Act: Robust Simulation-to-Decision Learning via Adversarial Calibration and Group-Relative Perturbation
arXiv:2603.09053v1 Announce Type: new Abstract: Simulation-to-decision learning enables safe policy training in digital environments without risking real-world deployment, and has become essential in mission-critical domains such as supply chains and industrial systems. However, simulators learned from noisy or biased real-world...
Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL
arXiv:2603.09161v1 Announce Type: new Abstract: Learning effective netlist representations is fundamentally constrained by the scarcity of labeled datasets, as real designs are protected by Intellectual Property (IP) and costly to annotate. Existing work therefore focuses on small-scale circuits with clean...
Proxy-Guided Measurement Calibration
arXiv:2603.09288v1 Announce Type: new Abstract: Aggregate outcome variables collected through surveys and administrative records are often subject to systematic measurement error. For instance, in disaster loss databases, county-level losses reported may differ from the true damages due to variations in...
The how and why of gun control
A Second Opinion is a recurring series by Haley Proctor on the Second Amendment and constitutional litigation. Last Monday, the Supreme Court heard argument in United States v. Hemani. In […] The post "The how and why of gun control" appeared first on SCOTUSblog.
AI Now Co-ED Amba Kak Gives Remarks Before the UN General Assembly on AI Governance - AI Now Institute
Google gives in to users’ complaints over AI-powered ‘Ask Photos’ search feature
The option appears on the Google Photos Search screen and lets users pick which experience they want.
Dissecting racial bias in an algorithm used to manage the health of populations
Racial bias in health algorithms: The U.S. health care system uses commercial algorithms to guide health decisions. Obermeyer et al. find evidence of racial bias in one widely used algorithm, such that Black patients assigned the same level of risk...
"Dark Triad" Model Organisms of Misalignment: Narrow Fine-Tuning Mirrors Human Antisocial Behavior
arXiv:2603.06816v1 Announce Type: new Abstract: The alignment problem refers to concerns regarding powerful intelligences, ensuring compatibility with human preferences and values as capabilities increase. Current large language models (LLMs) show misaligned behaviors, such as strategic deception, manipulation, and reward-seeking, that...
A Dynamic Self-Evolving Extraction System
arXiv:2603.06915v1 Announce Type: new Abstract: The extraction of structured information from raw text is a fundamental component of many NLP applications, including document retrieval, ranking, and relevance estimation. High-quality extractions often require domain-specific accuracy, up-to-date understanding of specialized taxonomies, and...
Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale
arXiv:2603.06592v1 Announce Type: new Abstract: Contemporary studies have uncovered many puzzling phenomena in the neural information processing of Transformer-based language models. Building a robust, unified understanding of these phenomena requires disassembling a model within the scope of its training. While...
Hit-RAG: Learning to Reason with Long Contexts via Preference Alignment
arXiv:2603.07023v1 Announce Type: new Abstract: Despite the promise of Retrieval-Augmented Generation in grounding Multimodal Large Language Models with external knowledge, the transition to extensive contexts often leads to significant attention dilution and reasoning hallucinations. The surge in information density causes...