Post Fusion Bird's Eye View Feature Stabilization for Robust Multimodal 3D Detection
arXiv:2603.05623v1 Announce Type: cross Abstract: Camera-LiDAR fusion is widely used in autonomous driving to enable accurate 3D object detection. However, bird's-eye view (BEV) fusion detectors can degrade significantly under domain shift and sensor failures, limiting reliability in real-world deployment. Existing...
On the Reliability of AI Methods in Drug Discovery: Evaluation of Boltz-2 for Structure and Binding Affinity Prediction
arXiv:2603.05532v1 Announce Type: cross Abstract: Despite continuing hype about the role of AI in drug discovery, no "AI-discovered drugs" have so far received regulatory approval. Here we assess one of the latest AI based tools in this domain. The ability...
Molecular Representations for AI in Chemistry and Materials Science: An NLP Perspective
arXiv:2603.05525v1 Announce Type: cross Abstract: Deep learning, a subfield of machine learning, has gained importance in various application areas in recent years. Its growing popularity has led it to enter the natural sciences as well. This has created the need...
Model Change for Description Logic Concepts
arXiv:2603.05562v1 Announce Type: cross Abstract: We consider the problem of modifying a description logic concept in light of models represented as pointed interpretations. We call this setting model change, and distinguish three main kinds of changes: eviction, which consists of...
EigenData: A Self-Evolving Multi-Agent Platform for Function-Calling Data Synthesis, Auditing, and Repair
arXiv:2603.05553v1 Announce Type: cross Abstract: Function-calling agents -- large language models that invoke tools and APIs -- require high-quality, domain-specific training data spanning executable environments, backing databases, and diverse multi-turn trajectories. We introduce EigenData, an integrated, self-evolving platform that automates...
RACAS: Controlling Diverse Robots With a Single Agentic System
arXiv:2603.05621v1 Announce Type: cross Abstract: Many robotic platforms expose an API through which external software can command their actuators and read their sensors. However, transitioning from these low-level interfaces to high-level autonomous behaviour requires a complicated pipeline, whose components demand...
VDCook:DIY video data cook your MLLMs
arXiv:2603.05539v1 Announce Type: cross Abstract: We introduce VDCook: a self-evolving video data operating system, a configurable video data construction platform for researchers and vertical domain teams. Users initiate data requests via natural language queries and adjustable parameters (scale, retrieval-synthesis ratio,...
When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On
arXiv:2603.05659v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) and Rubrics as Rewards (RaR) have driven strong gains in domains with clear correctness signals and even in subjective domains by synthesizing evaluation criteria from ideal reference answers. But...
Longitudinal Lesion Inpainting in Brain MRI via 3D Region Aware Diffusion
arXiv:2603.05693v1 Announce Type: cross Abstract: Accurate longitudinal analysis of brain MRI is often hindered by evolving lesions, which bias automated neuroimaging pipelines. While deep generative models have shown promise in inpainting these lesions, most existing methods operate cross-sectionally or lack...
The Rise of AI in Weather and Climate Information and its Impact on Global Inequality
arXiv:2603.05710v1 Announce Type: cross Abstract: The rapid adoption of AI in Earth system science promises unprecedented speed and fidelity in the generation of climate information. However, this technological prowess rests on a fragile and unequal foundation: the current trajectory of...
Verify as You Go: An LLM-Powered Browser Extension for Fake News Detection
arXiv:2603.05519v1 Announce Type: new Abstract: The rampant spread of fake news in the digital age poses serious risks to public trust and democratic institutions, underscoring the need for effective, transparent, and user-centered detection tools. Existing browser extensions often fall short...
Attention Meets Reachability: Structural Equivalence and Efficiency in Grammar-Constrained LLM Decoding
arXiv:2603.05540v1 Announce Type: new Abstract: We study grammar-constrained decoding (GCD) as a coupling between an autoregressive next-token distribution and a reachability oracle over a pushdown system compiled from a context-free grammar (CFG). We prove an oracle invariance theorem: language-equivalent grammars...
NOTAI.AI: Explainable Detection of Machine-Generated Text via Curvature and Feature Attribution
arXiv:2603.05617v1 Announce Type: new Abstract: We present NOTAI.AI, an explainable framework for machine-generated text detection that extends Fast-DetectGPT by integrating curvature-based signals with neural and stylometric features in a supervised setting. The system combines 17 interpretable features, including Conditional Probability...
Let's Talk, Not Type: An Oral-First Multi-Agent Architecture for Guaran\'i
arXiv:2603.05743v1 Announce Type: new Abstract: Although artificial intelligence (AI) and Human-Computer Interaction (HCI) systems are often presented as universal solutions, their design remains predominantly text-first, underserving primarily oral languages and indigenous communities. This position paper uses Guaran\'i, an official and...
CodeScout: Contextual Problem Statement Enhancement for Software Agents
arXiv:2603.05744v1 Announce Type: new Abstract: Current AI-powered code assistance tools often struggle with poorly-defined problem statements that lack sufficient task context and requirements specification. Recent analysis of software engineering agents reveals that failures on such underspecified requests are highly correlated...
NERdME: a Named Entity Recognition Dataset for Indexing Research Artifacts in Code Repositories
arXiv:2603.05750v1 Announce Type: new Abstract: Existing scholarly information extraction (SIE) datasets focus on scientific papers and overlook implementation-level details in code repositories. README files describe datasets, source code, and other implementation-level artifacts, however, their free-form Markdown offers little semantic structure,...
RouteGoT: Node-Adaptive Routing for Cost-Efficient Graph of Thoughts Reasoning
arXiv:2603.05818v1 Announce Type: new Abstract: Large Language Models (LLMs) excel at multi-step reasoning, yet increasing the structural complexity of inference does not consistently improve system-level returns. Methods such as Tree of Thoughts (ToT), Graph of Thoughts (GoT), and Adaptive Graph...
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs
arXiv:2603.05890v1 Announce Type: new Abstract: What happens when a storyteller forgets its own story? Large Language Models (LLMs) can now generate narratives spanning tens of thousands of words, but they often fail to maintain consistency throughout. When generating long-form narratives,...
InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning
arXiv:2603.05909v1 Announce Type: new Abstract: LLMs are increasingly deployed in high-stakes domains such as medical triage and legal assistance, often as document-grounded QA systems in which a user provides a description, relevant sources are retrieved, and an LLM generates a...
Implicit Style Conditioning: A Structured Style-Rewrite Framework for Low-Resource Character Modeling
arXiv:2603.05933v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in role-playing (RP); however, small Language Models (SLMs) with highly stylized personas remains a challenge due to data scarcity and the complexity of style disentanglement. Standard Supervised...
Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL
arXiv:2603.05996v1 Announce Type: new Abstract: Generative language models have shown significant potential in single-turn Text-to-SQL. However, their performance does not extend equivalently to multi-turn Text-to-SQL. This is primarily due to generative language models' inadequacy in handling the complexities of context...
MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing
arXiv:2603.06007v1 Announce Type: new Abstract: Large language model-based (LLM-based) multi-agent systems (MAS) are increasingly used to extend agentic problem solving via role specialization and collaboration. MAS workflows can be naturally modeled as directed computation graphs, where nodes execute agents/sub-workflows and...
ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning
arXiv:2603.06024v1 Announce Type: new Abstract: Multi-view spatial reasoning remains difficult for current vision-language models. Even when multiple viewpoints are available, models often underutilize cross-view relations and instead rely on single-image shortcuts, leading to fragile performance on viewpoint transformation and occlusion-sensitive...
Diffusion Language Models Are Natively Length-Aware
arXiv:2603.06123v1 Announce Type: new Abstract: Unlike autoregressive language models, which terminate variable-length generation upon predicting an End-of-Sequence (EoS) token, Diffusion Language Models (DLMs) operate over a fixed maximum-length context window for a predetermined number of denoising steps. However, this process...
CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation
arXiv:2603.06183v1 Announce Type: new Abstract: We introduce CRIMSON, a clinically grounded evaluation framework for chest X-ray report generation that assesses reports based on diagnostic correctness, contextual relevance, and patient safety. Unlike prior metrics, CRIMSON incorporates full clinical context, including patient...
LIT-RAGBench: Benchmarking Generator Capabilities of Large Language Models in Retrieval-Augmented Generation
arXiv:2603.06198v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) is a framework in which a Generator, such as a Large Language Model (LLM), produces answers by retrieving documents from an external collection using a Retriever. In practice, Generators must integrate evidence...
FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling
arXiv:2603.06199v1 Announce Type: new Abstract: Long-context modeling is a pivotal capability for Large Language Models, yet the quadratic complexity of attention remains a critical bottleneck, particularly during the compute-intensive prefilling phase. While various sparse attention mechanisms have been explored, they...
SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models
arXiv:2603.06222v1 Announce Type: new Abstract: Explicit Chain-of-Thought improves the reasoning performance of large language models but often incurs high inference cost due to verbose token-level traces. While recent approaches reduce this overhead via concise prompting or step pruning, they largely...
From Prompting to Preference Optimization: A Comparative Study of LLM-based Automated Essay Scoring
arXiv:2603.06424v1 Announce Type: new Abstract: Large language models (LLMs) have recently reshaped Automated Essay Scoring (AES), yet prior studies typically examine individual techniques in isolation, limiting understanding of their relative merits for English as a Second Language (L2) writing. To...
Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing
arXiv:2603.06503v1 Announce Type: new Abstract: Recent advances in multimodal Retrieval-Augmented Generation (RAG) enable Large Language Models (LLMs) to analyze enterprise spreadsheet workbooks containing millions of cells, cross-sheet dependencies, and embedded visual artifacts. However, state-of-the-art approaches exclude critical context through single-pass...