CAMA: Exploring Collusive Adversarial Attacks in c-MARL
arXiv:2603.20390v1 Announce Type: new Abstract: Cooperative multi-agent reinforcement learning (c-MARL) has been widely deployed in real-world applications, such as social robots, embodied intelligence, UAV swarms, etc. Nevertheless, a range of adversarial attacks continues to threaten c-MARL systems. At present, the...
Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP
arXiv:2603.20405v1 Announce Type: new Abstract: We report on an experiment in which Claude Opus 4.6, equipped with a suite of Model Context Protocol (MCP) tools for the Rocq proof assistant, autonomously proved 10 of 12 problems from the 2025 Putnam Mathematical...
Thinking in Different Spaces: Domain-Specific Latent Geometry Survives Cross-Architecture Translation
arXiv:2603.20406v1 Announce Type: new Abstract: We investigate whether independently trained language models converge to geometrically compatible latent representations, and whether this compatibility can be exploited to correct model behavior at inference time without any weight updates. We learn a linear...
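The truncated abstract mentions learning a linear map between latent spaces. A minimal sketch of that idea, with synthetic data standing in for paired latent vectors from two independently trained models (all names and dimensions are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

d_src, d_tgt, n = 8, 8, 200
# Simulated paired latent vectors: the "target" space is a noisy linear
# image of the "source" space, mimicking geometric compatibility.
Z_src = rng.normal(size=(n, d_src))
true_map = rng.normal(size=(d_src, d_tgt))
Z_tgt = Z_src @ true_map + 0.01 * rng.normal(size=(n, d_tgt))

# Fit a linear translation W minimizing ||Z_src @ W - Z_tgt||^2.
W, *_ = np.linalg.lstsq(Z_src, Z_tgt, rcond=None)

# If the geometries are compatible, translated vectors land close
# to the target representations.
err = np.linalg.norm(Z_src @ W - Z_tgt) / np.linalg.norm(Z_tgt)
print(f"relative alignment error: {err:.4f}")
```

No weight updates are involved: the map is fit once and applied at inference time, which is the appeal of this family of approaches.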
Data-driven discovery of roughness descriptors for surface characterization and intimate contact modeling of unidirectional composite tapes
arXiv:2603.20418v1 Announce Type: new Abstract: The surface roughness of unidirectional tapes determines the evolution of the degree of intimate contact required to ensure thermoplastic molecular diffusion and the associated inter-tape consolidation during manufacturing of composite structures. However, the usual characterization of rough...
Does This Gradient Spark Joy?
arXiv:2603.20526v1 Announce Type: new Abstract: Policy gradient computes a backward pass for every sample, even though the backward pass is expensive and most samples carry little learning value. The Delightful Policy Gradient (DG) provides a forward-pass signal of learning value:...
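The abstract's core idea is to spend backward passes only on samples with high learning value, scored by a cheap forward-pass signal. An illustrative sketch (not the paper's exact method), using per-sample loss as the hypothetical value signal for a toy linear model:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_loss(w, x, y):
    # Cheap forward-pass signal: per-sample squared error.
    return (x @ w - y) ** 2

def per_sample_grad(w, x, y):
    # The "expensive" backward pass, done only for selected samples.
    return 2 * (x @ w - y) * x

w = np.zeros(3)
X = rng.normal(size=(16, 3))
y = rng.normal(size=16)

# Forward pass for every sample, backward pass for the top quarter only.
scores = forward_loss(w, X, y)
k = len(scores) // 4
selected = np.argsort(scores)[-k:]

grad = np.mean([per_sample_grad(w, X[i], y[i]) for i in selected], axis=0)
w -= 0.1 * grad
```

The savings come from the asymmetry the abstract points out: the forward-pass score is computed anyway, while the backward pass dominates cost.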
Understanding Behavior Cloning with Action Quantization
arXiv:2603.20538v1 Announce Type: new Abstract: Behavior cloning is a fundamental paradigm in machine learning, enabling policy learning from expert demonstrations across robotics, autonomous driving, and generative models. Autoregressive models like transformers have proven remarkably effective, from large language models (LLMs)...
Beyond Token Eviction: Mixed-Dimension Budget Allocation for Efficient KV Cache Compression
arXiv:2603.20616v1 Announce Type: new Abstract: Key-value (KV) caching is widely used to accelerate transformer inference, but its memory cost grows linearly with input length, limiting long-context deployment. Existing token eviction methods reduce memory by discarding less important tokens, which can...
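As context for the token-eviction baseline this abstract contrasts with, here is a minimal sketch of score-based eviction: keep the KV entries of the tokens that received the most attention and discard the rest to fit a fixed budget (the attention scores are simulated, and the scoring rule is one common heuristic, not this paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_head, budget = 12, 4, 6
keys = rng.normal(size=(seq_len, d_head))
values = rng.normal(size=(seq_len, d_head))

# Cumulative attention each cached token received from later queries
# (simulated here with random scores).
attn_received = rng.random(seq_len)

# Retain the `budget` most-attended tokens, preserving sequence order.
keep = np.sort(np.argsort(attn_received)[-budget:])
keys_kept, values_kept = keys[keep], values[keep]
```

Memory now scales with the budget rather than the input length, at the cost of permanently losing the evicted tokens' information, which is the failure mode mixed-dimension allocation aims to avoid.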
Centrality-Based Pruning for Efficient Echo State Networks
arXiv:2603.20684v1 Announce Type: new Abstract: Echo State Networks (ESNs) are a reservoir computing framework widely used for nonlinear time-series prediction. However, despite their effectiveness, the randomly initialized reservoir often contains redundant nodes, leading to unnecessary computational overhead and reduced efficiency....
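The idea of pruning redundant reservoir nodes by centrality can be sketched as follows, using degree centrality as one simple choice (the paper may use a different centrality measure; the sizes and sparsity here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10
# Sparse random reservoir: ~30% of connections are nonzero.
W = rng.normal(size=(n, n)) * (rng.random((n, n)) < 0.3)

# Degree centrality: count of nonzero incoming plus outgoing connections.
degree = (W != 0).sum(axis=0) + (W != 0).sum(axis=1)

# Keep the 6 most central nodes and drop the rest from the reservoir.
keep = np.argsort(degree)[-6:]
W_pruned = W[np.ix_(keep, keep)]
```

Pruning low-centrality nodes shrinks the state-update cost (which is quadratic in reservoir size) while aiming to preserve the connectivity that drives the reservoir's dynamics.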
Evaluating Uplift Modeling under Structural Biases: Insights into Metric Stability and Model Robustness
arXiv:2603.20775v1 Announce Type: new Abstract: In personalized marketing, uplift models estimate incremental effects by modeling how customer behavior changes under alternative treatments. However, real-world data often exhibit biases - such as selection bias, spillover effects, and unobserved confounding - which...
OmniPatch: A Universal Adversarial Patch for ViT-CNN Cross-Architecture Transfer in Semantic Segmentation
arXiv:2603.20777v1 Announce Type: new Abstract: Robust semantic segmentation is crucial for safe autonomous driving, yet deployed models remain vulnerable to black-box adversarial attacks when target weights are unknown. Most existing approaches either craft image-wide perturbations or optimize patches for a...
Achieving $\widetilde{O}(1/\epsilon)$ Sample Complexity for Bilinear Systems Identification under Bounded Noises
arXiv:2603.20819v1 Announce Type: new Abstract: This paper studies finite-sample set-membership identification for discrete-time bilinear systems under bounded symmetric log-concave disturbances. Compared with existing finite-sample results for linear systems and related analyses under stronger noise assumptions, we consider the more challenging...
Startup Gimlet Labs is solving the AI inference bottleneck in a surprisingly elegant way
Gimlet Labs just raised an $80 million Series A for tech that lets AI run across NVIDIA, AMD, Intel, ARM, Cerebras and d-Matrix chips simultaneously.
Learning Dynamic Belief Graphs for Theory-of-mind Reasoning
arXiv:2603.20170v1 Announce Type: new Abstract: Theory of Mind (ToM) reasoning with Large Language Models (LLMs) requires inferring how people's implicit, evolving beliefs shape what they seek and how they act under uncertainty -- especially in high-stakes settings such as disaster...
PA2D-MORL: Pareto Ascent Directional Decomposition based Multi-Objective Reinforcement Learning
arXiv:2603.19579v1 Announce Type: new Abstract: Multi-objective reinforcement learning (MORL) provides an effective solution for decision-making problems involving conflicting objectives. However, achieving high-quality approximations to the Pareto policy set remains challenging, especially in complex tasks with continuous or high-dimensional state-action spaces....
GeoChallenge: A Multi-Answer Multiple-Choice Benchmark for Geometric Reasoning with Diagrams
arXiv:2603.19252v1 Announce Type: cross Abstract: Evaluating the symbolic reasoning of large language models (LLMs) calls for geometry benchmarks that require multi-step proofs grounded in both text and diagrams. However, existing benchmarks are often limited in scale and rarely provide visually...
Pitfalls in Evaluating Interpretability Agents
arXiv:2603.20101v1 Announce Type: new Abstract: Automated interpretability systems aim to reduce the need for human labor and scale analysis to increasingly large models and diverse tasks. Recent efforts toward this goal leverage large language models (LLMs) at increasing levels of...
Hyperagents
arXiv:2603.19461v1 Announce Type: new Abstract: Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed, handcrafted meta-level mechanisms, fundamentally limiting how fast such...
Teaching an Agent to Sketch One Part at a Time
arXiv:2603.19500v1 Announce Type: new Abstract: We develop a method for producing vector sketches one part at a time. To do this, we train a multi-modal language model-based agent using a novel multi-turn process-reward reinforcement learning following supervised fine-tuning. Our approach...
Utility-Guided Agent Orchestration for Efficient LLM Tool Use
arXiv:2603.19896v1 Announce Type: new Abstract: Tool-using large language model (LLM) agents often face a fundamental tension between answer quality and execution cost. Fixed workflows are stable but inflexible, while free-form multi-step reasoning methods such as ReAct may improve task performance...
ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models
arXiv:2603.19515v1 Announce Type: new Abstract: Large language models (LLMs) with advanced cognitive capabilities are emerging as agents for various reasoning and planning tasks. Traditional evaluations often focus on specific reasoning or planning questions within controlled environments. Recent studies have explored...
DuCCAE: A Hybrid Engine for Immersive Conversation via Collaboration, Augmentation, and Evolution
arXiv:2603.19248v1 Announce Type: cross Abstract: Immersive conversational systems in production face a persistent trade-off between responsiveness and long-horizon task capability. Real-time interaction is achievable for lightweight turns, but requests involving planning and tool invocation (e.g., search and media generation) produce...
Full-Stack Domain Enhancement for Combustion LLMs: Construction and Optimization
arXiv:2603.19268v1 Announce Type: cross Abstract: Large language models (LLMs) demonstrate significant application potential for task adaptation and capability enhancement in professional fields. Nevertheless, for complex physical systems such as combustion science, general-purpose LLMs often generate severe hallucinations...
A Human-Centered Workflow for Using Large Language Models in Content Analysis
arXiv:2603.19271v1 Announce Type: cross Abstract: While many researchers use Large Language Models (LLMs) through chat-based access, their real potential lies in leveraging LLMs via application programming interfaces (APIs). This paper conceptualizes LLMs as universal text processing machines and presents a...
CURE: A Multimodal Benchmark for Clinical Understanding and Retrieval Evaluation
arXiv:2603.19274v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) demonstrate considerable potential in clinical diagnostics, a domain that inherently requires synthesizing complex visual and textual data alongside consulting authoritative medical literature. However, existing benchmarks primarily evaluate MLLMs in end-to-end...
Generalized Stock Price Prediction for Multiple Stocks Combined with News Fusion
arXiv:2603.19286v1 Announce Type: cross Abstract: Predicting stock prices presents challenges in financial forecasting. While traditional approaches such as ARIMA and RNNs are prevalent, recent developments in Large Language Models (LLMs) offer alternative methodologies. This paper introduces an approach that integrates...
Speculating Experts Accelerates Inference for Mixture-of-Experts
arXiv:2603.19289v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) models have gained popularity as a means of scaling the capacity of large language models (LLMs) while maintaining sparse activations and reduced per-token compute. However, in memory-constrained inference settings, expert weights must be...
Spelling Correction in Healthcare Query-Answer Systems: Methods, Retrieval Impact, and Empirical Evaluation
arXiv:2603.19249v1 Announce Type: new Abstract: Healthcare question-answering (QA) systems face a persistent challenge: users submit queries with spelling errors at rates substantially higher than those found in the professional documents they search. This paper presents the first controlled study of...
From Comprehension to Reasoning: A Hierarchical Benchmark for Automated Financial Research Reporting
arXiv:2603.19254v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to generate financial research reports, shifting from auxiliary analytic tools to primary content producers. Yet recent real-world deployments reveal persistent failures--factual errors, numerical inconsistencies, fabricated references, and shallow...
Constraint-aware Path Planning from Natural Language Instructions Using Large Language Models
arXiv:2603.19257v1 Announce Type: new Abstract: Real-world path planning tasks typically involve multiple constraints beyond simple route optimization, such as the number of routes, maximum route length, depot locations, and task-specific requirements. Traditional approaches rely on dedicated formulations and algorithms for...
Significance-Gain Pair Encoding for LLMs: A Statistical Alternative to Frequency-Based Subword Merging
arXiv:2603.19261v1 Announce Type: new Abstract: Subword tokenization is a key design choice for modern language models, including large language models (LLMs), with byte- and character-level BPE serving as a widely used baseline. Standard BPE selects merges by raw pair frequency,...
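The frequency-based baseline the abstract describes is standard BPE: count adjacent symbol pairs across the corpus and merge the most frequent pair. A self-contained sketch of that baseline selection rule (the corpus is a toy example):

```python
from collections import Counter

corpus = [list("lower"), list("lowest"), list("newer"), list("wider")]

# Count every adjacent symbol pair across the corpus.
pairs = Counter()
for word in corpus:
    for a, b in zip(word, word[1:]):
        pairs[(a, b)] += 1

# Standard BPE: merge the pair with the highest raw frequency.
best = max(pairs, key=pairs.get)

def merge(word, pair):
    # Replace each occurrence of `pair` with a single merged symbol.
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

corpus = [merge(w, best) for w in corpus]
```

A statistical alternative, as the title suggests, would replace the `max(pairs, key=pairs.get)` criterion with a significance- or gain-based score rather than raw frequency.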