RedacBench: Can AI Erase Your Secrets?
arXiv:2603.20208v1 Announce Type: new Abstract: Modern language models can readily extract sensitive information from unstructured text, making redaction -- the selective removal of such information -- critical for data security. However, existing benchmarks for redaction typically focus on predefined categories...
Leveraging Natural Language Processing and Machine Learning for Evidence-Based Food Security Policy Decision-Making in Data-Scarce Regions
arXiv:2603.20425v1 Announce Type: new Abstract: Food security policy formulation in data-scarce regions remains a critical challenge due to limited structured datasets, fragmented textual reports, and demographic bias in decision-making systems. This study proposes ZeroHungerAI, an integrated Natural Language Processing (NLP)...
Towards Intelligent Geospatial Data Discovery: A Knowledge Graph-Driven Multi-Agent Framework Powered by Large Language Models
arXiv:2603.20670v1 Announce Type: new Abstract: The rapid growth in the volume, variety, and velocity of geospatial data has created data ecosystems that are highly distributed, heterogeneous, and semantically inconsistent. Existing data catalogs, portals, and infrastructures still rely largely on keyword-based...
ARYA: A Physics-Constrained Composable & Deterministic World Model Architecture
arXiv:2603.21340v1 Announce Type: new Abstract: This paper presents ARYA, a composable, physics-constrained, deterministic world model architecture built on five foundational principles: nano models, composability, causal reasoning, determinism, and architectural AI safety. We demonstrate that ARYA satisfies all canonical world model...
Context Cartography: Toward Structured Governance of Contextual Space in Large Language Model Systems
arXiv:2603.20578v1 Announce Type: new Abstract: The prevailing approach to improving large language model (LLM) reasoning has centered on expanding context windows, implicitly assuming that more tokens yield better performance. However, empirical evidence - including the "lost in the middle" effect...
Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs
arXiv:2603.21155v1 Announce Type: new Abstract: Text-attributed graphs (TAGs) enhance graph learning by integrating rich textual semantics and topological context for each node. While boosting expressiveness, they also expose new vulnerabilities in graph learning through text-based adversarial surfaces. Recent advances leverage...
Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education
arXiv:2603.20255v1 Announce Type: new Abstract: Speech-based AI educational applications have gained significant interest in recent years, particularly for children. However, children's speech research remains limited due to the lack of publicly available datasets, especially for low-resource languages such as Arabic. This...
Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models
arXiv:2603.20212v1 Announce Type: new Abstract: Reward models (RMs) are critical for aligning Large Language Models via Reinforcement Learning from Human Feedback (RLHF). While Generative Reward Models (GRMs) achieve superior accuracy through chain-of-thought (CoT) reasoning, they incur substantial computational costs. Conversely,...
ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics
arXiv:2603.20260v1 Announce Type: new Abstract: The integration of Large Language Models into Multi-Agent Systems (MAS) has enabled the solution of complex, long-horizon tasks through collaborative reasoning. However, this collective intelligence is inherently fragile, as a single logical fallacy can rapidly...
Do LLM-Driven Agents Exhibit Engagement Mechanisms? Controlled Tests of Information Load, Descriptive Norms, and Popularity Cues
arXiv:2603.20911v1 Announce Type: new Abstract: Large language models make agent-based simulation more behaviorally expressive, but they also sharpen a basic methodological tension: fluent, human-like output is not, by itself, evidence for theory. We evaluate what an LLM-driven simulation can credibly...
Supporting Our Community’s Infrastructure: NeurIPS Foundation’s Donation to OpenReview
Knowledge Boundary Discovery for Large Language Models
arXiv:2603.21022v1 Announce Type: new Abstract: We propose Knowledge Boundary Discovery (KBD), a reinforcement-learning-based framework to explore the knowledge boundaries of Large Language Models (LLMs). We define the knowledge boundary by automatically generating two types of questions: (i)...
Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions
arXiv:2603.20925v1 Announce Type: new Abstract: As agentic systems move into real-world deployments, their decisions increasingly depend on external inputs such as retrieved content, tool outputs, and information provided by other actors. When these inputs can be strategically shaped by adversaries,...
Expected Reward Prediction, with Applications to Model Routing
arXiv:2603.20217v1 Announce Type: new Abstract: Reward models are a standard tool to score responses from LLMs. Reward models are built to rank responses to a fixed prompt sampled from a single model, for example to choose the best of n...
The AI Scientific Community: Agentic Virtual Lab Swarms
arXiv:2603.21344v1 Announce Type: new Abstract: In this short note we propose using agentic swarms of virtual labs as a model of an AI Science Community. In this paradigm, each particle in the swarm represents a complete virtual laboratory instance, enabling...
NeurIPS Datasets & Benchmarks Track: From Art to Science in AI Evaluations
Decoding the decoder: Contextual sequence-to-sequence modeling for intracortical speech decoding
arXiv:2603.20246v1 Announce Type: new Abstract: Speech brain–computer interfaces require decoders that translate intracortical activity into linguistic output while remaining robust to limited data and day-to-day variability. While prior high-performing systems have largely relied on framewise phoneme decoding combined with downstream...
ReLaMix: Residual Latency-Aware Mixing for Delay-Robust Financial Time-Series Forecasting
arXiv:2603.20869v1 Announce Type: new Abstract: Financial time-series forecasting in real-world high-frequency markets is often hindered by delayed or partially stale observations caused by asynchronous data acquisition and transmission latency. To better reflect such practical conditions, we investigate a simulated delay...
AgentComm-Bench: Stress-Testing Cooperative Embodied AI Under Latency, Packet Loss, and Bandwidth Collapse
arXiv:2603.20285v1 Announce Type: new Abstract: Cooperative multi-agent methods for embodied AI are almost universally evaluated under idealized communication: zero latency, no packet loss, and unlimited bandwidth. Real-world deployment on robots with wireless links, autonomous vehicles on congested networks, or drone...
Enhancing Safety of Large Language Models via Embedding Space Separation
arXiv:2603.20206v1 Announce Type: new Abstract: Large language models (LLMs) have achieved impressive capabilities, yet ensuring their safety against harmful prompts remains a critical challenge. Recent work has revealed that the latent representations (embeddings) of harmful and safe queries in LLMs...
FactorSmith: Agentic Simulation Generation via Markov Decision Process Decomposition with Planner-Designer-Critic Refinement
arXiv:2603.20270v1 Announce Type: new Abstract: Generating executable simulations from natural language specifications remains a challenging problem due to the limited reasoning capacity of large language models (LLMs) when confronted with large, interconnected codebases. This paper presents FactorSmith, a framework that...
Reasoning Traces Shape Outputs but Models Won't Say So
arXiv:2603.20620v1 Announce Type: new Abstract: Can we trust the reasoning traces that large reasoning models (LRMs) produce? We investigate whether these traces faithfully reflect what drives model outputs, and whether models will honestly report their influence. We introduce Thought Injection,...
Compression is all you need: Modeling Mathematics
arXiv:2603.20396v1 Announce Type: new Abstract: Human mathematics (HM), the mathematics humans discover and value, is a vanishingly small subset of formal mathematics (FM), the totality of all valid deductions. We argue that HM is distinguished by its compressibility through hierarchically...
Beyond Test-Time Compute Strategies: Advocating Energy-per-Token in LLM Inference
arXiv:2603.20224v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks but come with substantial energy and computational costs, particularly in request-heavy scenarios. In many real-world applications, the full scale and capabilities of LLMs are often...
Grounded Chess Reasoning in Language Models via Master Distillation
arXiv:2603.20510v1 Announce Type: new Abstract: Language models often lack grounded reasoning capabilities in specialized domains where training data is scarce but bespoke systems excel. We introduce a general framework for distilling expert system reasoning into natural language chain-of-thought explanations, enabling...
Graph of States: Solving Abductive Tasks with Large Language Models
arXiv:2603.21250v1 Announce Type: new Abstract: Logical reasoning encompasses deduction, induction, and abduction. However, while Large Language Models (LLMs) have effectively mastered the former two, abductive reasoning remains significantly underexplored. Existing frameworks, predominantly designed for static deductive tasks, fail to generalize...
Improving Coherence and Persistence in Agentic AI for System Optimization
arXiv:2603.21321v1 Announce Type: new Abstract: Designing high-performance system heuristics is a creative, iterative process requiring experts to form hypotheses and execute multi-step conceptual shifts. While Large Language Models (LLMs) show promise in automating this loop, they struggle with complex system...