SODA: Semi On-Policy Black-Box Distillation for Large Language Models
arXiv:2604.03873v1 Announce Type: new Abstract: Black-box knowledge distillation for large language models presents a strict trade-off. Simple off-policy methods (e.g., sequence-level knowledge distillation) struggle to correct the student's inherent errors. Fully on-policy methods (e.g., Generative Adversarial Distillation) solve this via...
FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning
arXiv:2604.03893v1 Announce Type: new Abstract: Breakthroughs in frontier theory often depend on the combination of concrete diagrammatic notations with rigorous logic. While multimodal large language models (MLLMs) show promise in general scientific tasks, current benchmarks often focus on local information...
Selective Forgetting for Large Reasoning Models
arXiv:2604.03571v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) generate structured chains of thought (CoTs) before producing final answers, making them especially vulnerable to knowledge leakage through intermediate reasoning steps. Yet, the memorization of sensitive information in the training data...
Understanding When Poisson Log-Normal Models Outperform Penalized Poisson Regression for Microbiome Count Data
arXiv:2604.03853v1 Announce Type: new Abstract: Multivariate count models are often justified by their ability to capture latent dependence, but researchers receive little guidance on when this added structure improves on simpler penalized marginal Poisson regression. We study this question using...
IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking
arXiv:2604.03232v1 Announce Type: new Abstract: IC3, also known as property-directed reachability (PDR), is a commonly-used algorithm for hardware safety model checking. It checks if a state transition system complies with a given safety property. IC3 either returns UNSAFE (indicating property...
Decomposing Communication Gain and Delay Cost Under Cross-Timestep Delays in Cooperative Multi-Agent Reinforcement Learning
arXiv:2604.03785v1 Announce Type: new Abstract: Communication is essential for coordination in \emph{cooperative} multi-agent reinforcement learning under partial observability, yet \emph{cross-timestep} delays cause messages to arrive multiple timesteps after generation, inducing temporal misalignment and making information stale when consumed. We formalize...
OpenAI’s vision for the AI economy: public wealth funds, robot taxes, and a four-day workweek
OpenAI proposes taxes on AI profits, public wealth funds, and expanded safety nets to address job loss and inequality, blending redistribution with capitalism as policymakers debate AI’s economic impact.
Affording Process Auditability with QualAnalyzer: An Atomistic LLM Analysis Tool for Qualitative Research
arXiv:2604.03820v1 Announce Type: new Abstract: Large language models are increasingly used for qualitative data analysis, but many workflows obscure how analytic conclusions are produced. We present QualAnalyzer, an open-source Chrome extension for Google Workspace that supports atomistic LLM analysis by...
Spatiotemporal Interpolation of GEDI Biomass with Calibrated Uncertainty
arXiv:2604.03874v1 Announce Type: new Abstract: Monitoring deforestation-driven carbon emissions requires both spatially explicit and temporally continuous estimates of aboveground biomass density (AGBD) with calibrated uncertainty. NASA's Global Ecosystem Dynamics Investigation (GEDI) provides reliable LIDAR-derived AGBD, but its orbital sampling causes...
Neural Operators for Multi-Task Control and Adaptation
arXiv:2604.03449v1 Announce Type: new Abstract: Neural operator methods have emerged as powerful tools for learning mappings between infinite-dimensional function spaces, yet their potential in optimal control remains largely unexplored. We focus on multi-task control problems, whose solution is a mapping...
Explainable Model Routing for Agentic Workflows
arXiv:2604.03527v1 Announce Type: new Abstract: Modern agentic workflows decompose complex tasks into specialized subtasks and route them to diverse models to minimize cost without sacrificing quality. However, current routing architectures focus exclusively on performance optimization, leaving underlying trade-offs between model...
Automated Analysis of Global AI Safety Initiatives: A Taxonomy-Driven LLM Approach
arXiv:2604.03533v1 Announce Type: new Abstract: We present an automated crosswalk framework that compares an AI safety policy document pair under a shared taxonomy of activities. Using the activity categories defined in Activity Map on AI Safety as fixed aspects, the...
Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty
arXiv:2604.04182v1 Announce Type: new Abstract: Non-stationary environments require agents to revise previously learned action values when contingencies change. We treat large language models (LLMs) as sequential decision policies in a two-option probabilistic reversal-learning task with three latent states and switch...
Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus
arXiv:2604.02923v1 Announce Type: new Abstract: Large Language Models (LLMs), particularly those employing Mixture-of-Experts (MoE) architectures, have achieved remarkable capabilities across diverse natural language processing tasks. However, these models frequently suffer from hallucinations -- generating plausible but factually incorrect content --...
When Modalities Remember: Continual Learning for Multimodal Knowledge Graphs
arXiv:2604.02778v1 Announce Type: new Abstract: Real-world multimodal knowledge graphs (MMKGs) are dynamic, with new entities, relations, and multimodal knowledge emerging over time. Existing continual knowledge graph reasoning (CKGR) methods focus on structural triples and cannot fully exploit multimodal signals from...
SocioEval: A Template-Based Framework for Evaluating Socioeconomic Status Bias in Foundation Models
arXiv:2604.02660v1 Announce Type: new Abstract: As Large Language Models (LLMs) increasingly power decision-making systems across critical domains, understanding and mitigating their biases becomes essential for responsible AI deployment. Although bias assessment frameworks have proliferated for attributes such as race and...
NeuReasoner: Towards Explainable, Controllable, and Unified Reasoning via Mixture-of-Neurons
arXiv:2604.02972v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) have recently achieved remarkable success in complex reasoning tasks. However, closer scrutiny reveals persistent failure modes compromising performance and cost: I) Intra-step level, marked by calculation or derivation errors; II) Inter-step...
Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents
arXiv:2604.03173v1 Announce Type: new Abstract: Large language models and deep research agents supply citation URLs to support their claims, yet the reliability of these citations has not been systematically measured. We address six research questions about citation URL validity using...
GRADE: Probing Knowledge Gaps in LLMs through Gradient Subspace Dynamics
arXiv:2604.02830v1 Announce Type: new Abstract: Detecting whether a model's internal knowledge is sufficient to correctly answer a given question is a fundamental challenge in deploying responsible LLMs. In addition to verbalising the confidence by LLM self-report, more recent methods explore...
Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation
arXiv:2604.03174v1 Announce Type: new Abstract: Large language models (LLMs) encode vast world knowledge in their parameters, yet they remain fundamentally limited by static knowledge, finite context windows, and weakly structured causal reasoning. This survey provides a unified account of augmentation...
Convolutional Surrogate for 3D Discrete Fracture-Matrix Tensor Upscaling
arXiv:2604.02335v1 Announce Type: new Abstract: Modeling groundwater flow in three-dimensional fractured crystalline media requires accounting for strong spatial heterogeneity induced by fractures. Fine-scale discrete fracture-matrix (DFM) simulations can capture this complexity but are computationally expensive, especially when repeated evaluations are...
Generating Counterfactual Patient Timelines from Real-World Data
arXiv:2604.02337v1 Announce Type: new Abstract: Counterfactual simulation - exploring hypothetical consequences under alternative clinical scenarios - holds promise for transformative applications such as personalized medicine and in silico trials. However, it remains challenging due to methodological limitations. Here, we show...
Complex-Valued GNNs for Distributed Basis-Invariant Control of Planar Systems
arXiv:2604.02615v1 Announce Type: new Abstract: Graph neural networks (GNNs) are a well-regarded tool for learned control of networked dynamical systems due to their ability to be deployed in a distributed manner. However, current distributed GNN architectures assume that all nodes...
Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models
arXiv:2604.03157v1 Announce Type: new Abstract: The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must integrate with visual comprehension, particularly for Chart Question Answering (CQA) tasks...
FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models
arXiv:2604.02967v1 Announce Type: new Abstract: Recent Large Reasoning Models (LRMs) like DeepSeek-R1 have demonstrated remarkable success in complex reasoning tasks, exhibiting human-like patterns in exploring multiple alternative solutions. Upon closer inspection, however, we uncover a surprising phenomenon: The First is...
Breakdowns in Conversational AI: Interactional Failures in Emotionally and Ethically Sensitive Contexts
arXiv:2604.02713v1 Announce Type: new Abstract: Conversational AI is increasingly deployed in emotionally charged and ethically sensitive interactions. Previous research has primarily concentrated on emotional benchmarks or static safety checks, overlooking how alignment unfolds in evolving conversation. We explore the research...
AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation
arXiv:2604.02525v1 Announce Type: new Abstract: Low-precision training (LPT) commonly employs Hadamard transforms to suppress outliers and mitigate quantization error in large language models (LLMs). However, prior methods apply a fixed transform uniformly, despite substantial variation in outlier structures across tensors....
WGFINNs: Weak formulation-based GENERIC formalism informed neural networks'
arXiv:2604.02601v1 Announce Type: new Abstract: Data-driven discovery of governing equations from noisy observations remains a fundamental challenge in scientific machine learning. While GENERIC formalism informed neural networks (GFINNs) provide a principled framework that enforces the laws of thermodynamics by construction,...
Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens
arXiv:2604.02608v1 Announce Type: new Abstract: Function vectors (FVs) -- mean-difference directions extracted from in-context learning demonstrations -- can steer large language model behavior when added to the residual stream. We hypothesized that FV steering failures reflect an absence of task-relevant...
A Spectral Framework for Multi-Scale Nonlinear Dimensionality Reduction
arXiv:2604.02535v1 Announce Type: new Abstract: Dimensionality reduction (DR) is characterized by two longstanding trade-offs. First, there is a global-local preservation tension: methods such as t-SNE and UMAP prioritize local neighborhood preservation, yet may distort global manifold structure, while methods such...