More Human, More Efficient: Aligning Annotations with Quantized SLMs
arXiv:2604.00586v1 Announce Type: new Abstract: As Large Language Model (LLM) capabilities advance, the demand for high-quality annotation of exponentially increasing text corpora has outpaced human capacity, leading to the widespread adoption of LLMs in automatic evaluation and annotation. However, proprietary...
JetPrism: diagnosing convergence for generative simulation and inverse problems in nuclear physics
arXiv:2604.01313v1 Announce Type: new Abstract: High-fidelity Monte Carlo simulations and complex inverse problems, such as mapping smeared experimental observations to ground-truth states, are computationally intensive yet essential for robust data analysis. Conditional Flow Matching (CFM) offers a mathematically robust approach...
SECURE: Stable Early Collision Understanding via Robust Embeddings in Autonomous Driving
arXiv:2604.01337v1 Announce Type: new Abstract: While deep learning has significantly advanced accident anticipation, the robustness of these safety-critical systems against real-world perturbations remains a major challenge. We reveal that state-of-the-art models like CRASH, despite their high performance, exhibit significant instability...
Towards Intrinsically Calibrated Uncertainty Quantification in Industrial Data-Driven Models via Diffusion Sampler
arXiv:2604.01870v1 Announce Type: new Abstract: In modern process industries, data-driven models are important tools for real-time monitoring when key performance indicators are difficult to measure directly. While accurate predictions are essential, reliable uncertainty quantification (UQ) is equally critical for safety,...
Agentic AI -- Physicist Collaboration in Experimental Particle Physics: A Proof-of-Concept Measurement with LEP Open Data
arXiv:2603.05735v2 Announce Type: cross Abstract: We present an AI agentic measurement of the thrust distribution in $e^{+}e^{-}$ collisions at $\sqrt{s}=91.2$~GeV using archived ALEPH data. The analysis and all note writing is carried out entirely by AI agents (OpenAI Codex and...
An Online Machine Learning Multi-resolution Optimization Framework for Energy System Design Limit of Performance Analysis
arXiv:2604.01308v1 Announce Type: new Abstract: Designing reliable integrated energy systems for industrial processes requires optimization and verification models across multiple fidelities, from architecture-level sizing to high-fidelity dynamic operation. However, model mismatch across fidelities obscures the sources of performance loss and...
Can Large Language Models Self-Correct in Medical Question Answering? An Exploratory Study
arXiv:2604.00261v2 Announce Type: new Abstract: Large language models (LLMs) have achieved strong performance on medical question answering (medical QA), and chain-of-thought (CoT) prompting has further improved results by eliciting explicit intermediate reasoning; meanwhile, self-reflective (self-corrective) prompting has been widely claimed...
MSA-Thinker: Discrimination-Calibration Reasoning with Hint-Guided Reinforcement Learning for Multimodal Sentiment Analysis
arXiv:2604.00013v1 Announce Type: cross Abstract: Multimodal sentiment analysis aims to understand human emotions by integrating textual, auditory, and visual modalities. Although Multimodal Large Language Models (MLLMs) have achieved state-of-the-art performance via supervised fine-tuning (SFT), their end-to-end "black-box" nature limits interpretability....
Cognitive Energy Modeling for Neuroadaptive Human-Machine Systems using EEG and WGAN-GP
arXiv:2604.01653v1 Announce Type: new Abstract: Electroencephalography (EEG) provides a non-invasive insight into the brain's cognitive and emotional dynamics. However, modeling how these states evolve in real time and quantifying the energy required for such transitions remains a major challenge. The...
Agent psychometrics: Task-level performance prediction in agentic coding benchmarks
arXiv:2604.00594v1 Announce Type: new Abstract: As the focus in LLM-based coding shifts from static single-step code generation to multi-step agentic interaction with tools and environments, understanding which tasks will challenge agents and why becomes increasingly difficult. This is compounded by...
When Reward Hacking Rebounds: Understanding and Mitigating It with Representation-Level Signals
arXiv:2604.01476v1 Announce Type: new Abstract: Reinforcement learning for LLMs is vulnerable to reward hacking, where models exploit shortcuts to maximize reward without solving the intended task. We systematically study this phenomenon in coding tasks using an environment-manipulation setting, where models...
Therefore I am. I Think
arXiv:2604.01202v2 Announce Type: new Abstract: We consider the question: when a large language reasoning model makes a choice, did it think first and then decide to, or decide first and then think? In this paper, we present evidence that detectable,...
Detecting Multi-Agent Collusion Through Multi-Agent Interpretability
arXiv:2604.01151v1 Announce Type: new Abstract: As LLM agents are increasingly deployed in multi-agent systems, they introduce risks of covert coordination that may evade standard forms of human oversight. While linear probes on model activations have shown promise for detecting deception...
Beyond Symbolic Solving: Multi Chain-of-Thought Voting for Geometric Reasoning in Large Language Models
arXiv:2604.00890v1 Announce Type: new Abstract: Geometric Problem Solving (GPS) remains at the heart of enhancing mathematical reasoning in large language models because it requires the combination of diagrammatic understanding, symbolic manipulation and logical inference. In existing literature, researchers have chiefly...
Care-Conditioned Neuromodulation for Autonomy-Preserving Supportive Dialogue Agents
arXiv:2604.01576v1 Announce Type: new Abstract: Large language models deployed in supportive or advisory roles must balance helpfulness with preservation of user autonomy, yet standard alignment methods primarily optimize for helpfulness and harmlessness without explicitly modeling relational risks such as dependency...
Musk loves Grok’s “roasts.” Swiss official sues in attempt to neuter them.
Swiss finance minister filed a criminal complaint over Grok's "defamation."
CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning
arXiv:2604.01634v1 Announce Type: new Abstract: Real-world reasoning often requires combining information across modalities, connecting textual context with visual cues in a multi-hop process. Yet, most multimodal benchmarks fail to capture this ability: they typically rely on single images or set...
Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory
arXiv:2604.01007v2 Announce Type: new Abstract: AI agents increasingly operate over extended time horizons, yet their ability to retain, organize, and recall multimodal experiences remains a critical bottleneck. Building effective lifelong memory requires navigating a vast design space spanning architecture, retrieval...
Does Unification Come at a Cost? Uni-SafeBench: A Safety Benchmark for Unified Multimodal Large Models
arXiv:2604.00547v1 Announce Type: new Abstract: Unified Multimodal Large Models (UMLMs) integrate understanding and generation capabilities within a single architecture. While this architectural unification, driven by the deep fusion of multimodal features, enhances model performance, it also introduces important yet underexplored...
Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation
arXiv:2604.00477v1 Announce Type: new Abstract: LLM-based agent judges are an emerging approach to evaluating conversational AI, yet a fundamental uncertainty remains: can we trust their assessments, and if so, how many are needed? Through 960 sessions with two model pairs...
Artificial Intelligence and International Law: Legal Implications of AI Development and Global Regulation
This paper examines the legal implications of artificial intelligence (AI) development within the framework of public international law. Employing a doctrinal and comparative legal methodology, it surveys the principal international and regional regulatory instruments currently governing AI — including the...
About the Association for the Advancement of Artificial Intelligence (AAAI)
AAAI is an artificial intelligence organization dedicated to advancing the scientific understanding of AI.
AV1’s open, royalty-free promise in question as Dolby sues Snapchat over codec
Big Tech declaring AV1 royalty-free “doesn't mean that it is."
Elon Musk loses big in court; X boycott perfectly legal
X admonished for "fishing expedition" as judge dismisses ad boycott lawsuit.
OpenAI shuts down Sora while Meta gets shut out in court
When an 82-year-old Kentucky woman was offered $26 million from an AI company that wanted to build a data center on her land, she said no. Sure, that same company can try to rezone 2,000 acres nearby anyway, but as...
The Compression Paradox in LLM Inference: Provider-Dependent Energy Effects of Prompt Compression
arXiv:2603.23528v1 Announce Type: new Abstract: The rapid proliferation of Large Language Models has created an environmental paradox: the very technology that could help solve climate challenges is itself becoming a significant contributor to global carbon emissions. We test whether prompt...
Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages
arXiv:2603.23521v1 Announce Type: new Abstract: Multimodal research has predominantly focused on single-image reasoning, with limited exploration of multi-image scenarios. Recent models have sought to enhance multi-image understanding through large-scale pretraining on interleaved image-text datasets. However, most Vision-Language Models (VLMs) are...
Do 3D Large Language Models Really Understand 3D Spatial Relationships?
arXiv:2603.23523v1 Announce Type: new Abstract: Recent 3D Large-Language Models (3D-LLMs) claim to understand 3D worlds, especially spatial relationships among objects. Yet, we find that simply fine-tuning a language model on text-only question-answer pairs can perform comparably or even surpass these...
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
arXiv:2603.23516v1 Announce Type: new Abstract: Long-term memory is a cornerstone of human intelligence. Enabling AI to process lifetime-scale information remains a long-standing pursuit in the field. Due to the constraints of full-attention architectures, the effective context length of large language...
Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
arXiv:2603.23507v1 Announce Type: new Abstract: While Masked Diffusion Language Models (MDLMs) relying on token masking and unmasking have shown promise in language modeling, their computational efficiency and generation flexibility remain constrained by the masking paradigm. In this paper, we propose...