How Do Language Models Process Ethical Instructions? Deliberation, Consistency, and Other-Recognition Across Four Models
arXiv:2604.00021v1 Announce Type: cross Abstract: Alignment safety research assumes that ethical instructions improve model behavior, but how language models internally process such instructions remains unknown. We conducted over 600 multi-agent simulations across four models (Llama 3.3 70B, GPT-4o mini, Qwen3-Next-80B-A3B,...
Dynin-Omni: Omnimodal Unified Large Diffusion Language Model
arXiv:2604.00007v1 Announce Type: cross Abstract: We present Dynin-Omni, the first masked-diffusion-based omnimodal foundation model that unifies text, image, and speech understanding and generation, together with video understanding, within a single architecture. Unlike autoregressive unified models that serialize heterogeneous modalities, or...
Training In-Context and In-Weights Mixtures Via Contrastive Context Sampling
arXiv:2604.01601v1 Announce Type: new Abstract: We investigate training strategies that co-develop in-context learning (ICL) and in-weights learning (IWL), and the ability to switch between them based on context relevance. Although current LLMs exhibit both modes, standard task-specific fine-tuning often erodes...
"Who Am I, and Who Else Is Here?" Behavioral Differentiation Without Role Assignment in Multi-Agent LLM Systems
arXiv:2604.00026v1 Announce Type: new Abstract: When multiple large language models interact in a shared conversation, do they develop differentiated social roles or converge toward uniform behavior? We present a controlled experimental platform that orchestrates simultaneous multi-agent discussions among 7 heterogeneous...
Cognichip wants AI to design the chips that power AI, and just raised $60M to try
The firm says it can reduce the cost of chip development by more than 75% and cut the timeline by more than half.
Variational LSTM with Augmented Inputs: Nonlinear Response History Metamodeling with Aleatoric and Epistemic Uncertainty
arXiv:2604.01587v1 Announce Type: new Abstract: Uncertainty propagation in high-dimensional nonlinear dynamic structural systems is pivotal in state-of-the-art performance-based design and risk assessment, where uncertainties from both excitations and structures, i.e., the aleatoric uncertainty, must be considered. This poses a significant...
The Chronicles of RiDiC: Generating Datasets with Controlled Popularity Distribution for Long-form Factuality Evaluation
arXiv:2604.00019v1 Announce Type: cross Abstract: We present a configurable pipeline for generating multilingual sets of entities with specified characteristics, such as domain, geographical location and popularity, using data from Wikipedia and Wikidata. These datasets are intended for evaluating the factuality...
Robust Graph Representation Learning via Adaptive Spectral Contrast
arXiv:2604.01878v1 Announce Type: new Abstract: Spectral graph contrastive learning has emerged as a unified paradigm for handling both homophilic and heterophilic graphs by leveraging high-frequency components. However, we identify a fundamental spectral dilemma: while high-frequency signals are indispensable for encoding...
Justices seem dubious of government’s argument in criminal venue case
The Supreme Court on Monday considered whether federal prosecutors can try a defendant not only in the district where the offense occurs, but also where the crime’s “contemplated effects” are […]The postJustices seem dubious of government’s argument in criminal venue...
15% of Americans say they’d be willing to work for an AI boss, according to new poll
According to a Quinnipiac University poll, 15% of Americans say they'd be willing to have a job where their direct supervisor was an AI program that assigned tasks and set schedules.
SECURE: Stable Early Collision Understanding via Robust Embeddings in Autonomous Driving
arXiv:2604.01337v1 Announce Type: new Abstract: While deep learning has significantly advanced accident anticipation, the robustness of these safety-critical systems against real-world perturbations remains a major challenge. We reveal that state-of-the-art models like CRASH, despite their high performance, exhibit significant instability...
Detecting Multi-Agent Collusion Through Multi-Agent Interpretability
arXiv:2604.01151v1 Announce Type: new Abstract: As LLM agents are increasingly deployed in multi-agent systems, they introduce risks of covert coordination that may evade standard forms of human oversight. While linear probes on model activations have shown promise for detecting deception...
Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation
arXiv:2604.00131v1 Announce Type: new Abstract: Human memory adapts through selective forgetting: experiences become less accessible over time but can be reactivated by reinforcement or contextual cues. In contrast, memory-augmented LLM agents rely on "always-on" retrieval and "flat" memory storage, causing...
One Panel Does Not Fit All: Case-Adaptive Multi-Agent Deliberation for Clinical Prediction
arXiv:2604.00085v1 Announce Type: new Abstract: Large language models applied to clinical prediction exhibit case-level heterogeneity: simple cases yield consistent outputs, while complex cases produce divergent predictions under minor prompt changes. Existing single-agent strategies sample from one role-conditioned distribution, and multi-agent...
Large Language Models in the Abuse Detection Pipeline
arXiv:2604.00323v1 Announce Type: new Abstract: Online abuse has grown increasingly complex, spanning toxic language, harassment, manipulation, and fraudulent behavior. Traditional machine-learning approaches dependent on static classifiers and labor-intensive labeling struggle to keep pace with evolving threat patterns and nuanced policy...
Find Your Next Job
Association for the Advancement of Artificial Intelligence (AAAI) - Find your next career at AAAI Career Center. Check back frequently as new jobs are posted every day.
Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models
arXiv:2604.00375v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality....
As more Americans adopt AI tools, fewer say they can trust the results
AI adoption is rising in the U.S., but trust remains low, with most Americans concerned about transparency, regulation, and the technology’s broader societal impact, according to a new Quinnipiac poll.
Decision-Centric Design for LLM Systems
arXiv:2604.00414v1 Announce Type: new Abstract: LLM systems must make control decisions in addition to generating outputs: whether to answer, clarify, retrieve, call tools, repair, or escalate. In many current architectures, these decisions remain implicit within generation, entangling assessment and action...
Self-Routing: Parameter-Free Expert Routing from Hidden States
arXiv:2604.00421v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) layers increase model capacity by activating only a small subset of experts per token, and typically rely on a learned router to map hidden states to expert assignments. In this work, we ask...
CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe
arXiv:2604.01489v1 Announce Type: new Abstract: High-performance GPU kernels are critical to modern machine learning systems, yet developing efficient implementations remains a challenging, expert-driven process due to the tight coupling between algorithmic structure, memory hierarchy usage, and hardware-specific optimizations. Recent work...
Human-in-the-Loop Control of Objective Drift in LLM-Assisted Computer Science Education
arXiv:2604.00281v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly embedded in computer science education through AI-assisted programming tools, yet such workflows often exhibit objective drift, in which locally plausible outputs diverge from stated task specifications. Existing instructional responses...
Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce
arXiv:2604.00022v1 Announce Type: cross Abstract: Multi-dimensional rubric-based dialogue evaluation is widely used to assess conversational AI, yet its criterion validity -- whether quality scores are associated with the downstream outcomes they are meant to serve -- remains largely untested. We...
JetPrism: diagnosing convergence for generative simulation and inverse problems in nuclear physics
arXiv:2604.01313v1 Announce Type: new Abstract: High-fidelity Monte Carlo simulations and complex inverse problems, such as mapping smeared experimental observations to ground-truth states, are computationally intensive yet essential for robust data analysis. Conditional Flow Matching (CFM) offers a mathematically robust approach...
Trump convenes "God Squad" to override Endangered Species Act, up oil production
Administration wants to exempt all federally regulated offshore oil from protections.
DySCo: Dynamic Semantic Compression for Effective Long-term Time Series Forecasting
arXiv:2604.01261v1 Announce Type: new Abstract: Time series forecasting (TSF) is critical across domains such as finance, meteorology, and energy. While extending the lookback window theoretically provides richer historical context, in practice, it often introduces irrelevant noise and computational redundancy, preventing...
Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
arXiv:2604.00536v1 Announce Type: new Abstract: Large language models (LLMs) achieve strong downstream performance largely due to abundant supervised fine-tuning (SFT) data. However, high-quality SFT data in knowledge-intensive domains such as humanities, social sciences, medicine, law, and finance is scarce because...
A Reliability Evaluation of Hybrid Deterministic-LLM Based Approaches for Academic Course Registration PDF Information Extraction
arXiv:2604.00003v1 Announce Type: cross Abstract: This study evaluates the reliability of information extraction approaches from KRS documents using three strategies: LLM only, Hybrid Deterministic - LLM (regex + LLM), and a Camelot based pipeline with LLM fallback. Experiments were conducted...
LLM Essay Scoring Under Holistic and Analytic Rubrics: Prompt Effects and Bias
arXiv:2604.00259v1 Announce Type: new Abstract: Despite growing interest in using Large Language Models (LLMs) for educational assessment, it remains unclear how closely they align with human scoring. We present a systematic evaluation of instruction-tuned LLMs across three open essay-scoring datasets...