Human Attribution of Causality to AI Across Agency, Misuse, and Misalignment
arXiv:2603.13236v1 Announce Type: new Abstract: AI-related incidents are becoming increasingly frequent and severe, ranging from safety failures to misuse by malicious actors. In such complex situations, identifying which elements caused an adverse outcome, the problem of cause selection, is a...
QuarkMedBench: A Real-World Scenario Driven Benchmark for Evaluating Large Language Models
arXiv:2603.13691v1 Announce Type: new Abstract: While Large Language Models (LLMs) excel on standardized medical exams, high scores often fail to translate to high-quality responses for real-world medical queries. Current evaluations rely heavily on multiple-choice questions, failing to capture the unstructured,...
FLUX: Data Worth Training On
arXiv:2603.13972v1 Announce Type: new Abstract: Modern large language model training is no longer limited by data availability, but by the inability of existing preprocessing pipelines to simultaneously achieve massive scale and high data quality. Current approaches are forced to sacrifice...
CMHL: Contrastive Multi-Head Learning for Emotionally Consistent Text Classification
arXiv:2603.14078v1 Announce Type: new Abstract: Textual Emotion Classification (TEC) is one of the most difficult NLP tasks. State-of-the-art approaches rely on large language models (LLMs) and multi-model ensembles. In this study, we challenge the assumption that larger...
Selective Fine-Tuning of GPT Architectures for Parameter-Efficient Clinical Text Classification
arXiv:2603.14183v1 Announce Type: new Abstract: The rapid expansion of electronic health record (EHR) systems has generated large volumes of unstructured clinical narratives that contain valuable information for disease identification, patient cohort discovery, and clinical decision support. Extracting structured knowledge from...
Rethinking Evaluation in Retrieval-Augmented Personalized Dialogue: A Cognitive and Linguistic Perspective
arXiv:2603.14217v1 Announce Type: new Abstract: In cognitive science and linguistic theory, dialogue is not seen as a chain of independent utterances but rather as a joint activity sustained by coherence, consistency, and shared understanding. However, many systems for open-domain and...
Automatic Inter-document Multi-hop Scientific QA Generation
arXiv:2603.14257v1 Announce Type: new Abstract: Existing automatic scientific question generation studies mainly focus on single-document factoid QA, overlooking the inter-document reasoning crucial for scientific understanding. We present AIM-SciQA, an automated framework for generating multi-document, multi-hop scientific QA datasets. AIM-SciQA extracts...
MedPriv-Bench: Benchmarking the Privacy-Utility Trade-off of Large Language Models in Medical Open-End Question Answering
arXiv:2603.14265v1 Announce Type: new Abstract: Recent advances in Retrieval-Augmented Generation (RAG) have enabled large language models (LLMs) to ground outputs in clinical evidence. However, connecting LLMs with external databases introduces the risk of contextual leakage: a subtle privacy threat where...
Motivation in Large Language Models
arXiv:2603.14347v1 Announce Type: new Abstract: Motivation is a central driver of human behavior, shaping decisions, goals, and task performance. As large language models (LLMs) become increasingly aligned with human preferences, we ask whether they exhibit something akin to motivation. We...
PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark
arXiv:2603.14456v1 Announce Type: new Abstract: Persian poses unique audio understanding challenges through its classical poetry, traditional music, and pervasive code-switching - none captured by existing benchmarks. We introduce PARSA-Bench (Persian Audio Reasoning and Speech Assessment Benchmark), the first benchmark for...
Knowledge, Rules and Their Embeddings: Two Paths towards Neuro-Symbolic JEPA
arXiv:2603.13265v1 Announce Type: new Abstract: Modern self-supervised predictive architectures excel at capturing complex statistical correlations from high-dimensional data but lack mechanisms to internalize verifiable human logic, leaving them susceptible to spurious correlations and shortcut learning. Conversely, traditional rule-based inference systems...
PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation
arXiv:2603.13275v1 Announce Type: new Abstract: Accurate prediction of surgical duration is pivotal for hospital resource management. Although recent supervised learning approaches, from machine learning (ML) to fine-tuned large language models (LLMs), have shown strong performance, they remain constrained by the need for...
ICPRL: Acquiring Physical Intuition from Interactive Control
arXiv:2603.13295v1 Announce Type: new Abstract: VLMs excel at static perception but falter in interactive reasoning in dynamic physical environments, which demands planning and adaptation to dynamic outcomes. Existing physical reasoning methods often depend on abstract symbolic inputs or lack the...
Enhanced Atrial Fibrillation Prediction in ESUS Patients with Hypergraph-based Pre-training
arXiv:2603.13297v1 Announce Type: new Abstract: Atrial fibrillation (AF) is a major complication following embolic stroke of undetermined source (ESUS), elevating the risk of recurrent stroke and mortality. Early identification is clinically important, yet existing tools face limitations in accuracy, scalability,...
Evidence-based Distributional Alignment for Large Language Models
arXiv:2603.13305v1 Announce Type: new Abstract: Distributional alignment enables large language models (LLMs) to predict how a target population distributes its responses across answer options, rather than collapsing disagreement into a single consensus answer. However, existing LLM-based distribution prediction is often...
Preventing Curriculum Collapse in Self-Evolving Reasoning Systems
arXiv:2603.13309v1 Announce Type: new Abstract: Self-evolving reasoning frameworks let LLMs improve their reasoning capabilities by iteratively generating and solving problems without external supervision, using verifiable rewards. Ideally, such systems are expected to explore a diverse problem space and propose new...
Feature-level Interaction Explanations in Multimodal Transformers
arXiv:2603.13326v1 Announce Type: new Abstract: Multimodal Transformers often produce predictions without clarifying how different modalities jointly support a decision. Most existing multimodal explainable AI (MXAI) methods extend unimodal saliency to multimodal backbones, highlighting important tokens or patches within each modality,...
RBF-Solver: A Multistep Sampler for Diffusion Probabilistic Models via Radial Basis Functions
arXiv:2603.13330v1 Announce Type: new Abstract: Diffusion probabilistic models (DPMs) are widely adopted for their outstanding generative fidelity, yet their sampling is computationally demanding. Polynomial-based multistep samplers mitigate this cost by accelerating inference; however, despite their theoretical accuracy guarantees, they generate...
Lipschitz-Based Robustness Certification Under Floating-Point Execution
arXiv:2603.13334v1 Announce Type: new Abstract: Sensitivity-based robustness certification has emerged as a practical approach for certifying neural network robustness, including in settings that require verifiable guarantees. A key advantage of these methods is that certification is performed by concrete numerical...
AI-Driven Predictive Maintenance with Real-Time Contextual Data Fusion for Connected Vehicles: A Multi-Dataset Evaluation
arXiv:2603.13343v1 Announce Type: new Abstract: Most vehicle predictive maintenance systems rely exclusively on internal diagnostic signals and are validated on deterministic synthetic data, limiting the credibility of reported metrics. This paper presents a simulation-validated proof-of-concept framework for V2X-augmented predictive maintenance,...
Thermal Robustness of Retrieval in Dense Associative Memories: LSE vs LSR Kernels
arXiv:2603.13350v1 Announce Type: new Abstract: Understanding whether retrieval in dense associative memories survives thermal noise is essential for bridging zero-temperature capacity proofs with the finite-temperature conditions of practical inference and biological computation. We use Monte Carlo simulations to map the...
Justices will hear argument on Trump administration’s removal of protected status for Syrian and Haitian nationals
The Supreme Court announced on Monday afternoon that it will hear oral argument on whether the Trump administration can end a program that allows several thousand Syrians and approximately 350,000 […]
Haitian nationals ask court to deny Trump administration’s request to remove their protected status
A group of Haitian nationals urged the Supreme Court on Monday to leave in place a ruling by a federal judge in Washington, D.C., that allows them to stay in […]
Elon Musk's xAI sued for turning three girls' real photos into AI CSAM
Discord user led cops to Grok-generated CSAM of real girls, lawsuit says.
Trump and his FCC chair demand more positive news coverage of Iran war
Carr makes evidence-free claim of "hoaxes and news distortions." Trump is thrilled.
No accountability: Bills would ban liability lawsuits for climate change
This is the latest front in the battle over climate lawsuits.
Elon Musk’s xAI faces child porn lawsuit from minors Grok allegedly undressed
The three plaintiffs are seeking to represent anyone who had real images of them as a minor altered into sexual content by Grok.
The dictionary sues OpenAI
Encyclopedia Britannica and Merriam-Webster say that OpenAI violated the copyright of almost 100,000 articles by using them for LLM training.
A Geometrically-Grounded Drive for MDL-Based Optimization in Deep Learning
arXiv:2603.12304v1 Announce Type: cross Abstract: This paper introduces a novel optimization framework that fundamentally integrates the Minimum Description Length (MDL) principle into the training dynamics of deep neural networks. Moving beyond its conventional role as a model selection criterion, we...
Generating Expressive and Customizable Evals for Timeseries Data Analysis Agents with AgentFuel
arXiv:2603.12483v1 Announce Type: new Abstract: Across many domains (e.g., IoT, observability, telecommunications, cybersecurity), there is an emerging adoption of conversational data analysis agents that enable users to "talk to your data" to extract insights. Such data analysis agents operate on...