Querying Structured Data Through Natural Language Using Language Models
arXiv:2604.03057v1 Announce Type: new Abstract: This paper presents an open source methodology for allowing users to query structured non textual datasets through natural language Unlike Retrieval Augmented Generation RAG which struggles with numerical and highly structured information our approach trains...
Mitigating Data Scarcity in Spaceflight Applications for Offline Reinforcement Learning Using Physics-Informed Deep Generative Models
arXiv:2604.02438v1 Announce Type: new Abstract: The deployment of reinforcement learning (RL)-based controllers on physical systems is often limited by poor generalization to real-world scenarios, known as the simulation-to-reality (sim-to-real) gap. This gap is particularly challenging in spaceflight, where real-world training...
Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection
arXiv:2604.02819v1 Announce Type: new Abstract: Large reasoning models achieve strong performance on complex tasks through long chain-of-thought (CoT) trajectories, but directly transferring such reasoning processes to smaller models remains challenging. A key difficulty is that not all teacher-generated reasoning trajectories...
LLM-based Atomic Propositions help weak extractors: Evaluation of a Propositioner for triplet extraction
arXiv:2604.02866v1 Announce Type: new Abstract: Knowledge Graph construction from natural language requires extracting structured triplets from complex, information-dense sentences. In this paper, we investigate if the decomposition of text into atomic propositions (minimal, semantically autonomous units of information) can improve...
Re-analysis of the Human Transcription Factor Atlas Recovers TF-Specific Signatures from Pooled Single-Cell Screens with Missing Controls
arXiv:2604.02511v1 Announce Type: new Abstract: Public pooled single-cell perturbation atlases are valuable resources for studying transcription factor (TF) function, but downstream re-analysis can be limited by incomplete deposited metadata and missing internal controls. Here we re-analyze the human TF Atlas...
Cross-subject Muscle Fatigue Detection via Adversarial and Supervised Contrastive Learning with Inception-Attention Network
arXiv:2604.02670v1 Announce Type: new Abstract: Muscle fatigue detection plays an important role in physical rehabilitation. Previous researches have demonstrated that sEMG offers superior sensitivity in detecting muscle fatigue compared to other biological signals. However, features extracted from sEMG may vary...
Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus
arXiv:2604.02923v1 Announce Type: new Abstract: Large Language Models (LLMs), particularly those employing Mixture-of-Experts (MoE) architectures, have achieved remarkable capabilities across diverse natural language processing tasks. However, these models frequently suffer from hallucinations -- generating plausible but factually incorrect content --...
ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents
arXiv:2604.02834v1 Announce Type: new Abstract: Longitudinal health agents must reason across multi-source trajectories that combine continuous device streams, sparse clinical exams, and episodic life events - yet evaluating them is hard: real-world data cannot be released at scale, and temporally...
Beyond Message Passing: Toward Semantically Aligned Agent Communication
arXiv:2604.02369v1 Announce Type: cross Abstract: Agent communication protocols are becoming critical infrastructure for large language model (LLM) systems that must use tools, coordinate with other agents, and operate across heterogeneous environments. This work presents a human-inspired perspective on this emerging...
Domain-Adapted Retrieval for In-Context Annotation of Pedagogical Dialogue Acts
arXiv:2604.03127v1 Announce Type: new Abstract: Automated annotation of pedagogical dialogue is a high-stakes task where LLMs often fail without sufficient domain grounding. We present a domain-adapted RAG pipeline for tutoring move annotation. Rather than fine-tuning the generative model, we adapt...
Product-Stability: Provable Convergence for Gradient Descent on the Edge of Stability
arXiv:2604.02653v1 Announce Type: new Abstract: Empirically, modern deep learning training often occurs at the Edge of Stability (EoS), where the sharpness of the loss exceeds the threshold below which classical convergence analysis applies. Despite recent progress, existing theoretical explanations of...
From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
arXiv:2604.02355v1 Announce Type: new Abstract: Combining Chain-of-Thought (CoT) with Reinforcement Learning (RL) improves text-to-image (T2I) generation, yet the underlying interaction between CoT's exploration and RL's optimization remains unclear. We present a systematic entropy-based analysis that yields three key insights: (1)...
Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents
arXiv:2604.03173v1 Announce Type: new Abstract: Large language models and deep research agents supply citation URLs to support their claims, yet the reliability of these citations has not been systematically measured. We address six research questions about citation URL validity using...
Do Audio-Visual Large Language Models Really See and Hear?
arXiv:2604.02605v1 Announce Type: new Abstract: Audio-Visual Large Language Models (AVLLMs) are emerging as unified interfaces to multimodal perception. We present the first mechanistic interpretability study of AVLLMs, analyzing how audio and visual features evolve and fuse through different layers of...
Redirected, Not Removed: Task-Dependent Stereotyping Reveals the Limits of LLM Alignments
arXiv:2604.02669v1 Announce Type: new Abstract: How biased is a language model? The answer depends on how you ask. A model that refuses to choose between castes for a leadership role will, in a fill-in-the-blank task, reliably associate upper castes with...
Supreme Court issues statement that Justice Alito was hospitalized approximately two weeks ago
Justice Samuel Alito was hospitalized on March 20 “[o]ut of an abundance of caution” and at the recommendation of his security detail, the Supreme Court’s Public Information Officer, Patricia McCabe, […]The postSupreme Court issues statement that Justice Alito was hospitalized...
What oral argument told us in the birthright citizenship case
Empirical SCOTUS is a recurring series by Adam Feldman that looks at Supreme Court data, primarily in the form of opinions and oral arguments, to provide insights into the justices’ decision making and […]The postWhat oral argument told us in...
Elon Musk insists banks working on SpaceX IPO must buy Grok subscriptions
Some banks "agreed to spend tens of millions on the chatbot," NYT reports.
Anthropic ramps up its political activities with a new PAC
With the midterms right around the corner, the new group is positioned to back candidates who support the AI company's policy agenda.
Supreme Court appears likely to side against Trump on birthright citizenship
Updated on April 1 at 10:10 p.m. On Jan. 20, 2025, President Donald Trump signed an executive order that would end birthright citizenship – the guarantee of U.S. citizenship to […]The postSupreme Court appears likely to side against Trump on...
Cognitive Energy Modeling for Neuroadaptive Human-Machine Systems using EEG and WGAN-GP
arXiv:2604.01653v1 Announce Type: new Abstract: Electroencephalography (EEG) provides a non-invasive insight into the brain's cognitive and emotional dynamics. However, modeling how these states evolve in real time and quantifying the energy required for such transitions remains a major challenge. The...
ASCAT: An Arabic Scientific Corpus and Benchmark for Advanced Translation Evaluation
arXiv:2604.00015v1 Announce Type: new Abstract: We present ASCAT (Arabic Scientific Corpus for Advanced Translation), a high-quality English-Arabic parallel benchmark corpus designed for scientific translation evaluation constructed through a systematic multi-engine translation and human validation pipeline. Unlike existing Arabic-English corpora that...
Bridging Deep Learning and Integer Linear Programming: A Predictive-to-Prescriptive Framework for Supply Chain Analytics
arXiv:2604.01775v1 Announce Type: new Abstract: Although demand forecasting is a critical component of supply chain planning, actual retail data can exhibit irreconcilable seasonality, irregular spikes, and noise, rendering precise projections nearly unattainable. This paper proposes a three-step analytical framework that...
Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial
arXiv:2604.01328v1 Announce Type: new Abstract: Traditional scientific discovery relies on an iterative hypothesise-experiment-refine cycle that has driven progress for centuries, but its intuitive, ad-hoc implementation often wastes resources, yields inefficient designs, and misses critical insights. This tutorial presents Bayesian Optimisation...
A Taxonomy of Programming Languages for Code Generation
arXiv:2604.00239v1 Announce Type: new Abstract: The world's 7,000+ languages vary widely in the availability of resources for NLP, motivating efforts to systematically categorize them by their degree of resourcefulness (Joshi et al., 2020). A similar disparity exists among programming languages...
Advisory Opinions broadcast: President Donald Trump and birthright citizenship
Oral arguments in Trump v. Barbara, on the constitutionality of President Donald Trump’s executive order on birthright citizenship, have concluded, but the conversation isn’t over. Listen now to a special […]The postAdvisory Opinions broadcast: President Donald Trump and birthright citizenshipappeared...
Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation
arXiv:2604.00477v1 Announce Type: new Abstract: LLM-based agent judges are an emerging approach to evaluating conversational AI, yet a fundamental uncertainty remains: can we trust their assessments, and if so, how many are needed? Through 960 sessions with two model pairs...
Test-Time Scaling Makes Overtraining Compute-Optimal
arXiv:2604.01411v1 Announce Type: new Abstract: Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address....
A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation
arXiv:2604.00249v1 Announce Type: new Abstract: Single-agent large language model (LLM) systems struggle to simultaneously support diverse conversational functions and maintain safety in behavioral health communication. We propose a safety-aware, role-orchestrated multi-agent LLM framework designed to simulate supportive behavioral health dialogue...
Preference Guided Iterated Pareto Referent Optimisation for Accessible Route Planning
arXiv:2604.00795v1 Announce Type: new Abstract: We propose the Preference Guided Iterated Pareto Referent Optimisation (PG-IPRO) for urban route planning for people with different accessibility requirements and preferences. With this algorithm the user can interact with the system by giving feedback...