Transductive Generalization via Optimal Transport and Its Application to Graph Node Classification
arXiv:2603.09257v1 Announce Type: new Abstract: Many existing transductive bounds rely on classical complexity measures that are computationally intractable and often misaligned with empirical behavior. In this work, we establish new representation-based generalization bounds in a distribution-free transductive setting, where learned...
The how and why of gun control
A Second Opinion is a recurring series by Haley Proctor on the Second Amendment and constitutional litigation. Last Monday, the Supreme Court heard argument in United States v. Hemani. In […] The post The how and why of gun control appeared first on SCOTUSblog.
Birthright citizenship: legal takeaways of mice and men and elephants and dogs
Brothers in Law is a recurring series by brothers Akhil and Vikram Amar, with special emphasis on measuring what the Supreme Court says against what the Constitution itself says. For more content from […] The post Birthright citizenship: legal takeaways of mice...
AI Now Co-ED Amba Kak Gives Remarks Before the UN General Assembly on AI Governance - AI Now Institute
Sandbar secures $23M Series A for its AI note-taking ring
Sandbar aims to ship the Stream this summer. The ring can be used for note-taking, chatting with an AI assistant, and media playback.
Language Shapes Mental Health Evaluations in Large Language Models
arXiv:2603.06910v1 Announce Type: new Abstract: This study investigates whether large language models (LLMs) exhibit cross-linguistic differences in mental health evaluations. Focusing on Chinese and English, we examine two widely used models, GPT-4o and Qwen3, to assess whether prompt language systematically...
Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation
arXiv:2603.06593v1 Announce Type: new Abstract: Retrieval-augmented code generation often conditions the decoder on large retrieved code snippets. This ties online inference cost to repository size and introduces noise from long contexts. We present Hierarchical Embedding Fusion (HEF), a two-stage approach...
Dual-Metric Evaluation of Social Bias in Large Language Models: Evidence from an Underrepresented Nepali Cultural Context
arXiv:2603.07792v1 Announce Type: new Abstract: Large language models (LLMs) increasingly influence global digital ecosystems, yet their potential to perpetuate social and cultural biases remains poorly understood in underrepresented contexts. This study presents a systematic analysis of representational biases in seven...
Know When You're Wrong: Aligning Confidence with Correctness for LLM Error Detection
arXiv:2603.06604v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly deployed in critical decision-making systems, the lack of reliable methods to measure their uncertainty presents a fundamental trustworthiness risk. We introduce a normalized confidence score based on output...
Consensus is Not Verification: Why Crowd Wisdom Strategies Fail for LLM Truthfulness
arXiv:2603.06612v1 Announce Type: new Abstract: Pass@k and other methods of scaling inference compute can improve language model performance in domains with external verifiers, including mathematics and code, where incorrect candidates can be filtered reliably. This raises a natural question: can...
Pavement Missing Condition Data Imputation through Collective Learning-Based Graph Neural Networks
arXiv:2603.06625v1 Announce Type: new Abstract: Pavement condition data is important in providing information regarding the current state of the road network and in determining the needs of maintenance and rehabilitation treatments. However, the condition data is often incomplete due to...
ERP-RiskBench: Leakage-Safe Ensemble Learning for Financial Risk
arXiv:2603.06671v1 Announce Type: new Abstract: Financial risk detection in Enterprise Resource Planning (ERP) systems is an important but underexplored application of machine learning. Published studies in this area tend to suffer from vague dataset descriptions, leakage-prone pipelines, and evaluation practices...
ProtAlign: Contrastive learning paradigm for Sequence and structure alignment
arXiv:2603.06722v1 Announce Type: new Abstract: Protein language models often take into consideration the alignment between a protein sequence and its textual description. However, they do not take structural information into consideration. Traditional methods treat sequence and structure separately, limiting the...
In birthright citizenship case, Justice Department urges court to treat an old concept in a new way
Immigration Matters is a recurring series by César Cuauhtémoc García Hernández that analyzes the court’s immigration docket, highlighting emerging legal questions about new policy and enforcement practices. President Donald Trump’s […] The post In birthright citizenship case, Justice Department urges court to...
SCOTUStoday for Monday, March 9
Just 22% of U.S. registered voters have “a great deal” (7%) or “quite a bit” (15%) of confidence in the Supreme Court, according to a new NBC News poll shared […] The post SCOTUStoday for Monday, March 9 appeared first on SCOTUSblog.
Anthropic sues Defense Department over supply-chain risk designation
Anthropic filed suit against the Department of Defense on Monday after the agency labeled it a supply-chain risk. The complaint calls the DOD's actions "unprecedented and unlawful."
Conversational Demand Response: Bidirectional Aggregator-Prosumer Coordination through Agentic AI
arXiv:2603.06217v1 Announce Type: new Abstract: Residential demand response depends on sustained prosumer participation, yet existing coordination is either fully automated, or limited to one-way dispatch signals and price alerts that offer little possibility for informed decision-making. This paper introduces Conversational...
Traversal-as-Policy: Log-Distilled Gated Behavior Trees as Externalized, Verifiable Policies for Safe, Robust, and Efficient Agents
arXiv:2603.05517v1 Announce Type: cross Abstract: Autonomous LLM agents fail because long-horizon policy remains implicit in model weights and transcripts, while safety is retrofitted post hoc. We propose Traversal-as-Policy: distill sandboxed OpenHands execution logs into a single executable Gated Behavior Tree...
Agentic LLM Planning via Step-Wise PDDL Simulation: An Empirical Characterisation
arXiv:2603.06064v1 Announce Type: new Abstract: Task planning, the problem of sequencing actions to reach a goal from an initial state, is a core capability requirement for autonomous robotic systems. Whether large language models (LLMs) can serve as viable planners alongside...
EigenData: A Self-Evolving Multi-Agent Platform for Function-Calling Data Synthesis, Auditing, and Repair
arXiv:2603.05553v1 Announce Type: cross Abstract: Function-calling agents -- large language models that invoke tools and APIs -- require high-quality, domain-specific training data spanning executable environments, backing databases, and diverse multi-turn trajectories. We introduce EigenData, an integrated, self-evolving platform that automates...
SecureRAG-RTL: A Retrieval-Augmented, Multi-Agent, Zero-Shot LLM-Driven Framework for Hardware Vulnerability Detection
arXiv:2603.05689v1 Announce Type: cross Abstract: Large language models (LLMs) have shown remarkable capabilities in natural language processing tasks, yet their application in hardware security verification remains limited due to scarcity of publicly available hardware description language (HDL) datasets. This knowledge...
Longitudinal Lesion Inpainting in Brain MRI via 3D Region Aware Diffusion
arXiv:2603.05693v1 Announce Type: cross Abstract: Accurate longitudinal analysis of brain MRI is often hindered by evolving lesions, which bias automated neuroimaging pipelines. While deep generative models have shown promise in inpainting these lesions, most existing methods operate cross-sectionally or lack...
FreeTxt-Vi: A Benchmarked Vietnamese-English Toolkit for Segmentation, Sentiment, and Summarisation
arXiv:2603.05690v1 Announce Type: new Abstract: FreeTxt-Vi is a free and open-source web-based toolkit for creating and analysing bilingual Vietnamese-English text collections. Positioned at the intersection of corpus linguistics and natural language processing (NLP), it enables users to...
Building an Ensemble LLM Semantic Tagger for UN Security Council Resolutions
arXiv:2603.05895v1 Announce Type: new Abstract: This paper introduces a new methodology for using LLM-based systems for accurate and efficient semantic tagging of UN Security Council resolutions. The main goal is to leverage LLM performance variability to build ensemble systems for...
ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning
arXiv:2603.06024v1 Announce Type: new Abstract: Multi-view spatial reasoning remains difficult for current vision-language models. Even when multiple viewpoints are available, models often underutilize cross-view relations and instead rely on single-image shortcuts, leading to fragile performance on viewpoint transformation and occlusion-sensitive...
CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation
arXiv:2603.06183v1 Announce Type: new Abstract: We introduce CRIMSON, a clinically grounded evaluation framework for chest X-ray report generation that assesses reports based on diagnostic correctness, contextual relevance, and patient safety. Unlike prior metrics, CRIMSON incorporates full clinical context, including patient...
Identifying Adversary Characteristics from an Observed Attack
arXiv:2603.05625v1 Announce Type: new Abstract: When used in automated decision-making systems, machine learning (ML) models are vulnerable to data-manipulation attacks. Some defense mechanisms (e.g., adversarial regularization) directly affect the ML models while others (e.g., anomaly detection) act within the broader...
Stock Market Prediction Using Node Transformer Architecture Integrated with BERT Sentiment Analysis
arXiv:2603.05917v1 Announce Type: new Abstract: Stock market prediction presents considerable challenges for investors, financial institutions, and policymakers operating in complex market environments characterized by noise, non-stationarity, and behavioral dynamics. Traditional forecasting methods often fail to capture the intricate patterns and...
Design Experiments to Compare Multi-armed Bandit Algorithms
arXiv:2603.05919v1 Announce Type: new Abstract: Online platforms routinely compare multi-armed bandit algorithms, such as UCB and Thompson Sampling, to select the best-performing policy. Unlike standard A/B tests for static treatments, each run of a bandit algorithm over $T$ users produces...
FedSCS-XGB -- Federated Server-centric surrogate XGBoost for continual health monitoring
arXiv:2603.06224v1 Announce Type: new Abstract: Wearable sensors with local data processing can detect health threats early, enhance documentation, and support personalized therapy. In the context of spinal cord injury (SCI), which involves risks such as pressure injuries and blood pressure...