Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation
arXiv:2604.05083v1 Announce Type: new Abstract: While Large Language Models (LLMs) are increasingly adopted as automated judges for evaluating generated text, their outputs are often costly and highly sensitive to prompt design, language, and aggregation strategies, which severely limits reproducibility. To...
Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition
arXiv:2604.05279v1 Announce Type: new Abstract: Large language models exhibit sycophancy, the tendency to shift their stated positions toward perceived user preferences or authority cues regardless of evidence. Standard alignment methods fail to correct this because scalar reward models conflate two...
Human Values Matter: Investigating How Misalignment Shapes Collective Behaviors in LLM Agent Communities
arXiv:2604.05339v1 Announce Type: new Abstract: As LLMs become increasingly integrated into human society, evaluating their orientations on human values from social science has drawn growing attention. Nevertheless, it is still unclear why human values matter for LLMs, especially in LLM-based...
Reason Analogically via Cross-domain Prior Knowledge: An Empirical Study of Cross-domain Knowledge Transfer for In-Context Learning
arXiv:2604.05396v1 Announce Type: new Abstract: Despite its success, existing in-context learning (ICL) relies on in-domain expert demonstrations, limiting its applicability when expert annotations are scarce. We posit that different domains may share underlying reasoning structures, enabling source-domain demonstrations to improve...
On the Geometry of Positional Encodings in Transformers
arXiv:2604.05217v1 Announce Type: new Abstract: Neural language models process sequences of words, but the mathematical operations inside them are insensitive to the order in which words appear. Positional encodings are the component added to remedy this. Despite their importance, positional...
Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction
arXiv:2604.05477v1 Announce Type: new Abstract: Autonomous GUI agents based on vision-language models (VLMs) often assume deterministic environment responses, generating actions without verifying whether previous operations succeeded. In real-world settings with network latency, rendering delays, and system interruptions, this assumption leads...
Bypassing the CSI Bottleneck: MARL-Driven Spatial Control for Reflector Arrays
arXiv:2604.05162v1 Announce Type: new Abstract: Reconfigurable Intelligent Surfaces (RIS) are pivotal for next-generation smart radio environments, yet their practical deployment is severely bottlenecked by the intractable computational overhead of Channel State Information (CSI) estimation. To bypass this fundamental physical-layer barrier,...
The 14th Amendment’s citizenship clause is not trapped in amber: a reflection on oral argument
While I have written multiple posts for SCOTUSblog on birthright citizenship, a substantial part of my practice is litigating Second Amendment claims. In light of that experience, I was struck […] The post The 14th Amendment’s citizenship clause is not trapped in...
What oral arguments and opinion authorships can actually tell us
Empirical SCOTUS is a recurring series by Adam Feldman that looks at Supreme Court data, primarily in the form of opinions and oral arguments, to provide insights into the justices’ decision making and […] The post What oral arguments and opinion authorships...
The who, what, and where of gun control
A Second Opinion is a recurring series by Haley Proctor on the Second Amendment and constitutional litigation. My previous column examined what it means for a gun control measure to […] The post The who, what, and where of gun control appeared first...
Intel signs on to Elon Musk’s Terafab chips project
Intel will join SpaceX and Tesla in an effort to build a new U.S. semiconductor factory in Texas, although the scope of its contributions is unclear.
4 days left to save close to $500 on TechCrunch Disrupt 2026 passes
Four days left to save up to $482 on your TechCrunch Disrupt 2026 ticket. These low rates will disappear on April 10 at 11:59 p.m. PT. Register now.
The AI gold rush is pulling private wealth into riskier, earlier bets
On a recent episode of Equity, we talked to Arena Private Wealth to explore a growing trend: family offices bypassing VCs to gain direct exposure to AI startups, turning them from passive investors into active participants.
Rethinking the Key Role of Private Antitrust Enforcement
When Do Hallucinations Arise? A Graph Perspective on the Evolution of Path Reuse and Path Compression
arXiv:2604.03557v1 Announce Type: new Abstract: Reasoning hallucinations in large language models (LLMs) often appear as fluent yet unsupported conclusions that violate either the given context or underlying factual knowledge. Although such failures are widely observed, the mechanisms by which decoder-only...
BioAlchemy: Distilling Biological Literature into Reasoning-Ready Reinforcement Learning Training Data
arXiv:2604.03506v1 Announce Type: new Abstract: Despite the large corpus of biology training text, the impact of reasoning models on biological research generally lags behind math and coding. In this work, we show that biology questions from current large-scale reasoning datasets...
Toward Full Autonomous Laboratory Instrumentation Control with Large Language Models
arXiv:2604.03286v1 Announce Type: new Abstract: The control of complex laboratory instrumentation often requires significant programming expertise, creating a barrier for researchers lacking computational skills. This work explores the potential of large language models (LLMs), such as ChatGPT, and LLM-based artificial...
Selective Forgetting for Large Reasoning Models
arXiv:2604.03571v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) generate structured chains of thought (CoTs) before producing final answers, making them especially vulnerable to knowledge leakage through intermediate reasoning steps. Yet, the memorization of sensitive information in the training data...
What really happens on the emergency docket
By now, readers of SCOTUSblog are quite familiar with the Supreme Court’s emergency docket, where parties come to the court seeking emergency orders, oftentimes without full briefing and oral argument. […] The post What really happens on the emergency docket appeared first on SCOTUSblog.
Court allows Steve Bannon to move forward on dismissal of criminal charges against him
The Supreme Court on Monday morning added one new case, involving challenges to veterans’ benefit laws, to its docket for the 2026-27 term. The justices also sent the case of […] The post Court allows Steve Bannon to move forward on dismissal...
Episode 42: Russia, Imperial Continuities and Histories of International Law - EJIL: The Podcast!
Improving Model Performance by Adapting the KGE Metric to Account for System Non-Stationarity
arXiv:2604.03906v1 Announce Type: new Abstract: Geoscientific systems tend to be characterized by pronounced temporal non-stationarity, arising from seasonal and climatic variability in hydrometeorological drivers, and from natural and anthropogenic changes to land use and cover. As has been pointed out,...
Towards the AI Historian: Agentic Information Extraction from Primary Sources
arXiv:2604.03553v1 Announce Type: new Abstract: AI is supporting, accelerating, and automating scientific discovery across a diverse set of fields. However, AI adoption in historical research remains limited due to the lack of solutions designed for historians. In this technical progress...
PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training
arXiv:2604.03675v1 Announce Type: new Abstract: In agentic search, large language models (LLMs) are trained to perform multi-turn retrieval and reasoning for complex tasks such as multi-hop question answering (QA). However, current search-based Reinforcement Learning (RL) methods suffer from two core...
Single-agent vs. Multi-agents for Automated Video Analysis of On-Screen Collaborative Learning Behaviors
arXiv:2604.03631v1 Announce Type: new Abstract: On-screen learning behavior provides valuable insights into how students seek, use, and create information during learning. Analyzing on-screen behavioral engagement is essential for capturing students' cognitive and collaborative processes. The recent development of Vision Language...
Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization
arXiv:2604.03656v1 Announce Type: new Abstract: Generative Engine Optimization (GEO) is rapidly reshaping digital marketing paradigms in the era of Large Language Models (LLMs). However, current GEO strategies predominantly rely on Retrieval-Augmented Generation (RAG), which inherently suffers from probabilistic hallucinations and...
An actual alternative to originalism
Justice, Democracy, and Law is a recurring series by Edward B. Foley that focuses on election law and the relationship of law and democracy. “Original public meaning” has become the […] The post An actual alternative to originalism appeared first on SCOTUSblog.
Autoencoder-Based Parameter Estimation for Superposed Multi-Component Damped Sinusoidal Signals
arXiv:2604.03985v1 Announce Type: new Abstract: Damped sinusoidal oscillations are widely observed in many physical systems, and their analysis provides access to underlying physical properties. However, parameter estimation becomes difficult when the signal decays rapidly, multiple components are superposed, and observational...
Noise Steering for Controlled Text Generation: Improving Diversity and Reading-Level Fidelity in Arabic Educational Story Generation
arXiv:2604.03380v1 Announce Type: new Abstract: Generating diverse, pedagogically valid stories for Arabic early-grade reading assessments requires balancing tight constraints on vocabulary, reading level, and narrative structure against the need to avoid repetitive plots that undermine assessment validity. We investigate noise...