Deep Tabular Research via Continual Experience-Driven Execution
arXiv:2603.09151v1 Announce Type: new Abstract: Large language models often struggle with complex long-horizon analytical tasks over unstructured tables, which typically feature hierarchical and bidirectional headers and non-canonical layouts. We formalize this challenge as Deep Tabular Research (DTR), requiring multi-step reasoning...
Time, Identity and Consciousness in Language Model Agents
arXiv:2603.09043v1 Announce Type: new Abstract: Machine consciousness evaluations mostly see behavior. For language model agents that behavior is language and tool use. That lets an agent say the right things about itself even when the constraints that should make those...
PrivPRISM: Automatically Detecting Discrepancies Between Google Play Data Safety Declarations and Developer Privacy Policies
arXiv:2603.09214v1 Announce Type: new Abstract: End-users seldom read verbose privacy policies, leading app stores like Google Play to mandate simplified data safety declarations as a user-friendly alternative. However, these self-declared disclosures often contradict the full privacy policies, deceiving users about...
Build, Borrow, or Just Fine-Tune? A Political Scientist's Guide to Choosing NLP Models
arXiv:2603.09595v1 Announce Type: new Abstract: Political scientists increasingly face a consequential choice when adopting natural language processing tools: build a domain-specific model from scratch, borrow and adapt an existing one, or simply fine-tune a general-purpose model on task data? Each...
Surgical Repair of Collapsed Attention Heads in ALiBi Transformers
arXiv:2603.09616v1 Announce Type: new Abstract: We identify a systematic attention collapse pathology in the BLOOM family of transformer language models, where ALiBi positional encoding causes 31-44% of attention heads to attend almost entirely to the beginning-of-sequence token. The collapse follows...
Fusing Semantic, Lexical, and Domain Perspectives for Recipe Similarity Estimation
arXiv:2603.09688v1 Announce Type: new Abstract: This research focuses on developing advanced methods for assessing similarity between recipes by combining different sources of information and analytical approaches. We explore the semantic, lexical, and domain similarity of food recipes, evaluated through the...
The $qs$ Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference
arXiv:2603.08960v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models deliver high quality at low training FLOPs, but this efficiency often vanishes at inference. We identify a double penalty that structurally disadvantages MoE architectures during decoding: first, expert routing fragments microbatches and...
Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning
arXiv:2603.09145v1 Announce Type: new Abstract: Current expansion-based methods for Class Incremental Learning (CIL) effectively mitigate catastrophic forgetting by freezing old features. However, such task-specific features learned from the new task may collide with the old features. From a causal perspective,...
The Radio-Frequency Transformer for Signal Separation
arXiv:2603.09201v1 Announce Type: new Abstract: We study a problem of signal separation: estimating a signal of interest (SOI) contaminated by an unknown non-Gaussian background/interference. Given the training data consisting of examples of SOI and interference, we show how to build...
Transductive Generalization via Optimal Transport and Its Application to Graph Node Classification
arXiv:2603.09257v1 Announce Type: new Abstract: Many existing transductive bounds rely on classical complexity measures that are computationally intractable and often misaligned with empirical behavior. In this work, we establish new representation-based generalization bounds in a distribution-free transductive setting, where learned...
Proxy-Guided Measurement Calibration
arXiv:2603.09288v1 Announce Type: new Abstract: Aggregate outcome variables collected through surveys and administrative records are often subject to systematic measurement error. For instance, in disaster loss databases, county-level losses reported may differ from the true damages due to variations in...
SCOTUSblog’s new podcast partners
SCOTUSblog is excited to announce the addition of podcasts Amarica’s Constitution and Divided Argument to its podcast lineup, joining Advisory Opinions. While both podcasts will maintain their editorial and creative independence, […]
Birthright citizenship: legal takeaways of mice and men and elephants and dogs
Brothers in Law is a recurring series by brothers Akhil and Vikram Amar, with special emphasis on measuring what the Supreme Court says against what the Constitution itself says. For more content from […]
Legora reaches $5.55 billion valuation as AI legal tech boom endures
Legora, an AI platform for lawyers, is now valued at $5.55 billion following a $550 million Series D led by Accel to fuel its growth in the U.S.
Sandbar secures $23M Series A for its AI note-taking ring
This summer, Sandbar aims to ship the Stream, which can be used for note-taking, chatting with an AI assistant, and media playback.
Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation
arXiv:2603.06593v1 Announce Type: new Abstract: Retrieval-augmented code generation often conditions the decoder on large retrieved code snippets. This ties online inference cost to repository size and introduces noise from long contexts. We present Hierarchical Embedding Fusion (HEF), a two-stage approach...
Language Shapes Mental Health Evaluations in Large Language Models
arXiv:2603.06910v1 Announce Type: new Abstract: This study investigates whether large language models (LLMs) exhibit cross-linguistic differences in mental health evaluations. Focusing on Chinese and English, we examine two widely used models, GPT-4o and Qwen3, to assess whether prompt language systematically...
Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR
arXiv:2603.07554v1 Announce Type: new Abstract: Nepal Bhasha (Newari), an endangered language of the Kathmandu Valley, remains digitally marginalized due to the severe scarcity of annotated speech resources. In this work, we introduce Nwāchā Munā, a newly curated 5.39-hour manually transcribed...
Whitening Reveals Cluster Commitment as the Geometric Separator of Hallucination Types
arXiv:2603.07755v1 Announce Type: new Abstract: A geometric hallucination taxonomy distinguishes three failure types, center-drift (Type 1), wrong-well convergence (Type 2), and coverage gaps (Type 3), by their signatures in embedding cluster space. Prior work found Types 1 and 2 indistinguishable in full-dimensional contextual...
Scale Dependent Data Duplication
arXiv:2603.06603v1 Announce Type: new Abstract: Data duplication during pretraining can degrade generalization and lead to memorization, motivating aggressive deduplication pipelines. However, at web scale, it is unclear what constitutes a "duplicate": beyond surface-form matches, semantically equivalent documents (e.g. translations) may...
Advances in GRPO for Generation Models: A Survey
arXiv:2603.06623v1 Announce Type: new Abstract: Large-scale flow matching models have achieved strong performance across generative tasks such as text-to-image, video, 3D, and speech synthesis. However, aligning their outputs with human preferences and task-specific objectives remains challenging. Flow-GRPO extends Group Relative...
Trust Aware Federated Learning for Secure Bone Healing Stage Interpretation in e-Health
arXiv:2603.06646v1 Announce Type: new Abstract: This paper presents a trust aware federated learning (FL) framework for interpreting bone healing stages using spectral features derived from frequency response data. The primary objective is to address the challenge posed by either unreliable...
Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces
arXiv:2603.06713v1 Announce Type: new Abstract: Agentic systems operating over large tool ecosystems must plan and execute long-horizon workflows under weak or non-verifiable supervision. While frontier models mitigate these challenges through scale and large context budgets, small language models (SLMs) remain...
ProtAlign: Contrastive learning paradigm for Sequence and structure alignment
arXiv:2603.06722v1 Announce Type: new Abstract: Protein language models often take into consideration the alignment between a protein sequence and its textual description. However, they do not take structural information into consideration. Traditional methods treat sequence and structure separately, limiting the...
Governor DeSantis Directs Florida State Agencies to Partner with Future of Life Institute to Shield Families from AI Harm
The collaboration will produce a Crisis Counselor Training Curriculum and a statewide AI Harms Reporting Form targeting dangerous AI companion applications
OpenAI and Google employees rush to Anthropic’s defense in DOD lawsuit
More than 30 OpenAI and Google DeepMind employees signed onto a statement supporting Anthropic's lawsuit against the Defense Department after the agency labeled the AI firm a supply-chain risk, according to court filings.
OpenAI acquires Promptfoo to secure its AI agents
This deal underscores how frontier labs are scrambling to prove their technology can be used safely in critical business operations.
Anthropic sues Defense Department over supply-chain risk designation
Anthropic filed suit against the Department of Defense on Monday after the agency labeled it a supply-chain risk. The complaint calls the DOD's actions "unprecedented and unlawful."
Omni-C: Compressing Heterogeneous Modalities into a Single Dense Encoder
arXiv:2603.05528v1 Announce Type: cross Abstract: Recent multimodal systems often rely on separate expert modality encoders which cause linearly scaling complexity and computational overhead with added modalities. While unified Omni-models address this via Mixture-of-Expert (MoE) architectures with specialized experts and routing,...
Conversational Demand Response: Bidirectional Aggregator-Prosumer Coordination through Agentic AI
arXiv:2603.06217v1 Announce Type: new Abstract: Residential demand response depends on sustained prosumer participation, yet existing coordination is either fully automated, or limited to one-way dispatch signals and price alerts that offer little possibility for informed decision-making. This paper introduces Conversational...