Birthright citizenship: an empirical analysis of supposedly originalist briefs
Brothers in Law is a recurring series by brothers Akhil and Vikram Amar, with special emphasis on measuring what the Supreme Court says against what the Constitution itself says. For more content from […]The postBirthright citizenship: an empirical analysis of...
The SCOTUS attorney switcheroo
Empirical SCOTUS is a recurring series by Adam Feldman that looks at Supreme Court data, primarily in the form of opinions and oral arguments, to provide insights into the justices’ decision making and […]The postThe SCOTUS attorney switcherooappeared first onSCOTUSblog.
SCOTUStoday for Wednesday, March 4
Good morning, and welcome to the court’s fourth opinion day in less than two weeks. We will be live blogging beginning at 9:30 a.m. EST.The postSCOTUStoday for Wednesday, March 4appeared first onSCOTUSblog.
Anthropic CEO Dario Amodei calls OpenAI’s messaging around military deal ‘straight up lies,’ report says
Anthropic gave up its contract with the Pentagon over AI safety disagreements -- then, OpenAI swooped in.
Apple Music to add Transparency Tags to distinguish AI music, says report
The label or distributor has to opt in to tagging their music as AI, so it's unclear how effective this intervention will be.
Google Search rolls out Gemini’s Canvas in AI Mode to all US users
Canvas in AI Mode is available to U.S. users in English for creating plans, projects, apps, and more.
The US military is still using Claude — but defense-tech clients are fleeing
As the U.S. continues its aerial attack on Iran, Anthropic models are being used for many targeting decisions.
Who needs data centers in space when they can float offshore?
Offshore wind developer Aikido will deploy a small data center beneath a floating offshore wind turbine later this year.
How Large Language Models Get Stuck: Early structure with persistent errors
arXiv:2603.00359v1 Announce Type: new Abstract: Linguistic insights may help make Large Language Model (LLM) training more efficient. We trained Meta's OPT model on the 100M word BabyLM dataset, and evaluated it on the BLiMP benchmark, which consists of 67 classes,...
A Typologically Grounded Evaluation Framework for Word Order and Morphology Sensitivity in Multilingual Masked LMs
arXiv:2603.00432v1 Announce Type: new Abstract: We introduce a typology-aware diagnostic for multilingual masked language models that tests reliance on word order versus inflectional form. Using Universal Dependencies, we apply inference-time perturbations: full token scrambling, content-word scrambling with function words fixed,...
CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles
arXiv:2603.00523v1 Announce Type: new Abstract: Mechanistic circuit discovery is notoriously sensitive to arbitrary analyst choices, especially pruning thresholds and feature dictionaries, often yielding brittle "one-shot" explanations with no principled notion of uncertainty. We reframe circuit discovery as an uncertainty-quantification problem...
Super Research: Answering Highly Complex Questions with Large Language Models through Super Deep and Super Wide Research
arXiv:2603.00582v1 Announce Type: new Abstract: While Large Language Models (LLMs) have demonstrated proficiency in Deep Research or Wide Search, their capacity to solve highly complex questions-those requiring long-horizon planning, massive evidence gathering, and synthesis across heterogeneous sources-remains largely unexplored. We...
From Literature to Hypotheses: An AI Co-Scientist System for Biomarker-Guided Drug Combination Hypothesis Generation
arXiv:2603.00612v1 Announce Type: new Abstract: The rapid growth of biomedical literature and curated databases has made it increasingly difficult for researchers to systematically connect biomarker mechanisms to actionable drug combination hypotheses. We present AI Co-Scientist (CoDHy), an interactive, human-in-the-loop system...
QQ: A Toolkit for Language Identifiers and Metadata
arXiv:2603.00620v1 Announce Type: new Abstract: The growing number of languages considered in multilingual NLP, including new datasets and tasks, poses challenges regarding properly and accurately reporting which languages are used and how. For example, datasets often use different language identifiers;...
Piecing Together Cross-Document Coreference Resolution Datasets: Systematic Dataset Analysis and Unification
arXiv:2603.00621v1 Announce Type: new Abstract: Research in CDCR remains fragmented due to heterogeneous dataset formats, varying annotation standards, and the predominance of the CDCR definition as the event coreference resolution (ECR). To address these challenges, we introduce uCDCR, a unified...
BLUFF: Benchmarking the Detection of False and Synthetic Content across 58 Low-Resource Languages
arXiv:2603.00634v1 Announce Type: new Abstract: Multilingual falsehoods threaten information integrity worldwide, yet detection benchmarks remain confined to English or a few high-resource languages, leaving low-resource linguistic communities without robust defense tools. We introduce BLUFF, a comprehensive benchmark for detecting false...
SSKG Hub: An Expert-Guided Platform for LLM-Empowered Sustainability Standards Knowledge Graphs
arXiv:2603.00669v1 Announce Type: new Abstract: Sustainability disclosure standards (e.g., GRI, SASB, TCFD, IFRS S2) are comprehensive yet lengthy, terminology-dense, and highly cross-referential, hindering structured analysis and downstream use. We present SSKG Hub (Sustainability Standards Knowledge Graph Hub), a research prototype...
Polynomial Mixing for Efficient Self-supervised Speech Encoders
arXiv:2603.00683v1 Announce Type: new Abstract: State-of-the-art speech-to-text models typically employ Transformer-based encoders that model token dependencies via self-attention mechanisms. However, the quadratic complexity of self-attention in both memory and computation imposes significant constraints on scalability. In this work, we propose...
SkillCraft: Can LLM Agents Learn to Use Tools Skillfully?
arXiv:2603.00718v1 Announce Type: new Abstract: Real-world tool-using agents operate over long-horizon workflows with recurring structure and diverse demands, where effective behavior requires not only invoking atomic tools but also abstracting, and reusing higher-level tool compositions. However, existing benchmarks mainly measure...
RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models
arXiv:2603.00724v1 Announce Type: new Abstract: Large language model alignment via reinforcement learning depends critically on reward function quality. However, static, domain-specific reward models are often costly to train and exhibit poor generalization in out-of-distribution scenarios encountered during RL iterations. We...
LaSTR: Language-Driven Time-Series Segment Retrieval
arXiv:2603.00725v1 Announce Type: new Abstract: Effectively searching time-series data is essential for system analysis, but existing methods often require expert-designed similarity criteria or rely on global, series-level descriptions. We study language-driven segment retrieval: given a natural language query, the goal...
A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction
arXiv:2603.00823v1 Announce Type: new Abstract: Machine unlearning aims to remove the influence of specific training data from pre-trained models without retraining from scratch, and is increasingly important for large language models (LLMs) due to safety, privacy, and legal concerns. Although...
Learning Nested Named Entity Recognition from Flat Annotations
arXiv:2603.00840v1 Announce Type: new Abstract: Nested named entity recognition identifies entities contained within other entities, but requires expensive multi-level annotation. While flat NER corpora exist abundantly, nested resources remain scarce. We investigate whether models can learn nested structure from flat...
MedGPT-oss: Training a General-Purpose Vision-Language Model for Biomedicine
arXiv:2603.00842v1 Announce Type: new Abstract: Biomedical multimodal assistants have the potential to unify radiology, pathology, and clinical-text reasoning, yet a critical deployment gap remains: top-performing systems are either closed-source or computationally prohibitive, precluding the on-premises deployment required for patient privacy...
CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning
arXiv:2603.00889v1 Announce Type: new Abstract: Large Language Models (LLMs) have recently exhibited remarkable reasoning capabilities, largely enabled by supervised fine-tuning (SFT)- and reinforcement learning (RL)-based post-training on high-quality reasoning data. However, reproducing and extending these capabilities in open and scalable...
Prompt Sensitivity and Answer Consistency of Small Open-Source Large Language Models on Clinical Question Answering: Implications for Low-Resource Healthcare Deployment
arXiv:2603.00917v1 Announce Type: new Abstract: Small open-source language models are gaining attention for low-resource healthcare settings, but their reliability under different prompt phrasings remains poorly understood. We evaluated five open-source models (Gemma 2 2B, Phi-3 Mini 3.8B, Llama 3.2 3B,...
Hybrid Neural-LLM Pipeline for Morphological Glossing in Endangered Language Documentation: A Case Study of Jungar Tuvan
arXiv:2603.00923v1 Announce Type: new Abstract: Interlinear glossed text (IGT) creation remains a major bottleneck in linguistic documentation and fieldwork, particularly for low-resource morphologically rich languages. We present a hybrid automatic glossing pipeline that combines neural sequence labeling with large language...
The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors
arXiv:2603.00925v1 Announce Type: new Abstract: Effective mathematics education requires identifying and responding to students' mistakes. For AI to support pedagogical applications, models must perform well across different levels of student proficiency. Our work provides an extensive, year-long snapshot of how...
Qayyem: A Real-time Platform for Scoring Proficiency of Arabic Essays
arXiv:2603.01009v1 Announce Type: new Abstract: Over the past years, Automated Essay Scoring (AES) systems have gained increasing attention as scalable and consistent solutions for assessing the proficiency of student writing. Despite recent progress, support for Arabic AES remains limited due...
How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning
arXiv:2603.01070v1 Announce Type: new Abstract: Solving complex geometric problems inherently requires interleaved reasoning: a tight alternation between constructing diagrams and performing logical deductions. Although recent Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities in visual generation and plotting, we...