SCOTUStoday for Tuesday, March 17
Happy St. Patrick’s Day! We recommend celebrating by reading about Supreme Court justices of Irish descent.The postSCOTUStoday for Tuesday, March 17appeared first onSCOTUSblog.
AI’s ‘boys’ club’ could widen the wealth gap for women, says Rana el Kaliouby
AI investor Rana el Kaliouby warns that if women are shut out of AI funding and leadership, the consequences will be grim.
Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation
arXiv:2603.13260v1 Announce Type: new Abstract: Knowledge Distillation (KD) can transfer the reasoning abilities of large models to smaller ones, which can reduce the costs to generate Chain-of-Thoughts for reasoning tasks. KD methods typically ask the student to mimic the teacher's...
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings
arXiv:2603.13594v1 Announce Type: new Abstract: Large language models are shifting from passive information providers to active agents intended for complex workflows. However, their deployment as reliable AI workers in enterprise is stalled by benchmarks that fail to capture the intricacies...
InterventionLens: A Multi-Agent Framework for Detecting ASD Intervention Strategies in Parent-Child Shared Reading
arXiv:2603.13710v1 Announce Type: new Abstract: Home-based interventions like parent-child shared reading provide a cost-effective approach for supporting children with autism spectrum disorder (ASD). However, analyzing caregiver intervention strategies in naturalistic home interactions typically relies on expert annotation, which is costly,...
DeceptGuard :A Constitutional Oversight Framework For Detecting Deception in LLM Agents
arXiv:2603.13791v1 Announce Type: new Abstract: Reliable detection of deceptive behavior in Large Language Model (LLM) agents is an essential prerequisite for safe deployment in high-stakes agentic contexts. Prior work on scheming detection has focused exclusively on black-box monitors that observe...
Distilling Deep Reinforcement Learning into Interpretable Fuzzy Rules: An Explainable AI Framework
arXiv:2603.13257v1 Announce Type: new Abstract: Deep Reinforcement Learning (DRL) agents achieve remarkable performance in continuous control but remain opaque, hindering deployment in safety-critical domains. Existing explainability methods either provide only local insights (SHAP, LIME) or employ over-simplified surrogates failing to...
vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
arXiv:2603.13966v1 Announce Type: new Abstract: Vision Language Action VLA models are typically evaluated using per benchmark scripts maintained independently by each model repository, leading to duplicated code, dependency conflicts, and underspecified protocols. We present vla eval, an open source evaluation...
Steering at the Source: Style Modulation Heads for Robust Persona Control
arXiv:2603.13249v1 Announce Type: new Abstract: Activation steering offers a computationally efficient mechanism for controlling Large Language Models (LLMs) without fine-tuning. While effectively controlling target traits (e.g., persona), coherency degradation remains a major obstacle to safety and practical deployment. We hypothesize...
ManiBench: A Benchmark for Testing Visual-Logic Drift and Syntactic Hallucinations in Manim Code Generation
arXiv:2603.13251v1 Announce Type: new Abstract: Traditional benchmarks like HumanEval and MBPP test logic and syntax effectively, but fail when code must produce dynamic, pedagogical visuals. We introduce ManiBench, a specialized benchmark evaluating LLM performance in generating Manim CE code, where...
DyACE: Dynamic Algorithm Co-evolution for Online Automated Heuristic Design with Large Language Model
arXiv:2603.13344v1 Announce Type: new Abstract: The prevailing paradigm in Automated Heuristic Design (AHD) typically relies on the assumption that a single, fixed algorithm can effectively navigate the shifting dynamics of a combinatorial search. This static approach often proves inadequate for...
AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints
arXiv:2603.13348v1 Announce Type: new Abstract: Tool use represents a critical capability for AI agents, with recent advances focusing on leveraging reinforcement learning (RL) to scale up the explicit reasoning process to achieve better performance. However, there are some key challenges...
Prompt Complexity Dilutes Structured Reasoning: A Follow-Up Study on the Car Wash Problem
arXiv:2603.13351v1 Announce Type: new Abstract: In a previous study [Jo, 2026], STAR reasoning (Situation, Task, Action, Result) raised car wash problem accuracy from 0% to 85% on Claude Sonnet 4.5, and to 100% with additional prompt layers. This follow-up asks:...
Early Rug Pull Warning for BSC Meme Tokens via Multi-Granularity Wash-Trading Pattern Profiling
arXiv:2603.13830v1 Announce Type: new Abstract: The high-frequency issuance and short-cycle speculation of meme tokens in decentralized finance (DeFi) have significantly amplified rug-pull risk. Existing approaches still struggle to provide stable early warning under scarce anomalies, incomplete labels, and limited interpretability....
The ARC of Progress towards AGI: A Living Survey of Abstraction and Reasoning
arXiv:2603.13372v1 Announce Type: new Abstract: The Abstraction and Reasoning Corpus (ARC-AGI) has become a key benchmark for fluid intelligence in AI. This survey presents the first cross-generation analysis of 82 approaches across three benchmark versions and the ARC Prize 2024-2025...
GroupGuard: A Framework for Modeling and Defending Collusive Attacks in Multi-Agent Systems
arXiv:2603.13940v1 Announce Type: new Abstract: While large language model-based agents demonstrate great potential in collaborative tasks, their interactivity also introduces security vulnerabilities. In this paper, we propose and model group collusive attacks, a highly destructive threat in which multiple agents...
Privacy Preserving Topic-wise Sentiment Analysis of the Iran Israel USA Conflict Using Federated Transformer Models
arXiv:2603.13655v1 Announce Type: new Abstract: The recent escalation of the Iran Israel USA conflict in 2026 has triggered widespread global discussions across social media platforms. As people increasingly use these platforms for expressing opinions, analyzing public sentiment from these discussions...
Multimodal Emotion Regression with Multi-Objective Optimization and VAD-Aware Audio Modeling for the 10th ABAW EMI Track
arXiv:2603.13760v1 Announce Type: new Abstract: We participated in the 10th ABAW Challenge, focusing on the Emotional Mimicry Intensity (EMI) Estimation track on the Hume-Vidmimic2 dataset. This task aims to predict six continuous emotion dimensions: Admiration, Amusement, Determination, Empathic Pain, Excitement,...
Do Large Language Models Get Caught in Hofstadter-Mobius Loops?
arXiv:2603.13378v1 Announce Type: new Abstract: In Arthur C. Clarke's 2010: Odyssey Two, HAL 9000's homicidal breakdown is diagnosed as a "Hofstadter-Mobius loop": a failure mode in which an autonomous system receives contradictory directives and, unable to reconcile them, defaults to...
Widespread Gender and Pronoun Bias in Moral Judgments Across LLMs
arXiv:2603.13636v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to assess moral or ethical statements, yet their judgments may reflect social and linguistic biases. This work presents a controlled, sentence-level study of how grammatical person, number, and...
The AI Fiction Paradox
arXiv:2603.13545v1 Announce Type: new Abstract: AI development has a fiction dependency problem: models are built on massive corpora of modern fiction and desperately need more of it, yet they struggle to generate it. I term this the AI-Fiction Paradox and...
Projection-Free Evolution Strategies for Continuous Prompt Search
arXiv:2603.13786v1 Announce Type: new Abstract: Continuous prompt search offers a computationally efficient alternative to conventional parameter tuning in natural language processing tasks. Nevertheless, its practical effectiveness can be significantly hindered by the black-box nature and the inherent high-dimensionality of the...
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution
arXiv:2603.13853v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG), based on large language models (LLMs), serves as a vital approach to retrieving and leveraging external knowledge in various domain applications. When confronted with complex multi-hop questions, single-round retrieval is often insufficient...
OmniCompliance-100K: A Multi-Domain, Rule-Grounded, Real-World Safety Compliance Dataset
arXiv:2603.13933v1 Announce Type: new Abstract: Ensuring the safety and compliance of large language models (LLMs) is of paramount importance. However, existing LLM safety datasets often rely on ad-hoc taxonomies for data generation and suffer from a significant shortage of rule-grounded,...
ToolFlood: Beyond Selection -- Hiding Valid Tools from LLM Agents via Semantic Covering
arXiv:2603.13950v1 Announce Type: new Abstract: Large Language Model (LLM) agents increasingly use external tools for complex tasks and rely on embedding-based retrieval to select a small top-k subset for reasoning. As these systems scale, the robustness of this retrieval stage...
CMHL: Contrastive Multi-Head Learning for Emotionally Consistent Text Classification
arXiv:2603.14078v1 Announce Type: new Abstract: Textual Emotion Classification (TEC) is one of the most difficult NLP tasks. State of the art approaches rely on Large language models (LLMs) and multi-model ensembles. In this study, we challenge the assumption that larger...
OasisSimp: An Open-source Asian-English Sentence Simplification Dataset
arXiv:2603.14111v1 Announce Type: new Abstract: Sentence simplification aims to make complex text more accessible by reducing linguistic complexity while preserving the original meaning. However, progress in this area remains limited for mid-resource and low-resource languages due to the scarcity of...
Selective Fine-Tuning of GPT Architectures for Parameter-Efficient Clinical Text Classification
arXiv:2603.14183v1 Announce Type: new Abstract: The rapid expansion of electronic health record (EHR) systems has generated large volumes of unstructured clinical narratives that contain valuable information for disease identification, patient cohort discovery, and clinical decision support. Extracting structured knowledge from...
Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring
arXiv:2603.14251v1 Announce Type: new Abstract: Large Reasoning Language Models (LRLMs) demonstrate impressive capabilities on complex tasks by utilizing long Chain-of-Thought reasoning. However, they are prone to overthinking, which generates redundant reasoning steps that degrade both performance and efficiency. Recently, early-exit...
Automatic Inter-document Multi-hop Scientific QA Generation
arXiv:2603.14257v1 Announce Type: new Abstract: Existing automatic scientific question generation studies mainly focus on single-document factoid QA, overlooking the inter-document reasoning crucial for scientific understanding. We present AIM-SciQA, an automated framework for generating multi-document, multi-hop scientific QA datasets. AIM-SciQA extracts...