Protein Design with Agent Rosetta: A Case Study for Specialized Scientific Agents
arXiv:2603.15952v1 Announce Type: new Abstract: Large language models (LLMs) are capable of emulating reasoning and using tools, creating opportunities for autonomous agents that execute complex scientific tasks. Protein design provides a natural testbed: although machine learning (ML) methods achieve strong...
COGNAC at SemEval-2026 Task 5: LLM Ensembles for Human-Level Word Sense Plausibility Rating in Challenging Narratives
arXiv:2603.15897v1 Announce Type: new Abstract: We describe our system for SemEval-2026 Task 5, which requires rating the plausibility of given word senses of homonyms in short stories on a 5-point Likert scale. Systems are evaluated by the unweighted average of...
Argumentative Human-AI Decision-Making: Toward AI Agents That Reason With Us, Not For Us
arXiv:2603.15946v1 Announce Type: new Abstract: Computational argumentation offers formal frameworks for transparent, verifiable reasoning but has traditionally been limited by its reliance on domain-specific information and extensive feature engineering. In contrast, LLMs excel at processing unstructured text, yet their opaque...
Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization
arXiv:2603.16105v1 Announce Type: new Abstract: Post-training model compression is essential for enhancing the portability of Large Language Models (LLMs) while preserving their performance. While several compression approaches have been proposed, less emphasis has been placed on selecting the most suitable...
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
arXiv:2603.16127v1 Announce Type: new Abstract: We investigate the role of learning rate scheduling in the large-scale pre-training of large language models, focusing on its influence on downstream performance after supervised fine-tuning (SFT). Decay-based learning rate schedulers are widely used to...
Parametric Social Identity Injection and Diversification in Public Opinion Simulation
arXiv:2603.16142v1 Announce Type: new Abstract: Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, offering a promising alternative to costly and slow human surveys. Despite their scalability, current LLM-based simulation methods fail to capture...
Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR
arXiv:2603.16184v1 Announce Type: new Abstract: We present Polyglot-Lion, a family of compact multilingual automatic speech recognition (ASR) models tailored for the linguistic landscape of Singapore, covering English, Mandarin, Tamil, and Malay. Our models are obtained by fine-tuning Qwen3-ASR-0.6B and Qwen3-ASR-1.7B...
How often do Answers Change? Estimating Recency Requirements in Question Answering
arXiv:2603.16544v1 Announce Type: new Abstract: Large language models (LLMs) often rely on outdated knowledge when answering time-sensitive questions, leading to confident yet incorrect responses. Without explicit signals indicating whether up-to-date information is required, models struggle to decide when to retrieve...
Embedding-Aware Feature Discovery: Bridging Latent Representations and Interpretable Features in Event Sequences
arXiv:2603.15713v1 Announce Type: new Abstract: Industrial financial systems operate on temporal event sequences such as transactions, user actions, and system logs. While recent research emphasizes representation learning and large language models, production systems continue to rely heavily on handcrafted statistical...
Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs
arXiv:2603.15803v1 Announce Type: new Abstract: Discrete diffusion models offer global context awareness and flexible parallel generation. However, uniform random noise schedulers in standard DLLM training overlook the highly non-uniform information density inherent in real-world sequences. This wastes optimization resources on...
Longitudinal Risk Prediction in Mammography with Privileged History Distillation
arXiv:2603.15814v1 Announce Type: new Abstract: Breast cancer remains a leading cause of cancer-related mortality worldwide. Longitudinal mammography risk prediction models improve multi-year breast cancer risk prediction based on prior screening exams. However, in real-world clinical practice, longitudinal histories are often...
Counteractive RL: Rethinking Core Principles for Efficient and Scalable Deep Reinforcement Learning
arXiv:2603.15871v1 Announce Type: new Abstract: Following the pivotal success of learning strategies to win at tasks, solely by interacting with an environment without any supervision, agents have gained the ability to make sequential decisions in complex MDPs. Yet, reinforcement learning...
AI’s ‘boys’ club’ could widen the wealth gap for women, says Rana el Kaliouby
AI investor Rana el Kaliouby warns that if women are shut out of AI funding and leadership, the consequences will be grim.
InterventionLens: A Multi-Agent Framework for Detecting ASD Intervention Strategies in Parent-Child Shared Reading
arXiv:2603.13710v1 Announce Type: new Abstract: Home-based interventions like parent-child shared reading provide a cost-effective approach for supporting children with autism spectrum disorder (ASD). However, analyzing caregiver intervention strategies in naturalistic home interactions typically relies on expert annotation, which is costly,...
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings
arXiv:2603.13594v1 Announce Type: new Abstract: Large language models are shifting from passive information providers to active agents intended for complex workflows. However, their deployment as reliable AI workers in enterprise is stalled by benchmarks that fail to capture the intricacies...
Privacy Preserving Topic-wise Sentiment Analysis of the Iran Israel USA Conflict Using Federated Transformer Models
arXiv:2603.13655v1 Announce Type: new Abstract: The recent escalation of the Iran Israel USA conflict in 2026 has triggered widespread global discussions across social media platforms. As people increasingly use these platforms for expressing opinions, analyzing public sentiment from these discussions...
ManiBench: A Benchmark for Testing Visual-Logic Drift and Syntactic Hallucinations in Manim Code Generation
arXiv:2603.13251v1 Announce Type: new Abstract: Traditional benchmarks like HumanEval and MBPP test logic and syntax effectively, but fail when code must produce dynamic, pedagogical visuals. We introduce ManiBench, a specialized benchmark evaluating LLM performance in generating Manim CE code, where...
AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints
arXiv:2603.13348v1 Announce Type: new Abstract: Tool use represents a critical capability for AI agents, with recent advances focusing on leveraging reinforcement learning (RL) to scale up the explicit reasoning process to achieve better performance. However, there are some key challenges...
Do Large Language Models Get Caught in Hofstadter-Mobius Loops?
arXiv:2603.13378v1 Announce Type: new Abstract: In Arthur C. Clarke's 2010: Odyssey Two, HAL 9000's homicidal breakdown is diagnosed as a "Hofstadter-Mobius loop": a failure mode in which an autonomous system receives contradictory directives and, unable to reconcile them, defaults to...
Prompt Complexity Dilutes Structured Reasoning: A Follow-Up Study on the Car Wash Problem
arXiv:2603.13351v1 Announce Type: new Abstract: In a previous study [Jo, 2026], STAR reasoning (Situation, Task, Action, Result) raised car wash problem accuracy from 0% to 85% on Claude Sonnet 4.5, and to 100% with additional prompt layers. This follow-up asks:...
Projection-Free Evolution Strategies for Continuous Prompt Search
arXiv:2603.13786v1 Announce Type: new Abstract: Continuous prompt search offers a computationally efficient alternative to conventional parameter tuning in natural language processing tasks. Nevertheless, its practical effectiveness can be significantly hindered by the black-box nature and the inherent high-dimensionality of the...
DeceptGuard :A Constitutional Oversight Framework For Detecting Deception in LLM Agents
arXiv:2603.13791v1 Announce Type: new Abstract: Reliable detection of deceptive behavior in Large Language Model (LLM) agents is an essential prerequisite for safe deployment in high-stakes agentic contexts. Prior work on scheming detection has focused exclusively on black-box monitors that observe...
vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
arXiv:2603.13966v1 Announce Type: new Abstract: Vision Language Action VLA models are typically evaluated using per benchmark scripts maintained independently by each model repository, leading to duplicated code, dependency conflicts, and underspecified protocols. We present vla eval, an open source evaluation...
Early Rug Pull Warning for BSC Meme Tokens via Multi-Granularity Wash-Trading Pattern Profiling
arXiv:2603.13830v1 Announce Type: new Abstract: The high-frequency issuance and short-cycle speculation of meme tokens in decentralized finance (DeFi) have significantly amplified rug-pull risk. Existing approaches still struggle to provide stable early warning under scarce anomalies, incomplete labels, and limited interpretability....
Steering at the Source: Style Modulation Heads for Robust Persona Control
arXiv:2603.13249v1 Announce Type: new Abstract: Activation steering offers a computationally efficient mechanism for controlling Large Language Models (LLMs) without fine-tuning. While effectively controlling target traits (e.g., persona), coherency degradation remains a major obstacle to safety and practical deployment. We hypothesize...
Multimodal Emotion Regression with Multi-Objective Optimization and VAD-Aware Audio Modeling for the 10th ABAW EMI Track
arXiv:2603.13760v1 Announce Type: new Abstract: We participated in the 10th ABAW Challenge, focusing on the Emotional Mimicry Intensity (EMI) Estimation track on the Hume-Vidmimic2 dataset. This task aims to predict six continuous emotion dimensions: Admiration, Amusement, Determination, Empathic Pain, Excitement,...
Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation
arXiv:2603.13260v1 Announce Type: new Abstract: Knowledge Distillation (KD) can transfer the reasoning abilities of large models to smaller ones, which can reduce the costs to generate Chain-of-Thoughts for reasoning tasks. KD methods typically ask the student to mimic the teacher's...
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution
arXiv:2603.13853v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG), based on large language models (LLMs), serves as a vital approach to retrieving and leveraging external knowledge in various domain applications. When confronted with complex multi-hop questions, single-round retrieval is often insufficient...
ToolFlood: Beyond Selection -- Hiding Valid Tools from LLM Agents via Semantic Covering
arXiv:2603.13950v1 Announce Type: new Abstract: Large Language Model (LLM) agents increasingly use external tools for complex tasks and rely on embedding-based retrieval to select a small top-k subset for reasoning. As these systems scale, the robustness of this retrieval stage...
CMHL: Contrastive Multi-Head Learning for Emotionally Consistent Text Classification
arXiv:2603.14078v1 Announce Type: new Abstract: Textual Emotion Classification (TEC) is one of the most difficult NLP tasks. State of the art approaches rely on Large language models (LLMs) and multi-model ensembles. In this study, we challenge the assumption that larger...