LIDS: LLM Summary Inference Under the Layered Lens
arXiv:2603.00105v1 Announce Type: new Abstract: Large language models (LLMs) have gained significant attention from many researchers and practitioners in natural language processing (NLP) since the introduction of ChatGPT in 2022. One notable feature of ChatGPT is its ability to generate...
OSF: On Pre-training and Scaling of Sleep Foundation Models
arXiv:2603.00190v1 Announce Type: new Abstract: Polysomnography (PSG) provides the gold standard for sleep assessment but suffers from substantial heterogeneity across recording devices and cohorts. There have been growing efforts to build general-purpose foundation models (FMs) for sleep physiology, but lack...
Humans and LLMs Diverge on Probabilistic Inferences
arXiv:2602.23546v1 Announce Type: new Abstract: Human reasoning often involves working over limited information to arrive at probabilistic conclusions. In its simplest form, this involves making an inference that is not strictly entailed by a premise, but rather only likely given...
Multi-Agent Causal Reasoning for Suicide Ideation Detection Through Online Conversations
arXiv:2602.23577v1 Announce Type: new Abstract: Suicide remains a pressing global public health concern. While social media platforms offer opportunities for early risk detection through online conversation trees, existing approaches face two major limitations: (1) They rely on predefined rules (e.g.,...
TRIZ-RAGNER: A Retrieval-Augmented Large Language Model for TRIZ-Aware Named Entity Recognition in Patent-Based Contradiction Mining
arXiv:2602.23656v1 Announce Type: new Abstract: TRIZ-based contradiction mining is a fundamental task in patent analysis and systematic innovation, as it enables the identification of improving and worsening technical parameters that drive inventive problem solving. However, existing approaches largely rely on...
Structured Prompt Optimization for Few-Shot Text Classification via Semantic Alignment in Latent Space
arXiv:2602.23753v1 Announce Type: new Abstract: This study addresses the issues of semantic entanglement, unclear label structure, and insufficient feature representation in few-shot text classification, and proposes an optimization framework based on structured prompts to enhance semantic understanding and task adaptation...
The Astonishing Ability of Large Language Models to Parse Jabberwockified Language
arXiv:2602.23928v1 Announce Type: new Abstract: We show that large language models (LLMs) have an astonishing ability to recover meaning from severely degraded English texts. Texts in which content words have been randomly substituted by nonsense strings, e.g., "At the ghybe...
MemEmo: Evaluating Emotion in Memory Systems of Agents
arXiv:2602.23944v1 Announce Type: new Abstract: Memory systems address the challenge of context loss in Large Language Models during prolonged interactions. However, compared to human cognition, the efficacy of these systems in processing emotion-related information remains inconclusive. To address this gap,...
Dialect and Gender Bias in YouTube's Spanish Captioning System
arXiv:2602.24002v1 Announce Type: new Abstract: Spanish is the official language of twenty-one countries and is spoken by over 441 million people. Naturally, there are many variations in how Spanish is spoken across these countries. Media platforms such as YouTube rely...
Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis
arXiv:2602.24060v1 Announce Type: new Abstract: Large language models (LLMs) with reasoning capabilities have fueled a compelling narrative that reasoning universally improves performance across language tasks. We test this claim through a comprehensive evaluation of 504 configurations across seven model families--including...
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
arXiv:2602.23898v1 Announce Type: cross Abstract: Referring Expression Comprehension (REC) links language to region-level visual perception. Standard benchmarks (RefCOCO, RefCOCO+, RefCOCOg) have progressed rapidly with multimodal LLMs but remain weak tests of visual reasoning and grounding: (i) many expressions are...
Actor-Critic Pretraining for Proximal Policy Optimization
arXiv:2602.23804v1 Announce Type: new Abstract: Reinforcement learning (RL) actor-critic algorithms enable autonomous learning but often require a large number of environment interactions, which limits their applicability in robotics. Leveraging expert data can reduce the number of required environment interactions. A...
Grounding LLMs in Scientific Discovery via Embodied Actions
arXiv:2602.20639v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown significant potential in scientific discovery but struggle to bridge the gap between theoretical reasoning and verifiable physical simulation. Existing solutions operate in a passive "execute-then-response" loop and thus lack...
Counterfactual Simulation Training for Chain-of-Thought Faithfulness
arXiv:2602.20710v1 Announce Type: new Abstract: Inspecting Chain-of-Thought reasoning is among the most common means of understanding why an LLM produced its output. But well-known problems with CoT faithfulness severely limit what insights can be gained from this practice. In this...
Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback
arXiv:2602.20728v1 Announce Type: new Abstract: Reward design has been one of the central challenges for real-world reinforcement learning (RL) deployment, especially in settings with multiple objectives. Preference-based RL offers an appealing alternative by learning from human preferences over pairs...
PyVision-RL: Forging Open Agentic Vision Models via RL
arXiv:2602.20739v1 Announce Type: new Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where models learn to reduce tool usage and multi-turn reasoning, limiting the benefits of agentic behavior. We introduce PyVision-RL, a reinforcement learning framework for...
HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG
arXiv:2602.20926v1 Announce Type: new Abstract: Large Language Models (LLMs) often struggle with inherent knowledge boundaries and hallucinations, limiting their reliability in knowledge-intensive tasks. While Retrieval-Augmented Generation (RAG) mitigates these issues, it frequently overlooks structural interdependencies essential for multi-hop reasoning. Graph-based...
Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning
arXiv:2602.20528v1 Announce Type: new Abstract: The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to token-by-token decisions, STAR-LDM incorporates a "thinking" phase that pauses generation to refine a semantic plan...
Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem
arXiv:2602.21814v1 Announce Type: new Abstract: Large language models consistently fail the "car wash problem," a viral reasoning benchmark requiring implicit physical constraint inference. We present a variable isolation study (n=20 per condition, 6 conditions, 120 total trials) examining which prompt...
2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support
arXiv:2602.21889v1 Announce Type: new Abstract: Across a growing number of fields, human decision making is supported by predictions from AI models. However, we still lack a deep understanding of the effects of adoption of these technologies. In this paper, we...
Petri Net Relaxation for Infeasibility Explanation and Sequential Task Planning
arXiv:2602.22094v1 Announce Type: new Abstract: Plans often change due to changes in the situation or our understanding of the situation. Sometimes, a feasible plan may not even exist, and identifying such infeasibilities is useful to determine when requirements need adjustment....
The Fundamental Right to Education
Derek W. Black. New litigation has revived one of the most important questions of constitutional law: Is education a fundamental right? The Court’s previous answers have been disappointing. While the Court has hinted that...
Google looks to tackle longstanding RCS spam in India — but not alone
Google is integrating carrier-level filtering into RCS in India through a partnership with Airtel to strengthen protections against spam.
ImpRIF: Stronger Implicit Reasoning Leads to Better Complex Instruction Following
arXiv:2602.21228v1 Announce Type: cross Abstract: As applications of large language models (LLMs) become increasingly complex, the demand for robust complex instruction following capabilities is growing accordingly. We argue that a thorough understanding of the instruction itself, especially the latent reasoning...
AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
arXiv:2602.21233v1 Announce Type: cross Abstract: This technical report introduces AngelSlim, a comprehensive and versatile toolkit for large model compression developed by the Tencent Hunyuan team. By consolidating cutting-edge algorithms, including quantization, speculative decoding, token pruning, and distillation, AngelSlim provides a...
Equitable Evaluation via Elicitation
arXiv:2602.21327v1 Announce Type: cross Abstract: Individuals with similar qualifications and skills may vary in their demeanor, or outward manner: some tend toward self-promotion while others are modest to the point of omitting crucial information. Comparing the self-descriptions of equally qualified...
Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment
arXiv:2602.21346v1 Announce Type: cross Abstract: Recent advances in alignment techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Direct Preference Optimization (DPO) have improved the safety of large language models (LLMs). However, these LLMs remain vulnerable...
Towards Controllable Video Synthesis of Routine and Rare OR Events
arXiv:2602.21365v1 Announce Type: cross Abstract: Purpose: Curating large-scale datasets of operating room (OR) workflow, encompassing rare, safety-critical, or atypical events, remains operationally and ethically challenging. This data bottleneck complicates the development of ambient intelligence for detecting, understanding, and mitigating rare...
Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages
arXiv:2602.21374v1 Announce Type: cross Abstract: Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP). This study evaluates a two-step pipeline combining Aya-expanse-8B as a Persian-to-English translation model with five open-source...
FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation
arXiv:2602.22273v1 Announce Type: new Abstract: We introduce FIRE, a comprehensive benchmark designed to evaluate both the theoretical financial knowledge of LLMs and their ability to handle practical business scenarios. For theoretical assessment, we curate a diverse set of examination questions...