Labor & Employment

LOW Academic International

Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning

arXiv:2603.00296v1 Announce Type: new Abstract: Large reasoning models improve with more test-time computation, but often overthink, producing unnecessarily long chains-of-thought that raise cost without improving accuracy. Prior reinforcement learning approaches typically rely on a single outcome reward with trajectory-level length...

1 min 1 month, 1 week ago

ada

LOW Academic International

Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

arXiv:2603.02239v1 Announce Type: new Abstract: The Engineering Reasoning and Instruction (ERI) benchmark is a taxonomy-driven instruction dataset designed to train and evaluate engineering-capable large language models (LLMs) and agents. This dataset spans nine engineering fields (namely: civil, mechanical, electrical, chemical,...

1 min 1 month, 1 week ago

ada

LOW Academic International

AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows

arXiv:2603.02601v1 Announce Type: new Abstract: Autonomous AI agents are deployed at unprecedented scale, yet no principled methodology exists for verifying that an agent has not regressed after changes to its prompts, tools, models, or orchestration logic. We present AgentAssay, the...

1 min 1 month, 1 week ago

ada

LOW Academic International

See and Remember: A Multimodal Agent for Web Traversal

arXiv:2603.02626v1 Announce Type: new Abstract: Autonomous web navigation requires agents to perceive complex visual environments and maintain long-term context, yet current Large Language Model (LLM) based agents often struggle with spatial disorientation and navigation loops. In this paper, we propose...

1 min 1 month, 1 week ago

ada

LOW Academic International

A Natural Language Agentic Approach to Study Affective Polarization

arXiv:2603.02711v1 Announce Type: new Abstract: Affective polarization has been central to political and social studies, with growing focus on social media, where partisan divisions are often exacerbated. Real-world studies tend to have limited scope, while simulated studies suffer from insufficient...

1 min 1 month, 1 week ago

labor

LOW Academic International

Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

arXiv:2603.02798v1 Announce Type: new Abstract: As LLM-powered agents have been used for high-stakes decision-making, such as clinical diagnosis, it becomes critical to develop reliable verification of their decisions to facilitate trustworthy deployment. Yet, existing verifiers usually underperform owing to a...

1 min 1 month, 1 week ago

discrimination

LOW Academic International

SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

arXiv:2603.02908v1 Announce Type: new Abstract: In recent years, pre-trained large language models have achieved remarkable success across diverse tasks. Besides the pivotal role of self-supervised pre-training, their effectiveness in downstream applications also depends critically on the post-training process, which adapts...

1 min 1 month, 1 week ago

ada

LOW Academic International

ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization

arXiv:2603.02939v1 Announce Type: new Abstract: Recent advancements in reinforcement fine-tuning have significantly improved the reasoning ability of large language models (LLMs). In particular, methods such as group relative policy optimization (GRPO) have demonstrated strong capabilities across various fields. However, applying...

1 min 1 month, 1 week ago

ada

LOW Academic International

Saarthi for AGI: Towards Domain-Specific General Intelligence for Formal Verification

arXiv:2603.03175v1 Announce Type: new Abstract: Saarthi is an agentic AI framework that uses multi-agent collaboration to perform end-to-end formal verification. Even though the framework provides a complete flow from specification to coverage closure, with around 40% efficacy, there are several...

1 min 1 month, 1 week ago

labor

LOW Academic International

No Memorization, No Detection: Output Distribution-Based Contamination Detection in Small Language Models

arXiv:2603.03203v1 Announce Type: new Abstract: CDD, or Contamination Detection via output Distribution, identifies data contamination by measuring the peakedness of a model's sampled outputs. We study the conditions under which this approach succeeds and fails on small language models ranging...

1 min 1 month, 1 week ago

ada

LOW Academic International

Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals

arXiv:2603.03242v1 Announce Type: new Abstract: Language models deployed in online communities must adapt to norms that vary across social, cultural, and domain-specific contexts. Prior alignment approaches rely on explicit preference supervision or predefined principles, which are effective for well-resourced settings...

1 min 1 month, 1 week ago

ada

LOW Academic International

RO-N3WS: Enhancing Generalization in Low-Resource ASR with Diverse Romanian Speech Benchmarks

arXiv:2603.02368v1 Announce Type: new Abstract: We introduce RO-N3WS, a benchmark Romanian speech dataset designed to improve generalization in automatic speech recognition (ASR), particularly in low-resource and out-of-distribution (OOD) conditions. RO-N3WS comprises over 126 hours of transcribed audio collected from broadcast...

1 min 1 month, 1 week ago

ada

LOW Academic International

GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR

arXiv:2603.02464v1 Announce Type: new Abstract: Automatic Speech Recognition (ASR) in dialect-heavy settings remains challenging due to strong regional variation and limited labeled data. We propose GLoRIA, a parameter-efficient adaptation framework that leverages metadata (e.g., coordinates) to modulate low-rank updates in...

1 min 1 month, 1 week ago

ada

LOW Academic International

MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation

arXiv:2603.03680v1 Announce Type: new Abstract: Large Language Model (LLM) agents have demonstrated remarkable proficiency in learned tasks, yet they often struggle to adapt to non-stationary environments with feedback. While In-Context Learning and external memory offer some flexibility, they fail to...

1 min 1 month, 1 week ago

ada

LOW Academic International

RAGNav: A Retrieval-Augmented Topological Reasoning Framework for Multi-Goal Visual-Language Navigation

arXiv:2603.03745v1 Announce Type: new Abstract: Vision-Language Navigation (VLN) is evolving from single-point pathfinding toward the more challenging Multi-Goal VLN. This task requires agents to accurately identify multiple entities while collaboratively reasoning over their spatial-physical constraints and sequential execution order. However,...

1 min 1 month, 1 week ago

labor

LOW Academic International

In-Context Environments Induce Evaluation-Awareness in Language Models

arXiv:2603.03824v1 Announce Type: new Abstract: Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit environment-dependent evaluation awareness. This raises concerns that models could strategically underperform, or sandbag,...

1 min 1 month, 1 week ago

ada

LOW Academic International

Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions

arXiv:2603.04191v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly serving as personal assistants, where users share complex and diverse preferences over extended interactions. However, assessing how well LLMs can follow these preferences in realistic, long-term situations remains underexplored....

1 min 1 month, 1 week ago

ada

LOW Academic International

Capability Thresholds and Manufacturing Topology: How Embodied Intelligence Triggers Phase Transitions in Economic Geography

arXiv:2603.04457v1 Announce Type: new Abstract: The fundamental topology of manufacturing has not undergone a paradigm-level transformation since Henry Ford's moving assembly line in 1913. Every major innovation of the past century, from the Toyota Production System to Industry 4.0, has...

1 min 1 month, 1 week ago

labor

LOW Academic International

Adaptive Memory Admission Control for LLM Agents

arXiv:2603.04549v1 Announce Type: new Abstract: LLM-based agents increasingly rely on long-term memory to support multi-session reasoning and interaction, yet current systems provide little control over what information is retained. In practice, agents either accumulate large volumes of conversational content, including...

1 min 1 month, 1 week ago

ada

LOW Academic International

VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment

arXiv:2603.04822v1 Announce Type: new Abstract: Aligning Large Language Models (LLMs) with nuanced human values remains a critical challenge, as existing methods like Reinforcement Learning from Human Feedback (RLHF) often handle only coarse-grained attributes. In practice, fine-tuning LLMs on task-specific datasets...

1 min 1 month, 1 week ago

ada

LOW Academic International

SEA-TS: Self-Evolving Agent for Autonomous Code Generation of Time Series Forecasting Algorithms

arXiv:2603.04873v1 Announce Type: new Abstract: Accurate time series forecasting underpins decision-making across domains, yet conventional ML development suffers from data scarcity in new deployments, poor adaptability under distribution shift, and diminishing returns from manual iteration. We propose Self-Evolving Agent for...

1 min 1 month, 1 week ago

ada

LOW Academic International

Bounded State in an Infinite Horizon: Proactive Hierarchical Memory for Ad-Hoc Recall over Streaming Dialogues

arXiv:2603.04885v1 Announce Type: new Abstract: Real-world dialogue usually unfolds as an infinite stream. It thus requires bounded-state memory mechanisms to operate within an infinite horizon. However, existing read-then-think memory is fundamentally misaligned with this setting, as it cannot support ad-hoc...

1 min 1 month, 1 week ago

ada

LOW Academic International

Authorize-on-Demand: Dynamic Authorization with Legality-Aware Intellectual Property Protection for VLMs

arXiv:2603.04896v1 Announce Type: new Abstract: The rapid adoption of vision-language models (VLMs) has heightened the demand for robust intellectual property (IP) protection of these high-value pretrained models. Effective IP protection should proactively confine model deployment within authorized domains and prevent...

1 min 1 month, 1 week ago

ada

LOW Academic International

Knowledge-informed Bidding with Dual-process Control for Online Advertising

arXiv:2603.04920v1 Announce Type: new Abstract: Bid optimization in online advertising relies on black-box machine-learning models that learn bidding decisions from historical data. However, these approaches fail to replicate human experts' adaptive, experience-driven, and globally coherent decisions. Specifically, they generalize poorly...

1 min 1 month, 1 week ago

ada

LOW Academic International

The Trilingual Triad Framework: Integrating Design, AI, and Domain Knowledge in No-code AI Smart City Course

arXiv:2603.05036v1 Announce Type: new Abstract: This paper introduces the "Trilingual Triad" framework, a model that explains how students learn to design with generative artificial intelligence (AI) through the integration of Design, AI, and Domain Knowledge. As generative AI rapidly enters...

1 min 1 month, 1 week ago

labor

LOW Academic International

One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

arXiv:2603.04411v1 Announce Type: new Abstract: Despite the remarkable progress of Large Language Models (LLMs), the escalating memory footprint of the Key-Value (KV) cache remains a critical bottleneck for efficient inference. While dimensionality reduction offers a promising compression avenue, existing approaches...

1 min 1 month, 1 week ago

ada

LOW Academic International

The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning

arXiv:2603.04415v1 Announce Type: new Abstract: While reasoning-enhanced Large Language Models (LLMs) have demonstrated remarkable advances in complex tasks such as mathematics and coding, their effectiveness across universal multimodal scenarios remains uncertain. The trend of releasing parallel "Instruct" and "Thinking" models...

1 min 1 month, 1 week ago

ada

LOW Academic International

Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

arXiv:2603.04421v1 Announce Type: new Abstract: Multi-agent large language model (LLM) systems have emerged as a promising approach for clinical diagnosis, leveraging collaboration among agents to refine medical reasoning. However, most existing frameworks rely on single-vendor teams (e.g., multiple agents from...

1 min 1 month, 1 week ago

labor

LOW Academic International

Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models

arXiv:2603.04453v1 Announce Type: new Abstract: The use of multimodal large language models has become widespread, and as such the study of these models and their failure points has become of utmost importance. We study a novel mode of failure that...

1 min 1 month, 1 week ago

ada

LOW Academic International

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

arXiv:2603.04597v1 Announce Type: new Abstract: Large language models (LLMs) typically receive diverse natural language (NL) feedback through interaction with the environment. However, current reinforcement learning (RL) algorithms rely solely on scalar rewards, leaving the rich information in NL feedback underutilized...

1 min 1 month, 1 week ago

ada
