Immigration Law

LOW Academic International

Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints

arXiv:2602.15852v1 Announce Type: cross Abstract: Clinical natural language processing (NLP) models have shown promise for supporting hospital discharge planning by leveraging narrative clinical documentation. However, note-based models are particularly vulnerable to temporal and lexical leakage, where documentation artifacts encode future...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Playing With AI: How Do State-Of-The-Art Large Language Models Perform in the 1977 Text-Based Adventure Game Zork?

arXiv:2602.15867v1 Announce Type: cross Abstract: In this positioning paper, we evaluate the problem-solving and reasoning capabilities of contemporary Large Language Models (LLMs) through their performance in Zork, the seminal text-based adventure game first released in 1977. The game's dialogue-based structure...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution

arXiv:2602.15882v1 Announce Type: cross Abstract: General vision-language models increasingly support unified spatiotemporal reasoning over long video streams, yet deploying such capabilities on robots remains constrained by the prohibitive latency of processing long-horizon histories and generating high-dimensional future predictions. To bridge...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Understand Then Memory: A Cognitive Gist-Driven RAG Framework with Global Semantic Diffusion

arXiv:2602.15895v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) effectively mitigates hallucinations in LLMs by incorporating external knowledge. However, the inherent discrete representation of text in existing frameworks often results in a loss of semantic integrity, leading to retrieval deviations. Inspired...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Sales Research Agent and Sales Research Bench

arXiv:2602.17017v1 Announce Type: new Abstract: Enterprises increasingly need AI systems that can answer sales-leader questions over live, customized CRM data, but most available models do not expose transparent, repeatable evidence of quality. This paper describes the Sales Research Agent in...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models

arXiv:2602.17053v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We introduce a formal framework for reasoning faithfulness, defined...

1 min 1 month, 4 weeks ago

tps

LOW Academic International

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

arXiv:2602.17062v1 Announce Type: new Abstract: Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often...

1 min 1 month, 4 weeks ago

tps

LOW Academic International

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

arXiv:2602.17066v1 Announce Type: new Abstract: We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construction. Unlike curriculum learning approaches that require predefined difficulty metrics or hard...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

arXiv:2602.16811v1 Announce Type: new Abstract: Recent advancements in Natural Language Processing and Deep Learning have enabled the development of Large Language Models (LLMs), which have significantly advanced the state-of-the-art across a wide range of tasks, including Question Answering (QA). Despite...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

arXiv:2602.16852v1 Announce Type: new Abstract: Meenzerisch, the dialect spoken in the German city of Mainz, is also the traditional language of the Mainz carnival, a yearly celebration well known throughout Germany. However, Meenzerisch is on the verge of dying out-a...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders

arXiv:2602.16938v1 Announce Type: new Abstract: The promise of LLM-based user simulators to improve conversational AI is hindered by a critical "realism gap," leading to systems that are optimized for simulated interactions, but may fail to perform well in the real...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Eigenmood Space: Uncertainty-Aware Spectral Graph Analysis of Psychological Patterns in Classical Persian Poetry

arXiv:2602.16959v1 Announce Type: new Abstract: Classical Persian poetry is a historically sustained archive in which affective life is expressed through metaphor, intertextual convention, and rhetorical indirection. These properties make close reading indispensable while limiting reproducible comparison at scale. We present...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

arXiv:2602.17003v1 Announce Type: new Abstract: Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user...

1 min 1 month, 4 weeks ago

tps

LOW Academic International

The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI

arXiv:2602.17127v1 Announce Type: new Abstract: As Large Language Models (LLMs) transition from standalone chat interfaces to foundational reasoning layers in multi-agent systems and recursive evaluation loops (LLM-as-a-judge), the detection of durable, provider-level behavioral signatures becomes a critical requirement for safety...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform

arXiv:2602.17194v1 Announce Type: new Abstract: Text-based telemedicine has become a common mode of care, requiring clinicians to deliver medical advice clearly and effectively in writing. As platforms increasingly rely on patient ratings and feedback, clinicians face growing pressure to maintain...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study

arXiv:2602.17262v1 Announce Type: new Abstract: Human self-report questionnaires are increasingly used in NLP to benchmark and audit large language models (LLMs), from persona consistency to safety and bias assessments. Yet these instruments presume honest responding; in evaluative contexts, LLMs can...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

arXiv:2602.17283v1 Announce Type: new Abstract: While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To...

1 min 1 month, 4 weeks ago

tps

LOW Academic International

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

arXiv:2602.17316v1 Announce Type: new Abstract: The rapid advancement of Large Language Models (LLMs) has established standardized evaluation benchmarks as the primary instrument for model comparison. Yet, their reliability is increasingly questioned due to sensitivity to shallow variations in input prompts....

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

arXiv:2602.17542v1 Announce Type: new Abstract: Fine-grained skill representations, commonly referred to as knowledge components (KCs), are fundamental to many approaches in student modeling and learning analytics. However, KC-level correctness labels are rarely available in real-world datasets, especially for open-ended programming...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Modeling Distinct Human Interaction in Web Agents

arXiv:2602.17588v1 Announce Type: new Abstract: Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene,...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency

arXiv:2602.16787v1 Announce Type: cross Abstract: Despite their strong performance on reasoning benchmarks, large language models (LLMs) have proven brittle when presented with counterfactual questions, suggesting weaknesses in their causal reasoning ability. While recent work has demonstrated that labeled counterfactual tasks...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Hybrid-Gym: Training Coding Agents to Generalize Across Tasks

arXiv:2602.16819v1 Announce Type: cross Abstract: When assessing the quality of coding agents, predominant benchmarks focus on solving single issues on GitHub, such as SWE-Bench. In contrast, in real use, these agents solve more various and complex tasks that involve other...

1 min 1 month, 4 weeks ago

tps

LOW Academic International

Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

arXiv:2602.16740v1 Announce Type: new Abstract: In mechanistic interpretability, recent work scrutinizes transformer "circuits" - sparse, mono or multi layer sub computations, that may reflect human understandable functions. Yet, these network circuits are rarely acid-tested for their stability across different instances...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

arXiv:2602.16742v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has been shown effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or...

1 min 1 month, 4 weeks ago

tps

LOW Academic International

PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency

arXiv:2602.16745v1 Announce Type: new Abstract: Test-time scaling can improve model performance by aggregating stochastic reasoning trajectories. However, achieving sample-efficient test-time self-consistency under a limited budget remains an open challenge. We introduce PETS (Principled and Efficient Test-TimeSelf-Consistency), which initiates a principled...

1 min 1 month, 4 weeks ago

tps

LOW Academic International

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

arXiv:2602.16746v1 Announce Type: new Abstract: Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study

arXiv:2602.16833v1 Announce Type: new Abstract: Exploration remains a key bottleneck for reinforcement learning (RL) post-training of large language models (LLMs), where sparse feedback and large action spaces can lead to premature collapse into repetitive behaviors. We propose Verbalized Action Masking...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Early-Warning Signals of Grokking via Loss-Landscape Geometry

arXiv:2602.16967v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization after prolonged training -- has been linked to confinement on low-dimensional execution manifolds in modular arithmetic. Whether this mechanism extends beyond arithmetic remains open. We study...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

Fail-Closed Alignment for Large Language Models

arXiv:2602.16977v1 Announce Type: new Abstract: We identify a structural weakness in current large language model (LLM) alignment: modern refusal mechanisms are fail-open. While existing approaches encode refusal behaviors across multiple latent features, suppressing a single dominant feature$-$via prompt-based jailbreaks$-$can cause...

1 min 1 month, 4 weeks ago

ead

LOW Academic International

WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning

arXiv:2602.17025v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) is effective for training language models on complex reasoning. However, since the objective is defined relative to a group of sampled trajectories, extended deliberation can create more chances to realize...

1 min 1 month, 4 weeks ago

ead

Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints

Playing With AI: How Do State-Of-The-Art Large Language Models Perform in the 1977 Text-Based Adventure Game Zork?

FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution

Understand Then Memory: A Cognitive Gist-Driven RAG Framework with Global Semantic Diffusion

Sales Research Agent and Sales Research Bench

RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders

Eigenmood Space: Uncertainty-Aware Spectral Graph Analysis of Psychological Patterns in Classical Persian Poetry

Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI

What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform

Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

Modeling Distinct Human Interaction in Web Agents

Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency

Hybrid-Gym: Training Coding Agents to Generalize Across Tasks

Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study

Early-Warning Signals of Grokking via Loss-Landscape Geometry

Fail-Closed Alignment for Large Language Models

WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning

Impact Distribution

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.