Immigration Law

LOW Academic European Union

Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

arXiv:2602.16943v1 Announce Type: new Abstract: Large language models deployed as agents increasingly interact with external systems through tool calls--actions with real-world consequences that text outputs alone do not carry. Safety evaluations, however, overwhelmingly measure text-level refusal behavior, leaving a critical...

1 min 2 months ago

ead

LOW Academic United States

HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing

arXiv:2602.16976v1 Announce Type: new Abstract: Here's the corrected paragraph with all punctuation and formatting issues fixed: Financial risk systems usually follow a two-step routine: a model predicts return or risk, and then an optimizer makes a decision such as a...

1 min 2 months ago

ead

LOW Academic United States

M2F: Automated Formalization of Mathematical Literature at Scale

arXiv:2602.17016v1 Announce Type: new Abstract: Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and research papers is largely unaddressed, as it requires managing cross-file dependencies, resolving imports, and ensuring...

1 min 2 months ago

tps

LOW Academic International

Sales Research Agent and Sales Research Bench

arXiv:2602.17017v1 Announce Type: new Abstract: Enterprises increasingly need AI systems that can answer sales-leader questions over live, customized CRM data, but most available models do not expose transparent, repeatable evidence of quality. This paper describes the Sales Research Agent in...

1 min 2 months ago

ead

LOW Academic European Union

IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents

arXiv:2602.17049v1 Announce Type: new Abstract: Computer-use agents operate over long horizons under noisy perception, multi-window contexts, evolving environment states. Existing approaches, from RL-based planners to trajectory retrieval, often drift from user intent and repeatedly solve routine subproblems, leading to error...

1 min 2 months ago

ead

LOW Academic International

RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models

arXiv:2602.17053v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We introduce a formal framework for reasoning faithfulness, defined...

1 min 2 months ago

tps

LOW Academic International

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

arXiv:2602.17062v1 Announce Type: new Abstract: Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often...

1 min 2 months ago

tps

LOW Academic International

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

arXiv:2602.17066v1 Announce Type: new Abstract: We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construction. Unlike curriculum learning approaches that require predefined difficulty metrics or hard...

1 min 2 months ago

ead

LOW Academic United States

Toward Trustworthy Evaluation of Sustainability Rating Methodologies: A Human-AI Collaborative Framework for Benchmark Dataset Construction

arXiv:2602.17106v1 Announce Type: new Abstract: Sustainability or ESG rating agencies use company disclosures and external data to produce scores or ratings that assess the environmental, social, and governance performance of a company. However, sustainability ratings across agencies for a single...

1 min 2 months ago

adjustment

LOW Academic United States

All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting

arXiv:2602.17234v1 Announce Type: new Abstract: To evaluate whether LLMs can accurately predict future events, we need the ability to \textit{backtest} them on events that have already resolved. This requires models to reason only with information available at a specified past...

1 min 2 months ago

ead

LOW Academic International

Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

arXiv:2602.16811v1 Announce Type: new Abstract: Recent advancements in Natural Language Processing and Deep Learning have enabled the development of Large Language Models (LLMs), which have significantly advanced the state-of-the-art across a wide range of tasks, including Question Answering (QA). Despite...

1 min 2 months ago

ead

LOW Academic International

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

arXiv:2602.16852v1 Announce Type: new Abstract: Meenzerisch, the dialect spoken in the German city of Mainz, is also the traditional language of the Mainz carnival, a yearly celebration well known throughout Germany. However, Meenzerisch is on the verge of dying out-a...

1 min 2 months ago

ead

LOW Academic International

ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders

arXiv:2602.16938v1 Announce Type: new Abstract: The promise of LLM-based user simulators to improve conversational AI is hindered by a critical "realism gap," leading to systems that are optimized for simulated interactions, but may fail to perform well in the real...

1 min 2 months ago

ead

LOW Academic International

Eigenmood Space: Uncertainty-Aware Spectral Graph Analysis of Psychological Patterns in Classical Persian Poetry

arXiv:2602.16959v1 Announce Type: new Abstract: Classical Persian poetry is a historically sustained archive in which affective life is expressed through metaphor, intertextual convention, and rhetorical indirection. These properties make close reading indispensable while limiting reproducible comparison at scale. We present...

1 min 2 months ago

ead

LOW Academic International

Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

arXiv:2602.17003v1 Announce Type: new Abstract: Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user...

1 min 2 months ago

tps

LOW Academic International

The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI

arXiv:2602.17127v1 Announce Type: new Abstract: As Large Language Models (LLMs) transition from standalone chat interfaces to foundational reasoning layers in multi-agent systems and recursive evaluation loops (LLM-as-a-judge), the detection of durable, provider-level behavioral signatures becomes a critical requirement for safety...

1 min 2 months ago

ead

LOW Academic International

What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform

arXiv:2602.17194v1 Announce Type: new Abstract: Text-based telemedicine has become a common mode of care, requiring clinicians to deliver medical advice clearly and effectively in writing. As platforms increasingly rely on patient ratings and feedback, clinicians face growing pressure to maintain...

1 min 2 months ago

ead

LOW Academic International

Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study

arXiv:2602.17262v1 Announce Type: new Abstract: Human self-report questionnaires are increasingly used in NLP to benchmark and audit large language models (LLMs), from persona consistency to safety and bias assessments. Yet these instruments presume honest responding; in evaluative contexts, LLMs can...

1 min 2 months ago

ead

LOW Academic International

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

arXiv:2602.17283v1 Announce Type: new Abstract: While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To...

1 min 2 months ago

tps

LOW Academic European Union

Representation Collapse in Machine Translation Through the Lens of Angular Dispersion

arXiv:2602.17287v1 Announce Type: new Abstract: Modern neural translation models based on the Transformer architecture are known for their high performance, particularly when trained on high-resource datasets. A standard next-token prediction training strategy, while widely adopted in practice, may lead to...

1 min 2 months ago

ead

LOW Academic International

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

arXiv:2602.17316v1 Announce Type: new Abstract: The rapid advancement of Large Language Models (LLMs) has established standardized evaluation benchmarks as the primary instrument for model comparison. Yet, their reliability is increasingly questioned due to sensitivity to shallow variations in input prompts....

1 min 2 months ago

ead

LOW Academic European Union

The Role of the Availability Heuristic in Multiple-Choice Answering Behaviour

arXiv:2602.17377v1 Announce Type: new Abstract: When students are unsure of the correct answer to a multiple-choice question (MCQ), guessing is common practice. The availability heuristic, proposed by A. Tversky and D. Kahneman in 1973, suggests that the ease with which...

1 min 2 months ago

ead

LOW Academic United States

Bridging the Domain Divide: Supervised vs. Zero-Shot Clinical Section Segmentation from MIMIC-III to Obstetrics

arXiv:2602.17513v1 Announce Type: new Abstract: Clinical free-text notes contain vital patient information. They are structured into labelled sections; recognizing these sections has been shown to support clinical decision-making and downstream NLP tasks. In this paper, we advance clinical section segmentation...

1 min 2 months ago

ead

LOW Academic International

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

arXiv:2602.17542v1 Announce Type: new Abstract: Fine-grained skill representations, commonly referred to as knowledge components (KCs), are fundamental to many approaches in student modeling and learning analytics. However, KC-level correctness labels are rarely available in real-world datasets, especially for open-ended programming...

1 min 2 months ago

ead

LOW Academic International

Modeling Distinct Human Interaction in Web Agents

arXiv:2602.17588v1 Announce Type: new Abstract: Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene,...

1 min 2 months ago

ead

LOW Academic European Union

Sink-Aware Pruning for Diffusion Language Models

arXiv:2602.17664v1 Announce Type: new Abstract: Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics largely inherited from autoregressive (AR) LLMs, typically preserve attention sink tokens because AR sinks serve as stable...

1 min 2 months ago

tps

LOW Academic United States

Omitted Variable Bias in Language Models Under Distribution Shift

arXiv:2602.16784v1 Announce Type: cross Abstract: Despite their impressive performance on a wide variety of tasks, modern language models remain susceptible to distribution shifts, exhibiting brittle behavior when evaluated on data that differs in distribution from their training data. In this...

1 min 2 months ago

adjustment

LOW Academic International

Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency

arXiv:2602.16787v1 Announce Type: cross Abstract: Despite their strong performance on reasoning benchmarks, large language models (LLMs) have proven brittle when presented with counterfactual questions, suggesting weaknesses in their causal reasoning ability. While recent work has demonstrated that labeled counterfactual tasks...

1 min 2 months ago

ead

LOW Academic International

Hybrid-Gym: Training Coding Agents to Generalize Across Tasks

arXiv:2602.16819v1 Announce Type: cross Abstract: When assessing the quality of coding agents, predominant benchmarks focus on solving single issues on GitHub, such as SWE-Bench. In contrast, in real use, these agents solve more various and complex tasks that involve other...

1 min 2 months ago

tps

LOW Academic International

Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

arXiv:2602.16740v1 Announce Type: new Abstract: In mechanistic interpretability, recent work scrutinizes transformer "circuits" - sparse, mono or multi layer sub computations, that may reflect human understandable functions. Yet, these network circuits are rarely acid-tested for their stability across different instances...

1 min 2 months ago

ead

Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing

M2F: Automated Formalization of Mathematical Literature at Scale

Sales Research Agent and Sales Research Bench

IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents

RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

Toward Trustworthy Evaluation of Sustainability Rating Methodologies: A Human-AI Collaborative Framework for Benchmark Dataset Construction

All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting

Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders

Eigenmood Space: Uncertainty-Aware Spectral Graph Analysis of Psychological Patterns in Classical Persian Poetry

Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI

What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform

Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

Representation Collapse in Machine Translation Through the Lens of Angular Dispersion

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

The Role of the Availability Heuristic in Multiple-Choice Answering Behaviour

Bridging the Domain Divide: Supervised vs. Zero-Shot Clinical Section Segmentation from MIMIC-III to Obstetrics

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

Modeling Distinct Human Interaction in Web Agents

Sink-Aware Pruning for Diffusion Language Models

Omitted Variable Bias in Language Models Under Distribution Shift

Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency

Hybrid-Gym: Training Coding Agents to Generalize Across Tasks

Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

Impact Distribution

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.