Immigration Law

LOW Academic United States

Genetic Generalized Additive Models

arXiv:2602.15877v1 Announce Type: cross Abstract: Generalized Additive Models (GAMs) balance predictive accuracy and interpretability, but manually configuring their structure is challenging. We propose using the multi-objective genetic algorithm NSGA-II to automatically optimize GAMs, jointly minimizing prediction error (RMSE) and a...

LOW Academic International

FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution

arXiv:2602.15882v1 Announce Type: cross Abstract: General vision-language models increasingly support unified spatiotemporal reasoning over long video streams, yet deploying such capabilities on robots remains constrained by the prohibitive latency of processing long-horizon histories and generating high-dimensional future predictions. To bridge...

LOW Academic International

Understand Then Memory: A Cognitive Gist-Driven RAG Framework with Global Semantic Diffusion

arXiv:2602.15895v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) effectively mitigates hallucinations in LLMs by incorporating external knowledge. However, the inherent discrete representation of text in existing frameworks often results in a loss of semantic integrity, leading to retrieval deviations. Inspired...

LOW Academic United States

Fairness, accountability and transparency: notes on algorithmic decision-making in criminal justice

Abstract: Over the last few years, legal scholars, policy-makers, activists and others have generated a vast and rapidly expanding literature concerning the ethical ramifications of using artificial intelligence, machine learning, big data and predictive software in criminal justice contexts. These concerns...

LOW Academic United States

Simple Baselines are Competitive with Code Evolution

arXiv:2602.16805v1 Announce Type: new Abstract: Code evolution is a family of techniques that rely on large language models to search through possible computer programs by evolving or mutating existing code. Many proposed code evolution pipelines show impressive performance but are...

LOW Academic United States

AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

arXiv:2602.16901v1 Announce Type: new Abstract: LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user-agent-environment interactions to achieve objectives infeasible in single-turn settings. To measure...

LOW Academic European Union

DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

arXiv:2602.16935v1 Announce Type: new Abstract: While Large Language Model (LLM) capabilities have scaled, safety guardrails remain largely stateless, treating multi-turn dialogues as a series of disconnected events. This lack of temporal awareness facilitates a "Safety Gap" where adversarial tactics, like...

LOW Academic European Union

Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

arXiv:2602.16943v1 Announce Type: new Abstract: Large language models deployed as agents increasingly interact with external systems through tool calls--actions with real-world consequences that text outputs alone do not carry. Safety evaluations, however, overwhelmingly measure text-level refusal behavior, leaving a critical...

LOW Academic United States

HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing

arXiv:2602.16976v1 Announce Type: new Abstract: Financial risk systems usually follow a two-step routine: a model predicts return or risk, and then an optimizer makes a decision such as a...

LOW Academic United States

M2F: Automated Formalization of Mathematical Literature at Scale

arXiv:2602.17016v1 Announce Type: new Abstract: Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and research papers is largely unaddressed, as it requires managing cross-file dependencies, resolving imports, and ensuring...

LOW Academic International

Sales Research Agent and Sales Research Bench

arXiv:2602.17017v1 Announce Type: new Abstract: Enterprises increasingly need AI systems that can answer sales-leader questions over live, customized CRM data, but most available models do not expose transparent, repeatable evidence of quality. This paper describes the Sales Research Agent in...

LOW Academic European Union

IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents

arXiv:2602.17049v1 Announce Type: new Abstract: Computer-use agents operate over long horizons under noisy perception, multi-window contexts, and evolving environment states. Existing approaches, from RL-based planners to trajectory retrieval, often drift from user intent and repeatedly solve routine subproblems, leading to error...

LOW Academic International

RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models

arXiv:2602.17053v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We introduce a formal framework for reasoning faithfulness, defined...

LOW Academic International

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

arXiv:2602.17062v1 Announce Type: new Abstract: Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often...

LOW Academic International

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

arXiv:2602.17066v1 Announce Type: new Abstract: We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construction. Unlike curriculum learning approaches that require predefined difficulty metrics or hard...

LOW Academic United States

Toward Trustworthy Evaluation of Sustainability Rating Methodologies: A Human-AI Collaborative Framework for Benchmark Dataset Construction

arXiv:2602.17106v1 Announce Type: new Abstract: Sustainability or ESG rating agencies use company disclosures and external data to produce scores or ratings that assess the environmental, social, and governance performance of a company. However, sustainability ratings across agencies for a single...

LOW Academic United States

All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting

arXiv:2602.17234v1 Announce Type: new Abstract: To evaluate whether LLMs can accurately predict future events, we need the ability to backtest them on events that have already resolved. This requires models to reason only with information available at a specified past...

LOW Academic International

Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

arXiv:2602.16811v1 Announce Type: new Abstract: Recent advancements in Natural Language Processing and Deep Learning have enabled the development of Large Language Models (LLMs), which have significantly advanced the state-of-the-art across a wide range of tasks, including Question Answering (QA). Despite...

LOW Academic International

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

arXiv:2602.16852v1 Announce Type: new Abstract: Meenzerisch, the dialect spoken in the German city of Mainz, is also the traditional language of the Mainz carnival, a yearly celebration well known throughout Germany. However, Meenzerisch is on the verge of dying out-a...

LOW Academic International

ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders

arXiv:2602.16938v1 Announce Type: new Abstract: The promise of LLM-based user simulators to improve conversational AI is hindered by a critical "realism gap," leading to systems that are optimized for simulated interactions, but may fail to perform well in the real...

LOW Academic International

Eigenmood Space: Uncertainty-Aware Spectral Graph Analysis of Psychological Patterns in Classical Persian Poetry

arXiv:2602.16959v1 Announce Type: new Abstract: Classical Persian poetry is a historically sustained archive in which affective life is expressed through metaphor, intertextual convention, and rhetorical indirection. These properties make close reading indispensable while limiting reproducible comparison at scale. We present...

LOW Academic International

Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

arXiv:2602.17003v1 Announce Type: new Abstract: Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user...

LOW Academic International

The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI

arXiv:2602.17127v1 Announce Type: new Abstract: As Large Language Models (LLMs) transition from standalone chat interfaces to foundational reasoning layers in multi-agent systems and recursive evaluation loops (LLM-as-a-judge), the detection of durable, provider-level behavioral signatures becomes a critical requirement for safety...

LOW Academic International

What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform

arXiv:2602.17194v1 Announce Type: new Abstract: Text-based telemedicine has become a common mode of care, requiring clinicians to deliver medical advice clearly and effectively in writing. As platforms increasingly rely on patient ratings and feedback, clinicians face growing pressure to maintain...

LOW Academic International

Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study

arXiv:2602.17262v1 Announce Type: new Abstract: Human self-report questionnaires are increasingly used in NLP to benchmark and audit large language models (LLMs), from persona consistency to safety and bias assessments. Yet these instruments presume honest responding; in evaluative contexts, LLMs can...

LOW Academic International

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

arXiv:2602.17283v1 Announce Type: new Abstract: While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To...

LOW Academic European Union

Representation Collapse in Machine Translation Through the Lens of Angular Dispersion

arXiv:2602.17287v1 Announce Type: new Abstract: Modern neural translation models based on the Transformer architecture are known for their high performance, particularly when trained on high-resource datasets. A standard next-token prediction training strategy, while widely adopted in practice, may lead to...

LOW Academic International

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

arXiv:2602.17316v1 Announce Type: new Abstract: The rapid advancement of Large Language Models (LLMs) has established standardized evaluation benchmarks as the primary instrument for model comparison. Yet, their reliability is increasingly questioned due to sensitivity to shallow variations in input prompts....

LOW Academic European Union

The Role of the Availability Heuristic in Multiple-Choice Answering Behaviour

arXiv:2602.17377v1 Announce Type: new Abstract: When students are unsure of the correct answer to a multiple-choice question (MCQ), guessing is common practice. The availability heuristic, proposed by A. Tversky and D. Kahneman in 1973, suggests that the ease with which...

LOW Academic United States

Bridging the Domain Divide: Supervised vs. Zero-Shot Clinical Section Segmentation from MIMIC-III to Obstetrics

arXiv:2602.17513v1 Announce Type: new Abstract: Clinical free-text notes contain vital patient information. They are structured into labelled sections; recognizing these sections has been shown to support clinical decision-making and downstream NLP tasks. In this paper, we advance clinical section segmentation...
