International Law

LOW Academic International

Sales Research Agent and Sales Research Bench

arXiv:2602.17017v1 Announce Type: new Abstract: Enterprises increasingly need AI systems that can answer sales-leader questions over live, customized CRM data, but most available models do not expose transparent, repeatable evidence of quality. This paper describes the Sales Research Agent in...

1 min 2 months ago

ear

LOW Academic International

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

arXiv:2602.17062v1 Announce Type: new Abstract: Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often...

1 min 2 months ago

ear

LOW Academic International

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

arXiv:2602.17066v1 Announce Type: new Abstract: We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construction. Unlike curriculum learning approaches that require predefined difficulty metrics or hard...

1 min 2 months ago

ear

LOW Academic International

Instructor-Aligned Knowledge Graphs for Personalized Learning

arXiv:2602.17111v1 Announce Type: new Abstract: Mastering educational concepts requires understanding both their prerequisites (e.g., recursion before merge sort) and sub-concepts (e.g., merge sort as part of sorting algorithms). Capturing these dependencies is critical for identifying students' knowledge gaps and enabling...

1 min 2 months ago

ear

LOW Academic International

References Improve LLM Alignment in Non-Verifiable Domains

arXiv:2602.16802v1 Announce Type: new Abstract: While Reinforcement Learning with Verifiable Rewards (RLVR) has shown strong effectiveness in reasoning tasks, it cannot be directly applied to non-verifiable domains lacking ground-truth verifiers, such as LLM alignment. In this work, we investigate whether...

1 min 2 months ago

ear

LOW Academic International

Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

arXiv:2602.16811v1 Announce Type: new Abstract: Recent advancements in Natural Language Processing and Deep Learning have enabled the development of Large Language Models (LLMs), which have significantly advanced the state-of-the-art across a wide range of tasks, including Question Answering (QA). Despite...

1 min 2 months ago

ear

LOW Academic International

Claim Automation using Large Language Model

arXiv:2602.16836v1 Announce Type: new Abstract: While Large Language Models (LLMs) have achieved strong performance on general-purpose language tasks, their deployment in regulated and data-sensitive domains, including insurance, remains limited. Leveraging millions of historical warranty claims, we propose a locally deployed...

1 min 2 months ago

ear

LOW Academic International

BanglaSummEval: Reference-Free Factual Consistency Evaluation for Bangla Summarization

arXiv:2602.16843v1 Announce Type: new Abstract: Evaluating factual consistency is essential for reliable text summarization, particularly in high-stakes domains such as healthcare and news. However, most existing evaluation metrics overlook Bangla, a widely spoken yet under-resourced language, and often depend on...

1 min 2 months ago

ear

LOW Academic International

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

arXiv:2602.16852v1 Announce Type: new Abstract: Meenzerisch, the dialect spoken in the German city of Mainz, is also the traditional language of the Mainz carnival, a yearly celebration well known throughout Germany. However, Meenzerisch is on the verge of dying out-a...

1 min 2 months ago

ear

LOW Academic International

Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests

arXiv:2602.17108v1 Announce Type: new Abstract: Thematic Apperception Test (TAT) is a psychometrically grounded, multidimensional assessment framework that systematically differentiates between cognitive-representational and affective-relational components of personality-like functioning. This test is a projective psychological framework designed to uncover unconscious aspects of...

1 min 2 months ago

ear

LOW Academic International

What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform

arXiv:2602.17194v1 Announce Type: new Abstract: Text-based telemedicine has become a common mode of care, requiring clinicians to deliver medical advice clearly and effectively in writing. As platforms increasingly rely on patient ratings and feedback, clinicians face growing pressure to maintain...

1 min 2 months ago

ear

LOW Academic International

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

arXiv:2602.17283v1 Announce Type: new Abstract: While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To...

1 min 2 months ago

human rights

LOW Academic International

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

arXiv:2602.17316v1 Announce Type: new Abstract: The rapid advancement of Large Language Models (LLMs) has established standardized evaluation benchmarks as the primary instrument for model comparison. Yet, their reliability is increasingly questioned due to sensitivity to shallow variations in input prompts....

1 min 2 months ago

ear

LOW Academic International

RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering

arXiv:2602.17366v1 Announce Type: new Abstract: Long-tail question answering presents significant challenges for large language models (LLMs) due to their limited ability to acquire and accurately recall less common knowledge. Retrieval-augmented generation (RAG) systems have shown great promise in mitigating this...

1 min 2 months ago

ear

LOW Academic International

Diverse Word Choices, Same Reference: Annotating Lexically-Rich Cross-Document Coreference

arXiv:2602.17424v1 Announce Type: new Abstract: Cross-document coreference resolution (CDCR) identifies and links mentions of the same entities and events across related documents, enabling content analysis that aggregates information at the level of discourse participants. However, existing datasets primarily focus on...

1 min 2 months ago

ear

LOW Academic International

AIDG: Evaluating Asymmetry Between Information Extraction and Containment in Multi-Turn Dialogue

arXiv:2602.17443v1 Announce Type: new Abstract: Evaluating the strategic reasoning capabilities of Large Language Models (LLMs) requires moving beyond static benchmarks to dynamic, multi-turn interactions. We introduce AIDG (Adversarial Information Deduction Game), a game-theoretic framework that probes the asymmetry between information...

1 min 2 months ago

ear

LOW Academic International

Auditing Reciprocal Sentiment Alignment: Inversion Risk, Dialect Representation and Intent Misalignment in Transformers

arXiv:2602.17469v1 Announce Type: new Abstract: The core theme of bidirectional alignment is ensuring that AI systems accurately understand human intent and that humans can trust AI behavior. However, this loop fractures significantly across language barriers. Our research addresses Cross-Lingual Sentiment...

1 min 2 months ago

ear

LOW Academic International

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

arXiv:2602.17542v1 Announce Type: new Abstract: Fine-grained skill representations, commonly referred to as knowledge components (KCs), are fundamental to many approaches in student modeling and learning analytics. However, KC-level correctness labels are rarely available in real-world datasets, especially for open-ended programming...

1 min 2 months ago

ear

LOW Academic International

Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning

arXiv:2602.17546v1 Announce Type: new Abstract: Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and worsen under adversarial updates. Existing defenses often offer limited protection or force a trade-off between...

1 min 2 months ago

ear

LOW Academic International

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?

arXiv:2602.17598v1 Announce Type: new Abstract: Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple Whisper$\to$LLM cascades. We show this through matched-backbone testing across four speech LLMs and six...

1 min 2 months ago

ear

LOW Academic International

Unmasking the Factual-Conceptual Gap in Persian Language Models

arXiv:2602.17623v1 Announce Type: new Abstract: While emerging Persian NLP benchmarks have expanded into pragmatics and politeness, they rarely distinguish between memorized cultural facts and the ability to reason about implicit social norms. We introduce DivanBench, a diagnostic benchmark focused on...

1 min 2 months ago

ear

LOW Academic International

Differences in Typological Alignment in Language Models' Treatment of Differential Argument Marking

arXiv:2602.17653v1 Announce Type: new Abstract: Recent work has shown that language models (LMs) trained on synthetic corpora can exhibit typological preferences that resemble cross-linguistic regularities in human languages, particularly for syntactic phenomena such as word order. In this paper, we...

1 min 2 months ago

ear

LOW Academic International

What Language is This? Ask Your Tokenizer

arXiv:2602.17655v1 Announce Type: new Abstract: Language Identification (LID) is an important component of many multilingual natural language processing pipelines, where it facilitates corpus curation, training data analysis, and cross-lingual evaluation of large language models. Despite near-perfect performance on high-resource languages,...

1 min 2 months ago

ear

LOW Academic International

Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency

arXiv:2602.16787v1 Announce Type: cross Abstract: Despite their strong performance on reasoning benchmarks, large language models (LLMs) have proven brittle when presented with counterfactual questions, suggesting weaknesses in their causal reasoning ability. While recent work has demonstrated that labeled counterfactual tasks...

1 min 2 months ago

ear

LOW Academic International

Hybrid-Gym: Training Coding Agents to Generalize Across Tasks

arXiv:2602.16819v1 Announce Type: cross Abstract: When assessing the quality of coding agents, predominant benchmarks focus on solving single issues on GitHub, such as SWE-Bench. In contrast, in real use, these agents solve more various and complex tasks that involve other...

1 min 2 months ago

ear

LOW Academic International

Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

arXiv:2602.16740v1 Announce Type: new Abstract: In mechanistic interpretability, recent work scrutinizes transformer "circuits" - sparse, mono or multi layer sub computations, that may reflect human understandable functions. Yet, these network circuits are rarely acid-tested for their stability across different instances...

1 min 2 months ago

ear

LOW Academic International

DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

arXiv:2602.16742v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has been shown effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or...

1 min 2 months ago

ear

LOW Academic International

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

arXiv:2602.16746v1 Announce Type: new Abstract: Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight...

1 min 2 months ago

ear

LOW Academic International

Attending to Routers Aids Indoor Wireless Localization

arXiv:2602.16762v1 Announce Type: new Abstract: Modern machine learning-based wireless localization using Wi-Fi signals continues to face significant challenges in achieving groundbreaking performance across diverse environments. A major limitation is that most existing algorithms do not appropriately weight the information from...

1 min 2 months ago

ear

LOW Academic International

Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models

arXiv:2602.16793v1 Announce Type: new Abstract: In the past year, custom and unreleased math reasoning models reached gold medal performance on the International Mathematical Olympiad (IMO). Similar performance was then reported using large-scale inference on publicly available models but at prohibitive...

1 min 2 months ago

ear

Sales Research Agent and Sales Research Bench

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

Instructor-Aligned Knowledge Graphs for Personalized Learning

References Improve LLM Alignment in Non-Verifiable Domains

Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

Claim Automation using Large Language Model

BanglaSummEval: Reference-Free Factual Consistency Evaluation for Bangla Summarization

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests

What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering

Diverse Word Choices, Same Reference: Annotating Lexically-Rich Cross-Document Coreference

AIDG: Evaluating Asymmetry Between Information Extraction and Containment in Multi-Turn Dialogue

Auditing Reciprocal Sentiment Alignment: Inversion Risk, Dialect Representation and Intent Misalignment in Transformers

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?

Unmasking the Factual-Conceptual Gap in Persian Language Models

Differences in Typological Alignment in Language Models' Treatment of Differential Argument Marking

What Language is This? Ask Your Tokenizer

Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency

Hybrid-Gym: Training Coding Agents to Generalize Across Tasks

Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

Attending to Routers Aids Indoor Wireless Localization

Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models

Impact Distribution

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.