International Law

LOW Academic International

Claim Automation using Large Language Model

arXiv:2602.16836v1 Announce Type: new Abstract: While Large Language Models (LLMs) have achieved strong performance on general-purpose language tasks, their deployment in regulated and data-sensitive domains, including insurance, remains limited. Leveraging millions of historical warranty claims, we propose a locally deployed...

1 min 2 months, 1 week ago

ear

LOW Academic International

BanglaSummEval: Reference-Free Factual Consistency Evaluation for Bangla Summarization

arXiv:2602.16843v1 Announce Type: new Abstract: Evaluating factual consistency is essential for reliable text summarization, particularly in high-stakes domains such as healthcare and news. However, most existing evaluation metrics overlook Bangla, a widely spoken yet under-resourced language, and often depend on...

1 min 2 months, 1 week ago

ear

LOW Academic International

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

arXiv:2602.16852v1 Announce Type: new Abstract: Meenzerisch, the dialect spoken in the German city of Mainz, is also the traditional language of the Mainz carnival, a yearly celebration well known throughout Germany. However, Meenzerisch is on the verge of dying out-a...

1 min 2 months, 1 week ago

ear

LOW Academic South Korea

Evaluating Cross-Lingual Classification Approaches Enabling Topic Discovery for Multilingual Social Media Data

arXiv:2602.17051v1 Announce Type: new Abstract: Analysing multilingual social media discourse remains a major challenge in natural language processing, particularly when large-scale public debates span across diverse languages. This study investigates how different approaches for cross-lingual text classification can support reliable...

1 min 2 months, 1 week ago

ear

LOW Academic United States

BankMathBench: A Benchmark for Numerical Reasoning in Banking Scenarios

arXiv:2602.17072v1 Announce Type: new Abstract: Large language models (LLMs)-based chatbots are increasingly being adopted in the financial domain, particularly in digital banking, to handle customer inquiries about products such as deposits, savings, and loans. However, these models still exhibit low...

1 min 2 months, 1 week ago

ear

LOW Academic International

Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests

arXiv:2602.17108v1 Announce Type: new Abstract: Thematic Apperception Test (TAT) is a psychometrically grounded, multidimensional assessment framework that systematically differentiates between cognitive-representational and affective-relational components of personality-like functioning. This test is a projective psychological framework designed to uncover unconscious aspects of...

1 min 2 months, 1 week ago

ear

LOW Academic International

What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform

arXiv:2602.17194v1 Announce Type: new Abstract: Text-based telemedicine has become a common mode of care, requiring clinicians to deliver medical advice clearly and effectively in writing. As platforms increasingly rely on patient ratings and feedback, clinicians face growing pressure to maintain...

1 min 2 months, 1 week ago

ear

LOW Academic International

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

arXiv:2602.17283v1 Announce Type: new Abstract: While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To...

1 min 2 months, 1 week ago

human rights

LOW Academic International

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

arXiv:2602.17316v1 Announce Type: new Abstract: The rapid advancement of Large Language Models (LLMs) has established standardized evaluation benchmarks as the primary instrument for model comparison. Yet, their reliability is increasingly questioned due to sensitivity to shallow variations in input prompts....

1 min 2 months, 1 week ago

ear

LOW Academic International

RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering

arXiv:2602.17366v1 Announce Type: new Abstract: Long-tail question answering presents significant challenges for large language models (LLMs) due to their limited ability to acquire and accurately recall less common knowledge. Retrieval-augmented generation (RAG) systems have shown great promise in mitigating this...

1 min 2 months, 1 week ago

ear

LOW Academic International

Diverse Word Choices, Same Reference: Annotating Lexically-Rich Cross-Document Coreference

arXiv:2602.17424v1 Announce Type: new Abstract: Cross-document coreference resolution (CDCR) identifies and links mentions of the same entities and events across related documents, enabling content analysis that aggregates information at the level of discourse participants. However, existing datasets primarily focus on...

1 min 2 months, 1 week ago

ear

LOW Academic International

AIDG: Evaluating Asymmetry Between Information Extraction and Containment in Multi-Turn Dialogue

arXiv:2602.17443v1 Announce Type: new Abstract: Evaluating the strategic reasoning capabilities of Large Language Models (LLMs) requires moving beyond static benchmarks to dynamic, multi-turn interactions. We introduce AIDG (Adversarial Information Deduction Game), a game-theoretic framework that probes the asymmetry between information...

1 min 2 months, 1 week ago

ear

LOW Academic International

Auditing Reciprocal Sentiment Alignment: Inversion Risk, Dialect Representation and Intent Misalignment in Transformers

arXiv:2602.17469v1 Announce Type: new Abstract: The core theme of bidirectional alignment is ensuring that AI systems accurately understand human intent and that humans can trust AI behavior. However, this loop fractures significantly across language barriers. Our research addresses Cross-Lingual Sentiment...

1 min 2 months, 1 week ago

ear

LOW Academic International

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

arXiv:2602.17542v1 Announce Type: new Abstract: Fine-grained skill representations, commonly referred to as knowledge components (KCs), are fundamental to many approaches in student modeling and learning analytics. However, KC-level correctness labels are rarely available in real-world datasets, especially for open-ended programming...

1 min 2 months, 1 week ago

ear

LOW Academic International

Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning

arXiv:2602.17546v1 Announce Type: new Abstract: Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and worsen under adversarial updates. Existing defenses often offer limited protection or force a trade-off between...

1 min 2 months, 1 week ago

ear

LOW Academic International

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?

arXiv:2602.17598v1 Announce Type: new Abstract: Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple Whisper$\to$LLM cascades. We show this through matched-backbone testing across four speech LLMs and six...

1 min 2 months, 1 week ago

ear

LOW Academic International

Unmasking the Factual-Conceptual Gap in Persian Language Models

arXiv:2602.17623v1 Announce Type: new Abstract: While emerging Persian NLP benchmarks have expanded into pragmatics and politeness, they rarely distinguish between memorized cultural facts and the ability to reason about implicit social norms. We introduce DivanBench, a diagnostic benchmark focused on...

1 min 2 months, 1 week ago

ear

LOW Academic International

Differences in Typological Alignment in Language Models' Treatment of Differential Argument Marking

arXiv:2602.17653v1 Announce Type: new Abstract: Recent work has shown that language models (LMs) trained on synthetic corpora can exhibit typological preferences that resemble cross-linguistic regularities in human languages, particularly for syntactic phenomena such as word order. In this paper, we...

1 min 2 months, 1 week ago

ear

LOW Academic International

What Language is This? Ask Your Tokenizer

arXiv:2602.17655v1 Announce Type: new Abstract: Language Identification (LID) is an important component of many multilingual natural language processing pipelines, where it facilitates corpus curation, training data analysis, and cross-lingual evaluation of large language models. Despite near-perfect performance on high-resource languages,...

1 min 2 months, 1 week ago

ear

LOW Academic International

Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency

arXiv:2602.16787v1 Announce Type: cross Abstract: Despite their strong performance on reasoning benchmarks, large language models (LLMs) have proven brittle when presented with counterfactual questions, suggesting weaknesses in their causal reasoning ability. While recent work has demonstrated that labeled counterfactual tasks...

1 min 2 months, 1 week ago

ear

LOW Academic International

Hybrid-Gym: Training Coding Agents to Generalize Across Tasks

arXiv:2602.16819v1 Announce Type: cross Abstract: When assessing the quality of coding agents, predominant benchmarks focus on solving single issues on GitHub, such as SWE-Bench. In contrast, in real use, these agents solve more various and complex tasks that involve other...

1 min 2 months, 1 week ago

ear

LOW Academic United States

MMCAformer: Macro-Micro Cross-Attention Transformer for Traffic Speed Prediction with Microscopic Connected Vehicle Driving Behavior

arXiv:2602.16730v1 Announce Type: new Abstract: Accurate speed prediction is crucial for proactive traffic management to enhance traffic efficiency and safety. Existing studies have primarily relied on aggregated, macroscopic traffic flow data to predict future traffic trends, whereas road traffic dynamics...

1 min 2 months, 1 week ago

ear

LOW Academic United States

A Few-Shot LLM Framework for Extreme Day Classification in Electricity Markets

arXiv:2602.16735v1 Announce Type: new Abstract: This paper proposes a few-shot classification framework based on Large Language Models (LLMs) to predict whether the next day will have spikes in real-time electricity prices. The approach aggregates system state information, including electricity demand,...

1 min 2 months, 1 week ago

ear

LOW Academic United States

Real-time Secondary Crash Likelihood Prediction Excluding Post Primary Crash Features

arXiv:2602.16739v1 Announce Type: new Abstract: Secondary crash likelihood prediction is a critical component of an active traffic management system to mitigate congestion and adverse impacts caused by secondary crashes. However, existing approaches mainly rely on post-crash features (e.g., crash type...

1 min 2 months, 1 week ago

ear

LOW Academic International

Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

arXiv:2602.16740v1 Announce Type: new Abstract: In mechanistic interpretability, recent work scrutinizes transformer "circuits" - sparse, mono or multi layer sub computations, that may reflect human understandable functions. Yet, these network circuits are rarely acid-tested for their stability across different instances...

1 min 2 months, 1 week ago

ear

LOW Academic International

DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

arXiv:2602.16742v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has been shown effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or...

1 min 2 months, 1 week ago

ear

LOW Academic International

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

arXiv:2602.16746v1 Announce Type: new Abstract: Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight...

1 min 2 months, 1 week ago

ear

LOW Academic International

Attending to Routers Aids Indoor Wireless Localization

arXiv:2602.16762v1 Announce Type: new Abstract: Modern machine learning-based wireless localization using Wi-Fi signals continues to face significant challenges in achieving groundbreaking performance across diverse environments. A major limitation is that most existing algorithms do not appropriately weight the information from...

1 min 2 months, 1 week ago

ear

LOW Academic European Union

Machine Learning Argument of Latitude Error Model for LEO Satellite Orbit and Covariance Correction

arXiv:2602.16764v1 Announce Type: new Abstract: Low Earth orbit (LEO) satellites are leveraged to support new position, navigation, and timing (PNT) service alternatives to GNSS. These alternatives require accurate propagation of satellite position and velocity with a realistic quantification of uncertainty....

1 min 2 months, 1 week ago

ear

LOW Academic International

Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models

arXiv:2602.16793v1 Announce Type: new Abstract: In the past year, custom and unreleased math reasoning models reached gold medal performance on the International Mathematical Olympiad (IMO). Similar performance was then reported using large-scale inference on publicly available models but at prohibitive...

1 min 2 months, 1 week ago

ear
