International Law

LOW Academic International

LLM-WikiRace: Benchmarking Long-term Planning and Reasoning over Real-World Knowledge Graphs

arXiv:2602.16902v1 Announce Type: new Abstract: We introduce LLM-Wikirace, a benchmark for evaluating planning, reasoning, and world knowledge in large language models (LLMs). In LLM-Wikirace, models must efficiently navigate Wikipedia hyperlinks step by step to reach a target page from a...

1 min 2 months ago

ear

LOW Academic International

Narrow fine-tuning erodes safety alignment in vision-language agents

arXiv:2602.16931v1 Announce Type: new Abstract: Lifelong multimodal agents must continuously adapt to new tasks through post-training, but this creates fundamental tension between acquiring capabilities and preserving safety alignment. We demonstrate that fine-tuning aligned vision-language models on narrow-domain harmful datasets induces...

1 min 2 months ago

ear

LOW Academic International

SourceBench: Can AI Answers Reference Quality Web Sources?

arXiv:2602.16942v1 Announce Type: new Abstract: Large language models (LLMs) increasingly answer queries by citing web sources, but existing evaluations emphasize answer correctness rather than evidence quality. We introduce SourceBench, a benchmark for measuring the quality of cited web sources across...

1 min 2 months ago

ear

LOW Academic International

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

arXiv:2602.16953v1 Announce Type: new Abstract: Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical. High-coverage hardware verification exemplifies this challenge due...

1 min 2 months ago

ear

LOW Academic United States

Automating Agent Hijacking via Structural Template Injection

arXiv:2602.16958v1 Announce Type: new Abstract: Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted,...

1 min 2 months ago

ear

LOW Academic United States

HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing

arXiv:2602.16976v1 Announce Type: new Abstract: Here's the corrected paragraph with all punctuation and formatting issues fixed: Financial risk systems usually follow a two-step routine: a model predicts return or risk, and then an optimizer makes a decision such as a...

1 min 2 months ago

ear

LOW Academic European Union

Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases

arXiv:2602.17001v1 Announce Type: new Abstract: Natural Language Querying for Time Series Databases (NLQ4TSDB) aims to assist non-expert users retrieve meaningful events, intervals, and summaries from massive temporal records. However, existing Text-to-SQL methods are not designed for continuous morphological intents such...

1 min 2 months ago

ear

LOW Academic United States

M2F: Automated Formalization of Mathematical Literature at Scale

arXiv:2602.17016v1 Announce Type: new Abstract: Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and research papers is largely unaddressed, as it requires managing cross-file dependencies, resolving imports, and ensuring...

1 min 2 months ago

ear

LOW Academic International

Sales Research Agent and Sales Research Bench

arXiv:2602.17017v1 Announce Type: new Abstract: Enterprises increasingly need AI systems that can answer sales-leader questions over live, customized CRM data, but most available models do not expose transparent, repeatable evidence of quality. This paper describes the Sales Research Agent in...

1 min 2 months ago

ear

LOW Academic European Union

IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents

arXiv:2602.17049v1 Announce Type: new Abstract: Computer-use agents operate over long horizons under noisy perception, multi-window contexts, evolving environment states. Existing approaches, from RL-based planners to trajectory retrieval, often drift from user intent and repeatedly solve routine subproblems, leading to error...

1 min 2 months ago

ear

LOW Academic International

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

arXiv:2602.17062v1 Announce Type: new Abstract: Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often...

1 min 2 months ago

ear

LOW Academic International

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

arXiv:2602.17066v1 Announce Type: new Abstract: We introduce Predictive Batch Scheduling (PBS), a novel training optimization technique that accelerates language model convergence by dynamically prioritizing high-loss samples during batch construction. Unlike curriculum learning approaches that require predefined difficulty metrics or hard...

1 min 2 months ago

ear

LOW Academic International

Instructor-Aligned Knowledge Graphs for Personalized Learning

arXiv:2602.17111v1 Announce Type: new Abstract: Mastering educational concepts requires understanding both their prerequisites (e.g., recursion before merge sort) and sub-concepts (e.g., merge sort as part of sorting algorithms). Capturing these dependencies is critical for identifying students' knowledge gaps and enabling...

1 min 2 months ago

ear

LOW Academic European Union

Epistemology of Generative AI: The Geometry of Knowing

arXiv:2602.17116v1 Announce Type: new Abstract: Generative AI presents an unprecedented challenge to our understanding of knowledge and its production. Unlike previous technological transformations, where engineering understanding preceded or accompanied deployment, generative AI operates through mechanisms whose epistemic character remains obscure,...

1 min 2 months ago

ear

LOW Academic United States

Continual learning and refinement of causal models through dynamic predicate invention

arXiv:2602.17217v1 Announce Type: new Abstract: Efficiently navigating complex environments requires agents to internalize the underlying logic of their world, yet standard world modelling methods often struggle with sample inefficiency, lack of transparency, and poor scalability. We propose a framework for...

1 min 2 months ago

ear

LOW Academic United States

From Labor to Collaboration: A Methodological Experiment Using AI Agents to Augment Research Perspectives in Taiwan's Humanities and Social Sciences

arXiv:2602.17221v1 Announce Type: new Abstract: Generative AI is reshaping knowledge work, yet existing research focuses predominantly on software engineering and the natural sciences, with limited methodological exploration for the humanities and social sciences. Positioned as a "methodological experiment," this study...

1 min 2 months ago

ear

LOW Academic United States

Decoding the Human Factor: High Fidelity Behavioral Prediction for Strategic Foresight

arXiv:2602.17222v1 Announce Type: new Abstract: Predicting human decision-making in high-stakes environments remains a central challenge for artificial intelligence. While large language models (LLMs) demonstrate strong general reasoning, they often struggle to generate consistent, individual-specific behavior, particularly when accurate prediction depends...

1 min 2 months ago

ear

LOW Academic United States

Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy

arXiv:2602.17229v1 Announce Type: new Abstract: The black-box nature of Large Language Models necessitates novel evaluation frameworks that transcend surface-level performance metrics. This study investigates the internal neural representations of cognitive complexity using Bloom's Taxonomy as a hierarchical lens. By analyzing...

1 min 2 months ago

ear

LOW Academic International

References Improve LLM Alignment in Non-Verifiable Domains

arXiv:2602.16802v1 Announce Type: new Abstract: While Reinforcement Learning with Verifiable Rewards (RLVR) has shown strong effectiveness in reasoning tasks, it cannot be directly applied to non-verifiable domains lacking ground-truth verifiers, such as LLM alignment. In this work, we investigate whether...

1 min 2 months ago

ear

LOW Academic International

Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

arXiv:2602.16811v1 Announce Type: new Abstract: Recent advancements in Natural Language Processing and Deep Learning have enabled the development of Large Language Models (LLMs), which have significantly advanced the state-of-the-art across a wide range of tasks, including Question Answering (QA). Despite...

1 min 2 months ago

ear

LOW Academic International

Claim Automation using Large Language Model

arXiv:2602.16836v1 Announce Type: new Abstract: While Large Language Models (LLMs) have achieved strong performance on general-purpose language tasks, their deployment in regulated and data-sensitive domains, including insurance, remains limited. Leveraging millions of historical warranty claims, we propose a locally deployed...

1 min 2 months ago

ear

LOW Academic International

BanglaSummEval: Reference-Free Factual Consistency Evaluation for Bangla Summarization

arXiv:2602.16843v1 Announce Type: new Abstract: Evaluating factual consistency is essential for reliable text summarization, particularly in high-stakes domains such as healthcare and news. However, most existing evaluation metrics overlook Bangla, a widely spoken yet under-resourced language, and often depend on...

1 min 2 months ago

ear

LOW Academic International

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

arXiv:2602.16852v1 Announce Type: new Abstract: Meenzerisch, the dialect spoken in the German city of Mainz, is also the traditional language of the Mainz carnival, a yearly celebration well known throughout Germany. However, Meenzerisch is on the verge of dying out-a...

1 min 2 months ago

ear

LOW Academic South Korea

Evaluating Cross-Lingual Classification Approaches Enabling Topic Discovery for Multilingual Social Media Data

arXiv:2602.17051v1 Announce Type: new Abstract: Analysing multilingual social media discourse remains a major challenge in natural language processing, particularly when large-scale public debates span across diverse languages. This study investigates how different approaches for cross-lingual text classification can support reliable...

1 min 2 months ago

ear

LOW Academic United States

BankMathBench: A Benchmark for Numerical Reasoning in Banking Scenarios

arXiv:2602.17072v1 Announce Type: new Abstract: Large language models (LLMs)-based chatbots are increasingly being adopted in the financial domain, particularly in digital banking, to handle customer inquiries about products such as deposits, savings, and loans. However, these models still exhibit low...

1 min 2 months ago

ear

LOW Academic International

Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests

arXiv:2602.17108v1 Announce Type: new Abstract: Thematic Apperception Test (TAT) is a psychometrically grounded, multidimensional assessment framework that systematically differentiates between cognitive-representational and affective-relational components of personality-like functioning. This test is a projective psychological framework designed to uncover unconscious aspects of...

1 min 2 months ago

ear

LOW Academic International

What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform

arXiv:2602.17194v1 Announce Type: new Abstract: Text-based telemedicine has become a common mode of care, requiring clinicians to deliver medical advice clearly and effectively in writing. As platforms increasingly rely on patient ratings and feedback, clinicians face growing pressure to maintain...

1 min 2 months ago

ear

LOW Academic International

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

arXiv:2602.17283v1 Announce Type: new Abstract: While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To...

1 min 2 months ago

human rights

LOW Academic International

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

arXiv:2602.17316v1 Announce Type: new Abstract: The rapid advancement of Large Language Models (LLMs) has established standardized evaluation benchmarks as the primary instrument for model comparison. Yet, their reliability is increasingly questioned due to sensitivity to shallow variations in input prompts....

1 min 2 months, 1 week ago

ear

LOW Academic International

RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering

arXiv:2602.17366v1 Announce Type: new Abstract: Long-tail question answering presents significant challenges for large language models (LLMs) due to their limited ability to acquire and accurately recall less common knowledge. Retrieval-augmented generation (RAG) systems have shown great promise in mitigating this...

1 min 2 months, 1 week ago

ear

LLM-WikiRace: Benchmarking Long-term Planning and Reasoning over Real-World Knowledge Graphs

Narrow fine-tuning erodes safety alignment in vision-language agents

SourceBench: Can AI Answers Reference Quality Web Sources?

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

Automating Agent Hijacking via Structural Template Injection

HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing

Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases

M2F: Automated Formalization of Mathematical Literature at Scale

Sales Research Agent and Sales Research Bench

IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization

Instructor-Aligned Knowledge Graphs for Personalized Learning

Epistemology of Generative AI: The Geometry of Knowing

Continual learning and refinement of causal models through dynamic predicate invention

From Labor to Collaboration: A Methodological Experiment Using AI Agents to Augment Research Perspectives in Taiwan's Humanities and Social Sciences

Decoding the Human Factor: High Fidelity Behavioral Prediction for Strategic Foresight

Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy

References Improve LLM Alignment in Non-Verifiable Domains

Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

Claim Automation using Large Language Model

BanglaSummEval: Reference-Free Factual Consistency Evaluation for Bangla Summarization

Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect

Evaluating Cross-Lingual Classification Approaches Enabling Topic Discovery for Multilingual Social Media Data

BankMathBench: A Benchmark for Numerical Reasoning in Banking Scenarios

Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests

What Makes a Good Doctor Response? An Analysis on a Romanian Telemedicine Platform

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering

Impact Distribution

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.