Litigation

LOW Academic International

ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization

arXiv:2602.22465v1 Announce Type: new Abstract: Large language models are increasingly applied to operational decision-making where the underlying structure is constrained optimization. Existing benchmarks evaluate whether LLMs can formulate optimization problems as solver code, but leave open a complementary question. Can...

1 min 1 month, 2 weeks ago

standing

LOW Academic International

VeRO: An Evaluation Harness for Agents to Optimize Agents

arXiv:2602.22480v1 Announce Type: new Abstract: An important emerging application of coding agents is agent optimization: the iterative improvement of a target agent through edit-execute-evaluate cycles. Despite its relevance, the community lacks a systematic understanding of coding agent performance on this...

1 min 1 month, 2 weeks ago

standing

LOW Academic International

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

arXiv:2602.22638v1 Announce Type: new Abstract: Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is...

1 min 1 month, 2 weeks ago

standing

LOW Academic International

Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions

arXiv:2602.22680v1 Announce Type: new Abstract: Large language models have enabled agents that reason, plan, and interact with tools and environments to accomplish complex tasks. As these agents operate over extended interaction horizons, their effectiveness increasingly depends on adapting behavior to...

1 min 1 month, 2 weeks ago

standing

LOW Academic International

FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics

arXiv:2602.22822v1 Announce Type: new Abstract: The identification and property prediction of chemical molecules is of central importance in the advancement of drug discovery and material science, where the tandem mass spectrometry technology gives valuable fragmentation cues in the form of...

1 min 1 month, 2 weeks ago

discovery

LOW Academic International

FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning

arXiv:2602.22963v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have substantially advanced video misinformation detection through unified multimodal reasoning, but they often rely on fixed-depth inference and place excessive trust in internally generated assumptions, particularly in scenarios where critical...

1 min 1 month, 2 weeks ago

evidence

LOW Academic International

A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection

arXiv:2602.22449v1 Announce Type: new Abstract: Cyberbullying has become a serious and growing concern in today's virtual world. When left unnoticed, it can have adverse consequences for social and mental health. Researchers have explored various types of cyberbullying, but most approaches...

1 min 1 month, 2 weeks ago

standing

LOW Academic International

Bridging Latent Reasoning and Target-Language Generation via Retrieval-Transition Heads

arXiv:2602.22453v1 Announce Type: new Abstract: Recent work has identified a subset of attention heads in Transformer as retrieval heads, which are responsible for retrieving information from the context. In this work, we first investigate retrieval heads in multilingual contexts. In...

1 min 1 month, 2 weeks ago

standing

LOW Law Review International

Volume 110 – Issue 3 - Minnesota Law Review

1 min 1 month, 2 weeks ago

motion

LOW Academic International

TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models

arXiv:2602.22827v1 Announce Type: new Abstract: This paper presents a comprehensive evaluation framework for assessing the cultural competence of large language models (LLMs) in Persian. Existing Persian cultural benchmarks rely predominantly on multiple-choice formats and English-centric metrics that fail to capture...

1 min 1 month, 3 weeks ago

standing

LOW Academic International

CiteLLM: An Agentic Platform for Trustworthy Scientific Reference Discovery

arXiv:2602.23075v1 Announce Type: new Abstract: Large language models (LLMs) have created new opportunities to enhance the efficiency of scholarly activities; however, challenges persist in the ethical deployment of AI assistance, including (1) the trustworthiness of AI-generated content, (2) preservation of...

1 min 1 month, 3 weeks ago

discovery

LOW Academic International

Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs

arXiv:2602.23136v1 Announce Type: new Abstract: Multimodal LLMs can process speech and images, but they cannot hear a speaker's voice or see an object's texture. We show this is not a failure of encoding: speaker identity, emotion, and visual attributes survive...

1 min 1 month, 3 weeks ago

motion

LOW Academic International

A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations

arXiv:2602.23300v1 Announce Type: new Abstract: Emotion Recognition in Conversations (ERC) presents unique challenges, requiring models to capture the temporal flow of multi-turn dialogues and to effectively integrate cues from multiple modalities. We propose Mixture of Speech-Text Experts for Recognition of...

1 min 1 month, 3 weeks ago

motion

LOW Academic International

Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning

arXiv:2602.23351v1 Announce Type: new Abstract: The lack of reasoning capabilities in Vision-Language Models (VLMs) has remained at the forefront of research discourse. We posit that this behavior stems from a reporting bias in their training data. That is, how people...

1 min 1 month, 3 weeks ago

standing

LOW Academic International

Early Risk Stratification of Dosing Errors in Clinical Trials Using Machine Learning

arXiv:2602.22285v1 Announce Type: new Abstract: Objective: The objective of this study is to develop a machine learning (ML)-based framework for early risk stratification of clinical trials (CTs) according to their likelihood of exhibiting a high rate of dosing errors, using...

1 min 1 month, 3 weeks ago

trial

LOW Academic International

Manifold of Failure: Behavioral Attraction Basins in Language Models

arXiv:2602.22291v1 Announce Type: new Abstract: While prior work has focused on projecting adversarial examples back onto the manifold of natural data to restore safety, we argue that a comprehensive understanding of AI safety requires characterizing the unsafe regions themselves. This...

1 min 1 month, 3 weeks ago

standing

LOW Academic International

When Should a Model Change Its Mind? An Energy-Based Theory and Regularizer for Concept Drift in Electrocardiogram (ECG) Signals

arXiv:2602.22294v1 Announce Type: new Abstract: Models operating on dynamic physiologic signals must distinguish benign, label-preserving variability from true concept change. Existing concept-drift frameworks are largely distributional and provide no principled guidance on how much a model's internal representation may move...

1 min 1 month, 3 weeks ago

evidence

LOW Academic International

UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs

arXiv:2602.22296v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of large language models (LLMs) on mathematics and programming tasks, but standard approaches that optimize single-attempt accuracy can inadvertently suppress response diversity across repeated...

1 min 1 month, 3 weeks ago

evidence

LOW Academic International

Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection

arXiv:2602.22297v1 Announce Type: new Abstract: Reinforcement learning (RL) offers significant promise for machinery fault detection (MFD). However, most existing RL-based MFD approaches do not fully exploit RL's sequential decision-making strengths, often treating MFD as a simple guessing game (Contextual Bandits)....

1 min 1 month, 3 weeks ago

trial

LOW Academic International

Predicting Tennis Serve directions with Machine Learning

arXiv:2602.22527v1 Announce Type: new Abstract: Serves, especially first serves, are very important in professional tennis. Servers choose their serve directions strategically to maximize their winning chances while trying to be unpredictable. On the other hand, returners try to predict serve...

1 min 1 month, 3 weeks ago

evidence

LOW Academic International

Coarse-to-Fine Learning of Dynamic Causal Structures

arXiv:2602.22532v1 Announce Type: new Abstract: Learning the dynamic causal structure of time series is a challenging problem. Most existing approaches rely on distributional or structural invariance to uncover underlying causal dynamics, assuming stationary or partially stationary causality. However, these assumptions...

1 min 1 month, 3 weeks ago

discovery

LOW News International

Netflix cedes Warner Bros. Discovery to Paramount: “No longer financially attractive”

Netflix shares jumped following the announcement.

1 min 1 month, 3 weeks ago

discovery

LOW Academic International

Disaster Question Answering with LoRA Efficiency and Accurate End Position

arXiv:2602.21212v1 Announce Type: new Abstract: Natural disasters such as earthquakes, torrential rainfall, floods, and volcanic eruptions occur with extremely low frequency and affect limited geographic areas. When individuals face disaster situations, they often experience confusion and lack the domain-specific knowledge...

1 min 1 month, 3 weeks ago

standing

LOW Academic International

TRACE: Trajectory-Aware Comprehensive Evaluation for Deep Research Agents

arXiv:2602.21230v1 Announce Type: new Abstract: The evaluation of Deep Research Agents is a critical challenge, as conventional outcome-based metrics fail to capture the nuances of their complex reasoning. Current evaluation faces two primary challenges: 1) a reliance on singular metrics...

1 min 1 month, 3 weeks ago

evidence

LOW Academic International

Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models

arXiv:2602.21262v1 Announce Type: new Abstract: With increasing integration of Large Language Models (LLMs) into areas of high-stakes human decision-making, it is important to understand the risks they introduce as advisors. To be useful advisors, LLMs must sift through large amounts...

1 min 1 month, 3 weeks ago

evidence

LOW Academic International

ToolMATH: A Math Tool Benchmark for Realistic Long-Horizon Multi-Tool Reasoning

arXiv:2602.21265v1 Announce Type: new Abstract: We introduce ToolMATH, a math-grounded benchmark that evaluates tool-augmented language models in realistic multi-tool environments where the output depends on calling schema-specified tools and sustaining multi-step execution. It turns math problems into a controlled, correctness-checkable...

1 min 1 month, 3 weeks ago

evidence

LOW Academic International

Evaluating the Usage of African-American Vernacular English in Large Language Models

arXiv:2602.21485v1 Announce Type: new Abstract: In AI, most evaluations of natural language understanding tasks are conducted in standardized dialects such as Standard American English (SAE). In this work, we investigate how accurately large language models (LLMs) represent African American Vernacular...

1 min 1 month, 3 weeks ago

standing

LOW Academic International

Explore-on-Graph: Incentivizing Autonomous Exploration of Large Language Models on Knowledge Graphs with Path-refined Reward Modeling

arXiv:2602.21728v1 Announce Type: new Abstract: The reasoning process of Large Language Models (LLMs) is often plagued by hallucinations and missing facts in question-answering tasks. A promising solution is to ground LLMs' answers in verifiable knowledge sources, such as Knowledge Graphs...

1 min 1 month, 3 weeks ago

discovery

LOW Academic International

Improving Implicit Discourse Relation Recognition with Natural Language Explanations from LLMs

arXiv:2602.21763v1 Announce Type: new Abstract: Implicit Discourse Relation Recognition (IDRR) remains a challenging task due to the requirement for deep semantic understanding in the absence of explicit discourse markers. A further limitation is that existing methods only predict relations without...

1 min 1 month, 3 weeks ago

standing

LOW Academic International

FewMMBench: A Benchmark for Multimodal Few-Shot Learning

arXiv:2602.21854v1 Announce Type: new Abstract: As multimodal large language models (MLLMs) advance in handling interleaved image-text data, assessing their few-shot learning capabilities remains an open challenge. In this paper, we introduce FewMMBench, a comprehensive benchmark designed to evaluate MLLMs under...

1 min 1 month, 3 weeks ago

standing
