Litigation

LOW Academic International

LifeEval: A Multimodal Benchmark for Assistive AI in Egocentric Daily Life Tasks

arXiv:2603.00490v1 Announce Type: new Abstract: The rapid progress of Multimodal Large Language Models (MLLMs) marks a significant step toward artificial general intelligence, offering great potential for augmenting human capabilities. However, their ability to provide effective assistance in dynamic, real-world environments...

1 min 1 month, 1 week ago

standing

LOW Academic International

MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation

arXiv:2603.00585v1 Announce Type: new Abstract: Recent advances in video generation have opened new avenues for macroscopic simulation of complex dynamic systems, but their application to microscopic phenomena remains largely unexplored. Microscale simulation holds great promise for biomedical applications such as...

1 min 1 month, 1 week ago

discovery

LOW Academic International

Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs

arXiv:2603.00590v1 Announce Type: new Abstract: As artificial intelligence (AI) is increasingly deployed across domains, ensuring fairness has become a core challenge. However, the field faces a "Tower of Babel'' dilemma: fairness metrics abound, yet their underlying philosophical assumptions often conflict,...

1 min 1 month, 1 week ago

standing

LOW Academic International

MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains

arXiv:2603.00873v1 Announce Type: new Abstract: With the increasing demand for step-wise, cross-modal, and knowledge-grounded reasoning, multimodal large language models (MLLMs) are evolving beyond the traditional fixed retrieve-then-generate paradigm toward more sophisticated agentic multimodal retrieval-augmented generation (MM-RAG). Existing benchmarks, however, mainly...

1 min 1 month, 1 week ago

evidence

LOW Academic International

HVR-Met: A Hypothesis-Verification-Replaning Agentic System for Extreme Weather Diagnosis

arXiv:2603.01121v1 Announce Type: new Abstract: While deep learning-based weather forecasting paradigms have made significant strides, addressing extreme weather diagnostics remains a formidable challenge. This gap exists primarily because the diagnostic process demands sophisticated multi-step logical reasoning, dynamic tool invocation, and...

1 min 1 month, 1 week ago

evidence

LOW Academic International

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

arXiv:2603.00026v1 Announce Type: new Abstract: Effective memory management is essential for large language model (LLM) agents handling long-term interactions. Current memory frameworks typically treat agents as passive "recorders" and retrieve information without understanding its deeper implications. They may fail in...

1 min 1 month, 1 week ago

standing

LOW Academic International

Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

arXiv:2603.02239v1 Announce Type: new Abstract: The Engineering Reasoning and Instruction (ERI) benchmark is a taxonomy-driven instruction dataset designed to train and evaluate engineering-capable large language models (LLMs) and agents. This dataset spans nine engineering fields (namely: civil, mechanical, electrical, chemical,...

1 min 1 month, 1 week ago

trial

LOW Academic International

Estimating Visual Attribute Effects in Advertising from Observational Data: A Deepfake-Informed Double Machine Learning Approach

arXiv:2603.02359v1 Announce Type: new Abstract: Digital advertising increasingly relies on visual content, yet marketers lack rigorous methods for understanding how specific visual attributes causally affect consumer engagement. This paper addresses a fundamental methodological challenge: estimating causal effects when the treatment,...

1 min 1 month, 1 week ago

standing

LOW Academic International

AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows

arXiv:2603.02601v1 Announce Type: new Abstract: Autonomous AI agents are deployed at unprecedented scale, yet no principled methodology exists for verifying that an agent has not regressed after changes to its prompts, tools, models, or orchestration logic. We present AgentAssay, the...

1 min 1 month, 1 week ago

trial

LOW Academic International

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

arXiv:2603.02668v1 Announce Type: new Abstract: We present SorryDB, a dynamically-updating benchmark of open Lean tasks drawn from 78 real world formalization projects on GitHub. Unlike existing static benchmarks, often composed of competition problems, hillclimbing the SorryDB benchmark will yield tools...

1 min 1 month, 1 week ago

standing

LOW Academic International

Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

arXiv:2603.02798v1 Announce Type: new Abstract: As LLM-powered agents have been used for high-stakes decision-making, such as clinical diagnosis, it becomes critical to develop reliable verification of their decisions to facilitate trustworthy deployment. Yet, existing verifiers usually underperform owing to a...

1 min 1 month, 1 week ago

evidence

LOW Academic International

RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

arXiv:2603.03078v1 Announce Type: new Abstract: Agentic Reinforcement Learning (Agentic RL) has shown remarkable potential in large language model-based (LLM) agents. These works can empower LLM agents to tackle complex tasks via multi-step, tool-integrated reasoning. However, an inherent limitation of existing...

1 min 1 month, 1 week ago

discovery

LOW Academic International

Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

arXiv:2603.03080v1 Announce Type: new Abstract: LLM-based explainable recommenders can produce fluent explanations that are factually correct, yet still justify items using attributes that conflict with a user's historical preferences. Such preference-inconsistent explanations yield logically valid but unconvincing reasoning and are...

1 min 1 month, 1 week ago

evidence

LOW Academic International

CoDAR: Continuous Diffusion Language Models are More Powerful Than You Think

arXiv:2603.02547v1 Announce Type: new Abstract: We study why continuous diffusion language models (DLMs) have lagged behind discrete diffusion approaches despite their appealing continuous generative dynamics. Under a controlled token--recovery study, we identify token rounding, the final projection from denoised embeddings...

1 min 1 month, 1 week ago

appeal

LOW Academic International

Phi-4-reasoning-vision-15B Technical Report

arXiv:2603.03975v1 Announce Type: new Abstract: We present Phi-4-reasoning-vision-15B, a compact open-weight multimodal reasoning model, and share the motivations, design choices, experiments, and learnings that informed its development. Our goal is to contribute practical insight to the research community on building...

1 min 1 month, 1 week ago

standing

LOW Academic International

Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions

arXiv:2603.04191v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly serving as personal assistants, where users share complex and diverse preferences over extended interactions. However, assessing how well LLMs can follow these preferences in realistic, long-term situations remains underexplored....

1 min 1 month, 1 week ago

standing

LOW Academic International

$\tau$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

arXiv:2603.04370v1 Announce Type: new Abstract: Conversational agents are increasingly deployed in knowledge-intensive settings, where correct behavior depends on retrieving and applying domain-specific knowledge from large, proprietary, and unstructured corpora during live interactions with users. Yet most existing benchmarks evaluate retrieval...

1 min 1 month, 1 week ago

trial

LOW Academic International

When Agents Persuade: Propaganda Generation and Mitigation in LLMs

arXiv:2603.04636v1 Announce Type: new Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one...

1 min 1 month, 1 week ago

appeal

LOW Academic International

LLM-Grounded Explainability for Port Congestion Prediction via Temporal Graph Attention Networks

arXiv:2603.04818v1 Announce Type: new Abstract: Port congestion at major maritime hubs disrupts global supply chains, yet existing prediction systems typically prioritize forecasting accuracy without providing operationally interpretable explanations. This paper proposes AIS-TGNN, an evidence-grounded framework that jointly performs congestion-escalation prediction...

1 min 1 month, 1 week ago

evidence

LOW Academic International

Retrieval-Augmented Generation with Covariate Time Series

arXiv:2603.04951v1 Announce Type: new Abstract: While RAG has greatly enhanced LLMs, extending this paradigm to Time-Series Foundation Models (TSFMs) remains a challenge. This is exemplified in the Predictive Maintenance of the Pressure Regulating and Shut-Off Valve (PRSOV), a high-stakes industrial...

1 min 1 month, 1 week ago

trial

LOW Academic International

CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

arXiv:2603.04406v1 Announce Type: new Abstract: With the growing use of Retrieval-Augmented Generation (RAG), training large language models (LLMs) for context-sensitive reasoning and faithfulness is increasingly important. Existing RAG-oriented reinforcement learning (RL) methods rely on external rewards that often fail to...

1 min 1 month, 1 week ago

evidence

LOW Academic International

Coordinated Semantic Alignment and Evidence Constraints for Retrieval-Augmented Generation with Large Language Models

arXiv:2603.04647v1 Announce Type: new Abstract: Retrieval augmented generation mitigates limitations of large language models in factual consistency and knowledge updating by introducing external knowledge. However, practical applications still suffer from semantic misalignment between retrieved results and generation objectives, as well...

1 min 1 month, 1 week ago

evidence

LOW Academic International

iAgentBench: Benchmarking Sensemaking Capabilities of Information-Seeking Agents on High-Traffic Topics

arXiv:2603.04656v1 Announce Type: new Abstract: With the emergence of search-enabled generative QA systems, users are increasingly turning to tools that browse, aggregate, and reconcile evidence across multiple sources on their behalf. Yet many widely used QA benchmarks remain answerable by...

1 min 1 month, 1 week ago

evidence

LOW Academic International

Stacked from One: Multi-Scale Self-Injection for Context Window Extension

arXiv:2603.04759v1 Announce Type: new Abstract: The limited context window of contemporary large language models (LLMs) remains a primary bottleneck for their broader application across diverse domains. Although continual pre-training on long-context data offers a straightforward solution, it incurs prohibitive data...

1 min 1 month, 1 week ago

standing

LOW Academic International

TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings

arXiv:2603.04772v1 Announce Type: new Abstract: Despite the exceptional reasoning capabilities of Multimodal Large Language Models (MLLMs), their adaptation into universal embedding models is significantly impeded by task conflict. To address this, we propose TSEmbed, a universal multimodal embedding framework that...

1 min 1 month, 1 week ago

trial

LOW Academic International

Autoscoring Anticlimax: A Meta-analytic Understanding of AI's Short-answer Shortcomings and Wording Weaknesses

arXiv:2603.04820v1 Announce Type: new Abstract: Automated short-answer scoring lags other LLM applications. We meta-analyze 890 culminating results across a systematic review of LLM short-answer scoring studies, modeling the traditional effect size of Quadratic Weighted Kappa (QWK) with mixed effects metaregression....

1 min 1 month, 1 week ago

standing

LOW Academic International

Engineering Regression Without Real-Data Training: Domain Adaptation for Tabular Foundation Models Using Multi-Dataset Embeddings

arXiv:2603.04692v1 Announce Type: new Abstract: Predictive modeling in engineering applications has long been dominated by bespoke models and small, siloed tabular datasets, limiting the applicability of large-scale learning approaches. Despite recent progress in tabular foundation models, the resulting synthetic training...

1 min 1 month, 1 week ago

trial

LOW Academic International

Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning

arXiv:2603.04780v1 Announce Type: new Abstract: Causal discovery with latent variables is a fundamental task. Yet most existing methods rely on strong structural assumptions, such as enforcing specific indicator patterns for latents or restricting how they can interact with others. We...

1 min 1 month, 1 week ago

discovery

LOW News International

Workers report watching Ray-Ban Meta-shot footage of people using the bathroom

Meta accused of "concealing the facts" about smart glass users' privacy.

1 min 1 month, 1 week ago

lawsuit

LOW Academic International

From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG

arXiv:2603.03292v1 Announce Type: cross Abstract: Large Language Models (LLMs) exhibit high reasoning capacity in medical question-answering, but their tendency to produce hallucinations and outdated knowledge poses critical risks in healthcare fields. While Retrieval-Augmented Generation (RAG) mitigates these issues, existing methods...

1 min 1 month, 1 week ago

evidence

LifeEval: A Multimodal Benchmark for Assistive AI in Egocentric Daily Life Tasks

MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation

Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs

MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains

HVR-Met: A Hypothesis-Verification-Replaning Agentic System for Extreme Weather Diagnosis

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

Estimating Visual Attribute Effects in Advertising from Observational Data: A Deepfake-Informed Double Machine Learning Approach

AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

CoDAR: Continuous Diffusion Language Models are More Powerful Than You Think

Phi-4-reasoning-vision-15B Technical Report

Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions

$\tau$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

When Agents Persuade: Propaganda Generation and Mitigation in LLMs

LLM-Grounded Explainability for Port Congestion Prediction via Temporal Graph Attention Networks

Retrieval-Augmented Generation with Covariate Time Series

CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

Coordinated Semantic Alignment and Evidence Constraints for Retrieval-Augmented Generation with Large Language Models

iAgentBench: Benchmarking Sensemaking Capabilities of Information-Seeking Agents on High-Traffic Topics

Stacked from One: Multi-Scale Self-Injection for Context Window Extension

TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings

Autoscoring Anticlimax: A Meta-analytic Understanding of AI's Short-answer Shortcomings and Wording Weaknesses

Engineering Regression Without Real-Data Training: Domain Adaptation for Tabular Foundation Models Using Multi-Dataset Embeddings

Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning

Workers report watching Ray-Ban Meta-shot footage of people using the bathroom

From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG

Impact Distribution

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.