Academic

Academic · 1 min

LLM-as-Judge for Semantic Judging of Powerline Segmentation in UAV Inspection

arXiv:2604.05371v1 Announce Type: new Abstract: The deployment of lightweight segmentation models on drones for autonomous power line inspection presents a critical challenge: maintaining reliable performance …

Akram Hossain, Rabab Abdelfattah, Xiaofeng Wang, Kareem Abdelfatah

3 views Apr 8

TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems

arXiv:2604.05364v1 Announce Type: new Abstract: We introduce TFRBench, the first benchmark designed to evaluate the reasoning capabilities of forecasting systems. Traditionally, time-series forecasting has been …

Md Atik Ahamed, Mihir Parmar, Palash Goyal, Yiwen Song, Long T. Le, Qiang Cheng, Chun-Liang Li, Hamid Palangi, Jinsung Yoon, Tomas Pfister

4 views Apr 8

LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment

arXiv:2604.05358v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) mitigates hallucination but does not eliminate it: a deployed system must still decide, at inference time, whether …

Zhe Yu, Wenpeng Xing, Meng Han

5 views Apr 8

ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning

arXiv:2604.05355v1 Announce Type: new Abstract: Chain-of-thought (CoT) reasoning improves large language model performance on complex tasks, but often produces excessively long and inefficient reasoning traces. …

Xuan Xiong, Huan Liu, Li Gu, Zhixiang Chi, Yue Qiu, Yuanhao Yu, Yang Wang

5 views Apr 8

From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical …

arXiv:2604.05348v1 Announce Type: new Abstract: Hallucinations in medical large language models (LLMs) remain a safety-critical issue, particularly when available evidence is insufficient or conflicting. We …

Zhe Yu, Wenpeng Xing, Meng Han

8 views Apr 8

Dynamic Agentic AI Expert Profiler System Architecture for Multidomain Intelligence Modeling

arXiv:2604.05345v1 Announce Type: new Abstract: In today's artificial intelligence driven world, modern systems communicate with people from diverse backgrounds and skill levels. For human-machine interaction …

Aisvarya Adeseye, Jouni Isoaho, Seppo Virtanen, Mohammad Tahir

4 views Apr 8

TRACE: Capability-Targeted Agentic Training

arXiv:2604.05336v1 Announce Type: new Abstract: Large Language Models (LLMs) deployed in agentic environments must exercise multiple capabilities across different task instances, where a capability is …

Hangoo Kang, Tarun Suresh, Jon Saad-Falcon, Azalia Mirhoseini

20 views Apr 8

Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills

arXiv:2604.05333v1 Announce Type: new Abstract: Skill usage has become a core component of modern agent systems and can substantially improve agents' ability to complete complex …

Dawei Li, Zongxia Li, Hongyang Du, Xiyang Wu, Shihang Gui, Yongbei Kuang, Lichao Sun

3 views Apr 8

Breakthrough the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning

arXiv:2604.05297v1 Announce Type: new Abstract: Value factorization, a popular paradigm in MARL, faces significant theoretical and algorithmic bottlenecks: its tendency to converge to suboptimal solutions …

Lesong Tao, Yifei Wang, Haodong Jing, Jingwen Fu, Miao Kang, Shitao Chen, Nanning Zheng

4 views Apr 8

Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition

arXiv:2604.05279v1 Announce Type: new Abstract: Large language models exhibit sycophancy, the tendency to shift their stated positions toward perceived user preferences or authority cues regardless …

Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Umer, Emily Fox

3 views Apr 8

Simulating the Evolution of Alignment and Values in Machine Intelligence

arXiv:2604.05274v1 Announce Type: new Abstract: Model alignment is currently applied in a vacuum, evaluated primarily through standardised benchmark performance. The purpose of this study is …

Jonathan Elsworth Eicher

4 views Apr 8

EAGLE: Edge-Aware Graph Learning for Proactive Delivery Delay Prediction in Smart Logistics Networks

arXiv:2604.05254v1 Announce Type: new Abstract: Modern logistics networks generate rich operational data streams at every warehouse node and transportation lane -- from order timestamps and …

Zhiming Xue, Menghao Huo, Yujue Wang

4 views Apr 8

JCG, PC

HSOLLC Co., Ltd.