International Law

LOW Academic International

Causal Effect Estimation with Latent Textual Treatments

arXiv:2602.15730v1 Announce Type: new Abstract: Understanding the causal effects of text on downstream outcomes is a central task in many applications. Estimating such effects requires researchers to run controlled experiments that systematically vary textual features. While large language models (LLMs)...

1 min 2 months ago

ear

LOW Academic International

Ethical Considerations in Artificial Intelligence: Addressing Bias and Fairness in Algorithmic Decision-Making

The expanding use of artificial intelligence (AI) in decision-making across a range of industries has given rise to serious ethical questions about prejudice and justice. This study looks at the moral ramifications of using AI algorithms in decision-making and looks...

1 min 2 months ago

ear

LOW Academic International

How Uncertain Is the Grade? A Benchmark of Uncertainty Metrics for LLM-Based Automatic Assessment

arXiv:2602.16039v1 Announce Type: new Abstract: The rapid rise of large language models (LLMs) is reshaping the landscape of automatic assessment in education. While these systems demonstrate substantial advantages in adaptability to diverse question types and flexibility in output formats, they...

1 min 2 months ago

ear

LOW Academic International

Improving Interactive In-Context Learning from Natural Language Feedback

arXiv:2602.16066v1 Announce Type: new Abstract: Adapting one's thought process based on corrective feedback is an essential ability in human learning, particularly in collaborative settings. In contrast, the current large language model training paradigm relies heavily on modeling vast, static corpora....

1 min 2 months ago

ear

LOW Academic International

GPSBench: Do Large Language Models Understand GPS Coordinates?

arXiv:2602.16105v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in applications that interact with the physical world, such as navigation, robotics, or mapping, making robust geospatial reasoning a critical capability. Despite that, LLMs' ability to reason about...

1 min 2 months ago

ear

LOW Academic International

Learning Personalized Agents from Human Feedback

arXiv:2602.16173v1 Announce Type: new Abstract: Modern AI agents are powerful but often fail to align with the idiosyncratic, evolving preferences of individual users. Prior approaches typically rely on static datasets, either training implicit preference models on interaction history or encoding...

1 min 2 months ago

ear

LOW Academic International

EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments

arXiv:2602.16179v1 Announce Type: new Abstract: We show that training AI agents on high-fidelity reinforcement learning environments produces capabilities that generalize beyond the training distribution. We introduce Corecraft, the first environment in EnterpriseGym, Surge AI's suite of agentic RL environments. Corecraft...

1 min 2 months ago

ear

LOW Academic International

Revolutionizing Long-Term Memory in AI: New Horizons with High-Capacity and High-Speed Storage

arXiv:2602.16192v1 Announce Type: new Abstract: Driven by our mission of "uplifting the world with memory," this paper explores the design concept of "memory" that is essential for achieving artificial superintelligence (ASI). Rather than proposing novel methods, we focus on several...

1 min 2 months ago

ear

LOW Academic International

Toward Scalable Verifiable Reward: Proxy State-Based Evaluation for Multi-turn Tool-Calling LLM Agents

arXiv:2602.16246v1 Announce Type: new Abstract: Interactive large language model (LLM) agents operating via multi-turn dialogue and multi-step tool calling are increasingly used in production. Benchmarks for these agents must both reliably compare models and yield on-policy training data. Prior agentic...

1 min 2 months ago

ear

LOW Academic International

Verifiable Semantics for Agent-to-Agent Communication

arXiv:2602.16424v1 Announce Type: new Abstract: Multiagent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms used. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are...

1 min 2 months ago

ear

LOW Academic International

EdgeNav-QE: QLoRA Quantization and Dynamic Early Exit for LAM-based Navigation on Edge Devices

arXiv:2602.15836v1 Announce Type: cross Abstract: Large Action Models (LAMs) have shown immense potential in autonomous navigation by bridging high-level reasoning with low-level control. However, deploying these multi-billion parameter models on edge devices remains a significant challenge due to memory constraints...

1 min 2 months ago

ear

LOW Academic International

Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models

arXiv:2602.15847v1 Announce Type: cross Abstract: Personality steering in large language models (LLMs) commonly relies on injecting trait-specific steering vectors, implicitly assuming that personality traits can be controlled independently. In this work, we examine whether this assumption holds by analysing the...

1 min 2 months ago

ear

LOW Academic International

Artificial Intelligence and Justice in Family Law: Addressing Bias and Promoting Fairness

Artificial Intelligence (AI) plays a crucial role in the legal field today, carrying out processes such as predictive analysis, data interpretation, and decision making. AI is valued for its efficiency and accuracy along with its affordability. However, one problem that...

1 min 2 months ago

ear

LOW Academic International

Narrative Theory-Driven LLM Methods for Automatic Story Generation and Understanding: A Survey

arXiv:2602.15851v1 Announce Type: cross Abstract: Applications of narrative theories using large language models (LLMs) deliver promising use-cases in automatic story generation and understanding tasks. Our survey examines how natural language processing (NLP) research engages with fields of narrative studies, and...

1 min 2 months ago

ear

LOW Academic International

A Lightweight Explainable Guardrail for Prompt Safety

arXiv:2602.15853v1 Announce Type: cross Abstract: We propose a lightweight explainable guardrail (LEG) method for the classification of unsafe prompts. LEG uses a multi-task learning architecture to jointly learn a prompt classifier and an explanation classifier, where the latter labels prompt...

1 min 2 months ago

ear

LOW Academic International

Decoupling Strategy and Execution in Task-Focused Dialogue via Goal-Oriented Preference Optimization

arXiv:2602.15854v1 Announce Type: cross Abstract: Large language models show potential in task-oriented dialogue systems, yet existing training methods often rely on token-level likelihood or preference optimization, which poorly align with long-horizon task success. To address this, we propose Goal-Oriented Preference...

1 min 2 months ago

ear

LOW Academic International

Kalman-Inspired Runtime Stability and Recovery in Hybrid Reasoning Systems

arXiv:2602.15855v1 Announce Type: cross Abstract: Hybrid reasoning systems that combine learned components with model-based inference are increasingly deployed in tool-augmented decision loops, yet their runtime behavior under partial observability and sustained evidence mismatch remains poorly understood. In practice, failures often...

1 min 2 months ago

ear

LOW Academic International

Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective

arXiv:2602.15856v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) effectively grounds Large Language Models (LLMs) with external knowledge and is widely applied to Web-related tasks. However, its scalability is hindered by excessive context length and redundant retrievals. Recent research on soft...

1 min 2 months ago

ear

LOW Academic International

NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey

arXiv:2602.15866v1 Announce Type: cross Abstract: Natural Language Processing (NLP) is integral to social media analytics but often processes content containing Personally Identifiable Information (PII), behavioral cues, and metadata raising privacy risks such as surveillance, profiling, and targeted advertising. To systematically...

1 min 2 months ago

ear

LOW Academic International

Playing With AI: How Do State-Of-The-Art Large Language Models Perform in the 1977 Text-Based Adventure Game Zork?

arXiv:2602.15867v1 Announce Type: cross Abstract: In this positioning paper, we evaluate the problem-solving and reasoning capabilities of contemporary Large Language Models (LLMs) through their performance in Zork, the seminal text-based adventure game first released in 1977. The game's dialogue-based structure...

1 min 2 months ago

ear

LOW Academic International

IT-OSE: Exploring Optimal Sample Size for Industrial Data Augmentation

arXiv:2602.15878v1 Announce Type: cross Abstract: In industrial scenarios, data augmentation is an effective approach to improve model performance. However, its benefits are not unidirectionally beneficial. There is no theoretical research or established estimation for the optimal sample size (OSS) in...

1 min 2 months ago

ear

LOW Academic International

Evidence for Daily and Weekly Periodic Variability in GPT-4o Performance

arXiv:2602.15889v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used in research both as tools and as objects of investigation. Much of this work implicitly assumes that LLM performance under fixed conditions (identical model snapshot, hyperparameters, and prompt)...

1 min 2 months ago

ear

LOW Academic International

Egocentric Bias in Vision-Language Models

arXiv:2602.15892v1 Announce Type: cross Abstract: Visual perspective taking--inferring how the world appears from another's viewpoint--is foundational to social cognition. We introduce FlipSet, a diagnostic benchmark for Level-2 visual perspective taking (L2 VPT) in vision-language models. The task requires simulating 180-degree...

1 min 2 months ago

ear

LOW Academic International

Doc-to-LoRA: Learning to Instantly Internalize Contexts

arXiv:2602.15902v1 Announce Type: cross Abstract: Long input sequences are central to in-context learning, document understanding, and multi-step reasoning of Large Language Models (LLMs). However, the quadratic attention cost of Transformers makes inference memory-intensive and slow. While context distillation (CD) can...

1 min 2 months ago

ear

LOW Academic International

Node Learning: A Framework for Adaptive, Decentralised and Collaborative Network Edge AI

arXiv:2602.16814v1 Announce Type: new Abstract: The expansion of AI toward the edge increasingly exposes the cost and fragility of centralised intelligence. Data transmission, latency, energy consumption, and dependence on large data centres create bottlenecks that scale poorly across heterogeneous,...

1 min 2 months ago

ear

LOW Academic International

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

arXiv:2602.16832v1 Announce Type: new Abstract: Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities understudied. We introduce Indic Jailbreak Robustness (IJR), a judge-free benchmark for adversarial safety across 12 Indic and South...

1 min 2 months ago

icj

LOW Academic International

LLM-WikiRace: Benchmarking Long-term Planning and Reasoning over Real-World Knowledge Graphs

arXiv:2602.16902v1 Announce Type: new Abstract: We introduce LLM-Wikirace, a benchmark for evaluating planning, reasoning, and world knowledge in large language models (LLMs). In LLM-Wikirace, models must efficiently navigate Wikipedia hyperlinks step by step to reach a target page from a...

1 min 2 months ago

ear

LOW Academic International

Narrow fine-tuning erodes safety alignment in vision-language agents

arXiv:2602.16931v1 Announce Type: new Abstract: Lifelong multimodal agents must continuously adapt to new tasks through post-training, but this creates fundamental tension between acquiring capabilities and preserving safety alignment. We demonstrate that fine-tuning aligned vision-language models on narrow-domain harmful datasets induces...

1 min 2 months ago

ear

LOW Academic International

SourceBench: Can AI Answers Reference Quality Web Sources?

arXiv:2602.16942v1 Announce Type: new Abstract: Large language models (LLMs) increasingly answer queries by citing web sources, but existing evaluations emphasize answer correctness rather than evidence quality. We introduce SourceBench, a benchmark for measuring the quality of cited web sources across...

1 min 2 months ago

ear

LOW Academic International

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

arXiv:2602.16953v1 Announce Type: new Abstract: Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical. High-coverage hardware verification exemplifies this challenge due...

1 min 2 months ago

ear
