Understanding the Generalization of Bilevel Programming in Hyperparameter Optimization: A Tale of Bias-Variance Decomposition
arXiv:2602.17947v1 Announce Type: new Abstract: Gradient-based hyperparameter optimization (HPO) have emerged recently, leveraging bilevel programming techniques to optimize hyperparameter by estimating hypergradient w.r.t. validation loss. Nevertheless, previous theoretical works mainly focus on reducing the gap between the estimation and ground-truth...
Data center builders thought farmers would willingly sell land, learn otherwise
Even in a fragile farm economy, million-dollar offers can't sway dedicated farmers.
AIs can generate near-verbatim copies of novels from training data
LLMs memorize more training data than previously thought.
How AI agents could destroy the economy
Citrini Research imagines a report from two years in the future, in which unemployment has doubled and the total value of the stock market has fallen by more than a third.
Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation
Trustworthy Artificial Intelligence (AI) is based on seven technical requirements sustained over three main pillars that should be met throughout the system’s entire life cycle: it should be (1) lawful, (2) ethical, and (3) robust, both from a technical and...
When Remembering and Planning are Worth it: Navigating under Change
arXiv:2602.15274v1 Announce Type: new Abstract: We explore how different types and uses of memory can aid spatial navigation in changing uncertain environments. In the simple foraging task we study, every day, our agent has to find its way from its...
EAA: Automating materials characterization with vision language model agents
arXiv:2602.15294v1 Announce Type: new Abstract: We present Experiment Automation Agents (EAA), a vision-language-model-driven agentic system designed to automate complex experimental microscopy workflows. EAA integrates multimodal reasoning, tool-augmented action, and optional long-term memory to support both autonomous procedures and interactive user-guided...
Improving LLM Reliability through Hybrid Abstention and Adaptive Detection
arXiv:2602.15391v1 Announce Type: new Abstract: Large Language Models (LLMs) deployed in production environments face a fundamental safety-utility trade-off either a strict filtering mechanisms prevent harmful outputs but often block benign queries or a relaxed controls risk unsafe content generation. Conventional...
GenAI-LA: Generative AI and Learning Analytics Workshop (LAK 2026), April 27--May 1, 2026, Bergen, Norway
arXiv:2602.15531v1 Announce Type: new Abstract: This work introduces EduEVAL-DB, a dataset based on teacher roles designed to support the evaluation and training of automatic pedagogical evaluators and AI tutors for instructional explanations. The dataset comprises 854 explanations corresponding to 139...
On inferring cumulative constraints
arXiv:2602.15635v1 Announce Type: new Abstract: Cumulative constraints are central in scheduling with constraint programming, yet propagation is typically performed per constraint, missing multi-resource interactions and causing severe slowdowns on some benchmarks. I present a preprocessing method for inferring additional cumulative...
CARE Drive A Framework for Evaluating Reason-Responsiveness of Vision Language Models in Automated Driving
arXiv:2602.15645v1 Announce Type: new Abstract: Foundation models, including vision language models, are increasingly used in automated driving to interpret scenes, recommend actions, and generate natural language explanations. However, existing evaluation methods primarily assess outcome based performance, such as safety and...
PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
arXiv:2602.15669v1 Announce Type: new Abstract: Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning, failing to capture the dynamic and compositional nature of human traits. We introduce PERSONA, a training-free framework that achieves...
Recursive Concept Evolution for Compositional Reasoning in Large Language Models
arXiv:2602.15725v1 Announce Type: new Abstract: Large language models achieve strong performance on many complex reasoning tasks, yet their accuracy degrades sharply on benchmarks that require compositional reasoning, including ARC-AGI-2, GPQA, MATH, BBH, and HLE. Existing methods improve reasoning by expanding...
LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets
arXiv:2602.13209v1 Announce Type: cross Abstract: We introduce LemonadeBench v0.5, a minimal benchmark for evaluating economic intuition, long-term planning, and decision-making under uncertainty in large language models (LLMs) through a simulated lemonade stand business. Models must manage inventory with expiring goods,...
EduResearchBench: A Hierarchical Atomic Task Decomposition Benchmark for Full-Lifecycle Educational Research
arXiv:2602.15034v1 Announce Type: cross Abstract: While Large Language Models (LLMs) are reshaping the paradigm of AI for Social Science (AI4SS), rigorously evaluating their capabilities in scholarly writing remains a major challenge. Existing benchmarks largely emphasize single-shot, monolithic generation and thus...
CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis
arXiv:2602.15037v1 Announce Type: cross Abstract: As large language models (LLMs) advance toward expert-level performance in engineering domains, reliable reasoning under user-specified constraints becomes critical. In circuit analysis, for example, a numerically correct solution is insufficient if it violates established methodological...
Indic-TunedLens: Interpreting Multilingual Models in Indian Languages
arXiv:2602.15038v1 Announce Type: cross Abstract: Multilingual large language models (LLMs) are increasingly deployed in linguistically diverse regions like India, yet most interpretability tools remain tailored to English. Prior work reveals that LLMs often operate in English centric representation spaces, making...
GRACE: an Agentic AI for Particle Physics Experiment Design and Simulation
arXiv:2602.15039v1 Announce Type: cross Abstract: We present GRACE, a simulation-native agent for autonomous experimental design in high-energy and nuclear physics. Given multimodal input in the form of a natural-language prompt or a published experimental paper, the agent extracts a structured...
CLOT: Closed-Loop Global Motion Tracking for Whole-Body Humanoid Teleoperation
arXiv:2602.15060v1 Announce Type: cross Abstract: Long-horizon whole-body humanoid teleoperation remains challenging due to accumulated global pose drift, particularly on full-sized humanoids. Although recent learning-based tracking methods enable agile and coordinated motions, they typically operate in the robot's local frame and...
Safe-SDL:Establishing Safety Boundaries and Control Mechanisms for AI-Driven Self-Driving Laboratories
arXiv:2602.15061v1 Announce Type: cross Abstract: The emergence of Self-Driving Laboratories (SDLs) transforms scientific discovery methodology by integrating AI with robotic automation to create closed-loop experimental systems capable of autonomous hypothesis generation, experimentation, and analysis. While promising to compress research timelines...
AIC CTU@AVerImaTeC: dual-retriever RAG for image-text fact checking
arXiv:2602.15190v1 Announce Type: new Abstract: In this paper, we present our 3rd place system in the AVerImaTeC shared task, which combines our last year's retrieval-augmented generation (RAG) pipeline with a reverse image search (RIS) module. Despite its simplicity, our system...
OpaqueToolsBench: Learning Nuances of Tool Behavior Through Interaction
arXiv:2602.15197v1 Announce Type: new Abstract: Tool-calling is essential for Large Language Model (LLM) agents to complete real-world tasks. While most existing benchmarks assume simple, perfectly documented tools, real-world tools (e.g., general "search" APIs) are often opaque, lacking clear best practices...
Extracting Consumer Insight from Text: A Large Language Model Approach to Emotion and Evaluation Measurement
arXiv:2602.15312v1 Announce Type: new Abstract: Accurately measuring consumer emotions and evaluations from unstructured text remains a core challenge for marketing research and practice. This study introduces the Linguistic eXtractor (LX), a fine-tuned, large language model trained on consumer-authored text that...
Mnemis: Dual-Route Retrieval on Hierarchical Graphs for Long-Term LLM Memory
arXiv:2602.15313v1 Announce Type: new Abstract: AI Memory, specifically how models organizes and retrieves historical messages, becomes increasingly valuable to Large Language Models (LLMs), yet existing methods (RAG and Graph-RAG) primarily retrieve memory through similarity-based mechanisms. While efficient, such System-1-style retrieval...
Measuring Social Integration Through Participation: Categorizing Organizations and Leisure Activities in the Displaced Karelians Interview Archive using LLMs
arXiv:2602.15436v1 Announce Type: new Abstract: Digitized historical archives make it possible to study everyday social life on a large scale, but the information extracted directly from text often does not directly allow one to answer the research questions posed by...
TAROT: Test-driven and Capability-adaptive Curriculum Reinforcement Fine-tuning for Code Generation with Large Language Models
arXiv:2602.15449v1 Announce Type: new Abstract: Large Language Models (LLMs) are changing the coding paradigm, known as vibe coding, yet synthesizing algorithmically sophisticated and robust code still remains a critical challenge. Incentivizing the deep reasoning capabilities of LLMs is essential to...
In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations
arXiv:2602.15456v1 Announce Type: new Abstract: Agents based on Large Language Models (LLMs) are increasingly being deployed as interfaces to information on online platforms. These agents filter, prioritize, and synthesize information retrieved from the platforms' back-end databases or via web search....
LuxMT Technical Report
arXiv:2602.15506v1 Announce Type: new Abstract: We introduce LuxMT, a machine translation system based on Gemma 3 27B and fine-tuned for translation from Luxembourgish (LB) into French (FR) and English (EN). To assess translation performance, we construct a novel benchmark covering...
ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling
arXiv:2602.15537v1 Announce Type: new Abstract: Pure speech language models aim to learn language directly from raw audio without textual resources. A key challenge is that discrete tokens from self-supervised speech encoders result in excessively long sequences, motivating recent work on...
Clinically Inspired Symptom-Guided Depression Detection from Emotion-Aware Speech Representations
arXiv:2602.15578v1 Announce Type: new Abstract: Depression manifests through a diverse set of symptoms such as sleep disturbance, loss of interest, and concentration difficulties. However, most existing works treat depression prediction either as a binary label or an overall severity score...