How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1
arXiv:2602.19526v1 Announce Type: new Abstract: Deep Research agents tackle knowledge-intensive tasks through multi-round retrieval and decision-oriented generation. While reinforcement learning (RL) has been shown to improve performance in this paradigm, its contributions remain underexplored. To fully understand the role of...
Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning
arXiv:2602.19612v1 Announce Type: new Abstract: Machine Unlearning (MU) enables Large Language Models (LLMs) to remove unsafe or outdated information. However, existing work assumes that all facts are equally forgettable and largely ignores whether the forgotten knowledge originates from pretraining or...
KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge
arXiv:2602.19643v1 Announce Type: new Abstract: Large Language Models (LLMs) possess a remarkable capacity to generate persuasive and intelligible language. However, coherence does not equate to truthfulness, as the responses often contain subtle hallucinations. Existing benchmarks are limited by static and...
Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning
arXiv:2602.18493v1 Announce Type: new Abstract: Long-context LLMs and Retrieval-Augmented Generation (RAG) systems process information passively, deferring state tracking, contradiction resolution, and evidence aggregation to query time, which becomes brittle under ultra long streams with frequent updates. We propose the Unified...
Wide Open Gazes: Quantifying Visual Exploratory Behavior in Soccer with Pose Enhanced Positional Data
arXiv:2602.18519v1 Announce Type: new Abstract: Traditional approaches to measuring visual exploratory behavior in soccer rely on counting visual exploratory actions (VEAs) based on rapid head movements exceeding 125{\deg}/s, but this method suffer from player position bias (i.e., a focus on...
GIST: Targeted Data Selection for Instruction Tuning via Coupled Optimization Geometry
arXiv:2602.18584v1 Announce Type: new Abstract: Targeted data selection has emerged as a crucial paradigm for efficient instruction tuning, aiming to identify a small yet influential subset of training examples for a specific target task. In practice, influence is often measured...
Diagnosing LLM Reranker Behavior Under Fixed Evidence Pools
arXiv:2602.18613v1 Announce Type: new Abstract: Standard reranking evaluations study how a reranker orders candidates returned by an upstream retriever. This setup couples ranking behavior with retrieval quality, so differences in output cannot be attributed to the ranking policy alone. We...
In-Context Planning with Latent Temporal Abstractions
arXiv:2602.18694v1 Announce Type: new Abstract: Planning-based reinforcement learning for continuous control is bottlenecked by two practical issues: planning at primitive time scales leads to prohibitive branching and long horizons, while real environments are frequently partially observable and exhibit regime shifts...
Exact Attention Sensitivity and the Geometry of Transformer Stability
arXiv:2602.18849v1 Announce Type: new Abstract: Despite powering modern AI, transformers remain mysteriously brittle to train. We develop a stability theory that explains why pre-LayerNorm works, why DeepNorm uses $N^{-1/4}$ scaling, and why warmup is necessary, all from first principles. Our...
Rank-Aware Spectral Bounds on Attention Logits for Stable Low-Precision Training
arXiv:2602.18851v1 Announce Type: new Abstract: Attention scores in transformers are bilinear forms $S_{ij} = x_i^\top M x_j / \sqrt{d_h}$ whose maximum magnitude governs overflow risk in low-precision training. We derive a \emph{rank-aware concentration inequality}: when the interaction matrix $M =...
Issues with Measuring Task Complexity via Random Policies in Robotic Tasks
arXiv:2602.18856v1 Announce Type: new Abstract: Reinforcement learning (RL) has enabled major advances in fields such as robotics and natural language processing. A key challenge in RL is measuring task complexity, which is essential for creating meaningful benchmarks and designing effective...
PCA-VAE: Differentiable Subspace Quantization without Codebook Collapse
arXiv:2602.18904v1 Announce Type: new Abstract: Vector-quantized autoencoders deliver high-fidelity latents but suffer inherent flaws: the quantizer is non-differentiable, requires straight-through hacks, and is prone to collapse. We address these issues at the root by replacing VQ with a simple, principled,...
Nvidia challenger AI chip startup MatX raised $500M
The startup was founded by former Google TPU engineers in 2023.
Meta strikes up to $100B AMD chip deal as it chases ‘personal superintelligence’
Meta is buying billions of dollars in AMD AI chips in a multiyear deal tied to a 160 million-share warrant, deepening its push to diversify beyond Nvidia and expand data center capacity.
Final 4 days to save up to $680 on your TechCrunch Disrupt 2026 pass
Just 4 days left before savings of up to $680 on your TechCrunch Disrupt 2026 pass end on February 27 at 11:59 p.m. PT. Register to save at one of the most anticipated tech events of the year.
QueryPlot: Generating Geological Evidence Layers using Natural Language Queries for Mineral Exploration
arXiv:2602.17784v1 Announce Type: cross Abstract: Mineral prospectivity mapping requires synthesizing heterogeneous geological knowledge, including textual deposit models and geospatial datasets, to identify regions likely to host specific mineral deposit types. This process is traditionally manual and knowledge-intensive. We present QueryPlot,...
Mind the Style: Impact of Communication Style on Human-Chatbot Interaction
arXiv:2602.17850v1 Announce Type: cross Abstract: Conversational agents increasingly mediate everyday digital interactions, yet the effects of their communication style on user experience and task success remain unclear. Addressing this gap, we describe the results of a between-subject user study where...
Financial time series augmentation using transformer based GAN architecture
arXiv:2602.17865v1 Announce Type: cross Abstract: Time-series forecasting is a critical task across many domains, from engineering to economics, where accurate predictions drive strategic decisions. However, applying advanced deep learning models in challenging, volatile domains like finance is difficult due to...
Games That Teach, Chats That Convince: Comparing Interactive and Static Formats for Persuasive Learning
arXiv:2602.17905v1 Announce Type: cross Abstract: Interactive systems such as chatbots and games are increasingly used to persuade and educate on sustainability-related topics, yet it remains unclear how different delivery formats shape learning and persuasive outcomes when content is held constant....
From Lossy to Verified: A Provenance-Aware Tiered Memory for Agents
arXiv:2602.17913v1 Announce Type: cross Abstract: Long-horizon agents often compress interaction histories into write-time summaries. This creates a fundamental write-before-query barrier: compression decisions are made before the system knows what a future query will hinge on. As a result, summaries can...
MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance
arXiv:2602.17930v1 Announce Type: cross Abstract: Reinforcement learning (RL) agents often suffer from high sample complexity in sparse or delayed reward settings due to limited prior structure. Large language models (LLMs) can provide subgoal decompositions, plausible trajectories, and abstract priors that...
On the scaling relationship between cloze probabilities and language model next-token prediction
arXiv:2602.17848v1 Announce Type: new Abstract: Recent work has shown that larger language models have better predictive power for eye movement and reading time data. While even the best models under-allocate probability mass to human responses, larger models assign higher-quality estimates...
Decomposing Retrieval Failures in RAG for Long-Document Financial Question Answering
arXiv:2602.17981v1 Announce Type: new Abstract: Retrieval-augmented generation is increasingly used for financial question answering over long regulatory filings, yet reliability depends on retrieving the exact context needed to justify answers in high stakes settings. We study a frequent failure mode...
Agentic Adversarial QA for Improving Domain-Specific LLMs
arXiv:2602.18137v1 Announce Type: new Abstract: Large Language Models (LLMs), despite extensive pretraining on broad internet corpora, often struggle to adapt effectively to specialized domains. There is growing interest in fine-tuning these models for such domains; however, progress is constrained by...
Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning
arXiv:2602.18232v1 Announce Type: new Abstract: Recent work on test-time scaling for large language model (LLM) reasoning typically assumes that allocating more inference-time computation uniformly improves correctness. However, prior studies show that reasoning uncertainty is highly localized: a small subset of...
PsihoRo: Depression and Anxiety Romanian Text Corpus
arXiv:2602.18324v1 Announce Type: new Abstract: Psychological corpora in NLP are collections of texts used to analyze human psychology, emotions, and mental health. These texts allow researchers to study psychological constructs, detect mental health issues and analyze emotional language. However, mental...
Tethered Reasoning: Decoupling Entropy from Hallucination in Quantized LLMs via Manifold Steering
arXiv:2602.17691v1 Announce Type: cross Abstract: Quantized language models face a fundamental dilemma: low sampling temperatures yield repetitive, mode-collapsed outputs, while high temperatures (T > 2.0) cause trajectory divergence and semantic incoherence. We present HELIX, a geometric framework that decouples output...
Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory
arXiv:2602.18297v1 Announce Type: cross Abstract: Chain-of-thought (CoT) monitors are LLM-based systems that analyze reasoning traces to detect when outputs may exhibit attributes of interest, such as test-hacking behavior during code generation. In this paper, we use information-theoretic analysis to show...
On the Semantic and Syntactic Information Encoded in Proto-Tokens for One-Step Text Reconstruction
arXiv:2602.18301v1 Announce Type: cross Abstract: Autoregressive large language models (LLMs) generate text token-by-token, requiring n forward passes to produce a sequence of length n. Recent work, Exploring the Latent Capacity of LLMs for One-Step Text Reconstruction (Mezentsev and Oseledets), shows...
Joint Parameter and State-Space Bayesian Optimization: Using Process Expertise to Accelerate Manufacturing Optimization
arXiv:2602.17679v1 Announce Type: new Abstract: Bayesian optimization (BO) is a powerful method for optimizing black-box manufacturing processes, but its performance is often limited when dealing with high-dimensional multi-stage systems, where we can observe intermediate outputs. Standard BO models the process...