PerSoMed: A Large-Scale Balanced Dataset for Persian Social Media Text Classification
arXiv:2602.19333v1 Announce Type: new Abstract: This research introduces the first large-scale, well-balanced Persian social media text classification dataset, specifically designed to address the lack of comprehensive resources in this domain. The dataset comprises 36,000 posts across nine categories (Economic, Artistic,...
How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1
arXiv:2602.19526v1 Announce Type: new Abstract: Deep Research agents tackle knowledge-intensive tasks through multi-round retrieval and decision-oriented generation. While reinforcement learning (RL) has been shown to improve performance in this paradigm, its contributions remain underexplored. To fully understand the role of...
Temporal-Aware Heterogeneous Graph Reasoning with Multi-View Fusion for Temporal Question Answering
arXiv:2602.19569v1 Announce Type: new Abstract: Question Answering over Temporal Knowledge Graphs (TKGQA) has attracted growing interest for handling time-sensitive queries. However, existing methods still struggle with: 1) weak incorporation of temporal constraints in question representation, causing biased reasoning; 2) limited...
Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning
arXiv:2602.19612v1 Announce Type: new Abstract: Machine Unlearning (MU) enables Large Language Models (LLMs) to remove unsafe or outdated information. However, existing work assumes that all facts are equally forgettable and largely ignores whether the forgotten knowledge originates from pretraining or...
KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge
arXiv:2602.19643v1 Announce Type: new Abstract: Large Language Models (LLMs) possess a remarkable capacity to generate persuasive and intelligible language. However, coherence does not equate to truthfulness, as the responses often contain subtle hallucinations. Existing benchmarks are limited by static and...
Physiologically Informed Deep Learning: A Multi-Scale Framework for Next-Generation PBPK Modeling
arXiv:2602.18472v1 Announce Type: new Abstract: Physiologically Based Pharmacokinetic (PBPK) modeling is a cornerstone of model-informed drug development (MIDD), providing a mechanistic framework to predict drug absorption, distribution, metabolism, and excretion (ADME). Despite its utility, adoption is hindered by high computational...
Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning
arXiv:2602.18493v1 Announce Type: new Abstract: Long-context LLMs and Retrieval-Augmented Generation (RAG) systems process information passively, deferring state tracking, contradiction resolution, and evidence aggregation to query time, which becomes brittle under ultra long streams with frequent updates. We propose the Unified...
Weak-Form Evolutionary Kolmogorov-Arnold Networks for Solving Partial Differential Equations
arXiv:2602.18515v1 Announce Type: new Abstract: Partial differential equations (PDEs) form a central component of scientific computing. Among recent advances in deep learning, evolutionary neural networks have been developed to successively capture the temporal dynamics of time-dependent PDEs via parameter evolution....
Wide Open Gazes: Quantifying Visual Exploratory Behavior in Soccer with Pose Enhanced Positional Data
arXiv:2602.18519v1 Announce Type: new Abstract: Traditional approaches to measuring visual exploratory behavior in soccer rely on counting visual exploratory actions (VEAs) based on rapid head movements exceeding 125°/s, but this method suffers from player position bias (i.e., a focus on...
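The traditional VEA count the abstract describes can be sketched as thresholding head angular speed at 125°/s. This is a minimal illustrative sketch: the signal name, sampling rate, and toy trace are assumptions, not details from the paper.

```python
import numpy as np

def count_veas(head_yaw_deg, fs_hz=25.0, threshold_deg_s=125.0):
    """Count VEAs: contiguous runs where |d(yaw)/dt| exceeds the threshold.

    head_yaw_deg: 1-D array of head yaw angles in degrees (assumed signal).
    fs_hz: sampling rate in Hz (illustrative assumption).
    """
    speed = np.abs(np.diff(head_yaw_deg)) * fs_hz   # angular speed in deg/s
    fast = speed > threshold_deg_s
    # One VEA = one contiguous run of above-threshold samples.
    starts = fast & ~np.concatenate(([False], fast[:-1]))
    return int(starts.sum())

# Toy trace: two quick 90-degree scans separated by still periods.
yaw = np.concatenate([np.zeros(10), np.linspace(0, 90, 10),
                      np.full(10, 90), np.linspace(90, 0, 10)])
print(count_veas(yaw))  # each 10-deg-per-sample sweep at 25 Hz = 250 deg/s -> 2
```

The paper's critique (player position bias) concerns this kind of count, not the threshold arithmetic itself.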
GIST: Targeted Data Selection for Instruction Tuning via Coupled Optimization Geometry
arXiv:2602.18584v1 Announce Type: new Abstract: Targeted data selection has emerged as a crucial paradigm for efficient instruction tuning, aiming to identify a small yet influential subset of training examples for a specific target task. In practice, influence is often measured...
Diagnosing LLM Reranker Behavior Under Fixed Evidence Pools
arXiv:2602.18613v1 Announce Type: new Abstract: Standard reranking evaluations study how a reranker orders candidates returned by an upstream retriever. This setup couples ranking behavior with retrieval quality, so differences in output cannot be attributed to the ranking policy alone. We...
Global Low-Rank, Local Full-Rank: The Holographic Encoding of Learned Algorithms
arXiv:2602.18649v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization after extended training -- has been linked to the emergence of low-dimensional structure in learning dynamics. Yet neural network parameters inhabit extremely high-dimensional spaces. How can...
Communication-Efficient Personalized Adaptation via Federated-Local Model Merging
arXiv:2602.18658v1 Announce Type: new Abstract: Parameter-efficient fine-tuning methods, such as LoRA, offer a practical way to adapt large vision and language models to client tasks. However, this becomes particularly challenging under task-level heterogeneity in federated deployments. In this regime, personalization...
Transformers for dynamical systems learn transfer operators in-context
arXiv:2602.18679v1 Announce Type: new Abstract: Large-scale foundation models for scientific machine learning adapt to physical settings unseen during training, such as zero-shot transfer between turbulent scales. This phenomenon, in-context learning, challenges conventional understanding of learning and adaptation in physical systems....
In-Context Planning with Latent Temporal Abstractions
arXiv:2602.18694v1 Announce Type: new Abstract: Planning-based reinforcement learning for continuous control is bottlenecked by two practical issues: planning at primitive time scales leads to prohibitive branching and long horizons, while real environments are frequently partially observable and exhibit regime shifts...
Phase-Consistent Magnetic Spectral Learning for Multi-View Clustering
arXiv:2602.18728v1 Announce Type: new Abstract: Unsupervised multi-view clustering (MVC) aims to partition data into meaningful groups by leveraging complementary information from multiple views without labels, yet a central challenge is to obtain a reliable shared structural signal to guide representation...
GLaDiGAtor: Language-Model-Augmented Multi-Relation Graph Learning for Predicting Disease-Gene Associations
arXiv:2602.18769v1 Announce Type: new Abstract: Understanding disease-gene associations is essential for unravelling disease mechanisms and advancing diagnostics and therapeutics. Traditional approaches based on manual curation and literature review are labour-intensive and not scalable, prompting the use of machine learning on...
CaliCausalRank: Calibrated Multi-Objective Ad Ranking with Robust Counterfactual Utility Optimization
arXiv:2602.18786v1 Announce Type: new Abstract: Ad ranking systems must simultaneously optimize multiple objectives including click-through rate (CTR), conversion rate (CVR), revenue, and user experience metrics. However, production systems face critical challenges: score scale inconsistency across traffic segments undermines threshold transferability,...
From Few-Shot to Zero-Shot: Towards Generalist Graph Anomaly Detection
arXiv:2602.18793v1 Announce Type: new Abstract: Graph anomaly detection (GAD) is critical for identifying abnormal nodes in graph-structured data from diverse domains, including cybersecurity and social networks. The existing GAD methods often focus on the learning paradigms of "one-model-for-one-dataset", requiring dataset-specific...
Bayesian Lottery Ticket Hypothesis
arXiv:2602.18825v1 Announce Type: new Abstract: Bayesian neural networks (BNNs) are a useful tool for uncertainty quantification, but require substantially more computational resources than conventional neural networks. For non-Bayesian networks, the Lottery Ticket Hypothesis (LTH) posits the existence of sparse subnetworks...
Exact Attention Sensitivity and the Geometry of Transformer Stability
arXiv:2602.18849v1 Announce Type: new Abstract: Despite powering modern AI, transformers remain mysteriously brittle to train. We develop a stability theory that explains why pre-LayerNorm works, why DeepNorm uses $N^{-1/4}$ scaling, and why warmup is necessary, all from first principles. Our...
Rank-Aware Spectral Bounds on Attention Logits for Stable Low-Precision Training
arXiv:2602.18851v1 Announce Type: new Abstract: Attention scores in transformers are bilinear forms $S_{ij} = x_i^\top M x_j / \sqrt{d_h}$ whose maximum magnitude governs overflow risk in low-precision training. We derive a \emph{rank-aware concentration inequality}: when the interaction matrix $M =...
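The bilinear attention score in this abstract, $S_{ij} = x_i^\top M x_j / \sqrt{d_h}$, can be computed directly. A minimal sketch, assuming the common transformer instantiation $M = W_Q W_K^\top$ (the abstract's own definition of $M$ is truncated, so this factorization is an assumption), with the dimensions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d_h = 4, 8, 8

X = rng.standard_normal((n, d))      # token representations x_i (rows)
W_Q = rng.standard_normal((d, d_h))  # query projection
W_K = rng.standard_normal((d, d_h))  # key projection

M = W_Q @ W_K.T                      # interaction matrix, rank <= d_h
S = X @ M @ X.T / np.sqrt(d_h)       # attention logits S_ij

# The maximum |S_ij| is the quantity whose concentration governs
# overflow risk in low-precision training.
max_logit = float(np.abs(S).max())
print(S.shape, max_logit)
```

A rank-aware bound, as the abstract suggests, would control `max_logit` via the spectrum of `M` rather than a worst-case norm product.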
Issues with Measuring Task Complexity via Random Policies in Robotic Tasks
arXiv:2602.18856v1 Announce Type: new Abstract: Reinforcement learning (RL) has enabled major advances in fields such as robotics and natural language processing. A key challenge in RL is measuring task complexity, which is essential for creating meaningful benchmarks and designing effective...
PCA-VAE: Differentiable Subspace Quantization without Codebook Collapse
arXiv:2602.18904v1 Announce Type: new Abstract: Vector-quantized autoencoders deliver high-fidelity latents but suffer inherent flaws: the quantizer is non-differentiable, requires straight-through hacks, and is prone to collapse. We address these issues at the root by replacing VQ with a simple, principled,...
Oral argument live blog for Monday, March 2
On Monday, March 2, we will be live blogging as the court hears argument in United States v. Hemani, on whether a federal statute that prohibits gun possession by users […]
SCOTUStoday for Tuesday, February 24
On this day in 1803, the Supreme Court released its ruling in Marbury v. Madison, which established the principle of judicial review (or did it?). Mark the anniversary with us […] The post SCOTUStoday for Tuesday, February 24 appeared first on SCOTUSblog.
Chill
Introduction No concept is more pervasive in the law of freedom of speech than chill.[1] The chilled speech doctrine guards against self-censorship: it permits First Amendment challenges based on the allegation that a law deters the plaintiff or others from...
Nvidia challenger AI chip startup MatX raised $500M
The startup was founded by former Google TPU engineers in 2023.
Meta strikes up to $100B AMD chip deal as it chases ‘personal superintelligence’
Meta is buying billions of dollars in AMD AI chips in a multiyear deal tied to a 160 million-share warrant, deepening its push to diversify beyond Nvidia and expand data center capacity.
Final 4 days to save up to $680 on your TechCrunch Disrupt 2026 pass
Just 4 days left before savings of up to $680 on your TechCrunch Disrupt 2026 pass end on February 27 at 11:59 p.m. PT. Register to save at one of the most anticipated tech events of the year.