Discovering Universal Activation Directions for PII Leakage in Language Models
arXiv:2602.16980v1 Announce Type: new Abstract: Modern language models exhibit rich internal structure, yet little is known about how privacy-sensitive behaviors, such as personally identifiable information (PII) leakage, are represented and modulated within their hidden states. We present UniLeak, a mechanistic-interpretability...
MeGU: Machine-Guided Unlearning with Target Feature Disentanglement
arXiv:2602.17088v1 Announce Type: new Abstract: The growing concern over training data privacy has elevated the "Right to be Forgotten" into a critical requirement, thereby raising the demand for effective Machine Unlearning. However, existing unlearning approaches commonly suffer from a fundamental...
Gated Tree Cross-attention for Checkpoint-Compatible Syntax Injection in Decoder-Only LLMs
arXiv:2602.15846v1 Announce Type: new Abstract: Decoder-only large language models achieve strong broad performance but are brittle to minor grammatical perturbations, undermining reliability for downstream reasoning. However, directly injecting explicit syntactic structure into an existing checkpoint can interfere with its pretrained...
Multi-source Heterogeneous Public Opinion Analysis via Collaborative Reasoning and Adaptive Fusion: A Systematically Integrated Approach
arXiv:2602.15857v1 Announce Type: new Abstract: The analysis of public opinion from multiple heterogeneous sources presents significant challenges due to structural differences, semantic variations, and platform-specific biases. This paper introduces a novel Collaborative Reasoning and Adaptive Fusion (CRAF) framework that systematically...
Understanding LLM Failures: A Multi-Tape Turing Machine Analysis of Systematic Errors in Language Model Reasoning
arXiv:2602.15868v1 Announce Type: new Abstract: Large language models (LLMs) exhibit failure modes on seemingly trivial tasks. We propose a formalisation of LLM interaction using a deterministic multi-tape Turing machine, where each tape represents a distinct component: input characters, tokens, vocabulary,...
Surgical Activation Steering via Generative Causal Mediation
arXiv:2602.16080v1 Announce Type: new Abstract: Where should we intervene in a language model (LM) to control behaviors that are diffused across many tokens of a long-form response? We introduce Generative Causal Mediation (GCM), a procedure for selecting model components, e.g.,...
Beyond Learning: A Training-Free Alternative to Model Adaptation
arXiv:2602.16189v1 Announce Type: new Abstract: Despite the continuous research and evolution of language models, they sometimes underperform previous versions. Existing approaches to overcome these challenges are resource-intensive, highlighting the need for alternatives that enable immediate action. We assume that each...
Long-Tail Knowledge in Large Language Models: Taxonomy, Mechanisms, Interventions and Implications
arXiv:2602.16201v1 Announce Type: new Abstract: Large language models (LLMs) are trained on web-scale corpora that exhibit steep power-law distributions, in which the distribution of knowledge is highly long-tailed, with most appearing infrequently. While scaling has improved average-case performance, persistent failures...
MoE-Spec: Expert Budgeting for Efficient Speculative Decoding
arXiv:2602.16052v1 Announce Type: new Abstract: Speculative decoding accelerates Large Language Model (LLM) inference by verifying multiple drafted tokens in parallel. However, for Mixture-of-Experts (MoE) models, this parallelism introduces a severe bottleneck: large draft trees activate many unique experts, significantly increasing...
Extracting and Analyzing Rail Crossing Behavior Signatures from Videos using Tensor Methods
arXiv:2602.16057v1 Announce Type: new Abstract: Railway crossings present complex safety challenges where driver behavior varies by location, time, and conditions. Traditional approaches analyze crossings individually, limiting the ability to identify shared behavioral patterns across locations. We propose a multi-view tensor...
Differentially Private Non-convex Distributionally Robust Optimization
arXiv:2602.16155v1 Announce Type: new Abstract: Real-world deployments routinely face distribution shifts, group imbalances, and adversarial perturbations, under which the traditional Empirical Risk Minimization (ERM) framework can degrade severely. Distributionally Robust Optimization (DRO) addresses this issue by optimizing the worst-case expected...
Deep TPC: Temporal-Prior Conditioning for Time Series Forecasting
arXiv:2602.16188v1 Announce Type: new Abstract: LLM-for-time series (TS) methods typically treat time shallowly, injecting positional or prompt-based cues once at the input of a largely frozen decoder, which limits temporal reasoning as this information degrades through the layers. We introduce...
Bayesian Quadrature: Gaussian Processes for Integration
arXiv:2602.16218v1 Announce Type: new Abstract: Bayesian quadrature is a probabilistic, model-based approach to numerical integration, the estimation of intractable integrals, or expectations. Although Bayesian quadrature was popularised already in the 1980s, no systematic and comprehensive treatment has been published. The...
Nvidia deepens early-stage push into India’s AI startup ecosystem
Nvidia is working with investors, nonprofits, and venture firms to build earlier ties with India's fast-growing AI founder ecosystem.
Beyond Binary Classification: Detecting Fine-Grained Sexism in Social Media Videos
arXiv:2602.15757v1 Announce Type: new Abstract: Online sexism appears in various forms, which makes its detection challenging. Although automated tools can enhance the identification of sexist content, they are often restricted to binary classification. Consequently, more subtle manifestations of sexism may...
The Information Geometry of Softmax: Probing and Steering
arXiv:2602.15293v1 Announce Type: cross Abstract: This paper concerns the question of how AI systems encode semantic structure into the geometric structure of their representation spaces. The motivating observation of this paper is that the natural geometry of these representation spaces...
Learning Representations from Incomplete EHR Data with Dual-Masked Autoencoding
arXiv:2602.15159v1 Announce Type: new Abstract: Learning from electronic health records (EHRs) time series is challenging due to irregular sam- pling, heterogeneous missingness, and the resulting sparsity of observations. Prior self-supervised meth- ods either impute before learning, represent missingness through a...
Fast and Effective On-policy Distillation from Reasoning Prefixes
arXiv:2602.15260v1 Announce Type: new Abstract: On-policy distillation (OPD), which samples trajectories from the student model and supervises them with a teacher at the token level, avoids relying solely on verifiable terminal rewards and can yield better generalization than off-policy distillation....
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
arXiv:2602.15322v1 Announce Type: new Abstract: Training large language models (LLMs) relies almost exclusively on dense adaptive optimizers with increasingly sophisticated preconditioners. We challenge this by showing that randomly masking parameter updates can be highly effective, with a masked variant of...
Directional Reasoning Trajectory Change (DRTC): Identifying Critical Trace Segments in Reasoning Models
arXiv:2602.15332v1 Announce Type: new Abstract: Understanding how language models carry out long-horizon reasoning remains an open challenge. Existing interpretability methods often highlight tokens or spans correlated with an answer, but they rarely reveal where the model makes consequential reasoning turns,...
CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies
arXiv:2602.15367v1 Announce Type: new Abstract: Reinforcement learning (RL) has achieved notable performance in high-dimensional sequential decision-making tasks, yet remains limited by low sample efficiency, sensitivity to noise, and weak generalization under partial observability. Most existing approaches address these issues primarily...
Logit Distance Bounds Representational Similarity
arXiv:2602.15438v1 Announce Type: new Abstract: For a broad family of discriminative models that includes autoregressive language models, identifiability results imply that if two models induce the same conditional distributions, then their internal representations agree up to an invertible linear transformation....
Certified Per-Instance Unlearning Using Individual Sensitivity Bounds
arXiv:2602.15602v1 Announce Type: new Abstract: Certified machine unlearning can be achieved via noise injection leading to differential privacy guarantees, where noise is calibrated to worst-case sensitivity. Such conservative calibration often results in performance degradation, limiting practical applicability. In this work,...
Join the Largest Global Community in Computing
IEEE Computer Society is the top source for information, inspiration, and collaboration in computer science and engineering, empowering technologist worldwide
Knowing When Not to Answer: Abstention-Aware Scientific Reasoning
arXiv:2602.14189v1 Announce Type: new Abstract: Large language models are increasingly used to answer and verify scientific claims, yet existing evaluations typically assume that a model must always produce a definitive answer. In scientific settings, however, unsupported or uncertain conclusions can...
Detecting LLM Hallucinations via Embedding Cluster Geometry: A Three-Type Taxonomy with Measurable Signatures
arXiv:2602.14259v1 Announce Type: new Abstract: We propose a geometric taxonomy of large language model hallucinations based on observable signatures in token embedding cluster structure. By analyzing the static embedding spaces of 11 transformer models spanning encoder (BERT, RoBERTa, ELECTRA, DeBERTa,...
Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
arXiv:2602.14299v1 Announce Type: new Abstract: As large language model agents increasingly populate networked environments, a fundamental question arises: do artificial intelligence (AI) agent societies undergo convergence dynamics similar to human social systems? Lately, Moltbook approximates a plausible future scenario in...
Accelerated Discovery of Cryoprotectant Cocktails via Multi-Objective Bayesian Optimization
arXiv:2602.13398v1 Announce Type: new Abstract: Designing cryoprotectant agent (CPA) cocktails for vitrification is challenging because formulations must be concentrated enough to suppress ice formation yet non-toxic enough to preserve cell viability. This tradeoff creates a large, multi-objective design space in...