VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study
arXiv:2602.16833v1 Announce Type: new Abstract: Exploration remains a key bottleneck for reinforcement learning (RL) post-training of large language models (LLMs), where sparse feedback and large action spaces can lead to premature collapse into repetitive behaviors. We propose Verbalized Action Masking...
A Residual-Aware Theory of Position Bias in Transformers
arXiv:2602.16837v1 Announce Type: new Abstract: Transformer models systematically favor certain token positions, yet the architectural origins of this position bias remain poorly understood. Under causal masking at infinite depth, prior theoretical analyses of attention rollout predict an inevitable collapse of...
What is the Value of Censored Data? An Exact Analysis for the Data-driven Newsvendor
arXiv:2602.16842v1 Announce Type: new Abstract: We study the offline data-driven newsvendor problem with censored demand data. In contrast to prior works where demand is fully observed, we consider the setting where demand is censored at the inventory level and only...
Construction of a classification model for dementia among Brazilian adults aged 50 and over
arXiv:2602.16887v1 Announce Type: new Abstract: To build a dementia classification model for middle-aged and elderly Brazilians, implemented in Python, combining variable selection and multivariable analysis, using low-cost variables with modification potential. Observational study with a predictive modeling approach using a...
Exact Certification of Data-Poisoning Attacks Using Mixed-Integer Programming
arXiv:2602.16944v1 Announce Type: new Abstract: This work introduces a verification framework that provides both sound and complete guarantees for data poisoning attacks during neural network training. We formulate adversarial data manipulation, model training, and test-time evaluation in a single mixed-integer...
Beyond Message Passing: A Symbolic Alternative for Expressive and Interpretable Graph Learning
arXiv:2602.16947v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) have become essential in high-stakes domains such as drug discovery, yet their black-box nature remains a significant barrier to trustworthiness. While self-explainable GNNs attempt to bridge this gap, they often rely...
Multi-Agent Lipschitz Bandits
arXiv:2602.16965v1 Announce Type: new Abstract: We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximizes collective reward, with coordination costs...
A Unified Framework for Locality in Scalable MARL
arXiv:2602.16966v1 Announce Type: new Abstract: Scalable Multi-Agent Reinforcement Learning (MARL) is fundamentally challenged by the curse of dimensionality. A common solution is to exploit locality, which hinges on an Exponential Decay Property (EDP) of the value function. However, existing conditions...
Discovering Universal Activation Directions for PII Leakage in Language Models
arXiv:2602.16980v1 Announce Type: new Abstract: Modern language models exhibit rich internal structure, yet little is known about how privacy-sensitive behaviors, such as personally identifiable information (PII) leakage, are represented and modulated within their hidden states. We present UniLeak, a mechanistic-interpretability...
Action-Graph Policies: Learning Action Co-dependencies in Multi-Agent Reinforcement Learning
arXiv:2602.17009v1 Announce Type: new Abstract: Coordinating actions is the most fundamental form of cooperation in multi-agent reinforcement learning (MARL). Successful decentralized decision-making often depends not only on good individual actions, but on selecting compatible actions across agents to synchronize behavior,...
Malliavin Calculus as Stochastic Backpropagation
arXiv:2602.17013v1 Announce Type: new Abstract: We establish a rigorous connection between pathwise (reparameterization) and score-function (Malliavin) gradient estimators by showing that both arise from the Malliavin integration-by-parts identity. Building on this equivalence, we introduce a unified and variance-aware hybrid estimator...
WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning
arXiv:2602.17025v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) is effective for training language models on complex reasoning. However, since the objective is defined relative to a group of sampled trajectories, extended deliberation can create more chances to realize...
Forecasting Anomaly Precursors via Uncertainty-Aware Time-Series Ensembles
arXiv:2602.17028v1 Announce Type: new Abstract: Detecting anomalies in time-series data is critical in domains such as industrial operations, finance, and cybersecurity, where early identification of abnormal patterns is essential for ensuring system reliability and enabling preventive maintenance. However, most existing...
MeGU: Machine-Guided Unlearning with Target Feature Disentanglement
arXiv:2602.17088v1 Announce Type: new Abstract: The growing concern over training data privacy has elevated the "Right to be Forgotten" into a critical requirement, thereby raising the demand for effective Machine Unlearning. However, existing unlearning approaches commonly suffer from a fundamental...
FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
arXiv:2602.17095v1 Announce Type: new Abstract: Parameter-efficient fine-tuning techniques such as low-rank adaptation (LoRA) enable large language models (LLMs) to adapt to downstream tasks efficiently. Federated learning (FL) further facilitates this process by enabling collaborative fine-tuning across distributed clients without sharing...
FCC asks stations for "pro-America" programming, like daily Pledge of Allegiance
Brendan Carr wants "patriotic" shows for Trump's yearlong America 250 celebration.
Wikipedia blacklists Archive.today, starts removing 695,000 archive links
If DDoSing a blog wasn't bad enough, archive site also tampered with web snapshots.
Supreme Court blocks Trump's emergency tariffs, billions in refunds may be owed
Economists estimated more than $175 billion may need to be refunded.
UAE’s G42 teams up with Cerebras to deploy 8 exaflops of compute in India
Abu Dhabi-based tech company G42 has partnered with U.S.-based chipmaker Cerebras to deploy 8 exaflops of compute through a new system in India.
KD4MT: A Survey of Knowledge Distillation for Machine Translation
arXiv:2602.15845v1 Announce Type: new Abstract: Knowledge Distillation (KD) as a research area has gained a lot of traction in recent years as a compression tool to address challenges related to ever-larger models in NLP. Remarkably, Machine Translation (MT) offers a...
Gated Tree Cross-attention for Checkpoint-Compatible Syntax Injection in Decoder-Only LLMs
arXiv:2602.15846v1 Announce Type: new Abstract: Decoder-only large language models achieve strong broad performance but are brittle to minor grammatical perturbations, undermining reliability for downstream reasoning. However, directly injecting explicit syntactic structure into an existing checkpoint can interfere with its pretrained...
Multi-source Heterogeneous Public Opinion Analysis via Collaborative Reasoning and Adaptive Fusion: A Systematically Integrated Approach
arXiv:2602.15857v1 Announce Type: new Abstract: The analysis of public opinion from multiple heterogeneous sources presents significant challenges due to structural differences, semantic variations, and platform-specific biases. This paper introduces a novel Collaborative Reasoning and Adaptive Fusion (CRAF) framework that systematically...
From Transcripts to AI Agents: Knowledge Extraction, RAG Integration, and Robust Evaluation of Conversational AI Assistants
arXiv:2602.15859v1 Announce Type: new Abstract: Building reliable conversational AI assistants for customer-facing industries remains challenging due to noisy conversational data, fragmented knowledge, and the requirement for accurate human hand-off, particularly in domains that depend heavily on real-time information. This...
CheckIfExist: Detecting Citation Hallucinations in the Era of AI-Generated Content
arXiv:2602.15871v1 Announce Type: new Abstract: The proliferation of large language models (LLMs) in academic workflows has introduced unprecedented challenges to bibliographic integrity, particularly through reference hallucination -- the generation of plausible but non-existent citations. Recent investigations have documented the presence...
P-RAG: Prompt-Enhanced Parametric RAG with LoRA and Selective CoT for Biomedical and Multi-Hop QA
arXiv:2602.15874v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate remarkable capabilities but remain limited by their reliance on static training data. Retrieval-Augmented Generation (RAG) addresses this constraint by retrieving external knowledge during inference, though it still depends heavily on...
Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens
arXiv:2602.15896v1 Announce Type: new Abstract: Multi-modal knowledge graph reasoning (MMKGR) aims to predict the missing links by exploiting both graph structure information and multi-modal entity contents. Most existing works are designed for a transductive setting, which learns dataset-specific embeddings and...
MultiCube-RAG for Multi-hop Question Answering
arXiv:2602.15898v1 Announce Type: new Abstract: Multi-hop question answering (QA) necessitates multi-step reasoning and retrieval across interconnected subjects, attributes, and relations. Existing retrieval-augmented generation (RAG) methods struggle to capture these structural semantics accurately, resulting in suboptimal performance. Graph-based RAGs structure such...
A Curious Class of Adpositional Multiword Expressions in Korean
arXiv:2602.16023v1 Announce Type: new Abstract: Multiword expressions (MWEs) have been widely studied in cross-lingual annotation frameworks such as PARSEME. However, Korean MWEs remain underrepresented in these efforts. In particular, Korean multiword adpositions lack systematic analysis, annotated resources, and integration into...
Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs
arXiv:2602.16085v1 Announce Type: new Abstract: Research on mental state reasoning in language models (LMs) has the potential to inform theories of human social cognition--such as the theory that mental state reasoning emerges in part from language exposure--and our understanding of...
Missing-by-Design: Certifiable Modality Deletion for Revocable Multimodal Sentiment Analysis
arXiv:2602.16144v1 Announce Type: new Abstract: As multimodal systems increasingly process sensitive personal data, the ability to selectively revoke specific data modalities has become a critical requirement for privacy compliance and user autonomy. We present Missing-by-Design (MBD), a unified framework for...