Wikipedia blacklists Archive.today, starts removing 695,000 archive links
If DDoSing a blog wasn't bad enough, archive site also tampered with web snapshots.
KD4MT: A Survey of Knowledge Distillation for Machine Translation
arXiv:2602.15845v1 Announce Type: new Abstract: Knowledge Distillation (KD) as a research area has gained a lot of traction in recent years as a compression tool to address challenges related to ever-larger models in NLP. Remarkably, Machine Translation (MT) offers a...
Gated Tree Cross-attention for Checkpoint-Compatible Syntax Injection in Decoder-Only LLMs
arXiv:2602.15846v1 Announce Type: new Abstract: Decoder-only large language models achieve strong broad performance but are brittle to minor grammatical perturbations, undermining reliability for downstream reasoning. However, directly injecting explicit syntactic structure into an existing checkpoint can interfere with its pretrained...
Multi-source Heterogeneous Public Opinion Analysis via Collaborative Reasoning and Adaptive Fusion: A Systematically Integrated Approach
arXiv:2602.15857v1 Announce Type: new Abstract: The analysis of public opinion from multiple heterogeneous sources presents significant challenges due to structural differences, semantic variations, and platform-specific biases. This paper introduces a novel Collaborative Reasoning and Adaptive Fusion (CRAF) framework that systematically...
P-RAG: Prompt-Enhanced Parametric RAG with LoRA and Selective CoT for Biomedical and Multi-Hop QA
arXiv:2602.15874v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate remarkable capabilities but remain limited by their reliance on static training data. Retrieval-Augmented Generation (RAG) addresses this constraint by retrieving external knowledge during inference, though it still depends heavily on...
Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens
arXiv:2602.15896v1 Announce Type: new Abstract: Multi-modal knowledge graph reasoning (MMKGR) aims to predict the missing links by exploiting both graph structure information and multi-modal entity contents. Most existing works are designed for a transductive setting, which learns dataset-specific embeddings and...
MultiCube-RAG for Multi-hop Question Answering
arXiv:2602.15898v1 Announce Type: new Abstract: Multi-hop question answering (QA) necessitates multi-step reasoning and retrieval across interconnected subjects, attributes, and relations. Existing retrieval-augmented generation (RAG) methods struggle to capture these structural semantics accurately, resulting in suboptimal performance. Graph-based RAGs structure such...
Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs
arXiv:2602.16085v1 Announce Type: new Abstract: Research on mental state reasoning in language models (LMs) has the potential to inform theories of human social cognition--such as the theory that mental state reasoning emerges in part from language exposure--and our understanding of...
Missing-by-Design: Certifiable Modality Deletion for Revocable Multimodal Sentiment Analysis
arXiv:2602.16144v1 Announce Type: new Abstract: As multimodal systems increasingly process sensitive personal data, the ability to selectively revoke specific data modalities has become a critical requirement for privacy compliance and user autonomy. We present Missing-by-Design (MBD), a unified framework for...
Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution
arXiv:2602.16154v1 Announce Type: new Abstract: Chain-of-thought (CoT) reasoning sometimes fails to faithfully reflect the true computation of a large language model (LLM), hampering its utility in explaining how LLMs arrive at their answers. Moreover, optimizing for faithfulness and interpretability in...
Beyond Learning: A Training-Free Alternative to Model Adaptation
arXiv:2602.16189v1 Announce Type: new Abstract: Despite the continuous research and evolution of language models, they sometimes underperform previous versions. Existing approaches to overcome these challenges are resource-intensive, highlighting the need for alternatives that enable immediate action. We assume that each...
Aladdin-FTI @ AMIYA Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation
arXiv:2602.16290v1 Announce Type: new Abstract: Arabic dialects have long been under-represented in Natural Language Processing (NLP) research due to their non-standardization and high variability, which pose challenges for computational modeling. Recent advances in the field, such as Large Language Models...
Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents
arXiv:2602.16346v1 Announce Type: new Abstract: LLM-based agents execute real-world workflows via tools and memory. These affordances enable ill-intended adversaries to also use these agents to carry out complex misuse scenarios. Existing agent misuse benchmarks largely test single-prompt instructions, leaving a...
Verifier-Constrained Flow Expansion for Discovery Beyond the Data
arXiv:2602.15984v1 Announce Type: new Abstract: Flow and diffusion models are typically pre-trained on limited available data (e.g., molecular samples), covering only a fraction of the valid design space (e.g., the full molecular space). As a consequence, they tend to generate...
MoE-Spec: Expert Budgeting for Efficient Speculative Decoding
arXiv:2602.16052v1 Announce Type: new Abstract: Speculative decoding accelerates Large Language Model (LLM) inference by verifying multiple drafted tokens in parallel. However, for Mixture-of-Experts (MoE) models, this parallelism introduces a severe bottleneck: large draft trees activate many unique experts, significantly increasing...
Differentially Private Non-convex Distributionally Robust Optimization
arXiv:2602.16155v1 Announce Type: new Abstract: Real-world deployments routinely face distribution shifts, group imbalances, and adversarial perturbations, under which the traditional Empirical Risk Minimization (ERM) framework can degrade severely. Distributionally Robust Optimization (DRO) addresses this issue by optimizing the worst-case expected...
Deep TPC: Temporal-Prior Conditioning for Time Series Forecasting
arXiv:2602.16188v1 Announce Type: new Abstract: LLM-for-time series (TS) methods typically treat time shallowly, injecting positional or prompt-based cues once at the input of a largely frozen decoder, which limits temporal reasoning as this information degrades through the layers. We introduce...
Bayesian Quadrature: Gaussian Processes for Integration
arXiv:2602.16218v1 Announce Type: new Abstract: Bayesian quadrature is a probabilistic, model-based approach to numerical integration, the estimation of intractable integrals, or expectations. Although Bayesian quadrature was popularised already in the 1980s, no systematic and comprehensive treatment has been published. The...
SEMixer: Semantics Enhanced MLP-Mixer for Multiscale Mixing and Long-term Time Series Forecasting
arXiv:2602.16220v1 Announce Type: new Abstract: Modeling multiscale patterns is crucial for long-term time series forecasting (TSF). However, redundancy and noise in time series, together with semantic gaps between non-adjacent scales, make the efficient alignment and integration of multi-scale temporal dependencies...
Factored Latent Action World Models
arXiv:2602.16229v1 Announce Type: new Abstract: Learning latent actions from action-free video has emerged as a powerful paradigm for scaling up controllable world model learning. Latent actions provide a natural interface for users to iteratively generate and manipulate videos. However, most...
OpenAI deepens India push with Pine Labs fintech partnership
OpenAI moves beyond ChatGPT in India with a Pine Labs deal targeting enterprise payments and AI-driven commerce.
How to Train Your Long-Context Visual Document Model
arXiv:2602.15257v1 Announce Type: cross Abstract: We present the first comprehensive, large-scale study of training long-context vision language models up to 344K context, targeting long-document visual question answering with measured transfer to long-context text. While several such strong are open-weight, namely...
FrameRef: A Framing Dataset and Simulation Testbed for Modeling Bounded Rational Information Health
arXiv:2602.15273v1 Announce Type: cross Abstract: Information ecosystems increasingly shape how people internalize exposure to adverse digital experiences, raising concerns about the long-term consequences for information health. In modern search and recommendation systems, ranking and personalization policies play a central role...
The Information Geometry of Softmax: Probing and Steering
arXiv:2602.15293v1 Announce Type: cross Abstract: This paper concerns the question of how AI systems encode semantic structure into the geometric structure of their representation spaces. The motivating observation of this paper is that the natural geometry of these representation spaces...
Near-Optimal Sample Complexity for Online Constrained MDPs
arXiv:2602.15076v1 Announce Type: new Abstract: Safety is a fundamental challenge in reinforcement learning (RL), particularly in real-world applications such as autonomous driving, robotics, and healthcare. To address this, Constrained Markov Decision Processes (CMDPs) are commonly used to enforce safety constraints...
Hybrid Feature Learning with Time Series Embeddings for Equipment Anomaly Prediction
arXiv:2602.15089v1 Announce Type: new Abstract: In predictive maintenance of equipment, deep learning-based time series anomaly detection has garnered significant attention; however, pure deep learning approaches often fail to achieve sufficient accuracy on real-world data. This study proposes a hybrid approach...
Learning Representations from Incomplete EHR Data with Dual-Masked Autoencoding
arXiv:2602.15159v1 Announce Type: new Abstract: Learning from electronic health records (EHRs) time series is challenging due to irregular sam- pling, heterogeneous missingness, and the resulting sparsity of observations. Prior self-supervised meth- ods either impute before learning, represent missingness through a...
MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference
arXiv:2602.15206v1 Announce Type: new Abstract: Reward learning typically relies on a single feedback type or combines multiple feedback types using manually weighted loss terms. Currently, it remains unclear how to jointly learn reward functions from heterogeneous feedback types such as...
BindCLIP: A Unified Contrastive-Generative Representation Learning Framework for Virtual Screening
arXiv:2602.15236v1 Announce Type: new Abstract: Virtual screening aims to efficiently identify active ligands from massive chemical libraries for a given target pocket. Recent CLIP-style models such as DrugCLIP enable scalable virtual screening by embedding pockets and ligands into a shared...
Closing the Distribution Gap in Adversarial Training for LLMs
arXiv:2602.15238v1 Announce Type: new Abstract: Adversarial training for LLMs is one of the most promising methods to reliably improve robustness against adversaries. However, despite significant progress, models remain vulnerable to simple in-distribution exploits, such as rewriting prompts in the past...