Protein Language Models Diverge from Natural Language: Comparative Analysis and Improved Inference
arXiv:2602.20449v1 Announce Type: cross Abstract: Modern Protein Language Models (PLMs) apply transformer-based model architectures from natural language processing to biological sequences, predicting a variety of protein functions and properties. However, protein language has key differences from natural language, such as...
GATES: Self-Distillation under Privileged Context with Consensus Gating
arXiv:2602.20574v1 Announce Type: cross Abstract: We study self-distillation in settings where supervision is unreliable: there are no ground truth labels, verifiable rewards, or external graders to evaluate answers. We focus on document-grounded question answering with asymmetric context, where a single...
Multimodal MRI Report Findings Supervised Brain Lesion Segmentation with Substructures
arXiv:2602.20994v1 Announce Type: cross Abstract: Report-supervised (RSuper) learning seeks to alleviate the need for dense tumor voxel labels with constraints derived from radiology reports (e.g., volumes, counts, sizes, locations). In MRI studies of brain tumors, however, we often involve multi-parametric...
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
arXiv:2602.20197v1 Announce Type: new Abstract: Reinforcement Learning with verifiable rewards (RLVR) has emerged as a primary learning paradigm for enhancing the reasoning capabilities of multi-modal large language models (MLLMs). However, during RL training, the enormous state space of MLLM and...
Golden Layers and Where to Find Them: Improved Knowledge Editing for Large Language Models Via Layer Gradient Analysis
arXiv:2602.20207v1 Announce Type: new Abstract: Knowledge editing in Large Language Models (LLMs) aims to update the model's prediction for a specific query to a desired target while preserving its behavior on all other inputs. This process typically involves two stages:...
Model Merging in the Essential Subspace
arXiv:2602.20208v1 Announce Type: new Abstract: Model merging aims to integrate multiple task-specific fine-tuned models derived from a shared pre-trained checkpoint into a single multi-task model without additional training. Despite extensive research, task interference remains a major obstacle that often undermines...
MultiModalPFN: Extending Prior-Data Fitted Networks for Multimodal Tabular Learning
arXiv:2602.20223v1 Announce Type: new Abstract: Recently, TabPFN has gained attention as a foundation model for tabular data. However, it struggles to integrate heterogeneous modalities such as images and text, which are common in domains like healthcare and marketing, thereby limiting...
Uncertainty-Aware Delivery Delay Duration Prediction via Multi-Task Deep Learning
arXiv:2602.20271v1 Announce Type: new Abstract: Accurate delivery delay prediction is critical for maintaining operational efficiency and customer satisfaction across modern supply chains. Yet the increasing complexity of logistics networks, spanning multimodal transportation, cross-country routing, and pronounced regional variability, makes this...
The Truthfulness Spectrum Hypothesis
arXiv:2602.20273v1 Announce Type: new Abstract: Large language models (LLMs) have been reported to linearly encode truthfulness, yet recent work questions this finding's generality. We reconcile these views with the truthfulness spectrum hypothesis: the representational space contains directions ranging from broadly...
Learning to Solve Complex Problems via Dataset Decomposition
arXiv:2602.20296v1 Announce Type: new Abstract: Curriculum learning is a class of training strategies that organizes the data being exposed to a model by difficulty, gradually from simpler to more complex examples. This research explores a reverse curriculum generation approach that...
In-context Pre-trained Time-Series Foundation Models adapt to Unseen Tasks
arXiv:2602.20307v1 Announce Type: new Abstract: Time-series foundation models (TSFMs) have demonstrated strong generalization capabilities across diverse datasets and tasks. However, existing foundation models are typically pre-trained to enhance performance on specific tasks and often struggle to generalize to unseen tasks...
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models
arXiv:2602.20309v1 Announce Type: new Abstract: Vision-language-action (VLA) models unify perception, language, and control for embodied agents but face significant challenges in practical deployment due to rapidly increasing compute and memory demands, especially as models scale to longer horizons and larger...
Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction
arXiv:2602.20344v1 Announce Type: new Abstract: Graph self-supervised learning (GSSL) has demonstrated strong potential for generating expressive graph embeddings without the need for human annotations, making it particularly valuable in domains with high labeling costs such as molecular graph analysis. However,...
cc-Shapley: Measuring Multivariate Feature Importance Needs Causal Context
arXiv:2602.20396v1 Announce Type: new Abstract: Explainable artificial intelligence promises to yield insights into relevant features, thereby enabling humans to examine and scrutinize machine learning models or even facilitating scientific discovery. Considering the widespread technique of Shapley values, we find that...
Oracle-Robust Online Alignment for Large Language Models
arXiv:2602.20457v1 Announce Type: new Abstract: We study online alignment of large language models under misspecified preference feedback, where the observed preference oracle deviates from an ideal but unknown ground-truth oracle. The online LLM alignment problem is a bi-level reinforcement problem...
A Long-Short Flow-Map Perspective for Drifting Models
arXiv:2602.20463v1 Announce Type: new Abstract: This paper provides a reinterpretation of the Drifting Model~\cite{deng2026generative} through a semigroup-consistent long-short flow-map factorization. We show that a global transport process can be decomposed into a long-horizon flow map followed by a short-time terminal...
Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA
arXiv:2602.20492v1 Announce Type: new Abstract: Decentralized federated learning (DFL) based on low-rank adaptation (LoRA) enables mobile devices with multi-task datasets to collaboratively fine-tune a large language model (LLM) by exchanging locally updated parameters with a subset of neighboring devices via...
A Generalized Apprenticeship Learning Framework for Capturing Evolving Student Pedagogical Strategies
arXiv:2602.20527v1 Announce Type: new Abstract: Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) have advanced rapidly in recent years and have been successfully applied to e-learning environments like intelligent tutoring systems (ITSs). Despite great success, the broader application of DRL...
Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition
arXiv:2602.20530v1 Announce Type: new Abstract: Emotion recognition from multi-modal physiological and behavioral signals plays a pivotal role in affective computing, yet most existing models remain constrained to the prediction of singular emotions in controlled laboratory settings. Real-world human emotional experiences,...
Sample-efficient evidence estimation of score based priors for model selection
arXiv:2602.20549v1 Announce Type: new Abstract: The choice of prior is central to solving ill-posed imaging inverse problems, making it essential to select one consistent with the measurements $y$ to avoid severe bias. In Bayesian inverse problems, this could be achieved...
Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs
arXiv:2602.20567v1 Announce Type: new Abstract: Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration stability and generalization behavior remain unclear due to structural...
Benchmarking GNN Models on Molecular Regression Tasks with CKA-Based Representation Analysis
arXiv:2602.20573v1 Announce Type: new Abstract: Molecules are commonly represented as SMILES strings, which can be readily converted to fixed-size molecular fingerprints. These fingerprints serve as feature vectors to train ML/DL models for molecular property prediction tasks in the field of...
Salesforce CEO Marc Benioff: This isn’t our first SaaSpocalypse
Salesforce reported solid year-end earnings and then pulled out all the stops to ward off more talk of AI killing its business.
Gushwork bets on AI search for customer leads — and early results are emerging
Gushwork has raised $9 million in a seed round led by SIG and Lightspeed. The startup has seen early customer traction from AI search tools like ChatGPT.
Nvidia has another record quarter amid record capex spends
"The demand for tokens in the world has gone completely exponential," Nvidia CEO Jensen Huang said about the company's earnings.
Alphabet-owned robotics software company Intrinsic joins Google
Nearly five years after graduating into an independent Alphabet company, Intrinsic is moving under Google's domain.
Wearable startup CUDIS launches a new health ring line with an AI-fueled ‘coach’
The wearable incentivizes healthy behavior with points that can be redeemed for health products.
OpenClaw creator’s advice to AI builders is to be more playful and allow yourself time to improve
Peter Steinberger talks about the creation of his viral AI agent OpenClaw and how being more "playful" makes for a better way to learn AI coding.
About 12% of US teens turn to AI for emotional support or advice
General-purpose tools like ChatGPT, Claude, and Grok are not designed for this use, making mental health professionals wary.
TriTopic: Tri-Modal Graph-Based Topic Modeling with Iterative Refinement and Archetypes
arXiv:2602.19079v1 Announce Type: new Abstract: Topic modeling extracts latent themes from large text collections, but leading approaches like BERTopic face critical limitations: stochastic instability, loss of lexical precision ("Embedding Blur"), and reliance on a single data perspective. We present TriTopic,...