Evaluating Progress in Graph Foundation Models: A Comprehensive Benchmark and New Insights
arXiv:2603.10033v1 Announce Type: new Abstract: Graph foundation models (GFM) aim to acquire transferable knowledge by pre-training on diverse graphs, which can be adapted to various downstream tasks. However, domain shift in graphs is inherently two-dimensional: graphs differ not only in...
GR-SAP: Generative Replay for Safety Alignment Preservation during Fine-Tuning
arXiv:2603.10243v1 Announce Type: new Abstract: Recent studies show that the safety alignment of large language models (LLMs) can be easily compromised even by seemingly non-adversarial fine-tuning. To preserve safety alignment during fine-tuning, a widely used strategy is to jointly optimize...
HTMuon: Improving Muon via Heavy-Tailed Spectral Correction
arXiv:2603.10067v1 Announce Type: new Abstract: Muon has recently shown promising results in LLM training. In this work, we study how to further improve Muon. We argue that Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and over-emphasizes...
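For readers unfamiliar with the "orthogonalized update" mentioned in this abstract, the sketch below shows why such an update has a flat spectrum. It is an illustration under stated assumptions, not the paper's method: the function name `orthogonalize` is hypothetical, and an exact SVD stands in for the Newton-Schulz iteration that Muon is generally described as using to approximate the same result.

```python
import numpy as np

def orthogonalize(update):
    """Replace every singular value of `update` with 1 (exact SVD stand-in)."""
    # update = U @ diag(s) @ Vt; dropping diag(s) keeps only the directions.
    u, _, vt = np.linalg.svd(update, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(0)
gradient = rng.standard_normal((4, 3))   # hypothetical gradient matrix
ortho = orthogonalize(gradient)

# Singular values before and after orthogonalization.
before = np.linalg.svd(gradient, compute_uv=False)
after = np.linalg.svd(ortho, compute_uv=False)
print("before:", before)
print("after: ", after)
```

Because every singular value of the orthogonalized matrix equals 1, the update carries no information about the relative magnitude of the spectral directions, which is the flattening effect the abstract refers to.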
Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs
arXiv:2603.10100v1 Announce Type: new Abstract: Modern CNNs' high computational demands hinder edge deployment, as traditional "hard" sparsity (skipping mathematical zeros) loses effectiveness in deep layers or with smooth activations like Tanh. We propose a "soft sparsity" paradigm using a hardware...
DT-BEHRT: Disease Trajectory-aware Transformer for Interpretable Patient Representation Learning
arXiv:2603.10180v1 Announce Type: new Abstract: The growing adoption of electronic health record (EHR) systems has provided unprecedented opportunities for predictive modeling to guide clinical decision making. Structured EHRs contain longitudinal observations of patients across hospital visits, where each visit is...
Improving TabPFN's Synthetic Data Generation by Integrating Causal Structure
arXiv:2603.10254v1 Announce Type: new Abstract: Synthetic tabular data generation addresses data scarcity and privacy constraints in a variety of domains. Tabular Prior-Data Fitted Network (TabPFN), a recent foundation model for tabular data, has been shown capable of generating high-quality synthetic...
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
arXiv:2603.10397v1 Announce Type: new Abstract: One crucial factor behind the success of deep learning lies in the implicit bias induced by noise inherent in gradient-based training algorithms. Motivated by empirical observations that training with noisy labels improves model generalization, we...
WordPress debuts a private workspace that runs in your browser via a new service, my.WordPress.net
WordPress’s new browser-based service lets users create private sites without hosting or signing up, turning the platform into a personal workspace for writing, research, and AI tools.
What is a Tort?
What is a tort, and what is tort law for? On one leading scholarly account, torts are legal liability rules that seek to promote the welfare of society at large by disincentivizing socially suboptimal behavior and distributing the costs of...
TaSR-RAG: Taxonomy-guided Structured Reasoning for Retrieval-Augmented Generation
arXiv:2603.09341v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) helps large language models (LLMs) answer knowledge-intensive and time-sensitive questions by conditioning generation on external evidence. However, most RAG systems still retrieve unstructured chunks and rely on one-shot generation, which often yields...
Social-R1: Towards Human-like Social Reasoning in LLMs
arXiv:2603.09249v1 Announce Type: new Abstract: While large language models demonstrate remarkable capabilities across numerous domains, social intelligence - the capacity to perceive social cues, infer mental states, and generate appropriate responses - remains a critical challenge, particularly for enabling effective...
Reward Prediction with Factorized World States
arXiv:2603.09400v1 Announce Type: new Abstract: Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to being reached. Supervised learning of reward models could introduce biases inherent to training data, limiting...
One Language, Two Scripts: Probing Script-Invariance in LLM Concept Representations
arXiv:2603.08869v1 Announce Type: new Abstract: Do the features learned by Sparse Autoencoders (SAEs) represent abstract meaning, or are they tied to how text is written? We investigate this question using Serbian digraphia as a controlled testbed: Serbian is written interchangeably...
Meissa: Multi-modal Medical Agentic Intelligence
arXiv:2603.09018v1 Announce Type: new Abstract: Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However, these systems rely...
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
arXiv:2603.09095v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) can process text presented as images, yet they often perform worse than when the same content is provided as textual tokens. We systematically diagnose this "modality gap" by evaluating seven...
Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents
arXiv:2603.09203v1 Announce Type: new Abstract: Retrieval-augmented agents can query external evidence, yet their reliability in multi-step reasoning remains limited: noisy retrieval may derail multi-hop question answering, while outcome-only reinforcement learning provides credit signals that are too coarse to optimize intermediate...
Telogenesis: Goal Is All U Need
arXiv:2603.09476v1 Announce Type: new Abstract: Goal-conditioned systems assume goals are provided externally. We ask whether attentional priorities can emerge endogenously from an agent's internal cognitive state. We propose a priority function that generates observation targets from three epistemic gaps: ignorance...
Think Before You Lie: How Reasoning Improves Honesty
arXiv:2603.09957v1 Announce Type: new Abstract: While existing evaluations of large language models (LLMs) measure deception rates, the underlying conditions that give rise to deceptive behavior are poorly understood. We investigate this question using a novel dataset of realistic moral trade-offs...
Logics-Parsing-Omni Technical Report
arXiv:2603.09677v1 Announce Type: new Abstract: Addressing the challenges of fragmented task definitions and the heterogeneity of unstructured data in multimodal parsing, this paper proposes the Omni Parsing framework. This framework establishes a Unified Taxonomy covering documents, images, and audio-visual streams,...
ALARM: Audio-Language Alignment for Reasoning Models
arXiv:2603.09556v1 Announce Type: new Abstract: Large audio language models (ALMs) extend LLMs with auditory understanding. A common approach freezes the LLM and trains only an adapter on self-generated targets. However, this fails for reasoning LLMs (RLMs) whose built-in chain-of-thought traces...
RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation
arXiv:2603.09723v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used across the scientific workflow, including to draft peer-review reports. However, many AI-generated reviews are superficial and insufficiently actionable, leaving authors without concrete, implementable guidance and motivating the gap...
One-Eval: An Agentic System for Automated and Traceable LLM Evaluation
arXiv:2603.09821v1 Announce Type: new Abstract: Reliable evaluation is essential for developing and deploying large language models, yet in practice it often requires substantial manual effort: practitioners must identify appropriate benchmarks, reproduce heterogeneous evaluation codebases, configure dataset schema mappings, and interpret...
Self-hosted Lecture-to-Quiz: Local LLM MCQ Generation with Deterministic Quality Control
arXiv:2603.08729v1 Announce Type: cross Abstract: We present an end-to-end self-hosted (API-free) pipeline, where API-free means that lecture content is not sent to any external LLM service, that converts lecture PDFs into multiple-choice questions (MCQs) using a local LLM plus deterministic...
Quantifying Memorization and Privacy Risks in Genomic Language Models
arXiv:2603.08913v1 Announce Type: new Abstract: Genomic language models (GLMs) have emerged as powerful tools for learning representations of DNA sequences, enabling advances in variant prediction, regulatory element identification, and cross-task transfer learning. However, as these models are increasingly trained or...
Semantic Level of Detail: Multi-Scale Knowledge Representation via Heat Kernel Diffusion on Hyperbolic Manifolds
arXiv:2603.08965v1 Announce Type: new Abstract: AI memory systems increasingly organize knowledge into graph structures -- knowledge graphs, entity relations, community hierarchies -- yet lack a principled mechanism for continuous resolution control: where do the qualitative boundaries between abstraction levels lie,...
SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding
arXiv:2603.09036v1 Announce Type: new Abstract: LM-based agents excel when given high-level action APIs but struggle to ground language into low-level control. Prior work has LLMs generate skills or reward functions for RL, but these one-shot approaches lack feedback to correct...
Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms
arXiv:2603.09090v1 Announce Type: new Abstract: In reinforcement learning environments with state-dependent action validity, action masking consistently outperforms penalty-based handling of invalid actions, yet existing theory only shows that masking preserves the policy gradient theorem. We identify a distinct failure mode...
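The action masking that this abstract contrasts with penalty-based handling can be sketched as follows. This is a minimal illustration, assuming a discrete action space and a boolean validity mask supplied by the environment; the function name `masked_softmax` and the example values are hypothetical and not taken from the paper.

```python
import numpy as np

def masked_softmax(logits, valid_mask):
    """Return action probabilities with invalid actions forced to zero."""
    logits = np.asarray(logits, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    if not valid_mask.any():
        raise ValueError("at least one action must be valid")
    # Invalid actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = np.where(valid_mask, logits, -np.inf)
    # Subtract the largest valid logit for numerical stability.
    masked = masked - masked.max()
    exp = np.exp(masked)
    return exp / exp.sum()

# Hypothetical state: four actions, the last two are invalid here.
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], [True, True, False, False])
print(probs)
```

Under masking, the policy places zero probability on invalid actions by construction, whereas a penalty-based scheme leaves them in the distribution and relies on negative reward to discourage them.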
Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL
arXiv:2603.09161v1 Announce Type: new Abstract: Learning effective netlist representations is fundamentally constrained by the scarcity of labeled datasets, as real designs are protected by Intellectual Property (IP) and costly to annotate. Existing work therefore focuses on small-scale circuits with clean...
Zoom introduces an AI-powered office suite, says AI avatars for meetings arrive this month
Zoom is also introducing real-time deepfake detection tech for meetings.
A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness
arXiv:2603.06594v1 Announce Type: new Abstract: Automated "LLM-as-a-Judge" frameworks have become the de facto standard for scalable evaluation across natural language processing. For instance, in safety evaluation, these judges are relied upon to evaluate harmfulness in order to benchmark the robustness...