Mashup Learning: Faster Finetuning by Remixing Past Checkpoints
arXiv:2603.10156v1 Announce Type: new Abstract: Finetuning on domain-specific data is a well-established method for enhancing LLM performance on downstream tasks. Training on each dataset produces a new set of model weights, resulting in a multitude of checkpoints saved in-house or...
DT-BEHRT: Disease Trajectory-aware Transformer for Interpretable Patient Representation Learning
arXiv:2603.10180v1 Announce Type: new Abstract: The growing adoption of electronic health record (EHR) systems has provided unprecedented opportunities for predictive modeling to guide clinical decision making. Structured EHRs contain longitudinal observations of patients across hospital visits, where each visit is...
Actor-Accelerated Policy Dual Averaging for Reinforcement Learning in Continuous Action Spaces
arXiv:2603.10199v1 Announce Type: new Abstract: Policy Dual Averaging (PDA) offers a principled Policy Mirror Descent (PMD) framework that more naturally admits value function approximation than standard PMD, enabling the use of approximate advantage (or Q-) functions while retaining strong convergence...
Rethinking the Harmonic Loss via Non-Euclidean Distance Layers
arXiv:2603.10225v1 Announce Type: new Abstract: Cross-entropy loss has long been the standard choice for training deep neural networks, yet it suffers from interpretability limitations, unbounded weight growth, and inefficiencies that can contribute to costly training dynamics. The harmonic loss is...
SiMPO: Measure Matching for Online Diffusion Reinforcement Learning
arXiv:2603.10250v1 Announce Type: new Abstract: A commonly used family of RL algorithms for diffusion policies conducts softmax reweighting over the behavior policy, which usually induces an over-greedy policy and fails to leverage feedback from negative samples. In this work, we...
Improving TabPFN's Synthetic Data Generation by Integrating Causal Structure
arXiv:2603.10254v1 Announce Type: new Abstract: Synthetic tabular data generation addresses data scarcity and privacy constraints in a variety of domains. Tabular Prior-Data Fitted Network (TabPFN), a recent foundation model for tabular data, has been shown capable of generating high-quality synthetic...
Discovery of a Hematopoietic Manifold in scGPT Yields a Method for Extracting Performant Algorithms from Biological Foundation Model Internals
arXiv:2603.10261v1 Announce Type: new Abstract: We report the discovery and extraction of a compact hematopoietic algorithm from the single-cell foundation model scGPT, to our knowledge the first biologically useful, competitive algorithm extracted from a foundation model via mechanistic interpretability. We...
Robust Post-Training for Generative Recommenders: Why Exponential Reward-Weighted SFT Outperforms RLHF
arXiv:2603.10279v1 Announce Type: new Abstract: Aligning generative recommender systems to user preferences via post-training is critical for closing the gap between next-item prediction and actual recommendation quality. Existing post-training methods are ill-suited for production-scale systems: RLHF methods reward hack due...
GSVD for Geometry-Grounded Dataset Comparison: An Alignment Angle Is All You Need
arXiv:2603.10283v1 Announce Type: new Abstract: Geometry-grounded learning asks models to respect structure in the problem domain rather than treating observations as arbitrary vectors. Motivated by this view, we revisit a classical but underused primitive for comparing datasets: linear relations between...
Copula-ResLogit: A Deep-Copula Framework for Unobserved Confounding Effects
arXiv:2603.10284v1 Announce Type: new Abstract: A key challenge in travel demand analysis is the presence of unobserved factors that may generate non-causal dependencies, obscuring the true causal effects. To address the issue, the study introduces a novel deep learning based...
GaLoRA: Parameter-Efficient Graph-Aware LLMs for Node Classification
arXiv:2603.10298v1 Announce Type: new Abstract: The rapid rise of large language models (LLMs) and their ability to capture semantic relationships has led to their adoption in a wide range of applications. Text-attributed graphs (TAGs) are a notable example where LLMs...
Regime-aware financial volatility forecasting via in-context learning
arXiv:2603.10299v1 Announce Type: new Abstract: This work introduces a regime-aware in-context learning framework that leverages large language models (LLMs) for financial volatility forecasting under nonstationary market conditions. The proposed approach deploys pretrained LLMs to reason over historical volatility patterns and...
What do near-optimal learning rate schedules look like?
arXiv:2603.10301v1 Announce Type: new Abstract: A basic unanswered question in neural network training is: what is the best learning rate schedule shape for a given workload? The choice of learning rate schedule is a key factor in the success or...
How to make the most of your masked language model for protein engineering
arXiv:2603.10302v1 Announce Type: new Abstract: A plethora of protein language models have been released in recent years. Yet comparatively little work has addressed how to best sample from them to optimize desired biological properties. We fill this gap by proposing...
Data-Driven Integration Kernels for Interpretable Nonlocal Operator Learning
arXiv:2603.10305v1 Announce Type: new Abstract: Machine learning models can represent climate processes that are nonlocal in horizontal space, height, and time, often by combining information across these dimensions in highly nonlinear ways. While this can improve predictive skill, it makes...
Federated Active Learning Under Extreme Non-IID and Global Class Imbalance
arXiv:2603.10341v1 Announce Type: new Abstract: Federated active learning (FAL) seeks to reduce annotation cost under privacy constraints, yet its effectiveness degrades in realistic settings with severe global class imbalance and highly heterogeneous clients. We conduct a systematic study of query-model...
Causal Concept Graphs in LLM Latent Space for Stepwise Reasoning
arXiv:2603.10377v1 Announce Type: new Abstract: Sparse autoencoders can localize where concepts live in language models, but not how they interact during multi-step reasoning. We propose Causal Concept Graphs (CCG): a directed acyclic graph over sparse, interpretable latent features, where edges...
Variance-Aware Adaptive Weighting for Diffusion Model Training
arXiv:2603.10391v1 Announce Type: new Abstract: Diffusion models have recently achieved remarkable success in generative modeling, yet their training dynamics across different noise levels remain highly imbalanced, which can lead to inefficient optimization and unstable learning behavior. In this work, we...
Graph-GRPO: Training Graph Flow Models with Reinforcement Learning
arXiv:2603.10395v1 Announce Type: new Abstract: Graph generation is a fundamental task with broad applications, such as drug discovery. Recently, discrete flow matching-based graph generation, a.k.a. graph flow model (GFM), has emerged due to its superior performance and flexible sampling. However,...
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
arXiv:2603.10397v1 Announce Type: new Abstract: One crucial factor behind the success of deep learning lies in the implicit bias induced by noise inherent in gradient-based training algorithms. Motivated by empirical observations that training with noisy labels improves model generalization, we...
Trump administration urges Supreme Court to allow it to revoke protected status for Haitian nationals
The Trump administration on Wednesday asked the Supreme Court to pause a ruling by a federal judge in Washington, D.C., that barred the government from ending a program that allows […]
The 14th Amendment’s citizenship clause does not codify English principles of subjectship
Critics and supporters of President Donald Trump’s executive order on birthright citizenship often focus on the order’s barring of automatic citizenship to children born to individuals unlawfully present in the […]
The First Amendment’s application to public university students: an explainer
Free speech on university campuses is a perennially hot topic, perhaps most recently reflected in protests about the Israeli-Palestinian conflict at places like Ball State University, Harvard, and Columbia. This […]
Abandoning the separation of powers in times of war
Courtly Observations is a recurring series by Erwin Chemerinsky that focuses on what the Supreme Court’s decisions will mean for the law, for lawyers and lower courts, and for people’s lives. […]
SCOTUStoday for Wednesday, March 11
You’ve likely heard of AI bots being used improperly by lawyers, but what about lawsuits over AI bots practicing law without a license? Reuters reported on one such case last […] The post SCOTUStoday for Wednesday, March 11 appeared first on SCOTUSblog.
Binance sues WSJ, panicked by gov’t probes into sanctioned crypto transfers
Binance’s lawsuit accusing WSJ of defamation unlikely to stall government probes.
What crackdown? Trump's EPA enforcement claims don't pass sniff test.
75% of the criminal cases closed last fiscal year originated before Trump took office.
AI ‘actor’ Tilly Norwood put out the worst song I’ve ever heard
This song is an AI actor's rallying cry to other AI actors, urging them to keep going despite the naysayers who doubt their humanity. Literally no one can relate to this.
Zendesk acquires agentic customer service startup Forethought
Forethought was years ahead of its time and the 2018 winner of TechCrunch Battlefield.
Replit snags $9B valuation 6 months after hitting $3B
Replit raised a new $400 million round and said it hopes to have $1B in ARR by year's end.