LaTeX Compilation: Challenges in the Era of LLMs
arXiv:2603.02873v1 Announce Type: new Abstract: As large language models (LLMs) increasingly assist scientific writing, limitations and the significant token cost of TeX become more and more visible. This paper analyzes TeX's fundamental defects in compilation and user experience design to...
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?
arXiv:2603.03202v1 Announce Type: new Abstract: As large language models (LLMs) advance their mathematical capabilities toward the IMO level, the scarcity of challenging, high-quality problems for training and evaluation has become a significant bottleneck. Simultaneously, recent code agents have demonstrated sophisticated...
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
arXiv:2603.03205v1 Announce Type: new Abstract: Agentic language models operate in a fundamentally different safety regime than chat models: they must plan, call tools, and execute long-horizon actions where a single misstep, such as accessing files or entering credentials, can cause...
Using Learning Progressions to Guide AI Feedback for Science Learning
arXiv:2603.03249v1 Announce Type: new Abstract: Generative artificial intelligence (AI) offers scalable support for formative feedback, yet most AI-generated feedback relies on task-specific rubrics authored by domain experts. While effective, rubric authoring is time-consuming and limits scalability across instructional contexts. Learning...
Routing Absorption in Sparse Attention: Why Random Gates Are Hard to Beat
arXiv:2603.02227v1 Announce Type: cross Abstract: Can a transformer learn which attention entries matter during training? In principle, yes: attention distributions are highly concentrated, and a small gate network can identify the important entries post-hoc with near-perfect accuracy. In practice, barely....
Safety Training Persists Through Helpfulness Optimization in LLM Agents
arXiv:2603.02229v1 Announce Type: cross Abstract: Safety post-training has been studied extensively in single-step "chat" settings where safety typically refers to refusing harmful requests. We study an "agentic" (i.e., multi-step, tool-use) setting where safety refers to harmful actions directly taken by...
MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction
arXiv:2603.02221v1 Announce Type: new Abstract: In healthcare tabular predictions, classical models with feature engineering often outperform neural approaches. Recent advances in Large Language Models enable the integration of domain knowledge into feature engineering, offering a promising direction. However, existing approaches...
Characterizing and Predicting Wildfire Evacuation Behavior: A Dual-Stage ML Approach
arXiv:2603.02223v1 Announce Type: new Abstract: Wildfire evacuation behavior is highly variable and influenced by complex interactions among household resources, preparedness, and situational cues. Using a large-scale MTurk survey of residents in California, Colorado, and Oregon, this study integrates unsupervised and...
Graph Attention Based Prioritization of Disease Responsible Genes from Multimodal Alzheimer's Network
arXiv:2603.02273v1 Announce Type: new Abstract: Prioritizing disease-associated genes is central to understanding the molecular mechanisms of complex disorders such as Alzheimer's disease (AD). Traditional network-based approaches rely on static centrality measures and often fail to capture cross-modal biological heterogeneity. We...
Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles
arXiv:2603.02406v1 Announce Type: new Abstract: Generative models have recently advanced $\textit{de novo}$ protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) Existing methods cannot jointly learn protein geometry and design tasks,...
A Unified Revisit of Temperature in Classification-Based Knowledge Distillation
arXiv:2603.02430v1 Announce Type: new Abstract: A central idea of knowledge distillation is to expose relational structure embedded in the teacher's weights for the student to learn, which is often facilitated using a temperature parameter. Despite its widespread use, there remains...
Court unanimously sides with government in immigration dispute
The Supreme Court unanimously sided with the federal government on Wednesday in Urias-Orellana v. Bondi, holding in an opinion by Justice Ketanji Brown Jackson that federal courts of appeals must […]The postCourt unanimously sides with government in immigration disputeappeared first...
Lawsuit: Google Gemini sent man on violent missions, set suicide "countdown"
Gemini allegedly called man its "husband," said they could be together in death.
Distribution-Aware Companding Quantization of Large Language Models
arXiv:2603.00364v1 Announce Type: new Abstract: Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample...
Policy Compliance of User Requests in Natural Language for AI Systems
arXiv:2603.00369v1 Announce Type: new Abstract: Consider an organization whose users send requests in natural language to an AI system that fulfills them by carrying out specific tasks. In this paper, we consider the problem of ensuring such user requests comply...
CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles
arXiv:2603.00523v1 Announce Type: new Abstract: Mechanistic circuit discovery is notoriously sensitive to arbitrary analyst choices, especially pruning thresholds and feature dictionaries, often yielding brittle "one-shot" explanations with no principled notion of uncertainty. We reframe circuit discovery as an uncertainty-quantification problem...
Super Research: Answering Highly Complex Questions with Large Language Models through Super Deep and Super Wide Research
arXiv:2603.00582v1 Announce Type: new Abstract: While Large Language Models (LLMs) have demonstrated proficiency in Deep Research or Wide Search, their capacity to solve highly complex questions-those requiring long-horizon planning, massive evidence gathering, and synthesis across heterogeneous sources-remains largely unexplored. We...
From Literature to Hypotheses: An AI Co-Scientist System for Biomarker-Guided Drug Combination Hypothesis Generation
arXiv:2603.00612v1 Announce Type: new Abstract: The rapid growth of biomedical literature and curated databases has made it increasingly difficult for researchers to systematically connect biomarker mechanisms to actionable drug combination hypotheses. We present AI Co-Scientist (CoDHy), an interactive, human-in-the-loop system...
SSKG Hub: An Expert-Guided Platform for LLM-Empowered Sustainability Standards Knowledge Graphs
arXiv:2603.00669v1 Announce Type: new Abstract: Sustainability disclosure standards (e.g., GRI, SASB, TCFD, IFRS S2) are comprehensive yet lengthy, terminology-dense, and highly cross-referential, hindering structured analysis and downstream use. We present SSKG Hub (Sustainability Standards Knowledge Graph Hub), a research prototype...
RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis
arXiv:2603.00686v1 Announce Type: new Abstract: Large Language Models have evolved from single-round generators into long-horizon agents, capable of complex text synthesis scenarios. However, current evaluation frameworks lack the ability to assess the actual synthesis operations, such as outlining, drafting, and...
Thoth: Mid-Training Bridges LLMs to Time Series Understanding
arXiv:2603.01042v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable success in general-purpose reasoning. However, they still struggle to understand and reason about time series data, which limits their effectiveness in decision-making scenarios that depend on temporal dynamics....
M3-AD: Reflection-aware Multi-modal, Multi-category, and Multi-dimensional Benchmark and Framework for Industrial Anomaly Detection
arXiv:2603.00055v1 Announce Type: new Abstract: Although multimodal large language models (MLLMs) have advanced industrial anomaly detection toward a zero-shot paradigm, they still tend to produce high-confidence yet unreliable decisions in fine-grained and structurally complex industrial scenarios, and lack effective self-corrective...
Certainty-Validity: A Diagnostic Framework for Discrete Commitment Systems
arXiv:2603.00070v1 Announce Type: new Abstract: Standard evaluation metrics for machine learning -- accuracy, precision, recall, and AUROC -- assume that all errors are equivalent: a confident incorrect prediction is penalized identically to an uncertain one. For discrete commitment systems (architectures...
SEval-NAS: A Search-Agnostic Evaluation for Neural Architecture Search
arXiv:2603.00099v1 Announce Type: new Abstract: Neural architecture search (NAS) automates the discovery of neural networks that meet specified criteria, yet its evaluation procedures are often hardcoded, limiting the ability to introduce new metrics. This issue is especially pronounced in hardware-aware...
LIDS: LLM Summary Inference Under the Layered Lens
arXiv:2603.00105v1 Announce Type: new Abstract: Large language models (LLMs) have gained significant attention by many researchers and practitioners in natural language processing (NLP) since the introduction of ChatGPT in 2022. One notable feature of ChatGPT is its ability to generate...
OSF: On Pre-training and Scaling of Sleep Foundation Models
arXiv:2603.00190v1 Announce Type: new Abstract: Polysomnography (PSG) provides the gold standard for sleep assessment but suffers from substantial heterogeneity across recording devices and cohorts. There have been growing efforts to build general-purpose foundation models (FMs) for sleep physiology, but lack...
A medical coding language model trained on clinical narratives from a population-wide cohort of 1.8 million patients
arXiv:2603.00221v1 Announce Type: new Abstract: Medical coding translates clinical documentation into standardized codes for billing, research, and public health, but manual coding is time-consuming and error-prone. Existing automation efforts rely on small datasets that poorly represent real-world patient heterogeneity. We...
Hereditary Geometric Meta-RL: Nonlocal Generalization via Task Symmetries
arXiv:2603.00396v1 Announce Type: new Abstract: Meta-Reinforcement Learning (Meta-RL) commonly generalizes via smoothness in the task encoding. While this enables local generalization around each training task, it requires dense coverage of the task space and leaves richer task space structure untapped....
FCC chair calls Paramount/WBD merger "a lot cleaner" than defunct Netflix deal
FCC to review foreign debt, but Carr indicates it will be a formality.
Humans and LLMs Diverge on Probabilistic Inferences
arXiv:2602.23546v1 Announce Type: new Abstract: Human reasoning often involves working over limited information to arrive at probabilistic conclusions. In its simplest form, this involves making an inference that is not strictly entailed by a premise, but rather only likely given...