CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models
arXiv:2602.17684v1 Announce Type: cross Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large language models by leveraging execution-based feedback from unit tests, but its scalability is fundamentally constrained by the availability and reliability of high-quality...
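The execution-based feedback the abstract refers to can be illustrated with a minimal sketch (this is a generic RLVR-style reward, not CodeScaler's implementation; the `solve` entry point and the `(input, expected)` test format are assumptions for illustration):

```python
# Hypothetical sketch of an execution-based reward of the kind RLVR relies on:
# a candidate program is run against unit tests, and the reward is the pass rate.

def unit_test_reward(program_src: str, tests: list[tuple[object, object]]) -> float:
    """Execute `program_src` (assumed to define `solve`) and score it by the
    fraction of (input, expected_output) unit tests it passes."""
    namespace: dict = {}
    try:
        exec(program_src, namespace)          # compile and run the candidate
        solve = namespace["solve"]
    except Exception:
        return 0.0                            # broken or unparseable code earns zero
    passed = 0
    for arg, expected in tests:
        try:
            if solve(arg) == expected:
                passed += 1
        except Exception:
            pass                              # runtime errors count as failures
    return passed / len(tests)

# Example: a correct doubling function passes all three tests.
reward = unit_test_reward("def solve(x):\n    return 2 * x",
                          [(1, 2), (3, 6), (10, 20)])
```

Execution-free reward models, as in the title, would replace this test harness with a learned scorer, avoiding the dependency on reliable unit tests.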
Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO
arXiv:2602.17686v1 Announce Type: cross Abstract: Distilling Chain-of-Thought (CoT) reasoning from large language models into compact student models presents a fundamental challenge: teacher rationales are often too verbose for smaller models to faithfully reproduce. Existing approaches either compress reasoning into single-step,...
DesignAsCode: Bridging Structural Editability and Visual Fidelity in Graphic Design Generation
arXiv:2602.17690v1 Announce Type: cross Abstract: Graphic design generation demands a delicate balance between high visual fidelity and fine-grained structural editability. However, existing approaches typically bifurcate into either non-editable raster image synthesis or abstract layout generation devoid of visual content. Recent...
A Case Study of Selected PTQ Baselines for Reasoning LLMs on Ascend NPU
arXiv:2602.17693v1 Announce Type: cross Abstract: Post-Training Quantization (PTQ) is crucial for efficient model deployment, yet its effectiveness on Ascend NPU remains under-explored compared to GPU architectures. This paper presents a case study of representative PTQ baselines applied to reasoning-oriented models...
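For readers unfamiliar with PTQ, the simplest baseline of the family the abstract surveys can be sketched as per-tensor symmetric round-to-nearest quantization (an illustrative toy, not any specific baseline from the paper):

```python
# Illustrative PTQ sketch: per-tensor symmetric round-to-nearest quantization.

def quantize_symmetric(weights: list[float], n_bits: int):
    """Map floats to signed integers in [-(2^(b-1)-1), 2^(b-1)-1] with a
    single shared scale; return (quantized ints, scale)."""
    qmax = 2 ** (n_bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [qi * scale for qi in q]

w = [0.5, -1.0, 0.25, 0.75]
q, s = quantize_symmetric(w, n_bits=8)
w_hat = dequantize(q, s)          # recovers w up to half a quantization step
```

Real PTQ baselines refine this with calibration data, per-channel scales, or weight-update compensation, and their behavior can differ across accelerator back-ends such as Ascend NPUs.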
AsynDBT: Asynchronous Distributed Bilevel Tuning for efficient In-Context Learning with Large Language Models
arXiv:2602.17694v1 Announce Type: cross Abstract: With the rapid development of large language models (LLMs), an increasing number of applications leverage cloud-based LLM APIs to reduce usage costs. However, since the parameters and gradients of cloud-based models are inaccessible, users have to manually...
ScaleBITS: Scalable Bitwidth Search for Hardware-Aligned Mixed-Precision LLMs
arXiv:2602.17698v1 Announce Type: cross Abstract: Post-training weight quantization is crucial for reducing the memory and inference cost of large language models (LLMs), yet pushing the average precision below 4 bits remains challenging due to highly non-uniform weight sensitivity and the...
UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems
arXiv:2602.17709v1 Announce Type: cross Abstract: All-atom molecular simulation serves as a quintessential "computational microscope" for understanding the machinery of life, yet it remains fundamentally limited by the trade-off between quantum-mechanical (QM) accuracy and biological scale. We present UBio-MolFM, a universal...
Detection and Classification of Cetacean Echolocation Clicks using Image-based Object Detection Methods applied to Advanced Wavelet-based Transformations
arXiv:2602.17749v1 Announce Type: cross Abstract: A challenge in marine bioacoustic analysis is the detection of animal signals, such as calls, whistles, and clicks, for behavioral studies. Manual labeling is too time-consuming to process enough data for reliable results. Thus, an...
Investigating Target Class Influence on Neural Network Compressibility for Energy-Autonomous Avian Monitoring
arXiv:2602.17751v1 Announce Type: cross Abstract: Biodiversity loss poses a significant threat to humanity, making wildlife monitoring essential for assessing ecosystem health. Avian species are ideal subjects for this due to their popularity and the ease of identifying them through their...
Impact of Artificial Intelligence on Dental Education: A Review and Guide for Curriculum Update
This review confronts the clinical and educational aspects of dentistry with practical applications of artificial intelligence (AI). The aim was to provide an up-to-date overview of the upcoming changes and a brief analysis of the influential advancements...
Spilled Energy in Large Language Models
arXiv:2602.18671v1 Announce Type: new Abstract: We reinterpret the final Large Language Model (LLM) softmax classifier as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs at inference. This principled approach allows us to track "energy spills"...
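The abstract's starting point, reading a softmax classifier as an Energy-Based Model, can be sketched as follows (a minimal illustration of the standard correspondence E(y) = -logit(y), not the paper's "energy spill" machinery):

```python
import math

# Hedged sketch: a softmax head viewed as an EBM, with per-class energy
# E(y) = -logit(y), partition function Z, and free energy F = -log Z.

def softmax_as_ebm(logits: list[float]):
    energies = [-z for z in logits]                        # E(y) = -logit(y)
    log_z = math.log(sum(math.exp(-e) for e in energies))  # log partition function
    probs = [math.exp(-e - log_z) for e in energies]       # p(y) = e^{-E(y)} / Z
    free_energy = -log_z                                   # F = -log Z
    return probs, free_energy

probs, F = softmax_as_ebm([2.0, 1.0, 0.0])
# probs equals the ordinary softmax of the logits; F = -logsumexp(logits)
```

Under this reading, autoregressive decoding chains one such EBM per token, which is the decomposition the abstract builds on.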
Many AI Analysts, One Dataset: Navigating the Agentic Data Science Multiverse
arXiv:2602.18710v1 Announce Type: new Abstract: The conclusions of empirical research depend not only on data but on a sequence of analytic decisions that published results seldom make explicit. Past "many-analyst" studies have demonstrated this: independent teams testing the same hypothesis...
GenPlanner: From Noise to Plans -- Emergent Reasoning in Flow Matching and Diffusion Models
arXiv:2602.18812v1 Announce Type: new Abstract: Path planning in complex environments is one of the key problems of artificial intelligence because it requires simultaneous understanding of the geometry of space and the global structure of the problem. In this paper, we...
TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models
arXiv:2602.18884v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs), particularly smaller, deployable variants, exhibit a critical deficiency in understanding temporal and procedural visual data, a bottleneck hindering their application in real-world embodied AI. This gap is largely caused by...
DREAM: Deep Research Evaluation with Agentic Metrics
arXiv:2602.18940v1 Announce Type: new Abstract: Deep Research Agents generate analyst-grade reports, yet evaluating them remains challenging due to the absence of a single ground truth and the multidimensional nature of research quality. Recent benchmarks propose distinct methodologies, yet they suffer...
High Dimensional Procedural Content Generation
arXiv:2602.18943v1 Announce Type: new Abstract: Procedural content generation (PCG) has made substantial progress in shaping static 2D/3D geometry, while most methods treat gameplay mechanics as auxiliary and optimize only over space. We argue that this limits controllability and expressivity, and...
(Perlin) Noise as AI coordinator
arXiv:2602.18947v1 Announce Type: new Abstract: Large-scale control of non-player agents is central to modern games, yet production systems still struggle to balance several competing goals: locally smooth, natural behavior, and globally coordinated variety across space and time. Prior approaches...
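The core ingredient named in the title can be sketched in a few lines (a generic 1-D Perlin-style gradient-noise field, not the paper's coordination system; the agent-speed mapping is a hypothetical example of how such a field could modulate behavior):

```python
import random

# Loose illustration: 1-D Perlin-style gradient noise as a smooth field
# that could modulate per-agent behavior parameters across space.

def make_perlin1d(seed: int = 0, size: int = 256):
    rng = random.Random(seed)
    grads = [rng.uniform(-1.0, 1.0) for _ in range(size)]  # one gradient per lattice point

    def fade(t: float) -> float:          # Perlin's quintic smoothstep: 6t^5 - 15t^4 + 10t^3
        return t * t * t * (t * (t * 6 - 15) + 10)

    def noise(x: float) -> float:         # valid for x >= 0
        i = int(x) % size
        t = x - int(x)
        g0, g1 = grads[i], grads[(i + 1) % size]
        # blend the dot products of lattice gradients with the offset vectors
        return (1 - fade(t)) * (g0 * t) + fade(t) * (g1 * (t - 1))

    return noise

noise = make_perlin1d(seed=42)
# Nearby agents sample nearby x values, so behavior varies smoothly in space.
speeds = [1.0 + 0.5 * noise(agent_id * 0.1) for agent_id in range(5)]
```

Because the field is continuous and deterministic given a seed, nearby agents get correlated but non-identical parameters, which is the kind of "locally smooth, globally varied" coordination the abstract describes.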
Robust and Efficient Tool Orchestration via Layered Execution Structures with Reflective Correction
arXiv:2602.18968v1 Announce Type: new Abstract: Tool invocation is a core capability of agentic systems, yet failures often arise not from individual tool calls but from how multiple tools are organized and executed together. Existing approaches tightly couple tool execution with...
When Do LLM Preferences Predict Downstream Behavior?
arXiv:2602.18971v1 Announce Type: new Abstract: Preference-driven behavior in LLMs may be a necessary precondition for AI misalignment such as sandbagging: models cannot strategically pursue misaligned goals unless their behavior is influenced by their preferences. Yet prior work has typically prompted...
InfEngine: A Self-Verifying and Self-Optimizing Intelligent Engine for Infrared Radiation Computing
arXiv:2602.18985v1 Announce Type: new Abstract: Infrared radiation computing underpins advances in climate science, remote sensing and spectroscopy but remains constrained by manual workflows. We introduce InfEngine, an autonomous intelligent computational engine designed to drive a paradigm shift from human-led orchestration...
Asking the Right Questions: Improving Reasoning with Generated Stepping Stones
arXiv:2602.19069v1 Announce Type: new Abstract: Recent years have witnessed tremendous progress in enabling LLMs to solve complex reasoning tasks such as math and coding. As we start to apply LLMs to harder tasks that they may not be able to...
Defining Explainable AI for Requirements Analysis
arXiv:2602.19071v1 Announce Type: new Abstract: Explainable Artificial Intelligence (XAI) has become popular in the last few years. The Artificial Intelligence (AI) community in general, and the Machine Learning (ML) community in particular, is coming to the realisation that in many...
Post-Routing Arithmetic in Llama-3: Last-Token Result Writing and Rotation-Structured Digit Directions
arXiv:2602.19109v1 Announce Type: new Abstract: We study three-digit addition in Meta-Llama-3-8B (base) under a one-token readout to characterize how arithmetic answers are finalized after cross-token routing becomes causally irrelevant. Causal residual patching and cumulative attention ablations localize a sharp boundary...
Reasoning Capabilities of Large Language Models. Lessons Learned from General Game Playing
arXiv:2602.19160v1 Announce Type: new Abstract: This paper examines the reasoning capabilities of Large Language Models (LLMs) from a novel perspective, focusing on their ability to operate within formally specified, rule-governed environments. We evaluate four LLMs (Gemini 2.5 Pro and Flash...
Limited Reasoning Space: The cage of long-horizon reasoning in LLMs
arXiv:2602.19281v1 Announce Type: new Abstract: The test-time compute strategy, such as Chain-of-Thought (CoT), has significantly enhanced the ability of large language models to solve complex tasks like logical reasoning. However, empirical studies indicate that simply increasing the compute budget can...
Automated Generation of Microfluidic Netlists using Large Language Models
arXiv:2602.19297v1 Announce Type: new Abstract: Microfluidic devices have emerged as powerful tools in various laboratory applications, but the complexity of their design limits accessibility for many practitioners. While progress has been made in microfluidic design automation (MFDA), a practical and...
Time Series, Vision, and Language: Exploring the Limits of Alignment in Contrastive Representation Spaces
arXiv:2602.19367v1 Announce Type: new Abstract: The Platonic Representation Hypothesis posits that learned representations from models trained on different modalities converge to a shared latent structure of the world. However, this hypothesis has largely been examined in vision and language, and...
IR$^3$: Contrastive Inverse Reinforcement Learning for Interpretable Detection and Mitigation of Reward Hacking
arXiv:2602.19416v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) enables powerful LLM alignment but can introduce reward hacking: models exploit spurious correlations in proxy rewards without genuine alignment. Compounding this, the objectives internalized during RLHF remain opaque,...
ComplLLM: Fine-tuning LLMs to Discover Complementary Signals for Decision-making
arXiv:2602.19458v1 Announce Type: new Abstract: Multi-agent decision pipelines can outperform single agent workflows when complementarity holds, i.e., different agents bring unique information to the table to inform a final decision. We propose ComplLLM, a post-training framework based on decision theory...
ConfSpec: Efficient Step-Level Speculative Reasoning via Confidence-Gated Verification
arXiv:2602.18447v1 Announce Type: new Abstract: Chain-of-Thought reasoning significantly improves the performance of large language models on complex tasks, but incurs high inference latency due to long generation traces. Step-level speculative reasoning aims to mitigate this cost, yet existing approaches face...
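The general shape of confidence-gated verification can be sketched as a toy loop (names, threshold, and the verifier interface are hypothetical illustrations, not ConfSpec's actual design):

```python
# Toy sketch of step-level speculation with a confidence gate: accept a drafted
# reasoning step outright when the drafter's confidence clears the threshold;
# otherwise fall back to a costly verifier that may correct the step.

def speculative_steps(draft_steps, verify, threshold: float = 0.9):
    """draft_steps: iterable of (step_text, confidence) from a cheap drafter.
    verify: expensive oracle mapping a step to a (possibly corrected) step."""
    accepted, verifier_calls = [], 0
    for step, conf in draft_steps:
        if conf >= threshold:
            accepted.append(step)             # fast path: confidence-gated accept
        else:
            verifier_calls += 1
            accepted.append(verify(step))     # slow path: verify and correct
    return accepted, verifier_calls

drafts = [("a = 2 + 2 = 4", 0.97), ("b = a * 3 = 11", 0.41), ("c = b - 1", 0.95)]
corrections = {"b = a * 3 = 11": "b = a * 3 = 12"}
steps, calls = speculative_steps(drafts, lambda s: corrections.get(s, s))
```

The latency saving comes from `calls` growing only with the number of low-confidence steps rather than with the full trace length.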