NL2LOGIC: AST-Guided Translation of Natural Language into First-Order Logic with Large Language Models
arXiv:2602.13237v1 Announce Type: new Abstract: Automated reasoning is critical in domains such as law and governance, where verifying claims against facts in documents requires both accuracy and interpretability. Recent work adopts structured reasoning pipelines that translate natural language into first-order...
AST-PAC: AST-guided Membership Inference for Code
arXiv:2602.13240v1 Announce Type: new Abstract: Code Large Language Models are frequently trained on massive datasets containing restrictively licensed source code. This creates urgent data governance and copyright challenges. Membership Inference Attacks (MIAs) can serve as an auditing mechanism to detect...
General learned delegation by clones
arXiv:2602.13262v1 Announce Type: new Abstract: Frontier language models improve with additional test-time computation, but serial reasoning or uncoordinated parallel sampling can be compute-inefficient under fixed inference budgets. We propose SELFCEST, which equips a base model with the ability to spawn...
Artificial Organisations
arXiv:2602.13275v1 Announce Type: new Abstract: Alignment research focuses on making individual AI systems reliable. Human institutions achieve reliable collective behaviour differently: they mitigate the risk posed by misaligned individuals through organisational structure. Multi-agent AI systems should follow this institutional model...
Accuracy Standards for AI at Work vs. Personal Life: Evidence from an Online Survey
arXiv:2602.13283v1 Announce Type: new Abstract: We study how people trade off accuracy when using AI-powered tools in professional versus personal contexts for adoption purposes, the determinants of those trade-offs, and how users cope when AI/apps are unavailable. Because modern AI...
DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing
arXiv:2602.13318v1 Announce Type: new Abstract: Automatically generating and iteratively editing academic slide decks requires more than document summarization. It demands faithful content selection, coherent slide organization, layout-aware rendering, and robust multi-turn instruction following. However, existing benchmarks and evaluation protocols do...
Situation Graph Prediction: Structured Perspective Inference for User Modeling
arXiv:2602.13319v1 Announce Type: new Abstract: Perspective-Aware AI requires modeling evolving internal states--goals, emotions, contexts--not merely preferences. Progress is limited by a data bottleneck: digital footprints are privacy-sensitive and perspective states are rarely labeled. We propose Situation Graph Prediction (SGP), a...
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
arXiv:2602.13367v1 Announce Type: new Abstract: We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small...
MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents
arXiv:2602.13372v1 Announce Type: new Abstract: Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing...
On-Policy Supervised Fine-Tuning for Efficient Reasoning
arXiv:2602.13407v1 Announce Type: new Abstract: Large reasoning models (LRMs) are commonly trained with reinforcement learning (RL) to explore long chain-of-thought reasoning, achieving strong performance at high computational cost. Recent methods add multi-reward objectives to jointly optimize correctness and brevity, but...
Translating Dietary Standards into Healthy Meals with Minimal Substitutions
arXiv:2602.13502v1 Announce Type: new Abstract: An important goal for personalized diet systems is to improve nutritional quality without compromising convenience or affordability. We present an end-to-end framework that converts dietary standards into complete meals with minimal change. Using the What...
REMem: Reasoning with Episodic Memory in Language Agent
arXiv:2602.13530v1 Announce Type: new Abstract: Humans excel at remembering concrete experiences along spatiotemporal contexts and performing reasoning across those events, i.e., the capacity for episodic memory. In contrast, memory in language agents remains mainly semantic, and current agents are not...
A First Proof Sprint
arXiv:2602.13587v1 Announce Type: new Abstract: This monograph reports a multi-agent proof sprint on ten research-level problems, combining rapid draft generation with adversarial verification, targeted repair, and explicit provenance. The workflow uses wiring-diagram decompositions of claim dependencies to localize gaps and...
The Quantization Trap: Breaking Linear Scaling Laws in Multi-Hop Reasoning
arXiv:2602.13595v1 Announce Type: new Abstract: Neural scaling laws provide a predictable recipe for AI advancement: reducing numerical precision should linearly improve computational efficiency and energy profile (E proportional to bits). In this paper, we demonstrate that this scaling law breaks...
DiffusionRollout: Uncertainty-Aware Rollout Planning in Long-Horizon PDE Solving
arXiv:2602.13616v1 Announce Type: new Abstract: We propose DiffusionRollout, a novel selective rollout planning strategy for autoregressive diffusion models, aimed at mitigating error accumulation in long-horizon predictions of physical systems governed by partial differential equations (PDEs). Building on the recently validated...
Multimodal Consistency-Guided Reference-Free Data Selection for ASR Accent Adaptation
arXiv:2602.13263v1 Announce Type: new Abstract: Automatic speech recognition (ASR) systems often degrade on accented speech because acoustic-phonetic and prosodic shifts induce a mismatch to training data, making labeled accent adaptation costly. However, common pseudo-label selection heuristics are largely text-centric (e.g.,...
Metaphors' journeys across time and genre: tracking the evolution of literary metaphors with temporal embeddings
arXiv:2602.13701v1 Announce Type: new Abstract: Metaphors are a distinctive feature of literary language, yet they remain less studied experimentally than everyday metaphors. Moreover, previous psycholinguistic and computational approaches overlooked the temporal dimension, although many literary metaphors were coined centuries apart...
RMPL: Relation-aware Multi-task Progressive Learning with Stage-wise Training for Multimedia Event Extraction
arXiv:2602.13748v1 Announce Type: new Abstract: Multimedia Event Extraction (MEE) aims to identify events and their arguments from documents that contain both text and images. It requires grounding event semantics across different modalities. Progress in MEE is limited by the lack...
How Do Lexical Senses Correspond Between Spoken German and German Sign Language?
arXiv:2602.13790v1 Announce Type: new Abstract: Sign language lexicographers construct bilingual dictionaries by establishing word-to-sign mappings, where polysemous and homonymous words corresponding to different signs across contexts are often underrepresented. A usage-based approach examining how word senses map to signs can...
The acquisition of English irregular inflections by Yemeni L1 Arabic learners: A Universal Grammar approach
arXiv:2602.13816v1 Announce Type: new Abstract: This study examines the acquisition of English irregular inflections by Yemeni learners of English as a second language (L2), utilizing a Universal Grammar (UG) approach. Within the UG approach, the study considers Feature Reassembly Hypothesis...
Evaluating Prompt Engineering Techniques for RAG in Small Language Models: A Multi-Hop QA Approach
arXiv:2602.13890v1 Announce Type: new Abstract: Retrieval Augmented Generation (RAG) is a powerful approach for enhancing the factual grounding of language models by integrating external knowledge. While widely studied for large language models, the optimization of RAG for Small Language Models...
Pre-Editorial Normalization for Automatically Transcribed Medieval Manuscripts in Old French and Latin
arXiv:2602.13905v1 Announce Type: new Abstract: Recent advances in Automatic Text Recognition (ATR) have improved access to historical archives, yet a methodological divide persists between palaeographic transcriptions and normalized digital editions. While ATR models trained on more palaeographically-oriented datasets such as...
Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models
arXiv:2602.14039v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) embedding models combine expert outputs using weighted linear summation, implicitly assuming a linear subspace structure in the embedding space. This assumption is shown to be inconsistent with the geometry of expert representations. Geometric...
LogitsCoder: Towards Efficient Chain-of-Thought Path Search via Logits Preference Decoding for Code Generation
arXiv:2602.14054v1 Announce Type: new Abstract: Code generation remains a challenging task that requires precise and structured reasoning. Existing Test Time Scaling (TTS) methods, including structured tree search, have made progress in exploring reasoning paths but still face two major challenges:...
LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts
arXiv:2602.14060v1 Announce Type: new Abstract: We introduce LM-Lexicon, an innovative definition modeling approach that incorporates data clustering, semantic expert learning, and model merging using a sparse mixture-of-experts architecture. By decomposing the definition modeling task into specialized semantic domains, where small...
da Costa and Tarski meet Goguen and Carnap: a novel approach for ontological heterogeneity based on consequence systems
arXiv:2602.15158v1 Announce Type: new Abstract: This paper presents a novel approach for ontological heterogeneity that draws heavily from Carnapian-Goguenism, as presented by Kutz, Mossakowski and L\"ucke (2010). The approach is provisionally designated da Costian-Tarskianism, named after da Costa's Principle of...
Ontology-Guided Neuro-Symbolic Inference: Grounding Language Models with Mathematical Domain Knowledge
arXiv:2602.17826v1 Announce Type: new Abstract: Language models exhibit fundamental limitations -- hallucination, brittleness, and lack of formal grounding -- that are particularly problematic in high-stakes specialist fields requiring verifiable reasoning. I investigate whether formal domain ontologies can enhance language model...
The Token Games: Evaluating Language Model Reasoning with Puzzle Duels
arXiv:2602.17831v1 Announce Type: new Abstract: Evaluating the reasoning capabilities of Large Language Models is increasingly challenging as models improve. Human curation of hard questions is highly expensive, especially in recent benchmarks using PhD-level domain knowledge to challenge the most capable...
Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets
arXiv:2602.18025v1 Announce Type: new Abstract: Scalable robot policy pre-training has been hindered by the high cost of collecting high-quality demonstrations for each platform. In this study, we address this issue by uniting offline reinforcement learning (offline RL) with cross-embodiment learning....
Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies
arXiv:2602.18291v1 Announce Type: new Abstract: Online Multi-Agent Reinforcement Learning (MARL) is a prominent framework for efficient agent coordination. Crucially, enhancing policy expressiveness is pivotal for achieving superior performance. Diffusion-based generative models are well-positioned to meet this demand, having demonstrated remarkable...