A Multi-head-based architecture for effective morphological tagging in Russian with open dictionary
arXiv:2604.02926v1 Announce Type: new Abstract: The article proposes a new architecture based on Multi-head attention to solve the problem of morphological tagging for the Russian language. The preprocessing of the word vectors includes splitting the words into subtokens, followed by...
Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
arXiv:2604.02340v1 Announce Type: new Abstract: Recent advances in masked diffusion language models (MDLMs) narrow the quality gap to autoregressive LMs, but their sampling remains expensive because generation requires many full-sequence denoising passes with a large Transformer and, unlike autoregressive decoding,...
DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models
arXiv:2604.02733v1 Announce Type: new Abstract: Reasoning benchmarks typically evaluate whether a model derives the correct answer from a fixed premise set, but they under-measure a closely related capability that matters in dynamic environments: belief revision under minimal evidence change. We...
SIEVE: Sample-Efficient Parametric Learning from Natural Language
arXiv:2604.02339v1 Announce Type: new Abstract: Natural language context, such as instructions, knowledge, or feedback, contains rich signal for adapting language models. While in-context learning provides adaptation via the prompt, parametric learning persists into model weights and can improve performance further, though is...
R2-Write: Reflection and Revision for Open-Ended Writing with Deep Reasoning
arXiv:2604.03004v1 Announce Type: new Abstract: While deep reasoning with long chain-of-thought has dramatically improved large language models in verifiable domains like mathematics, its effectiveness for open-ended tasks such as writing remains unexplored. In this paper, we conduct a systematic investigation...
Multi-Aspect Knowledge Distillation for Language Model with Low-rank Factorization
arXiv:2604.03110v1 Announce Type: new Abstract: Knowledge distillation is an effective technique for pre-trained language model compression. However, existing methods only focus on the knowledge distribution among layers, which may cause the loss of fine-grained information in the alignment process. To...
Copilot is ‘for entertainment purposes only,’ according to Microsoft’s terms of use
AI skeptics aren’t the only ones warning users not to unthinkingly trust models’ outputs — that’s what the AI companies say themselves in their terms of service.
Can orbital data centers help justify a massive valuation for SpaceX?
On the latest episode of TechCrunch’s Equity podcast, we debated Elon Musk's vision for data centers in space.
In Japan, the robot isn’t coming for your job; it’s filling the one nobody wants
Driven by labor shortages, Japan is pushing physical AI from pilot projects into real-world deployment.
Anthropic says Claude Code subscribers will need to pay extra for OpenClaw usage
It’s about to become more expensive for Claude Code subscribers to use Anthropic’s coding assistant with OpenClaw and other third-party tools.
Trump ignores biggest reasons his AI data center buildout is failing
Nearly 50% of data center projects delayed as China holds key to power infrastructure.
OpenAI executive shuffle includes new role for COO Brad Lightcap to lead ‘special projects’
In addition to Lightcap's new role, OpenAI CMO Kate Rouch will be stepping away from the company to focus on cancer recovery, with a plan to return when her health allows.
Anthropic buys biotech startup Coefficient Bio in $400M deal: Reports
Anthropic has purchased the stealth biotech AI startup Coefficient Bio in a $400 million stock deal, according to The Information and Eric Newcomer.
AI companies are building huge natural gas plants to power data centers. What could go wrong?
Meta, Microsoft, and Google are all betting big on new natural gas power plants to run their AI data centers. They may regret it.
People would rather have an Amazon warehouse in their backyard than a data center
A new poll shows that the debate over data centers is far from settled.
The Facebook insider building content moderation for the AI era
Moonbounce has raised $12 million to grow its AI control engine that converts content moderation policies into consistent, predictable AI behavior.
The Enumerated-Rights Reading of the Privileges or Immunities Clause: A Response to Barnett and Bernick
By Kurt T. Lash. In 1871, John Bingham explained the meaning of the Fourteenth Amendment’s Privileges or Immunities Clause, a clause Bingham himself drafted and had...
Alexa+ gets new food ordering experiences with Uber Eats and Grubhub
You can now order from Uber Eats and Grubhub using Alexa+, an experience Amazon says will be similar to chatting with a waiter at a restaurant or placing an order at a drive-thru.
Optimizing EEG Graph Structure for Seizure Detection: An Information Bottleneck and Self-Supervised Learning Approach
arXiv:2604.01595v1 Announce Type: new Abstract: Seizure detection from EEG signals is highly challenging due to complex spatiotemporal dynamics and extreme inter-patient variability. To model them, recent methods construct dynamic graphs via statistical correlations, predefined similarity measures, or implicit learning, yet...
How Do Language Models Process Ethical Instructions? Deliberation, Consistency, and Other-Recognition Across Four Models
arXiv:2604.00021v1 Announce Type: cross Abstract: Alignment safety research assumes that ethical instructions improve model behavior, but how language models internally process such instructions remains unknown. We conducted over 600 multi-agent simulations across four models (Llama 3.3 70B, GPT-4o mini, Qwen3-Next-80B-A3B,...
15% of Americans say they’d be willing to work for an AI boss, according to new poll
According to a Quinnipiac University poll, 15% of Americans say they'd be willing to have a job where their direct supervisor was an AI program that assigned tasks and set schedules.
Cognichip wants AI to design the chips that power AI, and just raised $60M to try
The firm says it can reduce the cost of chip development by more than 75% and cut the timeline by more than half.
Mantis Biotech is making ‘digital twins’ of humans to help solve medicine’s data availability problem
Mantis takes disparate sources of data to make synthetic datasets that can be used to build so-called "digital twins" of the human body, representing anatomy, physiology and behavior.
In harmony with gpt-oss
arXiv:2604.00362v1 Announce Type: new Abstract: No one has independently reproduced OpenAI's published scores for gpt-oss-20b with tools, because the original paper discloses neither the tools nor the agent harness. We reverse-engineered the model's in-distribution tools: when prompted without tool definitions,...
Learning ECG Image Representations via Dual Physiological-Aware Alignments
arXiv:2604.01526v1 Announce Type: new Abstract: Electrocardiograms (ECGs) are among the most widely used diagnostic tools for cardiovascular diseases, and a large amount of ECG data worldwide appears only in image form. However, most existing automated ECG analysis methods rely on...
Towards Intrinsically Calibrated Uncertainty Quantification in Industrial Data-Driven Models via Diffusion Sampler
arXiv:2604.01870v1 Announce Type: new Abstract: In modern process industries, data-driven models are important tools for real-time monitoring when key performance indicators are difficult to measure directly. While accurate predictions are essential, reliable uncertainty quantification (UQ) is equally critical for safety,...
Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial
arXiv:2604.01328v1 Announce Type: new Abstract: Traditional scientific discovery relies on an iterative hypothesise-experiment-refine cycle that has driven progress for centuries, but its intuitive, ad-hoc implementation often wastes resources, yields inefficient designs, and misses critical insights. This tutorial presents Bayesian Optimisation...
CircuitProbe: Predicting Reasoning Circuits in Transformers via Stability Zone Detection
arXiv:2604.00716v1 Announce Type: new Abstract: Transformer language models contain localized reasoning circuits, contiguous layer blocks that improve reasoning when duplicated at inference time. Finding these circuits currently requires brute-force sweeps costing 25 GPU hours per model. We propose CircuitProbe, which...
Common TF-IDF variants arise as key components in the test statistic of a penalized likelihood-ratio test for word burstiness
arXiv:2604.00672v1 Announce Type: new Abstract: TF-IDF is a classical formula that is widely used for identifying important terms within documents. We show that TF-IDF-like scores arise naturally from the test statistic of a penalized likelihood-ratio test setup capturing word burstiness...
REM-CTX: Automated Peer Review via Reinforcement Learning with Auxiliary Context
arXiv:2604.00248v1 Announce Type: new Abstract: Most automated peer review systems rely on textual manuscript content alone, leaving visual elements such as figures and external scholarly signals underutilized. We introduce REM-CTX, a reinforcement-learning system that incorporates auxiliary context into the review...