Episode 34: In the Family: Family Tropes in International Law - EJIL: The Podcast!
Donate to support AI Safety | CAIS
CAIS is a 501(c)(3) nonprofit institute aimed at advancing trustworthy, reliable, and safe AI through field-building and research.
Announcements Archives - AI Now Institute
The U.S. Public Wants Regulation (or Prohibition) of Expert‑Level and Superhuman AI
Three‑quarters of U.S. adults want strong regulations on AI development, preferring oversight akin to pharmaceuticals rather than industry "self‑regulation."
The Balancing Act: Looking Backward, Looking Ahead
Conferences - JURIX
JURIX organises an annual conference on Legal Knowledge and Information Systems; the first was held in 1988. The conference proceedings are published in IOS Press's Frontiers in Artificial Intelligence and Applications series, the recent ones...
JURIX 2019
The 32nd International Conference on Legal Knowledge and Information Systems
Under Trump, EPA’s enforcement of environmental laws collapses, report finds
The Environmental Protection Agency has drastically pulled back on holding polluters accountable.
Science
Featuring the latest in daily science news, Verge Science is all you need to keep track of what’s going on in health, the environment, and your whole world. Through our articles, we keep a close eye on the overlap between...
Netflix
With nearly 150 million subscribers around the world, Netflix has a commanding lead in the streaming wars. But it’s also facing heavy competition from deep-pocketed conglomerates like Disney, Apple, and AT&T, and an ongoing wave of narrow, targeted streaming sites...
Amazon
Once a modest online bookseller, Amazon is now one of the largest companies in the world, and its former CEO, Jeff Bezos, is the world's wealthiest person. We track developments concerning both Bezos and Amazon, its growth...
Wearable
The Verge is about technology and how it makes us feel. Founded in 2011, we offer our audience everything from breaking news to reviews to award-winning features and investigations, on our site, in video, and in podcasts.
Antitrust
How big is too big? And when does a company become so big that the government is forced to step in and make it smaller? Politicians have been struggling with those questions for at least a hundred years. But as...
Creators
YouTube, Instagram, SoundCloud, and other online platforms are changing the way people create and consume media. The Verge’s Creators section covers the people using these platforms, what they’re making, and how those platforms are changing (for better and worse) in...
Space
Verge Science is here to bring you the most up-to-date space news and analysis, whether it’s about the latest findings from NASA or comprehensive coverage of the next SpaceX rocket launch to the International Space Station. We’ll take you inside...
Health
BotzoneBench: Scalable LLM Evaluation via Graded AI Anchors
arXiv:2602.13214v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in interactive environments requiring strategic decision-making, yet systematic evaluation of these capabilities remains challenging. Existing benchmarks for LLMs primarily assess static reasoning through isolated tasks and fail to...
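The abstract above is truncated, but the title names the core idea: rating a model by how it fares against "anchor" agents of known, graded strength. As a hedged illustration only (the function, anchor names, and Elo-style scoring below are assumptions, not the benchmark's actual protocol), one minimal way to turn win rates against rated anchors into a skill estimate is to invert the logistic Elo expectation:

```python
# Hypothetical sketch: estimate a model's skill from its win rates against
# graded anchor opponents of known rating. All names and values here are
# illustrative, not taken from the BotzoneBench paper.
import math

def estimate_skill(results, anchor_ratings):
    """results: {anchor_name: win_rate in [0, 1]}
    anchor_ratings: {anchor_name: Elo rating of that anchor}
    Returns an Elo-style estimate averaged over all anchors."""
    estimates = []
    for name, win_rate in results.items():
        # Clamp to avoid infinities at win rates of exactly 0 or 1.
        p = min(max(win_rate, 0.01), 0.99)
        # Invert the Elo expectation p = 1 / (1 + 10^((R_anchor - R_model)/400)).
        estimates.append(anchor_ratings[name] + 400 * math.log10(p / (1 - p)))
    return sum(estimates) / len(estimates)

# A model that wins exactly half its games against a 1500-rated anchor
# should be rated 1500 itself.
print(estimate_skill({"anchor_mid": 0.5}, {"anchor_mid": 1500}))  # → 1500.0
```

With several anchors of different strengths, averaging the per-anchor estimates gives a crude but scalable score without any model-vs-model games.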
Intelligence as Trajectory-Dominant Pareto Optimization
arXiv:2602.13230v1 Announce Type: new Abstract: Despite recent advances in artificial intelligence, many systems exhibit stagnation in long-horizon adaptability despite continued performance optimization. This work argues that such limitations do not primarily arise from insufficient learning, data, or model capacity, but...
NL2LOGIC: AST-Guided Translation of Natural Language into First-Order Logic with Large Language Models
arXiv:2602.13237v1 Announce Type: new Abstract: Automated reasoning is critical in domains such as law and governance, where verifying claims against facts in documents requires both accuracy and interpretability. Recent work adopts structured reasoning pipelines that translate natural language into first-order...
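The abstract is truncated before it describes the pipeline, but the title indicates an AST-guided translation into first-order logic. As a sketch under assumptions (the node names and serializer below are invented for illustration, not the paper's representation), a minimal FOL AST that such a pipeline might target looks like this:

```python
# Illustrative sketch only: a minimal first-order-logic AST of the kind an
# NL-to-FOL translation pipeline might target. Node names are invented.
from dataclasses import dataclass

@dataclass
class Pred:
    name: str
    args: tuple  # variable or constant names

@dataclass
class Implies:
    left: object
    right: object

@dataclass
class ForAll:
    var: str
    body: object

def render(node):
    """Serialize the AST to a conventional FOL string."""
    if isinstance(node, Pred):
        return f"{node.name}({', '.join(node.args)})"
    if isinstance(node, Implies):
        return f"({render(node.left)} -> {render(node.right)})"
    if isinstance(node, ForAll):
        return f"forall {node.var}. {render(node.body)}"
    raise TypeError(node)

# "Every contract requires consideration" as a possible translation target:
ast = ForAll("x", Implies(Pred("Contract", ("x",)),
                          Pred("RequiresConsideration", ("x",))))
print(render(ast))  # → forall x. (Contract(x) -> RequiresConsideration(x))
```

Constraining an LLM to emit a typed structure like this, rather than free-form logic strings, is what makes the output mechanically checkable, which matches the abstract's emphasis on accuracy and interpretability.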
General learned delegation by clones
arXiv:2602.13262v1 Announce Type: new Abstract: Frontier language models improve with additional test-time computation, but serial reasoning or uncoordinated parallel sampling can be compute-inefficient under fixed inference budgets. We propose SELFCEST, which equips a base model with the ability to spawn...
TemporalBench: A Benchmark for Evaluating LLM-Based Agents on Contextual and Event-Informed Time Series Tasks
arXiv:2602.13272v1 Announce Type: new Abstract: It is unclear whether strong forecasting performance reflects genuine temporal understanding or the ability to reason under contextual and event-driven conditions. We introduce TemporalBench, a multi-domain benchmark designed to evaluate temporal reasoning behavior under progressively...
Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol
arXiv:2602.13320v1 Announce Type: new Abstract: As AI agents powered by large language models (LLMs) increasingly use external tools for high-stakes decisions, a critical reliability question arises: how do errors propagate across sequential tool calls? We introduce the first theoretical framework...
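The paper's theoretical framework is not shown in the truncated abstract; as a back-of-envelope sketch of the underlying question only (assuming independent per-call error rates, which is an assumption of this sketch, not the paper's martingale analysis), the probability that information survives a chain of tool calls decays multiplicatively:

```python
# Back-of-envelope sketch (not the paper's framework): if each tool call
# preserves the needed information with independent probability p_i, the
# chance the final answer still rests on uncorrupted information is the
# product of the p_i -- small per-step error rates compound over long chains.

def chain_fidelity(per_call_reliabilities):
    """Probability that no call in the sequence corrupted the information."""
    fidelity = 1.0
    for p in per_call_reliabilities:
        fidelity *= p
    return fidelity

# Ten sequential calls at 95% reliability each already lose ~40% fidelity:
print(round(chain_fidelity([0.95] * 10), 3))  # → 0.599
```

This compounding is why error propagation across sequential tool calls is a reliability question in its own right, distinct from the accuracy of any single call.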
NeuroWeaver: An Autonomous Evolutionary Agent for Exploring the Programmatic Space of EEG Analysis Pipelines
arXiv:2602.13473v1 Announce Type: new Abstract: Although foundation models have demonstrated remarkable success in general domains, the application of these models to electroencephalography (EEG) analysis is constrained by substantial data requirements and high parameterization. These factors incur prohibitive computational costs, thereby...
Who Do LLMs Trust? Human Experts Matter More Than Other LLMs
arXiv:2602.13568v1 Announce Type: new Abstract: Large language models (LLMs) increasingly operate in environments where they encounter social information such as other agents' answers, tool outputs, or human recommendations. In humans, such inputs influence judgments in ways that depend on the...
The Quantization Trap: Breaking Linear Scaling Laws in Multi-Hop Reasoning
arXiv:2602.13595v1 Announce Type: new Abstract: Neural scaling laws provide a predictable recipe for AI advancement: reducing numerical precision should linearly improve computational efficiency and energy profile (E proportional to bits). In this paper, we demonstrate that this scaling law breaks...
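The linear law quoted in the abstract (E proportional to bits) makes a simple prediction that the paper then shows breaking down for multi-hop reasoning. This toy check only illustrates what the linear law predicts, not the paper's counterexample; the constant and function are illustrative:

```python
# Toy illustration of the linear scaling assumption E = k * bits quoted above.
# The paper argues this law breaks for multi-hop reasoning; this sketch only
# shows the baseline prediction it is measured against.

def predicted_energy(bits, energy_per_bit=1.0):
    """Energy under the linear law, with an illustrative constant k."""
    return energy_per_bit * bits

# Under the linear law, halving precision (16-bit -> 8-bit) halves energy:
print(predicted_energy(16) / predicted_energy(8))  # → 2.0
```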
AllMem: A Memory-centric Recipe for Efficient Long-context Modeling
arXiv:2602.13680v1 Announce Type: new Abstract: Large Language Models (LLMs) encounter significant performance bottlenecks in long-sequence tasks due to the computational complexity and memory overhead inherent in the self-attention mechanism. To address these challenges, we introduce AllMem, a novel and efficient...
LLM-Powered Automatic Translation and Urgency in Crisis Scenarios
arXiv:2602.13452v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly proposed for crisis preparedness and response, particularly for multilingual communication. However, their suitability for high-stakes crisis contexts remains insufficiently evaluated. This work examines the performance of state-of-the-art LLMs and...
Language Model Memory and Memory Models for Language
arXiv:2602.13466v1 Announce Type: new Abstract: The ability of machine learning models to store input information in hidden layer vector embeddings, analogous to the concept of 'memory', is widely employed but not well characterized. We find that language model embeddings typically...