
CREATE: Testing LLMs for Associative Creativity

arXiv:2603.09970v1. Abstract: A key component of creativity is associative reasoning: the ability to draw novel yet meaningful connections between concepts. We introduce CREATE, a benchmark designed to evaluate models' capacity for creative associative reasoning. CREATE requires models to generate sets of paths connecting concepts in a model's parametric knowledge. Paths should have high specificity (distinctiveness and closeness of the concept connection) and high diversity (dissimilarity from other paths), and models are scored more highly if they produce a larger set of strong, diverse paths. This task shares demands of real creativity tasks like hypothesis generation, including an extremely large search space, but enables collection of a sizable benchmark with objective answer grading. Evaluation of frontier models shows that the strongest models achieve higher creative utility than others, with the high multiplicity of answers and complexity of the search making […]

Manya Wadhwa, Tiasa Singha Roy, Harvey Lederman, Junyi Jessy Li, Greg Durrett · March 11, 2026

#cs.CL

Executive Summary

This paper introduces CREATE, a benchmark for evaluating models' capacity for creative associative reasoning. Models must generate sets of paths connecting concepts in their parametric knowledge, and each path is judged on specificity (how distinctive and close the connection is) and diversity (how dissimilar it is from the other paths). The strongest frontier models achieve higher creative utility than the others, but given the high multiplicity of valid answers and the complexity of the search, benchmark saturation is difficult to achieve. The study also finds that thinking models are not always more effective, even with high token budgets, and that recent creative-prompting approaches provide only limited additional improvement. The authors position CREATE as a sandbox for developing new methods to improve associative creativity, combining a very large search space with objective grading across a sizable benchmark.

Key Points

▸ CREATE is a benchmark designed to evaluate models' capacity for creative associative reasoning.
▸ The benchmark requires models to generate sets of paths connecting concepts in a model's parametric knowledge.
▸ The strongest models achieve higher creative utility than other models, but benchmark saturation is difficult to achieve.
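The abstract describes scoring only at a high level: each path is judged on specificity and diversity, and a larger set of strong, diverse paths earns a higher score. The Python sketch below illustrates one way such a set-level score could be computed. It is not CREATE's grader: the `Path` type, the pre-computed `specificity` values, the Jaccard distance over intermediate concepts used as a diversity proxy, the greedy selection, and both thresholds are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical representation of one answer: a chain of concepts from source to target."""

    concepts: tuple[str, ...]
    # Assumed to be pre-computed in [0, 1]; how CREATE actually grades specificity is not shown here.
    specificity: float


def jaccard_distance(a: Path, b: Path) -> float:
    """Assumed diversity proxy: 1 - Jaccard similarity of the two paths' intermediate concepts."""
    mid_a, mid_b = set(a.concepts[1:-1]), set(b.concepts[1:-1])
    union = mid_a | mid_b
    if not union:
        # Two direct links with no intermediate concepts are treated as identical.
        return 0.0
    return 1.0 - len(mid_a & mid_b) / len(union)


def creative_utility(
    paths: list[Path], min_specificity: float = 0.5, min_distance: float = 0.5
) -> tuple[float, list[Path]]:
    """Hypothetical set-level score: summed specificity of the strong, mutually diverse paths.

    Paths are visited from most to least specific. A path is kept only if its
    specificity reaches min_specificity and its distance to every path already
    kept reaches min_distance, so more strong, distinct paths give a higher score.
    """
    kept: list[Path] = []
    for path in sorted(paths, key=lambda p: p.specificity, reverse=True):
        if path.specificity < min_specificity:
            continue  # too weak to count, regardless of diversity
        if all(jaccard_distance(path, other) >= min_distance for other in kept):
            kept.append(path)
    return sum((p.specificity for p in kept), 0.0), kept


# Toy usage with invented concepts and specificity values.
paths = [
    Path(("piano", "hammer", "carpenter", "nail"), 0.9),
    Path(("piano", "keys", "lock", "nail"), 0.8),
    Path(("piano", "hammer", "carpenter", "workshop", "nail"), 0.7),
    Path(("piano", "wood", "nail"), 0.3),
]
score, kept = creative_utility(paths)
print(round(score, 2), [p.concepts for p in kept])
```

With these toy inputs, the third path is rejected because it shares most of its intermediate concepts with the first, and the fourth falls below the specificity threshold, so the score sums only the first two paths. Greedy selection in order of specificity is simply the easiest way to enforce pairwise diversity; a real grader could use judged scores and a different aggregation.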

Merits

Strength

The CREATE benchmark provides a standardized evaluation of creative associative reasoning. Like real creativity tasks such as hypothesis generation, it has an extremely large search space, yet it still allows objective answer grading across a sizable benchmark.

Strength

The study exposes concrete limitations of current models: thinking models are not always more effective even with high token budgets, and recent creative-prompting approaches yield limited additional improvement. Both findings leave clear room for new methods.

Demerits

Limitation

The benchmark may not fully capture the complexity of real-world creativity tasks, which can involve multiple stages and iterative refinement.

Limitation

The study relies on a specific set of models and task configurations, which may not be representative of the broader range of models and tasks in the field.

Expert Commentary

This paper makes a useful contribution by introducing a benchmark for creative associative reasoning that pairs a very large search space with objective grading. Its findings, that even the strongest models do not saturate the benchmark, that thinking models are not reliably better even with high token budgets, and that recent creative-prompting methods add little, identify clear targets for future work. Two factors limit how far the results generalize: the task abstracts away from multi-stage, iterative creative work, and the evaluation covers a specific set of models and task configurations. Even so, CREATE offers a practical tool for measuring associative creativity and for comparing methods intended to improve it.

Recommendations

✓ Develop and test new methods for improving models' capacity for associative creativity using the CREATE benchmark.
✓ Explore adapting the CREATE approach to other creativity tasks, such as hypothesis generation, and to other modalities, such as computer vision.

Sources

arXiv - cs.CL

CREATE: Testing LLMs for Associative Creativity



