CREATE: Testing LLMs for Associative Creativity
arXiv:2603.09970v1
Abstract: A key component of creativity is associative reasoning: the ability to draw novel yet meaningful connections between concepts. We introduce CREATE, a benchmark designed to evaluate models' capacity for creative associative reasoning. CREATE requires models to generate sets of paths connecting concepts in a model's parametric knowledge. Paths should have high specificity (distinctiveness and closeness of the concept connection) and high diversity (dissimilarity from other paths), and models are scored more highly if they produce a larger set of strong, diverse paths. This task shares demands of real creativity tasks like hypothesis generation, including an extremely large search space, but enables collection of a sizable benchmark with objective answer grading. Evaluation of frontier models shows that the strongest models achieve higher creative utility than others, with the high multiplicity of answers and complexity of the search making benchmark saturation difficult to achieve. Furthermore, our results illustrate that thinking models are not always more effective on our task, even with high token budgets. Recent approaches for creative prompting give some but limited additional improvement. CREATE provides a sandbox for developing new methods to improve models' capacity for associative creativity.
Executive Summary
This article introduces CREATE, a benchmark for evaluating models' capacity for creative associative reasoning. Models must generate sets of paths that connect concepts using only their parametric knowledge. Each path should be specific (the connection is distinctive and close) and diverse (dissimilar from the other paths in the set), and larger sets of strong, diverse paths score higher. The strongest frontier models achieve higher creative utility than others, but the high multiplicity of valid answers and the complexity of the search make the benchmark difficult to saturate. Thinking models are not always more effective, even with high token budgets, and recent creative-prompting approaches yield only limited additional improvement. The authors position CREATE as a sandbox for developing new methods to improve associative creativity.
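The scoring idea summarized above (reward a larger set of paths that are each specific and mutually diverse) can be illustrated with a small sketch. This is not the paper's metric: the abstract does not give the actual formula, so the `Path` class, the `specificity` field, the Jaccard-based distance, the greedy selection, and both thresholds are all assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One candidate answer: a chain of concepts from a source to a target."""
    nodes: tuple[str, ...]
    specificity: float  # hypothetical grader score in [0, 1]


def jaccard_distance(a: Path, b: Path) -> float:
    """Dissimilarity of two paths, measured on their intermediate concepts."""
    inner_a, inner_b = set(a.nodes[1:-1]), set(b.nodes[1:-1])
    union = inner_a | inner_b
    if not union:
        return 0.0  # two direct links are treated as the same path
    return 1.0 - len(inner_a & inner_b) / len(union)


def toy_utility(paths, min_specificity=0.5, min_distance=0.5):
    """Greedily keep strong paths that differ enough from those already kept.

    Returns the summed specificity of the kept paths and the kept paths
    themselves. Both thresholds are illustrative assumptions.
    """
    kept = []
    for p in sorted(paths, key=lambda p: p.specificity, reverse=True):
        if p.specificity < min_specificity:
            continue  # too weak to count
        if all(jaccard_distance(p, q) >= min_distance for q in kept):
            kept.append(p)  # strong and not redundant
    return sum(p.specificity for p in kept), kept


# Placeholder concepts "A" ... "B"; real items would name actual concepts.
candidates = [
    Path(("A", "x", "B"), 1.0),
    Path(("A", "x", "B"), 0.75),      # same route as above: redundant
    Path(("A", "y", "z", "B"), 0.5),  # different route: adds diversity
    Path(("A", "w", "B"), 0.25),      # below the specificity threshold
]
score, kept = toy_utility(candidates)
print(score, [p.nodes for p in kept])
```

Under these assumptions, a repeated route adds nothing to the score, while a weaker path through different intermediate concepts does. That mirrors why producing many distinct, strong paths is harder than producing one good one.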
Key Points
- ▸ CREATE is a benchmark designed to evaluate models' capacity for creative associative reasoning.
- ▸ The benchmark requires models to generate sets of paths connecting concepts in a model's parametric knowledge.
- ▸ The strongest models achieve higher creative utility than others, but the high multiplicity of answers and the complexity of the search make benchmark saturation difficult to achieve.
Merits
Strength
The CREATE benchmark offers a sizable test set with objective answer grading for creative associative reasoning, a capability that shares demands with real creativity tasks such as hypothesis generation.
Strength
The study exposes concrete limitations of current models: thinking models are not always more effective, even with high token budgets, and recent creative-prompting approaches give only limited improvement. This leaves clear room for new methods.
Demerits
Limitation
The benchmark may not accurately capture the complexity of real-world creativity tasks, which may involve multiple stages and iterative refinement.
Limitation
The study relies on a specific set of models and task configurations, which may not be representative of the broader range of models and tasks in the field.
Expert Commentary
This article contributes a benchmark for evaluating models' capacity for creative associative reasoning, with objective grading over a very large search space. Its findings are informative: stronger models score higher, yet the benchmark remains far from saturated, extended thinking does not reliably help, and creative prompting gives only limited gains. Two caveats apply. The benchmark may not capture the complexity of real-world creativity tasks, which can involve multiple stages and iterative refinement, and the results rest on a specific set of models and task configurations. Within those limits, CREATE is a useful tool for measuring associative creativity and for guiding the development of methods that improve it.
Recommendations
- ✓ Develop and test new methods for improving models' capacity for associative creativity using the CREATE benchmark.
- ✓ Explore whether the CREATE approach extends to other domains and tasks, such as other natural language processing tasks and computer vision.