LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets
arXiv:2602.13209v1 (cross-listed). Abstract: We introduce LemonadeBench v0.5, a minimal benchmark for evaluating economic intuition, long-term planning, and decision-making under uncertainty in large language …
Aidan Vyas