DesignSense: A Human Preference Dataset and Reward Modeling Framework for Graphic Layout Generation
arXiv:2602.23438v1 Announce Type: cross Abstract: Graphic layouts serve as an important and engaging medium for visual communication across different channels. While recent layout generation models have demonstrated impressive capabilities, they frequently fail to align with nuanced human aesthetic judgment. Existing preference datasets and reward models trained on text-to-image generation do not generalize to layout evaluation, where the spatial arrangement of identical elements determines quality. To address this critical gap, we introduce DesignSense-10k, a large-scale dataset of 10,235 human-annotated preference pairs for graphic layout evaluation. We propose a five-stage curation pipeline that generates visually coherent layout transformations across diverse aspect ratios, using semantic grouping, layout prediction, filtering, clustering, and VLM-based refinement to produce high-quality comparison pairs. Human preferences are annotated using a 4-class scheme (left, right, both good, both bad) to capture subjective ambiguity. Leveraging this dataset, we train DesignSense, a vision-language model-based classifier that substantially outperforms existing open-source and proprietary models across comprehensive evaluation metrics (54.6% improvement in Macro F1 over the strongest proprietary baseline). Our analysis shows that frontier VLMs remain unreliable overall and fail catastrophically on the full four-class task, underscoring the need for specialized, preference-aware models. Beyond the dataset, our reward model DesignSense yields tangible downstream gains in layout generation. Using our judge during RL-based training improves generator win rate by about 3%, while inference-time scaling, which involves generating multiple candidates and selecting the best one, provides a 3.6% improvement. These results highlight the practical impact of specialized, layout-aware preference modeling on real-world layout generation quality.
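The 4-class annotation scheme described in the abstract (left, right, both good, both bad) can be pictured as a simple data model. This is a minimal sketch with hypothetical names; the paper's actual schema may differ:

```python
from dataclasses import dataclass
from enum import Enum


class Preference(Enum):
    """Four-way label capturing subjective ambiguity between two layouts."""
    LEFT = "left"            # left layout is preferred
    RIGHT = "right"          # right layout is preferred
    BOTH_GOOD = "both_good"  # both layouts are acceptable
    BOTH_BAD = "both_bad"    # neither layout is acceptable


@dataclass
class PreferencePair:
    """One annotated comparison between two renderings of the same elements."""
    left_layout_id: str
    right_layout_id: str
    label: Preference


pair = PreferencePair("layout_a", "layout_b", Preference.BOTH_GOOD)
print(pair.label.value)  # both_good
```

Modeling "both good" and "both bad" explicitly, rather than forcing a binary winner, is what lets the dataset capture cases where the two arrangements are genuinely tied.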
Executive Summary
This article introduces DesignSense-10k, a large-scale dataset of 10,235 human-annotated preference pairs for graphic layout evaluation, and DesignSense, a vision-language model-based classifier that substantially outperforms existing open-source and proprietary models. Both are designed to address the failure of existing preference datasets and reward models, trained on text-to-image generation, to generalize to layout evaluation. DesignSense yields tangible downstream gains in layout generation: using it as a judge during RL-based training improves the generator's win rate by about 3%, and best-of-N inference-time scaling adds a further 3.6%. These results highlight the practical impact of specialized, layout-aware preference modeling, with the potential to make graphic layout generation models more accurate and effective across different channels.
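The inference-time scaling result above follows the best-of-N pattern: generate several candidate layouts, score each with the reward model, and keep the highest-scoring one. A minimal sketch, where the `score` function is a hypothetical stand-in for the DesignSense judge:

```python
def best_of_n(generate, score, n=8):
    """Sample n candidates from the generator and return the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


# Toy stand-ins: "layouts" are numbers drawn from a fixed pool, and the
# judge (hypothetical, standing in for DesignSense) rewards values near 0.5.
pool = iter([0.9, 0.1, 0.48, 0.7, 0.3, 0.55, 0.05, 0.99])
best = best_of_n(lambda: next(pool), lambda x: -abs(x - 0.5), n=8)
print(best)  # 0.48
```

The appeal of this pattern is that it improves output quality without retraining the generator; the trade-off is N times the generation cost plus N judge calls per final layout.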
Key Points
- ▸ DesignSense-10k is a large-scale dataset of human-annotated preference pairs for graphic layout evaluation.
- ▸ DesignSense is a vision-language model-based classifier that substantially outperforms existing models.
- ▸ The DesignSense dataset and reward model are designed to address the gap in existing preference datasets and reward models.
- ▸ The results show that DesignSense yields tangible downstream gains in layout generation.
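The headline evaluation metric in the abstract is Macro F1, which averages per-class F1 scores with equal weight, so a classifier cannot score well by only getting the majority class right. A minimal stdlib-only computation, illustrative rather than the paper's exact evaluation code:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes present in y_true."""
    f1_scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example over the paper's four preference classes.
gold = ["left", "right", "both_good", "both_bad", "left"]
pred = ["left", "right", "both_bad", "both_bad", "right"]
print(round(macro_f1(gold, pred), 3))  # 0.5
```

With four classes of very different frequencies (ties are presumably rarer than clear wins), macro averaging is what exposes the "catastrophic" failures of frontier VLMs on the full four-class task.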
Merits
Strength
The proposed framework addresses a critical gap: existing preference datasets and reward models, trained on text-to-image generation, do not generalize to layout evaluation, where the spatial arrangement of identical elements determines quality.
Strength
The DesignSense reward model outperforms existing open-source and proprietary models across comprehensive evaluation metrics, including a 54.6% Macro F1 improvement over the strongest proprietary baseline.
Strength
The study demonstrates the practical impact of specialized, layout-aware preference modeling on real-world layout generation quality.
Demerits
Limitation
The proposed framework may require significant computational resources and expertise to implement.
Limitation
The study focuses primarily on graphic layout generation and may not generalize to other visual design tasks.
Expert Commentary
The article makes a significant contribution to visual design research, particularly graphic layout generation, by filling a critical gap in preference datasets and reward models for layout evaluation. That said, the narrow focus on graphic layouts may limit generalizability to other visual design tasks, and implementing the framework may demand substantial computational resources and expertise. Nevertheless, the findings have important implications for developing more accurate and efficient visual design models.
Recommendations
- ✓ Future studies should investigate the application of the proposed framework to other visual design tasks, such as image and video editing.
- ✓ Researchers should explore the development of more efficient and scalable methods for implementing the proposed framework.