Enhancing Diversity and Feasibility: Joint Population Synthesis from Multi-source Data Using Generative Models
arXiv:2602.15270v1. Abstract: Generating realistic synthetic populations is essential for agent-based models (ABM) in transportation and urban planning. Current methods face two major limitations. First, many rely on a single dataset or follow a sequential data fusion and generation process, which means they fail to capture the complex interplay between features. Second, these approaches struggle with sampling zeros (valid but unobserved attribute combinations) and structural zeros (infeasible combinations due to logical constraints), which reduce the diversity and feasibility of the generated data. This study proposes a novel method to simultaneously integrate and synthesize multi-source datasets using a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty. This joint learning method improves both the diversity and feasibility of synthetic data by defining a regularization term (inverse gradient penalty) for the generator loss function. For the evaluation, we implement a unified evaluation metric for similarity, and place special emphasis on measuring diversity and feasibility through recall, precision, and the F1 score. Results show that the proposed joint approach outperforms the sequential baseline, with recall increasing by 7% and precision by 15%. Additionally, the regularization term further improves diversity and feasibility, reflected in a 10% increase in recall and 1% in precision. We assess similarity distributions using a five-metric score. The joint approach performs better overall, and reaches a score of 88.1 compared to 84.6 for the sequential method. Since synthetic populations serve as a key input for ABM, this multi-source generative approach has the potential to significantly enhance the accuracy and reliability of ABM.
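For orientation, the standard WGAN with gradient penalty objective that the abstract builds on can be written as below. This is the textbook formulation, not necessarily the exact loss used in the paper. The symbols are assumptions introduced here: $D$ is the critic, $G$ the generator, $P_r$ and $P_g$ the real and generated distributions, and $\lambda$ the penalty weight. The abstract does not give the form of the inverse gradient penalty added to the generator loss, so that term is left unspecified.

```latex
% Critic loss: Wasserstein estimate plus gradient penalty on interpolated samples
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]

% Generator loss without any additional regularization
L_G = -\mathbb{E}_{z \sim p(z)}\big[D(G(z))\big]
```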
Executive Summary
This study proposes a method for generating realistic synthetic populations with a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty. Rather than fusing datasets first and generating afterwards, the joint learning approach integrates and synthesizes multiple source datasets simultaneously, and it adds a regularization term (an inverse gradient penalty) to the generator loss to improve the diversity and feasibility of the generated data. The joint approach outperforms a sequential baseline, raising recall by 7% and precision by 15%; the regularization term adds a further 10% in recall and 1% in precision. On a five-metric similarity score, the joint approach reaches 88.1 against 84.6 for the sequential method. Because synthetic populations are a key input for agent-based models (ABM) in transportation and urban planning, the approach has the potential to improve their accuracy and reliability.
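The recall and precision figures can be made concrete with a small sketch. It assumes one common way of scoring synthetic populations, which may differ from the paper's exact definitions: precision is the share of distinct generated attribute combinations that also occur in a reference population, and recall is the share of distinct reference combinations that the generator reproduces. The function name `combination_scores` and the toy records are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

Record = Tuple[Hashable, ...]  # one attribute combination, e.g. (age_group, licence)


def combination_scores(generated: Iterable[Record],
                       reference: Iterable[Record]) -> Dict[str, float]:
    """Score distinct attribute combinations (assumed definitions, see lead-in)."""
    gen = set(generated)   # distinct combinations produced by the generator
    ref = set(reference)   # distinct combinations present in the reference data
    if not gen or not ref:
        raise ValueError("both inputs must contain at least one record")
    overlap = gen & ref
    precision = len(overlap) / len(gen)  # share of generated combinations found in the reference
    recall = len(overlap) / len(ref)     # share of reference combinations reproduced
    f1 = 0.0 if not overlap else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data: hypothetical (age_group, licence) combinations.
reference = [("child", "no_licence"), ("adult", "licence"), ("adult", "no_licence")]
generated = [("adult", "licence"), ("child", "licence"), ("adult", "licence")]
scores = combination_scores(generated, reference)
print(scores)
```

In this toy case the generator produces two distinct combinations, one of which is in the reference set, so precision is one half and recall is one third.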
Key Points
- ▸ Joint population synthesis from multi-source data using generative models is proposed.
- ▸ The method integrates and synthesizes multiple datasets, improving diversity and feasibility.
- ▸ The joint WGAN with gradient penalty outperforms the sequential baseline (recall +7%, precision +15%), and the regularization term adds further gains (recall +10%, precision +1%).
Merits
Improved Diversity and Feasibility
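The regularization term (inverse gradient penalty) on the generator loss improves the diversity and feasibility of the synthetic data, measured as a 10% increase in recall and a 1% increase in precision. Better handling of sampling zeros and structural zeros could in turn make agent-based models (ABM) more accurate and reliable.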
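The distinction between sampling zeros and structural zeros can be illustrated with a short sketch. The attributes and the feasibility rule (nobody under 16 holds a driving licence) are hypothetical and chosen only for illustration; the paper's actual constraints are not stated in the abstract.

```python
from typing import Set, Tuple

Person = Tuple[int, bool]  # (age, holds_licence): hypothetical attributes


def is_feasible(person: Person) -> bool:
    """Hypothetical logical constraint: licence holders must be at least 16."""
    age, holds_licence = person
    return not (holds_licence and age < 16)


def classify(person: Person, observed: Set[Person]) -> str:
    """Label a generated record relative to the observed sample."""
    if person in observed:
        return "observed"
    # Unobserved but valid: the generator should be able to produce these.
    if is_feasible(person):
        return "sampling_zero"
    # Unobserved and logically impossible: the generator should avoid these.
    return "structural_zero"


observed = {(34, True), (10, False)}
candidates = [(34, True), (70, False), (10, True)]
labels = {p: classify(p, observed) for p in candidates}
print(labels)
```

A generator that only reproduces the observed records has low diversity, while one that emits structural zeros has low feasibility.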
Multi-Source Data Integration
The joint learning approach integrates and synthesizes multiple datasets, capturing the complex interplay between features.
Demerits
Data Quality and Availability
The performance of the proposed method may be limited by the quality and availability of the source datasets.
Computational Complexity
The WGAN with gradient penalty and regularization term may require significant computational resources and processing time.
Expert Commentary
The proposed method is a useful contribution to agent-based modeling (ABM) and to data integration and synthesis. Learning from multiple sources jointly, with a WGAN with gradient penalty and a regularization term on the generator loss, yields measurable gains over a sequential baseline in recall, precision, and overall similarity (88.1 against 84.6). However, the method's performance may be sensitive to the quality and availability of the source data, and its computational cost may be a limiting factor. For practitioners and policymakers who rely on ABM-based analyses, the results underline the value of checking the diversity and feasibility of the synthetic populations that feed those models.
Recommendations
- ✓ Future studies should investigate the application of the proposed method in various domains, including healthcare and finance.
- ✓ The authors should explore the use of other generative models, such as Variational Autoencoders (VAEs), to compare their performance with the proposed method.