JANUS: Structured Bidirectional Generation for Guaranteed Constraints and Analytical Uncertainty

Taha Racicot

arXiv:2603.03748v1

Abstract: High-stakes synthetic data generation faces a fundamental Quadrilemma: achieving Fidelity to the original distribution, Control over complex logical constraints, Reliability in uncertainty estimation, and Efficiency in computational cost -- simultaneously. State-of-the-art Deep Generative Models (CTGAN, TabDDPM) excel at fidelity but rely on inefficient rejection sampling for continuous range constraints. Conversely, Structural Causal Models offer logical control but struggle with high-dimensional fidelity and complex noise inversion. We introduce JANUS (Joint Ancestral Network for Uncertainty and Synthesis), a framework that unifies these capabilities using a DAG of Bayesian Decision Trees. Our key innovation is Reverse-Topological Back-filling, an algorithm that propagates constraints backwards through the causal graph, achieving 100% constraint satisfaction on feasible constraint sets without rejection sampling. This is paired with an Analytical Uncertainty Decomposition derived from Dirichlet priors, enabling 128x faster uncertainty estimation than Monte Carlo methods. Across 15 datasets and 523 constrained scenarios, JANUS achieves state-of-the-art fidelity (Detection Score 0.497), eliminates mode collapse on imbalanced data, and provides exact handling of complex inter-column constraints (e.g., Salary_offered >= Salary_requested) where baselines fail entirely.

Executive Summary

JANUS is a structured bidirectional generation framework aimed at the Quadrilemma of high-stakes synthetic data generation: delivering fidelity, logical control, reliable uncertainty estimates, and computational efficiency at the same time. It models the data with a DAG of Bayesian Decision Trees and uses Reverse-Topological Back-filling to propagate constraints backwards through the graph, which yields 100% constraint satisfaction on feasible constraint sets without rejection sampling. An Analytical Uncertainty Decomposition derived from Dirichlet priors gives uncertainty estimates 128x faster than Monte Carlo methods. Across 15 datasets and 523 constrained scenarios, the authors report state-of-the-art fidelity (Detection Score 0.497), no mode collapse on imbalanced data, and exact handling of inter-column constraints such as Salary_offered >= Salary_requested, where baselines fail entirely.

Key Points

  • JANUS targets the Quadrilemma of high-stakes synthetic data generation: Fidelity, Control, Reliability, and Efficiency at once
  • The framework combines all four in a single model, a DAG of Bayesian Decision Trees
  • Reverse-Topological Back-filling achieves 100% constraint satisfaction on feasible constraint sets without rejection sampling
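
The abstract does not spell out the back-filling algorithm, so the following is only a minimal sketch of the general idea of propagating constraints backwards through a DAG. Everything in it is an assumption made for illustration: the three-node graph, the column names, the additive bounded-noise mechanisms, and the functions `backfill` and `sample` are hypothetical and stand in for the paper's Bayesian Decision Trees. A range constraint on a child is pushed onto its parent in reverse topological order, after which ordinary ancestral sampling inside the tightened intervals cannot violate it.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical toy model (NOT the paper's Bayesian Decision Trees): every
# non-root column equals its single parent plus bounded uniform noise.
PARENT = {"age": None, "experience": "age", "salary": "experience"}
NOISE = {"experience": (-25.0, -18.0), "salary": (20.0, 60.0)}  # (low, high)
ROOT_RANGE = {"age": (18.0, 70.0)}
GRAPH = {node: ([p] if p else []) for node, p in PARENT.items()}
ORDER = list(TopologicalSorter(GRAPH).static_order())  # parents first


def backfill(constraints):
    """Tighten each column's interval, visiting children before parents."""
    inf = float("inf")
    bounds = {node: constraints.get(node, (-inf, inf)) for node in PARENT}
    for root, (lo, hi) in ROOT_RANGE.items():
        bounds[root] = (max(bounds[root][0], lo), min(bounds[root][1], hi))
    for node in reversed(ORDER):  # reverse topological order
        parent = PARENT[node]
        if parent is None:
            continue
        lo, hi = bounds[node]
        n_lo, n_hi = NOISE[node]
        p_lo, p_hi = bounds[parent]
        # child = parent + noise can land in [lo, hi] only if the parent
        # lies in [lo - n_hi, hi - n_lo].
        bounds[parent] = (max(p_lo, lo - n_hi), min(p_hi, hi - n_lo))
    for node, (lo, hi) in bounds.items():
        if lo > hi:
            raise ValueError(f"infeasible constraint set at {node!r}")
    return bounds


def sample(bounds, rng):
    """Ancestral sampling inside the back-filled intervals; no rejection."""
    row = {}
    for node in ORDER:
        lo, hi = bounds[node]
        parent = PARENT[node]
        if parent is not None:
            n_lo, n_hi = NOISE[node]
            # Intersect the constraint with what the mechanism can reach.
            lo = max(lo, row[parent] + n_lo)
            hi = min(hi, row[parent] + n_hi)
        row[node] = rng.uniform(lo, hi)
    return row


if __name__ == "__main__":
    limits = backfill({"salary": (90.0, 100.0)})
    print(limits)
    print(sample(limits, random.Random(0)))
```

In this toy setting an infeasible request, such as a salary range that no age in the root range can reach, is detected during the backward pass instead of surfacing as an endless rejection loop.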

Merits

Unified framework

JANUS integrates fidelity, constraint control, uncertainty estimation, and efficiency in a single model instead of trading one off against another

Efficient constraint satisfaction

Reverse-Topological Back-filling enables 100% constraint satisfaction on feasible constraint sets without rejection sampling, which avoids the computational cost of generating and discarding invalid samples
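
To make the cost argument concrete, here is a small sketch comparing rejection sampling with direct constrained sampling for the abstract's example constraint, Salary_offered >= Salary_requested. The generating mechanism (a uniform requested salary plus a uniform offer delta) and both function names are assumptions chosen for illustration, not anything taken from the paper. With a delta drawn from [-30, 10], only a quarter of the draws satisfy the constraint, so rejection sampling needs about four draws per valid row, while restricting the delta to its feasible part [0, 10] needs exactly one.

```python
import random


def rejection_sample(rng, max_tries=10_000):
    """Redraw until Salary_offered >= Salary_requested; return (row, tries)."""
    for tries in range(1, max_tries + 1):
        requested = rng.uniform(40.0, 120.0)
        # Hypothetical mechanism: the offer differs from the request by a
        # uniform delta in [-30, 10].
        offered = requested + rng.uniform(-30.0, 10.0)
        if offered >= requested:
            return (requested, offered), tries
    raise RuntimeError("no valid sample found")


def constrained_sample(rng):
    """Draw the delta from its feasible part [0, 10] only; one draw suffices."""
    requested = rng.uniform(40.0, 120.0)
    offered = requested + rng.uniform(0.0, 10.0)
    return (requested, offered), 1


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2000
    mean_tries = sum(rejection_sample(rng)[1] for _ in range(n)) / n
    print(f"rejection sampling: {mean_tries:.2f} draws per valid row")
    print("constrained sampling: 1 draw per valid row")
```

Because the delta is uniform, conditioning it on being non-negative gives a uniform distribution on [0, 10], so in this toy case both samplers target the same constrained distribution and differ only in cost. The gap widens as the feasible region shrinks.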

Analytical uncertainty estimation

The Analytical Uncertainty Decomposition, derived from Dirichlet priors, is reported to give uncertainty estimates 128x faster than Monte Carlo methods
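
The abstract does not give the decomposition itself. As a hedged illustration of why a Dirichlet posterior allows closed-form uncertainty estimates, the sketch below uses the standard result for a Dirichlet distribution over class probabilities: total uncertainty is the entropy of the mean, aleatoric uncertainty is the expected entropy (available in closed form through the digamma function), and epistemic uncertainty is their difference. The function names and the example counts are assumptions, and JANUS's own formulation may differ. A Monte Carlo estimator is included only for comparison.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def decompose(alpha):
    """Closed-form (total, aleatoric, epistemic) in nats for Dirichlet(alpha)."""
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]
    # Total uncertainty: entropy of the posterior mean distribution.
    total = -sum(p * math.log(p) for p in mean)
    # Aleatoric uncertainty: expected entropy under the Dirichlet.
    aleatoric = digamma(a0 + 1.0) - sum(
        p * digamma(a + 1.0) for p, a in zip(mean, alpha)
    )
    # Epistemic uncertainty: the remainder (a mutual information).
    return total, aleatoric, total - aleatoric


def monte_carlo_aleatoric(alpha, n, rng):
    """Estimate the expected entropy by sampling from Dirichlet(alpha)."""
    acc = 0.0
    for _ in range(n):
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        s = sum(gammas)
        acc -= sum((g / s) * math.log(g / s) for g in gammas if g > 0.0)
    return acc / n


if __name__ == "__main__":
    alpha = [3.0, 2.0, 1.0]  # hypothetical pseudo-counts at one tree leaf
    print("closed form:", decompose(alpha))
    print("monte carlo:", monte_carlo_aleatoric(alpha, 20_000, random.Random(0)))
```

The closed form costs a handful of arithmetic operations per leaf, whereas the Monte Carlo estimate needs thousands of samples to reach comparable precision, which is the kind of gap the reported speedup refers to.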

Demerits

Complexity

The framework's reliance on a DAG of Bayesian Decision Trees may introduce additional complexity, requiring expertise in graph-based models

Scalability

The abstract does not report how JANUS performs on high-dimensional datasets or large constraint sets, so its scalability remains an open question

Expert Commentary

JANUS is a significant contribution to synthetic data generation because it tackles all four corners of the Quadrilemma in one framework. Reverse-Topological Back-filling replaces rejection sampling with backward constraint propagation, and the Analytical Uncertainty Decomposition replaces Monte Carlo estimation with a closed-form computation. The framework's complexity and unproven scalability may limit adoption, but guaranteed constraint satisfaction and fast uncertainty estimates are directly useful wherever synthetic data must obey hard business or regulatory rules. Future work should explore the intersection of causal inference, graph-based models, and synthetic data generation.

Recommendations

  • Further evaluation of JANUS on high-dimensional datasets and large constraint sets is necessary to assess its scalability and performance
  • Investigation of the framework's potential applications in emerging industries, such as autonomous vehicles and robotics, is warranted
