Interpretable clustering via optimal multiway-split decision trees
arXiv:2602.13586v1 Announce Type: new Abstract: Clustering serves as a vital tool for uncovering latent data structures, and achieving both high accuracy and interpretability is essential. To this end, existing methods typically construct binary decision trees by solving mixed-integer nonlinear optimization problems, often leading to significant computational costs and suboptimal solutions. Furthermore, binary decision trees frequently result in excessively deep structures, which makes them difficult to interpret. To mitigate these issues, we propose an interpretable clustering method based on optimal multiway-split decision trees, formulated as a 0-1 integer linear optimization problem. This reformulation renders the optimization problem more tractable compared to existing models. A key feature of our method is the integration of a one-dimensional K-means algorithm for the discretization of continuous variables, allowing for flexible and data-driven branching. Extensive numerical exp
arXiv:2602.13586v1 Announce Type: new Abstract: Clustering serves as a vital tool for uncovering latent data structures, and achieving both high accuracy and interpretability is essential. To this end, existing methods typically construct binary decision trees by solving mixed-integer nonlinear optimization problems, often leading to significant computational costs and suboptimal solutions. Furthermore, binary decision trees frequently result in excessively deep structures, which makes them difficult to interpret. To mitigate these issues, we propose an interpretable clustering method based on optimal multiway-split decision trees, formulated as a 0-1 integer linear optimization problem. This reformulation renders the optimization problem more tractable compared to existing models. A key feature of our method is the integration of a one-dimensional K-means algorithm for the discretization of continuous variables, allowing for flexible and data-driven branching. Extensive numerical experiments on publicly available real-world datasets demonstrate that our method outperforms baseline methods in terms of clustering accuracy and interpretability. Our method yields multiway-split decision trees with concise decision rules while maintaining competitive performance across various evaluation metrics.