Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem
arXiv:2602.23579v1. Abstract: The Multiple Traveling Salesman Problem (mTSP) extends the Traveling Salesman Problem to m tours that start and end at a common depot and jointly visit all customers exactly once. In the min-max variant, the objective is to minimize the longest tour, reflecting workload balance. We propose a hybrid approach, Construct, Merge, Solve & Adapt with Reinforcement Learning (RL-CMSA), for the symmetric single-depot min-max mTSP. The method iteratively constructs diverse solutions using probabilistic clustering guided by learned pairwise q-values, merges routes into a compact pool, solves a restricted set-covering MILP, and refines solutions via inter-route remove, shift, and swap moves. The q-values are updated by reinforcing city-pair co-occurrences in high-quality solutions, while the pool is adapted through ageing and pruning. This combination of exact optimization and reinforcement-guided construction balances exploration and exploitation. Computational results on random and TSPLIB instances show that RL-CMSA consistently finds (near-)best solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits, especially as instance size and the number of salesmen increase.
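To make the min-max objective concrete, the following minimal sketch evaluates a candidate solution by the length of its longest tour. The Euclidean distance, the function names, and the toy coordinates are assumptions for illustration; the abstract only states that the problem is symmetric and single-depot.

```python
import math

def tour_length(depot, route, coords):
    """Length of depot -> route[0] -> ... -> route[-1] -> depot (Euclidean, assumed)."""
    stops = [depot] + list(route) + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))

def min_max_objective(depot, routes, coords):
    """Min-max mTSP objective: the length of the longest of the m tours."""
    return max(tour_length(depot, r, coords) for r in routes)

# Hypothetical toy instance: depot 0 at the origin, four customers, m = 2.
coords = {0: (0, 0), 1: (3, 0), 2: (3, 4), 3: (0, 4), 4: (-3, 0)}
routes = [[1, 2, 3], [4]]
longest = min_max_objective(0, routes, coords)
```

Here the first tour has length 14 and the second has length 6, so the objective value is 14. A total-distance objective would score this solution at 20 and would not distinguish it from a more evenly balanced solution with the same total.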
Executive Summary
This article proposes a hybrid approach, Construct, Merge, Solve & Adapt with Reinforcement Learning (RL-CMSA), for the symmetric single-depot min-max Multiple Traveling Salesman Problem (mTSP). The method pairs exact optimization of a restricted set-covering MILP with reinforcement-guided solution construction to balance exploration and exploitation. Computational results on random and TSPLIB instances show that RL-CMSA finds (near-)best solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits, especially as instance size and the number of salesmen increase. The approach is relevant to workload-balanced routing problems in logistics and supply chain management. Further research is needed to establish how it scales beyond the tested instances and whether it generalizes to other problem domains.
Key Points
- RL-CMSA combines exact optimization with reinforcement learning to solve the symmetric single-depot min-max mTSP
- The method iteratively constructs diverse solutions, merges their routes into a compact pool, solves a restricted set-covering MILP over that pool, and refines solutions via inter-route remove, shift, and swap moves
- RL-CMSA outperforms a state-of-the-art hybrid genetic algorithm on random and TSPLIB instances under comparable time limits
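The "solve" step listed above can be illustrated with a small stand-in: given a pool of routes, pick m of them that together cover every customer while minimizing the longest selected route. This sketch enumerates combinations instead of calling a MILP solver, so it is only practical for tiny pools; the pool contents, route lengths, and the function name `solve_restricted` are hypothetical.

```python
from itertools import combinations

def solve_restricted(pool, customers, m):
    """Pick m pooled routes covering every customer, minimizing the longest one.

    pool: list of (route, length) pairs, where route is a tuple of customer ids.
    customers: set of all customer ids that must be covered.
    Brute force over combinations, used here as a stand-in for a MILP solver.
    """
    best, best_cost = None, float("inf")
    for combo in combinations(pool, m):
        covered = set().union(*(set(route) for route, _ in combo))
        if covered >= customers:  # set covering: overlaps are allowed
            cost = max(length for _, length in combo)
            if cost < best_cost:
                best, best_cost = combo, cost
    return best, best_cost

# Hypothetical pool with made-up route lengths.
pool = [((1, 2), 10.0), ((3, 4), 12.0), ((1, 2, 3), 15.0),
        ((4,), 6.0), ((2, 3, 4), 11.0)]
best, best_cost = solve_restricted(pool, {1, 2, 3, 4}, 2)
```

In this example the best selection is routes (1, 2) and (2, 3, 4), with a longest route of 11.0. Customer 2 appears in both, which a covering formulation permits; how such duplicates are repaired afterwards is not shown in this sketch.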
Merits
Strength in Balancing Exploration and Exploitation
The combination of exact optimization and reinforcement learning enables RL-CMSA to balance exploration and exploitation, leading to high-quality solutions.
Effective Solution Construction
The probabilistic clustering guided by learned pairwise q-values enables the construction of diverse solutions that are refined through inter-route moves.
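A minimal sketch of how q-value-guided clustering and its reinforcement might look is given below. The abstract does not give the sampling rule or the update formula, so both are assumptions here: a customer joins a cluster with probability proportional to its mean q-value with the current members, and each co-occurring pair in a good solution has its q-value moved toward 1 by a step `alpha`. All names are hypothetical.

```python
import random
from itertools import combinations

def build_clusters(customers, m, q, rng):
    """Assign each customer to one of m clusters, favouring clusters whose
    members have a high q-value with it (assumed proportional sampling)."""
    order = list(customers)
    rng.shuffle(order)
    clusters = [[] for _ in range(m)]
    for city in order:
        weights = []
        for members in clusters:
            if members:
                w = sum(q[frozenset((city, other))] for other in members) / len(members)
            else:
                w = 0.5  # assumed neutral affinity for an empty cluster
            weights.append(w + 1e-9)  # keep every cluster selectable
        chosen = rng.choices(range(m), weights=weights, k=1)[0]
        clusters[chosen].append(city)
    return clusters

def reinforce(q, routes, alpha=0.1):
    """Move q toward 1 for every pair of cities that share a route (assumed rule)."""
    for route in routes:
        for a, b in combinations(route, 2):
            key = frozenset((a, b))
            q[key] += alpha * (1.0 - q[key])

customers = [1, 2, 3, 4]
q = {frozenset(pair): 0.5 for pair in combinations(customers, 2)}
reinforce(q, [[1, 2], [3, 4]])          # pretend this was a high-quality solution
clusters = build_clusters(customers, 2, q, random.Random(0))
```

After the update, the pairs (1, 2) and (3, 4) have q-value 0.55 while all other pairs stay at 0.5, so later constructions are slightly more likely to group those cities together. Because assignment is sampled rather than greedy, repeated calls still produce different clusterings.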
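Favorable Scaling on Tested Instances
According to the reported results, RL-CMSA's advantage over the hybrid genetic algorithm grows as instance size and the number of salesmen increase. The abstract does not evaluate generalizability to other problem domains, so its promise for broader logistics and supply chain management problems remains to be demonstrated.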
Demerits
Scalability Untested Beyond the Reported Benchmarks
The evidence is limited to random and TSPLIB instances. Further research is needed to explore how RL-CMSA scales to larger problem instances and to more complex problem domains.
Dependence on Reinforcement Learning
The performance of RL-CMSA relies heavily on the effectiveness of the reinforcement learning component, which may require significant computational resources and expertise.
Expert Commentary
The article contributes a hybrid approach that combines exact optimization with reinforcement learning for the min-max mTSP. The reported results show that RL-CMSA finds high-quality solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits. The learned pairwise q-values steer construction toward city groupings seen in good solutions, while the restricted MILP and the inter-route remove, shift, and swap moves exploit the resulting route pool. Further research is needed to explore the scalability and generalizability of RL-CMSA before drawing conclusions for broader logistics and supply chain management problems.
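As an illustration of the inter-route moves mentioned in this commentary, the sketch below implements one plausible form of the shift move: take a customer out of the longest route and insert it at the best position in another route, accepting the move only if the longest tour gets shorter. The acceptance rule, the restriction to the longest route, and all names are assumptions; the abstract names the moves without specifying them.

```python
import math

def tour_length(depot, route, coords):
    """Length of depot -> route -> depot (Euclidean distance, assumed)."""
    stops = [depot] + list(route) + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))

def best_shift(depot, routes, coords):
    """Try moving one customer out of the longest route into another route.

    Returns the improved list of routes, or the input unchanged if no single
    shift lowers the longest tour length.
    """
    lengths = [tour_length(depot, r, coords) for r in routes]
    src = max(range(len(routes)), key=lengths.__getitem__)
    best_routes, best_cost = routes, lengths[src]
    for i, city in enumerate(routes[src]):
        reduced = routes[src][:i] + routes[src][i + 1:]
        for dst in range(len(routes)):
            if dst == src:
                continue
            for pos in range(len(routes[dst]) + 1):
                grown = routes[dst][:pos] + [city] + routes[dst][pos:]
                candidate = list(routes)
                candidate[src], candidate[dst] = reduced, grown
                cost = max(tour_length(depot, r, coords) for r in candidate)
                if cost < best_cost - 1e-12:  # strict improvement only
                    best_routes, best_cost = candidate, cost
    return best_routes

# Hypothetical toy instance: depot 0 at the origin, four customers, m = 2.
coords = {0: (0, 0), 1: (3, 0), 2: (3, 4), 3: (0, 4), 4: (-3, 0)}
before = [[1, 2, 3], [4]]
after = best_shift(0, before, coords)
```

On this toy instance the longest tour drops from 14 to 12 after a single shift. A swap move would exchange two customers between routes in the same evaluate-and-accept pattern.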
Recommendations
- Further research should be conducted to explore the scalability of RL-CMSA to larger problem instances and more complex problem domains.
- Experts in the field of optimization and logistics should consider applying RL-CMSA to optimize complex logistics and supply chain management problems.