
Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem

arXiv:2602.23579v1. Abstract: The Multiple Traveling Salesman Problem (mTSP) extends the Traveling Salesman Problem to m tours that start and end at a common depot and jointly visit all customers exactly once. In the min-max variant, the objective is to minimize the longest tour, reflecting workload balance. We propose a hybrid approach, Construct, Merge, Solve & Adapt with Reinforcement Learning (RL-CMSA), for the symmetric single-depot min-max mTSP. The method iteratively constructs diverse solutions using probabilistic clustering guided by learned pairwise q-values, merges routes into a compact pool, solves a restricted set-covering MILP, and refines solutions via inter-route remove, shift, and swap moves. The q-values are updated by reinforcing city-pair co-occurrences in high-quality solutions, while the pool is adapted through ageing and pruning. This combination of exact optimization and reinforcement-guided construction balances exploration and exploitation. Computational results on random and TSPLIB instances show that RL-CMSA consistently finds (near-)best solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits, especially as instance size and the number of salesmen increase.
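To make the min-max objective in the abstract concrete, here is a minimal Python sketch. The function names, the Euclidean distance, and the toy coordinates are illustrative assumptions, not taken from the paper. It shows why minimizing the longest tour can prefer a different assignment than minimizing total distance.

```python
import math

def tour_length(depot, route, coords):
    """Length of the closed tour depot -> route[0] -> ... -> route[-1] -> depot."""
    stops = [depot] + list(route) + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))

def min_max_objective(depot, routes, coords):
    """Min-max mTSP objective: the length of the longest of the m tours."""
    return max(tour_length(depot, r, coords) for r in routes)

# Hypothetical toy instance: depot 0 and four customers (assumed Euclidean).
coords = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (3.0, 4.0), 3: (0.0, 4.0), 4: (-3.0, 0.0)}
balanced = [[1, 2], [3, 4]]   # two tours of equal length
lopsided = [[1, 2, 3], [4]]   # shorter in total, but one tour is longer

balanced_value = min_max_objective(0, balanced, coords)
lopsided_value = min_max_objective(0, lopsided, coords)
```

In this toy instance the lopsided assignment has the smaller total distance, yet the balanced one has the smaller longest tour, which is the quantity the min-max variant optimizes.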

Executive Summary

This article proposes Construct, Merge, Solve & Adapt with Reinforcement Learning (RL-CMSA), a hybrid approach to the symmetric single-depot min-max Multiple Traveling Salesman Problem (mTSP). The method pairs exact solution of a restricted set-covering MILP with solution construction guided by learned q-values, which the authors present as a way to balance exploration and exploitation. On random and TSPLIB instances, RL-CMSA consistently finds (near-)best solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits, especially as instance size and the number of salesmen increase. The approach is relevant to workload-balanced routing in logistics and supply chain management. Its behavior beyond the tested instances and its generalizability to other problem domains remain open questions.

Key Points

  • RL-CMSA combines exact optimization with reinforcement learning to solve the symmetric single-depot min-max mTSP
  • The method iteratively constructs diverse solutions, merges their routes into a compact pool, solves a restricted set-covering MILP, and refines solutions via inter-route remove, shift, and swap moves
  • RL-CMSA outperforms a state-of-the-art hybrid genetic algorithm on random and TSPLIB instances under comparable time limits
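The restricted set-covering MILP mentioned in the key points can be sketched as follows. This is one plausible formulation under stated assumptions, not the paper's exact model: R is the current route pool, C the set of customers, a_{ir} = 1 if route r visits customer i, c_r is the length of route r, x_r selects route r, and z bounds the longest selected route. All of these symbols are introduced here for illustration.

```latex
\begin{align*}
\min\quad & z \\
\text{s.t.}\quad & \sum_{r \in R} a_{ir}\, x_r \ge 1 && \forall i \in C \\
& \sum_{r \in R} x_r = m \\
& c_r\, x_r \le z && \forall r \in R \\
& x_r \in \{0,1\} && \forall r \in R
\end{align*}
```

Because a covering constraint allows a customer to appear in more than one selected route, a formulation of this kind would need a repair step to restore the "exactly once" requirement; how the paper handles this is not stated in the abstract.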

Merits

Strength in Balancing Exploration and Exploitation

The combination of exact optimization and reinforcement learning enables RL-CMSA to balance exploration and exploitation, leading to high-quality solutions.

Effective Solution Construction

The probabilistic clustering guided by learned pairwise q-values enables the construction of diverse solutions that are refined through inter-route moves.
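As a rough illustration of how pairwise q-values could be reinforced and then used to guide clustering, consider the Python sketch below. The update rule, the learning rate `alpha`, the initial value 0.5, and the function names `reinforce` and `pick_cluster` are all assumptions made for this example; the abstract states only that co-occurrences of city pairs in high-quality solutions are reinforced.

```python
import itertools
import random

def reinforce(q, routes, alpha=0.1):
    """Move q[i][j] toward 1 for every pair of cities that share a route.

    Assumed update rule (not from the paper): q <- q + alpha * (1 - q).
    """
    for route in routes:
        for i, j in itertools.combinations(route, 2):
            q[i][j] += alpha * (1.0 - q[i][j])
            q[j][i] = q[i][j]  # keep the matrix symmetric

def pick_cluster(city, clusters, q, rng):
    """Sample a cluster index with probability proportional to the mean
    q-value between `city` and the cluster's current members."""
    weights = []
    for members in clusters:
        if members:
            mean_q = sum(q[city][k] for k in members) / len(members)
        else:
            mean_q = 1.0  # neutral weight for an empty cluster (assumption)
        weights.append(mean_q + 1e-9)  # avoid an all-zero weight vector
    return rng.choices(range(len(clusters)), weights=weights, k=1)[0]

# Hypothetical usage: four cities, q-values initialised to 0.5.
n = 4
q = [[0.5] * n for _ in range(n)]
reinforce(q, routes=[[0, 1], [2, 3]], alpha=0.1)
choice = pick_cluster(0, clusters=[[1], [2]], q=q, rng=random.Random(0))
```

After one reinforcement step, cities 0 and 1 are slightly more likely to be clustered together than cities 0 and 2. The sampling stays probabilistic, which is what keeps the constructed solutions diverse.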

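Performance on Larger Instances

According to the abstract, RL-CMSA's advantage over the hybrid genetic algorithm is most pronounced as instance size and the number of salesmen increase. Generalizability to other problem domains is not evaluated, so its promise for broader logistics and supply chain management problems is still a hypothesis.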

Demerits

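Untested Beyond the Benchmark Instances

The evaluation covers random and TSPLIB instances of the symmetric single-depot min-max mTSP. Further research is needed to establish how RL-CMSA scales to larger instances and whether it transfers to other problem variants or domains.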

Dependence on Reinforcement Learning

The quality of the constructed solutions depends on the learned q-values, so a poorly tuned or slowly converging update could degrade overall performance. The abstract does not quantify the computational or tuning cost of this component.

Expert Commentary

The article contributes a hybrid approach to combinatorial optimization that couples exact solution of a restricted MILP with reinforcement-guided construction. The reported results show RL-CMSA finding high-quality solutions and outperforming a state-of-the-art hybrid genetic algorithm under comparable time limits. Two design choices stand out: learned pairwise q-values steer probabilistic clustering toward city pairs that co-occur in good solutions, and inter-route remove, shift, and swap moves refine the resulting solutions. Whether these gains hold at larger scale and in other problem domains is still to be established.

Recommendations

  • Further research should be conducted to explore the scalability of RL-CMSA to larger problem instances and more complex problem domains.
  • Practitioners in optimization and logistics should consider evaluating RL-CMSA on workload-balanced routing problems in logistics and supply chain management.
