[1] Applegate D, Bixby RE, Chvátal V, Cook WJ. 2019. Concorde TSP Solver. www.math.uwaterloo.ca/tsp/concorde.html (Accessed 16 May 2024)
[2] Gurobi Optimization, LLC. 2022. Gurobi Optimizer Reference Manual. www.gurobi.com
[3] Flood MM. 1956. The traveling-salesman problem.
[4] Rosenkrantz DJ, Stearns RE, Lewis PM. 1977. An analysis of several heuristics for the traveling salesman problem.
[5] Mazyavkina N, Sviridov S, Ivanov S, Burnaev E. 2021. Reinforcement learning for combinatorial optimization: a survey.
[6] Chen D, Imdahl C, Lai D, Van Woensel T. 2025. The dynamic traveling salesman problem with time-dependent and stochastic travel times: a deep reinforcement learning approach.
[7] Lähdeaho O, Hilmola OP. 2024. An exploration of quantitative models and algorithms for vehicle routing optimization and traveling salesman problems.
[8] Li J, Ma Y, Gao R, Cao Z, Lim A, et al. 2022. Deep reinforcement learning for solving the heterogeneous capacitated vehicle routing problem.
[9] Zhang R, Prokhorchuk A, Dauwels J. 2020. Deep reinforcement learning for traveling salesman problem with time windows and rejections. 2020 International Joint Conference on Neural Networks (IJCNN). July 19−24, 2020. Glasgow, United Kingdom. USA: IEEE. pp. 1−8 doi: 10.1109/ijcnn48605.2020.9207026
[10] Zhang R, Zhang C, Cao Z, Song W, Tan PS, et al. 2023. Learning to solve multiple-TSP with time window and rejections via deep reinforcement learning.
[11] Golden BL, Levy L, Vohra R. 1987. The orienteering problem.
[12] Tsiligirides T. 1984. Heuristic methods applied to orienteering.
[13] Kobeaga G, Merino M, Lozano JA. 2020. A revisited branch-and-cut algorithm for large-scale orienteering problems.
[14] Kobeaga G, Merino M, Lozano JA. 2018. An efficient evolutionary algorithm for the orienteering problem.
[15] Bellman R. 1957. A Markovian decision process.
[16] Karp RM. 1977. Probabilistic analysis of partitioning algorithms for the traveling-salesman problem in the plane.
[17] Traub V, Vygen J. 2024. Approximation Algorithms for Traveling Salesman Problems. Cambridge, UK: Cambridge University Press. doi: 10.1017/9781009445436
[18] Strutz T. 2021. Travelling Santa problem: optimization of a million-households tour within one hour.
[19] Valenzuela CL, Jones AJ. 1993. Evolutionary divide and conquer (I): a novel genetic approach to the TSP.
[20] Liao E, Liu C. 2018. A hierarchical algorithm based on density peaks clustering and ant colony optimization for traveling salesman problem.
[21] Mariescu-Istodor R, Fränti P. 2021. Solving the large-scale TSP problem in 1 h: Santa Claus challenge 2020.
[22] Alanzi E, El Bachir Menai M. 2025. Solving the traveling salesman problem with machine learning: a review of recent advances and challenges.
[23] Bengio Y, Lodi A, Prouvost A. 2021. Machine learning for combinatorial optimization: a methodological tour d'horizon.
[24] Deudon M, Cournut P, Lacoste A, Adulyasak Y, Rousseau LM. 2018. Learning heuristics for the TSP by policy gradient. In Integration of Constraint Programming, Artificial Intelligence, and Operations Research, ed. van Hoeve WJ. Cham: Springer. pp. 170−181 doi: 10.1007/978-3-319-93031-2_12
[25] Vinyals O, Fortunato M, Jaitly N. 2015. Pointer networks. Advances in Neural Information Processing Systems 28 (NIPS 2015). pp. 1−9 https://proceedings.neurips.cc/paper_files/paper/2015/hash/29921001f2f04bd3baee84a12e98098f-Abstract.html
[26] Bello I, Pham H, Le QV, Norouzi M, Bengio S. 2016. Neural combinatorial optimization with reinforcement learning.
[27] Kool W, van Hoof H, Welling M. 2018. Attention, learn to solve routing problems!
[28] Wang J, Xiao C, Wang S, Ruan Y. 2023. Reinforcement learning for the traveling salesman problem: performance comparison of three algorithms.
[29] Bresson X, Laurent T. 2021. The transformer network for the traveling salesman problem.
[30] Dai H, Khalil EB, Zhang Y, Dilkina B, Song L. 2017. Learning combinatorial optimization algorithms over graphs.
[31] Joshi CK, Laurent T, Bresson X. 2019. An efficient graph convolutional network technique for the travelling salesman problem.
[32] BinJubier MB, Ismail MA, Tusher EH, Aljanabi M. 2024. A GPU accelerated parallel genetic algorithm for the traveling salesman problem.
[33] Ruan Y, Cai W, Wang J. 2024. Combining reinforcement learning algorithm and genetic algorithm to solve the traveling salesman problem.
[34] Watkins CJCH, Dayan P. 1992. Q-learning.
[35] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, et al. 2015. Human-level control through deep reinforcement learning.
[36] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, et al. 2013. Playing Atari with deep reinforcement learning.
[37] Van Hasselt H, Guez A, Silver D. 2016. Deep reinforcement learning with double Q-learning.
[38] Van Hasselt H. 2010. Double Q-learning. Advances in Neural Information Processing Systems 23 (NIPS 2010). pp. 1−9 https://proceedings.neurips.cc/paper/2010/hash/091d584fced301b442654dd8c23b3fc9-Abstract.html
[39] Schulman J, Moritz P, Levine S, Jordan M, Abbeel P. 2015. High-dimensional continuous control using generalized advantage estimation.
[40] Sutton RS, McAllester D, Singh S, Mansour Y. 1999. Policy gradient methods for reinforcement learning with function approximation. Proceedings of the 13th International Conference on Neural Information Processing Systems, 29 November 1999, Denver, CO. ACM. pp. 1057−1063 https://proceedings.neurips.cc/paper_files/paper/1999/hash/464d828b85b0bed98e80ade0a5c43b0f-Abstract.html
[41] Weng L. 2020. Exploration strategies in deep reinforcement learning. https://lilianweng.github.io/posts/2020-06-07-exploration-drl/ (Accessed 16 May 2024)
[42] Achiam J. 2018. Spinning up in deep reinforcement learning. https://spinningup.openai.com/en/latest/algorithms/sac.html (Accessed 16 May 2024)
[43] Williams RJ. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning.
[44] Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, et al. 2016. Asynchronous methods for deep reinforcement learning. International Conference on Machine Learning, 20–22 June 2016, New York, USA. vol. 48. PMLR. pp. 1928–1937 https://proceedings.mlr.press/v48/mniha16.html
[45] Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, et al. 2017. Overcoming catastrophic forgetting in neural networks.
[46] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. 2017. Proximal policy optimization algorithms.
[47] Haarnoja T, Zhou A, Abbeel P, Levine S. 2018. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proceedings of the 35th International Conference on Machine Learning, 10–15 July 2018, Stockholmsmässan, Stockholm, Sweden. vol. 80. PMLR. pp. 1861–1870 https://proceedings.mlr.press/v80/haarnoja18b
[48] Haarnoja T, Zhou A, Hartikainen K, Tucker G, Ha S, et al. 2018. Soft actor-critic algorithms and applications.
[49] Duan J, Wang W, Xiao L, Gao J, Li SE, et al. 2025. Distributional soft actor-critic with three refinements.
[50] Bahdanau D, Cho K, Bengio Y. 2014. Neural machine translation by jointly learning to align and translate.
[51] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, et al. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. pp. 5998–6008 https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[52] Hochreiter S, Schmidhuber J. 1997. Long short-term memory.
[53] Alammar J. 2018. The illustrated transformer. https://jalammar.github.io/illustrated-transformer (Accessed 16 May 2024)
[54] Nazari M, Oroojlooy A, Snyder LV, Takáč M. 2018. Reinforcement learning for solving the vehicle routing problem.
[55] Weng J, Chen H, Yan D, You K, Duburcq A, et al. 2021. Tianshou: a highly modularized deep reinforcement learning library.
[56] Pinto L, Davidson J, Sukthankar R, Gupta A. 2017. Robust adversarial reinforcement learning. Proceedings of the 34th International Conference on Machine Learning. August 6−11, 2017, Sydney, NSW, Australia. vol. 70. PMLR. pp. 2817−2826 https://proceedings.mlr.press/v70/pinto17a.html
[57] Liessner R, Schmitt J, Dietermann A, Bäker B. 2019. Hyperparameter optimization for deep reinforcement learning in vehicle energy management. Proceedings of the 11th International Conference on Agents and Artificial Intelligence. February 19−21, 2019. Prague, Czech Republic. Portugal: SciTePress. pp. 134−144 doi: 10.5220/0007364701340144