[1]

Applegate D, Bixby RE, Chvátal V, Cook WJ. 2019. Concorde TSP Solver. www.math.uwaterloo.ca/tsp/concorde.html (Accessed 16 May 2024)

[2]

Gurobi Optimization, LLC. 2022. Gurobi Optimizer Reference Manual. www.gurobi.com

[3]

Flood MM. 1956. The traveling-salesman problem. Operations Research 4(1):61−75

doi: 10.1287/opre.4.1.61
[4]

Rosenkrantz DJ, Stearns RE, Lewis PM. 1977. An analysis of several heuristics for the traveling salesman problem. SIAM Journal on Computing 6(3):563−581

doi: 10.1137/0206041
[5]

Mazyavkina N, Sviridov S, Ivanov S, Burnaev E. 2021. Reinforcement learning for combinatorial optimization: a survey. Computers & Operations Research 134:105400

doi: 10.1016/j.cor.2021.105400
[6]

Chen D, Imdahl C, Lai D, Van Woensel T. 2025. The dynamic traveling salesman problem with time-dependent and stochastic travel times: a deep reinforcement learning approach. Transportation Research Part C: Emerging Technologies 172:105022

doi: 10.1016/j.trc.2025.105022
[7]

Lähdeaho O, Hilmola OP. 2024. An exploration of quantitative models and algorithms for vehicle routing optimization and traveling salesman problems. Supply Chain Analytics 5:100056

doi: 10.1016/j.sca.2023.100056
[8]

Li J, Ma Y, Gao R, Cao Z, Lim A, et al. 2022. Deep reinforcement learning for solving the heterogeneous capacitated vehicle routing problem. IEEE Transactions on Cybernetics 52(12):13572−13585

doi: 10.1109/TCYB.2021.3111082
[9]

Zhang R, Prokhorchuk A, Dauwels J. 2020. Deep reinforcement learning for traveling salesman problem with time windows and rejections. 2020 International Joint Conference on Neural Networks (IJCNN). July 19−24, 2020. Glasgow, United Kingdom. USA: IEEE. pp. 1−8 doi: 10.1109/IJCNN48605.2020.9207026

[10]

Zhang R, Zhang C, Cao Z, Song W, Tan PS, et al. 2023. Learning to solve multiple-TSP with time window and rejections via deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems 24(1):1325−1336

doi: 10.1109/TITS.2022.3207011
[11]

Golden BL, Levy L, Vohra R. 1987. The orienteering problem. Naval Research Logistics 34(3):307−318

doi: 10.1002/1520-6750(198706)34:3<307::AID-NAV3220340302>3.0.CO;2-D
[12]

Tsiligirides T. 1984. Heuristic methods applied to orienteering. Journal of the Operational Research Society 35(9):797−809

doi: 10.1057/jors.1984.162
[13]

Kobeaga G, Merino M, Lozano JA. 2020. A revisited branch-and-cut algorithm for large-scale orienteering problems. arXiv 2011.02743

doi: 10.48550/arXiv.2011.02743
[14]

Kobeaga G, Merino M, Lozano JA. 2018. An efficient evolutionary algorithm for the orienteering problem. Computers & Operations Research 90:42−59

doi: 10.1016/j.cor.2017.09.003
[15]

Bellman R. 1957. A Markovian decision process. Indiana University Mathematics Journal 6(4):679−684

doi: 10.1512/iumj.1957.6.56038
[16]

Karp RM. 1977. Probabilistic analysis of partitioning algorithms for the traveling-salesman problem in the plane. Mathematics of Operations Research 2(3):209−224

doi: 10.1287/moor.2.3.209
[17]

Traub V, Vygen J. 2024. Approximation Algorithms for Traveling Salesman Problems. Cambridge, UK: Cambridge University Press. doi: 10.1017/9781009445436

[18]

Strutz T. 2021. Travelling Santa problem: optimization of a million-households tour within one hour. Frontiers in Robotics and AI 8:652417

doi: 10.3389/frobt.2021.652417
[19]

Valenzuela CL, Jones AJ. 1993. Evolutionary divide and conquer (I): a novel genetic approach to the TSP. Evolutionary Computation 1(4):313−333

doi: 10.1162/evco.1993.1.4.313
[20]

Liao E, Liu C. 2018. A hierarchical algorithm based on density peaks clustering and ant colony optimization for traveling salesman problem. IEEE Access 6:38921−38933

doi: 10.1109/ACCESS.2018.2853129
[21]

Mariescu-Istodor R, Fränti P. 2021. Solving the large-scale TSP problem in 1 h: Santa Claus challenge 2020. Frontiers in Robotics and AI 8:689908

doi: 10.3389/frobt.2021.689908
[22]

Alanzi E, El Bachir Menai M. 2025. Solving the traveling salesman problem with machine learning: a review of recent advances and challenges. Artificial Intelligence Review 58(9):267

doi: 10.1007/s10462-025-11267-x
[23]

Bengio Y, Lodi A, Prouvost A. 2021. Machine learning for combinatorial optimization: a methodological tour d'horizon. European Journal of Operational Research 290(2):405−421

doi: 10.1016/j.ejor.2020.07.063
[24]

Deudon M, Cournut P, Lacoste A, Adulyasak Y, Rousseau LM. 2018. Learning heuristics for the TSP by policy gradient. In Integration of Constraint Programming, Artificial Intelligence, and Operations Research, ed. van Hoeve WJ. Cham: Springer. pp. 170−181 doi: 10.1007/978-3-319-93031-2_12

[25]

Vinyals O, Fortunato M, Jaitly N. 2015. Pointer networks. Advances in Neural Information Processing Systems 28 (NIPS 2015). pp. 1−9 https://proceedings.neurips.cc/paper_files/paper/2015/hash/29921001f2f04bd3baee84a12e98098f-Abstract.html

[26]

Bello I, Pham H, Le QV, Norouzi M, Bengio S. 2016. Neural combinatorial optimization with reinforcement learning. arXiv 1611.09940

doi: 10.48550/arXiv.1611.09940
[27]

Kool W, van Hoof H, Welling M. 2018. Attention, learn to solve routing problems! arXiv 1803.08475

doi: 10.48550/arXiv.1803.08475
[28]

Wang J, Xiao C, Wang S, Ruan Y. 2023. Reinforcement learning for the traveling salesman problem: performance comparison of three algorithms. The Journal of Engineering 2023(9):e12303

doi: 10.1049/tje2.12303
[29]

Bresson X, Laurent T. 2021. The transformer network for the traveling salesman problem. arXiv 2103.03012

doi: 10.48550/arXiv.2103.03012
[30]

Dai H, Khalil EB, Zhang Y, Dilkina B, Song L. 2017. Learning combinatorial optimization algorithms over graphs. arXiv 1704.01665

doi: 10.48550/arXiv.1704.01665
[31]

Joshi CK, Laurent T, Bresson X. 2019. An efficient graph convolutional network technique for the travelling salesman problem. arXiv 1906.01227

doi: 10.48550/arXiv.1906.01227
[32]

BinJubier MB, Ismail MA, Tusher EH, Aljanabi M. 2024. A GPU accelerated parallel genetic algorithm for the traveling salesman problem. Journal of Soft Computing and Data Mining 5(2):137−150

doi: 10.30880/jscdm.2024.05.02.010
[33]

Ruan Y, Cai W, Wang J. 2024. Combining reinforcement learning algorithm and genetic algorithm to solve the traveling salesman problem. The Journal of Engineering 2024(6):e12393

doi: 10.1049/tje2.12393
[34]

Watkins CJCH, Dayan P. 1992. Q-learning. Machine Learning 8(3):279−292

doi: 10.1007/BF00992698
[35]

Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, et al. 2015. Human-level control through deep reinforcement learning. Nature 518:529−533

doi: 10.1038/nature14236
[36]

Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, et al. 2013. Playing atari with deep reinforcement learning. arXiv 1312.5602

doi: 10.48550/arXiv.1312.5602
[37]

Van Hasselt H, Guez A, Silver D. 2016. Deep reinforcement learning with double Q-learning. Proceedings of the AAAI Conference on Artificial Intelligence 30(1):2094−2100

doi: 10.1609/aaai.v30i1.10295
[38]

van Hasselt H. 2010. Double Q-learning. Advances in Neural Information Processing Systems 23 (NIPS 2010). pp. 1−9 https://proceedings.neurips.cc/paper/2010/hash/091d584fced301b442654dd8c23b3fc9-Abstract.html

[39]

Schulman J, Moritz P, Levine S, Jordan M, Abbeel P. 2015. High-dimensional continuous control using generalized advantage estimation. arXiv 1506.02438

doi: 10.48550/arXiv.1506.02438
[40]

Sutton RS, McAllester D, Singh S, Mansour Y. 1999. Policy gradient methods for reinforcement learning with function approximation. Proceedings of the 12th International Conference on Neural Information Processing Systems, 29 November 1999, Denver, CO. MIT Press. pp. 1057−1063 https://proceedings.neurips.cc/paper_files/paper/1999/hash/464d828b85b0bed98e80ade0a5c43b0f-Abstract.html

[41]

Weng L. 2020. Exploration strategies in deep reinforcement learning. https://lilianweng.github.io/posts/2020-06-07-exploration-drl/ (Accessed 16 May 2024)

[42]

Achiam J. 2018. Spinning up in deep reinforcement learning. https://spinningup.openai.com/en/latest/algorithms/sac.html (Accessed 16 May 2024)

[43]

Williams RJ. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3):229−256

doi: 10.1007/BF00992696
[44]

Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, et al. 2016. Asynchronous methods for deep reinforcement learning. International Conference on Machine Learning, 20–22 June 2016, New York, USA. vol. 48. PMLR. pp. 1928–1937 https://proceedings.mlr.press/v48/mniha16.html

[45]

Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences of the United States of America 114(13):3521−3526

doi: 10.1073/pnas.1611835114
[46]

Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. 2017. Proximal policy optimization algorithms. arXiv 1707.06347

doi: 10.48550/arXiv.1707.06347
[47]

Haarnoja T, Zhou A, Abbeel P, Levine S. 2018. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proceedings of the 35th International Conference on Machine Learning, 10–15 July 2018, Stockholmsmässan, Stockholm, Sweden. vol. 80. PMLR. pp. 1861–1870 https://proceedings.mlr.press/v80/haarnoja18b.html

[48]

Haarnoja T, Zhou A, Hartikainen K, Tucker G, Ha S, et al. 2018. Soft actor-critic algorithms and applications. arXiv 1812.05905

doi: 10.48550/arXiv.1812.05905
[49]

Duan J, Wang W, Xiao L, Gao J, Li SE, et al. 2025. Distributional soft actor-critic with three refinements. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(5):3935−3946

doi: 10.1109/TPAMI.2025.3537087
[50]

Bahdanau D, Cho K, Bengio Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv 1409.0473

doi: 10.48550/arXiv.1409.0473
[51]

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, et al. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (NIPS 2017). pp. 5998–6008 https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

[52]

Hochreiter S, Schmidhuber J. 1997. Long short-term memory. Neural Computation 9(8):1735−1780

doi: 10.1162/neco.1997.9.8.1735
[53]

Alammar J. 2018. The illustrated transformer. https://jalammar.github.io/illustrated-transformer (Accessed 16 May 2024)

[54]

Nazari M, Oroojlooy A, Snyder LV, Takáč M. 2018. Reinforcement learning for solving the vehicle routing problem. arXiv 1802.04240

doi: 10.48550/arXiv.1802.04240
[55]

Weng J, Chen H, Yan D, You K, Duburcq A, et al. 2021. Tianshou: a highly modularized deep reinforcement learning library. arXiv 2107.14171

doi: 10.48550/arXiv.2107.14171
[56]

Pinto L, Davidson J, Sukthankar R, Gupta A. 2017. Robust adversarial reinforcement learning. Proceedings of the 34th International Conference on Machine Learning. August 6−11, 2017, Sydney, NSW, Australia. vol. 70. PMLR. pp. 2817−2826 https://proceedings.mlr.press/v70/pinto17a.html

[57]

Liessner R, Schmitt J, Dietermann A, Bäker B. 2019. Hyperparameter optimization for deep reinforcement learning in vehicle energy management. Proceedings of the 11th International Conference on Agents and Artificial Intelligence. February 19−21, 2019. Prague, Czech Republic. Portugal: SciTePress. pp. 134−144 doi: 10.5220/0007364701340144