[1]

Patil A. 2025. Advancing reasoning in large language models: promising methods and approaches. arXiv 2502.03671v2

doi: 10.48550/arXiv.2502.03671
[2]

Moveworks. 2025. What is multi-hop reasoning? www.moveworks.com/us/en/resources/ai-terms-glossary/multi-hop-reasoning

[3]

Abnave P. 2025. OpenAI’s deep research: a leap towards AGI. https://medium.com/@pratikabnave97/openais-deep-research-a-leap-towards-agi-e05339823715

[4]

Zhang Z, Lin P, Wang Z, Zhang Y, Xu ZQJ. 2024. Initialization is critical to whether transformers fit composite functions by reasoning or memorizing. arXiv 2405.05409v5

doi: 10.48550/arXiv.2405.05409
[5]

De Asis K, Hernandez-Garcia J, Holland G, Sutton R. 2018. Multi-step reinforcement learning: a unifying algorithm. Proceedings of the AAAI Conference on Artificial Intelligence, 2−7 February 2018, New Orleans, Louisiana, USA, Vol. 32. Palo Alto, CA, USA: AAAI Press. doi: 10.1609/aaai.v32i1.11631

[6]

DeepSeek-AI, Liu A, Feng B, Wang B, Wang B, et al. 2024. DeepSeek-V2: a strong, economical, and efficient mixture-of-experts language model. CoRR 2024. https://openreview.net/forum?id=MwHAn6R7OS

[7]

Ye Y, Zhang T, Jiang W, Huang H. 2025. Process-supervised reinforcement learning for code generation. arXiv 2502.01715v1

doi: 10.48550/arXiv.2502.01715
[8]

Kim S, Kim S. 2024. System-2 reasoning via generality and adaptation. The First Workshop on System-2 Reasoning at Scale, NeurIPS'24: Sys2-Reasoning. https://openreview.net/group?id=NeurIPS.cc/2024/Workshop/Sys2-Reasoning#tab-accept-poster

[9]

Bereska L, Gavves E. 2024. Mechanistic interpretability for AI safety − a review. arXiv 2404.14082v3

doi: 10.48550/arXiv.2404.14082
[10]

Agarwal P, Rahman AA, St-Charles P-L, Prince SJ, Kahou SE. 2023. Transformers in reinforcement learning: a survey. arXiv 2307.05979v1

doi: 10.48550/arXiv.2307.05979
[11]

Li W, Luo H, Lin Z, Zhang C, Lu Z, Ye D. 2023. A survey on transformers in reinforcement learning. arXiv 2301.03044v3

doi: 10.48550/arXiv.2301.03044
[12]

Esslinger K, Platt R, Amato C. 2022. Deep transformer Q-networks for partially observable reinforcement learning. arXiv 2206.01078

doi: 10.48550/arXiv.2206.01078
[13]

Barto AG, Mahadevan S. 2003. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13:341−79

doi: 10.1023/A:1025696116075
[14]

Chen C, Wu YF, Yoon J, Ahn S. 2022. TransDreamer: reinforcement learning with transformer world models. arXiv 2202.09481v2

doi: 10.48550/arXiv.2202.09481
[15]

Ganea O, Bécigneul G, Hofmann T. 2018. Hyperbolic entailment cones for learning hierarchical embeddings. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10−15 July 2018. PMLR. pp. 1646−55 https://proceedings.mlr.press/v80/ganea18a.html

[16]

Ye J, Yao Z, Huang Z, Pan L, Liu J, et al. 2025. How does transformer learn implicit reasoning? arXiv 2505.23653v1

doi: 10.48550/arXiv.2505.23653
[17]

Wang Z, Wang Y, Zhang Z, Zhou Z, Jin H, et al. 2024. Understanding the language model to solve the symbolic multi-step reasoning problem from the perspective of buffer mechanism. arXiv 2405.15302v3

doi: 10.48550/arXiv.2405.15302
[18]

Liu G, Ji K, Zheng R, Wu Z, Dun C, et al. 2024. Enhancing multi-step reasoning abilities of language models through direct Q-function optimization. arXiv 2410.09302v2

doi: 10.48550/arXiv.2410.09302
[19]

Yang M, Verma H, Zhang DC, Liu J, King I, et al. 2024. Hypformer: exploring efficient transformer fully in hyperbolic space. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 25−29 August 2024, Barcelona, Spain. USA: ACM. pp. 3770−81 doi: 10.1145/3637528.3672039

[20]

Khrulkov V, Mirvakhabova L, Ustinova E, Oseledets I, Lempitsky V. 2020. Hyperbolic image embeddings. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 13−19, 2020, Seattle, WA, USA. USA: IEEE. pp. 6417−27 doi: 10.1109/cvpr42600.2020.00645

[21]

Tifrea A, Bécigneul G, Ganea O-E. 2018. Poincaré GloVe: hyperbolic word embeddings. arXiv 1810.06546v2

doi: 10.48550/arXiv.1810.06546
[22]

Nickel M, Kiela D. 2018. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10−15 July 2018. PMLR. pp. 3779−88 https://proceedings.mlr.press/v80/nickel18a.html

[23]

Meng F, Yao Z, Zhang M. 2025. TransMLA: multi-head latent attention is all you need. arXiv 2502.07864v5

doi: 10.48550/arXiv.2502.07864
[24]

Su J, Ahmed M, Lu Y, Pan S, Bo W, et al. 2024. RoFormer: enhanced transformer with rotary position embedding. Neurocomputing 568:127063

doi: 10.1016/j.neucom.2023.127063
[25]

Wembo. 2025. DeepSeekMoE: bridging efficiency and capacity in large language models using DeepSeek model from China. https://levelup.gitconnected.com/deepseekmoe-bridging-efficiency-and-capacity-in-large-language-models-using-deepseek-model-from-dbd4e852a637

[26]

Grootendorst M. 2024. A visual guide to mixture of experts (MoE). https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts