[1] Patil A. 2025. Advancing reasoning in large language models: promising methods and approaches.
[2] Moveworks. 2015. What is multi-hop reasoning. www.moveworks.com/us/en/resources/ai-terms-glossary/multi-hop-reasoning
[3] Abnave. 2025. OpenAI's deep research: a leap towards AGI. https://medium.com/@pratikabnave97/openais-deep-research-a-leap-towards-agi-e05339823715
[4] Zhang Z, Lin P, Wang Z, Zhang Y, Xu JQZ. 2024. Initialization is critical to whether transformers fit composite functions by reasoning or memorizing.
[5] De Asis K, Hernandez-Garcia J, Holland G, Sutton R. 2018. Multi-step reinforcement learning: a unifying algorithm. Proceedings of the AAAI Conference on Artificial Intelligence, 2−7 February 2018, New Orleans, Louisiana, USA, Vol. 32. Palo Alto, CA, USA: AAAI Press. doi: 10.1609/aaai.v32i1.11631
[6] DeepSeek-AI, Liu A, Feng B, Wang B, Wang B, et al. 2024. DeepSeek-V2: a strong, economical, and efficient mixture-of-experts language model. CoRR 2024. https://openreview.net/forum?id=MwHAn6R7OS
[7] Ye Y, Zhang T, Jiang W, Huang H. 2025. Process-supervised reinforcement learning for code generation.
[8] Kim S, Kim S. 2024. System-2 reasoning via generality and adaptation. The First Workshop on System-2 Reasoning at Scale, NeurIPS'24: Sys2-Reasoning. https://openreview.net/group?id=NeurIPS.cc/2024/Workshop/Sys2-Reasoning#tab-accept-poster
[9] Bereska L, Gavves E. 2024. Mechanistic interpretability for AI safety − a review.
[10] Agarwal P, Rahman AA, St-Charles P-L, Prince SJ, Kahou SE. 2023. Transformers in reinforcement learning: a survey.
[11] Li W, Luo H, Lin Z, Zhang C, Lu Z, Ye D. 2023. A survey on transformers in reinforcement learning.
[12] Esslinger K, Platt R, Amato C. 2022. Deep transformer Q-networks for partially observable reinforcement learning.
[13] Barto AG, Mahadevan S. 2003. Recent advances in hierarchical reinforcement learning.
[14] Chen C, Wu YF, Yoon J, Ahn S. 2022. TransDreamer: reinforcement learning with transformer world models.
[15] Ganea O, Bécigneul G, Hofmann T. 2018. Hyperbolic entailment cones for learning hierarchical embeddings. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10−15 July 2018. PMLR. pp. 1646−55 https://proceedings.mlr.press/v80/ganea18a.html
[16] Ye J, Yao Z, Huang Z, Pan L, Liu J, et al. 2025. How does transformer learn implicit reasoning?
[17] Wang Z, Wang Y, Zhang Z, Zhou Z, Jin H, et al. 2024. Understanding the language model to solve the symbolic multi-step reasoning problem from the perspective of buffer mechanism.
[18] Liu G, Ji K, Zheng R, Wu Z, Dun C, et al. 2024. Enhancing multi-step reasoning abilities of language models through direct Q-function optimization.
[19] Yang M, Verma H, Zhang DC, Liu J, King I, et al. 2024. Hypformer: exploring efficient transformer fully in hyperbolic space. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 25−29 August 2024, Barcelona, Spain. USA: ACM. pp. 3770−81 doi: 10.1145/3637528.3672039
[20] Khrulkov V, Mirvakhabova L, Ustinova E, Oseledets I, Lempitsky V. 2020. Hyperbolic image embeddings. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13−19 June 2020, Seattle, WA, USA. USA: IEEE. pp. 6417−27 doi: 10.1109/cvpr42600.2020.00645
[21] Tifrea A, Bécigneul G, Ganea O-E. 2018. Poincaré GloVe: hyperbolic word embeddings.
[22] Nickel M, Kiela D. 2018. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10−15 July 2018. PMLR. pp. 3779−88 https://proceedings.mlr.press/v80/nickel18a.html
[23] Meng F, Yao Z, Zhang M. 2025. TransMLA: multi-head latent attention is all you need.
[24] Su J, Ahmed M, Lu Y, Pan S, Bo W, et al. 2024. RoFormer: enhanced transformer with rotary position embedding.
[25] Wembo. 2025. DeepSeekMoE: bridging efficiency and capacity in large language models using DeepSeek model from China. https://levelup.gitconnected.com/deepseekmoe-bridging-efficiency-and-capacity-in-large-language-models-using-deepseek-model-from-dbd4e852a637
[26] Grootendorst M. 2024. A visual guide to mixture of experts (MoE). https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mixture-of-experts