Artificial intelligence in the discovery and modification of biological elements in medicinal plants

Jing Zhang; Yixin Yang; Jinping Si; Donghong Chen; Chuan Dong; Zhigang Han

doi:10.48130/mpb-0025-0010

2025 Volume 4

REVIEW Open Access

1. National Key Laboratory for Development and Utilization of Forest Food Resources, Zhejiang A&F University, Hangzhou 311300, China
2. School of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou 311300, China
^# Authors contributed equally: Jing Zhang, Yixin Yang

Corresponding authors: chuand@zafu.edu.cn; hanzg@zafu.edu.cn

Received: 17 December 2024
Revised: 13 March 2025
Accepted: 21 March 2025
Published online: 16 April 2025
Medicinal Plant Biology 4, Article number: e012 (2025)

Abstract

Active ingredients extracted from medicinal plants are important natural sources for pharmaceutical and industrial products. However, the low abundance and quality of these compounds have consistently posed a significant barrier to their full utilization. Discovering the enzymes in the relevant pathways, modifying enzyme activity, and optimizing pathways are key to solving this problem. The emergence of large-scale multi-omics data has provided new opportunities to accelerate this progress, but analyzing such massive amounts of biological data poses a major new challenge. Artificial intelligence (AI), a state-of-the-art approach to big data processing and pattern recognition, has already had a revolutionary impact on biological fields. Here, we summarize recent advances in pathway analysis and research ideas in AI-mediated discovery and modification of biological elements. Future directions and challenges for applying AI in precision medicinal plant breeding and in the biosynthesis of optimized, stable, and cost-effective natural products are also discussed.
- Artificial intelligence
- Medicinal plants
- Enzyme discovery
- Enzyme modification
Rights and permissions
Copyright: © 2025 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Zhao Q, Li M, Zhang M, Tan H. 2024. Glandular trichomes: the factory of artemisinin biosynthesis. Medicinal Plant Biology 3:e019 doi: 10.48130/mpb-0024-0018
[2]	Jiang B, Gao L, Wang H, Sun Y, Zhang X, et al. 2024. Characterization and heterologous reconstitution of Taxus biosynthetic enzymes leading to baccatin III. Science 383:622−29 doi: 10.1126/science.adj3484
[3]	Reed J, Orme A, El-Demerdash A, Owen C, Martin LBB, et al. 2023. Elucidation of the pathway for biosynthesis of saponin adjuvants from the soapbark tree. Science 379:1252−64 doi: 10.1126/science.adf3727
[4]	Chen W, Gao Y, Xie W, Gong L, Lu K, et al. 2014. Genome-wide association analyses provide genetic and biochemical insights into natural variation in rice metabolism. Nature Genetics 46:714−21 doi: 10.1038/ng.3007
[5]	Schulte-Sasse R, Budach S, Hnisz D, Marsico A. 2021. Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms. Nature Machine Intelligence 3:513−26 doi: 10.1038/s42256-021-00325-y
[6]	Huang W, Zhang X, Li J, Lv J, Wang Y, et al. 2024. Substrate promiscuity, crystal structure, and application of a plant UDP-glycosyltransferase UGT74AN3. ACS Catalysis 14:475−88 doi: 10.1021/acscatal.3c05309
[7]	Abramson J, Adler J, Dunger J, Evans R, Green T, et al. 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630:493−500 doi: 10.1038/s41586-024-07487-w
[8]	Hayes T, Rao R, Akin H, Sofroniew NJ, Oktay D, et al. 2025. Simulating 500 million years of evolution with a language model. Science 387:850−58 doi: 10.1126/science.ads0018
[9]	Nguyen E, Poli M, Durrant MG, Kang B, Katrekar D, et al. 2024. Sequence modeling and design from molecular to genome scale with Evo. Science 386:eado9336 doi: 10.1126/science.ado9336
[10]	Liu Y, Zhao X, Gan F, Chen X, Deng K, et al. 2024. Complete biosynthesis of QS-21 in engineered yeast. Nature 629:937−44 doi: 10.1038/s41586-024-07345-9
[11]	Liao LX, Song XM, Wang LC, Lv HN, Chen JF, et al. 2017. Highly selective inhibition of IMPDH2 provides the basis of antineuroinflammation therapy. Proceedings of the National Academy of Sciences of the United States of America 114:E5986−E5994 doi: 10.1073/pnas.1706778114
[12]	De La Peña R, Hodgson H, Liu JC, Stephenson MJ, Martin AC, et al. 2023. Complex scaffold remodeling in plant triterpene biosynthesis. Science 379:361−68 doi: 10.1126/science.adf1017
[13]	Zhang M, Bao YO, Zhao CX, Tian YG, Wang ZL, et al. 2024. A four-step biosynthetic pathway involving C-3 oxidation–reduction reactions from cycloastragenol to astragaloside IV in Astragalus membranaceus. The Plant Journal 120:569−77 doi: 10.1111/tpj.17001
[14]	Mehta N, Meng Y, Zare R, Kamenetsky-Goldstein R, Sattely E. 2024. A developmental gradient reveals biosynthetic pathways to eukaryotic toxins in monocot geophytes. Cell 187:5620−37 doi: 10.1016/j.cell.2024.08.027
[15]	Nett RS, Lau W, Sattely ES. 2020. Discovery and engineering of colchicine alkaloid biosynthesis. Nature 584:148−53 doi: 10.1038/s41586-020-2546-8
[16]	Hong B, Grzech D, Caputi L, Sonawane P, López CER, et al. 2022. Biosynthesis of strychnine. Nature 607:617−22 doi: 10.1038/s41586-022-04950-4
[17]	Zubieta C, He X, Dixon RA, Noel JP. 2001. Structures of two natural product methyltransferases reveal the basis for substrate specificity in plant O-methyltransferases. Nature Structural Biology 8:271−79 doi: 10.1038/85029
[18]	Wang HT, Wang ZL, Chen K, Yao MJ, Zhang M, et al. 2023. Insights into the missing apiosylation step in flavonoid apiosides biosynthesis of Leguminosae plants. Nature Communications 14:6658 doi: 10.1038/s41467-023-42393-1
[19]	Peng Z, Song L, Chen M, Liu Z, Yuan Z, et al. 2024. Neofunctionalization of an OMT cluster dominates polymethoxyflavone biosynthesis associated with the domestication of citrus. Proceedings of the National Academy of Sciences of the United States of America 121:e1973352175 doi: 10.1073/pnas.2321615121
[20]	Hodgson H, De La Peña R, Stephenson MJ, Thimmappa R, Vincent JL, et al. 2019. Identification of key enzymes responsible for protolimonoid biosynthesis in plants: Opening the door to azadirachtin production. Proceedings of the National Academy of Sciences of the United States of America 116:17096−104 doi: 10.1073/pnas.1906083116
[21]	Fu S, Liu B. 2020. Recent progress in the synthesis of limonoids and limonoid-like natural products. Organic Chemistry Frontiers 7:1903−47 doi: 10.1039/D0QO00203H
[22]	Liu X, Li J, Zhu X, Xu Z, Qi J. 2024. Research advances on paclitaxel biosynthesis. Synthetic Biology Journal 5(3):527−47 (in Chinese) doi: 10.12211/2096-8280.2023-085
[23]	Kautsar SA, Suarez Duran HG, Blin K, Osbourn A, Medema MH. 2017. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Research 45:W55−W63 doi: 10.1093/nar/gkx305
[24]	Skinnider MA, Johnston CW, Gunabalasingam M, Merwin NJ, Kieliszek AM, et al. 2020. Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences. Nature Communications 11:6058 doi: 10.1038/s41467-020-19986-1
[25]	Carroll LM, Larralde M, Fleck JS, Ponnudurai R, Milanese A, et al. 2021. Accurate de novo identification of biosynthetic gene clusters with GECCO. bioRxiv Preprint doi: 10.1101/2021.05.03.442509
[26]	Hannigan GD, Prihoda D, Palicka A, Soukup J, Klempir O, et al. 2019. A deep learning genome-mining strategy for biosynthetic gene cluster prediction. Nucleic Acids Research 47:e110 doi: 10.1093/nar/gkz654
[27]	Li Z, Liu F, Yang W, Peng S, Zhou J. 2022. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Transactions on Neural Networks and Learning Systems 33:6999−7019 doi: 10.1109/TNNLS.2021.3084827
[28]	Lipton ZC, Berkowitz J, Elkan C. 2015. A critical review of recurrent neural networks for sequence learning. arXiv Preprint doi: 10.48550/arXiv.1506.00019
[29]	Chen T, Guestrin C. 2016. XGBoost: A Scalable Tree Boosting System. Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, 13−17 August 2016. New York: Association for Computing Machinery. pp. 785−94. doi: 10.1145/2939672.2939785
[30]	Zhang J, Huang J, Jin S, Lu S. 2024. Vision-Language Models for Vision Tasks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:5625−44 doi: 10.1109/TPAMI.2024.3369699
[31]	Malhis N, Jacobson M, Jones SJM, Gsponer J. 2020. LIST-S2: taxonomy based sorting of deleterious missense mutations across species. Nucleic Acids Research 48:W154−W161 doi: 10.1093/nar/gkaa288
[32]	van der Maaten L, Hinton G. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9:2579−605
[33]	Goodfellow IJ. 2014. Generative adversarial nets. Proc. 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014. Montreal: MIT Press. pp. 2672−80. doi: 10.3156/JSOFT.29.5_177_2
[34]	Kohonen T. 2013. Essentials of the self-organizing map. Neural Networks 37:52−65 doi: 10.1016/j.neunet.2012.09.018
[35]	Han Z, Xu Z, Xu Y, Lin J, Chen X, et al. 2024. Phylogenomics reveal DcTPS-mediated terpenoid accumulation and environmental response in Dendrobium catenatum. Industrial Crops and Products 208:117799 doi: 10.1016/j.indcrop.2023.117799
[36]	Han Z, Gong Q, Huang S, Meng X, Xu Y, et al. 2023. Machine learning uncovers accumulation mechanism of flavonoid compounds in Polygonatum cyrtonema Hua. Plant Physiology and Biochemistry 201:107839 doi: 10.1016/j.plaphy.2023.107839
[37]	Jumper J, Evans R, Pritzel A, Green T, Figurnov M, et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596:583−89 doi: 10.1038/s41586-021-03819-2
[38]	Lin Z, Akin H, Rao R, Hie B, Zhu Z, et al. 2023. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379:1123−30 doi: 10.1126/science.ade2574
[39]	Heinzinger M, Weissenow K, Sanchez JG, Henkel A, Mirdita M, et al. 2024. Bilingual language model for protein sequence and structure. NAR Genomics and Bioinformatics 6:lqae150 doi: 10.1093/nargab/lqae150
[40]	Avsec Z, Agarwal V, Visentin D, Ledsam JR, Grabska-Barwinska A, et al. 2021. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods 18:1196−203 doi: 10.1038/s41592-021-01252-x
[41]	Akiyama M, Sakakibara Y. 2022. Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. NAR Genomics and Bioinformatics 4:lqac012 doi: 10.1093/nargab/lqac012
[42]	Poplin R, Chang PC, Alexander D, Schwartz S, Colthurst T, et al. 2018. A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology 36:983−87 doi: 10.1038/nbt.4235
[43]	Fu Y, Yu S, Li J, Lao Z, Yang X, et al. 2024. DeepMineLys: deep mining of phage lysins from human microbiome. Cell Reports 43:114583 doi: 10.1016/j.celrep.2024.114583
[44]	Madani A, Krause B, Greene ER, Subramanian S, Mohr BP, et al. 2023. Large language models generate functional protein sequences across diverse families. Nature Biotechnology 41:1099−106 doi: 10.1038/s41587-022-01618-2
[45]	Wu KE, Yang KK, van den Berg R, Alamdari S, Zou JY, et al. 2024. Protein structure generation via folding diffusion. Nature Communications 15:1059 doi: 10.1038/s41467-024-45051-2
[46]	Wang Y, Song M, Liu F, Liang Z, Hong R, et al. 2025. Artificial intelligence using a latent diffusion model enables the generation of diverse and potent antimicrobial peptides. Science Advances 11:eadp7171 doi: 10.1126/sciadv.adp7171
[47]	Gumulya Y, Baek J, Wun S, Thomson RES, Harris KL, et al. 2018. Engineering highly functional thermostable proteins using ancestral sequence reconstruction. Nature Catalysis 1:878−88 doi: 10.1038/s41929-018-0159-5
[48]	Zhang K, Yang X, Wang Y, Yu Y, Huang N, et al. 2025. Artificial intelligence in drug development. Nature Medicine 31:45−59 doi: 10.1038/s41591-024-03434-4
[49]	Chen M, Zhang W, Gou Y, Xu D, Wei Y, et al. 2023. GPS 6.0: an updated server for prediction of kinase-specific phosphorylation sites in proteins. Nucleic Acids Research 51:W243−W250 doi: 10.1093/nar/gkad383
[50]	Lobentanzer S, Feng S, Bruderer N, Maier A, The BioChatter Consortium, et al. 2025. A platform for the biomedical application of large language models. Nature Biotechnology 43:166−69 doi: 10.1038/s41587-024-02534-3
[51]	Huang Z, Bianchi F, Yuksekgonul M, Montine TJ, Zou J. 2023. A visual–language foundation model for pathology image analysis using medical Twitter. Nature Medicine 29:2307−16 doi: 10.1038/s41591-023-02504-3
[52]	Liu W, Li J, Tang Y, Zhao Y, Liu C, et al. 2025. DrBioRight 2.0: an LLM-powered bioinformatics chatbot for large-scale cancer functional proteomics analysis. Nature Communications 16:2256 doi: 10.1038/s41467-025-57430-4

Cite this article

Zhang J, Yang Y, Si J, Chen D, Dong C, et al. 2025. Artificial intelligence in the discovery and modification of biological elements in medicinal plants. Medicinal Plant Biology 4: e012 doi: 10.48130/mpb-0025-0010

Introduction

Medicinal plants are a significant source of biologically active molecules such as artemisinin, sappanone A, and paclitaxel, which play a vital role in the treatment of human diseases^[1−3]. Although market demand for these natural products is substantial, their low abundance and the difficulty of extraction and purification have resulted in premium pricing for related products and very low utilization rates.

The ultimate goal of medicinal plant research is to achieve high contents of active ingredients. It is therefore of paramount importance to elucidate the biosynthetic pathways of these compounds and to substantially improve biosynthesis efficiency. With the rapid development of high-throughput sequencing, genome-wide association studies (GWAS), and multi-omics approaches, it is now possible to uncover the underlying genetic elements^[4]. However, identifying candidate genes with these methods alone remains inefficient and unreliable. Artificial intelligence (AI) has recently demonstrated significant potential in biology for mining candidate genes^[5].

Enzyme modification and rational design are also critical for the efficient synthesis of medicinally active ingredients. Techniques such as X-ray crystallography, homology modeling, and site-directed mutagenesis are commonly employed for enzyme structure determination and modification^[6]. More recently, large-scale AI models, including large language models (LLMs), such as AlphaFold 3 (AF3), Evolutionary Scale Modeling (ESM), and Evo, have revolutionized high-precision protein structure prediction, which might significantly promote the discovery of super-active enzymes^[7−9].

Here, we systematically review the application of AI technology to biological elements. We cover current advances in plant natural product biosynthesis, AI-driven discovery and modification of biological elements in medicinal plants, and future trends and challenges of AI-assisted precision medicinal plant breeding and biosynthesis.

Current advances in plant natural product biosynthesis

The active ingredients in medicinal plants, such as terpenoids, alkaloids, and flavonoids, play a crucial role in the treatment of numerous diseases^[10,11]. With the swift development of genome sequencing, key enzymes or pathways of some significant active ingredients have been clarified (Table 1).

Terpenoids are important active substances for medical applications in humans^[10]. Terpenoid biosynthesis proceeds mainly through the mevalonic acid (MVA) pathway; the precursors of QS-21, limonoids, and astragaloside are all synthesized via this pathway. For QS-21, β-amyrin synthase and three cytochrome P450 enzymes generate the core skeleton quillaic acid (QA)^[10]. In addition, seven key enzymes and four different types of enzymes enabled the synthesis of QS-21. Melianol oxide isomerases (MOIs) are responsible for the conversion of melianol to different limonoid skeletons via epoxidation intermediates^[12]. The biosynthetic pathway from cycloastragenol to astragaloside IV comprises four key steps: C-3 oxidation, 6-O-glucosylation, C-3 reduction, and 3-O-xylosylation^[13].

Alkaloids are a class of biologically active nitrogen-containing organic compounds with a variety of physiological effects on the human body^[14]. The main alkaloid biosynthesis pathways are the amino acid derivation pathway, the isoprenoid pathway, and the dopamine pathway. Alkaloids are diverse in origin: paclitaxel arises via the mevalonate pathway, whereas colchicine, Amaryllidaceae alkaloids, and strychnine are derived from amino acids. Two previously missing key enzymes, 'T9αH' and 'TOT', were discovered and characterized, elucidating the mechanism of paclitaxel oxetane formation^[2]. The critical role of the enzymes CYP96T1 and CYP96T6 in the biosynthesis of Amaryllidaceae alkaloids has also been demonstrated^[14]. GsCYP71FB1 generates N-formyldemecolcine via an atypical oxidative ring expansion reaction, a key step in colchicine backbone synthesis^[15]. SnvWS and SnvAT generate Wieland-Gumlich aldehyde and N-malonyl Wieland-Gumlich aldehyde, respectively, which are key steps in the generation of the strychnine backbone^[16].

Flavonoids are prominent small molecules of significant current interest. Although flavonoids are structurally diverse, their basic skeletons are synthesized via the phenylpropanoid pathway. Three key genes were identified in the anthocyanin synthesis pathway in Camellia sinensis: flavonol synthase (FLS), dihydroflavonol 4-reductase (DFR), and anthocyanidin synthase (ANS). Homoisoflavonoids represent a rare and unique subclass of flavonoids, distinguished by an extended carbon skeleton with an additional carbon (C9) between the B and C rings. Unlike the widely distributed isoflavonoids, homoisoflavonoids are limited to only a few species, most notably Caesalpinia sappan and Polygonatum cyrtonema, making them exceptionally valuable in natural product research. ChOMT from Medicago truncatula catalyzes the methylation of isoliquiritigenin, which might be an initiation step in the synthesis of homoisoflavonoids^[17]. The apiosyltransferase GuApiGT from Glycyrrhiza uralensis efficiently catalyzes 2″-O-apiosylation of flavonoid glycosides^[18]. CreOMT3, CreOMT4, and CreOMT5 exhibited multisite O-methylation activity on p-hydroxyflavones, generating seven polymethoxyflavones (PMFs) in vitro and in vivo^[19].

Table 1. Recent progress in dissecting natural product pathways in medicinal plants.

Type	Active ingredient	Enzymes	Plant source	Medicinal activity	Ref.
Terpenoid	QS-21	CCL1, PKSIII, KR, ACT2/3, UGT73CZ2	Quillaja saponaria	Vaccine adjuvant	[3,10]
Terpenoid	Limonoid	OSC, CYP71CD1, MOI2, L21AT, SDR, L1AT, AKR, LFS	Citrus sinensis, Melia azedarach	Biopesticide	[12,20,21]
Terpenoid	Astragaloside	AmOSC3, AmCYP88D25, AmCYP88D7, AmCYP71D756	Astragalus membranaceus	Anticancer	[13]
Alkaloid	Paclitaxel	TOT1, T9αOH, TS, T5αOH, T13αOH, T10βOH, TAT, T7βOH, T2αOH, TBT, DBAT, T1βOH	Taxus chinensis	Anticancer	[2,22]
Alkaloid	Amaryllidaceae alkaloid	CYP96T1, CYP96T6, CYP96T5, NtSDR2, NtODD2, NtOMT1, NtNMT1, NtAKR1	Narcissus tazetta	Treating Alzheimer's disease	[14]
Alkaloid	Colchicine	GsOMT1, GsNMT, GsCYP75A109, GsOMT2, GsOMT3, GsCYP75A110, GsOMT4, GsCYP71FB1	Gloriosa superba	Antigout	[15]
Alkaloid	Strychnine	SnvGO, SnvNS1, SnvNO, SnvWS, SnvAT, AAE13, Snv10H, SnvOMT, Snv11H	Strychnos nux-vomica	Biopesticide	[16]
Flavonoid	Homoisoflavonoid	ChOMT	Medicago truncatula	Anti-neuroinflammatory	[17]
Flavonoid	Liquiritin apioside	GuApiGT	Glycyrrhiza uralensis	Cough suppressing	[18]
Flavonoid	Polymethoxyflavone	CreOMT3, CreOMT4, CreOMT5	Mandarin	Anticancer	[19]

Although several biosynthesis pathways of active ingredients have been elucidated, biological elements in medicinal plants remain largely unknown.

AI-driven discovery of biological elements in medicinal plants

AI-based biological element modification in medicinal plants

Advanced applications of generative and agentic AI in the biology field

Generative AI is rapidly transforming biological and medical research, leveraging its advanced natural language processing capabilities for applications such as target identification, virtual screening, and de novo design. It can support the development of potential therapeutic drugs by identifying novel drug targets, analyzing complex biological networks, and constructing multi-omics data networks. Furthermore, it can predict ligand spatial transformations, generate complex atomic coordinates, and learn the probability density distribution of receptor-ligand distances to generate binding poses, thereby identifying potential lead compounds or drug candidates. Despite these tremendous advantages, the complexity of biology calls for an approach that can flexibly break down complex problems into actionable tasks. In healthcare, agentic AI can assist physicians in formulating treatment plans and monitoring patient health by making decisions and planning based on real data. Agentic AI also enhances the efficiency of routine tasks and automates continuous, high-throughput research^[48].

Generative AI and agentic AI have diverse and impactful applications in the biomedical field. GPS 6.0 uses publicly available site-specific kinase-substrate relations (ssKSRs) to predict kinase-specific phosphorylation sites, and provides biologists with a user-friendly online service that offers comprehensive annotations of the prediction outcomes^[49]. BioChatter is an open-source Python framework that includes several modules, such as the large language model (LLM) provider Application Programming Interface (API), database public API, knowledge management systems, and various software components. Users can customize the entire process, from prototyping to packaging and deployment, by combining the modules according to their specific needs. This flexible and modular architecture supports a wide range of biomedical research contexts, making BioChatter a suitable tool for facilitating generative AI applications in the biomedical field^[50]. Pathology Language-Image Pre-training (PLIP) is a vision-language foundation model fine-tuned through contrastive learning on OpenPath, an extensive dataset comprising 208,414 de-identified pathology images paired with corresponding natural language descriptions, sourced primarily from medical Twitter. The model can classify new images without additional training, thereby assisting clinicians in disease diagnosis. Furthermore, PLIP supports case retrieval through image-based or natural language-based searches, significantly enhancing knowledge sharing and clinical decision-making^[51]. DrBioRight 2.0 is an agentic AI platform powered by a large language model. It integrated approximately 8,000 samples from The Cancer Genome Atlas (TCGA) patient tumors and 900 samples from the Cancer Cell Line Encyclopedia (CCLE) cell lines for training. Leveraging OpenAI GPT-4/GPT-4o and Llama 3 models, it generates answers to user queries. The tool is designed to reduce technical barriers and facilitate seamless analysis of protein-centric cancer omics data, so that users from diverse backgrounds can access, analyze, and visualize data through intuitive natural language queries^[52]. AI, particularly generative and agentic AI, is revolutionizing biological and medical research. For the mining and modification of pathway genes in medicinal plants, generative AI can likewise identify novel genes and metabolic pathways efficiently, while agentic AI streamlines data analysis and automates tasks, optimizing research efficiency and providing a new research paradigm.
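
The zero-shot image classification mentioned above (classifying new pathology images without additional training) is commonly done in contrastively trained image-text models by comparing an image embedding against text embeddings of candidate labels. The following is a minimal sketch under stated assumptions: the embeddings are toy vectors rather than outputs of any real encoder, the function names `l2_normalize` and `zero_shot_classify` and the temperature value are hypothetical, and nothing here reproduces the published model or its API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def zero_shot_classify(image_emb, text_embs, labels, temperature=0.07):
    """Pick the label whose text embedding is most similar to the image embedding.

    image_emb: embedding of one image (toy values here; a real system would
               obtain it from an image encoder).
    text_embs: one embedding per candidate label description.
    temperature: hypothetical softmax temperature, chosen for illustration only.
    """
    img = l2_normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embs]
    logits = [s / temperature for s in sims]
    # Subtract the maximum logit for a numerically stable softmax.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Toy example: two candidate labels and one image embedding (all values invented).
labels = ["benign tissue", "malignant tissue"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_emb = [0.2, 0.9, 0.1]

predicted, probs = zero_shot_classify(image_emb, text_embs, labels)
print(predicted, [round(p, 4) for p in probs])
```

Because no classifier weights are trained, new categories can be added simply by supplying new label descriptions; the same similarity ranking, applied across a collection of stored image embeddings, is the basis of image-based or text-based case retrieval.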

Concluding remarks and future perspectives

Data type	Model	Architecture	Data sources	Ref.
Protein sequence	ESM-2/3	BERT	UniRef50, UniRef90, PDB	[8,38]
Protein sequence	AlphaFold2/3	Transformer (attention mechanism)	PDB, BFD, UniRef90, MGnify, Rfam, RNAcentral, JASPAR	[7,37]
Protein sequence and structure	ProstT5	Transformer	PDB, AFDB	[39]
DNA/RNA	Enformer	Transformer	Gencode	[40]
DNA/RNA	RNABERT	BERT	Rfam, BRAliBase2.1	[41]
Whole-genome	Evo	StripedHyena	GTDB, IMG/VR, IMG/PR, NCBI RefSeq, MGnify, MGRAST, UHGG, JGI IMG	[9]
