Artificial intelligence algorithms drive the deciphering of traditional Chinese medicine by analyzing the chemicalome, targetome, and bioactivome

Huipeng Song; Zeyuan Liang; Xinru Zhang; Fengyao Yang; Mingyue Zheng; Guizhong Xin

doi:10.48130/targetome-0026-0002

Figure 1.
Schematic diagram for AI-driven revelation of the TCM chemicalome. The diagram delineates comprehensive characterization strategies for the TCM chemicalome, categorized into in vitro (left panel) and in vivo (right panel) approaches. The in vitro workflow integrates GNPS for analog discovery via molecular networking, SIRIUS for structural elucidation using fragmentation tree analysis, and GinMIL for isomer differentiation. The in vivo workflow employs mass spectrometry for high-quality ADMET data acquisition, CMSSP for deep learning-based metabolite annotation, and a chemicalome-metabolome matching approach for the construction of metabolic networks.
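The molecular networking step named in the Figure 1 caption can be illustrated with a minimal sketch: spectra are compared pairwise and connected when their fragment patterns are similar. The sketch below uses a plain binned cosine score, which is a simplification and not the scoring GNPS itself applies; the function names, the bin width, the 0.7 threshold, and the three toy spectra are all assumptions made for illustration.

```python
import math
from itertools import combinations


def to_vector(peaks, bin_width=0.1):
    """Bin (m/z, intensity) peaks and L2-normalise the binned intensities."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)  # integer bin index for this m/z
        binned[key] = binned.get(key, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in binned.values()))
    return {k: v / norm for k, v in binned.items()} if norm else {}


def cosine(peaks_a, peaks_b, bin_width=0.1):
    """Cosine similarity between two binned MS/MS spectra (0 to 1)."""
    a = to_vector(peaks_a, bin_width)
    b = to_vector(peaks_b, bin_width)
    return sum(v * b[k] for k, v in a.items() if k in b)


def build_network(spectra, threshold=0.7):
    """Return (name_a, name_b, score) edges for spectrum pairs above threshold."""
    edges = []
    for (name_a, pa), (name_b, pb) in combinations(spectra.items(), 2):
        score = cosine(pa, pb)
        if score >= threshold:
            edges.append((name_a, name_b, round(score, 3)))
    return edges


# Hypothetical spectra: A and B share two fragments, C shares none.
toy_spectra = {
    "A": [(100.0, 50.0), (150.0, 100.0)],
    "B": [(100.0, 50.0), (150.0, 100.0), (200.0, 10.0)],
    "C": [(300.0, 100.0)],
}
edges = build_network(toy_spectra)
```

With these toy inputs only A and B are linked, which mirrors how structural analogs cluster together in a network. Fixed-width binning can split peaks that fall near a bin boundary, so a tolerance-based peak matcher would likely be preferable on real data.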
Figure 2.
Schematic diagram for AI-facilitated discovery of the TCM targetome. The diagram integrates three methodological components: AI-driven target prediction by compound-centered, enzyme-centered, and disease-centered methods; proteome-wide target identification by CETSA, TPP, LiP-MS, and ABPP; and natural product targetome analysis by transfer learning models.
Figure 3.
Schematic diagram for AI-accelerated revelation of the TCM bioactivome. The diagram introduces the concept of the bioactivome, which refers to the set of biological activities derived from the molecular and cellular events induced by the action of the TCM chemicalome on the targetome. Research on the bioactivome mainly focuses on platform construction based on AI algorithms, multi-omics integration, algorithmic innovations, and TCM clinical application.
Figure 4.
Schematic diagram for interactions among the chemicalome, targetome, and bioactivome. The framework includes chemicalome-oriented, targetome-oriented, and bioactivome-oriented research to construct their interactions. Molecular feature-driven methods play an important role in this process.

Tool name	Core technologies	Key functionalities	Advantages	Limitations
GNPS	Molecular network algorithm, MS/MS fragment clustering, unsupervised learning	1. MS/MS fragment similarity visualization; 2. Structural homology mining; 3. Known compound family screening	1. Intuitive MS data visualization; 2. Enables rapid natural product tracing; 3. Accelerates discovery of known TCM compound families	1. Reliance on existing MS databases; 2. Low structural annotation accuracy for rare TCM components with unique fragments; 3. Clustering affected by low-quality MS data
SIRIUS	Deep learning, fragmentation tree analysis, tandem MS database matching, de novo structural analysis	1. Tandem MS-based unknown structure identification; 2. MS fragment decomposition; 3. Structural similarity search	1. Enables computational de novo prediction for unknown TCM components; 2. Integrated strategies enable higher annotation accuracy than single-algorithm tools; 3. Supports batch processing of large-scale TCM MS data	1. Reduced de novo accuracy for TCM components with multiple chiral centers; 2. Relies on high-quality tandem MS data; 3. Slow analysis for high-diversity TCM samples
MSNovelist	MS-driven de novo molecular structure generation, deep neural network, single MS/MS spectrum interpretation	1. Direct molecular structure deduction from single MS/MS spectrum; 2. Reference-free unknown structure generation	1. Breaks dependence on physical references; 2. Requires only a single MS/MS spectrum; 3. Improves novel TCM compound discovery efficiency	1. Diverse structural candidates require manual verification; 2. Low accuracy for TCM macromolecules; 3. Poor adaptability for rare TCM components
GinMIL	Machine learning, ion mobility separation, multi-dimensional database construction	1. Comprehensive ginsenoside database construction; 2. Ginsenoside isomer differentiation	1. Solves the traditional MS bottleneck in TCM isomer distinction (ginsenosides); 2. Multi-dimensional data fusion enables high component identification accuracy; 3. Specialized for saponin-rich TCM targeted analysis	1. Ginsenoside-focused; 2. Relies on ion mobility separation equipment; 3. Unable to analyze unknown saponin isomers
SwissADME	Machine learning, QSAR model, molecular descriptor calculation	1. Rapid ADME indicator assessment for small molecules	1. Fast prediction for high-throughput TCM active component screening; 2. Predicted indicators align with TCM small-molecule in vivo metabolic characteristics; 3. Visualizable results	1. Applicable only to small molecules; 2. Low prediction accuracy for rare TCM components with unique structures; 3. Lacks prediction of in vivo metabolic pathways for TCM components
ADMETlab 2.0	Integrated machine learning, multi-task feature learning, large-scale ADMET database	1. Comprehensive prediction of ADMET indicators; 2. Toxicity/metabolic stability prediction; 3. Drug-drug interaction prediction	1. Provides extensive ADMET indicators relevant to TCM research; 2. High accuracy; 3. Supports batch/customized prediction; 4. Predicts TCM compound prescription drug-drug interactions	1. Basic molecular descriptor knowledge required; 2. TCM metabolite ADMET prediction needs metabolic pathway tools; 3. Lacks specific guidance for lead optimization of TCM components
CMSSP	Contrastive learning model, molecular graph convolution, MS-structure unified representation space construction	1. Metabolite structural annotation; 2. Low-abundance metabolite identification in complex biological matrices; 3. In vivo MS data interpretation	1. The MS-structure unified representation space significantly enhances annotation accuracy; 2. Strong anti-interference ability for complex biological matrices; 3. Reduces dependence on in vivo metabolite databases	1. Requires large-scale natural product MS-structure paired training data; 2. Slow analysis for TCM in vivo samples with massive metabolites; 3. Sensitive to high-abundance endogenous biological matrix interferences
Chemicalome-Metabolome Matching Platform	Machine learning, metabolic network construction, chemicalome-metabolome correlation	1. In vivo MS raw data processing; 2. Endogenous background elimination; 3. Parent compound and metabolite distinction; 4. Screening of xenobiotic metabolites	1. Integrates in vitro-in vivo matching; 2. Time-series filtering outlines TCM component temporal metabolic characteristics; 3. Constructs complex TCM metabolic networks; 4. Adapts to dynamic biological matrix processing	1. High in vitro and in vivo MS data quality requirements; 2. Large-scale metabolic network construction needs high computing resources; 3. Weak automatic identification for rare TCM metabolic pathways

Table 1.
Comparative analysis of AI tools for TCM in vitro and in vivo chemicalome characterization.
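
The "parent compound and metabolite distinction" function listed for the chemicalome-metabolome matching platform can be approximated with a simple mass-shift search: an in vivo feature is flagged as a candidate metabolite when its mass equals a parent compound mass plus the shift of a common biotransformation. This is a stand-in technique chosen for illustration, not the platform's actual algorithm, which the table describes only as machine learning with time-series filtering. The function name, the 10 ppm tolerance, the choice of four reactions, and the example masses are assumptions.

```python
# Monoisotopic mass shifts (Da) of a few common biotransformations.
BIOTRANSFORMATIONS = {
    "hydroxylation": 15.9949,     # +O
    "methylation": 14.0157,       # +CH2
    "glucuronidation": 176.0321,  # +C6H8O6
    "sulfation": 79.9568,         # +SO3
}


def match_metabolites(parents, features, ppm=10.0):
    """Link in vivo feature masses to in vitro parents via known mass shifts.

    parents:  dict of parent name -> neutral monoisotopic mass
    features: iterable of neutral monoisotopic masses observed in vivo
    Returns a list of (feature_mass, parent_name, reaction) tuples.
    """
    matches = []
    for feature_mass in features:
        for name, parent_mass in parents.items():
            for reaction, shift in BIOTRANSFORMATIONS.items():
                expected = parent_mass + shift
                error_ppm = abs(feature_mass - expected) / expected * 1e6
                if error_ppm <= ppm:
                    matches.append((feature_mass, name, reaction))
    return matches


# Illustrative input: one parent and three observed features.
parents = {"parent_1": 302.0427}
features = [478.0748, 318.0376, 250.0000]
matches = match_metabolites(parents, features)
```

Here the first two features are assigned to glucuronidation and hydroxylation of the parent, while the third stays unmatched. A match on mass alone is only a hypothesis; endogenous background removal and MS/MS confirmation would still be needed before a feature is accepted into a metabolic network.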