Large language models in aquatic risk assessment: research status and future perspectives

Qianhui Li; Fei Cheng; Jing You

doi:10.48130/ebp-0026-0002

2026 Volume 2

PERSPECTIVE Open Access

Correspondence: Fei Cheng (chengfei@gig.ac.cn)

Full list of author information is available at the end of the article.

Received: 30 October 2025
Revised: 23 December 2025
Accepted: 12 January 2026
Published online: 04 February 2026
Environmental and Biogeochemical Processes 2, Article number: e007 (2026) | Cite this article

Abstract

Aquatic ecosystems are crucial to ecological security and human health, yet they face growing risks from intensified human activities. Aquatic risk assessment requires a comprehensive understanding of the geographic distribution, exposure, and effects of diverse pollutants. In the era of big data, making full use of available environmental data is expected to enable efficient regional risk assessment and support informed decision-making in risk management. However, data integration remains a significant challenge, because environmental data are scattered across heterogeneous texts from diverse corpora, such as scientific literature, monitoring reports, and policy documents. Natural language processing (NLP) approaches serve as key tools for structured information extraction (IE). Traditional NLP techniques face bottlenecks such as cumbersome feature engineering and limited generalization, whereas newly developed large language models (LLMs) can perform a wide array of tasks through prompting, achieving remarkable generalization and versatility. The present work systematically reviewed cutting-edge applications of LLMs in IE tasks across multiple disciplines, including chemistry, biology, and toxicology, from three perspectives: entity extraction, relation extraction, and semantic generation. By contrast, the application of LLMs in environmental science is still in its early stages and faces challenges such as data dependence, hallucinations, and environmental concerns. Future research should focus on building high-quality environmental corpora and hybrid strategies to systematically integrate aquatic ecological risk data and to support environmental risk assessment and management policies.
Keywords
- Aquatic risk assessment
- Large language models
- Data mining
- Natural language processing
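
The prompting approach to structured IE described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the prompt wording, the four record fields, the `fake_llm` stand-in for a real model call, and the example sentence with its invented toxicity value. The sketch also shows one simple guard against hallucinations: a record is kept only if its chemical name literally appears in the source text.

```python
import json
from typing import Callable

# Hypothetical record schema; a real study would define its own fields.
REQUIRED_KEYS = {"chemical", "species", "endpoint", "value"}

PROMPT_TEMPLATE = (
    "Extract every chemical, test species, and toxicity endpoint from the text below.\n"
    "Return only a JSON list of objects with the keys "
    '"chemical", "species", "endpoint", and "value".\n\n'
    "Text: {text}"
)


def build_prompt(text: str) -> str:
    """Fill the extraction instructions with the passage to be mined."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_records(raw: str, source_text: str) -> list[dict]:
    """Parse a model reply and keep only well-formed, text-grounded records."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # the reply was not valid JSON
    if not isinstance(data, list):
        return []
    source = source_text.lower()
    records = []
    for item in data:
        # Drop anything that is not a complete record.
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            continue
        # Grounding check: the chemical must literally occur in the source text.
        if str(item["chemical"]).lower() not in source:
            continue
        records.append(item)
    return records


def extract(text: str, llm: Callable[[str], str]) -> list[dict]:
    """Run one prompt through `llm` and return the validated records."""
    return parse_records(llm(build_prompt(text)), text)


# Invented example sentence; the value is illustrative, not measured data.
EXAMPLE_TEXT = "In a hypothetical test, chemical A gave a 48-h EC50 of 1.2 mg/L for Daphnia magna."


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; the second record is a deliberate hallucination."""
    return json.dumps([
        {"chemical": "chemical A", "species": "Daphnia magna",
         "endpoint": "48-h EC50", "value": "1.2 mg/L"},
        {"chemical": "chemical B", "species": "Danio rerio",
         "endpoint": "96-h LC50", "value": "3.4 mg/L"},
    ])


if __name__ == "__main__":
    for record in extract(EXAMPLE_TEXT, fake_llm):
        print(record)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM service. The substring check is deliberately crude: it would reject legitimate synonyms and abbreviations, so a practical pipeline would likely need name normalization on top of it.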
Author details
1. Guangdong Provincial Key Laboratory of Environmental Pollution and Health, College of Environment and Climate, Jinan University, Guangzhou 511443, China
2. State Key Laboratory of Advanced Environmental Technology, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou 510640, China
Rights and permissions
Copyright: © 2026 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, et al. 2017. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. Journal of Cheminformatics 9:61 doi: 10.1186/s13321-017-0247-6
[2]	Kim S, Chen J, Cheng T, Gindulyte A, He J, et al. 2025. PubChem 2025 update. Nucleic Acids Research 53:D1516−D1525 doi: 10.1093/nar/gkae1059
[3]	OECD. 2018. Users' Handbook supplement to the Guidance Document for developing and assessing Adverse Outcome Pathways. OECD Series on Adverse Outcome Pathways 1: OECD Publishing, Paris. doi: 10.1787/5jlv1m9d1g32-en
[4]	Papadopoulos D, Papadakis N, Litke A. 2020. A methodology for open information extraction and representation from large scientific corpora: the CORD-19 data exploration use case. Applied Sciences 10:5630 doi: 10.3390/app10165630
[5]	Hirschberg J, Manning CD. 2015. Advances in natural language processing. Science 349:261−266 doi: 10.1126/science.aaa8685
[6]	Li J, Sun A, Han J, Li C. 2022. A survey on deep learning for named entity recognition. IEEE Transactions on Knowledge and Data Engineering 34:50−70 doi: 10.1109/TKDE.2020.2981314
[7]	Gonzalez Hernandez F, Nguyen Q, Smith VC, Cordero JA, Ballester MR, et al. 2024. Named entity recognition of pharmacokinetic parameters in the scientific literature. Scientific Reports 14:23485 doi: 10.1038/s41598-024-73338-3
[8]	Dagdelen J, Dunn A, Lee S, Walker N, Rosen AS, et al. 2024. Structured information extraction from scientific text with large language models. Nature Communications 15:1418 doi: 10.1038/s41467-024-45563-x
[9]	Liang W, Su W, Zhong L, Yang Z, Li T, et al. 2024. Comprehensive characterization of oxidative stress-modulating chemicals using GPT-based text mining. Environmental Science & Technology 58:20540−20552 doi: 10.1021/acs.est.4c07390
[10]	Zhang X, Kao Y, Che S, Yan J, Zhou S, et al. 2025. Chinese medical named entity recognition integrating adversarial training and feature enhancement. Scientific Reports 15:14844 doi: 10.1038/s41598-025-98465-3
[11]	Ying H, Yuan H, Lu J, Qu Z, Zhao Y, et al. 2025. GENIE: Generative Note Information Extraction model for structuring EHR data. arXiv 00:2501.18435 doi: 10.48550/arXiv.2501.18435
[12]	Li K, Zhang J, Yao C, Shi C. Automatic relation extraction from text: a survey. 2016 International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI), Beijing, China, 2016. USA: IEEE. pp. 83−86 doi: 10.1109/IIKI.2016.58
[13]	Hochreiter S, Schmidhuber J. 1997. Long short-term memory. Neural Computation 9:1735−1780 doi: 10.1162/neco.1997.9.8.1735
[14]	Chung J, Gulcehre C, Cho K, Bengio Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 00:1412.3555 doi: 10.48550/arXiv.1412.3555
[15]	Howard J, Ruder S. 2018. Universal language model fine-tuning for text classification. Proc. 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 2018. US: Association for Computational Linguistics. pp. 328−339 doi: 10.18653/v1/p18-1031
[16]	Cho K, Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, et al. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 2014. PA, USA: ACL. pp. 1724−1734 doi: 10.3115/v1/d14-1179
[17]	Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, et al. 2023. Attention is all you need. http://arxiv.org/abs/1706.03762. (Accessed on 2025-06-17)
[18]	Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, et al. 2020. Language models are few-shot learners. arXiv 00:2005.14165 doi: 10.48550/arXiv.2005.14165
[19]	Huang J, Cheng F, He L, Lou X, Li H, et al. 2024. Effect driven prioritization of contaminants in wastewater treatment plants across China: a data mining-based toxicity screening approach. Water Research 264:122223 doi: 10.1016/j.watres.2024.122223
[20]	Srivastava H, Kumar Das S. 2023. Air pollution prediction system using XRSTH-LSTM algorithm. Environmental Science and Pollution Research 30:125313−125327 doi: 10.1007/s11356-023-28393-0
[21]	Cheng F, Li H, Brooks BW, You J. 2021. Signposts for aquatic toxicity evaluation in China: text mining using event-driven taxonomy within and among regions. Environmental Science & Technology 55:8977−8986 doi: 10.1021/acs.est.1c00152
[22]	Shrestha S, Mount J, Vald G, Sermet Y, Samuel DJ, et al. 2025. A community-centric intelligent cyberinfrastructure for addressing nitrogen pollution using web systems and conversational AI. Environmental Science & Policy 167:104055 doi: 10.1016/j.envsci.2025.104055
[23]	Strogonov V, Pollert J. 2025. Artificial intelligence-enhanced web application approach to data management in the WIDER UPTAKE project. Journal of Hydroinformatics 27:686−699 doi: 10.2166/hydro.2025.248
[24]	Ren Y, Zhang T, Dong X, Li W, Wang Z, et al. 2024. WaterGPT: training a large language model to become a hydrology expert. Water 16(21):3075 doi: 10.3390/w16213075
[25]	Gunasekar S, Joselin Retna Kumar G, Dileep Kumar Y. 2022. Sustainable optimized LSTM-based intelligent system for air quality prediction in Chennai. Acta Geophysica 70:2889−2899 doi: 10.1007/s11600-022-00796-6
[26]	Wu Z, Liu N, Li G, Liu X, Wang Y, et al. 2023. Meta-learning-based spatial-temporal adaption for coldstart air pollution prediction. International Journal of Intelligent Systems 2023:3734557 doi: 10.1155/2023/3734557
[27]	Panneerselvam V, Thiagarajan R. 2023. ACBiGRU-DAO: attention convolutional bidirectional gated recurrent unit-based dynamic arithmetic optimization for air quality prediction. Environmental Science and Pollution Research 30:86804−86820 doi: 10.1007/s11356-023-28028-4
[28]	Liu Z, Yang Q, Shao J, Wang G, Liu H, et al. 2022. Improving daily precipitation estimation in the data scarce area by merging rain gauge and TRMM data with a transfer learning framework. Journal of Hydrology 613:128455 doi: 10.1016/j.jhydrol.2022.128455
[29]	Patra SR, Chu HJ, Tatas. 2023. Regional groundwater sequential forecasting using global and local LSTM models. Journal of Hydrology: Regional Studies 47:101442 doi: 10.1016/j.ejrh.2023.101442
[30]	Zhao X, Greenberg J, An Y, Hu XT. 2021. Fine-tuning BERT model for materials named entity recognition. 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 2021. US: IEEE. pp. 3717−3720 doi: 10.1109/BigData52589.2021.9671697
[31]	Kang Y, Kim J. 2024. ChatMOF: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models. Nature Communications 15:4705 doi: 10.1038/s41467-024-48998-4
[32]	Duan H, Skreta M, Cotta L, Rajaonson EM, Dhawan N, et al. 2025. Boosting the predictive power of protein representations with a corpus of text annotations. Nature Machine Intelligence 7:1403−1413 doi: 10.1038/s42256-025-01088-6
[33]	Shi H, Zhao Y. 2024. Integration of advanced large language models into the construction of adverse outcome pathways: opportunities and challenges. Environmental Science & Technology 58:15355−15358 doi: 10.1021/acs.est.4c07524
[34]	Yang J, Xu H, Mirzoyan S, Chen T, Liu Z, et al. 2024. Poisoning medical knowledge using large language models. Nature Machine Intelligence 6:1156−1168 doi: 10.1038/s42256-024-00899-3
[35]	Chen Q, Hu Y, Peng X, Xie Q, Jin Q, et al. 2025. Benchmarking large language models for biomedical natural language processing applications and recommendations. Nature Communications 16:3280 doi: 10.1038/s41467-025-56989-2
[36]	Zhu JJ, Yang M, Jiang J, Bai Y, Chen D, et al. 2024. Enabling GPTs for expert-level environmental engineering question answering. Environmental Science & Technology Letters 11:1327−1333 doi: 10.1021/acs.estlett.4c00665
[37]	Boiko DA, MacKnight R, Kline B, Gomes G. 2023. Autonomous chemical research with large language models. Nature 624:570−578 doi: 10.1038/s41586-023-06792-0
[38]	Bran AM, Cox S, Schilter O, Baldassari C, White AD, Schwaller P. 2024. Augmenting large language models with chemistry tools. Nature Machine Intelligence 6:525−535 doi: 10.1038/s42256-024-00832-8
[39]	Zheng Y, Koh HY, Ju J, Nguyen ATN, May LT, et al. 2025. Large language models for scientific discovery in molecular property prediction. Nature Machine Intelligence 7:437−447 doi: 10.1038/s42256-025-00994-z
[40]	Lane TR, Vignaux PA, Harris JS, Snyder SH, Urbina F, et al. 2025. Machine learning and large language models for modeling complex toxicity pathways and predicting steroidogenesis. Environmental Science & Technology 59:13844−13856 doi: 10.1021/acs.est.5c04054
[41]	Bodnar C, Bruinsma WP, Lucic A, Stanley M, Allen A, et al. 2025. A foundation model for the Earth system. Nature 641:1180−1187 doi: 10.1038/s41586-025-09005-y
[42]	Hayes T, Rao R, Akin H, Sofroniew NJ, Oktay D, et al. 2025. Simulating 500 million years of evolution with a language model. Science 387:850−858 doi: 10.1126/science.ads0018
[43]	Chan N, Parker F, Bennett W, Wu T, Jia MY, et al. 2024. MedTsLLM: leveraging LLMs for multimodal medical time series analysis. arXiv 00:2408.07773 doi: 10.48550/arXiv.2408.07773
[44]	Wang Z, Jin Q, Wei CH, Tian S, Lai PT, et al. 2025. GeneAgent: self-verification language agent for gene-set analysis using domain databases. Nature Methods 22:1677−1685 doi: 10.1038/s41592-025-02748-6
[45]	Dhar P. 2020. The carbon impact of artificial intelligence. Nature Machine Intelligence 2:423−425 doi: 10.1038/s42256-020-0219-9
[46]	Perković G, Drobnjak A, Botički I. 2024. Hallucinations in LLMs: understanding and addressing challenges. 2024 47th MIPRO ICT and Electronics Convention (MIPRO), Opatija, Croatia, 2024. US: IEEE. pp. 2084−2088 doi: 10.1109/MIPRO60963.2024.10569238
[47]	Strubell E, Ganesh A, McCallum A. 2019. Energy and policy considerations for deep learning in NLP. Proc. The 57th Annual Meeting of the Association for Computational Linguistics, Italy, 2019. pp. 3645−3650
[48]	Herrera M, Xie X, Menapace A, Zanfei A, Brentan BM. 2025. Sustainable AI infrastructure: a scenario-based forecast of water footprint under uncertainty. Journal of Cleaner Production 526:146528 doi: 10.1016/j.jclepro.2025.146528
[49]	Zhang Y, Lin S, Xiong Y, Li N, Zhong L, et al. 2025. Fine-tuning large language models for interdisciplinary environmental challenges. Environmental Science and Ecotechnology 27:100608 doi: 10.1016/j.ese.2025.100608

About this article

Cite this article

Li Q, Cheng F, You J. 2026. Large language models in aquatic risk assessment: research status and future perspectives. Environmental and Biogeochemical Processes 2: e007 doi: 10.48130/ebp-0026-0002

Figures(4) / Tables(1)

Abbreviations: NER, named entity recognition; RE, relation extraction; SG, semantic generation.
Task	Category	Performance	Ref.
Oxidative stress inventory extraction	NER	With optimized prompt engineering, GPT-4 achieved precision = 0.91, recall = 0.81, and F₁ = 0.86.	[9]
Host-dopant extraction	NER, RE	Llama-2 (precision = 0.836, recall = 0.807, F₁ = 0.821) outperforms MatBERT-Proximity (precision = 0.377, recall = 0.403, F₁ = 0.390).	[8]
Note information extraction	NER, RE	GENIE (F₁ = 0.837, accuracy = 0.912) outperforms cTAKES (F₁ = 0.182, accuracy = 0.748).	[11]
Object detection and waterbody extraction	RE, SG	WaterGPT achieves an accuracy of 0.96 on simple tasks and 0.90 on complex tasks.	[24]
MOFs prediction and generation	SG	ChatMOF reports accuracies of 96.9% and 95.7% for the search and prediction tasks, respectively.	[31]
Expert-level question answering	SG	GPT-4 achieves a relevance of 0.644 and a factuality of 0.791.	[36]
Molecular property prediction	SG	In physiology classification tasks, the AUC-ROC improved from the previous state of the art of 74.53% to 76.60%; in biophysics classification tasks, the average AUC-ROC reached 79.10; in physical chemistry regression tasks, the average RMSE was 1.54; and in quantum mechanics tasks, the average MAE was 5.8233, a 48.2% improvement over the baseline.	[39]
Modeling complex toxicity pathways and predicting steroidogenesis	SG	In the classification task for target inhibitors, MolBART achieved an AUC above 0.85 and an F₁ score over 0.7; in the task of predicting IC50 values, it attained an R² over 0.7, with an MAE below 0.5 and an RMSE under 0.8.	[40]
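
As a reading aid for the precision, recall, and F₁ values in the table, the short sketch below recomputes F₁ as the harmonic mean of precision and recall for the three rows that report all three numbers. It assumes the cited studies used this standard definition on pooled counts; if a study averaged per-class scores instead, the reported F₁ would not have to match the recomputed value. The function name `f1_score` is chosen for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table rows for refs [9] and [8].
reported = {
    "GPT-4 [9]": (0.91, 0.81, 0.86),
    "Llama-2 [8]": (0.836, 0.807, 0.821),
    "MatBERT-Proximity [8]": (0.377, 0.403, 0.390),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.3f}, reported F1 = {f1}")
```

For these three rows the recomputed values agree with the reported F₁ scores to the reported number of decimals, which is consistent with the standard definition having been used.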
