Coastal Water Pathogen Database profiles pathogenic microorganisms in coastal environments

Shangheng Song; Zelin Lei; Yuan Feng; Wenxiu Wang; Fang Huang; Yanmei Zhao; Jiping Jiang; Mengqi Sun; Ai-Jie Wang; Shu-Hong Gao; Marwan Majzoub; Lu Fan

doi:10.48130/biocontam-0026-0006

2026 Volume 2

ORIGINAL RESEARCH Open Access

Correspondence: Shu-Hong Gao (gaoshuhong@hit.edu.cn); Lu Fan (fanl@sustech.edu.cn)

^# Authors contributed equally: Shangheng Song, Zelin Lei, Yuan Feng, Wenxiu Wang
Full list of author information is available at the end of the article.

Received: 21 January 2026
Revised: 30 March 2026
Accepted: 21 April 2026
Published online: 01 May 2026
Biocontaminant 2, Article number: e009 (2026)

Highlights

CWPD contains abundant information on pathogens belonging to bacteria, fungi, and viruses. It also shows data on antibiotic resistance genes and virulence factors, providing a comprehensive picture of biological risk factors from samples.

It has a map-based, interactive interface for visual display and navigation.

It provides an online annotation service that allows users to detect pathogens in their uploaded metagenome data.

It provides an Application Programming Interface for easy connection to external ecological or clinical databases.

Abstract

Pathogenic microorganisms in coastal environments pose significant risks to public health and global biosafety, yet their distribution patterns remain insufficiently characterized on a global scale. We developed the Coastal Water Pathogen Database (CWPD), a comprehensive, open-access platform that integrates metagenomic sequences from diverse coastal environments to profile pathogenic bacteria, viruses, and fungi alongside antibiotic resistance genes (ARGs) and virulence factors (VFs). CWPD contains pathogen information from 158 samples from six coastal environments on three continents from 2010 to 2019. A map-based home page interface provides a quick overview of the global distribution of the most abundant pathogens, and region-specific maps can be navigated for detailed visualization. The website of CWPD includes an online analysis module for user-uploaded metagenomic data, facilitating rapid biological risk assessment. Finally, the database structure is designed with standard protocols for easy interfacing with other databases. CWPD serves as a vital resource for interdisciplinary studies in environmental microbiology, public health, and the One Health framework, providing technical support for pandemic preparedness and coastal ecosystem management.

Keywords
- Pathogen
- Coastal environment
- Antibiotic resistance genes
- Biological risk factors
- Metagenome

Author details
1. Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
2. Advanced Institute of Ocean Research, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
3. State Key Laboratory of Urban-Rural Water Resource and Environment, School of Ecoenvironment, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
4. School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
5. School of Biomedical Sciences, University of New South Wales, Sydney, NSW 2052, Australia

Supplementary information

The supplementary files can be downloaded from here.

Rights and permissions
Copyright: © 2026 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1] Stewart JR, Gast RJ, Fujioka RS, Solo-Gabriele HM, Meschke JS, et al. 2008. The coastal environment and human health: microbial indicators, pathogens, sentinels and reservoirs. Environmental Health 7:S3 doi: 10.1186/1476-069X-7-S2-S3
[2] Zheng D, Yin G, Liu M, Chen C, Jiang Y, et al. 2021. A systematic review of antibiotics and antibiotic resistance genes in estuarine and coastal environments. Science of The Total Environment 777:146009 doi: 10.1016/j.scitotenv.2021.146009
[3] Ahmed W, Payyappat S, Cassidy M, Harrison N, Besley C. 2023. Microbial source tracking of untreated human wastewater and animal scats in urbanized estuarine waters. Science of The Total Environment 877:162764 doi: 10.1016/j.scitotenv.2023.162764
[4] Ahmed W, Korajkic A, Smith WJ, Payyappat S, Cassidy M, et al. 2024. Comparing the decay of human wastewater-associated markers and enteric viruses in laboratory microcosms simulating estuarine waters in a temperate climatic zone using qPCR/RT-qPCR assays. Science of The Total Environment 908:167845 doi: 10.1016/j.scitotenv.2023.167845
[5] Rubin-Blum M, Harbuzov Z, Cohen R, Astrahan P. 2023. Anthropogenic and natural disturbances along a river and its estuary alter the diversity of pathogens and antibiotic resistance mechanisms. Science of The Total Environment 887:164108 doi: 10.1016/j.scitotenv.2023.164108
[6] Câmara JS, Montesdeoca-Esponda S, Freitas J, Guedes-Alonso R, Sosa-Ferrera Z, et al. 2021. Emerging contaminants in seafront zones. Environmental impact and analytical approaches. Separations 8:95 doi: 10.3390/separations8070095
[7] Brandão J, Weiskerger C, Valério E, Pitkänen T, Meriläinen P, et al. 2022. Climate change impacts on microbiota in beach sand and water: looking ahead. International Journal of Environmental Research and Public Health 19:1444 doi: 10.3390/ijerph19031444
[8] Fernández-Juárez V, Riedinger DJ, Gusmao JB, Delgado-Zambrano LF, Coll-García G, et al. 2024. Temperature, sediment resuspension, and salinity drive the prevalence of Vibrio vulnificus in the coastal Baltic Sea. mBio 15:e01569-24 doi: 10.1128/mbio.01569-24
[9] Dong P, Guo H, Wang Y, Cheng H, Wang K, et al. 2021. DPiWE: a curated database for pathogenic bacteria involved in water environment. Journal of Fisheries of China 45:1921−1933 doi: 10.11964/jfc.20210612935
[10] Lo LSH, Liu X, Liu H, Shao M, Qian PY, et al. 2023. Aquaculture bacterial pathogen database: pathogen monitoring and screening in coastal waters using environmental DNA. Water Research X 20:100194 doi: 10.1016/j.wroa.2023.100194
[11] Emmenegger EJ, Kentop E, Thompson TM, Pittam S, Ryan A, et al. 2011. Development of an aquatic pathogen database (AquaPathogen X) and its utilization in tracking emerging fish virus pathogens in North America. Journal of Fish Diseases 34:579−587 doi: 10.1111/j.1365-2761.2011.01270.x
[12] Alcock BP, Raphenya AR, Lau TTY, Tsang KK, Bouchard M, et al. 2019. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Research 48:D517−D525 doi: 10.1093/nar/gkz935
[13] Liu B, Zheng D, Zhou S, Chen L, Yang J. 2022. VFDB 2022: a general classification scheme for bacterial virulence factors. Nucleic Acids Research 50:D912−D917 doi: 10.1093/nar/gkab1107
[14] Roehr JT, Dieterich C, Reinert K. 2017. Flexbar 3.0 – SIMD and multicore parallelization. Bioinformatics 33:2941−2942 doi: 10.1093/bioinformatics/btx330
[15] Jiang H, Lei R, Ding SW, Zhu S. 2014. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics 15:182 doi: 10.1186/1471-2105-15-182
[16] Buchfink B, Reuter K, Drost HG. 2021. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature Methods 18:366−368 doi: 10.1038/s41592-021-01101-x
[17] Leung CM, Li D, Xin Y, Law WC, Zhang Y, et al. 2020. MegaPath: sensitive and rapid pathogen detection using metagenomic NGS data. BMC Genomics 21:500 doi: 10.1186/s12864-020-06875-6
[18] Li D, Liu CM, Luo R, Sadakane K, Lam TW. 2015. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31:1674−1676 doi: 10.1093/bioinformatics/btv033
[19] Hyatt D, Chen GL, LoCascio PF, Land ML, Larimer FW, et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119 doi: 10.1186/1471-2105-11-119
[20] Arango-Argoty G, Garner E, Pruden A, Heath LS, Vikesland P, et al. 2018. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. Microbiome 6:23 doi: 10.1186/s40168-018-0401-z
[21] Nousias O, Montesanto F. 2021. Metagenomic profiling of host-associated bacteria from 8 datasets of the red alga Porphyra purpurea with MetaPhlAn3. Marine Genomics 59:100866 doi: 10.1016/j.margen.2021.100866
[22] Jiang J, Pang T, Zhang F, Men Y, Yadav H, et al. 2022. Pathway to encapsulate the surface water quality model and its applications as cloud computing services and integration with EDSS for managing urban water environments. Environmental Modelling & Software 148:105280 doi: 10.1016/j.envsoft.2021.105280
[23] Raza S, Shin H, Hur HG, Unno T. 2022. Higher abundance of core antimicrobial resistant genes in effluent from wastewater treatment plants. Water Research 208:117882 doi: 10.1016/j.watres.2021.117882
[24] Xu B, Li F, Cai L, Zhang R, Fan L, et al. 2022. A holistic genome dataset of bacteria, archaea and viruses of the Pearl River estuary. Scientific Data 9:49 doi: 10.1038/s41597-022-01153-4
[25] Zhang C, Du XP, Zeng YH, Zhu JM, Zhang SJ, et al. 2021. The communities and functional profiles of virioplankton along a salinity gradient in a subtropical estuary. Science of The Total Environment 759:143499 doi: 10.1016/j.scitotenv.2020.143499
[26] Parvathi A, Jasna V, Jina S, Jayalakshmy KV, Lallu KR, et al. 2015. Effects of hydrography on the distribution of bacteria and virus in Cochin estuary, India. Ecological Research 30:85−92 doi: 10.1007/s11284-014-1214-6
[27] Fortunato CS, Crump BC. 2015. Microbial gene abundance and expression patterns across a river to ocean salinity gradient. PLoS One 10:e0140578 doi: 10.1371/journal.pone.0140578
[28] Sun M, Zhan Y, Marsan D, Páez-Espino D, Cai L, et al. 2021. Uncultivated viral populations dominate estuarine viromes on the spatiotemporal scale. mSystems 6:e01020-20 doi: 10.1128/msystems.01020-20
[29] Hugerth LW, Larsson J, Alneberg J, Lindh MV, Legrand C, et al. 2015. Metagenome-assembled genomes uncover a global brackish microbiome. Genome Biology 16:279 doi: 10.1186/s13059-015-0834-7
[30] Larsson J, Celepli N, Ininbergs K, Dupont CL, Yooseph S, et al. 2014. Picocyanobacteria containing a novel pigment gene cluster dominate the brackish water Baltic Sea. The ISME Journal 8:1892−1903 doi: 10.1038/ismej.2014.35
[31] Alneberg J, Sundh J, Bennke C, Beier S, Lundin D, et al. 2018. BARM and BalticMicrobeDB, a reference metagenome and interface to meta-omic data for the Baltic Sea. Scientific Data 5:180146 doi: 10.1038/sdata.2018.146
[32] Alneberg J, Bennke C, Beier S, Bunse C, Quince C, et al. 2020. Ecosystem-wide metagenomic binning enables prediction of ecological niches from genomes. Communications Biology 3:119 doi: 10.1038/s42003-020-0856-x
[33] Robins PE, Dickson N, Kevill JL, Malham SK, Singer AC, et al. 2022. Predicting the dispersal of SARS-CoV-2 RNA from the wastewater treatment plant to the coast. Heliyon 8:e10547 doi: 10.1016/j.heliyon.2022.e10547
[34] Simner PJ, Miller S, Carroll KC. 2018. Understanding the promises and hurdles of metagenomic next-generation sequencing as a diagnostic tool for infectious diseases. Clinical Infectious Diseases 66:778−788 doi: 10.1093/cid/cix881
[35] Wang Y, Li L, Li Q, Hu Y, Li W, et al. 2024. MASH-Ocean 1.0: interactive platform for investigating microbial diversity, function, and biogeography with marine metagenomic data. iMeta 3:e201 doi: 10.1002/imt2.201
[36] Wang S, Peng H, Liang S. 2022. Prediction of estuarine water quality using interpretable machine learning approach. Journal of Hydrology 605:127320 doi: 10.1016/j.jhydrol.2021.127320

About this article

Cite this article

Song S, Lei Z, Feng Y, Wang W, Huang F, et al. 2026. Coastal Water Pathogen Database profiles pathogenic microorganisms in coastal environments. Biocontaminant 2: e009 doi: 10.48130/biocontam-0026-0006

Introduction

Coastal zones, particularly estuaries, serve as the intersection points between marine and terrestrial ecosystems, possessing rich biodiversity and complex biogeochemical cycles, and are closely related to public health^[1]. Coasts are also potential hubs for biological hazard factors such as pathogenic microorganisms and the antibiotic resistance genes (ARGs) and virulence factor (VF) genes that they encode, posing serious threats to human health and ecosystem safety^[2]. Domestic sewage and livestock farming wastewater discharges are among the main sources of pathogens in nearshore waters. Although most existing sewage treatment plants perform disinfection before discharge, some disinfectant-resistant pathogens may still be released into natural water bodies^[3]. At the same time, nearshore aquaculture can also release pathogens into seawater. Common pathogenic bacteria (PB) reported in coastal waters include Escherichia coli, Salmonella, Shigella, Campylobacter, Vibrio cholerae, Yersinia enterocolitica, Listeria monocytogenes, and Aeromonas veronii; while Channel Catfish Virus, Infectious Hematopoietic Necrosis Virus, and Viral Hemorrhagic Septicemia Virus are among the common viruses^[4,5].

The urgency of surveillance for these biological hazards is amplified by the dual pressures of rapid urbanization and global climate change. Nearly 40% of the world's population resides within 100 km of a coastline, leading to unprecedented levels of wastewater production and environmental degradation^[6]. Climate change exacerbates these risks through rising sea surface temperatures, which favor the proliferation of warm-water pathogens such as Vibrio species, and through altered precipitation patterns that increase the frequency of sewer overflows and the mobilization of soil-borne pathogens into recreational and aquaculture waters^[7,8].

Currently, databases on pathogenic microorganisms in coastal environmental waters remain scarce. Dong et al. constructed the Database of Pathogenic Bacteria Involved in Water Environment (DPiWE), which contains 9,070 pathogenic bacterial strains along with their corresponding 16S rRNA gene sequences, host information, and infection types^[9]. Lo et al. constructed the Aquaculture Bacterial Pathogen Database (ABPD), which systematically catalogs over 210 bacterial pathogen species in aquaculture^[10]. Emmenegger et al. constructed the AquaPathogen X database, which documents individual isolates of aquatic pathogens and supports the tracking and management of emerging fish viruses in North America^[11]. However, these databases lack information on the geographical and temporal distribution and abundance of pathogens, so they cannot support research on pollution tracing, environmental tracking, and risk assessment. In addition, no database has yet been constructed for animal and human pathogenic viruses (PV) in coastal waters.

In this study, we introduce the Coastal Water Pathogen Database (CWPD), which is an online resource platform dedicated to the survey of microbial pathogens in coastal environments. It aims to provide data support for in-depth analysis of the distribution characteristics, potential biological risks, and ecological impacts of nearshore biological hazards. By integrating metagenomic data from coastal environments across public databases, the platform systematically extracts genomic fragment information of PB and PV, along with key biological risk factors (BRFs) such as ARGs and VFs. CWPD integrates multidimensional analytical capabilities: supporting composition and abundance statistics for pathogens and ARGs, interactive spatial distribution visualization based on major global estuaries, precise information retrieval tools, and annotation analysis services for user-defined metagenomic data. By integrating publicly available metagenomic data with geographic context, the platform aims to elucidate the spatiotemporal dynamics and biological risk levels of pathogens in coastal ecosystems and provide scientific evidence and technical support for estuarine ecological research, public health management, and environmental policy formulation.

Materials and methods

Construction of the BRF knowledge dictionary

We collected information on 383 bacterial, viral, and eukaryotic pathogens from the following resources: the National Center for Biotechnology Information (NCBI) (www.ncbi.nlm.nih.gov), the Centers for Disease Control and Prevention (CDC) (www.cdc.gov), China CDC (www.chinacdc.cn), the International Committee on Taxonomy of Viruses (ICTV) (https://ictv.global), and the Pathogen-Host Interaction Database (PHI-base, www.phi-base.org). Across these sources, the same microorganism often appears under different names, for example SARS-CoV-2, 2019-nCoV, and COVID-19 virus. To resolve such synonyms, we used a proprietary 'Synonym Mapping Table' with the NCBI Taxonomy ID (NCBI_ID) serving as the unique identifier. Following manual verification, names, abbreviations, and legacy terms from external databases were mapped to the standard scientific name. All pathogen names were then mapped to the NCBI Taxonomy database, using the NCBI Taxonomy ID as the sole standard identifier. This establishes a unified taxonomic framework from kingdom to strain and resolves inconsistencies caused by differing naming systems across databases.

Pathogen inclusion is based on explicit evidence of pathogenicity from authoritative organizations and databases. This includes: (1) official reports of human infection or pathogenicity from national CDCs (such as the US CDC and China CDC), ICTV, NCBI, and the WHO; (2) verified host-pathogen interaction data from PHI-base; and (3) a focus on pathogens with clear evidence of human infection and zoonotic pathogens. Pathogens that exclusively infect animals or plants (particularly aquatic species) are included in the underlying database but are explicitly labeled with their corresponding potential host categories.

To ensure high-fidelity data, pathogens were required to meet at least one of the following hierarchical thresholds: (1) statutory priority: inclusion in the WHO or national CDC lists of statutory infectious diseases or priority surveillance programs; (2) verified pathogenicity: documented evidence of pathogenic mechanisms within the PHI-base repository; and (3) genomic integrity: possession of a verified, complete reference genome curated within the NCBI RefSeq or GenBank databases.
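
The synonym resolution described above can be illustrated with a minimal Python sketch. It is not the authors' proprietary table: the dictionary contents, the `normalize` and `resolve` helper names, and the case-insensitive matching rule are all assumptions made for illustration. Only the idea of mapping every alias to a single NCBI Taxonomy ID comes from the text.

```python
from __future__ import annotations

# Hypothetical excerpt of a synonym mapping table: every name,
# abbreviation, or legacy term points to one NCBI Taxonomy ID.
SYNONYM_TO_TAXID = {
    "sars-cov-2": 2697049,
    "2019-ncov": 2697049,
    "covid-19 virus": 2697049,
    "vibrio cholerae": 666,
}

# The standard scientific name is stored once per Taxonomy ID.
TAXID_TO_NAME = {
    2697049: "Severe acute respiratory syndrome coronavirus 2",
    666: "Vibrio cholerae",
}


def normalize(name: str) -> str:
    """Lower-case the name and collapse whitespace (an assumed rule)."""
    return " ".join(name.strip().lower().split())


def resolve(name: str) -> tuple[int, str] | None:
    """Return (taxonomy ID, standard name), or None if the alias is unknown."""
    taxid = SYNONYM_TO_TAXID.get(normalize(name))
    if taxid is None:
        return None
    return taxid, TAXID_TO_NAME[taxid]
```

With such a table, `resolve("2019-nCoV")` and `resolve("SARS-CoV-2")` return the same pair, so records from different sources collapse onto one identifier, while unknown names return `None` and can be queued for manual verification.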

Information on ARGs and ARG categories was sourced from the Comprehensive Antibiotic Resistance Database (CARD)^[12]. VF information was sourced from the Virulence Factor Database (VFDB)^[13]. All these elements constituted the CWPD's BRF knowledge dictionary.

Metagenomic data collection
We searched for studies with metagenomic sequencing data related to coastal environments. The screening criteria for the project data were as follows: (1) sequencing method: shotgun high-throughput sequencing on the Illumina platform with a minimum sequencing depth of 10 GB; (2) sampling period: 2010–2024, with data released before Feb 2, 2024; and (3) sample type: water samples from estuarine, coastal, and inland sea environments. Data from 16S rRNA gene sequencing were excluded. We selected shotgun metagenomic data generated exclusively on the Illumina platform to minimize errors associated with platform-specific sequencing biases. A minimum sequencing depth of 10 GB per sample was required because lower depths frequently fail to capture rare but clinically significant pathogens and ARGs in complex environmental matrices. We retrieved sequence read archives for these samples from the NCBI database, together with the associated metadata, including the sample collection dates and the geographic coordinates (latitude and longitude) of the sampling sites.
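
The screening criteria above amount to a simple record filter. The sketch below is one possible way to express them in Python. The `RunRecord` fields and the environment labels are hypothetical and do not correspond to actual NCBI metadata field names; the thresholds are taken from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RunRecord:
    """Hypothetical per-sample metadata; field names are illustrative."""
    accession: str
    platform: str
    is_shotgun: bool       # False for 16S rRNA amplicon data
    depth_gb: float        # sequencing depth, in the unit used in the text
    collection_year: int
    release_date: date
    environment: str


ALLOWED_ENVIRONMENTS = {"estuarine", "coastal", "inland sea"}
RELEASE_CUTOFF = date(2024, 2, 2)  # data must be released before this date


def passes_screening(run: RunRecord) -> bool:
    """Apply the three screening criteria described in the text."""
    return (
        run.platform.upper() == "ILLUMINA"
        and run.is_shotgun
        and run.depth_gb >= 10
        and 2010 <= run.collection_year <= 2024
        and run.release_date < RELEASE_CUTOFF
        and run.environment.lower() in ALLOWED_ENVIRONMENTS
    )
```

A list of candidate runs can then be reduced with `[r for r in runs if passes_screening(r)]` before downloading the corresponding read archives.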

Metagenomic data analysis
Raw reads were trimmed and quality-filtered using flexbar (v3.5.0)^[14] and skewer (v0.2.2)^[15]. Reads with an average Q-score greater than 30 and a length greater than 75 nucleotides (nt) were retained. Community composition was classified using MetaPhlAn (v3.0) with default settings^[21]. The composition of pathogenic microorganisms was analyzed using MegaPath (v2.0)^[17] with the default parameters. MegaPath includes several built-in steps intended to reduce spurious calls. First, it removes reads that align to the human reference genome and to a database of broadly homologous or repetitive regions prior to pathogen alignment (the MegaPath paper describes these as 'confidently aligned' reads). Second, it applies 'spike polishing', in which genomic regions with abnormally high read depth (depth > mean + α·sd) are treated as likely repetitive or homologous and alignments in these regions are removed (default α = 30). Finally, MegaPath performs a two-stage assignment and reassignment procedure that explicitly leverages uniquely assigned reads (UCount) versus multi-assigned reads (MCount), using default thresholds r = e = 0.05 in its 'explains' logic before the final LCA-based taxon assignment; this global reassignment is described as reducing false-positive assignments. Information on pathogenic microorganisms was generated from the MegaPath results by matching species names against our knowledge dictionary.
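
The spike-polishing rule (depth > mean + α·sd) can be illustrated with a short Python sketch. This is not MegaPath's implementation: the function name `spike_mask`, the use of a per-position depth list, and the choice of the population standard deviation are assumptions. Only the threshold formula and the default α = 30 come from the text.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def spike_mask(depths: Sequence[float], alpha: float = 30.0) -> List[bool]:
    """Flag positions whose depth exceeds mean + alpha * sd.

    Assumes `depths` holds per-position read depth along one reference
    and that the population standard deviation is the relevant 'sd'.
    """
    if not depths:
        return []
    threshold = mean(depths) + alpha * pstdev(depths)
    return [d > threshold for d in depths]
```

Alignments overlapping flagged positions would then be discarded. With α = 30 the rule is deliberately conservative: a single extreme position can only be flagged when the reference has on the order of a thousand or more positions, because a single outlier's z-score cannot exceed the square root of (n − 1).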

To annotate ARGs and VFs, clean reads were assembled using MEGAHIT^[18] with k-mer parameters set to 21, 29, 39, 59, 79, 99, 119, and 141. Sequences longer than 2 kb were retained for further analysis. The 2 kb threshold allowed for the prediction of complete or near-complete open reading frames, providing the genomic context necessary for accurate functional assignment. Potential coding sequences were predicted using Prodigal (v2.6.3, -meta)^[19]. We annotated predicted protein sequences for ARGs using DeepARG (probability threshold 0.8, which is the recommended default to balance sensitivity and precision in the classification of ARGs)^[20]. VFs were annotated using Diamond BlastP^[16] against the VFDB database (e-value threshold 1e-5, ensuring that the alignments are statistically significant). These settings were consistent with established benchmarks for metagenomic profiling in complex environmental matrices, where the signal-to-noise ratio can be compromised by the high abundance of non-target genetic material. To estimate ARG and VF abundance, clean reads were mapped to ARG- and VF-containing contigs using BWA-MEM (v2.3.2)^[22]. Reads with mapping coverage below 90% of the read length or with less than 99% identity were removed using BamM (https://github.com/Ecogenomics/BamM). The number of mapped reads was counted using BBMap^[23]. We calculated the copy number per 1,000 genomes of ARGs and VFs in each sample by normalizing to the average copy numbers of conserved single-copy genes, including COG0048, COG0049, COG0087, COG0088, COG0091, COG0093, COG0094, COG0096, COG0097, COG0099, COG0100, COG0102, COG0184, COG0186, COG0256, and COG0522.
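
The read filter and the normalization step can be sketched in Python as follows. The exact formula is not given in the text, so this sketch assumes that "copy number" means length-normalized read count (reads per base pair) and that identity is computed as matches divided by aligned length. The function names and input layout are hypothetical.

```python
from typing import Sequence, Tuple

ReadsAndLength = Tuple[int, int]  # (mapped read count, gene length in bp)


def keep_alignment(read_len: int, aligned_len: int, matches: int,
                   min_cov: float = 0.90, min_identity: float = 0.99) -> bool:
    """Keep reads with >= 90% of the read aligned and >= 99% identity."""
    if read_len <= 0 or aligned_len <= 0:
        return False
    coverage_ok = aligned_len / read_len >= min_cov
    identity_ok = matches / aligned_len >= min_identity
    return coverage_ok and identity_ok


def coverage(read_count: int, gene_length_bp: int) -> float:
    """Length-normalized read count (an assumed proxy for copy number)."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return read_count / gene_length_bp


def copies_per_1000_genomes(target: ReadsAndLength,
                            markers: Sequence[ReadsAndLength]) -> float:
    """Normalize a target gene to the mean coverage of single-copy markers."""
    if not markers:
        raise ValueError("at least one single-copy marker gene is required")
    marker_cov = sum(coverage(r, l) for r, l in markers) / len(markers)
    if marker_cov == 0:
        raise ValueError("marker genes have zero coverage")
    return 1000.0 * coverage(*target) / marker_cov
```

For example, a 1,000 bp ARG with 50 mapped reads, normalized against two 1,000 bp marker genes with 100 and 300 mapped reads, gives a value of about 250 copies per 1,000 genomes under these assumptions.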

Clinical case cross-reference
To cross-reference our environmental findings with clinical surveillance reports, we leveraged several large language models—including Gemini 3 Thinking (https://gemini.google.com/app), ChatGPT 5.4 Thinking (https://chatgpt.com/), and DeepSeek 671B (www.deepseek.com/)—to identify public health events or outbreaks associated with the dominant bacterial and viral pathogens detected in specific coastal regions during corresponding time periods. All AI-generated results and information sources were subsequently subjected to rigorous manual verification to ensure data integrity and accuracy (Supplementary Table S1).

Database construction and network implementation
In CWPD, we used a PostgreSQL database to store the knowledge dictionary data and the annotation results for pathogenic microorganisms, ARGs, and VFs in each sample. PostGIS, a PostgreSQL extension, was used to store geographical information. Axios (www.axios-js.com/docs/react-axios.html) was used to send search requests, which support exact and regular-expression matching. The website's backend was implemented in Python (www.python.org/). The web interface was built in VS Code with Vue.js (https://cn.vuejs.org/), CSS, HTML, JavaScript, and Element (https://element.eleme.cn/#/en-US). Echarts (https://echarts.apache.org/en/) was used for heatmap and scatter plot visualizations. The annotation module uses a fragmented (chunked) upload method: each file is split into slices and the number of slices is recorded, which allows an interrupted upload to resume from its breakpoint. Submitting a task triggers a file transfer to the OpenGMS model and places an asynchronous computation task in the allocation queue, which reports the task status. Upon completion of the computation, the results are stored in the PostgreSQL database.
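
The resume-from-breakpoint upload can be illustrated with a minimal in-memory Python sketch. It does not reproduce the CWPD server code: the `UploadSession` class, its method names, and the in-memory storage are assumptions. Only the idea of slicing a file, recording which slices have arrived, and resuming with the missing ones follows the description above.

```python
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split a payload into fixed-size slices (the last one may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class UploadSession:
    """Hypothetical server-side record of one chunked upload."""

    def __init__(self, total_chunks: int) -> None:
        self.total_chunks = total_chunks
        self.received: Dict[int, bytes] = {}

    def receive(self, index: int, chunk: bytes) -> None:
        """Store one slice; re-sending the same index is harmless."""
        if not 0 <= index < self.total_chunks:
            raise IndexError("chunk index out of range")
        self.received[index] = chunk

    def missing(self) -> List[int]:
        """Indices the client still has to send after an interruption."""
        return [i for i in range(self.total_chunks) if i not in self.received]

    def assemble(self) -> bytes:
        """Concatenate all slices in order once the upload is complete."""
        if self.missing():
            raise RuntimeError("upload incomplete")
        return b"".join(self.received[i] for i in range(self.total_chunks))
```

After an interruption, a client would ask the session for `missing()` and send only those slices, so data that has already been transferred is not uploaded again.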

Factors	Numbers
Samples	158
Coastal regions	6
Salinity	0.06‰–35.08‰
Temperature	1.64–31.50 °C
Pathogens	403 (1,904)
Bacteria	361 (991)
Viruses	42 (514)
Fungi	0 (399)
ARGs
ARG classes	30 (47)
ARG subtypes	287 (2,373)
VFs	10 (541)
The number in parentheses indicates the total number of entries in the database knowledge dictionary.

Feature	CWPD	DPiWE^[9]	ABPD^[10]	AquaPathogen X^[11]
Pathogen diversity	Bacteria, viruses, fungi	Bacteria	Bacteria (aquaculture)	Multi-taxa (isolates)
Functional factors	ARGs and VFs	NA	NA	Epidemiological traits
Data source type	Shotgun metagenomics	16S rRNA/isolates	eDNA metabarcoding	Individual isolates
Spatial visualization	Interactive webGIS	Static/result-based	Regional profiles	Template-based
Analysis utility	Online mNGS pipeline	Sequence alignment	Monitoring support	Surveillance tool
Temporal range	2010–2024 (global)	Not specified	One year (regional)	Isolate history
