A chromosomal-level genome assembly of the American shad: insights into phylogenetic relationships

Yuhong Zheng; Siwei Deng; Jia Du; Qinghua Liu

doi:10.48130/gcomm-0025-0005

2025 Volume 2

RESOURCE Open Access

1. Suzhou Fish Seeds Biotechnology, Ltd, Suzhou 215141, China
2. Suzhou Health-Origin Biotechnology, Ltd, Suzhou 215138, China
3. College of Science and Technology, Wenzhou-Kean University, Wenzhou 325399, China

Corresponding author: aegischina@gmail.com

Received: 21 October 2024
Revised: 25 February 2025
Accepted: 27 February 2025
Published online: 28 March 2025
Genomics Communications 2, Article number: e006 (2025)

Abstract

American shad is an important fishery species distributed in North America. It was introduced to China in 2003 and has become one of China's most expensive aquaculture species. Here we report a high-quality, chromosomal-level genome assembly generated using the PacBio sequencing platform and tissue-specific transcriptomic data. The assembled genome is 920 Mb, with a GC content of 43.19% and contig and scaffold N50 values of 1.15 Mb and 37.99 Mb, respectively. The sequences were anchored to 24 chromosomes, with an anchoring rate of 97.51%. A total of 28,120 protein-coding genes encoding 29,212 proteins were identified. A phylogenetic tree was constructed from 250 single-copy orthologous genes selected from 20,706 gene families across 18 fish species; it suggests that the American shad is most closely related to the European pilchard. Genome comparison identified 1,017 expanded and 2,573 contracted gene families, including 81 rapidly expanded and 143 rapidly contracted ones. A total of 169 unique gene families were identified, which may help explain the species' biological traits. Enrichment analysis of tandemly duplicated genes and positively selected genes showed that biological process terms related to basic life activities, such as gene expression, biosynthesis, metabolism, movement, and immunity, were enriched. These data provide valuable information for understanding the habits of the American shad.
- Genomics
- Transcriptomics
- Genome annotation
- Phylogenetic analysis
- Evolution

Supplementary information

Supplementary Table S1 Statistics of PacBio sequencing data of American shad genome.
Supplementary Table S2 Statistics of NGS data mapping to genome.
Supplementary Table S3 Summary of orthology gene numbers in phylogenetic tree construction.
Supplementary Table S4 GO enrichment of rapidly expanded and contracted genes in the American shad genome.
Supplementary Table S5 List of positively selected genes.

Rights and permissions
Copyright: © 2025 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Leggett WC, Whitney RR. 1972. Water temperature and the migrations of American shad. Fishery Bulletin 70:659−70
[2]	Limburg KE, Hattala KA, Kahnle A. 2003. American shad in its native range. American Fisheries Society Symposium 2003. pp. 125−40
[3]	Jia Y, Chen Y, Goudie CA, Simco BA, Liu Q. 2007. Potential invasion risk of the introduced American shad Alosa sapidissima to aquatic ecosystem in China. Acta Zoologica Sinica 53:625−29 doi: 10.3969/j.issn.1674-5507.2007.04.006
[4]	Liu Q, Zheng Y, Fu L, Simco BA, Goudie CA. 2021. Brood-stock management and natural spawning of American shad (Alosa sapidissima) in a recirculating aquaculture system. Aquaculture 532:735952 doi: 10.1016/j.aquaculture.2020.735952
[5]	Jia Y, Liu Q, Goudie CA, Simco BA. 2009. Survival, growth, and feed utilization of pre- and postmetamorphic American shad exposed to increasing salinity. North American Journal of Aquaculture 71:197−205 doi: 10.1577/a07-095.1
[6]	Liu ZF, Gao XQ, Yu JX, Qian XM, Xue GP, et al. 2017. Effects of different salinities on growth performance, survival, digestive enzyme activity, immune response, and muscle fatty acid composition in juvenile American shad (Alosa sapidissima). Fish Physiology and Biochemistry 43:761−73 doi: 10.1007/s10695-016-0330-3
[7]	Gao X, Hong L, Liu ZF, Guo ZL, Wang YH, et al. 2015. The definition of point of no return of larvae and feeding characteristics of Alosa sapidissima larvae and juveniles. Journal of Fisheries of China 39:392−400
[8]	Olney JE, Latour RJ, Watkins BE, Clarke DG. 2006. Migratory behavior of American shad in the York River, Virginia, with implications for estimating in-river exploitation from tag recovery data. Transactions of the American Fisheries Society 135:889−96 doi: 10.1577/t05-101.1
[9]	Mann DA, Lu Z, Popper AN. 1997. A clupeid fish can detect ultrasound. Nature 389:341 doi: 10.1038/38636
[10]	Liu QH, Jia YJ, Gao Y, Wang A, Goudie C. 2006. Biological characteristics and intensive-culture management of American shad. Fishery Modernization 1:26−27 doi: 10.3969/j.issn.1007-9580.2006.01.011
[11]	Liu QH, Jia YJ, Gao YL, Qi ZH, Wang AM. 2006. Bottlenecks and countermeasures of American shad culture (I). Scientific Fish Farming 7:5
[12]	Nack CC, Swaney DP, Limburg KE. 2019. Historical and projected changes in spawning phenologies of American shad and striped bass in the Hudson River Estuary. Marine and Coastal Fisheries 11:271−84 doi: 10.1002/mcf2.10076
[13]	Arseneau JR, Steeves R, Laflamme M. 2017. Modified low-salt CTAB extraction of high-quality DNA from contaminant-rich tissues. Molecular Ecology Resources 17:686−93 doi: 10.1111/1755-0998.12616
[14]	Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884−90 doi: 10.1093/bioinformatics/bty560
[15]	Marçais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764−70 doi: 10.1093/bioinformatics/btr011
[16]	Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, et al. 2017. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33:2202−4 doi: 10.1093/bioinformatics/btx153
[17]	Pacific Biosciences. 2021. CCS - Pacific Biosciences. https://github.com/PacificBiosciences/ccs (accessed 19 March 2021)
[18]	Cheng H, Concepcion GT, Feng X, Zhang H, Li H. 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18:170−75 doi: 10.1038/s41592-020-01056-5
[19]	Durand NC, Shamim MS, Machol I, Rao SSP, Huntley MH, et al. 2016. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems 3:95−98 doi: 10.1016/j.cels.2016.07.002
[20]	Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, et al. 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356:92−95 doi: 10.1126/science.aal3327
[21]	Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25:1754−60 doi: 10.1093/bioinformatics/btp324
[22]	Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, et al. 2016. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Systems 3:99−101 doi: 10.1016/j.cels.2015.07.012
[23]	Ou S, Su W, Liao Y, Chougule K, Agda JRA, et al. 2019. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biology 20:275 doi: 10.1186/s13059-019-1905-y
[24]	Kim D, Langmead B, Salzberg SL. 2015. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12:357−60 doi: 10.1038/nmeth.3317
[25]	Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, et al. 2015. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33:290−95 doi: 10.1038/nbt.3122
[26]	Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, et al. 2008. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research 18:188−96 doi: 10.1101/gr.6743907
[27]	Stanke M, Keller O, Gunduz I, Hayes A, Waack S, et al. 2006. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research 34:W435−W439 doi: 10.1093/nar/gkl200
[28]	Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. 2005. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research 33:6494−506 doi: 10.1093/nar/gki937
[29]	Hoff KJ, Lomsadze A, Borodovsky M, Stanke M. 2019. Whole-genome annotation with BRAKER. In Gene Prediction. Methods in Molecular Biology, ed. Kollmar M. New York: Humana. pp. 65−95. doi: 10.1007/978-1-4939-9173-0_5
[30]	Stanke M, Schöffmann O, Morgenstern B, Waack S. 2006. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7:62 doi: 10.1186/1471-2105-7-62
[31]	Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. 2016. BRAKER1: unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32:767−69 doi: 10.1093/bioinformatics/btv661
[32]	Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637−44 doi: 10.1093/bioinformatics/btn013
[33]	Korf I. 2004. Gene finding in novel genomes. BMC Bioinformatics 5:59 doi: 10.1186/1471-2105-5-59
[34]	Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210−12 doi: 10.1093/bioinformatics/btv351
[35]	The UniProt Consortium. 2023. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Research 51:D523−D531 doi: 10.1093/nar/gkac1052
[36]	Pruitt KD, Tatusova T, Maglott DR. 2007. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research 35:D61−D65 doi: 10.1093/nar/gkl842
[37]	Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, et al. 2019. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research 47:D309−D314 doi: 10.1093/nar/gky1085
[38]	Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, et al. 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41 doi: 10.1186/1471-2105-4-41
[39]	Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12:59−60 doi: 10.1038/nmeth.3176
[40]	Jones P, Binns D, Chang HY, Fraser M, Li W, et al. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236−40 doi: 10.1093/bioinformatics/btu031
[41]	Corpet F, Gouzy J, Kahn D. 1998. The ProDom database of protein domain families. Nucleic Acids Research 26:323−26 doi: 10.1093/nar/26.1.323
[42]	Attwood TK. 2002. The PRINTS database: a resource for identification of protein families. Briefings in Bioinformatics 3:252−63 doi: 10.1093/bib/3.3.252
[43]	Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, et al. 2014. Pfam: the protein families database. Nucleic Acids Research 42:D222−30 doi: 10.1093/nar/gkt1223
[44]	Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, et al. 2004. SMART 4.0: towards genomic data integration. Nucleic Acids Research 32:D142−D144 doi: 10.1093/nar/gkh088
[45]	Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, et al. 2003. PANTHER: a library of protein families and subfamilies indexed by function. Genome Research 13:2129−41 doi: 10.1101/gr.772403
[46]	Sigrist CJA, Cerutti L, Hulo N, Gattiker A, Falquet L, et al. 2002. PROSITE: a documented database using patterns and profiles as motif descriptors. Briefings in Bioinformatics 3:265−74 doi: 10.1093/bib/3.3.265
[47]	Lagesen K, Hallin P, Rødland EA, Staerfeldt HH, Rognes T, et al. 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Research 35:3100−8 doi: 10.1093/nar/gkm160
[48]	Nawrocki EP, Eddy SR. 2013. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29:2933−35 doi: 10.1093/bioinformatics/btt509
[49]	Kalvari I, Argasinska J, Quinones-Olvera N, Nawrocki EP, Rivas E, et al. 2018. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Research 46:D335−42 doi: 10.1093/nar/gkx1038
[50]	Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research 25:955−64 doi: 10.1093/nar/25.5.955
[51]	Phytozome. (n.d.). Phytozome v12.1. https://phytozome.jgi.doe.gov (accessed 20 March 2021)
[52]	NCBI Resource Coordinators. 2018. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 46:D8−D13 doi: 10.1093/nar/gkx1095
[53]	Sneddon TP, Li P, Edmunds SC. 2012. GigaDB: announcing the GigaScience database. GigaScience 1:11 doi: 10.1186/2047-217X-1-11
[54]	Emms DM, Kelly S. 2019. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20:238 doi: 10.1186/s13059-019-1832-y
[55]	Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution 30:772−80 doi: 10.1093/molbev/mst010
[56]	Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972−73 doi: 10.1093/bioinformatics/btp348
[57]	Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312−13 doi: 10.1093/bioinformatics/btu033
[58]	Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24:1586−91 doi: 10.1093/molbev/msm088
[59]	Tang H, Bowers JE, Wang X, Ming R, Alam M, et al. 2008. Synteny and collinearity in plant genomes. Science 320:486−88 doi: 10.1126/science.1153917
[60]	Tang H, Krishnakumar V, Zeng X, Xu Z, Taranto A, et al. 2024. JCVI: A versatile toolkit for comparative genomics analysis. iMeta 3:e211 doi: 10.1002/imt2.211
[61]	Wang Y, Tang H, Debarry JD, Tan X, Li J, et al. 2012. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Research 40:e49 doi: 10.1093/nar/gkr1293
[62]	De Bie T, Cristianini N, Demuth JP, Hahn MW. 2006. CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22:1269−71 doi: 10.1093/bioinformatics/btl097
[63]	Yu G, Wang LG, Han Y, He QY. 2012. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16:284−87 doi: 10.1089/omi.2011.0118
[64]	Suyama M, Torrents D, Bork P. 2006. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Research 34:W609−W612 doi: 10.1093/nar/gkl315
[65]	Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32:268−74 doi: 10.1093/molbev/msu300
[66]	Xu L, Dong Z, Fang L, Luo Y, Wei Z, et al. 2019. OrthoVenn2: a web server for whole-genome comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Research 47:W52−W58 doi: 10.1093/nar/gkz333
[67]	í Kongsstovu S, Dahl HA, Gislason H, Homrum E, Jacobsen JA, et al. 2020. Identification of male heterogametic sex-determining regions on the Atlantic herring Clupea harengus genome. Journal of Fish Biology 97:190−201 doi: 10.1111/jfb.14349
[68]	Louro B, De Moro G, Garcia C, Cox CJ, Veríssimo A, et al. 2019. A haplotype-resolved draft genome of the European sardine (Sardina pilchardus). GigaScience 8:giz059 doi: 10.1093/gigascience/giz059
[69]	Mohindra V, Dangi T, Tripathi RK, Kumar R, Singh RK, et al. 2019. Draft genome assembly of Tenualosa ilisha, Hilsa shad, provides resource for osmoregulation studies. Scientific Reports 9:16511 doi: 10.1038/s41598-019-52603-w
[70]	Leonard JBK, Norieka JF, Kynard B, McCormick SD. 1999. Metabolic rates in an anadromous clupeid, the American shad (Alosa sapidissima). Journal of Comparative Physiology B 169:287−95 doi: 10.1007/s003600050223
[71]	Liu QH, Zheng YH, Meng H, Wang WJ. 2017. The breeding risks and countermeasures of American shad. Scientific Fish Farming 11:1−3
[72]	Barry T, Kynard B. 1986. Attraction of adult American shad to fish lifts at Holyoke Dam, Connecticut River. North American Journal of Fisheries Management 6:233−41 doi: 10.1577/1548-8659(1986)6<233:AOAAST>2.0.CO;2

About this article

Cite this article

Zheng Y, Deng S, Du J, Liu Q. 2025. A chromosomal-level genome assembly of the American shad: insights into phylogenetic relationships. Genomics Communications 2: e006 doi: 10.48130/gcomm-0025-0005

Figures(4) / Tables(4)

Introduction

American shad (Alosa sapidissima, family Clupeidae) is a eurythermal, anadromous fish species^[1] that is widely distributed in the Pacific and Atlantic basins of North America^[2]. It was introduced into China for aquaculture in the early 21st century^[3] and has been successfully cultured in indoor freshwater systems. At present, the American shad is one of the most expensive aquaculture species in the Chinese fish market^[4]. American shad is similar to the Chinese shad (Tenualosa reevesii) in morphology, ecological habits, digestive system, meat quality, and nutritional value. The two species belong to the same subfamily (Alosinae) but to different genera.

To improve productivity, the biological characteristics of the American shad have been studied, including reproductive features^[3,4], salinity adaptation^[5,6], and larval development^[7]. The species is stenothermal, surviving only within a narrow temperature range^[1], and is extremely sensitive to environmental changes. For example, American shad can detect high-frequency ultrasonic signals, such as 180 kHz, that most fish cannot, and these signals cause a stress response^[8,9]. They also have small gills and must swim constantly and rapidly to obtain oxygen, so they require high dissolved oxygen concentrations in the water. Together, these properties mean that the American shad must maintain a high basal metabolic rate^[10,11]. In addition, they are sensitive to salinity, which affects their adaptability to temperature^[5,6,12]. This combination of biological traits makes the species very difficult to culture. Investigating its genome and transcriptome is therefore important for understanding the species' genetic background.

This study presents a high-quality, chromosomal-level genome assembly of American shad generated using the PacBio platform. Six tissues (heart, liver, intestine, gill, muscle, and testis) from the same individual were used for RNA-Seq, and these comprehensive transcriptomic data were used to predict genes. The high-quality, chromosomal-level genome and transcriptome of the American shad will provide a valuable and systematic resource for further studies of its biology, evolution, germplasm conservation, and genetic selection.

Discussion

This study presents a high-quality genome assembly (920 Mb, 24 chromosomes) of the American shad (A. sapidissima). Consistent with previous studies, the analysis suggests that the American shad is most closely related to the European pilchard (S. pilchardus), and the two have similar genome sizes, although they belong to different subfamilies^[67,68]. By contrast, genome sizes differ more among species in the same subfamily as the American shad; for example, the genome of the hilsa shad (Tenualosa ilisha) is 762.5 Mb^[69].

Interestingly, many genes related to metabolic processes, especially energy-supply molecules and pathways, were significantly enriched among the positively selected and/or tandemly duplicated genes, with terms such as lipid catabolic process, lipoprotein metabolic process, and L-arabinose metabolic process. Movement-related genes were also enriched. These results are concordant with the biological characteristics of the American shad. For example, the American shad swims at high speed day and night and often overreacts to noise, lightning, and other environmental stimuli (such as low dissolved oxygen), all of which demand high metabolic rates and energy supply^[4,10,70]. Enrichment of GO molecular function (MF) terms suggested that nervous system development may be enhanced and redox reactions may be stronger in American shad, which may explain some of the species' unique characteristics in aquaculture systems. The species is very sensitive to environmental disturbances, such as sudden noise and flashing lights, which induce high-speed, frantic, disorderly swimming; the fish then collide with one another in the culture system, causing serious injuries and even death^[10,71]. It also often overreacts to handling and netting, which readily results in scale loss, physiological stress, and death by asphyxiation^[72]. The American shad is therefore considered one of the most sensitive species, and its high metabolic rate makes culture management difficult; it needs special care to be domesticated successfully in an aquaculture system.

Conclusions

This study presents a high-quality, chromosomal-level genome assembly of American shad generated using the PacBio platform and tissue-specific transcriptomic data. Comparative genome analysis indicated that the American shad is most closely related to the European pilchard. GO enrichment analysis of tandemly duplicated and positively selected genes showed that enriched biological process terms were related to basic life activities such as gene expression, biosynthesis, metabolism, and maintenance of cell homeostasis, and especially to motion and immunity, which agrees with the physiological characteristics of the American shad.

Item	Assembly statistics
Total length (bp)	920,245,501
Number of contigs	2,302
Number of scaffolds	274
Contig N50 (bp)	1,152,427
Scaffold N50 (bp)	37,994,265
Maximum contig length (bp)	7,309,698
GC content (%)	43.19
Completely retrieved single-copy orthologs (%)	93.8
Complete and single-copy BUSCOs (%)	89.60
Complete and duplicated BUSCOs (%)	4.20
Partially retrieved single-copy orthologs (%)	2.5
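
The contig and scaffold N50 values and the GC content above are standard assembly summaries. As a minimal sketch (not the authors' pipeline), N50 can be computed by sorting sequence lengths in descending order and returning the length at which the cumulative sum first reaches half of the total length. GC content here is assumed to be the share of G and C among unambiguous A/C/G/T bases, with N and other ambiguity codes excluded. The function names `n50` and `gc_content` and the toy inputs are illustrative only.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total length."""
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half_total:
            return length
    # Not reached for non-negative lengths; kept so every path returns an int.
    return sorted_lengths[-1]


def gc_content(sequences: Iterable[str]) -> float:
    """Return G + C as a percentage of A/C/G/T bases.

    Assumption: N and other ambiguity codes are excluded from the denominator.
    """
    gc = acgt = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # toy lengths, not shad data
    print(n50(toy_contigs))  # 70
    print(round(gc_content(["ACGTNN", "GGCC"]), 2))  # 75.0
```

Applied to contig lengths, `n50` gives a contig N50; applied to scaffold lengths, a scaffold N50. Conventions for handling gaps (N runs) can differ between tools, so a reimplementation may differ slightly from reported values.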

Class	Count	Length (bp)	Percentage
DIRS	710	242,765	0.03%
DNA	−	−	−
DTA	508,133	116,314,889	12.64%
DTC	225,315	31,080,936	3.38%
DTH	26,565	5,057,458	0.55%
DTM	138,994	22,287,764	2.42%
DTT	7,397	1,097,116	0.12%
Helitron	79,770	11,666,059	1.26%
LINE	3,013	1,943,108	0.21%
LTR	−	−	−
Copia	2,494	714,658	0.08%
Gypsy	159,707	50,775,360	5.52%
Unknown	302,822	65,763,801	7.15%
MITE	−	−	−
DTA	46,637	5,802,613	0.63%
DTC	5,594	551,629	0.06%
DTH	4,346	371,579	0.04%
DTM	92,139	14,049,594	1.53%
DTT	418	31,172	0.00%
Maverick	77	99,461	0.01%
Penelope	830	867,362	0.09%
TIR	−	−	−
Tc1-Mariner	470	99,441	0.01%
Unknown	123,728	26,970,622	2.93%
Total interspersed	1,729,159	355,787,387	38.66%
Low complexity	74,138	6,172,731	0.67%
Simple repeat	866,409	51,709,171	5.62%
Total	2,669,706	413,669,289	44.95%
DIRS, Dictyostelium intermediate repeat sequence; DTA, hAT superfamily; DTC, CACTA superfamily; DTH, PIF-Harbinger superfamily; DTM, Mutator superfamily; DTT, Tc1-Mariner superfamily; LINE, long interspersed nuclear elements; LTR, long terminal repeat; MITE, miniature inverted-repeat transposable elements; and TIR, terminal inverted repeats.
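
The Percentage column appears to be each repeat class's masked length divided by the total assembly length (920,245,501 bp). The short sketch below recomputes a few rows under that assumption; the dictionary and the helper `percent_of_genome` are illustrative names, not part of the authors' workflow.

```python
# Total assembly length (bp) from the assembly statistics table.
ASSEMBLY_LENGTH_BP = 920_245_501

# Selected rows from the repeat table: class -> masked length (bp).
REPEAT_LENGTHS_BP = {
    "DTA (hAT, DNA)": 116_314_889,
    "Gypsy": 50_775_360,
    "Total interspersed": 355_787_387,
    "Simple repeat": 51_709_171,
    "Total": 413_669_289,
}


def percent_of_genome(length_bp: int, genome_bp: int = ASSEMBLY_LENGTH_BP) -> float:
    """Return the share of the assembly covered by a repeat class, in percent."""
    return 100.0 * length_bp / genome_bp


for repeat_class, length_bp in REPEAT_LENGTHS_BP.items():
    # Two decimals, matching the table's Percentage column.
    print(f"{repeat_class}: {percent_of_genome(length_bp):.2f}%")
```

The printed values (12.64%, 5.52%, 38.66%, 5.62%, and 44.95%) match the corresponding rows, which supports the assumed denominator.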

Chromosome	Size (Mb)	Gap no.	GC%	Gene	Protein	tRNA	miRNA	snRNA	snoRNA	lncRNA
Chr1	54.79	70	43.00	1,841	1,927	108	19	412	36	97
Chr2	44.56	97	43.03	1,252	1,314	35	18	370	5	69
Chr3	42.16	90	42.79	1,212	1,246	63	8	5	24	50
Chr4	41.48	104	42.50	1,396	1,444	39	8	2	12	72
Chr5	41.39	59	43.14	1,477	1,527	66	14	5	28	90
Chr6	40.62	77	43.62	1,246	1,315	31	29	3	11	79
Chr7	40.49	73	42.80	1,369	1,423	1,108	23	89	15	90
Chr8	39.44	93	43.28	1,178	1,241	58	9	3	7	73
Chr9	39.36	54	42.85	1,258	1,301	89	16	284	9	95
Chr10	38.18	77	43.27	1,199	1,246	134	8	4	27	83
Chr11	37.99	73	43.11	1,240	1,299	65	18	7	19	86
Chr12	37.72	74	43.28	1,297	1,354	2,247	12	7	6	71
Chr13	37.34	80	43.05	1,143	1,189	89	13	9	33	81
Chr14	36.50	56	43.05	1,028	1,055	1,886	10	9	22	65
Chr15	35.68	108	43.60	1,054	1,096	1,111	9	2	2	73
Chr16	35.61	65	43.05	949	986	31	5	1	3	65
Chr17	34.99	96	43.28	1,073	1,106	102	15	1	9	68
Chr18	34.23	127	43.25	962	1,006	69	548	4	132	45
Chr19	31.23	117	43.33	858	873	99	10	214	3	50
Chr20	31.03	74	43.21	850	877	79	13	6	12	45
Chr21	30.88	101	43.27	994	1,027	33	8	4	15	59
Chr22	30.85	85	43.30	1,014	1,042	41	8	2	15	64
Chr23	30.44	76	43.47	1,021	1,085	89	3	2	12	81
Chr24	30.28	102	43.14	1,055	1,079	66	10	6	11	56
Unplaced	22.88	0	45.29	154	154	590	750	1,642	0	3
Total	920.24	2,028	43.19	28,120	29,212	8,328	1,584	3,093	468	1,710
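
The 97.51% anchoring (attachment) rate in the Abstract can be recovered from the Size column, assuming it is the fraction of assembled sequence placed on Chr1 to Chr24. Because the per-chromosome sizes are rounded to 0.01 Mb, their sum plus the unplaced length (about 920.12 Mb) differs slightly from the reported 920.24 Mb total, but the rate rounds to the same value. The function name `anchoring_rate` is illustrative.

```python
# Chromosome sizes (Mb) for Chr1-Chr24, copied from the table above.
CHROMOSOME_SIZES_MB = [
    54.79, 44.56, 42.16, 41.48, 41.39, 40.62, 40.49, 39.44,
    39.36, 38.18, 37.99, 37.72, 37.34, 36.50, 35.68, 35.61,
    34.99, 34.23, 31.23, 31.03, 30.88, 30.85, 30.44, 30.28,
]
UNPLACED_MB = 22.88


def anchoring_rate(chromosome_sizes_mb: list[float], unplaced_mb: float) -> float:
    """Return the percentage of assembled sequence placed on chromosomes."""
    anchored = sum(chromosome_sizes_mb)
    return 100.0 * anchored / (anchored + unplaced_mb)


rate = anchoring_rate(CHROMOSOME_SIZES_MB, UNPLACED_MB)
print(f"{len(CHROMOSOME_SIZES_MB)} chromosomes, {sum(CHROMOSOME_SIZES_MB):.2f} Mb anchored")
print(f"Anchoring rate: {rate:.2f}%")
```

Using the table totals directly, (920.24 − 22.88) / 920.24, gives the same 97.51% after rounding.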

Item	Statistics
Number of genes	479,036
Number of genes in orthogroups	453,950
Number of unassigned genes	25,086
Percentage of genes in orthogroups	94.8%
Percentage of unassigned genes	5.2%
Number of orthogroups	20,706
Number of species-specific orthogroups	286
Number of genes in species-specific orthogroups	1,204
Percentage of genes in species-specific orthogroups	0.3%
Mean orthogroup size	21.9
Median orthogroup size	19
G50 (assigned genes)	26
G50 (all genes)	23
O50 (assigned genes)	5,190
O50 (all genes)	5,701
Number of orthogroups with all species present	7,489
Number of single-copy orthogroups	250
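
The percentage rows and the mean orthogroup size in the orthology table follow from its count rows. The sketch below assumes, consistent with the reported values, that percentages are taken over all 479,036 genes and that the mean orthogroup size is the number of genes in orthogroups divided by the number of orthogroups; the constant names and the `pct` helper are illustrative.

```python
# Counts copied from the orthology summary table.
GENES_TOTAL = 479_036
GENES_IN_ORTHOGROUPS = 453_950
GENES_UNASSIGNED = 25_086
GENES_IN_SPECIES_SPECIFIC = 1_204
N_ORTHOGROUPS = 20_706


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Every gene is either assigned to an orthogroup or left unassigned.
assert GENES_IN_ORTHOGROUPS + GENES_UNASSIGNED == GENES_TOTAL

derived = {
    "Percentage of genes in orthogroups": round(pct(GENES_IN_ORTHOGROUPS, GENES_TOTAL), 1),
    "Percentage of unassigned genes": round(pct(GENES_UNASSIGNED, GENES_TOTAL), 1),
    "Percentage of genes in species-specific orthogroups": round(
        pct(GENES_IN_SPECIES_SPECIFIC, GENES_TOTAL), 1
    ),
    # Assumed definition: assigned genes per orthogroup.
    "Mean orthogroup size": round(GENES_IN_ORTHOGROUPS / N_ORTHOGROUPS, 1),
}

for label, value in derived.items():
    print(f"{label}: {value}")
```

The derived values (94.8%, 5.2%, 0.3%, and 21.9) agree with the table.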
