Comprehensive identification and characterization of simple sequence repeats based on the whole-genome sequences of 14 forest and fruit trees

Xiaoming Song; Nan Li; Yuanyuan Guo; Yun Bai; Tong Wu; Tong Yu; Shuyan Feng; Yu Zhang; Zhiyuan Wang; Zhuo Liu; Hao Lin; Xiaoming Song; Nan Li; Yuanyuan Guo; Yun Bai; Tong Wu; Tong Yu; Shuyan Feng; Yu Zhang; Zhiyuan Wang; Zhuo Liu; Hao Lin

doi:10.48130/FR-2021-0007

2021 Volume 1

Article Contents

Next Previous

ARTICLE Open Access

Comprehensive identification and characterization of simple sequence repeats based on the whole-genome sequences of 14 forest and fruit trees

1.
School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
2.
School of Life Sciences/School of Economics, North China University of Science and Technology, Tangshan, Hebei 063210, China
3.
Food Science and Technology Department, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
These authors contributed equally: Xiaoming Song, Nan Li

More Information

Corresponding authors: songxm@ncst.edu.cn; hlin@uestc.edu.cn

Received Date: 29 December 2020
Accepted Date: 13 April 2021
Published Online: 21 April 2021
Forestry Research 1, Article number: 7 (2021) | Cite this article

Abstract

Simple sequence repeats (SSRs) are popular and important molecular markers that exist widely in plants. Here, we conducted a comprehensive identification and comparative analysis of SSRs in 14 tree species. A total of 16, 298 SSRs were identified from 429, 449 genes, and primers were successfully designed for 99.44% of the identified SSRs. Our analysis indicated that tri-nucleotide SSRs were the most abundant, with an average of ~834 per species. Functional enrichment analysis by combining SSR-containing genes in all species, revealed 50 significantly enriched terms, with most belonging to transcription factor families associated with plant development and abiotic stresses such as Myeloblastosis_DNA-bind_4 (Myb_DNA-bind_4), APETALA2 (AP2), and Fantastic Four meristem regulator (FAF). Further functional enrichment analysis showed that 48 terms related to abiotic stress regulation and floral development were significantly enriched in ten species, whereas no significantly enriched terms were found in four species. Interestingly, the largest number of enriched terms was detected in Citrus sinensis (L.) Osbeck, accounting for 54.17% of all significantly enriched functional terms. Finally, we analyzed AP2 and trihelix gene families (Myb_DNA-bind_4) due to their significant enrichment in SSR-containing genes. The results indicated that whole-genome duplication (WGD) and whole genome triplication (WGT) might have played major roles in the expansion of the AP2 gene family but only slightly affected the expansion of the trihelix gene family during evolution. In conclusion, the identification and comprehensive characterization of SSR markers will greatly facilitate future comparative genomics and functional genomics studies.
- Simple sequence repeat,
- Comparative analysis,
- Functional enrichment analysis,
- Trihelix gene family,
- AP2 gene family

Supplementary Information

Supplemental Tables S1 The nine types of simple sequence repeats (SSRs) identified in 14 species used in this study.
Supplemental Table S2 The detailed information of simple sequence repeats (SSRs) identified in 14 species.
Supplemental Table S3 The statistics of pfam annotation for whole-genome genes and SSR related genes in each of 14 species.
Supplemental Table S4 The enrichment analysis of pfam annotation for combing whole-genome genes and SSR related genes of all 14 species. The significant enriched terms were defined with the q-value < 0.01 and fold-change >=2.
Supplemental Table S5 The enrichment analysis of pfam annotation for whole-genome genes and SSR related genes in each of 14 species. The significant enriched terms were defined with the q-value < 0.01 and fold-change >=2.
Supplemental Table S6 The AP2 transcription factor (PF00847) and related SSR information in 14 species.
Supplemental Table S7 The Myb transcription factor (Myb_DNA-bind_4) and related SSR information in 14 species.
Supplemental Dataset 1 The information of SSRs and corresponding primer sequences of 14 species.
Supplemental Fig. S1 The phylogenetic analysis of AP2 gene family in 14 species. The maximum-likelihood (ML) trees were generated based on the amino acid sequences of AP2 gene family. The phylogenetic tree was constructed using the FastTree software with 1000 bootstrap repeats in 14 species. The bootstrap values above 40% were shown on each branch by the different size of circle dots.
Supplemental Fig. S2 The phylogenetic analysis of Myb4 gene family in 14 species. The maximum-likelihood (ML) trees were generated based on the amino acid sequences of Myb_DNA-bind_4 gene family. The phylogenetic tree was constructed using the FastTree software with 1000 bootstrap repeats in 14 species. The bootstrap values above 40% were shown on each branch by the different size of circle dots.

Rights and permissions
Copyright: © 2021 by the author(s). Exclusive Licensee Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Garrido-Cardenas JA, Mesa-Valle C, Manzano-Agugliaro F. 2018. Trends in plant research using molecular markers. Planta 247:543−57 doi: 10.1007/s00425-017-2829-y CrossRef Google Scholar
[2]	Kalia RK, Rai MK, Kalia S, Singh R, Dhawan AK. 2011. Microsatellite markers: an overview of the recent progress in plants. Euphytica 177:309−34 doi: 10.1007/s10681-010-0286-9 CrossRef Google Scholar
[3]	Varshney RK, Graner A, Sorrells ME. 2005. Genic microsatellite markers in plants: features and applications. Trends in Biotechnology 23:48−55 doi: 10.1016/j.tibtech.2004.11.005 CrossRef Google Scholar
[4]	Vieira ML, Santini L, Diniz AL, Munhoz Cde F. 2016. Microsatellite markers: what they mean and why they are so useful. Genetics and Molecular Biology 39:312−28 doi: 10.1590/1678-4685-GMB-2016-0027 CrossRef Google Scholar
[5]	Ranade SS, Lin YC, Van de Peer Y, García-Gil MR. 2015. Comparative in silico analysis of SSRs in coding regions of high confidence predicted genes in Norway spruce (Picea abies) and Loblolly pine (Pinus taeda). BMC Genetics 16:149 https://bmcgenomdata.biomedcentral.com/articles/10.1186/s12863-015-0304-y
[6]	Zane L, Bargelloni L, Patarnello T. 2002. Strategies for microsatellite isolation: a review. Molecular ecology 11:1−16 doi: 10.1046/j.0962-1083.2001.01418.x CrossRef Google Scholar
[7]	Mayer C, Leese F, Tollrian R. 2010. Genome-wide analysis of tandem repeats in Daphnia pulex - a comparative approach. BMC Genomics 11:277 doi: 10.1186/1471-2164-11-277 CrossRef Google Scholar
[8]	da Maia LC, de Souza VQ, Kopp MM, de Carvalho FI, de Oliveira AC. 2009. Tandem repeat distribution of gene transcripts in three plant families. Genetics and Molecular Biology 32:822−33 doi: 10.1590/S1415-47572009005000091 CrossRef Google Scholar
[9]	von Stackelberg M, Rensing SA, Reski R. 2006. Identification of genic moss SSR markers and a comparative analysis of twenty-four algal and plant gene indices reveal species-specific rather than group-specific characteristics of microsatellites. BMC Plant Biology 6:9 doi: 10.1186/1471-2229-6-9 CrossRef Google Scholar
[10]	Xu J, Liu L, Xu Y, Chen C, Rong T, et al. 2013. Development and characterization of simple sequence repeat markers providing genome-wide coverage and high resolution in maize. DNA Research 20:497−509 doi: 10.1093/dnares/dst026 CrossRef Google Scholar
[11]	Zhang L, Yuan D, Yu S, Li Z, Cao Y, et al. 2004. Preference of simple sequence repeats in coding and non-coding regions of Arabidopsis thaliana. Bioinformatics 20:1081−6 doi: 10.1093/bioinformatics/bth043 CrossRef Google Scholar
[12]	Legendre M, Pochet N, Pak T, Verstrepen KJ. 2007. Sequence-based estimation of minisatellite and microsatellite repeat variability. Genome Research 17:1787−96 doi: 10.1101/gr.6554007 CrossRef Google Scholar
[13]	Metzgar D, Liu L, Hansen C, Dybvig K, Wills C. 2002. Domain-level differences in microsatellite distribution and content result from different relative rates of insertion and deletion mutations. Genome Research 12:408−13 doi: 10.1101/gr.198602 CrossRef Google Scholar
[14]	Li Y, Korol AB, Fahima T, Beiles A, Nevo E. 2002. Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Molecular Ecology 11:2453−65 doi: 10.1046/j.1365-294X.2002.01643.x CrossRef Google Scholar
[15]	Nevo E. 2001. Evolution of genome-phenome diversity under environmental stress. PNAS 98:6233−40 doi: 10.1073/pnas.101109298 CrossRef Google Scholar
[16]	Gao C, Ren X, Mason AS, Li J, Wang W, et al. 2013. Revisiting an important component of plant genomes: microsatellites. Functional Plant Biology 40:645−61 doi: 10.1071/FP12325 CrossRef Google Scholar
[17]	Zalapa JE, Cuevas H, Zhu H, Steffan S, Senalik D, et al. 2012. Using next-generation sequencing approaches to isolate simple sequence repeat (SSR) loci in the plant sciences. American Journal of Botany 99:193−208 doi: 10.3732/ajb.1100394 CrossRef Google Scholar
[18]	Taheri S, Lee Abdullah T, Yusop MR, Hanafi MM, Sahebi M, et al. 2018. Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants. Molecules 23:399 doi: 10.3390/molecules23020399 CrossRef Google Scholar
[19]	Šarhanová P, Pfanzelt S, Brandt R, Himmelbach A, Blattner FR. 2018. SSR-seq: Genotyping of microsatellites using next-generation sequencing reveals higher level of polymorphism as compared to traditional fragment size scoring. Ecology and Evolution 8:10817−33 doi: 10.1002/ece3.4533 CrossRef Google Scholar
[20]	Biswas MK, Xu Q, Mayer C, Deng X. 2014. Genome wide characterization of short tandem repeat markers in sweet orange (Citrus sinensis). PLoS One 9:e104182 doi: 10.1371/journal.pone.0104182 CrossRef Google Scholar
[21]	Liang M, Yang X, Li H, Su S, Yi H, et al. 2015. De novo transcriptome assembly of pummelo and molecular marker development. PLoS One 10:e0120615 doi: 10.1371/journal.pone.0120615 CrossRef Google Scholar
[22]	Jia H, Yang H, Sun P, Li J, Zhang J, et al. 2016. De novo transcriptome assembly, development of EST-SSR markers and population genetic analyses for the desert biomass willow, Salix psammophila. Scientific Reports 6:39591 doi: 10.1038/srep39591 CrossRef Google Scholar
[23]	Tian W, Paudel D, Vendrame W, Wang J. 2017. Enriching Genomic Resources and Marker Development from Transcript Sequences of Jatropha curcas for Microgravity Studies. International Journal of Genomics 2017:8614160 doi: 10.1155/2017/8614160 CrossRef Google Scholar
[24]	El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, et al. 2019. The Pfam protein families database in 2019. Nucleic Acids Research 47:D427−D432 doi: 10.1093/nar/gky995 CrossRef Google Scholar
[25]	Nagano Y. 2000. Several features of the GT-factor trihelix domain resemble those of the Myb DNA-binding domain. Plant Physiology 124:491−4 doi: 10.1104/pp.124.2.491 CrossRef Google Scholar
[26]	Song X, Li Y, Hou X. 2013. Genome-wide analysis of the AP2/ERF transcription factor superfamily in Chinese cabbage (Brassica rapa ssp. pekinensis). BMC Genomics 14:573 doi: 10.1186/1471-2164-14-573 CrossRef Google Scholar
[27]	Liu M, Sun W, Ma Z, Zheng T, Huang L, et al. 2019. Genome-wide investigation of the AP2/ERF gene family in tartary buckwheat (Fagopyum Tataricum). BMC Plant Biology 19:84 doi: 10.1186/s12870-019-1681-6 CrossRef Google Scholar
[28]	Zhao Y, Ma R, Xu D, Bi H, Xia Z, Peng H. 2019. Genome-Wide Identification and Analysis of the AP2 Transcription Factor Gene Family in Wheat (Triticum aestivum L.). Frontiers In Plant Science 10:1286 doi: 10.3389/fpls.2019.01286 CrossRef Google Scholar
[29]	Aukerman MJ, Sakai H. 2003. Regulation of flowering time and floral organ identity by a MicroRNA and its APETALA2-like target genes. Plant Cell 15:2730−41 doi: 10.1105/tpc.016238 CrossRef Google Scholar
[30]	Yamamoto T, Terakami S. 2016. Genomics of pear and other Rosaceae fruit trees. Breeding Science 66:148−59 doi: 10.1270/jsbbs.66.148 CrossRef Google Scholar
[31]	Carrasco B, Meisel L, Gebauer M, Garcia-Gonzales R, Silva H. 2013. Breeding in peach, cherry and plum: from a tissue culture, genetic, transcriptomic and genomic perspective. Biological Research 46:219−30 doi: 10.4067/S0716-97602013000300001 CrossRef Google Scholar
[32]	Sumathi M, Yasodha R. 2014. Microsatellite resources of Eucalyptus: current status and future perspectives. Botanical Studies 55:73 doi: 10.1186/s40529-014-0073-3 CrossRef Google Scholar
[33]	Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, et al. 2000. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290:2105−10 doi: 10.1126/science.290.5499.2105 CrossRef Google Scholar
[34]	Song X, Wang J, Sun P, Ma X, Yang Q, et al. 2020. Preferential gene retention increases the robustness of cold regulation in Brassicaceae and other plants after polyploidization. Horticulture Research 7:20 doi: 10.1038/s41438-020-0253-0 CrossRef Google Scholar
[35]	Song X, Nie F, Chen W, Ma X, Gong K, et al. 2020. Coriander Genomics Database: a genomic, transcriptomic, and metabolic database for coriander. Horticulture Research 7:55 doi: 10.1038/s41438-020-0261-0 CrossRef Google Scholar
[36]	Song X, Liu G, Duan W, Liu T, Huang Z, et al. 2014. Genome-wide identification, classification and expression analysis of the heat shock transcription factor family in Chinese cabbage. Molecular Genetics and Genomics 289:541−51 doi: 10.1007/s00438-014-0833-5 CrossRef Google Scholar
[37]	Agarwal PK, Agarwal P, Reddy MK, Sopory SK. 2006. Role of DREB transcription factors in abiotic and biotic stress tolerance in plants. Plant Cell Reports 25:1263−74 doi: 10.1007/s00299-006-0204-8 CrossRef Google Scholar
[38]	Baillo EH, Kimotho RN, Zhang Z, Xu P. 2019. Transcription Factors Associated with Abiotic and Biotic Stress Tolerance and Their Potential for Crops Improvement. Genes (Basel) 10:771 doi: 10.3390/genes10100771 CrossRef Google Scholar
[39]	Song X, Sun P, Yuan J, Gong K, Li N, et al. 2020. The celery genome sequence reveals sequential paleo-polyploidizations, karyotype evolution and resistance gene reduction in apiales. Plant Biotechnology Journal doi: 10.1111/pbi.13499 CrossRef Google Scholar
[40]	Kaplan-Levy RN, Brewer PB, Quon T, Smyth DR. 2012. The trihelix family of transcription factors − light, stress and development. Trends in Plant Science 17:163−71 doi: 10.1016/j.tplants.2011.12.002 CrossRef Google Scholar
[41]	Nagano Y, Inaba T, Furuhashi H, Sasaki Y. 2001. Trihelix DNA-binding protein with specificities for two distinct cis-elements: both important for light down-regulated and dark-inducible gene expression in higher plants. Journal of Biological Chemistry 276:22238−43 doi: 10.1074/jbc.M102474200 CrossRef Google Scholar
[42]	Wang Z, Liu Q, Wang H, Zhang H, Xu X, et al. 2016. Comprehensive analysis of trihelix genes and their expression under biotic and abiotic stresses in Populus trichocarpa. Scientific Reports 6:36274 doi: 10.1038/srep36274 CrossRef Google Scholar
[43]	Xi J, Qiu Y, Du L, Poovaiah BW. 2012. Plant-specific trihelix transcription factor AtGT2L interacts with calcium/calmodulin and responds to cold and salt stresses. Plant Science 185-186:274−80 doi: 10.1016/j.plantsci.2011.11.013 CrossRef Google Scholar
[44]	Fang Y, Xie K, Hou X, Hu H, Xiong L. 2010. Systematic analysis of GT factor family of rice reveals a novel subfamily involved in stress responses. Molecular Genetics and Genomics 283:157−69 doi: 10.1007/s00438-009-0507-x CrossRef Google Scholar
[45]	Brewer PB, Howles PA, Dorian K, Griffith ME, Ishida T, et al. 2004. PETAL LOSS, a trihelix transcription factor gene, regulates perianth architecture in the Arabidopsis flower. Development 131:4035−45 doi: 10.1242/dev.01279 CrossRef Google Scholar
[46]	Li J, Zhang M, Sun J, Mao X, Wang J, et al. 2019. Genome-Wide Characterization and Identification of Trihelix Transcription Factor and Expression Profiling in Response to Abiotic Stresses in Rice (Oryza sativa L.). International Journal of Molecular Sciences 20:251 doi: 10.3390/ijms20020251 CrossRef Google Scholar
[47]	Yu C, Song L, Song J, Ouyang B, Guo L, et al. 2018. ShCIGT, a Trihelix family gene, mediates cold and drought tolerance by interacting with SnRK1 in tomato. Plant Science 270:140−9 doi: 10.1016/j.plantsci.2018.02.012 CrossRef Google Scholar
[48]	Liu W, Zhang Y, Li W, Lin Y, Wang C, et al. 2020. Genome-wide characterization and expression analysis of soybean trihelix gene family. PeerJ 8:e8753 doi: 10.7717/peerj.8753 CrossRef Google Scholar
[49]	Xiao J, Hu R, Gu T, Han J, Qiu D, et al. 2019. Genome-wide identification and expression profiling of trihelix gene family under abiotic stresses in wheat. BMC Genomics 20:287 doi: 10.1186/s12864-019-5632-2 CrossRef Google Scholar
[50]	Licausi F, Ohme-Takagi M, Perata P. 2013. APETALA2/Ethylene Responsive Factor (AP2/ERF) transcription factors: mediators of stress responses and developmental programs. New Phytologist 199:639−49 doi: 10.1111/nph.12291 CrossRef Google Scholar
[51]	Li M, Xu Z, Huang Y, Tian C, Wang F, et al. 2015. Genome-wide analysis of AP2/ERF transcription factors in carrot (Daucus carota L.) reveals evolution and expression profiles under abiotic stress. Molecular Genetics and Genomics 290:2049−61 doi: 10.1007/s00438-015-1061-3 CrossRef Google Scholar
[52]	Jofuku KD, den Boer BG, Van Montagu M, Okamuro JK. 1994. Control of Arabidopsis flower and seed development by the homeotic gene APETALA2. Plant Cell 6:1211−25 doi: 10.1105/tpc.6.9.1211 CrossRef Google Scholar
[53]	Song X, Wang J, Ma X, Li Y, Lei T, et al. 2016. Origination, expansion, evolutionary trajectory, and expression bias of AP2/ERF superfamily in Brassica napus. Frontiers in Plant Science 7:1186 doi: 10.3389/fpls.2016.01186 CrossRef Google Scholar
[54]	Irish VF, Sussex IM. 1990. Function of the apetala-1 gene during Arabidopsis floral development. The Plant Cell 2:741−53 doi: 10.1105/tpc.2.8.741 CrossRef Google Scholar
[55]	Shannon S, Meeks-Wagner DR. 1993. Genetic Interactions That Regulate Inflorescence Development in Arabidopsis. The Plant Cell 5:639−55 doi: 10.2307/3869807 CrossRef Google Scholar
[56]	Sun X, Shantharaj D, Kang X, Ni M. 2010. Transcriptional and hormonal signaling control of Arabidopsis seed development. Current Opinion in Plant Biology 13:611−20 doi: 10.1016/j.pbi.2010.08.009 CrossRef Google Scholar
[57]	Licausi F, Giorgi FM, Zenoni S, Osti F, Pezzotti M, et al. 2010. Genomic and transcriptomic analysis of the AP2/ERF superfamily in Vitis vinifera. BMC Genomics 11:719 doi: 10.1186/1471-2164-11-719 CrossRef Google Scholar
[58]	Zhang C, Shangguan L, Ma R, Sun X, Tao R, et al. 2012. Genome-wide analysis of the AP2/ERF superfamily in peach (Prunus persica). Genetics and Molecular Research 11:4789−809 doi: 10.4238/2012.October.17.6 CrossRef Google Scholar
[59]	Zhuang J, Cai B, Peng R, Zhu B, Jin X, et al. 2008. Genome-wide analysis of the AP2/ERF gene family in Populus trichocarpa. Biochemical and Biophysical Research Communications 371:468−74 doi: 10.1016/j.bbrc.2008.04.087 CrossRef Google Scholar
[60]	Das Laha S, Dutta S, Schäffner AR, Das M. 2020. Gene duplication and stress genomics in Brassicas: Current understanding and future prospects. Journal of Plant Physiology 255:153293 doi: 10.1016/j.jplph.2020.153293 CrossRef Google Scholar
[61]	Schranz ME, Mohammadin S, Edger PP. 2012. Ancient whole genome duplications, novelty and diversification: the WGD Radiation Lag-Time Model. Current Opinion in Plant Biology 15:147−53 doi: 10.1016/j.pbi.2012.03.011 CrossRef Google Scholar
[62]	Soltis PS, Soltis DE. 2016. Ancient WGD events as drivers of key innovations in angiosperms. Current Opinion in Plant Biology 30:159−65 doi: 10.1016/j.pbi.2016.03.015 CrossRef Google Scholar
[63]	Beier S, Thiel T, Munch T, Scholz U, Mascher M. 2017. MISA-web: a web server for microsatellite prediction. Bioinformatics 33:2583−5 doi: 10.1093/bioinformatics/btx198 CrossRef Google Scholar
[64]	Song X, Ge T, Li Y, Hou X. 2015. Genome-wide identification of SSR and SNP markers from the non-heading Chinese cabbage for comparative genomic analyses. BMC Genomics 16:328 doi: 10.1186/s12864-015-1534-0 CrossRef Google Scholar
[65]	Rozen S, Skaletsky H. 2000. Primer3 on the WWW for general users and for biologist programmers. In Bioinformatics Methods and Protocols, Methods in Molecular Biology™, eds. Misener S, Krawetz SA. vol 132. Totowa, NJ: Humana Press. pp. 365−86 https://doi.org/10.1385/1-59259-192-2:365
[66]	Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, et al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17:261−72 doi: 10.1038/s41592-019-0686-2 CrossRef Google Scholar
[67]	Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, et al. 2020. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Molecular Plant 13:1194−202 doi: 10.1016/j.molp.2020.06.009 CrossRef Google Scholar
[68]	Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, et al. 2014. Pfam: the protein families database. Nucleic Acids Res 42:D222−D230 doi: 10.1093/nar/gkt1223 CrossRef Google Scholar
[69]	Letunic I, Doerks T, Bork P. 2012. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Research 40:D302−D305 doi: 10.1093/nar/gkr931 CrossRef Google Scholar
[70]	Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, Deweese-Scott C, et al. 2009. CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Research 37:D205−D210 doi: 10.1093/nar/gkn845 CrossRef Google Scholar
[71]	Song X, Liu T, Duan W, Ma Q, Ren J, et al. 2014. Genome-wide analysis of the GRAS gene family in Chinese cabbage (Brassica rapa ssp. pekinensis). Genomics 103:135−46 doi: 10.1016/j.ygeno.2013.12.004 CrossRef Google Scholar
[72]	Nakamura T, Yamada KD, Tomii K, Katoh K. 2018. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 34:2490−2 doi: 10.1093/bioinformatics/bty121 CrossRef Google Scholar
[73]	Price MN, Dehal PS, Arkin AP. 2009. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Molecular Biology and Evolution 26:1641−50 doi: 10.1093/molbev/msp077 CrossRef Google Scholar
[74]	Letunic I, Bork P. 2019. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Research 47:W256−W259 doi: 10.1093/nar/gkz239 CrossRef Google Scholar
[75]	Stolzer M, Lai H, Xu M, Sathaye D, Vernot B, et al. 2012. Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees. Bioinformatics 28:i409−i415 doi: 10.1093/bioinformatics/bts386 CrossRef Google Scholar
[76]	Wang T, Hu J, Ma X, Li C, Yang Q, et al. 2020. Identification, evolution and expression analyses of whole genome-wide TLP gene family in Brassica napus. BMC Genomics 21:264 doi: 10.1186/s12864-020-6678-x CrossRef Google Scholar
[77]	Song X, Ma X, Li C, Hu J, Yang Q, et al. 2018. Comprehensive analyses of the BES1 gene family in Brassica napus and examination of their evolutionary pattern in representative species. BMC Genomics 19:346 doi: 10.1186/s12864-018-4744-4 CrossRef Google Scholar

About this article

Cite this article

Song X, Li N, Guo Y, Bai Y, Wu T, et al. 2021. Comprehensive identification and characterization of simple sequence repeats based on the whole-genome sequences of 14 forest and fruit trees. Forestry Research 1: 7 doi: 10.48130/FR-2021-0007

doi: 10.48130/FR-2021-0007

Figures(5)

Download PDF

Article Metrics

Article views(4540) PDF downloads(754)

Other Articles By Authors

on this site
- Xiaoming Song
- Nan Li
- Yuanyuan Guo
- Yun Bai
- Tong Wu
- Tong Yu
- Shuyan Feng
- Yu Zhang
- Zhiyuan Wang
- Zhuo Liu
- Hao Lin
on Google Scholar
- Xiaoming Song
- Nan Li
- Yuanyuan Guo
- Yun Bai
- Tong Wu
- Tong Yu
- Shuyan Feng
- Yu Zhang
- Zhiyuan Wang
- Zhuo Liu
- Hao Lin

HTML

INTRODUCTION

Generally, forest and fruit trees undergo a long juvenile period before flowering or fruiting, making the tree breeding process extremely long and challenging. An important prerequisite for plant breeding is the understanding of genetic variation. Molecular markers are useful tools for plant improvement as they can detect existing mutations in the genome and decode the genetic control of important traits, such as disease resistance, abiotic stress tolerance, and fruit quality attributes, to shorten the time required for obtaining new varieties with superior quality.

Molecular markers are nucleotide sequences that can reveal the distribution of genes and the expression of phenotypic traits among individuals by analyzing DNA fragments that encompass different genetic information^[1]. Based on the detection method, molecular markers can be divided into three classes: hybrid-based markers, such as restriction fragment length polymorphisms (RFLPs); PCR-based markers, such as random amplified polymorphic DNA (RAPD) markers, simple sequence repeats (SSRs), and amplified fragment length polymorphism (AFLP) markers; and DNA sequence-based markers, such as single nucleotide polymorphisms (SNPs)^{[2, 3]}. Among these molecular markers, SSR markers or microsatellites, are one of the most commonly used markers in plant breeding, gene flow analyses and genetic diversity assessments^[4]. According to the arrangement of nucleotide(s) in the repeat unit, SSRs are classified as perfect, imperfect, and compound microsatellites^[5]. Perfect SSRs are defined as continuous repetitions without any interruption (e.g., (AG)₁₂), while repeated sequences in the imperfect SSR interrupted by different bases that are not repeated (e.g., (AG)₁₀TC(AG)₈). Compound SSRs contain two adjacent distinct SSRs (e.g., (AG)₁₀(TC)₈).

SSR markers are dispersed over the coding and non-coding regions of all prokaryotic and eukaryotic genomes analyzed to date^[6]. Analyses of the occurrence of microsatellites in some plant and animal species indicate no apparent association between genome size and SSR density^{[7, 8]}. SSR markers show high levels of variation in motif frequency and microsatellite class^[9]. Generally, coding sequences exhibit a relatively low SSR density, as a high mutation rate of SSRs may affect gene function^{[10, 11]}. SSRs in coding regions are mostly tri- and hexa-nucleotide SSRs and are assumed not to cause frame shift mutations, as they are multiples of three nucleotides^{[12, 13]}. Emerging evidence suggests that SSRs may regulate gene transcription, translation, DNA methylation, mRNA stability, chromatin structure, and metabolic activities^[14−16].

With the rapid development of next-generation sequencing (NGS), it is now possible to screen SSR markers in different species in a more efficient and cost-effective way^[17−19]. In fact, SSR markers have been developed based on the genome sequences of several trees such as C. sinensis, Citrus maxima Merr., Jatropha curcas L., and Salix brachycarpa Nutt.^{[18, 20−23]}. In the present study, we systematically analyzed and compared the characteristics of SSR markers based on the released genome sequences of 14 trees, including ten eudicots, one monocot, one basal angiosperm, one gymnosperm, and one Lycopodiophyta species. Moreover, the potential functions of SSR-containing sequences were further investigated based on enriched Gene Ontology (GO) annotations. This research deepens our understanding of the characteristics of SSRs and their potential biological functions in trees, thereby providing new information for future breeding programs.

DISCUSSION

In this study, we developed SSR markers from all genes of whole-genomes in 14 tree species. Unlike crops, the breeding of fruit and forest trees is still in its infancy due to their long breeding cycles, complex genetic structures, and lack of genomic and functional information. In general, an improvement in breeding precision and a shortening of the breeding cycle are two major challenges for the genetic improvement of perennial woody plants. The selection of new varieties by traditional breeding methods is insufficient. With advancements in molecular biology, SSR markers provide a powerful means for directional genetic manipulation and improvement of tree species^[30−32].

Molecular marker assisted selection (MAS) combines modern molecular biology and traditional breeding. SSR markers can be used to select breeding materials at the DNA level to improve yield, quality, and resistance in tree species. The rapid development of high-throughput sequencing technology, and the substantial reduction in sequencing cost has promoted the genotyping of large-scale mapping populations, allowing for the construction of high-density genetic linkage maps^{[17, 18]}. Furthermore, the whole genomes of many tree species have been sequenced and released, which facilitated the identification of SSR markers for all genes of species. Developing SSR markers that closely associate with functional genes can help select individuals with a desired phenotype at an early stage of tree growth, significantly improving breeding efficiency. Here, a total of 16,298 SSRs were identified in 429,449 genes of 14 representative trees. In addition, we successfully designed primers for 16,081 SSRs. Therefore, these resources will promote molecular marker assisted selection applied in tree breeding.

Furthermore, several significantly enriched terms were detected in SSR-containing genes, and most enriched functional terms were transcription factors (TFs). Transcription regulation of gene expression plays important roles in many biological processes such as cell morphogenesis, signal transduction, and responses to environmental stress^[33−35]. Plant growth and productivity are greatly threatened by environmental biotic and abiotic factors^{[34, 36]}. To adapt to environmental changes, plants have evolved a large number of TFs to combat adverse effects^[37−39]. In this study, SSR markers were found to be significantly enriched in TFs associated with abiotic stress and floral development, and they included members of the Myb_DNA-bind 4, AP2, TCP, and WRKY families. The large number of TF families enriched for SSRs might have been a result of the various environmental conditions these species live in, causing the evolution of different TF families to cope with different adverse environmental factors.

The Myb_DNA-bind 4 was also known as the trihelix gene family^[25]. The trihelix proteins were one of the earliest TF families found in plants but have attracted attention only recently^{[40, 41]}. They were first classified as GT factors and further classified into five clades, GT-1, GT-2, SH4, GTg, and SIP1^[40]. Among these 14 species, the trihelix gene family was only reported in P. trichocarpa at the whole-genome level ^[42]. Here, we detected 471 trihelix family genes from the whole-genome of 14 trees, ranging from 10 (J. curcas) to 58 (P. trichocarpa). Until now, the function of trihelix family genes has been investigated only in some model plants or major crops, such as Arabidopsis, tomato, rice, soybean, and wheat^[43−49]. Besides a role in regulating the development of flowers, stomata, embryos, and seeds, recent studies have found that some members of the trihelix gene family can respond to biotic and abiotic stresses such as disease, salt, drought, and cold stress^[40]. Among forest trees, there are only a few studies on trihelix family gene function in P. trichocarpa, while it was rarely reported in other species^[42]. In P. trichocarpa, some trihelix genes are responsive to osmotic stress and can be induced by phytohormones, including abscisic acid, salicylic acid, and methyl jasmonate^[42]. Moreover, the inhibition of PtrGT10 expression enhances the scavenging ability of reactive oxygen species and reduces cell death^[42]. Here, we conducted systematic comparative analyses and phylogenetic analyses for these family genes in P. trichocarpa and other trees. Based on these analyses, we easily obtained the homologous genes between P. trichocarpa and each of other examined trees. Therefore, the previous studies of P. trichocarpa genes provide a good reference for the functional studies of homologous trihelix family genes in other trees. The functions of trihelix genes are becoming clear with the identification and characterization of more members in this family. Therefore, our study provides rich gene resources for the functional research of trihelix family genes in trees. Especially, trihelix family genes that contain SSRs were significantly enriched in four species, including E. guineensis, P. abies, S. purpurea, and S. moellendorffii. However, there are few reports on this gene family in these four species. Therefore, this study lays the foundation for future studies on the function of trihelix family genes in these species.

SSR markers were also significantly enriched in AP2 family genes that are associated with abiotic stress responses. Members of the AP2 gene family have different functions in regulating plant development and various stress responses^{[50, 51]}. The AP2 family contains one or two tandem AP2 domains, which shows a relatively high similarity between different genes^{[52, 53]}. Members of the AP2 family are expressed primarily in young organs and function as key regulators of plant growth and development, including floral meristem establishment, floral organ identity specification, the regulation of floral homeotic gene expression, and the regulation of ovule development^[54−56]. Among 14 tree species, the AP2 gene family has been reported in V. vinifera, P. persica, and P. trichocarpa at the whole-genome level^[57−59]. Here, we detected 1,649 AP2 family genes from the whole-genome of 14 trees, ranging from 25 (J. curcas) to 222 (S. purpurea). Our results show that AP2 family genes that contain SSRs were significantly enriched in four species, including C. papaya, C. canephora, P. trichocarpa, and S. purpurea. A previous study implied that AP2 family genes play important roles in fruit growth and development in peach^[58]. Furthermore, we conducted a phylogenetic reconstruction using all AP2 family genes of 14 species, which provides good guidance for functional studies of AP2 family genes based on the homologous relationship with model species. Generally, whole genome duplications (WGD) or triplications (WGT) are the major sources for the diversification and specification of gene function^[60−62]. Our results suggest that WGD and WGT events may play important roles in the expansion of the AP2 gene family, which further contributed to the evolution of these family genes. Therefore, these findings also provide new insights into the evolution and functions of SSR-containing genes for future comparative and functional genomics studies of these species and other related tree species.

CONCLUSION

In summary, we identified SSRs from whole-genomes of 14 tree species and investigated the characteristics of these SSRs in major forest and fruit trees through comparative analysis. A total of 50 significantly enriched terms were detected by comparing all identified SSR-containing genes with genes that have Pfam annotations. Most enriched functional terms were TFs that related to abiotic stress regulation, and they included Myb, AP2, TCP, and WRKY gene families. Furthermore, enriched functional terms that were common or specific to each species were analyzed, and seven species were enriched for specific functional terms. Finally, we found that functional terms AP2 and Myb_DNA-bind 4 were each significantly enriched in four species. Taken together, our findings provide valuable insights into the evolution and functions of SSR-containing genes for future functional genomics studies and valuable genetic resources for developing markers for the breeding of these tree species.

{{lists.name}}

Comprehensive identification and characterization of simple sequence repeats based on the whole-genome sequences of 14 forest and fruit trees