TeaPGDB: Tea Plant Genome Database

Xiaogang Lei; Ya Wang; Yuhan Zhou; Yongzhong Chen; Hongyuan Chen; Zhongwei Zou; Lin Zhou; Yuanchun Ma; Fei Chen; Wanping Fang

doi:10.48130/BPR-2021-0005

2021 Volume 1


ARTICLE Open Access


1. College of Horticulture, Nanjing Agricultural University, Nanjing 210095, China
2. The High-performance Computing Platform of Bioinformatics Center, Nanjing Agricultural University, Nanjing 210095, China
3. Department of Plant Science, University of Manitoba, Winnipeg, R3T2N2, Canada
4. Forestry and Pomology Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai 201403, China


Corresponding authors: feichen@njau.edu.cn; fangwp@njau.edu.cn

Received: 25 May 2021
Accepted: 25 June 2021
Published online: 30 June 2021
Beverage Plant Research 1, Article number: 5 (2021)

Abstract

As one of the most widely consumed beverages in the world, tea has considerable nutritional, economic, and cultural value. With the development of third-generation sequencing technology, several tea plant genomes have been published. These genomic data contain pivotal information that can help tea plant breeders and biologists advance tea plant improvement and the final quality of tea products. Here we present an integrative online database, the Tea Plant Genome Database (TeaPGDB; http://eplant.njau.edu.cn/tea), which incorporates the published genome sequences of tea plants. The current release of TeaPGDB hosts published tea plant genome data together with various online tools, including JBrowse, gene search, SSR search, and BLAST. TeaPGDB also contains a download server that provides access to genome-related data and rich annotation files. TeaPGDB is committed to collecting, integrating, and annotating published tea plant genome data, providing data support for research on tea plant heredity, evolution, resistance breeding, and improvement, and facilitating the characterization of genes related to important traits or flavor in the community. Compared with other tea plant databases, TeaPGDB not only contains more complete genome data and gene annotation information, but also offers a user-friendly interface for researchers in the field.
Keywords:
- tea plant
- genome
- database
- online analysis
- annotation
Rights and permissions
Copyright: © 2021 by the author(s). Exclusive Licensee Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Xia E, Zhang H, Sheng J, Li K, Zhang Q, et al. 2017. The tea tree genome provides insights into tea flavor and independent evolution of caffeine biosynthesis. Molecular Plant 10:866−77 doi: 10.1016/j.molp.2017.04.002
[2]	Chen Y, Yu M, Xu J, Chen X, Shi J. 2009. Differentiation of eight tea (Camellia sinensis) cultivars in China by elemental fingerprint of their leaves. Journal of the Science of Food and Agriculture 89:2350−55 doi: 10.1002/jsfa.3716
[3]	Juneja LR, Chu DC, Okubo T, Nagato Y, Yokogoshi H. 1999. L-theanine - a unique amino acid of green tea and its relaxation effect in humans. Trends in Food Science & Technology 10:199−204 doi: 10.1016/S0924-2244(99)00044-8
[4]	Mukhtar H, Ahmad N. 2000. Tea polyphenols: prevention of cancer and optimizing health. The American Journal of Clinical Nutrition 71:1698S−1702S doi: 10.1093/ajcn/71.6.1698S
[5]	Koshiishi C, Kato A, Yama S, Crozier A, Ashihara H. 2001. A new caffeine biosynthetic pathway in tea leaves: utilisation of adenosine released from the S-adenosyl-L-methionine cycle. FEBS Letters 499:50−54 doi: 10.1016/S0014-5793(01)02512-1
[6]	Zhang Z, Li Y, Qi L, Wan X. 2006. Antifungal activities of major tea leaf volatile constituents toward Colletorichum camelliae Massea. Journal of Agricultural and Food Chemistry 54:3936−40 doi: 10.1021/jf060017m
[7]	Zhang S, Xuan H, Zhang L, Fu S, Wang Y, et al. 2017. TBC2health: a database of experimentally validated health-beneficial effects of tea bioactive compounds. Briefings in Bioinformatics 18:830−36 doi: 10.1093/bib/bbw055
[8]	Xia EH, Tong W, Wu Q, Wei S, Zhao J, et al. 2020. Tea plant genomics: achievements, challenges and perspectives. Horticulture Research 7:7 doi: 10.1038/s41438-019-0225-4
[9]	Ott J, Wang J, Leal SM. 2015. Genetic linkage analysis in the age of whole-genome sequencing. Nature Reviews Genetics 16:275−84 doi: 10.1038/nrg3908
[10]	Zhang C, Wang L, Wei K, Wu L, Li H, et al. 2016. Transcriptome analysis reveals self-incompatibility in the tea plant (Camellia sinensis) might be under gametophytic control. BMC Genomics 17:359 doi: 10.1186/s12864-016-2703-5
[11]	Kang M, Wu H, Yang Q, Huang L, Hu Q, et al. 2020. A chromosome-scale genome assembly of Isatis indigotica, an important medicinal plant used in traditional Chinese medicine. Horticulture Research 7:18 doi: 10.1038/s41438-020-0240-5
[12]	Wei S, Yang Y, Yin T. 2020. The chromosome-scale assembly of the willow genome provides insight into Salicaceae genome evolution. Horticulture Research 7:45 doi: 10.1038/s41438-020-0268-6
[13]	Li S, Wang J, Dong R, Zhu H, Lan L, et al. 2020. Chromosome-level genome assembly, annotation and evolutionary analysis of the ornamental plant Asparagus setaceus. Horticulture Research 7:48 doi: 10.1038/s41438-020-0271-y
[14]	Gao S, Wang B, Xie S, Xu X, Zhang J, et al. 2020. A high-quality reference genome of wild Cannabis sativa. Horticulture Research 7:73 doi: 10.1038/s41438-020-0295-3
[15]	Xue T, Zheng X, Chen D, Liang L, Chen N, et al. 2020. A high-quality genome provides insights into the new taxonomic status and genomic characteristics of Cladopus chinensis (Podostemaceae). Horticulture Research 7:46 doi: 10.1038/s41438-020-0269-5
[16]	Xanthopoulou A, Manioudaki M, Bazakos C, Kissoudis C, Farsakoglou AM, et al. 2020. Whole genome re-sequencing of sweet cherry (Prunus avium L.) yields insights into genomic diversity of a fruit species. Horticulture Research 7:60 doi: 10.1038/s41438-020-0281-9
[17]	Hu M, Sun W, Tsai WC, Xiang S, Lai X, et al. 2020. Chromosome-scale assembly of the Kandelia obovata genome. Horticulture Research 7:60 doi: 10.1038/s41438-020-0300-x
[18]	Fan Y, Sahu SK, Yang T, Mu W, Wei J, et al. 2020. Dissecting the genome of star fruit (Averrhoa carambola L.). Horticulture Research 7:94 doi: 10.1038/s41438-020-0306-4
[19]	Peace CP, Bianco L, Troggio M, van de Weg E, Howard NP, et al. 2019. Apple whole genome sequences: recent advances and new prospects. Horticulture Research 6:59 doi: 10.1038/s41438-019-0141-7
[20]	Li Q, Qi J, Qin X, Dou W, Lei T, et al. 2020. CitGVD: a comprehensive database of citrus genomic variations. Horticulture Research 7:12 doi: 10.1038/s41438-019-0234-3
[21]	Liu T, Li M, Liu Z, Ai X, Li Y. 2021. Reannotation of the cultivated strawberry genome and establishment of a strawberry genome database. Horticulture Research 8:41 doi: 10.1038/s41438-021-00476-4
[22]	Yue J, Liu J, Tang W, Wu Y, Tang X, et al. 2020. Kiwifruit Genome Database (KGD): a comprehensive resource for kiwifruit genomics. Horticulture Research 7:117 doi: 10.1038/s41438-020-0338-9
[23]	Yano K, Aoki K, Shibata D. 2007. Genomic Databases for Tomato. Plant Biotechnology 24:17−25 doi: 10.5511/plantbiotechnology.24.17
[24]	Xu H, Yu Q, Shi Y, Hua X, Tang H, et al. 2018. PGD: Pineapple Genomics Database. Horticulture Research 5:66 doi: 10.1038/s41438-018-0078-2
[25]	Kim C, Park D, Seol Y, Yoon U, Lee G, et al. 2012. An online database for genome information of agricultural plants. Bioinformation 8:1059−61 doi: 10.6026/97320630081059
[26]	Chen J, Zheng C, Ma J, Jiang C, Ercisli S, et al. 2020. The chromosome-scale genome reveals the evolution and diversification after the recent tetraploidization event in tea plant. Horticulture Research 7:63 doi: 10.1038/s41438-020-0288-2
[27]	Xia E, Tong W, Hou Y, An Y, Chen L, et al. 2020. The reference genome of tea plant and resequencing of 81 diverse accessions provide insights into its genome evolution and adaptation. Molecular Plant 13:1013−26 doi: 10.1016/j.molp.2020.04.010
[28]	Wang X, Feng H, Chang Y, Ma C, Wang L, et al. 2020. Population sequencing enhances understanding of tea plant evolution. Nature Communications 11:4447 doi: 10.1038/s41467-020-18228-8
[29]	Zhang Q, Li W, Li K, Nan H, Shi C, et al. 2020. The chromosome-level reference genome of tea tree unveils recent bursts of non-autonomous LTR retrotransposons in driving genome size evolution. Molecular Plant 13:935−38 doi: 10.1016/j.molp.2020.04.009
[30]	Zhang W, Zhang Y, Qiu H, Guo Y, Wan H, et al. 2020. Genome assembly of wild tea tree DASZ reveals pedigree and selection history of tea varieties. Nature Communications 11:3719 doi: 10.1038/s41467-020-17498-6
[31]	Wei C, Yang H, Wang S, Zhao J, Liu C, et al. 2018. Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality. PNAS 115:E4151−E4158 doi: 10.1073/pnas.1719622115
[32]	Wang P, Yu J, Jin S, Chen S, Yue C, et al. 2021. Genetic basis of high aroma and stress tolerance in the oolong tea cultivar genome. Horticulture Research 8:107 doi: 10.1038/s41438-021-00542-x
[33]	Guo W, Chen J, Li J, Huang J, Wang Z, et al. 2020. Portal of Juglandaceae: A comprehensive platform for Juglandaceae study. Horticulture Research 7:35 doi: 10.1038/s41438-020-0256-x
[34]	Zeng C, Hollingsworth PM, Yang J, He ZS, Zhang ZR, et al. 2018. Genome skimming herbarium specimens for DNA barcoding and phylogenomics. Plant Methods 14:43 doi: 10.1186/s13007-018-0300-0
[35]	Dong M, Liu S, Xu Z, Hu Z, Ku W, et al. 2018. The complete chloroplast genome of an economic plant, Camellia sinensis cultivar Anhua, China. Mitochondrial DNA Part B - Resources 3:558−59 doi: 10.1080/23802359.2018.1462124
[36]	Huang H, Shi C, Liu Y, Mao S, Gao L. 2014. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evolutionary Biology 14:151 doi: 10.1186/1471-2148-14-151
[37]	Chen S, Li R, Ma Y, Lei S, Ming R, et al. 2021. The complete chloroplast genome sequence of Camellia sinensis var. sinensis cultivar Tieguanyin (Theaceae). Mitochondrial DNA Part B - Resources 6:395−96 doi: 10.1080/23802359.2020.1869615
[38]	Ye X, Zhao Z, Xu Y, Xu H, Sun L. 2015. The Phylogenetic Analysis of Camellia sinensis cv. Longjing 43. Journal of the Korean Tea Society, special 21:63−66
[39]	Li L, Hu Y, He M, Zhang B, Wu W, et al. 2021. Comparative chloroplast genomes: insights into the evolution of the chloroplast genome of Camellia sinensis and the phylogeny of Camellia. BMC Genomics 22:138 doi: 10.1186/s12864-021-07427-2
[40]	Li L, Hu Y, Wu L, Chen R, Luo S. 2021. The complete chloroplast genome sequence of Camellia sinensis cv. Dahongpao: a most famous variety of Wuyi tea (Synonym: Thea bohea L.). Mitochondrial DNA Part B - Resources 6:3−5 doi: 10.1080/23802359.2020.1844093
[41]	Hao W, Wang S, Yao M, Ma J, Xu Y, et al. 2019. The complete chloroplast genome of an albino tea, Camellia sinensis cultivar 'Baiye 1'. Mitochondrial DNA Part B - Resources 4:3143−44 doi: 10.1080/23802359.2019.1667889
[42]	Lee DJ, Kim CK, Lee TH, Lee SJ, Moon DG, et al. 2020. The complete chloroplast genome sequence of economical standard tea plant, Camellia sinensis L. cultivar Sangmok, in Korea. Mitochondrial DNA Part B - Resources 5:2841−42 doi: 10.1080/23802359.2020.1790311
[43]	Rawal HC, Kumar PM, Bera B, Singh NK, Mondal TK. 2020. Decoding and analysis of organelle genomes of Indian tea (Camellia assamica) for phylogenetic confirmation. Genomics 112:659−68 doi: 10.1016/j.ygeno.2019.04.018
[44]	Zhang F, Li W, Gao C, Zhang D, Gao L. 2019. Deciphering tea tree chloroplast and mitochondrial genomes of Camellia sinensis var. assamica. Scientific Data 6:209 doi: 10.1038/s41597-019-0201-8
[45]	Jia X, Zhang W, Fernie AR, Wen W. 2019. Camellia sinensis (Tea). Trends in Genetics 37:201−2 doi: 10.1016/j.tig.2020.10.002
[46]	Finn RD, Attwood TK, Babbitt PC, Bateman A, Bork P, et al. 2017. InterPro in 2017—beyond protein family and domain annotations. Nucleic Acids Research 45:D190−D199 doi: 10.1093/nar/gkw1107
[47]	Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, et al. 2021. Pfam: The protein families database in 2021. Nucleic Acids Research 49:D412−D419 doi: 10.1093/nar/gkaa913
[48]	Potter SC, Luciani A, Eddy SR, Park Y, Lopez R, et al. 2018. HMMER web server: 2018 update. Nucleic Acids Research 46:W200−W204 doi: 10.1093/nar/gky448
[49]	Aramaki T, Blanc-Mathieu R, Endo H, Ohkubo K, Kanehisa M, et al. 2020. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36:2251−52 doi: 10.1093/bioinformatics/btz859
[50]	Armenteros JJA, Tsirigos KD, Sønderby CK, Petersen TN, Winther O, et al. 2019. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nature Biotechnology 37:420−23 doi: 10.1038/s41587-019-0036-z
[51]	Ginestet C. 2011. ggplot2: Elegant Graphics for Data Analysis. Journal of the Royal Statistical Society Series A - Statistics in Society 174:245−46 doi: 10.1111/j.1467-985X.2010.00676_9.x
[52]	Gong X, Zhang D. 2011. Research on Web Server Based on Red5, Tomcat and Apache. Advanced Materials Research 282−283:721−25 doi: 10.4028/www.scientific.net/AMR.282-283.721
[53]	Hoy MB. 2011. HTML5: A new standard for the web. Medical Reference Services Quarterly 30:50−55 doi: 10.1080/02763869.2011.540212
[54]	Prokhorenko V, Choo KKR, Ashman H. 2016. Intent-Based Extensible Real-Time PHP Supervision Framework. IEEE Transactions on Information Forensics and Security 11:2215−26 doi: 10.1109/tifs.2016.2569063
[55]	Di Giacomo M. 2005. MySQL: Lessons learned on a digital library. IEEE Software 22:10−13 doi: 10.1109/MS.2005.71
[56]	Korpela J. 1998. Lurching toward Babel: HTML, CSS and XML. Computer 31:103−4 doi: 10.1109/2.689682
[57]	Wei S, Xhakaj F, Ryder BG. 2016. Empirical study of the dynamic behavior of JavaScript objects. Software-Practice and Experience 46:867−89 doi: 10.1002/spe.2334
[58]	Lee S-U, Moon I-Y. 2011. A study of user interaction using jQuery in Web Application. Journal of Advanced Navigation Technology 15:626−31 doi: 10.12673/jant.2011.15.4.626
[59]	Priyam A, Woodcroft BJ, Rai V, Moghul I, Munagala A, et al. 2019. Sequenceserver: a modern graphical user interface for custom BLAST databases. Molecular Biology and Evolution 36:2922−24 doi: 10.1093/molbev/msz185
[60]	Wang H, Ooi BC, Tan KL, Ong TH, Zhou L. 2003. BLAST++: BLASTing queries in batches. Bioinformatics 19:2323−24 doi: 10.1093/bioinformatics/btg310
[61]	Westesson O, Skinner M, Holmes I. 2013. Visualizing next-generation sequencing data with JBrowse. Briefings in Bioinformatics 14:172−77 doi: 10.1093/bib/bbr078
[62]	Seppey M, Manni M, Zdobnov EM. 2019. BUSCO: assessing genome assembly and annotation completeness. In Gene prediction. Methods in molecular biology, ed. Kollmar M. vol 1962. New York: Humana. pp. 227−45 doi: 10.1007/978-1-4939-9173-0_14
[63]	Xia EH, Li FD, Tong W, Li PH, Wu Q, et al. 2019. Tea Plant Information Archive: a comprehensive genomics and bioinformatics platform for tea plant. Plant Biotechnology Journal 17:1938−53 doi: 10.1111/pbi.13111
[64]	Yue Y, Chu G, Liu X, Tang X, Wang W, et al. 2014. TMDB: A literature-curated database for small molecular compounds found from tea. BMC Plant Biology 14:243 doi: 10.1186/s12870-014-0243-1
[65]	Zhang R, Ma Y, Hu X, Chen Y, He X, et al. 2020. TeaCoN: a database of gene co-expression network for tea plant (Camellia sinensis). BMC Genomics 21:461 doi: 10.1186/s12864-020-06839-w
[66]	Buels R, Yao E, Diesh CM, Hayes RD, Munoz-Torres M, et al. 2016. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biology 17:66 doi: 10.1186/s13059-016-0924-1
[67]	Du L, Zhang C, Liu Q, Zhang X, Yue B. 2018. Krait: an ultrafast tool for genome-wide survey of microsatellites and primer design. Bioinformatics 34:681−83 doi: 10.1093/bioinformatics/btx665
[68]	Dubey H, Rawal HC, Rohilla M, Lama U, Kumar PM, et al. 2020. TeaMiD: a comprehensive database of simple sequence repeat markers of tea. Database 2020:baaa013 doi: 10.1093/database/baaa013
[69]	Mondal TK, Rawal HC, Bera B, Kumar PM, Choubey M, et al. 2019. Draft genome sequence of a popular Indian tea genotype TV-1 [Camellia assamica L. (O). Kunze]. bioRxiv Preprint doi: 10.1101/762161

About this article

Cite this article

Lei X, Wang Y, Zhou Y, Chen Y, Chen H, et al. 2021. TeaPGDB: Tea Plant Genome Database. Beverage Plant Research 1: 5 doi: 10.48130/BPR-2021-0005


Figures(6) / Tables(2)


INTRODUCTION

Tea is one of the three most popular non-alcoholic beverages in the world and has important economic, health, and cultural value^[1]. Tea plants have a long history of cultivation and a wide cultivation range; to date, they have been planted in more than 60 countries^[2]. Over the past few decades, theanine, caffeine, tea polyphenols, mineral elements, and other substances that contribute to human health and tea quality have been a major focus of tea research^[3−7]. Before the release of the Camellia sinensis var. sinensis (CSS) 'Shuchazao' genome, studies on the molecular genetics and breeding of tea plants progressed slowly owing to the lack of genomic datasets^[1]. The self-incompatibility of tea plants is another reason for this slow progress^[8]. Unlike many other crops, tea plants are self-incompatible, and the efficiency of hybrid breeding is relatively low. These factors result in a lack of advanced-generation segregating populations and of sufficient offspring, which hinders the construction of genetic maps. Genetic mapping is fundamental to genetics and genomics studies such as quantitative trait mapping, molecular marker-assisted breeding, and comparative genome research^[9, 10]. In addition, understanding of the basic biological characteristics of tea plants has been limited by the scarcity of knowledge on tea plant phylogenetic biology and functional genomics^[8]. In general, many factors hinder tea plant genetics and breeding, and almost all of them are related to tea plant genome research.

The rapid development of plant genomics has accelerated the molecular characterization of important factors in plant vegetative and reproductive growth, stress tolerance, and secondary metabolism, benefiting horticultural crop genetics, evolution, and improvement. In particular, over the past decade, with the development and improvement of third-generation sequencing and assembly technologies, the genomes of several important plants have been sequenced or resequenced, such as Isatis indigotica^[11], willow (Salix suchowensis)^[12], asparagus fern (Asparagus setaceus)^[13], wild hemp (Cannabis sativa)^[14], Cladopus chinensis^[15], sweet cherry (Prunus avium L.)^[16], Kandelia obovata^[17], star fruit (Averrhoa carambola L.)^[18], and apple (Malus domestica)^[19], generating a large amount of important genomic data. To facilitate the use and mining of these data, corresponding genomic databases have been established for crops such as citrus^[20], strawberry (Fragaria × ananassa)^[21], kiwifruit (Actinidia spp.)^[22], tomato (Lycopersicon esculentum Miller)^[23], pineapple (Ananas comosus L.)^[24], and cabbage (Brassica oleracea var. capitata L.)^[25]. Because of the large genome size, high heterozygosity, and complexity of tea plants, deciphering the tea plant genome has been a significant challenge in tea science research. Nevertheless, as with other horticultural crops, eight genomes from six tea plant accessions have been deciphered in quick succession, among which CSS 'Shuchazao' has been sequenced and assembled three times^[1, 26, 27]. Since the publication of the genome of C. sinensis var. assamica (CSA) 'Yunkang 10' in 2017, the genomes of CSS 'Biyun', CSS 'Longjing 43', the wild tea plant DASZ, CSS 'Shuchazao', and 'Huangdan' have been published successively^[1, 26−32].
Although these genomes have greatly promoted tea science research, researchers studying particular functional genes, such as those related to fertility, stress response, resistance, or aroma formation, still find it difficult to obtain detailed information on their sequences, chromosomal locations, functions, and annotations. In addition, users must rely on many external tools to perform gene or protein sequence alignments and to query detailed gene information, which makes the whole process cumbersome and time-consuming. Moreover, although eight tea plant genomes have been published, it is still difficult to select the proper genome sequence as a research reference, because the assembly quality of each genome is hard to evaluate. To infer the unknown functions of the thousands of proteins and coding sequences in each genome, alignments against several different databases are needed^[33]. In this study, we developed the Tea Plant Genome Database (TeaPGDB), which integrates all currently published tea plant genomes. TeaPGDB provides comprehensive annotation of the tea plant genomes, including the organellar genomes, covering gene locations on chromosomes, gene family identification, Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation, Gene Ontology (GO) annotation, signal peptide prediction, and more. We also provide several useful online tools, such as gene search, BLAST, and JBrowse. Using the gene search engine, tea researchers can easily retrieve the chromosomal location, gene family, GO and KEGG annotations, and other detailed information of a target gene. Users can also align gene or protein sequences against the published tea plant genomes online and then view the relevant gene information on the chromosomes in JBrowse.
Aligning unknown gene sequences against published data to find homologous genes is important for advancing molecular and biological studies in tea plants.
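Homology searches of this kind are commonly run with BLAST and reported in its tabular output format (`-outfmt 6`), whose 12 default columns are the query and subject identifiers followed by pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. As a minimal sketch, assuming that default column layout and an illustrative E-value cutoff of 1e-5 (an assumption, not a TeaPGDB setting), the following Python keeps the best hit per query by bit score. It is a generic post-processing step, not TeaPGDB's own implementation.

```python
from __future__ import annotations

import csv
import io
from typing import NamedTuple

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def best_hits(tabular: str, max_evalue: float = 1e-5) -> dict[str, Hit]:
    """Return the highest-bit-score hit per query among hits with evalue <= max_evalue.

    max_evalue is an assumed, illustrative threshold.
    """
    best: dict[str, Hit] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        rec = dict(zip(FIELDS, row))
        hit = Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                  float(rec["evalue"]), float(rec["bitscore"]))
        if hit.evalue > max_evalue:
            continue  # discard weak hits
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best
```

Ranking by bit score rather than percent identity avoids favoring short, near-identical local matches; the query and subject IDs used with this function are whatever the BLAST run produced.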

RESULTS

Species and tea flavor gene pages

TeaPGDB has several unique pages, including the species and tea flavor gene pages, which introduce the tea plants with released genomes and collect the related data. These pages present the BUSCO assessment results for all tea plant genomes, as well as the genes involved in the synthesis of tea quality components (catechins, caffeine, and theanine). TeaPGDB helps tea researchers understand the current status of tea plant genome research and determine the most suitable reference genome, and it can also promote the study of tea quality components.
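BUSCO summarizes completeness in a one-line notation of the form `C:x%[S:x%,D:x%],F:x%,M:x%,n:N` (complete, single-copy, duplicated, fragmented, missing, and number of BUSCO groups searched). As a minimal sketch, assuming summaries in that classic notation, the following Python parses them and ranks genomes by complete BUSCO percentage, breaking ties by fewer duplicated BUSCOs. This is only one simple criterion for choosing a reference; the genome labels and percentages in the demo are hypothetical, not TeaPGDB results.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches BUSCO's one-line summary notation, e.g. "C:90.0%[S:85.0%,D:5.0%],F:3.0%,M:7.0%,n:1614"
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


@dataclass
class BuscoSummary:
    complete: float    # C: complete BUSCOs (%)
    single: float      # S: complete and single-copy (%)
    duplicated: float  # D: complete and duplicated (%)
    fragmented: float  # F: fragmented (%)
    missing: float     # M: missing (%)
    n_groups: int      # n: number of BUSCO groups searched


def parse_busco(line: str) -> BuscoSummary:
    """Parse one BUSCO summary string; raise ValueError if it does not match."""
    m = BUSCO_PATTERN.search(line.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a BUSCO summary: {line!r}")
    return BuscoSummary(float(m["C"]), float(m["S"]), float(m["D"]),
                        float(m["F"]), float(m["M"]), int(m["n"]))


def rank_genomes(summaries: dict[str, str]) -> list[tuple[str, float]]:
    """Return (genome, complete %) pairs, most complete first; ties broken by fewer duplicates."""
    parsed = {name: parse_busco(s) for name, s in summaries.items()}
    order = sorted(parsed, key=lambda k: (-parsed[k].complete, parsed[k].duplicated))
    return [(name, parsed[name].complete) for name in order]


if __name__ == "__main__":
    # Hypothetical labels and values, for illustration only
    example = {
        "genome_A": "C:93.5%[S:88.0%,D:5.5%],F:2.0%,M:4.5%,n:1614",
        "genome_B": "C:96.0%[S:91.2%,D:4.8%],F:1.5%,M:2.5%,n:1614",
    }
    for name, complete in rank_genomes(example):
        print(name, complete)
```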

Data
TeaPGDB contains all released tea plant genome data, together with annotations of all gene and protein sequences. Compared with the Tea Plant Information Archive (TPIA)^[63] and other databases, TeaPGDB provides more comprehensive and informative genome annotations. These annotation data can benefit research on tea biochemical composition, resistance, and breeding, and are therefore of great significance for improving tea plant performance. TeaPGDB also contains more complete SSR data than TPIA, covering all released genomes.
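SSRs (microsatellites) are short motifs repeated in tandem. As a hedged illustration only, and not the SSR identification pipeline behind TeaPGDB (which is not described in this section), the Python sketch below scans a sequence for perfect di- to hexanucleotide repeats, the motif classes compared in the SSR table. The minimum copy numbers in `MIN_REPEATS` are assumed thresholds chosen for the example.

```python
from __future__ import annotations

from typing import NamedTuple

# Assumed minimum numbers of tandem copies per motif length (illustrative thresholds)
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


class SSR(NamedTuple):
    start: int    # 0-based start position in the sequence
    motif: str    # repeat unit
    repeats: int  # number of complete tandem copies

    @property
    def end(self) -> int:
        return self.start + len(self.motif) * self.repeats


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit ('ATAT' and 'AA' are not primitive)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list[SSR]:
    """Scan left to right for perfect SSRs with motif lengths 2-6, reporting non-overlapping hits."""
    seq = seq.upper()
    hits: list[SSR] = []
    i = 0
    while i < len(seq):
        best = None
        for k in sorted(MIN_REPEATS):
            motif = seq[i:i + k]
            if len(motif) < k or "N" in motif or not is_primitive(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= MIN_REPEATS[k]:
                candidate = SSR(i, motif, copies)
                if best is None or candidate.end > best.end:
                    best = candidate  # keep the longest repeat starting at i
        if best is not None:
            hits.append(best)
            i = best.end  # jump past the repeat so hits do not overlap
        else:
            i += 1
    return hits
```

Rejecting non-primitive motifs keeps an (AT)n run from also being reported as a tetranucleotide or hexanucleotide SSR, so each repeat is counted in exactly one motif class.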

Online retrieval and analysis tools
TeaPGDB provides several online retrieval and analysis tools to facilitate data analysis and information retrieval. Gene search and SSR search are information retrieval tools: by entering a valid gene or SSR identifier, users can obtain detailed gene annotation or SSR information. JBrowse is a genome visualization tool that displays genome sequences and gene annotations, and users can also upload their own data to customize the display. The online BLAST tool performs sequence alignment: users can select tea plant data or data from other species in the database as the target to find homologous sequences. Compared with TPIA, TeaPGDB offers a more advanced online BLAST tool, whose reference databases contain the protein, genome, and coding sequence files of all released genomes.
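At its core, gene search is a lookup from an identifier to an annotation record. As a minimal sketch, assuming annotations stored as GFF3 (nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, attributes, with the gene identifier in the `ID=` attribute), the Python below builds a dictionary from gene ID to chromosomal location. It is not TeaPGDB's actual backend, and any gene IDs used with it are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple
from urllib.parse import unquote


class GeneLocation(NamedTuple):
    seqid: str   # chromosome or scaffold name
    start: int   # 1-based, inclusive (GFF3 convention)
    end: int     # 1-based, inclusive
    strand: str  # '+', '-', '.' or '?'


def parse_attributes(field: str) -> dict[str, str]:
    """Split a GFF3 attribute column such as 'ID=g1;Name=abc' into a dict (values URL-decoded)."""
    attrs: dict[str, str] = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def index_genes(gff3_text: str) -> dict[str, GeneLocation]:
    """Map each gene ID to its location, using only features of type 'gene'."""
    index: dict[str, GeneLocation] = {}
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # skip malformed lines and non-gene features (mRNA, exon, CDS, ...)
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            index[attrs["ID"]] = GeneLocation(cols[0], int(cols[3]), int(cols[4]), cols[6])
    return index
```

The same GFF3 files are the usual input for annotation tracks in JBrowse, so one annotation source can serve both the search and the browser views.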

Download and Community modules
The Download module allows users to download original and annotated data, and the Community module collects user feedback that helps developers optimize the database. These modules also provide access to tea-related and genome-related databases, which facilitates interaction and collaboration between databases.
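Downloaded genome, coding sequence, and protein files are typically distributed as FASTA. As a hedged sketch, assuming plain or gzip-compressed FASTA files (any file names used with it are hypothetical), the Python reader below streams records and reports the number of sequences and the total number of residues, the kind of record counts summarized in the data statistics table.

```python
from __future__ import annotations

import gzip
from pathlib import Path
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header: str | None = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def fasta_stats(path: str | Path) -> tuple[int, int]:
    """Return (number of records, total sequence length) for a plain or .gz FASTA file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    n_records = total = 0
    with opener(path, "rt") as handle:
        for _, seq in read_fasta(handle):
            n_records += 1
            total += len(seq)
    return n_records, total
```

Streaming line by line keeps memory use proportional to the longest single sequence rather than the whole file, which matters for multi-gigabase genome downloads.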

Data type	Counts	Data size
Tea plant cultivars	6	−
Nuclear genomes¹	7	21.5 Gb
Organellar genomes²	17	3.8 Mb
Coding sequences	271,430	352 Mb
Proteins	271,430	150 Mb
Gene family identification	4,460	25 Mb
GO term	71,996	59 Mb
KEGG	90,893	1.2 Gb
Signal peptide	19,156	13 Mb
SSR	2,363,168	152 Mb
¹ C. sinensis var. sinensis 'Shuchazao' has two nuclear genome sequences. ² C. sinensis var. assamica 'Yunkang 10' has two mitochondrial genome sequences.

Database	Genomes	Tea plant	Assembly level	Di	Tri	Tetra	Penta	Hexa
TeaPGDB	7	CSS 'Biyun'	Chromosome	200 688	106 538	31 272	32 222	22 695
		CSS 'Shuchazao'¹	Chromosome	202 217	108 561	32 822	33 629	23 712
		CSS 'Shuchazao'²	Chromosome	163 949	82 721	28 356	31 275	21 408
		CSS 'Huangdan'	Chromosome	190 146	106 242	31 604	32 562	23 956
		CSS 'Longjing 43'	Chromosome	195 729	109 900	31 620	32 195	21 882
		CSA 'Yunkang 10'	Scaffold	118 758	49 111	17 059	22 058	15 878
		Wild tea plant³	Chromosome	190 468	104 621	30 014	31 930	22 488
TeaMiD	3	CSS 'Shuchazao'⁴	Scaffold	118 777	21 352	17 096	5 183	4 585
		CSA 'Yunkang 10'	Scaffold	163 982	33 223	28 426	7 289	6 091
		CA 'TV-1'⁵	Scaffold	138 689	25 392	18 829	5 720	5 281
Di, Tri, Tetra, Penta and Hexa give the numbers of di- to hexanucleotide SSRs in each genome. ¹ C. sinensis var. sinensis 'Shuchazao' was released by AHAU in 2020. ² C. sinensis var. sinensis 'Shuchazao' was released by TRI in 2020. ³ Wild tea plant (DASZ) was released by HZAU in 2020. ⁴ C. sinensis var. sinensis 'Shuchazao' was released by AHAU in 2018. ⁵ C. assamica L (O). Kuntze 'TV-1' was released by IARI.
