Evolutionary genomics of structural variation in the tea plant, Camellia sinensis

Shuai Chen; Jingping Fang; Yibin Wang; Pengjie Wang; Shengcheng Zhang; Zhenyang Liao; Hong Lu; Xingtan Zhang

doi:10.48130/TP-2022-0002

2022 Volume 1

ARTICLE Open Access

1. Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
2. College of Life Science, Center of Engineering Technology Research for Microalgae Germplasm Improvement of Fujian, Southern Institute of Oceanography, Fujian Normal University, Fuzhou 350117, Fujian, China

Corresponding author: zhangxingtan@caas.cn

Received: 06 April 2022
Accepted: 02 May 2022
Published online: 22 May 2022
Tropical Plants 1, Article number: 2 (2022)

Highlights

Resequencing of 107 tea plant genomes uncovered 44,240 structural variants

Dynamic changes in SVs affect gene function, especially gene expression

A new pipeline was proposed to identify copy number variations associated with artificial selection

Abstract

Structural variants (SVs) are a type of genetic variation that contributes substantially to phenotypic diversity and evolution. Further study of SVs will help clarify how they influence tea quality and stress resistance and will provide new insight for tea plant breeding and genetic research. However, SVs have not been thoroughly characterized in tea plant genomes. Herein, we constructed a large-scale SV map across a population of 107 resequenced genomes, including both ancient and cultivated tea plants. A total of 44,240 high-confidence SVs were identified, comprising 34,124 deletions (DEL), 4,448 duplications (DUP), 2,503 inversions (INV), 544 insertions (INS), and 2,621 translocations (TRA). In total, 12,400 protein-coding genes overlapped with SVs, of which 49.5% were expressed in all five tea tissues. SV-based analysis of phylogenetic relationships and population structure in tea plants showed an evolutionary history consistent with the SNP-based results. We also identified SVs subject to artificial selection and found that genes under domestication were enriched in metabolic pathways involving theanine and purine alkaloids and in the biosynthesis of monoterpenoids, phenylpropanoids, fatty acids, and isoflavonoids, which contribute to traits of agronomic interest in tea plants. In addition, a total of 27 terpene synthase (TPS) family genes were selected during domestication. These results indicate that these SVs could provide extensive genomic information for tea quality improvement.

Keywords
- Structural variants
- Tea plant
- Population selection
- Copy number variants

Supplementary information

Supplemental Table S1 Information and statistics of re-sequenced tea accessions.
Supplemental Table S2 LD decay in each of the geographic groups.
Supplemental Table S3 GO enrichment analysis of SVs.
Supplemental Table S4 KEGG enrichment analysis of cultivated selection genes.
Supplemental Table S5 Statistics of selected SV genes.
Supplemental Table S6 SV-selected genes related to theanine metabolism and purine metabolism.
Supplemental Table S7 KEGG enrichment analysis of CNV genes in four sub-populations.
Supplemental Table S8 The CNV genes related to metabolic pathways of the tea plant.
Supplemental Table S9 FPKM of 27 TPS family genes in domestication of CNVs.
Supplemental Table S10 TPS gene function annotation on Chr13.
Supplemental Fig. S1 Distribution of SVs, gene density and TE density on Chr07.
Supplemental Fig. S2 Interaction frequency, A/B compartment, gene density and LTR density on Chr07. The colour scale represents the normalized interaction matrix (Pearson's correlation coefficient). The y-axis of the A/B compartments represents the eigenvector value of the correlation matrix.
Supplemental Fig. S3 Phylogenetic tree based on SNPs. Accessions are represented in the same color code throughout this figure (blue = wild, red = Mid, light purple = SFJ, teal = NFJ, and yellow = ZJ).
Supplemental Fig. S4 SV-based admixture analysis of tea accessions, including 12 ancient and 94 cultivated tea accessions (K = 2 to 10).
Supplemental Fig. S5 SNP-based admixture analysis of tea accessions, including 12 ancient and 94 cultivated tea accessions (K = 2 to 10).
Supplemental Fig. S6 Decay of linkage disequilibrium (LD) in each of the geographic groups.
Supplemental Fig. S7 GO enrichment analysis (Cellular Component) of the SV-overlapping genes.
Supplemental Fig. S8 GO enrichment analysis (Molecular Function) of the SV-overlapping genes.
Supplemental Fig. S9 GO enrichment analysis (Biological Process) of the SV-overlapping genes.
Supplemental Fig. S10 Domestication analysis of SVs in four sub-populations.
Supplemental Fig. S11 Example of a gene-associated SV that may impact expression. The pink rectangle represents a UDP-glucosyltransferase gene (CsUGT), and the red rectangle represents a 400 bp deletion at the CDS start.

Rights and permissions
Copyright: © 2022 by the author(s). Published by Maximum Academic Press on behalf of Hainan University. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, et al. 2011. The genome of Theobroma cacao. Nature Genetics 43:101−8 doi: 10.1038/ng.736
[2]	Ashihara H, Crozier A. 2001. Caffeine: a well known but little mentioned compound in plant science. Trends in Plant Science 6:407−13 doi: 10.1016/S1360-1385(01)02055-6
[3]	Lu H, Zhang J, Yang Y, Yang X, Xu B, et al. 2016. Earliest tea as evidence for one branch of the Silk Road across the Tibetan Plateau. Scientific Reports 6:18955 doi: 10.1038/srep18955
[4]	Hayat K, Iqbal H, Malik U, Bilal U, Mushtaq S. 2015. Tea and its consumption: benefits and risks. Critical Reviews in Food Science and Nutrition 55:939−54 doi: 10.1080/10408398.2012.678949
[5]	Xia E, Zhang H, Sheng J, Li K, Zhang Q, et al. 2017. The tea tree genome provides insights into tea flavor and independent evolution of caffeine biosynthesis. Molecular Plant 10:866−77 doi: 10.1016/j.molp.2017.04.002
[6]	Wei C, Yang H, Wang S, Zhao J, Liu C, et al. 2018. Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality. PNAS 115:E4151−E4158 doi: 10.1073/pnas.1719622115
[7]	Taniguchi F, Kimura K, Saba T, Ogino A, Yamaguchi S, et al. 2014. Worldwide core collections of tea (Camellia sinensis) based on SSR markers. Tree Genetics & Genomes 10:1555−65 doi: 10.1007/s11295-014-0779-0
[8]	Wang P, Yu J, Jin S, Chen S, Yue C, et al. 2021. Genetic basis of high aroma and stress tolerance in the oolong tea cultivar genome. Horticulture Research 8:107 doi: 10.1038/s41438-021-00542-x
[9]	Wang X, Feng H, Chang Y, Ma C, Wang L, et al. 2020. Population sequencing enhances understanding of tea plant evolution. Nature Communications 11:4447 doi: 10.1038/s41467-020-18228-8
[10]	Xia E, Tong W, Hou Y, An Y, Chen L, et al. 2020. The reference genome of tea plant and resequencing of 81 diverse accessions provide insights into genome evolution and adaptation of tea plants. Molecular Plant 13:1013−26 doi: 10.1016/j.molp.2020.04.010
[11]	Zhang Q, Li W, Li K, Nan H, Shi C, et al. 2020. The chromosome-level reference genome of tea tree unveils recent bursts of non-autonomous LTR retrotransposons in driving genome size evolution. Molecular Plant 13:935−38 doi: 10.1016/j.molp.2020.04.009
[12]	Zhang X, Chen S, Shi L, Gong D, Zhang S, et al. 2021. Haplotype-resolved genome assembly provides insights into evolutionary history of the tea plant Camellia sinensis. Nature Genetics 53:1250−59 doi: 10.1038/s41588-021-00895-y
[13]	Chaisson MJP, Sanders AD, Zhao X, Malhotra A, Porubsky D, et al. 2019. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nature Communications 10:1784 doi: 10.1038/s41467-018-08148-z
[14]	Morin PA, Luikart G, Wayne RK, the SNP workshop group. 2004. SNPs in ecology, evolution and conservation. Trends in Ecology & Evolution 19:208−16 doi: 10.1016/j.tree.2004.01.009
[15]	Wellenreuther M, Mérot C, Berdan E, Bernatchez L. 2019. Going beyond SNPs: The role of structural genomic variants in adaptive evolution and species diversification. Molecular Ecology 28:1203−9 doi: 10.1111/mec.15066
[16]	Li Y, Zhou G, Ma J, Jiang W, Jin L, et al. 2014. De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nature Biotechnology 32:1045−52 doi: 10.1038/nbt.2979
[17]	Tao Y, Zhao X, Mace E, Henry R, Jordan D. 2019. Exploring and exploiting pan-genomics for crop improvement. Molecular Plant 12:156−69 doi: 10.1016/j.molp.2018.12.016
[18]	Zhang Z, Mao L, Chen H, Bu F, Li G, et al. 2015. Genome-wide mapping of structural variations reveals a copy number variant that determines reproductive morphology in cucumber. The Plant Cell 27:1595−604 doi: 10.1105/tpc.114.135848
[19]	Gaut BS, Seymour DK, Liu Q, Zhou Y. 2018. Demography and its effects on genomic variation in crop domestication. Nature Plants 4:512−20 doi: 10.1038/s41477-018-0210-1
[20]	Alonge M, Wang X, Benoit M, Soyk S, Pereira L, et al. 2020. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182:145−161.E23 doi: 10.1016/j.cell.2020.05.021
[21]	Liu Y, Du H, Li P, Shen Y, Peng H, et al. 2020. Pan-genome of wild and cultivated soybeans. Cell 182:162−176.E13 doi: 10.1016/j.cell.2020.05.023
[22]	Tattini L, D'Aurizio R, Magi A. 2015. Detection of genomic structural variants from next-generation sequencing data. Frontiers in Bioengineering and Biotechnology 3:92 doi: 10.3389/fbioe.2015.00092
[23]	Caicedo AL, Williamson SH, Hernandez RD, Boyko A, Fledel-Alon A, et al. 2007. Genome-wide patterns of nucleotide polymorphism in domesticated rice. PLoS Genetics 3:e0030163 doi: 10.1371/journal.pgen.0030163
[24]	Zhu Q, Zheng X, Luo J, Gaut BS, Ge S. 2007. Multilocus analysis of nucleotide variation of Oryza sativa and its wild relatives: severe bottleneck during domestication of rice. Molecular Biology and Evolution 24:875−88 doi: 10.1093/molbev/msm005
[25]	Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, et al. 2005. The effects of artificial selection on the maize genome. Science 308:1310−14 doi: 10.1126/science.1107891
[26]	Doebley JF, Gaut BS, Smith BD. 2006. The molecular genetics of crop domestication. Cell 127:1309−21 doi: 10.1016/j.cell.2006.12.006
[27]	Lu J, Tang T, Tang H, Huang J, Shi S, et al. 2006. The accumulation of deleterious mutations in rice genomes: a hypothesis on the cost of domestication. Trends in Genetics 22:126−31 doi: 10.1016/j.tig.2006.01.004
[28]	Liu Q, Zhou Y, Morrell PL, Gaut BS. 2017. Deleterious variants in Asian rice and the potential cost of domestication. Molecular Biology and Evolution 34:908−24 doi: 10.1093/molbev/msw296
[29]	Wang L, Beissinger TM, Lorant A, Ross-Ibarra C, Ross-Ibarra J, et al. 2017. The interplay of demography and selection during maize domestication and expansion. Genome Biology 18:215 doi: 10.1186/s13059-017-1346-4
[30]	Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, et al. 2018. Accurate detection of complex structural variations using single-molecule sequencing. Nature Methods 15:461−68 doi: 10.1038/s41592-018-0001-7
[31]	Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, et al. 2016. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32:1220−22 doi: 10.1093/bioinformatics/btv710
[32]	Layer RM, Chiang C, Quinlan AR, Hall IM. 2014. LUMPY: a probabilistic framework for structural variant discovery. Genome Biology 15:R84 doi: 10.1186/gb-2014-15-6-r84
[33]	Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, et al. 2012. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28:i333−i339 doi: 10.1093/bioinformatics/bts378
[34]	Jeffares DC, Jolly C, Hoti M, Speed D, Shaw L, et al. 2017. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nature Communications 8:14061 doi: 10.1038/ncomms14061
[35]	Shcherban AB. 2015. Repetitive DNA sequences in plant genomes. Russian Journal of Genetics: Applied Research 5:159−67 doi: 10.1134/S2079059715030168
[36]	Eagen KP. 2018. Principles of chromosome architecture revealed by Hi-C. Trends in Biochemical Sciences 43:469−78 doi: 10.1016/j.tibs.2018.03.006
[37]	Dong P, Tu X, Chu P, Lü P, Zhu N, et al. 2017. 3D chromatin architecture of large plant genomes determined by local A/B compartments. Molecular Plant 10:1497−509 doi: 10.1016/j.molp.2017.11.005
[38]	Su X, Wang W, Xia T, Gao L, Shen G, et al. 2018. Characterization of a heat responsive UDP: Flavonoid glucosyltransferase gene in tea plant (Camellia sinensis). PLoS One 13:e0207212 doi: 10.1371/journal.pone.0207212
[39]	McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. 2010. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20:1297−303 doi: 10.1101/gr.107524.110
[40]	Alexander DH, Lange K. 2011. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics 12:246 doi: 10.1186/1471-2105-12-246
[41]	Zhang W, Rong J, Wei C, Gao L, Chen J, et al. 2018. Domestication origin and spread of cultivated tea plants. Biodiversity Science 26:357−72 doi: 10.17520/biods.2018006
[42]	Yoon S, Xuan Z, Makarov V, Ye K, Sebat J. 2009. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Research 19:1586−92 doi: 10.1101/gr.092981.109
[43]	Zhou Z, Jiang Y, Wang Z, Gou Z, Lyu J, et al. 2015. Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nature Biotechnology 33:408−14 doi: 10.1038/nbt.3096
[44]	Song C, Härtl K, McGraphery K, Hoffmann T, Schwab W. 2018. Attractive but Toxic: Emerging roles of glycosidically bound volatiles and glycosyltransferases involved in their formation. Molecular Plant 11:1225−36 doi: 10.1016/j.molp.2018.09.001
[45]	Yang Z, Baldermann S, Watanabe N. 2013. Recent studies of the volatile compounds in tea. Food Research International 53:585−99 doi: 10.1016/j.foodres.2013.02.011
[46]	Larson G, Piperno DR, Allaby RG, Purugganan MD, Andersson L, et al. 2014. Current perspectives and the future of domestication studies. PNAS 111:6139−6146 doi: 10.1073/pnas.1323964111
[47]	Kou Y, Liao Y, Toivainen T, Lv Y, Tian X, et al. 2020. Evolutionary genomics of structural variation in Asian rice (Oryza sativa) domestication. Molecular Biology and Evolution 37:3507−24 doi: 10.1093/molbev/msaa185
[48]	Zhang W, Zhang Y, Qiu H, Guo Y, Wan H, et al. 2020. Genome assembly of wild tea tree DASZ reveals pedigree and selection history of tea varieties. Nature Communications 11:3719 doi: 10.1038/s41467-020-17498-6
[49]	Deng W, Ogita S, Ashihara H. 2008. Biosynthesis of theanine (γ-ethylamino-l-glutamic acid) in seedlings of Camellia sinensis. Phytochemistry Letters 1:115−19 doi: 10.1016/j.phytol.2008.06.002
[50]	Kato M, Ashihara H. 2008. Biosynthesis and catabolism of purine alkaloids in Camellia plants. Natural Product Communications 3:1934578X0800300 doi: 10.1177/1934578x0800300907
[51]	Suzuki T. 1972. The participation of S-adenosylmethionine in the biosynthesis of caffeine in the tea plant. FEBS Letters 24:18−20 doi: 10.1016/0014-5793(72)80815-9
[52]	Ashihara H, Yokota T, Crozier A. Purine Alkaloids, Cytokinins, and Purine-Like Neurotoxin Alkaloids. In Natural Products, eds. Ramawat KG, Mérillon JM. Heidelberg: Springer Berlin Heidelberg. pp. 953–75 https://doi.org/10.1007/978-3-642-22144-6_32.
[53]	Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114−20 doi: 10.1093/bioinformatics/btu170
[54]	Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754−60 doi: 10.1093/bioinformatics/btp324
[55]	Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, et al. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078−79 doi: 10.1093/bioinformatics/btp352
[56]	Chiang C, Layer RM, Faust GG, Lindberg MR, Rose DB, et al. 2015. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nature Methods 12:966−68 doi: 10.1038/nmeth.3505
[57]	Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312−13 doi: 10.1093/bioinformatics/btu033
[58]	Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Research 19:1655−64 doi: 10.1101/gr.094052.109
[59]	Servant N, Varoquaux N, Lajoie BR, Viara E, Chen C, et al. 2015. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biology 16:259 doi: 10.1186/s13059-015-0831-x
[60]	Weir BS, Cockerham CC. 1984. Estimating F-Statistics for the Analysis of Population Structure. Evolution 38:1358−70 doi: 10.1111/j.1558-5646.1984.tb05657.x
[61]	Danecek P, Auton A, Abecasis G, Albers CA, Banks E, et al. 2011. The variant call format and VCFtools. Bioinformatics 27:2156−58 doi: 10.1093/bioinformatics/btr330
[62]	Pedersen BS, Quinlan AR. 2018. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34:867−68 doi: 10.1093/bioinformatics/btx699
[63]	Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9:357−59 doi: 10.1038/nmeth.1923
[64]	Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323 doi: 10.1186/1471-2105-12-323

About this article

Cite this article

Chen S, Fang J, Wang Y, Wang P, Zhang S, et al. 2022. Evolutionary genomics of structural variation in the tea plant, Camellia sinensis. Tropical Plants 1:2 doi: 10.48130/TP-2022-0002

INTRODUCTION

Tea [Camellia sinensis (L.) O. Kuntze] is one of the most popular non-alcoholic caffeine-containing beverages worldwide, with outstanding medicinal and cultural properties^[1,2]. The earliest credible archaeological evidence of tea being consumed as a beverage dates back to 59 BCE during the Western Han Dynasty^[3]. In addition to its attractive aroma and mellow taste, tea possesses a plethora of characteristic secondary metabolites that are beneficial for human health and largely determine tea flavor, such as catechins, theanine, caffeine, polysaccharides, and minerals^[4−6]. Nowadays, commercial tea plants are widely planted in China and worldwide^[7]. The goal of tea breeding and improvement research has always been to develop high-quality tea germplasm resources. The Chinese tea type Camellia sinensis var. sinensis (CSS) and the Assam tea type Camellia sinensis var. assamica (CSA) are the two major groups of cultivated tea. Several high-quality tea plant genomes, including CSS and CSA, were recently assembled, and the related population genetic diversity landscapes have been thoroughly studied^[5,6,8−12], providing a theoretical foundation for future research to understand and utilize the genomic basis of tea germplasm diversity.

Genetic variants of various types can influence organismal phenotypes, including those related to human diseases^[13]. They also play an essential role in the diversity and evolution of species. Early studies suggested that single nucleotide polymorphisms (SNPs) are the main contributors to biological diversity^[13,14]. However, extensive evidence from genetics and molecular biology has demonstrated that structural variants (SVs) are more common than SNPs^[15] and can cause major phenotypic variations affecting agronomic traits^[16−18]. Approximately one-third of reported crop phenotype changes are caused by structural variations^[19]. A recent study of panSVs based on 100 tomato accessions showed that multiple SVs could change gene dosage and expression levels, thus modifying fruit flavor, size, and production^[20]. Soybean pan-genome studies suggested that a 1.4 kb deletion in the promoter region of a Fe²⁺/Zn²⁺ regulated transporter gene (SoyZH13_14G179600) decreased the expression of this transporter, and that the genetic diversity of this gene is responsible for the divergent iron-uptake ability among soybean accessions^[21].

Advancements in genomic technologies and detection methodologies have allowed us to study the effect of chromosome-level structural variations on agronomic traits through population-scale plant genomics and genetic studies. SNP-based population genetic analysis is the most prevalent method in current studies of genetic variation and domestication in the tea plant. For instance, population genetic relationships among different varieties and geographically distinct populations have been established through SNP-based strategies, providing meaningful conclusions about the origin, domestication, and quality characteristics of the tea plant^[9,10,12]. However, SNP-based studies can locate only a fraction of trait-associated genetic variation, whereas SVs, which can have a greater impact on biological processes and traits, have been largely neglected in studies of tea plants.

Structural variants usually refer to large changes in chromosome structure, defined as more than 50 bp in length, including deletions, insertions, duplications, inversions, and translocations^[22]. Copy number variants (CNVs) are specific SVs that contribute to genetic variations underlying important domestication traits. A copy number variation arising from a recent 30.2 kb duplication in the cucumber genome was found to involve four Female-determined genes, giving rise to gynoecious cucumber plants that bear only female flowers and set fruit at almost every node^[18]. Some crop populations, such as rice^[23,24] and maize^[25], underwent a strong bottleneck during domestication that decreased genetic diversity and reshaped the frequencies of genetic variants^[26]. Inferences from CNV analyses in common crops with a history of bottleneck effects may therefore differ from those based on the tea plant. Furthermore, some evidence from rice^[27,28] and maize^[29] suggested that these bottlenecks contribute to domestication. These studies provided a theoretical and practical basis for the genome-wide characterization of CNVs among diverse tea populations. In this study, large-scale resequencing data from diverse elite germplasm accessions of the tea plant were analyzed. This resequenced dataset, combined with the available draft tea genomes, represents a valuable resource for discovering causal structural variations underlying important traits associated with tea quality and diversification. Our goal is to fill a major gap in our knowledge of tea genome diversity and trait-influencing mutations by investigating population-scale SVs and CNVs in ancient and cultivated tea populations. We also compare the population genetic differences between SV-based and SNP-based analyses.

To do so, we analyzed population genomic data of 107 high-coverage resequenced tea individuals from publicly available genomic resources, which cover most of the tea-growing areas. In this study, we mainly focus on three questions: (1) whether the population genetic variations inferred from SVs and from SNPs are consistent; (2) whether SVs can provide insight into domestication and selected genomic regions beyond what SNPs reveal; and (3) whether the CNVs identified across the whole genome could influence metabolic pathways associated with agriculturally important traits.
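As a small illustration of the SV definition given earlier in this section (variants longer than 50 bp, of five types), the following Python sketch shows how candidate calls might be screened. The `Variant` record, the `is_structural_variant` helper, and the choice to exempt translocations from the length check are assumptions made for illustration only; they are not the filtering procedure used in this study.

```python
from dataclasses import dataclass
from typing import Optional

# SV classes named in the text; the short labels follow the Abstract.
SV_TYPES = {"DEL", "INS", "DUP", "INV", "TRA"}
MIN_SV_LENGTH_BP = 50  # "more than 50 bp" per the definition in the text


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record for one variant call."""
    sv_type: str
    length_bp: Optional[int]  # None when no length applies (e.g. translocations)


def is_structural_variant(variant: Variant) -> bool:
    """Return True if the call matches the SV definition quoted in the text."""
    if variant.sv_type not in SV_TYPES:
        return False
    if variant.sv_type == "TRA":
        # Assumption: translocations join distant loci and carry no length.
        return True
    if variant.length_bp is None:
        return False
    # Deletions are often reported with negative lengths, so compare magnitudes.
    return abs(variant.length_bp) > MIN_SV_LENGTH_BP


# Invented example calls, not data from the study.
calls = [Variant("DEL", -1400), Variant("INS", 30), Variant("TRA", None)]
kept = [v for v in calls if is_structural_variant(v)]
print(len(kept))
```

Comparing absolute lengths keeps the check independent of whether a caller reports deletions with a negative sign.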

DISCUSSION

Structural variants (SVs) are becoming a frontier of plant population genomics and explain a substantial amount of phenotypic variation. Studies of crops have revealed interesting general patterns of crop domestication^[46]. To date, SVs arising during domestication have been investigated in common crops such as rice^[47] and tomato^[20], where some SVs appear to have been under artificial selection and are associated with cultivation and improvement. However, unlike rice, tomato, and other annual plants, tea plants are clonally propagated, which leads to the accumulation of recessive deleterious mutations^[12] and has also increased the number of SVs in domesticated tea. Our previous study found evidence of parallel domestication in CSA and CSS, in which domestication traits were likely targets of artificial selection^[12]. Although numerous studies on the population genetics of tea plants have been published^[10,48], most of them are based on SNPs. No systematic study of SVs exists in this field, and the role of structural variants in the domestication of tea plants, in particular, remains unexplored.

This study reports a genome-wide structural variation map built from large-scale population resequencing data in tea plants. In total, 44,240 highly curated SVs were detected and found to be unevenly distributed across chromosomes; in particular, they are relatively concentrated on Chr01, which may be caused by a large number of tandem repeats in this region. In addition, a 55 Mb SV-sparse segment was found on Chr07; based on Hi-C data, we speculate that it corresponds to a transcriptionally repressed region. We also found that most SVs are deletions, indicating that deletions are very common during plant genome evolution. Phylogenetic relationship and population structure analyses showed largely consistent evolutionary histories for SNPs and SVs.
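The predominance of deletions can be checked directly from the SV counts reported in the Abstract. The short Python sketch below tallies those counts and computes each type's share of the total (deletions come to roughly 77%); the variable and function names are hypothetical and serve only as an arithmetic check.

```python
# SV counts per type as reported in the Abstract.
sv_counts = {"DEL": 34124, "DUP": 4448, "INV": 2503, "INS": 544, "TRA": 2621}


def sv_fractions(counts):
    """Return each SV type's share of the total number of SVs."""
    total = sum(counts.values())
    return {sv_type: n / total for sv_type, n in counts.items()}


total_svs = sum(sv_counts.values())
fractions = sv_fractions(sv_counts)

print(f"total SVs: {total_svs}")
# List the types from most to least frequent.
for sv_type, share in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{sv_type}: {share:.1%}")
```

The five categories sum to the 44,240 SVs stated in the text, so the reported breakdown is internally consistent.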

We further performed domestication analysis of tea plants based on the SVs, where F_ST analyses indicated that some of the selected genes were detected with SVs only. We believe that the regions identified by SV divergence between ancient and cultivated tea plants represent selection signals. Some of the genes potentially affecting agronomic traits were identified only in the SV domestication analysis. Some of the selected genes were related to critical metabolic pathways of tea trees in different sub-populations. The genes under domestication that we detected are important for theanine synthesis and flavor formation in tea plants, which contain high concentrations of purine alkaloids; the caffeine pathway is one of their major biosynthetic routes^[49−51]. Two hypotheses have been proposed for the ecological role of purine alkaloids such as caffeine: the chemical defense theory and the allelopathic function theory^[52]. Transgenic assays with caffeine-producing tobacco have shown that caffeine plays an essential role in plant defense against fungal and insect pests. Building on this practical basis, modifying the related genes of the tea plant through genetic engineering could be of great significance for improving its resistance to insect pests. In addition, genes selected through CNVs were associated with the catechin synthesis pathway and with related pathways such as monoterpenoid, phenylpropanoid, and isoflavonoid biosynthesis, and they included 27 TPS family genes selected during domestication. Previous evidence has demonstrated that TPS genes with no expression under normal conditions show substantially increased expression in response to attack by Ectropis obliqua, one of the most destructive pests of tea plants^[10]. Some of the SV-genes or CNV-genes we detected thus contribute to traits of agronomic interest.
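To make the idea of SV divergence between ancient and cultivated populations concrete, the Python sketch below computes a per-site F_ST for a biallelic SV from its allele frequency in each group. It uses Wright's heterozygosity-based form, F_ST = (H_T − H_S) / H_T, with equal population weights, which is a simplification and not the Weir and Cockerham estimator^[60] listed in the references. The function names and the example frequencies are invented for illustration and are not results from this study.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic site."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p_ancient: float, p_cultivated: float) -> float:
    """Wright's F_ST for one biallelic SV in two equally weighted populations.

    Assumption: a simplified sketch, not the Weir-Cockerham estimator.
    """
    p_mean = (p_ancient + p_cultivated) / 2.0
    h_total = expected_heterozygosity(p_mean)
    if h_total == 0.0:
        # The site is fixed for the same allele in both groups: no divergence.
        return 0.0
    h_within = (expected_heterozygosity(p_ancient)
                + expected_heterozygosity(p_cultivated)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical SV allele frequencies as (ancient, cultivated) pairs.
example_sites = {"sv_a": (0.10, 0.90), "sv_b": (0.45, 0.55)}
for name, (p_anc, p_cul) in example_sites.items():
    print(name, round(fst_two_populations(p_anc, p_cul), 3))
```

Sites whose frequencies differ strongly between the two groups yield values near 1, which is the kind of signal a divergence scan ranks highest; a production analysis would also correct for sample size.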

In conclusion, this study improves our understanding of how SVs act on tea domestication, and the comprehensive SV set provides a reference for the subsequent development of genetic markers and future breeding strategies.
