Genome resequencing reveals an independently originated <i>Camellia sinensis</i> variety – Hainan tea

Dazhong Guo; Dongliang Li; Zijun Wang; Dawei Li; Yingyi Zhou; Guisheng Xiang; Wenting Zhang; Weibin Wang; Zongzhuang Fang; Tingting Hao; Daojun Zheng; Yahui Lei; Ling Yang; Wei Zhang; Shi Tang; Lijuan Zheng; Yuli Cao; Yewei Huang; Shengchang Duan

doi:10.48130/abd-0024-0003

2024 Volume 1

ARTICLE Open Access

1. State Key Laboratory of Biological Big Data in Yunnan Province, Yunnan Agricultural University, Kunming 650201, China
2. Hainan Academy of Agricultural Sciences, Haikou 571100, China
3. State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China
4. College of Science, Yunnan Agricultural University, Kunming 650201, China
5. Hainan Agricultural Reclamation Wuzhishan Tea Industry Group Co., Ltd., Haikou 571101, China
6. Hainan Natural Tea Co., Ltd., Baisha 572812, China
7. Wuzhishan Yexian Bioscience & Technology Co., Ltd., Wuzhishan 572215, China
8. Hainan Qiongzhong Xinwei Rainforest Tea Industry Co., Ltd., Qiongzhong 572999, China
9. Yunnan Research Institute for Local Plateau Agriculture and Industry, Yunnan Agricultural University, Kunming 650201, China
10. College of Plant Protection, Yunnan Agricultural University, Kunming 650201, China
^#Authors contributed equally: Dazhong Guo, Dongliang Li

Corresponding authors: lichuangyewei100@163.com (Yewei Huang); duanshengchang@163.com (Shengchang Duan)

Received: 04 March 2024
Revised: 27 March 2024
Accepted: 11 April 2024
Published online: 17 May 2024
Agrobiodiversity 2024, 1(1): 3−12

Abstract

Tea, originating in China over 3,000 years ago, has transitioned from a medicinal herb to a widely consumed beverage. Despite considerable research focusing on tea plants in southwestern China, little attention has been paid to those on Hainan Island. The notable resemblance between Hainan tea and C. sinensis var. assamica, alongside the unique geographical and climatic conditions of Hainan Island, has presented significant challenges for taxonomic and genetic investigations concerning Hainan tea. Our study bridged this gap by collecting 500 samples from Hainan Province and employing whole-genome resequencing to examine interspecific differences between Hainan tea and cultivated varieties. The findings confirmed the distinct taxonomic position of Hainan tea within Camellia sinensis, providing valuable insights for resource conservation and molecular breeding. Furthermore, our methodology offers a framework for investigating the origin, domestication, and genetic diversity of other species native to Hainan Island.
Keywords: Camellia sinensis, SNPs, Whole genome resequencing, Population structure, Hainan

Supplementary information

Supplemental Table S1 The relevant information of 500 Hainan tea samples used in this study.
Supplemental Table S2 Quantity table of SNPs after different hard filter.
Supplemental Table S3 Information on the global Assamica, global Sinensis, Other, Bloom Camellia, Oil seed, and Wild samples used in this study.
Supplemental Table S4 f3 Statistical analysis results.
Supplemental Table S5 D Statistical analysis results.
Supplemental Table S6 Hainan tea samples used in KING software analysis of genetic relationships.
Supplemental Fig. S1 The density plot of the sample's mapping rate, X-coordinates indicate the Mapping rate and Y-coordinates indicate the density.
Supplemental Fig. S2 Density distribution plot of SNPs hard filter.
Supplemental Fig. S3 Group rootless trees constructed by OLMS, LMS, global Assamica and global Sinensis. KM6 (C. cuspidata) was selected as the outgroup.
Supplemental Fig. S4 Cross-validation error for Fig. 3a. The x-axis represents the K value, and the y-axis represents the cross-validation error. The smallest cross-validation error occurs at K = 8.
Supplemental Fig. S5 Principal component analysis of Hainan tea with global Assamica, global Sinensis, wild Camellia sinensis, Bloom Camellia, Oil seed Camellia, and other Camellia, showing principal components 1 and 3. Each sample is represented by a small circle whose color indicates its group, as shown in the legend on the right.
Supplemental Fig. S6 (A,B) Optimal m-value diagrams for OptM judgment. The horizontal axis is the m value, and the vertical axis is the likelihood value. The optimal m value in the figure is 1, which means that the historical relationship of samples can be best explained by 1 migration.

Rights and permissions
Copyright: © 2024 by the author(s). Published by Maximum Academic Press on behalf of Yunnan Agricultural University. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1] Xia EH, Zhang HB, Sheng J, Li K, Zhang QJ, et al. 2017. The tea tree genome provides insights into tea flavor and independent evolution of caffeine biosynthesis. Molecular Plant 10:866−77 doi: 10.1016/j.molp.2017.04.002
[2] Wei C, Yang H, Wang S, Zhao J, Liu C, et al. 2018. Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality. Proceedings of the National Academy of Sciences of the United States of America 115:E4151−E4158 doi: 10.1073/pnas.1719622115
[3] Henry BC. 1886. Ling-Nam: or, interior views of southern China, including explorations in the hitherto untraversed island of Hainan. London: SW Partridge. 511 pp.
[4] Wambulwa MC, Meegahakumbura MK, Kamunya S, Wachira FN. 2021. From the wild to the cup: tracking footprints of the tea species in time and space. Frontiers in Nutrition 8:706770 doi: 10.3389/fnut.2021.706770
[5] Li MM, Meegahakumbura MK, Wambulwa MC, Burgess KS, Möller M, et al. 2023. Genetic analyses of ancient tea trees provide insights into the breeding history and dissemination of Chinese Assam tea (Camellia sinensis var. assamica). Plant Diversity 46:229−37 doi: 10.1016/j.pld.2023.06.002
[6] Zhang W, Zhang Y, Qiu H, Guo Y, Wan H, et al. 2020. Genome assembly of wild tea tree DASZ reveals pedigree and selection history of tea varieties. Nature Communications 11:3719 doi: 10.1038/s41467-020-17498-6
[7] Zhang X, Chen S, Shi L, Gong D, Zhang S, et al. 2021. Haplotype-resolved genome assembly provides insights into evolutionary history of the tea plant Camellia sinensis. Nature Genetics 53:1250−59 doi: 10.1038/s41588-021-00895-y
[8] Wang X, Feng H, Chang Y, Ma C, Wang L, et al. 2020. Population sequencing enhances understanding of tea plant evolution. Nature Communications 11:4447 doi: 10.1038/s41467-020-18228-8
[9] Zhou Y, He W, He Y, Chen Q, Gao Y, et al. 2023. Formation of 8-hydroxylinalool in tea plant Camellia sinensis var. Assamica ‘Hainan dayezhong’. Food Chemistry: Molecular Sciences 6:100173 doi: 10.1016/j.fochms.2023.100173
[10] Huang H, Shi C, Liu Y, Mao SY, Gao LZ. 2014. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evolutionary Biology 14:151 doi: 10.1186/1471-2148-14-151
[11] Whittaker RJ, Fernández-Palacios JM, Matthews TJ, Borregaard MK, Triantis KA. 2017. Island biogeography: Taking the long view of nature’s laboratories. Science 357:eaam8326 doi: 10.1126/science.aam8326
[12] Zhou M, Liu J, Liang Y, Li D. 2017. Distribution of Holttumochloa (Poaceae: Bambusoideae) in China with description of a new species revealed by morphological and molecular evidence. Plant Diversity 39:135−39 doi: 10.1016/j.pld.2017.05.001
[13] Tian X, Wang Q, Zhou Y. 2018. Euphorbia Section Hainanensis (Euphorbiaceae), a New Section Endemic to the Hainan Island of China From Biogeographical, Karyological, and Phenotypical Evidence. Frontiers in Plant Science 9:660 doi: 10.3389/fpls.2018.00660
[14] Wang XH, Li J, Zhang LM, He ZW, Mei QM, et al. 2019. Population Differentiation and Demographic History of the Cycas taiwaniana Complex (Cycadaceae) Endemic to South China as Indicated by DNA Sequences and Microsatellite Markers. Frontiers in Genetics 10:1238 doi: 10.3389/fgene.2019.01238
[15] Li X, Shen Z, Ma C, Yang L, Duan S, et al. 2023. Teabase: A comprehensive omics database of Camellia. Plant Communications 4:100664 doi: 10.1016/j.xplc.2023.100664
[16] Jiang H, Long W, Zhang H, Mi C, Zhou T, et al. 2019. Genetic diversity and genetic structure of Decalobanthus boisianus in Hainan Island, China. Ecology and Evolution 9:5362−71 doi: 10.1002/ece3.5127
[17] Darwin C. 1859. On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. London: John Murray. https://doi.org/10.5962/bhl.title.87938
[18] Lussu M, Marignani M, Lai R, Loi MC, Cogoni A, et al. 2020. A Synopsis of Sardinian Studies: Why Is it Important to Work on Island Orchids? Plants 9:853 doi: 10.3390/plants9070853
[19] Nazir MF, He S, Ahmed H, Sarfraz Z, Jia Y, et al. 2021. Genomic insight into the divergence and adaptive potential of a forgotten landrace G. hirsutum L. purpurascens. Journal of Genetics and Genomics 48:473−84 doi: 10.1016/j.jgg.2021.04.009
[20] Lynch M, Ackerman MS, Gout JF, Long H, Sung W, et al. 2016. Genetic drift, selection and the evolution of the mutation rate. Nature Reviews Genetics 17:704−14 doi: 10.1038/nrg.2016.104
[21] Su H, Qu LJ, He K, Zhang Z, Wang J, et al. 2003. The Great Wall of China: a physical barrier to gene flow? Heredity 90:212−19 doi: 10.1038/sj.hdy.6800237
[22] Wu LX, Xu HY, Jian SG, Gong X, Feng XY. 2022. Geographic factors and climatic fluctuation drive the genetic structure and demographic history of Cycas taiwaniana (Cycadaceae), an endemic endangered species to Hainan Island in China. Ecology and Evolution 12:e9508 doi: 10.1002/ece3.9508
[23] Wang N, Liang B, Wang J, Yeh CF, Liu Y, et al. 2016. Incipient speciation with gene flow on a continental island: Species delimitation of the Hainan Hwamei (Leucodioptron canorum owstoni, Passeriformes, Aves). Molecular Phylogenetics and Evolution 102:62−73 doi: 10.1016/j.ympev.2016.05.022
[24] Wang C, Ma X, Ren M, Tang L. 2020. Genetic diversity and population structure in the endangered tree Hopea hainanensis (Dipterocarpaceae) on Hainan Island, China. PLoS One 15:e0241452 doi: 10.1371/journal.pone.0241452
[25] Gu S, Yan YR, Yi MR, Luo ZS, Wen H, et al. 2022. Genetic pattern and demographic history of cutlassfish (Trichiurus nanhaiensis) in South China Sea by the influence of Pleistocene climatic oscillations. Scientific Reports 12:14716 doi: 10.1038/s41598-022-18861-x
[26] Amos W, Harwood J. 1998. Factors affecting levels of genetic diversity in natural populations. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences 353:177−86 doi: 10.1098/rstb.1998.0200
[27] Kremen C, Merenlender AM. 2018. Landscapes that work for biodiversity and people. Science 362:eaau6020 doi: 10.1126/science.aau6020
[28] Goodall-Copestake WP, Tarling GA, Murphy EJ. 2012. On the comparison of population-level estimates of haplotype and nucleotide diversity: a case study using the gene cox1 in animals. Heredity 109:50−6 doi: 10.1038/hdy.2012.12
[29] Salgotra RK, Chauhan BS. 2023. Genetic diversity, conservation, and utilization of plant genetic resources. Genes 14:174 doi: 10.3390/genes14010174
[30] Warschefsky E, Penmetsa RV, Cook DR, von Wettberg EJB. 2014. Back to the wilds: tapping evolutionary adaptations for resilient crops through systematic hybridization with crop wild relatives. American Journal of Botany 101:1791−800 doi: 10.3732/ajb.1400116
[31] Niu S, Song Q, Koiwa H, Qiao D, Zhao D, et al. 2019. Genetic diversity, linkage disequilibrium, and population structure analysis of the tea plant (Camellia sinensis) from an origin center, Guizhou plateau, using genome-wide SNPs developed by genotyping-by-sequencing. BMC Plant Biology 19:328 doi: 10.1186/s12870-019-1917-5
[32] Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884−i890 doi: 10.1093/bioinformatics/bty560
[33] Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754−60 doi: 10.1093/bioinformatics/btp324
[34] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, et al. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078−9 doi: 10.1093/bioinformatics/btp352
[35] McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20:1297−303 doi: 10.1101/gr.107524.110
[36] Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research 38:e164 doi: 10.1093/nar/gkq603
[37] Lee T-H, Guo H, Wang X, Kim C, Paterson AH. 2014. SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data. BMC Genomics 15:162 doi: 10.1186/1471-2164-15-162
[38] Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics 81:559−75 doi: 10.1086/519795
[39] Yang J, Lee SH, Goddard ME, Visscher PM. 2011. GCTA: a tool for genome-wide complex trait analysis. American Journal of Human Genetics 88:76−82 doi: 10.1016/j.ajhg.2010.11.011
[40] Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Research 19:1655−64 doi: 10.1101/gr.094052.109
[41] Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, et al. 2010. Robust relationship inference in genome-wide association studies. Bioinformatics 26:2867−73 doi: 10.1093/bioinformatics/btq559
[42] Pickrell JK, Pritchard JK. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genetics 8:e1002967 doi: 10.1371/journal.pgen.1002967
[43] Fitak RR. 2021. OptM: estimating the optimal number of migration edges on population trees using Treemix. Biology Methods and Protocols 6:bpab017 doi: 10.1093/biomethods/bpab017
[44] Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, et al. 2012. Ancient admixture in human history. Genetics 192:1065−93 doi: 10.1534/genetics.112.145037
[45] Malinsky M, Matschiner M, Svardal H. 2021. Dsuite - Fast D-statistics and related admixture evidence from VCF files. Molecular Ecology Resources 21:584−95 doi: 10.1111/1755-0998.13265
[46] Schrempf D, Minh BQ, De Maio N, von Haeseler A, Kosiol C. 2016. Reversible polymorphism-aware phylogenetic models and their application to tree inference. Journal of Theoretical Biology 407:362−70 doi: 10.1016/j.jtbi.2016.07.042
[47] Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32:268−74 doi: 10.1093/molbev/msu300

About this article

Cite this article

Guo D, Li D, Wang Z, Li D, Zhou Y, et al. 2024. Genome resequencing reveals an independently originated Camellia sinensis variety – Hainan tea. Agrobiodiversity 1(1): 3−12 doi: 10.48130/abd-0024-0003

Introduction

Tea (Camellia sinensis (L.) O. Kuntze) stands as China’s earliest documented tree crop, boasting a domestication history spanning over 3,000 years. Initially employed as a medicinal herb with roots dating back nearly 5,000 years, it later evolved into a beverage widely embraced for consumption^[1]. On a global scale, cultivated tea plants are classified into two primary groups: C. sinensis var. sinensis (CSS) and C. sinensis var. assamica (CSA)^[2].

Hainan Island, positioned in the northern part of the South China Sea, has a rich history of tea plant cultivation and extensive planting areas. There were reports of the abundant tea plant resources on Hainan Island at the end of the Qing Dynasty. For instance, the American missionary and botanist Benjamin Couch Henry uncovered a significant number of wild tea trees during his extensive exploration of the Li ethnic group area in Hainan, confirming the abundance of ancient tea tree resources on the island^[3]. As the Yunnan-Guizhou Plateau is widely recognized as a potential geographical origin of tea^[4−6], most studies on tea plant population genomics encompass samples from southwestern China, particularly CSA varieties^{[1, 6−8]}, leaving research on tea plants in Hainan Island relatively sparse. The self-incompatibility of tea trees results in high offspring heterozygosity, and the abundant wild tea plant germplasm on the island provides a wealth of genetic variation, laying the groundwork for cultivating new varieties with desirable traits^[7]. Despite Hainan Island’s abundance of tea resources, fully comprehending the genetic resources of tea plants there poses a challenge due to its unique climate and geographical environment. Hence, a genome-wide investigation into the genetic diversity of Hainan tea is imperative for a comprehensive understanding of the genetic resource background of Hainan tea.

It is noteworthy that the tea plant species on the island closely resembles CSA and is referred to as ‘Hainan dayezhong’^[9]. However, evidence is insufficient to conclusively determine whether the Hainan dayezhong belongs to CSA or not. The classification of Hainan tea presents a significant challenge for several reasons: Firstly, C. sinensis plants are prone to hybridization between different species, posing a challenge in accurately classifying various hybrid progenies. Secondly, numerous morphological characteristics of tea plants resemble each other, complicating precise taxonomic delineations^[10]. Lastly, traditional classification of tea plants primarily relies on morphological characteristics, which may sometimes conflict with the latest molecular-based classification results^[8]. Despite Hainan tea’s identification by the National Crop Variety Approval Committee in 1985, its taxonomic status within the Camellia genus on Hainan Island remains unclear due to the absence of support from modern genomic research data.

Islands, as an ideal system for studying the effects of geographical isolation and long-distance diffusion, offer valuable insights into species evolution, encompassing phenomena such as adaptive radiation and speciation^[11]. Previous studies have documented the discovery of several new plant species on Hainan Island, including Holttumochloa^[12], Euphorbia^[13], Cycadaceae^[14], among others. Moreover, advancements in whole-genome resequencing technology have confirmed the independent evolutionary histories and parallel domestication processes of CSS and CSA^{[7, 8]}. Building upon these findings, our hypothesis suggests that tea trees on Hainan Island may constitute a distinct species separated from CSS and CSA, and that Hainan tea has undergone an independent evolutionary trajectory on the island.

To provide molecular evidence on the genomic divergence of Hainan tea from CSS and CSA and on its relationship with them, and to elucidate its genetic background, we collected 500 samples of Hainan tea from the Baisha, Qiongzhong, Wuzhishan, and Ledong regions of Hainan Province, China. Using whole-genome resequencing with Yunkang 10 as the reference genome, we identified SNPs in the Hainan tea samples and constructed a phylogenetic tree that included both cultivated tea and Hainan tea. We then analyzed population structure and kinship in detail to characterize the population structure and genetic diversity of Hainan tea and to reveal the phylogenetic relationships between Hainan tea and global Camellia sinensis varieties. This study provides robust genomic data and further corroborates the independent status of Hainan tea within the taxonomy of Camellia sinensis. These findings also offer a scientific foundation for the conservation of tea germplasm resources and for molecular breeding on Hainan Island. Furthermore, the methods used here may inform analyses of the origin, domestication, and genetic diversity of other species on Hainan Island.

Discussion

Although Hainan Island is rich in wild tea tree resources and possesses vast plantation areas of rainforest tea trees, tea tree resources have not yet been comprehensively investigated and fully developed. In this study, we selected a large number of ancient tea tree samples from the rainforest area, and analyzed them by whole genome resequencing, obtaining 32,334,340 SNPs. This dataset is the most extensive resequencing dataset of Hainan tea samples reported so far.

Classifying Camellia species on the basis of traditional taxonomy is very challenging^[8]. Hainan dayezhong, a tea plant unique to Hainan, has so far lacked support from genomic data, so its taxonomic status, including whether it belongs to CSA, has remained unresolved. We analyzed the population relationship between Hainan tea and globally cultivated tea trees based on resequencing data to clarify the status of Hainan tea in Camellia from a genomic perspective. The phylogenetic tree of Hainan tea and globally cultivated tea trees shows that Hainan tea belongs to neither CSS nor CSA, but instead forms an independent branch and clusters into a single taxon. Within this Hainan tea cluster, the samples from the LMS group formed distinct geographic subgroups, whereas the samples from the OLMS group did not appear to be geographically clustered (Fig. 2). This may be because samples from the LMS population were collected in the Limu Mountain Rainforest Reserve, which is relatively undisturbed by human activities. In contrast, other areas experience more human activity, which may have mixed the genetic backgrounds of Hainan tea across multiple regions^[16].

Although the Wuzhishan region lies within a tropical rainforest reserve, the Qiongzhong County Record indicates that the state actively promoted tea planting in the region in the mid-1990s and introduced CSA varieties for breeding and cultivation. This may explain why the samples from the Wuzhishan region did not show obvious geographical clustering (Fig. 2). The population structure and principal component analyses further confirmed this observation. The population structure analysis revealed that Hainan tea has an independent genetic background and that LMS differs from OLMS in genetic background. Notably, unlike LMS, OLMS presented a mixture of genetic backgrounds, which agrees with the phylogenetic tree (Figs 2, 3a). The principal component analysis also clearly separated Hainan tea from CSS and CSA (Fig. 3b; Supplemental Fig. S5). Several Hainan tea samples fell within the global Assamica1 cluster, which is consistent with the historical introduction of CSA from Yunnan in the mid-1990s.

Geographic isolation is one of the main causes of speciation^[17]. When populations of the same breeding stock separate, they follow independent evolutionary histories shaped by natural selection, genetic drift, colonization, and adaptation to local conditions^[18]. Hainan, as a tropical island, has extensive rainforests that provide high-quality growing environments for plants, and the island’s geography provides the geographic isolation necessary for new species to arise. The population structure analysis, which incorporated data from additional Camellia plants, clearly indicated that Hainan tea possesses a distinct genetic background compared with other Camellia species. Moreover, Hainan tea clustered closer to CSS and CSA in the principal component analysis while remaining distant from other Camellia (Fig. 3a, b). Therefore, we cautiously propose that Hainan tea represents a novel variety of Camellia sinensis distinct from CSS and CSA. Notably, samples from the LMS region formed a distinct subgroup in the phylogenetic tree (Fig. 2) and showed an independent genetic component in the population structure analysis, akin to the case of G. hirsutum L. purpurascens on Hainan Island^[19]. We therefore infer that Hainan tea from the LMS region constitutes a unique endemic variety within Hainan tea.

Gene flow is one of the important mechanisms for maintaining genetic diversity among biological populations. High levels of gene flow reduce genetic differences and increase homogeneity between two populations^[20]. However, when physical barriers prevent gene flow, populations become isolated, can no longer exchange genetic material, and may diverge. These physical barriers are usually, although not always, caused by natural factors^[21].

Hainan Island, once connected to the mainland, has undergone a long period of movement, rotating counter-clockwise from its original position in the Beibu Gulf to its current position. The initial separation occurred in the Paleocene (ca. 65 Mya), while the major part of the rotational drift occurred in the Eocene^[22]. During the Quaternary, ice ages and interglacial periods alternated, with the most recent major ice age beginning about 15 Kya. Its onset lowered global temperatures and steadily lowered global sea levels, which led to the formation of natural land bridges between islands and continents. During this ~8,000-year-long Ice Age (15−7 Kya), genetic exchange between species on Hainan Island and on the neighboring continent may have occurred. For example, a previous study found gene flow between the Hainan Hwamei and the Chinese Hwamei in South China^[23]. However, the cold global climate during the Ice Age reduced population sizes, especially for the cold-intolerant CSA, and the likelihood of genetic exchange diminished^[24]. After the Ice Age ended, rising sea levels formed the Qiongzhou Strait, which once again isolated Hainan Island from the mainland. This geological event may have hindered genetic exchange between tea trees on Hainan Island and those on the mainland, leading to their gradual and independent evolution in response to the tropical island climate. As a result, a new variety may have emerged within Camellia sinensis^[25].

Considering that land bridges may once have facilitated gene flow between Hainan tea and mainland tea plants, we investigated the gene flow between Hainan tea and cultivated tea in detail. First, we performed f3 statistical analysis (Supplemental Table S4) and found that LMS and OLMS were genetically closer to each other than either was to cultivated tea, which is consistent with the results in Figs 2 and 3a. Notably, Hainan tea was closer to Sinensis than to Assamica. The Treemix analysis indicated that geographic isolation has significantly impeded gene flow between cultivated and Hainan teas (Fig. 3c). In addition, an ABBA-BABA analysis performed with the Dsuite program further supported this view (Supplemental Table S5). These findings suggest that the geographic separation of Hainan tea has prevented the exchange of genetic material between it and cultivated tea, thus contributing to its possible independent evolution as a new variety of Camellia sinensis.
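As a rough illustration of the two statistics named above, the following Python sketch computes a naive f3 statistic and Patterson's D from per-SNP allele frequencies. It is not the pipeline used in this study: the function names and toy frequencies are hypothetical, the f3 estimator omits the finite-sample bias correction applied by dedicated software, and no block-jackknife significance test is included.

```python
from statistics import mean


def f3(target, source_a, source_b):
    """Naive f3(target; a, b): mean over SNPs of (c - a) * (c - b).

    Inputs are per-SNP allele frequencies. A clearly negative value
    suggests that the target is admixed between the two sources.
    No finite-sample bias correction is applied (simplifying assumption).
    """
    return mean((c - a) * (c - b) for c, a, b in zip(target, source_a, source_b))


def patterson_d(p1, p2, p3, outgroup):
    """Patterson's D for the tree (((P1, P2), P3), O).

    Inputs are per-SNP derived-allele frequencies. D near 0 is expected
    without gene flow; D > 0 points to excess allele sharing between
    P2 and P3, and D < 0 between P1 and P3.
    """
    abba = sum((1 - a) * b * c * (1 - o) for a, b, c, o in zip(p1, p2, p3, outgroup))
    baba = sum(a * (1 - b) * c * (1 - o) for a, b, c, o in zip(p1, p2, p3, outgroup))
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    # Hypothetical toy frequencies, not data from this study.
    p1 = [0.1, 0.3, 0.2]
    p2 = [0.6, 0.7, 0.5]
    p3 = [0.8, 0.9, 0.6]
    out = [0.0, 0.0, 0.0]
    print("D =", patterson_d(p1, p2, p3, out))
    print("f3 =", f3([0.4, 0.6], [0.2, 0.4], [0.6, 0.8]))
```

In practice, both statistics are evaluated genome-wide and their significance is assessed with block-jackknife Z-scores, which this sketch leaves out.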

Populations that are highly isolated and lack gene flow are usually prone to inbreeding^[26]. However, the kinship analysis (Fig. 4a) showed that Hainan tea samples do not exhibit excessive relatedness, and the sample pairs with high kinship came overwhelmingly from the WDB group (Supplemental Table S6), whose trees grow in an artificially managed tea plantation. This pattern may therefore be caused by anthropogenic factors. Tea plants are typically propagated asexually via cuttings. If individuals propagated in this way were present among the study samples, a substantial proportion of sample pairs would exhibit a kinship coefficient exceeding 0.354. The present findings do not support this scenario (Fig. 4a). Genetic diversity, species diversity, and ecosystem diversity are the three pillars of biodiversity, and from the standpoint of population genetics, the conservation of biodiversity is ultimately the conservation of genetic diversity^[27]. Nucleotide diversity is an important indicator for assessing the diversity of DNA sequences in a species or population^[28]. Domestication and breeding have reduced the genetic diversity of crops, and the widespread cultivation of monoculture crop varieties has increased genetic vulnerability^[29,30]. Wild ancient tea trees, as a precious natural resource with high genetic diversity, are of great value for studying the evolutionary mechanisms and diversity of tea trees^[31]. Interestingly, Hainan tea and LMS have higher genetic diversity than CSS and CSA (Fig. 4c), even though Hainan tea is affected by geographic isolation, which restricts gene flow (Fig. 3c). This can be partially attributed to the unique climatic conditions of the tropical island, which are very favorable for the growth of tea trees. Combined with minimal anthropogenic disturbance, this has resulted in less natural pressure on tea tree population expansion, thus helping to maintain genetic diversity^[16]. Furthermore, Hainan tea and LMS were genetically closer to Sinensis than to Assamica, and closest to each other (Fig. 4b; Table 3). This is consistent with the f3 statistical analysis (Supplemental Table S4) and suggests that Hainan tea diverged from Assamica earlier than it diverged from Sinensis.
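The reasoning about clonal propagation can be sketched in a few lines of Python. The cutoffs below are the inference criteria commonly cited for KING kinship coefficients (above 0.354 for duplicates or clones, then 0.177, 0.0884, and 0.0442 for first-, second-, and third-degree relatives); the function names and the toy pair values are hypothetical and are not taken from this study.

```python
def king_relationship(kinship):
    """Map a KING kinship coefficient to an inferred relationship class.

    Thresholds follow the commonly cited KING inference criteria;
    the class labels are illustrative wording.
    """
    if kinship > 0.354:
        return "duplicate or clone"
    if kinship > 0.177:
        return "first-degree"
    if kinship > 0.0884:
        return "second-degree"
    if kinship > 0.0442:
        return "third-degree"
    return "unrelated"


def clonal_fraction(pair_kinships, threshold=0.354):
    """Fraction of sample pairs whose kinship exceeds the clone threshold."""
    if not pair_kinships:
        return 0.0
    return sum(k > threshold for k in pair_kinships) / len(pair_kinships)


if __name__ == "__main__":
    # Hypothetical pairwise kinship values, not results from this study.
    pairs = [0.48, 0.21, 0.10, 0.03, -0.02, 0.01]
    for k in pairs:
        print(k, king_relationship(k))
    print("clonal fraction:", clonal_fraction(pairs))
```

A large clonal fraction would indicate that many sampled trees were propagated from cuttings; a small one is what the text above reports for Hainan tea.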

In summary, this study performed whole-genome resequencing of 500 Hainan tea samples from the major tea-producing regions of Hainan Island and identified 32,334,340 SNPs. The results strongly support Hainan tea as a new variant of Camellia sinensis that is genetically distinct from CSS and CSA, and also reveal that Hainan tea from the LMS region is an independently evolved local variety. Although Hainan tea showed no significant gene flow with cultivated tea trees, owing to the geographic barrier of the strait, it still maintained high genetic diversity, reflected in high π values. These results help to clarify the position of Hainan tea in the taxonomy of Camellia sinensis from a genomic perspective. They also provide reliable data for an in-depth understanding of the genetic background and diversity of Hainan tea, and an important scientific basis for the conservation of tea germplasm resources and molecular breeding on Hainan Island. Finally, the methods and techniques used here can serve as a reference for analyses of the origin and domestication of other species on Hainan Island, as well as for genetic diversity studies.

Materials and methods

Sample collection

A total of 500 Hainan tea samples were systematically collected in Hainan Province, China. The sampling sites were Jianfengling (Ledong, 28 samples), Limu Mountain (Qiongzhong, 160 samples), Gaofeng Fangtong Village (Nankai, Baisha, 41 samples), Miao Village Junior Class (Nankai, Baisha, 53 samples), Mengya Village (Nankai, Baisha, 24 samples), Shifu Village Junior Class (Nankai, Baisha, 26 samples), Yaxing (Nankai, Baisha, 15 samples), Junmin Village (Nansheng, Wuzhishan, 13 samples), Maoxiang Village (Nansheng, Wuzhishan, 32 samples), Baimaling (Qiongzhong, 13 samples), Fanglong Village (Shuiman, Wuzhishan, 14 samples), Maona Village (Shuiman, Wuzhishan, 47 samples), and Shuiman Village (Shuiman, Wuzhishan, 34 samples). Additionally, the teabase database^[15] and data from various Camellia species deposited in the Genome Sequence Archive under project number PRJCA001158^[8] were used for analysis. The KM6 accession (Camellia cuspidata) was selected as the outgroup for this study and all subsequent analyses. Detailed information on the study samples can be found in Supplemental Tables S1, S3, and Fig. 1.

DNA sample preparation and sequencing
Five hundred tea accessions were acquired exclusively from Hainan Province, China. Young leaves were harvested from these plants and rapidly frozen in liquid nitrogen. Total DNA was extracted using the DNAsecure plant kit (Tiangen, Beijing). Subsequently, 2 µg of genomic DNA from each accession was used to prepare sequencing libraries with the NEBNext Ultra DNA Library Prep Kit (NEB Inc., USA) according to the manufacturer’s protocol. Paired-end libraries with an approximate insert size of 400 bp were sequenced on an Illumina NovaSeq 6000 sequencer.

Quality control and filtering
The paired-end resequencing reads were filtered using fastp (Version: 0.12.2)^[32]. This step removed reads containing adapter sequences or poly-N sequences, as well as low-quality reads (defined as reads in which more than 40% of bases have Phred quality scores ≤ 20), from the raw data. The resulting clean data were used for all downstream analyses.
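The low-quality criterion stated above can be illustrated with a minimal Python sketch. It is not the fastp implementation: the function name `is_low_quality` and the Phred+33 quality encoding are assumptions made for illustration, and adapter and poly-N removal are not modelled.

```python
def is_low_quality(quality_string, phred_cutoff=20, max_fraction=0.40, offset=33):
    """Return True if more than max_fraction of bases have Phred <= phred_cutoff.

    quality_string is the FASTQ quality line; Phred+33 encoding is assumed.
    An empty read is treated as low quality (an assumption of this sketch).
    """
    if not quality_string:
        return True
    # Count bases whose decoded Phred score is at or below the cutoff.
    low = sum(1 for ch in quality_string if ord(ch) - offset <= phred_cutoff)
    return low / len(quality_string) > max_fraction
```

With this encoding, the character `5` decodes to Phred 20 and `I` to Phred 40, so a read whose quality line is `"5" * 5 + "I" * 5` has 50% low-quality bases and would be discarded.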

Variation calling and annotation
The paired-end resequencing reads were aligned to our tea reference genome using BWA (Version: 0.7.17-r1188)^[33], employing default parameters. The mapping results were converted into the BAM format and unmapped as well as non-unique reads were filtered using SAMtools (Version: 1.3.1)^[34]. Additionally, duplicated reads were removed using the Picard package (picard.sourceforge.net, Version: 2.1.1).

Following BWA alignment, reads around indels were realigned with GATK in two steps. First, the RealignerTargetCreator tool identified regions requiring realignment. These regions were then realigned with IndelRealigner, producing a realigned BAM file for each accession. Variant detection followed the best-practice workflow recommended by GATK^[35]. Specifically, variants were called for each accession using the GATK HaplotypeCaller^[35], and a joint genotyping step then merged the variants from the per-accession gVCF files. During the filtering step, the SNP filter expression was set as ‘QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 5.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0 || QUAL < 30’. SNPs that were not bi-allelic were excluded, yielding the basic set. SNPs with more than 20% missing calls or a minor allele frequency (MAF) below 0.05 were then removed to generate the core set, which was used for phylogenetic tree construction, PCA, and population structure analysis.
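The core-set criteria (missing-call rate and MAF) can be sketched as follows. This is an illustrative Python function, not the tool used in the study; the name `passes_core_filter` and the genotype encoding (alternate-allele counts 0, 1, 2 per diploid sample, `None` for a missing call) are assumptions.

```python
def passes_core_filter(genotypes, max_missing=0.20, min_maf=0.05):
    """Decide whether one bi-allelic SNP is kept in the core set.

    genotypes: one entry per diploid sample, holding the alternate-allele
    count (0, 1 or 2) or None for a missing call (hypothetical encoding).
    """
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n_total == 0 or not called:
        return False
    # Drop the SNP if more than max_missing of the calls are missing.
    if (n_total - len(called)) / n_total > max_missing:
        return False
    # Minor allele frequency among the called chromosomes.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= min_maf
```

For example, a SNP called in 7 of 10 samples has a 30% missing rate and is dropped, while a monomorphic site is dropped because its MAF is 0.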

SNPs were annotated according to the tea genome using the ANNOVAR package (Version: 2015-12-14)^[36]. Based on the genome annotation, SNPs were classified into genomic regions: exonic regions (overlapping with coding exons), splicing sites (within 2 bp of a splicing junction), 5' UTRs, 3' UTRs, intronic regions (overlapping with introns), upstream and downstream regions (within 1 kb upstream of the transcription start site or 1 kb downstream of the transcription end site), and intergenic regions. SNPs located in coding exons were further categorized into synonymous SNPs (which did not cause amino acid changes), nonsynonymous SNPs (which caused amino acid changes), stop-gain mutations (resulting in the gain of a stop codon), and stop-loss mutations (resulting in the loss of a stop codon). Indels within exonic regions were classified based on whether they caused frame-shift mutations (insertions or deletions whose length is not a multiple of 3 bp) and whether they resulted in the gain or loss of a stop codon.

Population genetics analysis
Whole-genome SNPs were used to construct a maximum likelihood (ML) phylogenetic tree with 100 bootstrap replicates using SNPhylo (Version: 20140701)^[37]. Camellia cuspidata (KM6) served as the outgroup to root the tree. The phylogenetic tree was visualized and color-coded using iTOL (http://itol.embl.de).

Chromosomal SNPs were pruned for linkage disequilibrium with PLINK (Version v1.90b3.38)^[38], using a window size of 50 SNPs (advancing 1 SNP at a time) and an r² threshold of 0.5. Principal component analysis was conducted using Genome-wide Complex Trait Analysis (GCTA, version: 1.25.3) software^[39], and the first three eigenvectors were plotted. Population structure analysis was performed using the ADMIXTURE program (Version: 1.3)^[40] with a block-relaxation algorithm. The number of genetic clusters (K) was predefined from 2 to 9, and the cross-validation (CV) error procedure was run to explore convergence of individuals. Default methods and settings were applied in all analyses.

Relationship inference
The relationship between each pair of accessions was examined using KING (Version: 2.2.5)^[41] on the basic set SNPs with the option ‘--kinship’. This option employs the KING-Robust algorithm to estimate pairwise kinship coefficients. Close relatives were inferred from the estimated kinship coefficients using the following simple rule: an estimated kinship coefficient greater than 0.354 indicates a duplicate relationship, while the ranges [0.177, 0.354], [0.0884, 0.177], and [0.0442, 0.0884] correspond to 1^st-degree, 2^nd-degree, and 3^rd-degree relationships, respectively.
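The thresholds above translate into a simple lookup, sketched here in Python. The function name `relationship_degree`, the label `"unrelated"` for coefficients below 0.0442, and the handling of the shared interval boundaries (a boundary value is assigned to the closer relationship) are assumptions of this sketch, since the stated ranges overlap at their endpoints.

```python
def relationship_degree(kinship):
    """Map an estimated kinship coefficient to a relationship category."""
    if kinship > 0.354:
        return "duplicate"
    if kinship >= 0.177:
        return "1st-degree"
    if kinship >= 0.0884:
        return "2nd-degree"
    if kinship >= 0.0442:
        return "3rd-degree"
    # Below the lowest threshold; this label is an assumption of the sketch.
    return "unrelated"
```

Under this rule, a pair of clonally propagated cuttings would be expected to fall in the `"duplicate"` category, which is the signal discussed for asexual propagation.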

Genetic variation and F_ST calculations
Average pairwise nucleotide diversity within each population (π) was calculated in 100 kb sliding windows. Population differentiation (F_ST) was assessed through pairwise F_ST comparisons among populations.
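As a rough illustration of windowed π, the Python sketch below sums the per-site pairwise diversity of bi-allelic SNPs in each window and divides by the window length. The software and step size used in the study are not stated, so the function names, the `step` parameter, and the assumption that every position without a listed SNP is invariant and callable are all hypothetical.

```python
def site_pi(alt_count, n_chrom):
    """Fraction of chromosome pairs that differ at one bi-allelic site."""
    if n_chrom < 2:
        return 0.0
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def windowed_pi(sites, seq_length, window, step):
    """Per-base nucleotide diversity in windows along one sequence.

    sites: list of (position, alt_count, n_chrom) with 1-based positions.
    Returns a list of (start, end, pi) tuples.
    """
    results = []
    start = 1
    while start <= seq_length:
        end = min(start + window - 1, seq_length)
        # Sum per-site diversity over SNPs that fall inside the window.
        total = sum(site_pi(k, n) for pos, k, n in sites if start <= pos <= end)
        results.append((start, end, total / (end - start + 1)))
        start += step
    return results
```

Calling `windowed_pi(sites, seq_length, window=100_000, step=100_000)` would give non-overlapping 100 kb windows; a smaller `step` gives overlapping sliding windows.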

Gene flow analysis
Admixture graphs of geographically defined Hainan tea populations were inferred using TreeMix^[42], which takes a maximum likelihood (ML) approach based on a Gaussian model of allele frequency change. Because the topology of the ML tree depends on the number of migration events (m) permitted in the model, m was varied from 0 to 5. Bootstrap values on the tree were derived from 1,000 replicates. Admixture events among different tea populations were indicated by arrows on the graph, with KM6 serving as the root. To ensure robustness, each value of m was run 10 times with a different random seed. The optimal number of migration edges was determined using the R package ‘OptM’ (Version: v0.1.6)^[43].

f3 and Patterson’s D statistics
The f3 statistics were computed using the R package ‘admixr’ (Version: 0.9.1)^[44] for all possible combinations of tea groups, with KM6 serving as the outgroup. SNPs with missing data and monomorphic SNPs were excluded from the analysis.

To assess the presence of introgression signals among tea groups, Patterson’s D (also known as the ABBA-BABA test) and f4 admixture ratio statistics for all possible trios of tea groups were calculated using Dtrios in Dsuite (Version: 0.4 r42)^[45], with KM6 designated as the outgroup. SNPs with missing data and monomorphism were removed from consideration.
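For readers unfamiliar with the ABBA-BABA test, a minimal frequency-based sketch of Patterson's D for one trio (P1, P2, P3) is given below in Python. It is not the Dsuite implementation: the function name `patterson_d` is hypothetical, and the outgroup is assumed to be fixed for the ancestral allele at every site, so only derived-allele frequencies in the three ingroups are needed.

```python
def patterson_d(freqs):
    """Patterson's D from derived-allele frequencies.

    freqs: iterable of (p1, p2, p3) tuples, one per SNP, giving the
    derived-allele frequency in P1, P2 and P3. The outgroup is assumed
    to carry only the ancestral allele (an assumption of this sketch).
    """
    abba = baba = 0.0
    for p1, p2, p3 in freqs:
        abba += (1 - p1) * p2 * p3   # derived allele shared by P2 and P3
        baba += p1 * (1 - p2) * p3   # derived allele shared by P1 and P3
    denom = abba + baba
    if denom == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / denom
```

A value near 0 is expected without introgression, a positive value indicates excess allele sharing between P2 and P3, and a negative value indicates excess sharing between P1 and P3.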

To investigate the species-level relationships among tea groups, we explored the backbone of the phylogeny using the PoMo model^[46] within IQ-Tree^[47]. This analysis included 1,000 bootstrap replicates, employing the ultrafast bootstrap approximation method. The tree was rooted using KM6 as the outgroup.

Author contributions

The authors confirm contribution to the paper as follows: study conception and design: Duan S; draft manuscript preparation: Guo D, Li D; manuscript revision and editing: Huang Y, Duan S; tea samples collection: Wang Z, Li D, Zhou Y, Xiang G, Zhang W, Wang W, Fang Z, Hao T, Zheng D, Lei Y, Yang L, Zhang W, Tang S, Zheng L, Cao Y. All authors reviewed the results and approved the final version of the manuscript.

Variants	Type	Core set
SNP	Total	32,334,340
	Intergenic	29,520,274
	Intronic	1,566,641
	Exonic	433,604
	5' UTR	40,383
	3' UTR	92,101
	UTR5;UTR3	229
	Upstream	326,710
	Downstream	341,541
	Upstream;downstream	7,803
	Splicing	4,838
	Exonic;splicing	216

Variants	Type	Core set
SNP	Total (exonic + exonic;splicing)	433,820
	Nonsynonymous	243,634
	Synonymous	179,902
	Nonsyn/Syn ratio	1.35
	Stop-gain	9,699
	Stop-loss	518
	Unknown	67

F_ST	Assamica1	Assamica2	OLMS	LMS	Sinensis
Assamica1		0.236	0.239	0.236	0.321
Assamica2			0.281	0.282	0.328
OLMS				0.036	0.209
LMS					0.212
Sinensis
