The Asian lotus (Nelumbo nucifera) pan-plastome: diversity and divergence in a living fossil grown for seed, rhizome, and aesthetics

Jie Wang; Xuezhu Liao; Cuihua Gu; Kunli Xiang; Jie Wang; Sen Li; Luke R. Tembrock; Zhiqiang Wu; Wenchuang He

doi:10.48130/OPR-2022-0002

2022 Volume 2

ARTICLE Open Access

1. College of Landscape and Architecture, Zhejiang Provincial Key Laboratory of Germplasm Innovation and Utilization for Garden Plants, Key Laboratory of National Forestry and Grassland Administration on Germplasm Innovation and Utilization for Southern Garden Plants, Zhejiang Agriculture & Forestry University, Hangzhou 311300, China
2. Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
3. College of Horticulture, Shanxi Agricultural University, Shanxi 030801, China
4. Department of Agricultural Biology, Colorado State University, Fort Collins, CO 80523, USA
^# These authors contributed equally: Jie Wang, Xuezhu Liao, Cuihua Gu

Corresponding authors: tembrock@colostate.edu; wuzhiqiang@caas.cn; hewenchuang@caas.cn

Received: 15 November 2021
Accepted: 04 January 2022
Published online: 20 January 2022
Ornamental Plant Research 2, Article number: 2 (2022)

Abstract

The Asian lotus (Nelumbo nucifera) has a history of cultivation in Asia dating back over 3,000 years, where it has been an important food crop producing edible rhizomes and seeds as well as flowers of great aesthetic and cultural value. Here, we de novo assembled the plastomes of 316 lotus accessions, including five North American lotus (N. lutea) and 311 Asian lotus (N. nucifera), to construct a pan-plastome genome map and investigate the phylogeography and genetic diversity among the only two extant species within this living fossil lineage. A total of 113 unique genes were annotated, and plastome sizes varied between 163,457 and 163,672 bp with only minor differences in each of the four major genomic units. The most abundant nucleotide differences among plastomes were single nucleotide variants, followed by insertions/deletions and block substitutions, mainly found in intergenic spacer regions of the large single copy portion of the plastome. Seven well-supported genetic clusters were resolved using multiple different population structure analyses. The different lotus types (flower, seed, rhizome, or wild) were disproportionately assigned to multiple different genetic clusters. This pattern indicates that the domestication of Asian lotus involved multiple genetic origins and possible matrilineal introgression. Geographic mapping of accessions also revealed that genetic diversity is unevenly distributed, with eastern China possessing the highest genetic diversity and regions such as Yunnan, Indonesia, and Thailand possessing unique haplotypes. These results provide an important maternal history of Nelumbo and necessary groundwork for future studies on intergenomic gene transfer, cytonuclear incompatibility, and conservation genetics.
Keywords: Phylogeography, Sacred lotus, North American lotus, Centers of origin, Aquatic plants, Domestication

Supplementary information

Supplemental Table S1 Sampling Information of 316 lotus accessions.
Supplemental Table S2 Plastome size and GC content of different types of Asian lotus.
Supplemental Table S3 Functional groups of CDS with variant events.
Supplemental Table S4 Genetic clusters and haplotypes of lotus accessions.
Supplemental Fig. S1 An example of some species-specific variants among the pan-plastome.
Supplemental Fig. S2 Linear ML tree of 316 accessions.
Supplemental Fig. S3 Linear BI tree of 316 accessions. Genetic clusters are colored as in the ML tree.
Supplemental Fig. S4 Geographical distribution of wild Asian lotuses from different haplotypes. Blue, red, and green dotted boxes represented three distributive regions, namely, blue: northeastern China and North Korea; red: east central China; green: southern China and several Southeast Asian countries including India, Thailand, Indonesia, and Singapore.
Supplemental Annotation.txt Annotation of plastome sequence of LA001 (Nelumbo lutea) in this study.

Rights and permissions
Copyright: © 2022 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Li H, Yi T, Gao L, Ma P, Zhang T, et al. 2019. Origin of angiosperms and the puzzle of the Jurassic gap. Nature Plants 5:461−70 doi: 10.1038/s41477-019-0421-0
[2]	Li Y, Svetlana P, Yao J, Li C. 2014. A review on the taxonomic, evolutionary and phytogeographic studies of the lotus plant (Nelumbonaceae: Nelumbo). Acta Geologica Sinica 88:1252−61 doi: 10.1111/1755-6724.12287
[3]	Zhang Y, Lu X, Zeng S, Huang X, Guo Z, et al. 2015. Nutritional composition, physiological functions and processing of lotus (Nelumbo nucifera Gaertn.) seeds: A review. Phytochemistry Reviews 14:321−34 doi: 10.1007/s11101-015-9401-9
[4]	Zheng T, Li P, Li L, Zhang Q. 2021. Research advances in and prospects of ornamental plant genomics. Horticulture Research 8:65 doi: 10.1038/s41438-021-00499-x
[5]	Xue J, Dong W, Cheng T, Zhou S. 2012. Nelumbonaceae: Systematic position and species diversification revealed by the complete chloroplast genome. Journal of Systematics and Evolution 50:477−87 doi: 10.1111/j.1759-6831.2012.00224.x
[6]	Wu Z, Gui S, Quan Z, Pan L, Wang S, et al. 2014. A precise chloroplast genome of Nelumbo nucifera (Nelumbonaceae) evaluated with Sanger, Illumina MiSeq, and PacBio RS II sequencing platforms: Insight into the plastid evolution of basal eudicots. BMC Plant Biology 14:289 doi: 10.1186/s12870-014-0289-0
[7]	Shi T, Rahmani RS, Gugger PF, Wang M, Li H, et al. 2020. Distinct expression and methylation patterns for genes with different fates following a single whole-genome duplication in flowering plants. Molecular Biology and Evolution 37:2394−413 doi: 10.1093/molbev/msaa105
[8]	Guo HB, Li SM, Peng J, Ke WD. 2007. Genetic diversity of Nelumbo accessions revealed by RAPD. Genetic Resources and Crop Evolution 54:741−48 doi: 10.1007/s10722-006-0025-1
[9]	Chen Y, Zhou R, Lin X, Wu K, Qian X, et al. 2008. ISSR analysis of genetic diversity in sacred lotus cultivars. Aquatic Botany 89:311−16 doi: 10.1016/j.aquabot.2008.03.006
[10]	Hu J, Pan L, Liu H, Wang S, Wu Z, et al. 2012. Comparative analysis of genetic diversity in sacred lotus (Nelumbo nucifera Gaertn.) using AFLP and SSR markers. Molecular Biology Reports 39:3637−47 doi: 10.1007/s11033-011-1138-y
[11]	Yang M, Xu L, Liu Y, Yang P. 2015. RNA-seq uncovers SNPs and alternative splicing events in Asian lotus (Nelumbo nucifera). PLoS One 10:e0125702 doi: 10.1371/journal.pone.0125702
[12]	Huang L, Yang M, Li L, Li H, Yang D, et al. 2018. Whole genome re-sequencing reveals evolutionary patterns of sacred lotus (Nelumbo nucifera). Journal of Integrative Plant Biology 60:2−15 doi: 10.1111/jipb.12606
[13]	Li Y, Zhu F, Zheng X, Hu M, Dong C, et al. 2020. Comparative population genomics reveals genetic divergence and selection in lotus, Nelumbo nucifera. BMC Genomics 21:146 doi: 10.1186/s12864-019-6376-8
[14]	Liu Z, Zhu H, Zhou J, Jiang S, Wang Y, et al. 2020. Resequencing of 296 cultivated and wild lotus accessions unravels its evolution and breeding history. The Plant Journal 104:1673−84 doi: 10.1111/tpj.15029
[15]	Fang K, Xia Z, Li H, Jiang X, Qin D, et al. 2021. Genome-wide association analysis identified molecular markers associated with important tea flavor-related metabolites. Horticulture Research 8:42 doi: 10.1038/s41438-021-00477-3
[16]	Wu Z, Liao X, Zhang X, Tembrock LR, Broz A. 2020. Genomic architectural variation of plant mitochondria—A review of multichromosomal structuring. Journal of Systematics and Evolution 60:160−68 doi: 10.1111/jse.12655
[17]	Biersma EM, Torres-Díaz C, Molina-Montenegro MA, Newsham KK, Vidal MA, et al. 2020. Multiple late-Pleistocene colonisation events of the Antarctic pearlwort Colobanthus quitensis (Caryophyllaceae) reveal the recent arrival of native Antarctic vascular flora. Journal of Biogeography 47:1663−73 doi: 10.1111/jbi.13843
[18]	Peters RS, Meusemann K, Petersen M, Mayer C, Wilbrandt J, et al. 2014. The evolutionary history of holometabolous insects inferred from transcriptome-based phylogeny and comprehensive morphological data. BMC Evolutionary Biology 14:52 doi: 10.1186/1471-2148-14-52
[19]	Kirschner P, Arthofer W, Pfeifenberger S, Záveská E, Schönswetter P, et al. 2021. Performance comparison of two reduced-representation based genome-wide marker-discovery strategies in a multi-taxon phylogeographic framework. Scientific Reports 11:3978 doi: 10.1038/s41598-020-79778-x
[20]	Liu Y, Du H, Li P, Shen Y, Peng H, et al. 2020. Pan-genome of wild and cultivated soybeans. Cell 182:162−76.E13 doi: 10.1016/j.cell.2020.05.023
[21]	Tao Y, Luo H, Xu J, Cruickshank A, Zhao X, et al. 2021. Extensive variation within the pan-genome of cultivated and wild sorghum. Nature Plants 7:766−73 doi: 10.1038/s41477-021-00925-x
[22]	Song J, Guan Z, Hu J, Guo C, Yang Z, et al. 2020. Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nature Plants 6:34−45 doi: 10.1038/s41477-019-0577-7
[23]	Magdy M, Ou L, Yu H, Chen R, Zhou Y, et al. 2019. Pan-plastome approach empowers the assessment of genetic variation in cultivated Capsicum species. Horticulture Research 6:108 doi: 10.1038/s41438-019-0191-x
[24]	Wu Z, Gu C, Tembrock LR, Zhang D, Ge S. 2017. Characterization of the whole chloroplast genome of Chikusichloa mutica and its comparison with other rice tribe (Oryzeae) species. PLoS One 12:e0177553 doi: 10.1371/journal.pone.0177553
[25]	Wicke S, Schneeweiss GM, dePamphilis CW, Müller KF, Quandt D. 2011. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Molecular Biology 76:273−97 doi: 10.1007/s11103-011-9762-4
[26]	Cauz-Santos LA, da Costa ZP, Callot C, Cauet S, Zucchi MI, et al. 2020. A repertory of rearrangements and the loss of an inverted repeat region in Passiflora chloroplast genomes. Genome Biology and Evolution 12:1841−57 doi: 10.1093/gbe/evaa155
[27]	Olmstead RG, Kim KJ, Jansen RK, Wagstaff SJ. 2000. The phylogeny of the Asteridae sensu lato based on chloroplast ndhF gene sequences. Molecular Phylogenetics and Evolution 16:96−112 doi: 10.1006/mpev.1999.0769
[28]	Malinova I, Zupok A, Massouh A, Schöttler MA, Meyer EH, et al. 2021. Correction of frameshift mutations in the atpB gene by translational recoding in chloroplasts of Oenothera and tobacco. The Plant Cell 33:1682−705 doi: 10.1093/plcell/koab050
[29]	Wu Z, Ge S. 2012. The phylogeny of the BEP clade in grasses revisited: evidence from the whole-genome sequences of chloroplasts. Molecular Phylogenetics and Evolution 62:573−78 doi: 10.1016/j.ympev.2011.10.019
[30]	Gu C, Tembrock LR, Johnson NG, Simmons MP, Wu Z. 2016. The complete plastid genome of Lagerstroemia fauriei and loss of rpl2 intron from Lagerstroemia (Lythraceae). PLoS One 11:e0150752 doi: 10.1371/journal.pone.0150752
[31]	Zhou J, Zhang S, Wang J, Shen H, Ai B, et al. 2021. Chloroplast genomes in Populus (Salicaceae): Comparisons from an intensively sampled genus reveal dynamic patterns of evolution. Scientific Reports 11:9471 doi: 10.1038/s41598-021-88160-4
[32]	Andreu Sánchez S, Chen W, Stiller J, Zhang G. 2021. Multiple origins of a frameshift insertion in a mitochondrial gene in birds and turtles. GigaScience 10:giaa161 doi: 10.1093/gigascience/giaa161
[33]	Avise JC. 2004. Molecular markers, natural history, and evolution (2nd edition). In The Auk, ed. Lovette IJ. 121:684. Sinauer Associates, Sunderland, Massachusetts. pp. 1298–99 https://doi.org/10.1093/auk/121.4.1298
[34]	Wang Z, Jiang Y, Bi H, Lu Z, Ma Y, et al. 2021. Hybrid speciation via inheritance of alternate alleles of parental isolating genes. Molecular Plant 14:208−22 doi: 10.1016/j.molp.2020.11.008
[35]	Guo C, Guo Z, Li D. 2019. Phylogenomic analyses reveal intractable evolutionary history of a temperate bamboo genus (Poaceae: Bambusoideae). Plant Diversity 41:213−19 doi: 10.1016/j.pld.2019.05.003
[36]	Choi JY, Purugganan MD. 2018. Multiple origin but single domestication led to Oryza sativa. G3 Genes|Genomes|Genetics 8:797−803 doi: 10.1534/g3.117.300334
[37]	He W, Chen C, Xiang K, Wang J, Zheng P, et al. 2021. The history and diversity of rice domestication as resolved from 1464 complete plastid genomes. Frontiers in Plant Science 12:781793 doi: 10.3389/fpls.2021.781793
[38]	Huang Y, Wang J, Yang Y, Fan C, Chen J. 2017. Phylogenomic analysis and dynamic evolution of chloroplast genomes in Salicaceae. Frontiers in Plant Science 8:1050 doi: 10.3389/fpls.2017.01050
[39]	Scossa F, Fernie AR. 2021. When a crop goes back to the wild: Feralization. Trends in Plant Science 26:543−45 doi: 10.1016/j.tplants.2021.02.002
[40]	Hall R, van Hattum MWA, Spakman W. 2008. Impact of India–Asia collision on SE Asia: The record in Borneo. Tectonophysics 451:366−89 doi: 10.1016/j.tecto.2007.11.058
[41]	Royer AM, Waite-Himmelwright J, Smith CI. 2020. Strong selection against early generation hybrids in joshua tree hybrid zone not explained by pollinators alone. Frontiers in Plant Science 11:640 doi: 10.3389/fpls.2020.00640
[42]	Hübner S, Bercovich N, Todesco M, Mandel JR, Odenheimer J, et al. 2019. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nature Plants 5:54−62 doi: 10.1038/s41477-018-0329-0
[43]	Keeling PJ, Palmer JD. 2008. Horizontal gene transfer in eukaryotic evolution. Nature Reviews Genetics 9:605−18 doi: 10.1038/nrg2386
[44]	Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, et al. 2012. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19:455−77 doi: 10.1089/cmb.2012.0021
[45]	Wick RR, Schultz MB, Zobel J, Holt KE. 2015. Bandage: Interactive visualization of de novo genome assemblies. Bioinformatics 31:3350−52 doi: 10.1093/bioinformatics/btv383
[46]	Shen W, Le S, Li Y, Hu F. 2016. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One 11:e0163962 doi: 10.1371/journal.pone.0163962
[47]	Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25:1754−60 doi: 10.1093/bioinformatics/btp324
[48]	Ye J, McGinnis S, Madden TL. 2006. BLAST: Improvements for better sequence analysis. Nucleic Acids Research 34:W6−W9 doi: 10.1093/nar/gkl164
[49]	Lehwark P, Greiner S. 2019. GB2sequin - A file converter preparing custom GenBank files for database submission. Genomics 111:759−61 doi: 10.1016/j.ygeno.2018.05.003
[50]	Katoh K, Rozewicki J, Yamada KD. 2019. MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics 20:1160−66 doi: 10.1093/bib/bbx108
[51]	Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, Guirao-Rico S, Librado P, et al. 2017. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Molecular Biology and Evolution 34:3299−302 doi: 10.1093/molbev/msx248
[52]	Ginestet C. 2011. ggplot2: Elegant graphics for data analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society) 174:245−46 doi: 10.1111/j.1467-985X.2010.00676_9.x
[53]	Leigh JW, Bryant D. 2015. POPART: full-feature software for haplotype network construction. Methods in Ecology and Evolution 6:1110−16 doi: 10.1111/2041-210X.12410
[54]	Li Y, Chao T, Fan Y, Lou D, Wang G. 2019. Population genomics and morphological features underlying the adaptive evolution of the eastern honey bee (Apis cerana). BMC Genomics 20:869 doi: 10.1186/s12864-019-6246-4
[55]	Nguyen LT, Schmidt HA, Von Haeseler A, Minh BQ. 2015. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32:268−74 doi: 10.1093/molbev/msu300
[56]	Alexander DH, Lange K. 2011. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics 12:246 doi: 10.1186/1471-2105-12-246

About this article

Cite this article

Wang J, Liao X, Gu C, Xiang K, Wang J, et al. 2022. The Asian lotus (Nelumbo nucifera) pan-plastome: diversity and divergence in a living fossil grown for seed, rhizome, and aesthetics. Ornamental Plant Research 2: 2 doi: 10.48130/OPR-2022-0002

INTRODUCTION

Nelumbo Adans. (Proteales, Nelumbonaceae) is a genus of aquatic plant species with an estimated origin of 135 million years ago (mya)^[1], making it one of the earliest diverging eudicot lineages. Given its phylogenetic position among the early eudicots and the morphological similarity of extant species to fossil taxa, Nelumbo is regarded as a living fossil. Two extant species, Nelumbo lutea (Willd.) Pers. (American lotus) and Nelumbo nucifera Gaertn. (Asian lotus), are recognized in this genus^[2]. The Asian lotus (also referred to as sacred lotus and 莲 'lian' in Chinese) is distributed throughout Eastern Asia and northern Oceania in freshwater habitats. The cultivation of Asian lotus is thought to have begun more than 3,000 years ago for the production of edible seeds^[3]. In addition to seeds, Asian lotus is also grown for the large edible rhizome it produces, and as an ornamental in water gardens. Based on these different uses, Asian lotus growers and researchers categorize plants into seed, rhizome, and flower types according to the morphological characteristics that best suit each application^[4]. In addition to the cultivated types, wild Asian lotus is common throughout east and southeast Asia in lakes and ponds.

Because Asian lotus is an important food plant, extensive molecular work has been conducted to better understand the genetic diversity and history of this species. Some examples of this work include estimating the divergence time between the two Nelumbo species from complete plastome sequences at 1.5 mya^[5,6] and the discovery of an ancient whole genome duplication unique to Nelumbo using high-quality nuclear genome assemblies^[7]. Population structure and genetic diversity in Asian lotus have also been extensively studied using several different markers including random amplified polymorphic DNA (RAPD)^[8], inter-simple sequence repeats (ISSR)^[9], amplified fragment length polymorphisms (AFLP), simple sequence repeats (SSR)^[10], single nucleotide polymorphisms (SNPs)^[11], and whole-genome resequencing methods^[12−14]. In these studies, higher genetic diversity was generally found among wild Asian lotus than among cultivated Asian lotus, and among the cultivated lineages, seed and rhizome types resolved in distinct clades. However, some conflicts among these studies remain unresolved. In particular, Huang et al.^[12] determined that the seed lotuses were monophyletic with respect to wild and rhizome lotuses but with low bootstrap support, while Liu et al.^[14] found that seed and flower lotuses possessed higher genetic diversity and were more often crossbred to each other than either was to rhizome lotuses. While many of the previous studies focused on patterns of genetic diversity, few have integrated geographic origin into their analyses, leaving gaps in our knowledge regarding the centers of origin of the different Asian lotus types^[14]. For example, wild Asian lotuses from Indonesia and their relation to cultivated types have not been properly characterized in previous studies. Given issues with incomplete lineage sorting associated with whole genome duplications^[15], a pan-plastome approach can provide improved resolution on questions of population structure, centers of origin, and assignment of cultivated types to well-supported genetic clusters.

Chloroplasts (plastid refers to all membrane-bound organelles of the same origin but serving different metabolic functions, such as chloroplasts, chromoplasts, and leucoplasts) are the photosynthetic organelles in plant and algal cells. They originated from cyanobacteria through an ancient endosymbiotic event and contain a distinct streamlined genome primarily made up of photosynthesis- and replication-related genes^[16]. Compared to the nuclear genome, the plastome is uniparentally inherited and nonrecombinant, which can provide a less noisy signal for inferring relatedness, especially in lineages with polyploidy, incomplete lineage sorting, and/or frequent introgression^[17]. Most previous studies in phylogenomics have focused on the species level or above and often employ genomic simplification strategies such as using only the transcriptome^[18] or reduced-representation/finely-filtered genomes^[19] in the final analyses. As whole-genome sequencing and assembly have become more accurate and complete, large intraspecies collections of genomes (often referred to as the pan-genome) are now being published that include all or nearly all major nucleotide variants found in a given lineage across the entire genome. Such pan-genomes have been produced for important agronomic plant species such as Glycine soja^[20], Sorghum bicolor^[21], and Brassica napus^[22]. Similarly, pan-plastomes are now being generated for several plant species, with the first such dataset involving 321 complete plastomes to differentiate pepper (Capsicum) cultivars and lineages^[23]. As with other phylogenomic approaches, pan-plastomics has several advantages over nuclear pan-genomes, such as larger and more complete reference sets for assembly and comparison, a higher copy number in the cell resulting in greater read depth, and the absence of large duplicate gene arrays, which reduces problems associated with paralogy^[24]. Therefore, we employed a pan-plastome approach to address several outstanding questions regarding Nelumbo cultivation and evolution. In addition, the dataset presented here is an important comparative resource for pan-plastome studies in other species, which are at present uncommon.

Here, we assembled a large plastome data set including 316 (five N. lutea and 311 N. nucifera) complete circular plastomes to: (a) construct a reliable pan-plastome map for Nelumbo, (b) identify genomic patterns in the data set, such as mutational hotspots and characterization of different nucleotide variants, and (c) resolve well supported maternal lineages within Nelumbo and relate these to the different cultivated and wild types including the sister species N. lutea to address questions regarding origin and relatedness. For convenience, the names of the different lotus types and species used in this study were simplified as follows: N. lutea (North American lotus): LA; wild Asian lotus: LW; flower lotus: LF; seed lotus: LS; and rhizome lotus: LR (all cultivated types are from N. nucifera).
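The naming convention above can be illustrated with a minimal Python sketch that maps an accession identifier to its species and lotus type. The two-letter codes come from the text; the identifier pattern (a code followed by digits, as in LA001 or LS036) and the function name `parse_accession` are assumptions made for illustration only.

```python
import re

# Two-letter codes as defined in the text. The ID pattern (code + digits)
# is an assumption inferred from identifiers such as LA001 and LS036.
TYPE_CODES = {
    "LA": ("Nelumbo lutea", "North American lotus"),
    "LW": ("Nelumbo nucifera", "wild Asian lotus"),
    "LF": ("Nelumbo nucifera", "flower lotus"),
    "LS": ("Nelumbo nucifera", "seed lotus"),
    "LR": ("Nelumbo nucifera", "rhizome lotus"),
}

_ID_PATTERN = re.compile(r"^(L[AWFSR])(\d+)$")


def parse_accession(accession_id):
    """Split a hypothetical accession ID into code, number, species, and type."""
    match = _ID_PATTERN.match(accession_id.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized accession ID: {accession_id!r}")
    code, number = match.groups()
    species, lotus_type = TYPE_CODES[code]
    return {
        "code": code,
        "number": int(number),
        "species": species,
        "type": lotus_type,
    }
```

For example, `parse_accession("LS036")` reports a seed lotus of N. nucifera, while an identifier with an unknown prefix raises a `ValueError`.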

DISCUSSION

Hypervariable regions across the lotus pan-plastome

Plastomes are highly conserved in most land plants in terms of size, structure, and gene content, with lotuses being no exception in this regard^[25]. The pan-plastome resolved here indicates that gene content and order were highly conserved and consistent with those previously described by Xue et al.^[5] and Wu et al.^[6]. Despite structural and genic conservation, abundant nucleotide variants were found across the Nelumbo pan-plastome. Relatively few variants were detected in coding sequence (CDS) regions, save for ycf1, which was extraordinarily rich in nucleotide variants, possibly because of its position spanning the junction of IRA and SSC (Figs 1 & 2). This is similar to what has been previously reported for junction-spanning genes in Passiflora trichocarpa^[26]. Following ycf1 in the number of CDS variants were rpoC1 and ndhF, which have also been found to contain a higher number of nucleotide variants than other functional plastome genes, hence their use in phylogenetic studies of eudicot groups such as Apioideae, Cactoideae, and Asteridae^[27]. While mutations (especially InDels) in CDS regions are expected to result in a loss of function in translated proteins, ribosomal frameshifting in plastomes has been shown to recover original functions from CDS containing mutations^[28]. As such, mutations in some plastome CDS regions may have less of an impact than predicted. In contrast to the relatively limited number of CDS variants, the type and abundance of nucleotide variants in IGS and introns were considerably greater. For instance, block substitutions in CDS regions are found only in the ycf1 and rpoA genes, whereas they are relatively abundant in IGS and intronic regions (although less so in introns). Specifically, the large number of variants found in the Nelumbo rpl16 intron is similar to that found in the distantly related plant families Crypteroniaceae and Poaceae, where this hypervariable region was employed in phylogenetic studies^[29,30]. Similarly, InDels are more abundant outside of CDS regions. This pattern of variant abundance and type found in the Nelumbo pan-plastome follows that found in other plastomes^[31]. Because frameshift mutations result in greater disruption to protein structure and function, they are often purged via selection^[32]. That said, frameshift correction through translational recoding has recently been described in chloroplasts^[28], which may render some CDS mutations less impactful by retaining original protein function across lineages despite underlying differences in DNA. As more pan-plastome studies are completed, it is becoming increasingly apparent that genomic regions previously used for higher-level systematic studies, such as rbcL and matK, should be supplemented with hypervariable regions found in IGS and intronic regions for improved resolution in intraspecific studies. In this study, the IGS regions rps18-rpl33 and trnQ-UUG-rps16 proved especially rich in informative markers. Our pan-plastome study, like those from the cultivated species Brassica napus and Sorghum sp.^[21,22], found that SNVs are by far the most common variant type. However, one of the outstanding questions is whether variants differ in type and effect (in regard to gene function) between domesticated lineages and wild progenitors. With the completion of more pan-plastome studies from diverse cultivated taxa, patterns specific to domesticated lineages can now be resolved to try to understand the function and importance of plastids in the domestication syndrome specifically, and in plant evolutionary biology more generally.
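The three variant classes discussed above (SNVs, InDels, and block substitutions) can be illustrated with a short Python sketch that scans a pairwise alignment. This is not the pipeline used in the study. It assumes two aligned, equal-length sequences with `-` as the gap symbol, and it assumes that a block substitution is a run of two or more adjacent mismatched columns; the function name `classify_variants` is hypothetical.

```python
def classify_variants(ref, alt):
    """Classify differences between two aligned, equal-length sequences.

    Returns a list of (start, end, kind) tuples using 0-based, half-open
    column coordinates. Assumptions for this sketch: '-' marks a gap, and
    a run of two or more adjacent mismatches is a block substitution.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")

    variants = []
    i, n = 0, len(ref)
    while i < n:
        if ref[i] == alt[i]:
            i += 1
            continue
        # A differing column is either a gap column or a mismatch column.
        is_gap = ref[i] == "-" or alt[i] == "-"
        j = i
        # Extend the run while columns keep differing in the same way.
        while (
            j < n
            and ref[j] != alt[j]
            and ((ref[j] == "-" or alt[j] == "-") == is_gap)
        ):
            j += 1
        if is_gap:
            kind = "InDel"
        elif j - i == 1:
            kind = "SNV"
        else:
            kind = "block_substitution"
        variants.append((i, j, kind))
        i = j
    return variants
```

Applied to every accession against a common reference and tallied by annotation (CDS, intron, or IGS), counts of this kind would give the sort of per-region comparison described in the paragraph above.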

Divergence among genetic clusters and the centers of origin for different types of Asian lotus

Plant population structure and genetic diversity are known to be affected by a number of different processes, including genetic drift, reproductive isolation, local adaptation, demographic fluctuations, and mode of reproduction, as well as artificial selection and human translocation associated with the domestication process^[33]. Such patterns are evident in this pan-plastome study of Nelumbo, wherein geographic separation between N. nucifera and N. lutea is reflected in the numerous fixed genetic differences between these species (Fig. 3). Analyses based on both nuclear datasets^[13,14] and the plastomic dataset presented here support a clear divergence between the two Nelumbo species. Some differences were nevertheless found in the phylogeny of genetic clusters within N. nucifera, mainly owing to conflicts between maternal and paternal inheritance (nucleocytoplasmic conflicts) that are commonly seen in many other species^[34,35]. Such conflicts are also important evidence of hybridization or introgression; for example, most seed lotus accessions were resolved as monophyletic in the two previous studies but fell into genetic clusters II and IV here. Additionally, differences in sampling between the two previous studies, and the limited genetic information carried by the plastome compared with the nuclear loci controlling the morphological traits used to designate lotus types, could also contribute to these differences (for example, seed and flower lotuses occurring together in genetic cluster IV), although probably to a lesser extent. This is also reflected in the much lower genetic diversity of each genetic cluster compared with that reported by Li et al.^[13] and Liu et al.^[14]. Genetic clusters II and III showed much higher genetic diversity than the others, consistent with the nuclear analyses of Liu et al.^[14] regarding seed and wild types, whereas genetic cluster VI (rhizome type) showed relatively low genetic diversity.
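One common way to quantify the genetic diversity compared across clusters above is nucleotide diversity (π), the average number of pairwise differences per site. The Python sketch below is a simplified illustration, not the study's method: it assumes equal-length aligned sequences, ignores columns where either sequence has a gap, and divides by the full alignment length. Dedicated software treats gaps and missing data more carefully.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count mismatched columns, skipping any column that contains a gap."""
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a != "-" and b != "-"
    )


def nucleotide_diversity(sequences):
    """Average pairwise differences per site across aligned sequences.

    Simplifying assumption: the denominator is the full alignment length,
    even though gap columns are skipped when counting differences.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and equally long")

    total = sum(
        pairwise_differences(a, b) for a, b in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total / (n_pairs * length)
```

Computing this value separately for the accessions assigned to each genetic cluster would allow the kind of between-cluster comparison described in the text.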

Within N. nucifera, six well-supported genetic clusters were resolved with notable differences in the genetic and haplotypic diversity as well as the cultivated types found in each (Fig. 3). For instance, genetic cluster III is characterized by having a large number of haplotypes each separated by many genetic differences with few repeats per haplotype. In addition, the membership of genetic cluster III is made up of all wild accession except for a single seed type (LS036, h22). One possible interpretation from this pattern is that genetic cluster III represents a wild lineage from which few cultivated types have been selected. This interpretation is further confirmed by noting that the patterns resolved in genetic cluster III are similar to those found in wild N. lutea, although more sampling in N. lutea is needed to confirm this pattern. It suggested that each cultivated type — flower, rhizome, or seed lotus was not single-originated (cultivated from the single wild population) because no cultivated type was found solely within single genetic cluster, implying potential multiple origins for all types and/or maternal introgression into cultivated types from different origins, as like the instances where it was clearly known when the cultivated rice was initially selected from certain cultivated lineages^[36,37]. Types can be further resolved by haplotype wherein several types are sometimes found within a single haplotype. For instance, the largest haplotype h7 (genetic cluster VI) with 109 accessions is made up of 1% flower, 93% rhizome, and 6% wild types. Furthermore, the relatively narrow distributions of wild lotus in genetic clusters I and III, while cultivated types in genetic clusters V and VI further expanded their range, indicating the domestication and cultivation history of lotus has gradually expanded under the action of human activity. 
Genetic diversity also showed a decreasing trend from wild to cultivated types (genetic clusters III to V to VI), which may also be a signal of human domestication. Either the selection of cultivated types from multiple origins or maternal introgression could result in a polyphyletic pattern among the cultivated types. For instance, rhizome lotuses were found in four of six genetic clusters with 12 haplotypes, flower lotuses in five of six genetic clusters with nine haplotypes, and wild lotuses in all six genetic clusters with 28 haplotypes (Fig. 3a, b & Fig. 5). Given this pattern in our plastomic data, a monophyletic origin for seed lotuses^[12] is not supported; however, because seed lotuses were found in four of six genetic clusters with eight unique haplotypes, claims regarding their diversity are supported by our data. On this basis, wild-type lotuses contain the highest level of genetic diversity, with cultivated types also exhibiting high levels of plastomic diversity. It should also be noted that the classification of cultivated types was based on their primary use, and some types also have traits that make them usable for other purposes; such tenuous designations might produce conflicting results when determining monophyly. An important step in understanding the evolution of cultivated lotuses would be to analyze the nuclear genes involved in lotus domestication^[13,38] in concert with the pan-plastome data, to better understand whether plastome divergence is concordant with patterns of artificial selection detected in the nuclear genome. Such findings may help to elucidate patterns of introgression in the domestication of lotus and how plastomes might have been involved in controlling the directionality of crosses through cytonuclear incompatibility.

With regard to the geographic origins of cultivated lotuses, several inferences can be made. Genetic cluster I has a probable origin in Yunnan province, as all wild accessions were collected there, and this genetic cluster has been the matrilineal source for a very small number of flower type cultivars (two flower types from genetic cluster I were collected in this study). Of all the genetic clusters, genetic cluster I is the most geographically restricted and the least used in generating lotus cultivars. The only wild accession in genetic cluster II was from Chiang Mai, Thailand, suggesting that this may be the origin of the many seed types collected from this genetic cluster in central eastern China (Fig. 5). However, alternative inferences include matrilineal gene flow into wild Thai populations or the possibility that the Thai accession is an escaped cultivar^[39]. Given that higher genetic diversity is present in China within genetic cluster II, these alternative inferences cannot be ruled out, as centers of origin can also be centers of genetic diversity. Genetic cluster III, like genetic cluster I, is made up primarily of wild type accessions, but unlike genetic cluster I it is geographically distributed throughout Southeast Asia and eastern China. Additionally, haplotypes within genetic cluster III are restricted to a given geographic location. As such, genetic cluster III may represent a lineage that dispersed broadly in the distant past and thereafter, through adaptation and drift, has produced localized haplotypes. The geography of island and peninsula formation in the Sunda Shelf over the last 50 million years may have helped drive this pattern^[40]. Genetic clusters IV, V, and VI all appear to originate in eastern China, as no wild accessions were found outside this geographic area. In genetic cluster IV, a single wild accession from Yunnan shares the h6 haplotype with 98% of the mostly flower and seed type accessions in this genetic cluster.
The low nucleotide and haplotype diversity of matrilineal genetic cluster IV runs counter to previous findings for seed and flower lotuses, in which high levels of admixture have been noted using nuclear data. It is nevertheless possible for a lineage to have undergone a matrilineal bottleneck induced by cytoplasmic incompatibility while maintaining a highly diverse and admixed nuclear genome over time^[41]. Moreover, flower and seed types are found in nearly all of the genetic clusters (although in genetic cluster VI only two of 166 accessions are flower types and none are seed types), suggesting high levels of maternal introgression among flower and seed types across genetic clusters. Genetic cluster VI is clearly the source of most rhizome type lotuses, and because the plant part selected for is unrelated to sexual reproduction, a few very common haplotypes (resulting from asexual propagation via rhizome cuttings to plant fields) account for nearly all rhizome types in this genetic cluster. Although most accessions in genetic cluster VI belong to only a few haplotypes, numerous wild haplotypes were also assigned to this cluster, some with restricted geographic distributions (Supplemental Fig. S4). This suggests that a good deal of wild diversity remains throughout eastern China, especially in the northeastern region.
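The contrast drawn above between clusters dominated by a few clonally propagated haplotypes and clusters with many rare haplotypes can be illustrated with a haplotype diversity calculation. The following is a minimal sketch that assumes Nei's unbiased estimator, Hd = n/(n-1) × (1 - Σ p_i²); the text does not state which estimator or software was used, and the haplotype labels and counts below are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels.

    Assumption: this estimator is used only for illustration; the study's
    own method is not specified in the text.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled accessions")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

# Hypothetical clusters (illustrative labels and counts only):
# one dominated by a single common haplotype, one with many rare haplotypes.
clonal_like = ["hA"] * 18 + ["hB"] * 2
wild_like = ["h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8", "h9", "h9"]

hd_clonal = haplotype_diversity(clonal_like)
hd_wild = haplotype_diversity(wild_like)
print(f"clonal-like cluster Hd = {hd_clonal:.3f}")
print(f"wild-like cluster Hd   = {hd_wild:.3f}")
```

Under this estimator, a cluster in which nearly all accessions share one haplotype scores close to 0, while a cluster in which almost every accession carries its own haplotype scores close to 1.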

The diversity and extent of aquatic plant domestication for human consumption on the eastern coastal plain of China are unsurpassed anywhere else. Lotus, because of the many parts of the plant that can be used for human consumption and the health benefits of eating these parts, has been and will continue to be an important food crop for humans. As with any crop, genetic diversity is essential for maintaining high levels of nutrition, disease resistance, and yield, and for improving or developing traits of interest^[42]. Wild populations of Asian lotus are known to be threatened by human development and environmental pollution^[13], making the characterization and mapping of genetic diversity all the more important in prioritizing conservation efforts. Our study has shown that cultivated and wild Asian lotus are divided into at least six maternal lineages, with geographic distribution and selection of lotus types differing between genetic clusters. Based on these results, several regions in China (namely Yunnan and the northeast) as well as regions in Southeast Asia should be explored further to better characterize the unique genetic diversity of lotuses from these areas. In addition, these wild haplotypes should be assessed for their potential use in developing new lotus cultivars. The experimental breeding of diverse lotuses may also provide useful insights into cytonuclear incompatibility and further our understanding of genomic evolution in this living fossil lineage. In summary, the pan-plastome resources presented here for lotus will provide new insights into the natural and domestication history of this lineage and will prove useful in applied studies such as marker-assisted breeding or the development of transplastomic lines for improved yield or disease resistance.

Variant type	Total	Region: LSC	Region: SSC	Region: IRA/B	Location: CDS	Location: Intron	Location: IGS
SNV	418 (294)	274 (182)	112 (89)	16 (12)	117 (91)	33 (23)	268 (180)
Block substitution	70 (54)	56 (42)	12 (11)	1 (0)	3 (3)	4 (4)	63 (46)
InDel	208 (151)	157 (118)	43 (27)	4 (3)	4 (2)	25 (19)	179 (130)
Total	696 (499)	487 (342)	167 (127)	21 (15)	124 (96)	62 (46)	510 (356)
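
The totals in the table above can be cross-checked with a short script. The sketch below makes one assumption that is inferred from the arithmetic and is not stated in the text: variants in the inverted repeat are counted once per copy, so the IRA/B column contributes twice to each regional sum. Only the unparenthesized counts are checked; the dictionary keys are illustrative names.

```python
# Unparenthesized counts transcribed from the table above.
table = {
    "SNV": {"total": 418, "LSC": 274, "SSC": 112, "IR": 16,
            "CDS": 117, "Intron": 33, "IGS": 268},
    "Block substitution": {"total": 70, "LSC": 56, "SSC": 12, "IR": 1,
                           "CDS": 3, "Intron": 4, "IGS": 63},
    "InDel": {"total": 208, "LSC": 157, "SSC": 43, "IR": 4,
              "CDS": 4, "Intron": 25, "IGS": 179},
    "Total": {"total": 696, "LSC": 487, "SSC": 167, "IR": 21,
              "CDS": 124, "Intron": 62, "IGS": 510},
}

def check_row(row, ir_copies=2):
    """Return (region sum matches total, location sum matches total).

    Assumption: the IR count applies to each of the two IR copies.
    """
    by_region = row["LSC"] + row["SSC"] + ir_copies * row["IR"]
    by_location = row["CDS"] + row["Intron"] + row["IGS"]
    return by_region == row["total"], by_location == row["total"]

results = {name: check_row(row) for name, row in table.items()}
for name, (region_ok, location_ok) in results.items():
    print(f"{name}: region sum ok = {region_ok}, location sum ok = {location_ok}")
```

With `ir_copies=2`, both the regional and the locational sums reproduce the Total column for every row, which is consistent with the assumption but does not by itself confirm how the counts were made.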
