ARTICLE   Open Access    

The Asian lotus (Nelumbo nucifera) pan-plastome: diversity and divergence in a living fossil grown for seed, rhizome, and aesthetics

  • # These authors contributed equally: Jie Wang, Xuezhu Liao, Cuihua Gu

  • The Asian lotus (Nelumbo nucifera) has a history of cultivation in Asia dating back over 3,000 years, where it has been an important food crop producing edible rhizomes and seeds as well as flowers of great aesthetic and cultural value. Here, we de novo assembled the plastomes of 316 lotus accessions, including five North American lotus (N. lutea) and 311 Asian lotus (N. nucifera), to construct a pan-plastome genome map and investigate the phylogeography and genetic diversity of the only two extant species within this living fossil lineage. A total of 113 unique genes were annotated, and plastome sizes varied between 163,457 and 163,672 bp with only minor differences in each of the four major genomic units. The most abundant nucleotide differences among plastomes were single nucleotide variants, followed by insertions/deletions and block substitutions, mainly found in intergenic spacer regions of the large single copy portion of the plastome. Seven well-supported genetic clusters were resolved using multiple population structure analyses. The different lotus types (flower, seed, rhizome, or wild) were disproportionately assigned to multiple genetic clusters. This pattern indicates that the domestication of Asian lotus involved multiple genetic origins and possible matrilineal introgression. Geographic mapping of accessions also revealed that genetic diversity is unevenly distributed, with eastern China possessing the highest genetic diversity and regions such as Yunnan, Indonesia, and Thailand possessing unique haplotypes. These results provide an important maternal history of Nelumbo and necessary groundwork for future studies on intergenomic gene transfer, cytonuclear incompatibility, and conservation genetics.
  • Starting in the early 2000s, China has experienced rapid growth as an emerging wine market. It has now established itself as the world's second-largest grape-growing country in terms of vineyard surface area. Furthermore, China has also secured its position as the sixth-biggest wine producer globally and the fifth-most significant wine consumer in terms of volume[1]. The Ningxia Hui autonomous region, known for its reputation as the highest quality wine-producing area in China, is considered one of the country's most promising wine regions. The region's arid or semiarid climate, combined with ample sunlight and warmth, thanks to the Yellow River, provides ideal conditions for grape cultivation. Wineries in the Ningxia Hui autonomous region are renowned as the foremost representatives of elite Chinese wineries. All wines produced in this region originate from grapes grown in their vineyards, adhering to strict quality requirements, and have gained a well-deserved international reputation for excellence. Notably, in 2011, Helan Mountain's East Foothill in the Ningxia Hui Autonomous Region received protected geographic indication status in China. Subsequently, in 2012, it became the first provincial wine region in China to be accepted as an official observer by the International Organisation of Vine and Wine (OIV)[2]. The wine produced in the Helan Mountain East Region of Ningxia, China, is one of the first Agricultural and Food Geographical Indications. Starting in 2020, this wine will be protected in the European Union[3].

    Marselan, a hybrid variety of Cabernet Sauvignon and Grenache, was introduced to China in 2001 by the French National Institute for Agricultural Research (INRA). Over the last 15 years, Marselan has spread widely across China, in contrast to its lesser cultivation in France. The wines produced from Marselan grapes possess a strong and elegant structure, making them highly suitable for the preferences of Chinese consumers. As a result, many wineries in the Ningxia Hui Autonomous Region have made Marselan wines their main product[4]. Wine is a complex beverage that is influenced by various natural and anthropogenic factors throughout the winemaking process. These factors include soil, climate, agrochemicals, and human intervention. While there is an abundance of research on wine production, little research has been conducted specifically on local wines from the Eastern Foot of Helan Mountain. Filling this research gap is important for the management and quality improvement of Chinese local wines.

    Ion mobility spectrometry (IMS) is a rapid analytical technique used to detect trace gases and characterize chemical ionic substances. It achieves this through the gas-phase separation of ionized molecules under an electric field at ambient pressure. In recent years, IMS has gained increasing popularity in the field of food-omics due to its numerous advantages. These advantages include ultra-high analytical speed, simplicity, easy operation, time efficiency, relatively low cost, and the absence of sample preparation steps. As a result, IMS is now being applied more frequently in various areas of food analysis, such as food composition and nutrition, food authentication, detection of food adulteration, food process control, and chemical food safety[5,6]. The orthogonal hyphenation of gas chromatography (GC) and IMS has greatly improved the resolution of complex food matrices when using GC-IMS, particularly in the analysis of wines[7].

    The objective of this study was to investigate the changes in the physicochemical properties of Marselan wine during the winemaking process, with a focus on the total phenolic and flavonoids content, antioxidant activity, and volatile profile using the GC-IMS method. The findings of this research are anticipated to make a valuable contribution to the theoretical framework for evaluating the authenticity and characterizing Ningxia Marselan wine. Moreover, it is expected that these results will aid in the formulation of regulations and legislation pertaining to Ningxia Marselan wine in China.

    All the grapes used to produce the Marselan wines were grown in the Xiban vineyard (106.31463° E, 38.509541° N), situated in Helan Mountain's East Foothill of the Ningxia Hui Autonomous Region in China.

    Folin-Ciocalteu reagent, (±)-6-hydroxy-2,5,7,8-tetramethylchroman-2-carboxylic acid (Trolox), 2,2′-azino-bis-(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS), 2,4,6-tris(2-pyridyl)-s-triazine (TPTZ), anhydrous methanol, sodium nitrite, and anhydrous sodium carbonate were purchased from Shanghai Aladdin Biochemical Technology Co., Ltd. (Shanghai, China). Reference standards of (+)-catechin and gallic acid, and the internal standard (IS) 4-methyl-2-pentanol, were supplied by Shanghai Yuanye Bio-Technology Co., Ltd. (Shanghai, China). The purity of these reference standards was higher than 98%. Ultrapure water (18.2 MΩ cm) was prepared with a Milli-Q system (Millipore, Bedford, MA, USA).

    Stage 1−Juice processing: Grapes at the fully mature stage were harvested and crushed, and potassium metabisulfite (5 mg/L of SO2) was evenly spread during crushing. The resulting must was transferred into stainless steel tanks. Stage 2−Alcoholic fermentation: Propagated Saccharomyces cerevisiae ES488 (Enartis, Italy) was added to the fresh must for alcoholic fermentation. After fermentation finished, the wine was kept in the tanks for 7 d of traditional maceration to improve color properties and phenolic content. Stage 3−Malolactic fermentation: Once the pomace had fully settled at the bottom of the tanks, the wine was transferred to another tank to separate it from these residues. Oenococcus oeni VP41 (Lallemand Inc., France) was then inoculated to convert malic acid into lactic acid. Stage 4−Wine stabilization: After malolactic fermentation, potassium metabisulfite was added again (35 mg/L of SO2), and the wine was transferred to oak barrels for stabilization, which usually takes 6−24 months. A total of four batches of samples were collected during the production of Marselan wine in this study.

    Total polyphenols were determined on 0.5 mL diluted wine sample using the Folin-Ciocalteu method[8], using gallic acid as a reference compound, and expressed as milligrams of gallic acid equivalents per liter of wine. The total flavonoid content was measured on 0.05 mL of wine sample by a colorimetric method previously described[9]. Results are calculated from the calibration curve obtained with catechin, as milligrams of catechin equivalents per liter of wine.

    The antioxidative activity was determined using the ABTS·+ assay[10]. Briefly, the ABTS·+ radical was prepared by mixing 88 μL of potassium persulfate (140 mmol/L) with 5 mL of ABTS solution (7 mmol/L). The mixture was kept at room temperature in the dark for 16 h. Sixty μL of sample was mixed with 3 mL of ABTS·+ solution with a measured absorbance of 0.700 ± 0.200 at 734 nm. After 6 min of reaction, the absorbance of the samples was measured with a spectrophotometer at 734 nm. Each sample was tested in triplicate. The data were expressed as mmol Trolox equivalent of antioxidative capacity per liter of the wine sample (mmol TE/L). Calibration curves, in the range 64.16−1,020.20 μmol TE/L, showed good linearity (R2 ≥ 0.99).
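    The conversion from an absorbance reading to Trolox equivalents can be sketched as a straight-line calibration fit followed by inversion. This is a minimal illustration, not the authors' procedure: the calibration points, the use of the absorbance drop as the response, and the `dilution_factor` parameter are all assumptions introduced here.

```python
# Hypothetical calibration points (NOT from the study): Trolox concentration
# in umol/L versus the drop in absorbance at 734 nm after 6 min.
trolox_umol_per_l = [64.16, 250.0, 500.0, 750.0, 1020.20]
delta_absorbance = [0.040, 0.150, 0.300, 0.450, 0.610]


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def trolox_equivalent_mmol_per_l(delta_a, slope, intercept, dilution_factor=1.0):
    """Invert the calibration line and convert umol/L to mmol/L."""
    umol_per_l = (delta_a - intercept) / slope * dilution_factor
    return umol_per_l / 1000.0


slope, intercept, r_squared = fit_line(trolox_umol_per_l, delta_absorbance)
sample_te = trolox_equivalent_mmol_per_l(0.250, slope, intercept)
print(f"R^2 = {r_squared:.4f}, sample = {sample_te:.3f} mmol TE/L")
```

    A reading is only meaningful if it falls inside the calibrated range, so more concentrated samples would need dilution before measurement and a matching dilution factor afterwards.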

    The FRAP assay was conducted according to a previous study[11]. The FRAP reagent was freshly prepared by mixing 10 mmol/L TPTZ solution (in 40 mmol/L HCl), 20 mmol/L FeCl3·6H2O solution, and 300 mmol/L acetate buffer (pH 3.6) at 1:1:10 (v:v:v). Ten mL of diluted sample was mixed with 1.8 mL of FRAP reagent and incubated at 37 °C for 30 min. The absorbance was determined at 593 nm and the results were reported as mM Fe (II) equivalent per liter of the wine sample. The samples were quantified against a calibration curve of ferrous sulphate (0.15−2.00 mM/mL).

    The volatile compounds were analyzed on a GC-IMS instrument (FlavourSpec, GAS, Dortmund, Germany) equipped with an autosampler (Hanon Auto SPE 100, Shandong, China) for headspace analysis. One mL of each wine was placed in a 20 mL headspace vial (CNW Technologies, Germany) with 20 μL of 4-methyl-2-pentanol (20 mg/L) as internal standard, then incubated at 60 °C with continuous shaking at 500 rpm for 10 min. One hundred μL of headspace sample was automatically loaded into the injector in splitless mode through a syringe heated to 65 °C. The analytes were separated on an MXT-WAX capillary column (30 m × 0.53 mm, 1.0 μm) from Restek (Bellefonte, Pennsylvania, USA) at a constant temperature of 60 °C and then ionized in the IMS instrument (FlavourSpec®, Gesellschaft für Analytische Sensorsysteme mbH, Dortmund, Germany) at 45 °C. High purity nitrogen gas (99.999%) was used as the carrier gas at 150 mL/min, and drift gas at 2 mL/min for 0−2.0 min, then increased to 100 mL/min from 2.0 to 20 min, and kept at 100 mL/min for 10 min. Ketones C4−C9 (Sigma Aldrich, St. Louis, MO, USA) were used as an external standard to determine the retention index (RI) of volatile compounds. Analytes were identified using Laboratory Analytical Viewer (LAV) 2.2.1 (GAS, Dortmund, Germany) by comparing the RI and drift time with those of the standards in the GC-IMS Library.

    All samples were prepared in duplicate and tested at least six times, and the results were expressed as mean ± standard error (n = 4). Statistical significance (p < 0.05) was assessed with Tukey's range test using SPSS 18.0 software (SPSS Inc., IL, USA). Principal component analysis (PCA) was performed using the 'Dynamic PCA' plug-in built into the LAV software to model patterns of aroma volatiles. Orthogonal partial least-squares discriminant analysis (OPLS-DA) in SIMCA-P 14.1 software (Umetrics, Umeå, Sweden) was used to analyze the volatile organic compounds that differed between fermentation stages.

    The changes in the antioxidant activity of Marselan wines during the entire brewing process are listed in Table 2. The contents of flavonoids and polyphenols showed an increasing trend during the brewing of Marselan wine, ranging from 315.71 to 1,498.57 mg CE/L and from 1,083.93 to 3,370.92 mg GAE/L, respectively. The content increased rapidly in the alcoholic fermentation stage, but slowly in the subsequent fermentation stages. This indicates that the formation of flavonoid and phenolic substances in wine is mainly concentrated in the alcoholic fermentation stage, which is consistent with previous reports. This is mainly because maceration during the alcoholic fermentation of grapes extracts these compounds[12]. The antioxidant activities of Marselan wine samples at different fermentation stages were measured by the FRAP and ABTS methods[11]. The ferric reducing capacity and ABTS·+ free radical scavenging capacity of the fermented Marselan wines were 2.4 and 1.5 times those of the sample from the juice processing stage, respectively, indicating that the fermented Marselan wine had higher antioxidant activity. Many previous studies have suggested a close correlation between antioxidant activity and the content of polyphenols and flavonoids[13−15]. Previous studies have also reported that Marselan wine has the highest total phenol and anthocyanin content compared with wines of Tannat, Cabernet Sauvignon, Merlot, Cabernet Franc, and Syrah[13]. Polyphenols and flavonoids play an important role in improving human immunity. Therefore, Marselan wines are popular because of their high phenolic and flavonoid content and high antioxidant capacity.
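    The two ratios quoted above can be reproduced from the FRAP and ABTS columns of the antioxidant table. The short check below assumes, based on the arithmetic, that "fermented wine" refers to the stage 4 sample compared against stage 1; the variable and function names are illustrative only.

```python
# FRAP and ABTS values for stage 1 (juice) and stage 4 (stabilized wine),
# transcribed from the antioxidant activity table.
frap = {"Stage 1": 34.82, "Stage 4": 85.07}
abts = {"Stage 1": 38.92, "Stage 4": 57.46}


def fold_change(final, initial):
    """Ratio of the final value to the initial value."""
    return final / initial


frap_fold = fold_change(frap["Stage 4"], frap["Stage 1"])
abts_fold = fold_change(abts["Stage 4"], abts["Stage 1"])
print(f"FRAP: {frap_fold:.1f}x, ABTS: {abts_fold:.1f}x")  # FRAP: 2.4x, ABTS: 1.5x
```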

    Table 1.  GC-IMS integration parameters of volatile compounds in Marselan wine at different fermentation stages.
    Columns: No.; Compound; Formula; RI*; Rt [sec]**; Dt [RIPrel]***; Identification approach; Concentration (μg/mL, n = 4) at Stage 1, Stage 2, Stage 3, and Stage 4.
    Aldehydes
    5 Furfural C5H4O2 1513.1 941.943 1.08702 RI, DT, IS 89.10 ± 4.05c 69.98 ± 3.22c 352.16 ± 39.06b 706.30 ± 58.22a
    6 Furfural dimer C5H4O2 1516.6 948.77 1.33299 RI, DT, IS 22.08 ± 0.69b 18.68 ± 2.59c 23.73 ± 2.69b 53.39 ± 9.42a
    12 (E)-2-hexenal C6H10O 1223.1 426.758 1.18076 RI, DT, IS 158.17 ± 7.26a 47.57 ± 2.51b 39.00 ± 2.06c 43.52 ± 4.63bc
    17 (E)-2-pentenal C5H8O 1129.2 333.392 1.1074 RI, DT, IS 23.00 ± 4.56a 16.42 ± 1.69c 18.82 ± 0.27b 18.81 ± 0.55b
    19 Heptanal C7H14O 1194.2 390.299 1.33002 RI, DT, IS 17.28 ± 2.25a 10.22 ± 0.59c 14.50 ± 8.84b 9.11 ± 1.06c
    22 Hexanal C6H12O 1094.6 304.324 1.25538 RI, DT, IS 803.11 ± 7.47c 1631.34 ± 19.63a 1511.11 ± 26.91b 1526.53 ± 8.12b
    23 Hexanal dimer C6H12O 1093.9 303.915 1.56442 RI, DT, IS 588.85 ± 7.96a 93.75 ± 4.67b 92.93 ± 3.13b 95.49 ± 2.50b
    29 3-Methylbutanal C5H10O 914.1 226.776 1.40351 RI, DT, IS 227.86 ± 6.39a 33.32 ± 2.59b 22.36 ± 1.18c 21.94 ± 1.73c
    33 Dimethyl sulfide C2H6S 797.1 193.431 0.95905 RI, DT, IS 120.07 ± 4.40c 87.02 ± 3.82d 246.81 ± 5.62b 257.18 ± 3.04a
    49 2-Methylpropanal C4H8O 828.3 202.324 1.28294 RI, DT, IS 150.49 ± 7.13a 27.08 ± 1.48b 19.36 ± 1.10c 19.69 ± 0.92c
    Ketones
    45 3-Hydroxy-2-butanone C4H8O2 1293.5 515.501 1.20934 RI, DT, IS 33.20 ± 3.83c 97.93 ± 8.72b 163.20 ± 21.62a 143.51 ± 21.48a
    46 Acetone C3H6O 836.4 204.638 1.11191 RI, DT, IS 185.75 ± 8.16c 320.43 ± 12.32b 430.74 ± 3.98a 446.58 ± 10.41a
    Organic acid
    3 Acetic acid C2H4O2 1527.2 969.252 1.05013 RI, DT, IS 674.66 ± 46.30d 3602.39 ± 30.87c 4536.02 ± 138.86a 4092.30 ± 40.33b
    4 Acetic acid dimer C2H4O2 1527.2 969.252 1.15554 RI, DT, IS 45.25 ± 3.89c 312.16 ± 19.39b 625.79 ± 78.12a 538.35 ± 56.38a
    Alcohols
    8 1-Hexanol C6H14O 1365.1 653.825 1.32772 RI, DT, IS 1647.65 ± 28.94a 886.33 ± 32.96b 740.73 ± 44.25c 730.80 ± 21.58c
    9 1-Hexanol dimer C6H14O 1365.8 655.191 1.64044 RI, DT, IS 378.42 ± 20.44a 332.65 ± 25.76a 215.78 ± 21.04b 200.14 ± 28.34b
    13 3-Methyl-1-butanol C5H12O 1213.3 414.364 1.24294 RI, DT, IS 691.86 ± 9.95c 870.41 ± 22.63b 912.80 ± 23.94a 939.49 ± 12.44a
    14 3-Methyl-1-butanol dimer C5H12O 1213.3 414.364 1.49166 RI, DT, IS 439.90 ± 29.40c 8572.27 ± 60.56b 9083.14 ± 193.19a 9152.25 ± 137.80a
    15 1-Butanol C4H10O 1147.2 348.949 1.18073 RI, DT, IS 157.33 ± 9.44b 198.92 ± 3.92a 152.78 ± 10.85b 156.02 ± 9.80b
    16 1-Butanol dimer C4H10O 1146.8 348.54 1.38109 RI, DT, IS 24.14 ± 2.15c 274.75 ± 12.60a 183.02 ± 17.72b 176.80 ± 19.80b
    24 1-Propanol C3H8O 1040.9 274.803 1.11042 RI, DT, IS 173.73 ± 4.75a 55.84 ± 2.16c 80.80 ± 4.99b 83.57 ± 2.34b
    25 1-Propanol dimer C3H8O 1040.4 274.554 1.24784 RI, DT, IS 58.20 ± 1.30b 541.37 ± 11.94a 541.33 ± 15.57a 538.84 ± 9.74a
    28 Ethanol C2H6O 930.6 231.504 1.11901 RI, DT, IS 5337.84 ± 84.16c 11324.05 ± 66.18a 9910.20 ± 100.76b 9936.10 ± 101.24b
    34 Methanol CH4O 903.6 223.79 0.98374 RI, DT, IS 662.08 ± 13.87a 76.94 ± 2.15b 61.92 ± 1.96c 62.89 ± 0.81c
    37 2-Methyl-1-propanol C4H10O 1098.5 306.889 1.35839 RI, DT, IS 306.91 ± 4.09c 3478.35 ± 25.95a 3308.79 ± 61.75b 3313.85 ± 60.88b
    48 1-Pentanol C5H12O 1257.6 470.317 1.25222 RI, DT, IS 26.13 ± 2.52c 116.50 ± 3.71ab 112.37 ± 6.26b 124.17 ± 7.04a
    Esters
    1 Methyl salicylate C8H8O3 1859.6 1616.201 1.20489 RI, DT, IS 615.00 ± 66.68a 485.08 ± 31.30b 470.14 ± 23.02b 429.12 ± 33.74b
    7 Butyl hexanoate C10H20O2 1403.0 727.561 1.47354 RI, DT, IS 95.83 ± 17.04a 62.87 ± 3.62a 92.59 ± 11.88b 82.13 ± 3.61c
    10 Hexyl acetate C8H16O2 1298.6 524.366 1.40405 RI, DT, IS 44.72 ± 8.21a 33.18 ± 2.17d 41.50 ± 4.38c 40.89 ± 4.33b
    11 Propyl hexanoate C9H18O2 1280.9 499.577 1.39274 RI, DT, IS 34.65 ± 3.90d 70.43 ± 5.95a 43.97 ± 4.39b 40.12 ± 4.05c
    18 Ethyl hexanoate C8H16O2 1237.4 444.749 1.80014 RI, DT, IS 55.55 ± 5.62c 1606.16 ± 25.63a 787.24 ± 16.95b 788.91 ± 28.50b
    20 Isoamyl acetate C7H14O2 1127.8 332.164 1.30514 RI, DT, IS 164.22 ± 1.00d 243.69 ± 8.37c 343.51 ± 13.98b 365.46 ± 1.60a
    21 Isoamyl acetate dimer C7H14O2 1126.8 331.345 1.75038 RI, DT, IS 53.61 ± 4.79d 4072.20 ± 11.94a 2416.70 ± 49.84b 2360.46 ± 43.29c
    26 Isobutyl acetate C6H12O2 1020.5 263.605 1.23281 RI, DT, IS 101.65 ± 1.81a 15.52 ± 0.67c 44.87 ± 3.21b 45.96 ± 1.41b
    27 Isobutyl acetate dimer C6H12O2 1019.6 263.107 1.61607 RI, DT, IS 34.60 ± 1.05d 540.84 ± 5.64a 265.54 ± 8.31c 287.06 ± 3.66b
    30 Ethyl acetate dimer C4H8O2 885.2 218.564 1.33587 RI, DT, IS 1020.75 ± 6.86d 5432.71 ± 6.55a 5052.99 ± 9.65b 5084.47 ± 7.30c
    31 Ethyl acetate C4H8O2 878.3 216.574 1.09754 RI, DT, IS 215.65 ± 3.58a 38.29 ± 2.37c 71.59 ± 2.99b 69.32 ± 2.85b
    32 Ethyl formate C3H6O2 838.1 205.127 1.19738 RI, DT, IS 175.48 ± 3.79d 1603.20 ± 13.72a 1472.10 ± 5.95c 1509.08 ± 13.26b
    35 Ethyl octanoate C10H20O2 1467.0 852.127 1.47312 RI, DT, IS 198.86 ± 36.71b 1853.06 ± 17.60a 1555.51 ± 24.21a 1478.05 ± 33.63a
    36 Ethyl octanoate dimer C10H20O2 1467.0 852.127 2.03169 RI, DT, IS 135.50 ± 13.02d 503.63 ± 15.86a 342.89 ± 11.62b 297.28 ± 14.40c
    38 Ethyl butanoate C6H12O2 1042.1 275.479 1.5664 RI, DT, IS 21.29 ± 2.68c 1384.67 ± 8.97a 1236.52 ± 20.21b 1228.09 ± 5.09b
    39 Ethyl 3-methylbutanoate C7H14O2 1066.3 288.754 1.26081 RI, DT, IS 9.70 ± 1.85d 200.29 ± 4.21a 146.87 ± 8.70b 127.13 ± 12.54c
    40 Propyl acetate C5H10O2 984.7 246.908 1.48651 RI, DT, IS 4.57 ± 1.07c 128.63 ± 4.28a 87.75 ± 3.26b 88.49 ± 1.99b
    41 Ethyl propanoate C5H10O2 962.1 240.47 1.46051 RI, DT, IS 10.11 ± 0.34d 107.08 ± 3.50a 149.60 ± 5.39c 167.15 ± 12.90b
    42 Ethyl isobutyrate C6H12O2 971.7 243.229 1.56687 RI, DT, IS 18.29 ± 2.61d 55.22 ± 1.07c 98.81 ± 4.67b 104.71 ± 4.73a
    43 Ethyl lactate C5H10O3 1352.2 628.782 1.14736 RI, DT, IS 31.81 ± 2.91c 158.03 ± 2.80b 548.14 ± 74.21a 527.01 ± 39.06a
    44 Ethyl lactate dimer C5H10O3 1351.9 628.056 1.53618 RI, DT, IS 44.55 ± 2.03c 47.56 ± 4.02c 412.23 ± 50.96a 185.87 ± 31.25b
    47 Ethyl heptanoate C9H18O2 1339.7 604.482 1.40822 RI, DT, IS 39.55 ± 6.37a 38.52 ± 2.47a 28.44 ± 1.52c 30.77 ± 2.79b
    Unknown
    1 RI, DT, IS 15.53 ± 0.18 35.69 ± 0.80 12.70 ± 0.80 10.57 ± 0.86
    2 RI, DT, IS 36.71 ± 1.51 120.41 ± 3.44 198.12 ± 6.01 201.19 ± 3.70
    3 RI, DT, IS 44.35 ± 0.88 514.12 ± 4.28 224.78 ± 6.56 228.32 ± 4.62
    4 RI, DT, IS 857.64 ± 8.63 33.22 ± 1.99 35.05 ± 5.99 35.17 ± 3.97
    * Represents the retention index calculated using n-ketones C4−C9 as external standard on the MXT-WAX column. ** Represents the retention time in the capillary GC column. *** Represents the migration time in the drift tube.

    This study used the GC-IMS method to measure the volatile organic compounds (VOCs) in samples from the different fermentation stages of Marselan wine. Figure 1 shows the resulting gas phase ion migration spectrum, in which the ordinate represents the retention time of the gas chromatographic peaks and the abscissa represents the normalized ion migration time[16]. The entire spectrum represents the aroma fingerprints of Marselan wine at different fermentation stages, with each signal point to the right of the reactant ion peak (RIP) representing a volatile organic compound detected in the sample[17]. Here, the stage 1 sample (juice processing) was used as a reference, and the characteristic peaks in the spectra of samples from the other fermentation stages were compared and analyzed after subtracting the reference. Where a component has the same concentration in both samples, the colors cancel each other to form a white background. In the topographic maps of the other fermentation stages, a darker color indicates a higher concentration relative to the white background. In the 2D spectra of the different fermentation stages, the positions and numbers of peaks show that peak intensities are broadly the same, with no obvious difference. However, fermentation is an extremely complex chemical process, and the content and types of volatile organic compounds change as fermentation proceeds, so other detection and characterization methods are needed to distinguish the stages.

    Figure 1.  2D-topographic plots of volatile organic compounds in Marselan wine at different fermentation stages.

    To visually display the dynamic changes of various substances during the fermentation of Marselan wine, peaks with obvious differences were extracted to form characteristic fingerprints for comparison (Fig. 2). Each row represents all signal peaks selected from samples at the same stage, and each column represents the signal peaks of the same volatile compound in samples from different fermentation stages. Figure 2 shows the volatile organic compound (VOC) information for each sample and the differences between samples, where the numbers represent substances not identified in the migration spectrum library. The changes in volatile substances during Marselan winemaking can be observed in the fingerprint. As shown in Fig. 2 and Table 1, a total of 40 volatile chemical components were detected by qualitative analysis according to their retention time and ion migration time in the HS-GC-IMS spectrum, including 17 esters, eight alcohols, eight aldehydes, two ketones, one organic acid, and four unidentified flavor substances. Twelve of the volatile organic compounds also appeared as dimers, owing to the ionization of protonated neutral components before they enter the drift tube[18]. As can be seen from Table 1, the VOCs in the winemaking process of Marselan wine are mainly composed of esters, alcohols, and aldehydes, which play an important role in building its aroma characteristics.

    Figure 2.  Fingerprints of volatile organic compounds in Marselan wine at different fermentation stages.
    Table 2.  Antioxidant activity, total polyphenols, and flavonoids content of Marselan wine at different fermentation stages.
    Winemaking stage   TFC (mg CE/L)       TPC (mg GAE/L)       FRAP (mM FeSO4/mL)   ABTS (mM Trolox/L)
    Stage 1            315.71 ± 0.00d      1,083.93 ± 7.79d     34.82c               38.92 ± 2.12c
    Stage 2            1,490.00 ± 7.51c    3,225.51 ± 53.27c    77.32b               52.17 ± 0.95b
    Stage 3            1,510.00 ± 8.88a    3,307.14 ± 41.76b    77.56b               53.04 ± 0.76b
    Stage 4            1,498.57 ± 6.34b    3,370.92 ± 38.29a    85.07a               57.46 ± 2.55a
    Means in the same column with different letters are significantly different (p < 0.05).

    Esters are produced by the reaction of acids and alcohols in wine, mainly through the activity of yeast during fermentation[19], and are the main components responsible for fruit flavors in fruit juices and wines[20,21]. In this study, esters were the largest group of volatile compounds detected in Marselan wine samples, which is consistent with previous reports[22]. Table 1 shows that the contents of most esters increased gradually as fermentation proceeded, and that they mainly began to accumulate in large quantities during alcoholic fermentation. The contents of ethyl hexanoate (fruity), isoamyl acetate (banana, pear), ethyl octanoate (fruity, pineapple, apple, brandy), ethyl acetate (fruity), ethyl formate (spicy, pineapple), and ethyl butanoate (sweet, pineapple, banana, apple) increased significantly during alcoholic fermentation and remained high in the subsequent fermentation stages (accounting for 86% of the total detected esters). These esters give Marselan wine its typical fruity aroma and contribute positively to its aroma profile. Among them, ethyl acetate has the highest content, at 5,153.79 μg/mL (monomer plus dimer) in the final fermentation stage, accounting for 33.6% of the total esters. However, the content of ethyl acetate was already relatively high before fermentation, which may result from the metabolic activity of autochthonous microorganisms present in the raw materials. Isobutyl acetate, ethyl 3-methylbutanoate, propyl acetate, ethyl propanoate, ethyl isobutyrate, and ethyl lactate were identified and quantified in all fermentation samples. The total contents of these esters in stages 1 and 4 were 255.28 and 1,533.38 μg/mL, respectively, indicating that they may also affect the aroma quality of Marselan wine. These results indicate that esters are an important factor in flavor formation during the brewing of Marselan wine.
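    The ester shares quoted above follow from summing each compound's monomer and dimer signals in the GC-IMS concentration table. The check below transcribes the stage 4 ester values from that table; treating monomer plus dimer as one compound total is an assumption made here, chosen because it reproduces the figures in the text.

```python
# Stage 4 ester concentrations as (monomer, dimer) pairs, in ug/mL,
# transcribed from the GC-IMS concentration table (0.0 = no dimer listed).
stage4_esters = {
    "methyl salicylate": (429.12, 0.0),
    "butyl hexanoate": (82.13, 0.0),
    "hexyl acetate": (40.89, 0.0),
    "propyl hexanoate": (40.12, 0.0),
    "ethyl hexanoate": (788.91, 0.0),
    "isoamyl acetate": (365.46, 2360.46),
    "isobutyl acetate": (45.96, 287.06),
    "ethyl acetate": (69.32, 5084.47),
    "ethyl formate": (1509.08, 0.0),
    "ethyl octanoate": (1478.05, 297.28),
    "ethyl butanoate": (1228.09, 0.0),
    "ethyl 3-methylbutanoate": (127.13, 0.0),
    "propyl acetate": (88.49, 0.0),
    "ethyl propanoate": (167.15, 0.0),
    "ethyl isobutyrate": (104.71, 0.0),
    "ethyl lactate": (527.01, 185.87),
    "ethyl heptanoate": (30.77, 0.0),
}

# Compound total = monomer signal + dimer signal.
totals = {name: monomer + dimer for name, (monomer, dimer) in stage4_esters.items()}
total_esters = sum(totals.values())

major_esters = ["ethyl hexanoate", "isoamyl acetate", "ethyl octanoate",
                "ethyl acetate", "ethyl formate", "ethyl butanoate"]
minor_esters = ["isobutyl acetate", "ethyl 3-methylbutanoate", "propyl acetate",
                "ethyl propanoate", "ethyl isobutyrate", "ethyl lactate"]

ethyl_acetate_share = totals["ethyl acetate"] / total_esters
major_share = sum(totals[name] for name in major_esters) / total_esters
minor_total = sum(totals[name] for name in minor_esters)

print(f"ethyl acetate: {totals['ethyl acetate']:.2f} ug/mL "
      f"({100 * ethyl_acetate_share:.1f}% of esters)")
print(f"six major esters: {100 * major_share:.0f}% of esters")
print(f"six minor esters: {minor_total:.2f} ug/mL")
```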

    Alcohols were the second most important group of aromatic compounds in Marselan wine, and were mainly synthesized through glucose and amino acid decomposition during alcoholic fermentation[23,24]. According to Table 1, eight alcohols, namely methanol, ethanol, propanol, butanol, hexanol, amyl alcohol, 3-methyl-1-butanol, and 2-methyl-1-propanol, were detected in the four brewing stages of Marselan wine. The contents of ethanol (slightly sweet), 3-methyl-1-butanol (apple, brandy, spicy), and 2-methyl-1-propanol (whiskey) increased during the fermentation process. Together, these three alcohols account for 91%−92% of the total alcohol content of the fermented wine, and may contribute to its aromatic and clean-tasting character. In contrast, the contents of 1-hexanol and methanol decreased during fermentation. Notably, both decreased rapidly during alcoholic fermentation, from 2,026.07 to 1,218.98 μg/mL and from 662.08 to 76.94 μg/mL, respectively, which may be ascribed to the conversion of alcohols into esters throughout fermentation. The reduced concentration of some alcohols also softens the strong odor during wine fermentation, which plays an important role in improving aroma characteristics.
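    The 91%−92% share can be checked per stage from the alcohol rows of the GC-IMS concentration table. In the sketch below, monomer and dimer signals are again added together (an assumption made here); under that assumption the quoted share holds for stages 2 to 4, while the stage 1 juice comes out markedly lower.

```python
# Alcohol signals in ug/mL at stages 1-4, transcribed from the GC-IMS table.
alcohol_signals = {
    "1-hexanol": [1647.65, 886.33, 740.73, 730.80],
    "1-hexanol dimer": [378.42, 332.65, 215.78, 200.14],
    "3-methyl-1-butanol": [691.86, 870.41, 912.80, 939.49],
    "3-methyl-1-butanol dimer": [439.90, 8572.27, 9083.14, 9152.25],
    "1-butanol": [157.33, 198.92, 152.78, 156.02],
    "1-butanol dimer": [24.14, 274.75, 183.02, 176.80],
    "1-propanol": [173.73, 55.84, 80.80, 83.57],
    "1-propanol dimer": [58.20, 541.37, 541.33, 538.84],
    "ethanol": [5337.84, 11324.05, 9910.20, 9936.10],
    "methanol": [662.08, 76.94, 61.92, 62.89],
    "2-methyl-1-propanol": [306.91, 3478.35, 3308.79, 3313.85],
    "1-pentanol": [26.13, 116.50, 112.37, 124.17],
}
MAJOR_ALCOHOLS = ("ethanol", "3-methyl-1-butanol", "2-methyl-1-propanol")


def major_share(stage_index):
    """Share of the three major alcohols in the total alcohol signal of one stage."""
    total = sum(values[stage_index] for values in alcohol_signals.values())
    major = sum(
        values[stage_index]
        for name, values in alcohol_signals.items()
        if name.replace(" dimer", "") in MAJOR_ALCOHOLS
    )
    return major / total


shares = [major_share(i) for i in range(4)]
for stage, share in enumerate(shares, start=1):
    print(f"Stage {stage}: {100 * share:.1f}%")
```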

    Acids are mainly produced by yeast and lactic acid bacteria metabolism during fermentation and are considered an important part of the aroma of wine[22]. Only one acid (acetic acid) was detected in this experiment, fewer than previously reported, which may be related to differences in brewing processes. Acetic acid content is an important factor in the balance of aroma and taste of wine. Low contents of volatile acids can provide a mild acidic smell in wine, which is widely considered ideal for producing high-quality wines. However, levels above 700 μg/mL can produce a pungent odor and weaken the wine's distinctive flavor[25]. The content of acetic acid first increased and then decreased over the whole fermentation process. It increased rapidly in the second stage, from 719.91 to 3,914.55 μg/mL, reached a peak in the third stage (5,161.81 μg/mL), and decreased to 4,630.65 μg/mL in the last stage of fermentation. Excessive acetic acid in Marselan wine may have a negative impact on its aroma quality.

    The composition and content of aldehydes, which are produced mainly through the catabolism of amino acids or the decarboxylation of keto acids, also changed constantly during the fermentation of Marselan wines. Eight aldehydes, including furfural, hexanal, heptanal, 2-methylpropanal, 3-methylbutanal, dimethyl sulfide, (E)-2-hexenal, and (E)-2-pentenal, were identified in samples from all stages. Among them, furfural (caramel bread flavor) and hexanal (grass flavor) are the main aldehydes in Marselan wine, and their content increases slightly over the winemaking process. Other aldehydes, such as (E)-2-hexenal (green and fruity), 3-methylbutanal (fresh and malt), and 2-methylpropanal (fresh and malt), were decomposed during brewing, reducing their total content from 536.52 to 85.15 μg/mL, which might potentially affect the final flavor of the wine. Only two ketones, acetone and 3-hydroxy-2-butanone, were detected in the wine samples, and their contents had no significant difference in the fermentation process, which might not affect the flavor of the wine.

    To analyze the differences in volatile organic compounds between the brewing stages of Marselan wine more intuitively, principal component analysis was performed[26−28]. As presented in Fig. 3, the points corresponding to one sample group clustered closely on the score plot, while samples from different fermentation stages were well separated. PC1 (79%) and PC2 (18%) together explain 97% of the total variance between Marselan wine samples, indicating significant changes in volatile compounds during the brewing process. As can be seen in Fig. 3, samples of stages 1, 2, and 3 can be distinguished directly by PCA, suggesting that there are significant differences in aroma components among these three fermentation stages. However, stage 3 and stage 4 samples are not clearly separated and both fall in the same quadrant, which means that their volatile characteristics were highly similar and indicates that the volatile components of Marselan wine are largely formed by stage 3 (Fig. S1). These results show that unique aroma fingerprints of samples from the distinct brewing stages of Marselan wine were successfully constructed using the HS-GC-IMS method.

    Figure 3.  PCA based on the signal intensity obtained with different fermentation stages of Marselan wine.
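    The "percentage of variance explained" reported for PC1 and PC2 can be illustrated with a small, dependency-free sketch. It is not the LAV 'Dynamic PCA' implementation: it uses power iteration on the covariance matrix, and it is fed only the stage means of five signals from the GC-IMS concentration table rather than the replicate signal intensities, so its percentages are not expected to match the 79% and 18% in the text.

```python
import math


def pca_explained_variance(rows, n_components=2, n_iter=1000):
    """Fraction of total variance captured by each of the leading PCs.

    rows: list of samples, each a list of feature values.
    Uses power iteration with deflation on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total_variance = sum(cov[i][i] for i in range(p))

    ratios = []
    for _component in range(n_components):
        v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-zero start vector
        for _step in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
        # Rayleigh quotient gives the eigenvalue estimate for this direction.
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        ratios.append(eigenvalue / total_variance)
        # Deflate: remove the variance already explained by this component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return ratios


# Stage means (stages 1-4) of ethanol, acetic acid, ethyl acetate dimer,
# 1-hexanol, and methanol, transcribed from the GC-IMS concentration table.
stage_means = [
    [5337.84, 674.66, 1020.75, 1647.65, 662.08],
    [11324.05, 3602.39, 5432.71, 886.33, 76.94],
    [9910.20, 4536.02, 5052.99, 740.73, 61.92],
    [9936.10, 4092.30, 5084.47, 730.80, 62.89],
]
pc1, pc2 = pca_explained_variance(stage_means)
print(f"PC1: {100 * pc1:.1f}%, PC2: {100 * pc2:.1f}%")
```

    Because the features are left unscaled here, high-concentration signals such as ethanol dominate the components; scaling each feature before the analysis would be a reasonable alternative.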

    Based on the results of the PCA, OPLS-DA was used to eliminate the influence of uncontrollable variables on the data, and to quantify the differences between samples caused by characteristic flavors[28]. In Figure 4, the points representing flavor substances are colored according to their density, and the samples obtained at different fermentation stages show clear regional characteristics and good spatial separation. In addition, the reliability of the OPLS-DA model was verified by a 'Y-scrambling' permutation test. In this method, the values of the Y variable were randomly rearranged 200 times, and the OPLS-DA model was rebuilt and analyzed each time. In general, the values of R2 (y) and Q2 are used to assess the predictability and applicability of the model. The results of the reconstructed models show that the slopes of the R2 and Q2 regression lines were both greater than 0, and the intercept of the Q2 regression line was −0.535, which is less than 0 (Fig. 5). These results indicate that the OPLS-DA model is reliable and not overfitted, and that it can be used to distinguish the four brewing stages of Marselan wine.

    Figure 4.  Scores plot of OPLS-DA model of volatile components in Marselan wine at different fermentation stages.
    Figure 5.  Permutation test of OPLS-DA model of volatile components in Marselan wine at different fermentation stages (n = 200).
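
    The idea behind Y-scrambling can be sketched as follows: the response is shuffled many times, the model is refitted on each shuffled response, and the fit obtained on the real response is compared with the distribution of fits obtained by chance. This sketch deliberately swaps in ordinary least squares and R2 for OPLS-DA and Q2 so that it stays self-contained; the data are synthetic and the function names are hypothetical.

```python
import numpy as np

def r2_fit(X, y):
    """R2 of an ordinary least-squares fit with intercept (stand-in for OPLS-DA)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def y_scrambling(X, y, n_perm=200, seed=0):
    """Refit the model on n_perm random permutations of y."""
    rng = np.random.default_rng(seed)
    observed = r2_fit(X, y)
    permuted = np.array([r2_fit(X, rng.permutation(y)) for _ in range(n_perm)])
    # Empirical p-value: how often a scrambled response fits at least as well.
    p_value = (np.sum(permuted >= observed) + 1) / (n_perm + 1)
    return observed, permuted, p_value

# Synthetic data with a real relationship between X and y.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)

observed, permuted, p_value = y_scrambling(X, y, n_perm=200)
print(observed, permuted.mean(), p_value)
```

    When the scrambled fits are clearly worse than the observed fit, the model is unlikely to owe its performance to chance correlation, which is the same conclusion the text draws from the negative Q2 intercept.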

    The variable importance in projection (VIP) is the weight value of the OPLS-DA model variables; it was used to measure how strongly the accumulation difference of each component influences and explains the classification and discrimination of each group of samples. In previous studies, VIP > 1 has usually been used as the screening criterion for differential volatile substances[28−30]. In this study, a total of 22 volatile substances had VIP values above 1, indicating that these volatiles could serve as indicators of Marselan wine maturity during fermentation (see Fig. 6). These volatile compounds included furfural, ethyl lactate, heptanal, dimethyl sulfide, 1-propanol, ethyl isobutyrate, propyl acetate, isobutyl acetate, ethanol, ethyl hexanoate, acetic acid, methanol, ethyl formate, ethyl 3-methylbutanoate, ethyl acetate, hexanal, isoamyl acetate, 2-methylpropanal, 2-methyl-1-propanol, and three unknown compounds.

    Figure 6.  VIP plot of OPLS-DA model of volatile components in Marselan wine at different fermentation stages.

    This study focuses on the change of volatile flavor compounds and antioxidant activity in Marselan wine during different brewing stages. A total of 40 volatile aroma compounds were identified and collected at different stages of Marselan winemaking. The contents of volatile aroma substances varied greatly at different stages, among which alcohols and esters were the main odors in the fermentation stage. The proportion of furfural was small, but it has a big influence on the wine flavor, which can be used as one of the standards to measure wine flavor. Flavonoids and phenols were not only factors of flavor formation, but also important factors to improve the antioxidant capacity of Marselan wine. In this study, the aroma of Marselan wines in different fermentation stages was analyzed, and its unique aroma fingerprint was established, which can provide accurate and scientific judgment for the control of the fermentation process endpoint, and has certain guiding significance for improving the quality of Marselan wines (Table S1). In addition, this work will provide a new approach for the production management of Ningxia's special wine as well as the development of the native Chinese wine industry.

  • The authors confirm contribution to the paper as follows: study conception and design: Gong X, Fang L; data collection: Fang L, Li Y; analysis and interpretation of results: Qi N, Chen T; draft manuscript preparation: Fang L. All authors reviewed the results and approved the final version of the manuscript.

  • The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

  • This work was supported by the project of Hainan Province Science and Technology Special Fund (ZDYF2023XDNY031) and the Central Public-interest Scientific Institution Basal Research Fund for Chinese Academy of Tropical Agricultural Sciences in China (Grant No. 1630122022003).

  • The authors declare that they have no conflict of interest.

  • Supplemental Table S1 Sampling Information of 316 lotus accessions.
    Supplemental Table S2 Plastome size and GC content of different types of Asian lotus.
    Supplemental Table S3 Functional groups of CDS with variant events.
    Supplemental Table S4 Genetic clusters and haplotypes of lotus accessions.
    Supplemental Fig. S1 An example of some species-specific variants among the pan-plastome.
    Supplemental Fig. S2 Linear ML tree of 316 accessions.
    Supplemental Fig. S3 Linear BI tree of 316 accessions. Genetic clusters are colored as in the ML tree.
    Supplemental Fig. S4 Geographical distribution of wild Asian lotuses from different haplotypes. Blue, red, and green dotted boxes represented three distributive regions, namely, blue: northeastern China and North Korea; red: east central China; green: southern China and several Southeast Asian countries including India, Thailand, Indonesia, and Singapore.
    Supplemental Annotation.txt Annotation of plastome sequence of LA001 (Nelumbo lutea) in this study.
  • [1]

    Li H, Yi T, Gao L, Ma P, Zhang T, et al. 2019. Origin of angiosperms and the puzzle of the Jurassic gap. Nature Plants 5:461−70

    doi: 10.1038/s41477-019-0421-0


    [2]

    Li Y, Svetlana P, Yao J, Li C. 2014. A review on the taxonomic, evolutionary and phytogeographic studies of the lotus plant (Nelumbonaceae: Nelumbo). Acta Geologica Sinica 88:1252−61

    doi: 10.1111/1755-6724.12287


    [3]

    Zhang Y, Lu X, Zeng S, Huang X, Guo Z, et al. 2015. Nutritional composition, physiological functions and processing of lotus (Nelumbo nucifera Gaertn.) seeds: A review. Phytochemistry Reviews 14:321−34

    doi: 10.1007/s11101-015-9401-9


    [4]

    Zheng T, Li P, Li L, Zhang Q. 2021. Research advances in and prospects of ornamental plant genomics. Horticulture Research 8:65

    doi: 10.1038/s41438-021-00499-x


    [5]

    Xue J, Dong W, Cheng T, Zhou S. 2012. Nelumbonaceae: Systematic position and species diversification revealed by the complete chloroplast genome. Journal of Systematics and Evolution 50:477−87

    doi: 10.1111/j.1759-6831.2012.00224.x


    [6]

    Wu Z, Gui S, Quan Z, Pan L, Wang S, et al. 2014. A precise chloroplast genome of Nelumbo nucifera (Nelumbonaceae) evaluated with Sanger, Illumina MiSeq, and PacBio RS II sequencing platforms: Insight into the plastid evolution of basal eudicots. BMC Plant Biology 14:289

    doi: 10.1186/s12870-014-0289-0


    [7]

    Shi T, Rahmani RS, Gugger PF, Wang M, Li H, et al. 2020. Distinct expression and methylation patterns for genes with different fates following a single whole-genome duplication in flowering plants. Molecular Biology and Evolution 37:2394−413

    doi: 10.1093/molbev/msaa105


    [8]

    Guo HB, Li SM, Peng J, Ke WD. 2007. Genetic diversity of Nelumbo accessions revealed by RAPD. Genetic Resources and Crop Evolution 54:741−48

    doi: 10.1007/s10722-006-0025-1


    [9]

    Chen Y, Zhou R, Lin X, Wu K, Qian X, et al. 2008. ISSR analysis of genetic diversity in sacred lotus cultivars. Aquatic Botany 89:311−16

    doi: 10.1016/j.aquabot.2008.03.006


    [10]

    Hu J, Pan L, Liu H, Wang S, Wu Z, et al. 2012. Comparative analysis of genetic diversity in sacred lotus (Nelumbo nucifera Gaertn.) using AFLP and SSR markers. Molecular Biology Reports 39:3637−47

    doi: 10.1007/s11033-011-1138-y


    [11]

    Yang M, Xu L, Liu Y, Yang P. 2015. RNA-seq uncovers SNPs and alternative splicing events in Asian lotus (Nelumbo nucifera). PLoS One 10:e0125702

    doi: 10.1371/journal.pone.0125702


    [12]

    Huang L, Yang M, Li L, Li H, Yang D, et al. 2018. Whole genome re-sequencing reveals evolutionary patterns of sacred lotus (Nelumbo nucifera). Journal of Integrative Plant Biology 60:2−15

    doi: 10.1111/jipb.12606


    [13]

    Li Y, Zhu F, Zheng X, Hu M, Dong C, et al. 2020. Comparative population genomics reveals genetic divergence and selection in lotus, Nelumbo nucifera. BMC Genomics 21:146

    doi: 10.1186/s12864-019-6376-8


    [14]

    Liu Z, Zhu H, Zhou J, Jiang S, Wang Y, et al. 2020. Resequencing of 296 cultivated and wild lotus accessions unravels its evolution and breeding history. The Plant Journal 104:1673−84

    doi: 10.1111/tpj.15029


    [15]

    Fang K, Xia Z, Li H, Jiang X, Qin D, et al. 2021. Genome-wide association analysis identified molecular markers associated with important tea flavor-related metabolites. Horticulture Research 8:42

    doi: 10.1038/s41438-021-00477-3


    [16]

    Wu Z, Liao X, Zhang X, Tembrock LR, Broz A. 2020. Genomic architectural variation of plant mitochondria—A review of multichromosomal structuring. Journal of Systematics and Evolution 60:160−68

    doi: 10.1111/jse.12655


    [17]

    Biersma EM, Torres-Díaz C, Molina-Montenegro MA, Newsham KK, Vidal MA, et al. 2020. Multiple late-Pleistocene colonisation events of the Antarctic pearlwort Colobanthus quitensis (Caryophyllaceae) reveal the recent arrival of native Antarctic vascular flora. Journal of Biogeography 47:1663−73

    doi: 10.1111/jbi.13843


    [18]

    Peters RS, Meusemann K, Petersen M, Mayer C, Wilbrandt J, et al. 2014. The evolutionary history of holometabolous insects inferred from transcriptome-based phylogeny and comprehensive morphological data. BMC Evolutionary Biology 14:52

    doi: 10.1186/1471-2148-14-52


    [19]

    Kirschner P, Arthofer W, Pfeifenberger S, Záveská E, Schönswetter P, et al. 2021. Performance comparison of two reduced-representation based genome-wide marker-discovery strategies in a multi-taxon phylogeographic framework. Scientific Reports 11:3978

    doi: 10.1038/s41598-020-79778-x


    [20]

    Liu Y, Du H, Li P, Shen Y, Peng H, et al. 2020. Pan-genome of wild and cultivated soybeans. Cell 182:162−76.E13

    doi: 10.1016/j.cell.2020.05.023


    [21]

    Tao Y, Luo H, Xu J, Cruickshank A, Zhao X, et al. 2021. Extensive variation within the pan-genome of cultivated and wild sorghum. Nature Plants 7:766−73

    doi: 10.1038/s41477-021-00925-x


    [22]

    Song J, Guan Z, Hu J, Guo C, Yang Z, et al. 2020. Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nature Plants 6:34−45

    doi: 10.1038/s41477-019-0577-7


    [23]

    Magdy M, Ou L, Yu H, Chen R, Zhou Y, et al. 2019. Pan-plastome approach empowers the assessment of genetic variation in cultivated Capsicum species. Horticulture Research 6:108

    doi: 10.1038/s41438-019-0191-x


    [24]

    Wu Z, Gu C, Tembrock LR, Zhang D, Ge S. 2017. Characterization of the whole chloroplast genome of Chikusichloa mutica and its comparison with other rice tribe (Oryzeae) species. PLoS One 12:e0177553

    doi: 10.1371/journal.pone.0177553


    [25]

    Wicke S, Schneeweiss GM, dePamphilis CW, Müller KF, Quandt D. 2011. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Molecular Biology 76:273−97

    doi: 10.1007/s11103-011-9762-4


    [26]

    Cauz-Santos LA, da Costa ZP, Callot C, Cauet S, Zucchi MI, et al. 2020. A repertory of rearrangements and the loss of an inverted repeat region in Passiflora chloroplast genomes. Genome Biology and Evolution 12:1841−57

    doi: 10.1093/gbe/evaa155


    [27]

    Olmstead RG, Kim KJ, Jansen RK, Wagstaff SJ. 2000. The phylogeny of the Asteridae sensu lato based on chloroplast ndhF gene sequences. Molecular Phylogenetics and Evolution 16:96−112

    doi: 10.1006/mpev.1999.0769


    [28]

    Malinova I, Zupok A, Massouh A, Schöttler MA, Meyer EH, et al. 2021. Correction of frameshift mutations in the atpB gene by translational recoding in chloroplasts of Oenothera and tobacco. The Plant Cell 33:1682−705

    doi: 10.1093/plcell/koab050


    [29]

    Wu Z, Ge S. 2012. The phylogeny of the BEP clade in grasses revisited: evidence from the whole-genome sequences of chloroplasts. Molecular Phylogenetics and Evolution 62:573−78

    doi: 10.1016/j.ympev.2011.10.019


    [30]

    Gu C, Tembrock LR, Johnson NG, Simmons MP, Wu Z. 2016. The complete plastid genome of Lagerstroemia fauriei and loss of rpl2 intron from Lagerstroemia (Lythraceae). PLoS One 11:e0150752

    doi: 10.1371/journal.pone.0150752


    [31]

    Zhou J, Zhang S, Wang J, Shen H, Ai B, et al. 2021. Chloroplast genomes in Populus (Salicaceae): Comparisons from an intensively sampled genus reveal dynamic patterns of evolution. Scientific Reports 11:9471

    doi: 10.1038/s41598-021-88160-4


    [32]

    Andreu Sánchez S, Chen W, Stiller J, Zhang G. 2021. Multiple origins of a frameshift insertion in a mitochondrial gene in birds and turtles. GigaScience 10:giaa161

    doi: 10.1093/gigascience/giaa161


    [33]

    Avise JC. 2004. Molecular markers, natural history, and evolution (2nd edition). In The Auk, ed. Lovette IJ. 121:684. Sinauer Associates, Sunderland, Massachusetts. pp. 1298–99 https://doi.org/10.1093/auk/121.4.1298

    [34]

    Wang Z, Jiang Y, Bi H, Lu Z, Ma Y, et al. 2021. Hybrid speciation via inheritance of alternate alleles of parental isolating genes. Molecular Plant 14:208−22

    doi: 10.1016/j.molp.2020.11.008


    [35]

    Guo C, Guo Z, Li D. 2019. Phylogenomic analyses reveal intractable evolutionary history of a temperate bamboo genus (Poaceae: Bambusoideae). Plant Diversity 41:213−19

    doi: 10.1016/j.pld.2019.05.003


    [36]

    Choi JY, Purugganan MD. 2018. Multiple origin but single domestication led to Oryza sativa. G3 Genes|Genomes|Genetics 8:797−803

    doi: 10.1534/g3.117.300334


    [37]

    He W, Chen C, Xiang K, Wang J, Zheng P, et al. 2021. The history and diversity of rice domestication as resolved from 1464 complete plastid genomes. Frontiers in Plant Science 12:781793

    doi: 10.3389/fpls.2021.781793


    [38]

    Huang Y, Wang J, Yang Y, Fan C, Chen J. 2017. Phylogenomic analysis and dynamic evolution of chloroplast genomes in Salicaceae. Frontiers in Plant Science 8:1050

    doi: 10.3389/fpls.2017.01050


    [39]

    Scossa F, Fernie AR. 2021. When a crop goes back to the wild: Feralization. Trends in Plant Science 26:543−45

    doi: 10.1016/j.tplants.2021.02.002


    [40]

    Hall R, van Hattum MWA, Spakman W. 2008. Impact of India–Asia collision on SE Asia: The record in Borneo. Tectonophysics 451:366−89

    doi: 10.1016/j.tecto.2007.11.058


    [41]

    Royer AM, Waite-Himmelwright J, Smith CI. 2020. Strong selection against early generation hybrids in Joshua tree hybrid zone not explained by pollinators alone. Frontiers in Plant Science 11:640

    doi: 10.3389/fpls.2020.00640


    [42]

    Hübner S, Bercovich N, Todesco M, Mandel JR, Odenheimer J, et al. 2019. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nature Plants 5:54−62

    doi: 10.1038/s41477-018-0329-0


    [43]

    Keeling PJ, Palmer JD. 2008. Horizontal gene transfer in eukaryotic evolution. Nature Reviews Genetics 9:605−18

    doi: 10.1038/nrg2386


    [44]

    Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, et al. 2012. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19:455−77

    doi: 10.1089/cmb.2012.0021


    [45]

    Wick RR, Schultz MB, Zobel J, Holt KE. 2015. Bandage: Interactive visualization of de novo genome assemblies. Bioinformatics 31:3350−52

    doi: 10.1093/bioinformatics/btv383


    [46]

    Shen W, Le S, Li Y, Hu F. 2016. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One 11:e0163962

    doi: 10.1371/journal.pone.0163962


    [47]

    Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25:1754−60

    doi: 10.1093/bioinformatics/btp324


    [48]

    Ye J, McGinnis S, Madden TL. 2006. BLAST: Improvements for better sequence analysis. Nucleic Acids Research 34:W6−W9

    doi: 10.1093/nar/gkl164


    [49]

    Lehwark P, Greiner S. 2019. GB2sequin - A file converter preparing custom GenBank files for database submission. Genomics 111:759−61

    doi: 10.1016/j.ygeno.2018.05.003


    [50]

    Katoh K, Rozewicki J, Yamada KD. 2019. MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics 20:1160−66

    doi: 10.1093/bib/bbx108


    [51]

    Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, Guirao-Rico S, Librado P, et al. 2017. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Molecular Biology and Evolution 34:3299−302

    doi: 10.1093/molbev/msx248


    [52]

    Ginestet C. 2011. ggplot2: Elegant graphics for data analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society) 174:245−46

    doi: 10.1111/j.1467-985X.2010.00676_9.x


    [53]

    Leigh JW, Bryant D. 2015. POPART: full-feature software for haplotype network construction. Methods in Ecology and Evolution 6:1110−16

    doi: 10.1111/2041-210X.12410


    [54]

    Li Y, Chao T, Fan Y, Lou D, Wang G. 2019. Population genomics and morphological features underlying the adaptive evolution of the eastern honey bee (Apis cerana). BMC Genomics 20:869

    doi: 10.1186/s12864-019-6246-4


    [55]

    Nguyen LT, Schmidt HA, Von Haeseler A, Minh BQ. 2015. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32:268−74

    doi: 10.1093/molbev/msu300


    [56]

    Alexander DH, Lange K. 2011. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics 12:246

    doi: 10.1186/1471-2105-12-246


  • Cite this article

    Wang J, Liao X, Gu C, Xiang K, Wang J, et al. 2022. The Asian lotus (Nelumbo nucifera) pan-plastome: diversity and divergence in a living fossil grown for seed, rhizome, and aesthetics. Ornamental Plant Research 2: 2 doi: 10.48130/OPR-2022-0002


The Asian lotus (Nelumbo nucifera) pan-plastome: diversity and divergence in a living fossil grown for seed, rhizome, and aesthetics

Ornamental Plant Research 2, Article number: 2 (2022)

Abstract: The Asian lotus (Nelumbo nucifera) has a history of cultivation in Asia dating back over 3,000 years where it has been an important food crop producing edible rhizomes and seeds as well as flowers of great aesthetic and cultural value. Here, we de novo assembled the plastomes of 316 lotus accessions including five North American lotus (N. lutea) and 311 Asian lotus (N. nucifera) to construct a pan-plastome genome map, and investigate the phylogeography and genetic diversity among the only two extant species within this living fossil lineage. A total of 113 unique genes were annotated and plastome sizes varied between 163,457 and 163,672 bp with only minor differences in each of the four major genomic units. The most abundant nucleotide differences among plastomes were single nucleotide variants followed by insertions/deletions and block substitutions mainly found in intergenic spacer regions of the large single copy portion of the plastome. Seven well-supported genetic clusters were resolved using multiple different population structure analyses. The different lotus types (flower, seed, rhizome, or wild) were disproportionately assigned to multiple different genetic clusters. This pattern indicates that the domestication of Asian lotus involved multiple genetic origins and possible matrilineal introgression. Geographic mapping of accessions also revealed that genetic diversity is unevenly distributed with eastern China possessing the highest genetic diversity and regions such as Yunnan, Indonesia, and Thailand possessing unique haplotypes. These results provide an important maternal history of Nelumbo and necessary groundwork for future studies on intergenomic gene transfer, cytonuclear incompatibility, and conservation genetics.

    • Nelumbo Adans. (Proteales, Nelumbonaceae) is a genus of aquatic plant species with an estimated origin of 135 million years ago (mya)[1], making it one of the earliest diverging eudicot lineages. Given its phylogenetic position in the early eudicots and the morphological similarity of extant species to fossil taxa, Nelumbo is regarded as a living fossil. Two extant species, Nelumbo lutea (Willd.) Pers. (American lotus) and Nelumbo nucifera Gaertn. (Asian lotus), are recognized in this genus[2]. The Asian lotus (also referred to as sacred lotus and 莲 'lian' in Chinese) is distributed throughout Eastern Asia and northern Oceania in freshwater habitats. The cultivation of Asian lotus is thought to have begun more than 3,000 years ago for the production of edible seeds[3]. In addition to seeds, Asian lotus is also grown for the large edible rhizome it produces and as an ornamental in water gardens. Based on these different uses, Asian lotus growers and researchers categorize the plants into seed, rhizome, and flower types according to the morphological characteristics that best suit each of these applications[4]. In addition to the cultivated types, wild Asian lotus is common throughout east and southeast Asia in lakes and ponds.

      Because Asian lotus is an important food plant, extensive molecular work has been conducted to better understand the genetic diversity and history of this species. Examples include estimating the divergence time between the two Nelumbo species at 1.5 mya from complete plastome sequences[5,6] and the discovery of an ancient whole genome duplication unique to Nelumbo using high-quality nuclear genome assemblies[7]. Population structure and genetic diversity in Asian lotus have also been extensively studied using several different markers including random amplified polymorphic DNA (RAPD)[8], inter-simple sequence repeats (ISSR)[9], amplified fragment length polymorphisms (AFLP), simple sequence repeats (SSR)[10], single nucleotide polymorphisms (SNPs)[11], and whole-genome resequencing methods[12−14]. In these studies, higher genetic diversity was generally found among wild Asian lotus than among cultivated Asian lotus, and among the cultivated lineages, seed and rhizome types resolved in distinct clades. However, some conflicts among these studies remain unresolved. In particular, Huang et al.[12] determined that the seed lotuses were monophyletic with respect to wild and rhizome lotuses but with low bootstrap support, while Liu et al.[14] found that seed and flower lotuses possessed higher genetic diversity and were more often crossbred with each other than either was with rhizome lotuses. While many of the previous studies focused on patterns of genetic diversity, few have integrated geographic origin into their analyses, leaving gaps in our knowledge regarding centers of origin of the different Asian lotus types[14]. For example, wild Asian lotuses from Indonesia and their relation to cultivated types have not been properly characterized in previous studies. Given issues with incomplete lineage sorting associated with whole genome duplications[15], a pan-plastome approach can provide improved resolution regarding questions of population structure, centers of origin, and assignment of cultivated types to well-supported genetic clusters.

      Chloroplasts are the photosynthetic organelles of plant and algal cells (the term plastid refers to all membrane-bound organelles of the same origin that serve different metabolic functions, such as chloroplasts, chromoplasts, and leucoplasts). They originated from cyanobacteria through an ancient endosymbiotic event and contain a distinct streamlined genome primarily made up of photosynthesis- and replication-related genes[16]. Compared to the nuclear genome, the plastome is uniparentally inherited and nonrecombinant, which can provide a less noisy signal for inferring relatedness, especially in lineages with polyploidy, incomplete lineage sorting, and/or frequent introgression[17]. Most previous studies in phylogenomics have focused on the species level or above and often employ genomic simplification strategies such as using only the transcriptome[18] or reduced-representation/finely-filtered genomes[19] in the final analyses. As whole-genome sequencing and assembly have become more accurate and complete, large intraspecies collections of genomes (often referred to as the pan-genome) are now being published that include all or nearly all major nucleotide variants found in a given lineage across the entire genome. Such pan-genomes have been produced for important agronomic plant species such as Glycine soja[20], Sorghum bicolor[21], and Brassica napus[22]. Similarly, pan-plastomes are now being generated for several plant species, with the first such dataset involving 321 complete plastomes to differentiate pepper (Capsicum) cultivars and lineages[23]. As with other phylogenomic approaches, pan-plastomics has several advantages over nuclear pan-genomes, such as larger, more complete reference sets for assembly and comparison, a higher copy number in the cell resulting in greater read depth, and the absence of large duplicate gene arrays, which reduces problems associated with paralogy[24]. Therefore, we employed a pan-plastome approach to address several outstanding questions regarding Nelumbo cultivation and evolution. In addition, the dataset presented here is an important comparative resource for pan-plastome studies in other species, which are at present uncommon.

      Here, we assembled a large plastome data set including 316 (five N. lutea and 311 N. nucifera) complete circular plastomes to: (a) construct a reliable pan-plastome map for Nelumbo, (b) identify genomic patterns in the data set, such as mutational hotspots and characterization of different nucleotide variants, and (c) resolve well supported maternal lineages within Nelumbo and relate these to the different cultivated and wild types including the sister species N. lutea to address questions regarding origin and relatedness. For convenience, the names of the different lotus types and species used in this study were simplified as follows: N. lutea (North American lotus): LA; wild Asian lotus: LW; flower lotus: LF; seed lotus: LS; and rhizome lotus: LR (all cultivated types are from N. nucifera).

    • To characterize the plastome structure and genomic organization of N. lutea and N. nucifera, comprehensive comparisons were conducted with regard to genome size, tetrad length, GC content, and gene order and function (Fig. 1, Supplemental Table S2). All plastomes assembled as part of this study retained the typical quadripartite structure found in most chloroplast genomes, comprising a large single copy (LSC) region and a small single copy (SSC) region separated by a pair of inverted repeats (IRs). The plastome size across all 316 accessions varied from 163,457 to 163,672 bp (median, Med = 163,647 bp). Among the different plastome regions, the LSC size ranged from 91,746 to 91,914 bp (Med = 91,888 bp), the SSC from 19,605 to 19,639 bp (Med = 19,627 bp), and the IRs from 26,053 to 26,071 bp (Med = 26,066 bp). The total GC content (%) of the complete plastomes ranged from 37.95 to 38.00 (Med = 37.96), with 36.17 to 36.22 (Med = 36.19) for the LSCs, 32.22 to 32.34 (Med = 32.25) for the SSCs, and 43.18 to 43.22 (Med = 43.19) for the IRs. In all four regions and in total length, the N. lutea plastomes were shorter than the N. nucifera plastomes.
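
      The two summary statistics used above are simple to compute. The sketch below shows how GC content is obtained from a sequence and how the length of a quadripartite plastome follows from its regions (LSC + SSC + two IR copies). The function names are illustrative and are not taken from the study's pipeline; the example sequence is a toy string, while the three region lengths are the medians reported in the text.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt

def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: one LSC, one SSC, and two copies of the IR."""
    return lsc + ssc + 2 * ir

# Toy sequence: 4 of 8 bases are G or C.
print(gc_content("ATGCGCTA"))

# Median region lengths reported in the text (bp).
print(quadripartite_length(91_888, 19_627, 26_066))
```

      With the reported median region lengths, the sum is 163,647 bp, which matches the reported median plastome size.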

      Figure 1. 

      The pan-plastome of Nelumbo. Genes on the inside of the outer circle are transcribed counterclockwise while genes on the outside are transcribed clockwise; genes with introns are marked with an asterisk (ycf3, clpP, and rps12 contain two introns; all others contain one). The GC content is displayed as gray bars inside of the tetrad divisions (LSC, SSC, IRA, and IRB). SNVs, InDels, and block substitutions are represented within the GC content as orange, blue, and yellow lines, respectively. The complete genome length and the length of each of the four subregions (given as medians) are compared among the different Nelumbo types (LA, LW, LF, LS, and LR).

      A total of 113 unique genes were annotated and grouped into functional categories as follows: 79 protein-coding genes (PCGs), 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes. The annotated genes were located in the following genomic regions: 83 genes (61 PCGs and 22 tRNA genes) in the LSC, 12 (11 PCGs and one tRNA gene) in the SSC, and 18 (seven PCGs, seven tRNA genes, and all four rRNA genes) duplicated in the IRs. Among these genes, 18 (13 located in the LSC, one in the SSC, and four in the IRs) contained introns, including 12 PCGs (atpF, clpP, ndhA, ndhB, petB, petD, rpl2, rpl16, rpoC1, rps12, rps16, and ycf3) and six tRNA genes (trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC). Of the intron-containing genes, three PCGs (ycf3, clpP, and the trans-spliced rps12, whose first exon is located in the LSC and whose other two exons are in the IRs) contained two introns (Fig. 1; Supplemental Annotation). These results indicate a highly conserved genome structure and gene content across the Nelumbo pan-plastome.

    • To assess nucleotide variants, all plastomes were aligned and scanned for variants. The pan-plastome alignment length was 164,754 bp, in which 164,058 (99.58%) sites were conserved and 696 (0.42%) sites were variable, of which 577 sites were parsimony-informative and 45 were autapomorphic. Using LA001 as a reference, variants among the 316 plastomes were identified as SNVs, InDels, or block substitutions. Among these variants, SNVs were the most abundant (418, 60.06%), followed by InDels (208, 29.89%) and block substitutions (70, 10.06%). Variants were unevenly distributed throughout the pan-plastome (Table 1, Fig. 1). Most variants were located in the LSC (487, 69.97%), followed by the SSC (167, 23.99%), with the fewest in the highly conserved IRs (42, 6.03%). With regard to the location of variants relative to genes, intergenic spacer (IGS) regions contained the most variants (510, 73.28%), while CDS (124, 17.82%) and intronic (62, 8.91%) sequences contained far fewer (Fig. 2). A total of 499 fixed variants (294 SNVs, 151 InDels, and 54 block substitutions) were found that could distinguish American from Asian lotuses (Table 1, Supplemental Fig. S1). Nonsynonymous mutations were found most often in the PCGs accD, ccsA, cemA, matK, ndhE, ndhF, rpoC1, ycf1, and ycf2. When assessed by functional group, these nonsynonymous mutations were most abundant in the NADH dehydrogenase and RNA polymerase groups (Fig. 2, Supplemental Table S3). After excluding sites with gaps, a total of 162,648 sites were retained with 566 variable sites, among which 31 were autapomorphic and 535 were parsimony-informative, including 529 bi-allelic sites and six tri-allelic sites (i.e., site 16,914 in rpl33, site 18,026 in psaJ-trnP-UGG, site 40,576 in trnL-UAA-trnT-UGU, site 40,801 in trnT-UGU-rps4, site 90,747 in psbA, and site 138,254 in ycf1).

      Table 1.  Number of variants among 316 Nelumbo accessions. The number of variants found only in N. nucifera is indicated in parentheses.

      Variants             Total       Region                             Location
                                       LSC         SSC         IRA/B      CDS        Intron    IGS
      SNV                  418 (294)   274 (182)   112 (89)    16 (12)    117 (91)   33 (23)   268 (180)
      Block substitution   70 (54)     56 (42)     12 (11)     1 (0)      3 (3)      4 (4)     63 (46)
      InDel                208 (151)   157 (118)   43 (27)     4 (3)      4 (2)      25 (19)   179 (130)
      Total                696 (499)   487 (342)   167 (127)   21 (15)    124 (96)   62 (46)   510 (356)

      Figure 2. 

      Variant locations categorized by genic position (CDS, Introns, and IGS) and functional group.

    • To elucidate matrilineal relationships among lotuses, multiple approaches were employed to resolve population structure and calculate genetic diversity as it relates to the different species and cultivated types. Based on an SNV-only input matrix, population structure inferred using ADMIXTURE was determined from the lowest CV error of 0.026 at K = 7 (Fig. 3d). All N. lutea individuals resolved into a distinct genetic cluster with no cross-assignment to any of the N. nucifera genetic clusters (designated as genetic clusters I−VI, Fig. 3e). Individual assignment proportions (based on q-values) were high among nearly all Asian lotus individuals except for six accessions in genetic cluster III that had less than 35% assignment to genetic cluster I, two individuals with less than 30% assignment to genetic cluster VI, and 15 accessions in genetic cluster VI with less than 45% assignment to genetic cluster V. The other three methods, namely an ML tree using an SNV-only matrix, PCA using a complete plastome matrix, and a median-joining network using a gap-excluded matrix, all corroborated the population structure resolved in the ADMIXTURE analyses. For example, the close relationship between genetic clusters V and VI is resolved across methods by: (1) resolution as sister clades in the ML analyses, (2) multiple connecting nodes and few variants on branches separating V and VI in the median-joining network, (3) partial overlap between V and VI in the PCA graph, and (4) partial assignment of some individuals in VI to V in the ADMIXTURE analyses (Fig. 3a−c, 3e).

      Figure 3. 

      Population structure of Nelumbo. All analyses resolved similar membership into six genetic clusters (I−VI). (a) ML tree of 316 accessions. (b) Median-joining network with haplotype identifiers adjacent to nodes, size of pie chart proportional to number of accessions sharing the same haplotype, colors in the pie chart represent percentage of accessions from a given Asian lotus type. (c) PCA analysis including LA (upper left inset) and excluding LA showing the first two components in both cases. (d) CV errors across a range of K values from 3−9. (e) Population structure bar-plot at K = 7.

      The different Asian lotus types were unevenly represented among genetic clusters (Fig. 3a, Supplemental Fig. S2 & S3). Genetic cluster I contained 3 (21%) LF and 11 (79%) LW accessions; II contained 1 (3%) LF, 1 (3%) LR, 29 (91%) LS, and 1 (3%) LW; III contained 1 (3%) LS and 34 (97%) LW; IV contained 26 (61%) LF, 1 (2%) LR, 15 (35%) LS, and 1 (2%) LW; V contained 7 (18%) LF, 4 (11%) LR, 10 (26%) LS, and 17 (45%) LW; and VI contained 2 (1%) LF, 148 (89%) LR, and 16 (10%) LW. Furthermore, rhizome lotuses were found in four of six genetic clusters with 12 haplotypes, flower lotuses in five of six genetic clusters with nine haplotypes, and wild lotuses in all six genetic clusters with 28 haplotypes.

      From the median-joining network, 42 haplotypes were designated among the 316 accessions examined (h1−h42). The four haplotypes (h1−h4) from N. lutea were notable for the large number of nucleotide variants separating haplotypes within the species and the even larger number of variants separating these haplotypes from N. nucifera, resulting in high levels of genetic and haplotypic diversity (π = 2.02e-04, Hd = 0.900) for the LA genetic cluster. Nucleotide and haplotype diversity among the six genetic clusters within Asian lotus varied widely (Fig. 4). Genetic cluster I contained 2 haplotypes with π = 8.7300e-07 and Hd = 0.143; II, 5 haplotypes with π = 3.7673e-05 and Hd = 0.738; III, 9 haplotypes with π = 1.5533e-04 and Hd = 0.882; IV, 2 haplotypes with π = 5.6860e-07 and Hd = 0.092; V, 10 haplotypes with π = 3.7673e-05 and Hd = 0.738; and VI, 10 haplotypes with π = 8.8611e-05 and Hd = 0.533.
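      For readers unfamiliar with the two diversity statistics, the following Python sketch shows one standard way to compute them: haplotype diversity as Hd = n/(n − 1) × (1 − Σp²) and nucleotide diversity as the mean number of pairwise differences per site. It is an illustration, not the DnaSP implementation used in this study. The function names are hypothetical, the sequences are toy data, and the 13:1 split of the 14 accessions of genetic cluster I between its two haplotypes is an assumption rather than a value reported in the text.

```python
from itertools import combinations

def haplotype_diversity(haplotype_counts):
    """Haplotype diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = sum(haplotype_counts)
    if n < 2:
        raise ValueError("at least two sequences are required")
    sum_sq = sum((count / n) ** 2 for count in haplotype_counts)
    return n / (n - 1) * (1 - sum_sq)

def nucleotide_diversity(seqs):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_diffs / (len(pairs) * length)

# Assumed split: 13 accessions with one haplotype, 1 with the other.
hd_two_haplotypes = haplotype_diversity([13, 1])

# Toy alignment of three sequences, four sites each (hypothetical data).
pi_toy = nucleotide_diversity(["AAAA", "AAAT", "AATT"])

print(round(hd_two_haplotypes, 3), round(pi_toy, 4))
```

      Under the assumed 13:1 split, the sketch gives Hd of about 0.143, which illustrates how a cluster dominated by a single haplotype yields a low value.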

      Figure 4. 

      Genetic diversity and differentiation of six genetic clusters of Asian lotus. Numbers above lines connecting two bubbles represent pairwise Fst calculated between respective genetic clusters.

      As with genetic diversity, divergence patterns varied between genetic clusters. Genetic cluster III had the lowest level of divergence when compared to all other genetic clusters (III to I Fst = 0.823; III to II Fst = 0.701; III to IV Fst = 0.670; III to V Fst = 0.553; and III to VI Fst = 0.657). By contrast genetic cluster I had higher levels of divergence when compared to the other Asian lotus genetic clusters (I to II Fst = 0.921; I to III Fst = 0.823; I to IV Fst = 0.998; I to V Fst = 0.971; and I to VI Fst = 0.985). The typical levels of divergence between genetic clusters are more similar to that found between genetic cluster I and others, with genetic cluster III being an outlier in regard to the lower levels of divergence to all other genetic clusters.
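      The pairwise Fst values can be read as the share of sequence differences that lies between two clusters rather than within them. A minimal sketch of one common sequence-based estimator, Fst = 1 − Hw/Hb, is given below, where Hw is the mean number of pairwise differences within clusters and Hb the mean number between clusters. Whether DnaSP computes exactly this quantity with its default options is an assumption; the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product

def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def mean_within(pop):
    """Mean pairwise differences among sequences of one population."""
    pairs = list(combinations(pop, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

def mean_between(pop1, pop2):
    """Mean pairwise differences between sequences of two populations."""
    pairs = list(product(pop1, pop2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

def fst(pop1, pop2):
    """Fst = 1 - Hw / Hb, with Hw averaged over the two populations."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2
    hb = mean_between(pop1, pop2)
    if hb == 0:
        raise ValueError("populations are identical; Fst is undefined")
    return 1 - hw / hb

# Two toy clusters that are nearly fixed for different bases (hypothetical).
cluster_a = ["AAAA", "AAAA", "AAAT"]
cluster_b = ["TTTT", "TTTT", "TTTA"]
fst_toy = fst(cluster_a, cluster_b)
print(round(fst_toy, 4))
```

      Values near 1, as reported for most comparisons involving genetic cluster I, arise when almost all differences separate the clusters and very few occur within them.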

    • To resolve patterns related to the origin and dispersal of Asian lotuses, individuals were mapped by genetic cluster and type (Fig. 5, Supplemental Tables S1 & S4). The Asian lotus collections used in this study were made from three main areas of historical cultivation: (1) northeastern China and North Korea; (2) east central China; and (3) southern China and several Southeast Asian countries including India, Thailand, Indonesia, and Singapore (Fig. 5). All wild lotuses in genetic cluster I were collected from Yunnan province, as were all flower types in this cluster except for a single flower type collected in east central China. Most seed lotus accessions (28/32) in genetic cluster II were from central China, with a single wild accession collected from northern Thailand, a single flower type from Japan, and a single rhizome type collected from central eastern China (Fig. 5). Accessions assigned to genetic cluster III were broadly arrayed throughout Southeast Asia in India, Indonesia, Singapore, and Thailand, as well as in eastern China. Within genetic cluster III, several haplotypes were found to have geographically narrow distributions, such as h25 and h28 in China, h32 and h34 in India, h41 and h42 in Indonesia, h32 in Thailand, and h35 in Singapore. Nearly all of the accessions assigned to genetic cluster IV were collected from China (with a single sample from Japan), were mainly of haplotype h6 (42/43), and were mostly of seed or flower type (26/43 flower and 15/43 seed). Accessions from genetic clusters V and VI were all collected from across eastern China except for a single accession from North Korea. Haplotypes in genetic cluster V were widespread across China except for h27, h30, h31, h37, h38, h39, and h40, which were each collected only once. All of these uncommon haplotypes in genetic cluster V were wild type lotuses. In genetic cluster VI, all accessions from the northeasternmost provinces of Heilongjiang and Jilin were wild types. Most haplotypes in genetic cluster VI were geographically widespread except for h15, h16, h18, h26, and h29, which were uncommon and collected only in the provinces of Jiangsu, Guizhou, Shandong, Heilongjiang, and Anhui, respectively.

      Figure 5. 

      Geographic distribution of Asian lotus collections used in this study separated by genetic cluster and color coded by type (see Fig. 3).

    • Plastomes are highly conserved in most land plants in terms of size, structure, and gene content, and lotuses are no exception[25]. The pan-plastome resolved here indicates that gene content and order were highly conserved and consistent with those previously described by Xue et al.[5] and Wu et al.[6]. Despite structural and genic conservation, abundant nucleotide variants were found across the Nelumbo pan-plastome. Relatively few variants were detected in cds regions, save ycf1, which was extraordinarily rich in nucleotide variants, possibly because of its position spanning the junction of IRA and SSC (Figs 1 & 2). This is similar to what has been previously reported for junction-spanning genes in Passiflora trichocarpa[26]. Following ycf1 in number of cds variants were rpoC1 and ndhF, which have also been found to contain more nucleotide variants than other functional plastome genes, hence their use in phylogenetic studies of eudicot groups like Apioideae, Cactoideae, and Asteridae[27]. While mutations (especially InDels) in cds regions are expected to result in a loss of function in translated proteins, ribosomal frameshifting in plastomes has been shown to recover original functions from cds containing mutations[28]. As such, mutations in some plastome cds regions may have less of an impact than predicted. In contrast to the relatively limited number of cds variants, nucleotide variants in IGS and introns were considerably more abundant and more varied in type. For instance, block substitutions in cds regions are found only in the ycf1 and rpoA genes, whereas they are relatively abundant in IGS and intronic regions (although less so in introns). Specifically, the large number of variants found in the Nelumbo rpl16 intron is similar to that found in the distantly related plant families Crypteroniaceae and Poaceae, where this hypervariable region was employed in phylogenetic studies[29,30]. Similarly, InDels are more frequent outside of cds regions. This pattern of variant abundance and type in the Nelumbo pan-plastome follows that found in other plastomes[31]. Because frameshift mutations result in greater disruption to protein structure and function, they are often purged via selection[32]. As noted above, however, frameshift correction through translational recoding has recently been described in chloroplasts[28], which may render some cds mutations less impactful by retaining original protein function across lineages despite underlying differences in DNA. As more pan-plastome studies are completed, it is becoming increasingly apparent that genomic regions previously used for higher-level systematic studies, such as rbcL and matK, should be supplemented with hypervariable IGS and intronic regions for improved resolution in intraspecific studies. In this study, the IGS regions rps18-rpl33 and trnQ-UUG-rps16 proved especially rich in informative markers. Our pan-plastome study, like those of the cultivated species Brassica napus and Sorghum sp.[21,22], found that SNVs are by far the most common variant type. However, one of the outstanding questions is whether variants differ in type and effect (with regard to gene function) between domesticated lineages and wild progenitors. With the completion of more pan-plastome studies from diverse cultivated taxa, patterns specific to domesticated lineages can now be resolved to better understand the function and importance of plastids in the domestication syndrome specifically, and in plant evolutionary biology more generally.

    • Plant population structure and genetic diversity are known to be affected by a number of different processes, including genetic drift, reproductive isolation, local adaptation, demographic fluctuations, and mode of reproduction, as well as by artificial selection and human translocation associated with the domestication process[33]. Such patterns are evident in this pan-plastome study of Nelumbo, wherein geographic separation between N. nucifera and N. lutea is reflected in the numerous fixed genetic differences between these species (Fig. 3). Analyses based on both nuclear data[13,14] and the plastomic dataset presented here support a clear divergence between the two Nelumbo species. Within N. nucifera, however, the phylogeny of genetic clusters differed between the datasets. These differences are mainly attributable to conflicts between maternal and paternal inheritance (nucleocytoplasmic conflicts), which are commonly seen in many other species[34,35] and are themselves important evidence of hybridization or introgression. For example, most seed lotus accessions were resolved as monophyletic in the two previous studies but were split between genetic clusters II and IV here. Differences in sampling between the two previous studies, and the limited genetic information carried by the plastome compared with the nuclear loci controlling the morphological traits used to designate lotus types, could also contribute to these differences (e.g., the co-occurrence of seed and flower lotuses in genetic cluster IV), although probably to a lesser extent. The limited information carried by the plastome was also reflected in the much lower genetic diversity of each genetic cluster compared with that reported by Li et al.[13] and Liu et al.[14]. Genetic clusters II and III showed much higher genetic diversity than the others, in agreement with the nuclear analyses of seed and wild types in Liu et al.[14], whereas genetic cluster VI (rhizome type) showed relatively lower genetic diversity.

      Within N. nucifera, six well-supported genetic clusters were resolved with notable differences in genetic and haplotypic diversity as well as in the cultivated types found in each (Fig. 3). For instance, genetic cluster III is characterized by a large number of haplotypes, each separated by many genetic differences, with few accessions per haplotype. In addition, genetic cluster III is made up entirely of wild accessions except for a single seed type (LS036, h22). One possible interpretation of this pattern is that genetic cluster III represents a wild lineage from which few cultivated types have been selected. This interpretation is further supported by the observation that the patterns resolved in genetic cluster III are similar to those found in wild N. lutea, although more sampling in N. lutea is needed to confirm this pattern. No cultivated type (flower, rhizome, or seed lotus) was found solely within a single genetic cluster, which suggests that none of the types had a single origin (i.e., was cultivated from a single wild population). This implies multiple origins for all types and/or maternal introgression into cultivated types from different origins, a situation comparable to that documented for cultivated rice[36,37]. Types can be further resolved by haplotype, although several types are sometimes found within a single haplotype. For instance, the largest haplotype, h7 (genetic cluster VI), with 109 accessions, is made up of 1% flower, 93% rhizome, and 6% wild types. Furthermore, wild lotuses in genetic clusters I and III have relatively narrow distributions, whereas cultivated types in genetic clusters V and VI occupy expanded ranges, indicating that the range of cultivated lotus has gradually expanded through human activity. Genetic diversity also showed a decreasing trend from wild to cultivated types (genetic clusters III to V to VI), which may also be a signal of human domestication. Cultivated types may have been selected from multiple origins or may have experienced maternal introgression, either of which could result in a polyphyletic pattern among the cultivated types. For instance, rhizome lotuses were found in four of six genetic clusters with 12 haplotypes, flower lotuses in five of six genetic clusters with nine haplotypes, and wild lotuses in all six genetic clusters with 28 haplotypes (Fig. 3a, b, & Fig. 5). Given this pattern in our plastomic data, a monophyletic origin for seed lotuses is not supported[12]; however, because seed lotuses were found in four of six genetic clusters with eight unique haplotypes, claims regarding their diversity are supported by our data. On this basis, wild type lotuses contain the highest level of genetic diversity, with cultivated types also exhibiting high levels of plastomic diversity. It should also be noted that the classification of cultivated types was based on their primary use, and some types also have traits that make them usable for other purposes; some designations may therefore be tenuous and produce conflicting results when assessing monophyly. An important step in understanding the evolution of cultivated lotuses would be to analyze the nuclear genes involved in lotus domestication[13,38] in concert with the pan-plastome data to better understand whether plastome divergence is concordant with patterns of artificial selection detected in the nuclear genome. Such findings may help to elucidate patterns of introgression in the domestication of lotus and how plastomes might have been involved in controlling the directionality of crosses through cytonuclear incompatibility.

      With regard to the geographic origins of cultivated lotuses, several inferences can be made. Genetic cluster I has a probable origin in Yunnan province, as all wild accessions were collected there, and this genetic cluster has been the matrilineal source for a very small number of flower type cultivars (two flower types from genetic cluster I were collected in this study). Of all the genetic clusters, genetic cluster I is the most geographically restricted and the least used as a source of lotus cultivars. The only wild accession in genetic cluster II was from Chiang Mai, Thailand, suggesting that this may be the origin of the many seed types collected from this genetic cluster in central eastern China (Fig. 5). However, alternative inferences include matrilineal gene flow into wild Thai populations or the possibility that the Thai accession is an escaped cultivar[39]. Given that higher genetic diversity is present in China within genetic cluster II, the alternative inferences cannot be ruled out, as centers of origin can also be centers of genetic diversity. Genetic cluster III, like I, is made up primarily of wild type accessions, but unlike genetic cluster I, III is geographically distributed throughout Southeast Asia and eastern China. Additionally, haplotypes within genetic cluster III are restricted to a given geographic location. As such, genetic cluster III may represent a lineage that dispersed broadly in the distant past and thereafter, through adaptation and drift, has produced localized haplotypes. The geography of island and peninsula formation in the Sunda Shelf over the last 50 million years may have helped drive this pattern[40]. Genetic clusters IV, V, and VI all appear to originate in eastern China, as no wild accessions were found outside this geographic area. In genetic cluster IV, a single wild accession from Yunnan shares the h6 haplotype with 98% of the mostly flower and seed type accessions in this genetic cluster. The low nucleotide and haplotype diversity of matrilineal genetic cluster IV runs counter to findings from nuclear data, in which high levels of admixture have been noted among seed and flower lotuses. That said, it is possible for a matrilineal bottleneck induced by cytoplasmic incompatibility to occur within a lineage while a highly diverse and admixed nuclear genome is maintained over time[41]. In addition, flower and seed types are found in nearly all of the genetic clusters (although in genetic cluster VI only two of 166 accessions are flower types and none are seed types), suggesting high levels of maternal introgression among flower and seed types across genetic clusters. Genetic cluster VI is clearly the source of most rhizome type lotuses, and because the plant part selected for is unrelated to sexual reproduction, a few very common haplotypes (resulting from asexual propagation via rhizome cuttings to plant fields) account for nearly all rhizome types in this genetic cluster. Although most accessions in genetic cluster VI belong to only a few haplotypes, numerous wild haplotypes were also assigned to this cluster, some with restricted geographic distributions (Supplemental Fig. S4). This suggests that a good deal of wild diversity remains throughout eastern China and especially in the northeastern region.

      The diversity and extent of aquatic plant domestication for human consumption on the eastern coastal plain of China are unsurpassed elsewhere. Lotus, because of the many parts of the plant that can be used for human consumption and the health benefits of eating these parts, has been and will continue to be an important food crop for humans. As with any crop, genetic diversity is essential to maintain high levels of nutrition, disease resistance, and yield, and to improve or develop traits of interest[42]. Wild populations of Asian lotus are known to be threatened by human development and environmental pollution[13], making the characterization and mapping of genetic diversity all the more important in prioritizing conservation efforts. Our study has shown that cultivated and wild Asian lotuses are divided into at least six maternal lineages, with geographic distribution and selection of lotus types differing between genetic clusters. From these results, several regions in China (namely Yunnan and the northeast) as well as regions in Southeast Asia should be explored further to more fully characterize the unique genetic diversity of lotuses from these areas. In addition, these wild haplotypes should be assessed for their potential use in developing new lotus cultivars. The experimental breeding of diverse lotuses may also provide useful insights into cytonuclear incompatibility and further our understanding of genomic evolution in this living fossil lineage. In summary, the pan-plastome resources presented here for lotus will provide new insights into the natural and domestication history of this lineage and should prove useful in applied studies such as marker-assisted breeding or the development of transplastomic lines for improved yield or disease resistance.

    • All plastomes were assembled de novo into complete circular molecules rather than by calling SNVs against a reference genome, because de novo assembly allows intergenomic transfer sequences (e.g., horizontal gene transfer from the chloroplast to the nuclear genome) to be assessed and removed[43]. From the total of 365 accessions provided in Li et al.[13] and Liu et al.[14], 49 accessions (one LA, 17 LW, 11 LF, four LS, and 16 LR) had to be discarded because complete plastomes could not be assembled due to lower sequencing quality in these samples. In the final set, a total of 316 plastomes (five LA, 63 LW, 39 LF, 49 LS, and 160 LR) were successfully assembled for use in downstream analyses. Except for N. lutea sampled from North America, the N. nucifera accessions were broadly collected across China and Southeast Asia (Supplemental Table S1). Based on Illumina next-generation sequencing (NGS) reads from whole-genome sequencing (WGS), de novo assembly of all plastomes was completed using SPAdes 3.14[44], from which a graph of major contigs was used to generate a circular molecule in Bandage 0.8.1[45]. The clean WGS reads were first randomly subsampled using the 'sample' function in SeqKit 0.13.1[46] to generate datasets of approximately 6–8 Gb, and were then aligned against two published congeneric plastomes (NC_015605 and NC_025339) to isolate plastomic reads using the BWA-MEM algorithm in bwa 0.7.17[47] with default settings. Filtered reads were used for de novo assembly in SPAdes 3.15.2 with five k-mer sizes (51, 71, 91, 101, and 121), and the resulting assemblies were combined automatically in SPAdes 3.15.2 following the settings provided in He et al.[37]. Bandage 0.8.1 was used to obtain the final circular molecule for each accession. The average assembly depth of each accession surpassed 100×. All plastome sequences were manually adjusted to start with the first base of the LSC region using Blastn 2.9.0[48] against the sequence itself.

      The length and GC content of the LSC, SSC, and IR of each plastome were calculated using a Perl script, and the results were processed in IBM SPSS Statistics 22 (SPSS Inc., Chicago, USA). Accessions LA001 (N. lutea) and LF001 (N. nucifera, flower lotus) were annotated as exemplars of the two Nelumbo species, respectively, using GB2sequin[49], with published plastomes of N. lutea (NC_015605) and N. nucifera (NC_025339) from NCBI used as comparative references to check the accuracy of the de novo assemblies.
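      Two of the bookkeeping steps described above, re-indexing a circular plastome so that it begins at the first base of the LSC and calculating GC content, can be sketched in a few lines of Python. This is an illustration only: the authors located the LSC start with a Blastn self-alignment and used a Perl script for GC content, whereas the sketch assumes the start coordinate is already known. The function names, the toy sequence, and the value of lsc_start are hypothetical.

```python
def rotate_circular(seq, start):
    """Rotate a circular sequence so that 0-based index `start` comes first."""
    start %= len(seq)
    return seq[start:] + seq[:start]

def gc_content(seq):
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# Toy circular sequence; assume the LSC begins at 0-based index 2.
toy_plastome = "TTGGCCAA"
lsc_start = 2
rotated = rotate_circular(toy_plastome, lsc_start)
print(rotated, gc_content(rotated))
```

      Because rotation only changes the starting coordinate of a circular molecule, total length and overall GC content are unaffected by this step.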

    • Assembled plastome sequences were aligned in MAFFT 7[50] using the default settings. All aligned plastomes were manually scanned, with variants detected by DnaSP 6[51] using LA001 as the reference. Nucleotide variants were classified into SNVs, block substitutions (consecutive nucleotide substitutions greater than 1 nucleotide in length, which in some cases include gaps), and InDels (insertions or deletions one nucleotide in length). The position of each variant was mapped to the reference to characterize its location in the genome as cds, intronic, or intergenic. The stacked graph was plotted using the ggplot2 package in R v 4.0[52].
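      The three variant classes can be illustrated with a simplified Python sketch that compares one aligned query against the reference. It is not the DnaSP procedure used in this study. It assumes that consecutive differing alignment columns form a single variant, that a run made up only of gap columns is an InDel regardless of its length, that a single mismatched column is an SNV, and that any longer run containing at least one substitution is a block substitution. The function name and both sequences are hypothetical.

```python
def classify_variants(ref, qry):
    """Label runs of differing columns between two aligned sequences.

    Returns (start_column, run_length, kind) tuples with 0-based columns.
    """
    variants = []
    run_start = None
    # Iterate one position past the end so a trailing run is closed.
    for i in range(len(ref) + 1):
        differs = i < len(ref) and ref[i] != qry[i]
        if differs and run_start is None:
            run_start = i
        elif not differs and run_start is not None:
            columns = list(zip(ref[run_start:i], qry[run_start:i]))
            if all("-" in pair for pair in columns):
                kind = "InDel"
            elif len(columns) == 1:
                kind = "SNV"
            else:
                kind = "Block substitution"
            variants.append((run_start, i - run_start, kind))
            run_start = None
    return variants

# Toy pairwise alignment (hypothetical data).
reference = "ACGTACGTAC--GT"
query     = "ATGTCAGTACGGG-"
calls = classify_variants(reference, query)
for call in calls:
    print(call)
```

      In the toy alignment, column 1 is an SNV, columns 4 and 5 form a block substitution, and the two gap runs are reported as InDels.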

    • Based on the complete aligned plastome sequences, a median-joining network was constructed in Popart 1.7[53] to resolve haplotype diversity. The principal component analysis (PCA) of the two datasets (including or excluding LA) was performed in TASSEL 5.2[54], using the two eigenvectors with the highest percentage of variance explained for plotting in two dimensions. Using an SNV-only matrix extracted from the aligned plastomes (the alignment file of SNVs from 316 accessions is available at Figshare, https://doi.org/10.6084/m9.figshare.17694764.v2), IQ-tree 2.1[55] was used to reconstruct a maximum likelihood (ML) tree using a TVMe+ASC+R2 nucleotide substitution model chosen by the Bayesian information criterion (BIC), with 1,000 bootstrap replicates to assess branch support. Population structure analysis was performed using ADMIXTURE 1.3[56] with default settings for haploid data and runs at K values from 3−9. The optimal K was chosen as the one with the lowest cross-validation (CV) error across all values of K. Nucleotide diversity (π), haplotype diversity (Hd), and genetic differentiation (Fst) were calculated in DnaSP 6 to assess the genetic diversity and divergence within and among different genetic clusters. Source information for each accession was collected from previous publications and plotted using the ggplot and map packages in R v 4.0.
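      The rule for choosing K can be sketched in Python as follows. The sketch assumes that each ADMIXTURE run with cross-validation enabled writes a log line of the form "CV error (K=7): 0.026" and that these lines have been gathered into one text. Only the value 0.026 at K = 7 comes from this study; every other CV error below is hypothetical, as is the function name best_k.

```python
import re

# Pattern for the assumed log line format, e.g. "CV error (K=7): 0.026".
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def best_k(log_text):
    """Return the K with the lowest CV error and the full K-to-error map."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get), errors

# Only the K=7 value is reported in the text; the rest are placeholders.
example_log = """
CV error (K=3): 0.081
CV error (K=4): 0.064
CV error (K=5): 0.047
CV error (K=6): 0.035
CV error (K=7): 0.026
CV error (K=8): 0.029
CV error (K=9): 0.031
"""
chosen_k, cv_errors = best_k(example_log)
print(chosen_k, cv_errors[chosen_k])
```

      With these placeholder values the minimum falls at K = 7, matching the choice reported in the results.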

      • This work was supported by the National Natural Science Foundation of China (Grant No. 31970244), also co-funded by the Training of Excellent Science and Technology Innovation talents in Shenzhen - Basic Research on Outstanding Youth (STIC: RCYX20200714114538196) and the Zhejiang Provincial Natural Science Foundation of China [LY21C160001].

      • The authors declare that they have no conflict of interest.


      • Supplemental Table S1 Sampling Information of 316 lotus accessions.
      • Supplemental Table S2 Plastome size and GC content of different types of Asian lotus.
      • Supplemental Table S3 Functional groups of CDS with variant events.
      • Supplemental Table S4 Genetic clusters and haplotypes of lotus accessions.
      • Supplemental Fig. S1 An example of some species-specific variants among the pan-plastome.
      • Supplemental Fig. S2 Linear ML tree of 316 accessions.
      • Supplemental Fig. S3 Linear BI tree of 316 accessions. Genetic clusters were colored as in the ML tree.
      • Supplemental Fig. S4 Geographical distribution of wild Asian lotuses from different haplotypes. Blue, red, and green dotted boxes represented three distributive regions, namely, blue: northeastern China and North Korea; red: east central China; green: southern China and several Southeast Asian countries including India, Thailand, Indonesia, and Singapore.
      • Supplemental Annotation.txt Annotation of plastome sequence of LA001 (Nelumbo lutea) in this study.
      • Copyright: © 2022 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
  • About this article
    Cite this article
    Wang J, Liao X, Gu C, Xiang K, Wang J, et al. 2022. The Asian lotus (Nelumbo nucifera) pan-plastome: diversity and divergence in a living fossil grown for seed, rhizome, and aesthetics. Ornamental Plant Research 2: 2 doi: 10.48130/OPR-2022-0002