2024 Volume 3
ARTICLE   Open Access    

Comparative chloroplast genome analysis of Camellia oleifera and C. meiocarpa: phylogenetic relationships, sequence variation and polymorphic markers

  • # Authors contributed equally: Heng Liang, Huasha Qi

  • Received: 21 March 2024
    Revised: 22 April 2024
    Accepted: 26 April 2024
    Published online: 24 July 2024
    Tropical Plants 3 Article number: e023 (2024)
  • Compared with C. oleifera (HZP), the chloroplast genome of C. meiocarpa differed in length by 460 bp (CKX) to 490 bp (XG).

    C. meiocarpa is supported as a separate species.

    The 17 newly developed polymorphic primer pairs can be used to assess Camellia germplasm resources.

  • Tea-oil Camellia, a prominent woody oil crop, is a crucial source of edible oil, protein feed, and industrial raw materials. Notably, C. oleifera and C. meiocarpa have higher oil yields and larger cultivation areas than other tea-oil Camellia species. However, the taxonomy and phylogenetic relationship of these species remain elusive, complicating their commercial application. Here, we sequenced and analyzed the complete chloroplast genomes of these two species, compared them with those of related Camellia species, and developed chloroplast DNA markers to distinguish between them. The chloroplast genome of C. oleifera was 157,009 bp (HZP) in length, and that of C. meiocarpa was 156,549 bp (CKX) and 156,512 bp (XG). Comparative analysis indicated greater chloroplast genome differences between HZP and CKX (or XG) than between CKX and XG. Likewise, repetitive sequences and interspecific variations differed less in number and distribution between CKX and XG than between either of them and HZP. Phylogenetic analysis showed that C. meiocarpa was not closely related to C. oleifera. A total of 56 primer pairs were developed to test polymorphism among the accessions. After PCR and sequencing verification, variations were detected in the target sequences of 17 primer pairs. The chloroplast genome data and the newly developed markers are invaluable for understanding phylogenetic relationships and assessing the genetic diversity of tea-oil Camellia germplasm resources.
  • Aquaporins (AQPs) constitute a large family of transmembrane channel proteins that regulate intracellular and intercellular water flow[1,2]. Since their discovery in the 1990s, AQPs have been found not only in all three domains of life, i.e., bacteria, eukaryotes, and archaea, but also in viruses[3,4]. Each AQP monomer is composed of an internal tandem repeat of three transmembrane helices (six in total, i.e., TM1–TM6) as well as two half helices, formed by loops B (LB) and E (LE), that dip into the membrane[5]. The dual Asn-Pro-Ala (NPA) motifs located at the N-termini of the two half helices form a barrier in the pore that excludes protons via electrostatic repulsion, whereas the so-called aromatic/arginine (ar/R) selectivity filter (i.e., H2, H5, LE1, and LE2) determines substrate specificity by varying both the size and the hydrophobicity of the pore constriction site[5–9]. Based on sequence similarity, AQPs in higher plants can be divided into five subfamilies, i.e., plasma membrane intrinsic protein (PIP), tonoplast intrinsic protein (TIP), NOD26-like intrinsic protein (NIP), X intrinsic protein (XIP), and small basic intrinsic protein (SIP)[10–17]. Among them, PIPs, which are typically localized in the plasma membrane, are the most conserved and play a central role in controlling plant water status[12,18–22]. Of the two phylogenetic groups in the PIP subfamily, PIP1 possesses a relatively longer N-terminus, whereas PIP2 features an extended C-terminus with one or more conserved S residues for phosphorylation[5,15,17].

    Tigernut (Cyperus esculentus L.), which belongs to the Cyperaceae family within Poales, is a novel and promising herbaceous C4 oil crop with wide adaptability, large biomass, and a short life cycle[23–27]. Tigernut is unusual in accumulating up to 35% oil in its underground tubers[28–30], which develop from stolons through three main stages, i.e., initiation, swelling, and maturation[31–33]. Water is essential for tuber development: tuber moisture content remains relatively high, at approximately 85%, until maturation, when it drops significantly to about 45%[28,32]. Therefore, uncovering the mechanism of tuber water balance is of particular interest. Despite the crucial roles of PIPs in cellular water balance, their characterization in tigernut is still in its infancy[21]. The recently available genome and transcriptome datasets[31,33,34] provide an opportunity to address this issue.

    In this study, a global characterization of PIP genes was conducted in tigernut, including gene localizations, gene structures, sequence characteristics, and evolutionary patterns. Moreover, the correlation of CePIP mRNA and protein abundance with water content during tuber development, as well as their subcellular localization, was investigated, which will facilitate further elucidation of the water balance mechanism in this unusual species.

    PIP genes reported in Arabidopsis (Arabidopsis thaliana)[10] and rice (Oryza sativa)[11] were obtained from TAIR11 (www.arabidopsis.org) and RGAP7 (http://rice.uga.edu), respectively, and detailed information is shown in Supplemental Table S1. Their protein sequences were used as queries for tBLASTn[35] searches (E-value cutoff, 1e–10) of the full-length tigernut transcriptome and genome sequences accessed from CNGBdb (https://db.cngb.org/search/assembly/CNA0051961)[31,34]. RNA sequencing (RNA-seq) reads available in NCBI (www.ncbi.nlm.nih.gov/sra) were also used to revise gene structures as described before[13], and the presence of the conserved MIP (major intrinsic protein, Pfam accession number PF00230) domain in candidates was confirmed using MOTIF Search (www.genome.jp/tools/motif). To uncover the origin and evolution of CePIP genes, a similar approach was employed to identify homologs from representative plant species, i.e., Carex cristatella (v1, Cyperaceae)[36], Rhynchospora breviuscula (v1, Cyperaceae)[37], and Juncus effusus (v1, Juncaceae)[37], whose genome sequences were accessed from NCBI (www.ncbi.nlm.nih.gov). Gene structures of candidates were displayed using GSDS 2.0 (http://gsds.gao-lab.org), whereas physicochemical parameters of deduced proteins were calculated using ProtParam (http://web.expasy.org/protparam). Subcellular localization was predicted using WoLF PSORT (www.genscript.com/wolf-psort.html).
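The E-value cutoff applied to the tBLASTn search can be illustrated with a short Python sketch that filters BLAST+ tabular output (`-outfmt 6`, default 12 columns). This is an illustration under stated assumptions, not the authors' pipeline: the hit lines and the scaffold names `scaffold_A` and `scaffold_B` are hypothetical, and the function `filter_hits` is introduced here only for the example.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tsv_text, max_evalue=1e-10):
    """Return subject IDs that have at least one hit with E-value <= max_evalue."""
    kept = set()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        record = dict(zip(OUTFMT6_FIELDS, row))
        if float(record["evalue"]) <= max_evalue:
            kept.add(record["sseqid"])
    return kept


# Hypothetical tBLASTn hits of one query protein against two scaffolds.
example = (
    "AtPIP1;1\tscaffold_A\t80.1\t280\t50\t2\t1\t280\t100\t940\t1e-120\t450\n"
    "AtPIP1;1\tscaffold_B\t30.0\t60\t40\t3\t10\t70\t5\t185\t0.002\t35\n"
)
print(sorted(filter_hits(example)))  # ['scaffold_A']
```

Candidates passing this filter would still need the MIP-domain (PF00230) check described above before being accepted as PIP genes.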

    Nucleotide and protein multiple sequence alignments were conducted using ClustalW and MUSCLE, respectively, as implemented in MEGA6[38] with default parameters, and a phylogenetic tree was constructed in MEGA6 using the maximum likelihood method with 1,000 bootstrap replicates. PIP genes were systematically named using two italic letters denoting the source organism, followed by a progressive number based on sequence similarity. Conserved motifs were identified using MEME Suite 5.5.3 (https://meme-suite.org/tools/meme) with the following optimized parameters: any number of repetitions, a maximum of 15 motifs, and a motif width of 6 to 250 residues. TMs and conserved residues were identified through homology modeling and sequence alignment with the structurally resolved spinach (Spinacia oleracea) SoPIP2;1[5].
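For readers reproducing the motif search outside the web server, the MEME settings above appear to map onto MEME's command-line options as sketched below (`-mod anr` is MEME's "any number of repetitions" model). This assumes a local MEME Suite installation; the input file `pips.fasta`, the output directory `meme_out`, and the helper `meme_command` are hypothetical, and the text does not state which interface or additional options the authors used.

```python
import shlex


def meme_command(fasta, outdir, nmotifs=15, minw=6, maxw=250):
    """Build a MEME argument list mirroring the motif settings described in the text."""
    return [
        "meme", fasta,
        "-protein",               # input sequences are amino acids
        "-oc", outdir,            # output directory (overwritten if it exists)
        "-mod", "anr",            # any number of repetitions per sequence
        "-nmotifs", str(nmotifs),
        "-minw", str(minw),       # minimum motif width
        "-maxw", str(maxw),       # maximum motif width
    ]


cmd = meme_command("pips.fasta", "meme_out")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` without shell quoting.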

    Synteny analysis was conducted using TBtools-II[39] as described previously[40], with the E-value cutoff set to 1e-10 and the number of BLAST hits set to 5. Duplication modes were identified using the DupGen_finder pipeline[41], and the Ks (synonymous substitution rate) and Ka (nonsynonymous substitution rate) of duplicate pairs were calculated using codeml in the PAML package[42]. Orthologs between species were identified using InParanoid[43] together with information from synteny analysis, and orthogroups (OGs) were assigned only when present in at least two of the species examined.
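The rule that an orthogroup is kept only when it spans at least two species can be sketched as a simple filter. The input mapping of genes to candidate groups is hypothetical: it stands in for the combined InParanoid and synteny results, which are not reproduced here, and species are read from the two-letter gene prefix described in the naming scheme above.

```python
from collections import defaultdict


def assign_orthogroups(gene_to_group, min_species=2):
    """Keep only groups whose members come from at least `min_species` species.

    Species are inferred from the two-letter gene prefix (e.g. 'Ce', 'Os').
    """
    species_per_group = defaultdict(set)
    members = defaultdict(list)
    for gene, group in gene_to_group.items():
        species_per_group[group].add(gene[:2])
        members[group].append(gene)
    return {
        group: sorted(genes)
        for group, genes in members.items()
        if len(species_per_group[group]) >= min_species
    }


# Hypothetical clustering result: g2 contains tigernut genes only.
clusters = {
    "CePIP1;1": "g1", "OsPIP1;3": "g1",
    "CePIP2;4": "g2", "CePIP2;5": "g2",
}
print(assign_orthogroups(clusters))  # {'g1': ['CePIP1;1', 'OsPIP1;3']}
```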

    Plant materials used for gene cloning, qRT-PCR analysis, and 4D-parallel reaction monitoring (4D-PRM)-based protein quantification were derived from the tigernut variety Reyan3[31], and plants were grown in a greenhouse as described previously[25]. For expression profiling during leaf development, three representative stages, i.e., young, mature, and senescing, were selected, and chlorophyll content was measured using a SPAD-502Plus meter (Konica Minolta, Shanghai, China) as previously described[44]. Young and senescing leaves are yellow, with chlorophyll contents only half that of the dark green mature leaves. For diurnal expression analysis, mature leaves were sampled every 4 h starting from the onset of light at 8 a.m. For gene regulation during tuber development, fresh tubers at 1, 5, 10, 15, 20, 25, and 35 d after tuber initiation (DAI) were collected as described previously[32]. All samples, with three biological replicates, were quickly frozen in liquid nitrogen and stored at −80 °C until use. For subcellular localization analysis, tobacco (Nicotiana benthamiana) plants were grown as previously described[20].

    Tissue-specific expression profiles of CePIP genes were investigated using Illumina RNA-seq data (150 bp paired-end reads) with three biological replicates for young leaf, mature leaf, sheath of mature leaf, shoot apex, root, rhizome, and developing tubers at three stages (40, 85, and 120 d after sowing (DAS)), available under NCBI accession number PRJNA703731. Raw reads in FASTQ format were obtained using fastq-dump, and quality control was performed using FastQC (www.bioinformatics.babraham.ac.uk/projects/fastqc). Reads were mapped using HISAT2 (v2.2.1, https://daehwankimlab.github.io/hisat2), and relative gene expression levels were presented as FPKM (fragments per kilobase of exon per million fragments mapped)[45].
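FPKM normalizes fragment counts by both transcript (exon) length and sequencing depth. A minimal sketch of the standard formula follows, assuming per-gene fragment counts and the total number of mapped fragments in the library are already available; the gene names and counts are hypothetical, and this is not the authors' quantification code.

```python
def fpkm(fragment_counts, exon_lengths_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (exon length in bp * total mapped fragments).

    The 1e9 factor combines the per-kilobase (1e3) and per-million (1e6) scalings.
    For paired-end data, one fragment corresponds to one mapped read pair.
    """
    return {
        gene: count * 1e9 / (exon_lengths_bp[gene] * total_mapped_fragments)
        for gene, count in fragment_counts.items()
    }


# Hypothetical counts: two genes in a library with 2 million mapped fragments.
values = fpkm(
    {"geneA": 500, "geneB": 1500},
    {"geneA": 1000, "geneB": 3000},
    total_mapped_fragments=2_000_000,
)
print(values)  # {'geneA': 250.0, 'geneB': 250.0}
```

geneB receives three times as many fragments as geneA but is also three times longer, so both end up with the same FPKM.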

    For qRT-PCR analysis, total RNA extraction and first-strand cDNA synthesis were conducted as previously described[24]. Primers used in this study are listed in Supplemental Table S2, where CeUCE2 and CeTIP41[25,33] were employed as reference genes. PCR reactions were run in triplicate for each biological sample using SYBR Green Mix (Takara) on a Real-time Thermal Cycler Type 5100 (Thermo Fisher Scientific Oy). Relative transcript abundance was estimated with the 2^(−ΔΔCt) method, and statistical analysis was performed using SPSS Statistics 20 as described previously[13].
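The 2^(−ΔΔCt) calculation with two reference genes can be sketched as below. The Ct values are hypothetical, and combining CeUCE2 and CeTIP41 by averaging their Ct values is an assumption (the text does not say how the two references were combined); with roughly 100% amplification efficiency, averaging Ct values is equivalent to taking the geometric mean of the reference quantities.

```python
from statistics import mean


def relative_expression(ct_target, ct_refs, ct_target_cal, ct_refs_cal):
    """Relative quantity of a target gene versus a calibrator sample, 2^(-ddCt).

    Each Ct is assumed to be the mean of technical triplicates. Reference genes
    are combined by averaging their Ct values (an assumption, see lead-in).
    """
    d_ct_sample = ct_target - mean(ct_refs)
    d_ct_calibrator = ct_target_cal - mean(ct_refs_cal)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)


# Hypothetical Ct values: the target crosses the threshold 2 cycles earlier than
# in the calibrator while the reference genes are unchanged, i.e. ~4-fold higher.
fold = relative_expression(22.0, [20.0, 21.0], 24.0, [20.0, 21.0])
print(fold)  # 4.0
```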

    Raw proteomic data for tigernut roots, leaves, and tubers (freshly harvested, dried, rehydrated for 48 h, and sprouted) were downloaded from ProteomeXchange/PRIDE (www.proteomexchange.org; PXD021894, PXD031123, and PXD035931) and reanalyzed using MaxQuant (v1.6.15.0, www.maxquant.org). Three dominant members, i.e., CePIP1;1, -2;1, and -2;8, were selected for 4D-PRM quantification, and the related unique peptides are shown in Supplemental Table S3. Protein extraction, trypsin digestion, and LC-MS/MS analysis were conducted as described previously[46].

    For subcellular localization analysis, the coding sequences (CDSs) of CePIP1;1, -2;1, and -2;8 were cloned into pNC-Cam1304-SubN via Nimble Cloning as described before[30]. The recombinant plasmids were then introduced into Agrobacterium tumefaciens GV3101 carrying the helper plasmid pSoup-P19, and 4-week-old tobacco leaves were infiltrated as previously described[20]. The plasma membrane marker HbPIP2;3-RFP[22] was co-transformed as a positive control. Fluorescence was observed by confocal laser scanning microscopy (Zeiss LSM 880, Germany): laser-1 was set to 730 nm for RFP observation, with fluorescence excited at 561 nm; laser-2 was set to 750 nm for EGFP observation, with fluorescence excited at 488 nm; and laser-3 was set to 470 nm for chlorophyll autofluorescence observation, with fluorescence excited at 633 nm.

    As shown in Table 1, a total of 14 PIP genes were identified on eight tigernut scaffolds (Scfs). The CDS length varies from 831 to 882 bp, putatively encoding 276–293 amino acids (AA) with a molecular weight (MW) of 29.16–31.59 kilodaltons (kDa). The theoretical isoelectric point (pI) varies from 7.04 to 9.46, indicating that all are basic. The grand average of hydropathicity (GRAVY) is between 0.344 and 0.577, and the aliphatic index (AI) ranges from 94.57 to 106.90, consistent with the hydrophobic nature of AQPs[47]. As expected, like SoPIP2;1, all CePIPs contain six TMs, two typical NPA motifs, the invariant ar/R filter F-H-T-R, five conserved Froger's positions Q/M-S-A-F-W, and two highly conserved residues corresponding to H193 and L197 in SoPIP2;1, which have been shown to be involved in gating[5,48], though an H→F substitution was found in CePIP2;9, -2;10, and -2;11 (Supplemental Fig. S1). Moreover, two S residues corresponding to S115 and S274 in SoPIP2;1[5] were found in most CePIPs (Supplemental Fig. S1), suggesting posttranslational regulation by phosphorylation.

    Table 1.  Fourteen PIP genes identified in C. esculentus.
    Gene name | Locus | Position | Intron no. | AA | MW (kDa) | pI | GRAVY | AI | TM | MIP domain
    CePIP1;1 | CESC_15147 | Scf9:2757378..2759502(–) | 3 | 288 | 30.76 | 8.82 | 0.384 | 95.28 | 6 | 47..276
    CePIP1;2 | CESC_04128 | Scf4:3806361..3807726(–) | 3 | 291 | 31.11 | 8.81 | 0.344 | 95.95 | 6 | 46..274
    CePIP1;3 | CESC_15950 | Scf54:5022493..5023820(+) | 3 | 289 | 31.06 | 8.80 | 0.363 | 94.57 | 6 | 49..278
    CePIP2;1 | CESC_15350 | Scf9:879960..884243(+) | 3 | 288 | 30.34 | 8.60 | 0.529 | 103.02 | 6 | 33..269
    CePIP2;2 | CESC_00011 | Scf30:4234620..4236549(+) | 3 | 293 | 31.59 | 9.27 | 0.394 | 101.57 | 6 | 35..268
    CePIP2;3 | CESC_00010 | Scf30:4239406..4241658(+) | 3 | 291 | 30.88 | 9.44 | 0.432 | 98.97 | 6 | 31..266
    CePIP2;4 | CESC_05080 | Scf46:307799..309544(+) | 3 | 285 | 30.44 | 7.04 | 0.453 | 100.32 | 6 | 28..265
    CePIP2;5 | CESC_05079 | Scf46:312254..314388(+) | 3 | 286 | 30.49 | 7.04 | 0.512 | 101.68 | 6 | 31..268
    CePIP2;6 | CESC_05078 | Scf46:316024..317780(+) | 3 | 288 | 30.65 | 7.68 | 0.475 | 103.06 | 6 | 31..268
    CePIP2;7 | CESC_05077 | Scf46:320439..322184(+) | 3 | 284 | 30.12 | 8.55 | 0.500 | 100.00 | 6 | 29..266
    CePIP2;8 | CESC_14470 | Scf2:4446409..4448999(+) | 3 | 284 | 30.37 | 8.30 | 0.490 | 106.90 | 6 | 33..263
    CePIP2;9 | CESC_02223 | Scf1:2543928..2545778(–) | 3 | 283 | 30.09 | 9.46 | 0.533 | 106.47 | 6 | 31..262
    CePIP2;10 | CESC_10007 | Scf27:1686032..1688010(–) | 3 | 276 | 29.16 | 9.23 | 0.560 | 106.05 | 6 | 26..256
    CePIP2;11 | CESC_10009 | Scf27:1694196..1696175(–) | 3 | 284 | 29.71 | 9.10 | 0.577 | 105.49 | 6 | 33..263
    AA: amino acid; AI: aliphatic index; GRAVY: grand average of hydropathicity; kDa: kilodalton; MIP: major intrinsic protein; MW: molecular weight; pI: isoelectric point; PIP: plasma membrane intrinsic protein; Scf: scaffold; TM: transmembrane helix.
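The GRAVY and AI columns of Table 1 follow standard published formulas, sketched below in Python for readers who want to recompute them from the deduced protein sequences. This is an independent re-implementation of the formulas ProtParam documents (Kyte-Doolittle hydropathy for GRAVY; Ikai's aliphatic index), not ProtParam itself; it handles only the 20 standard residues, and the test peptide "AVIL" is hypothetical. The helper `cds_length_bp` also shows that the CDS range of 831–882 bp corresponds to 276–293 residues plus a stop codon.

```python
# Kyte-Doolittle hydropathy scale used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), in mole %."""
    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)
    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def cds_length_bp(protein_length_aa):
    """CDS length in bp, including the stop codon."""
    return 3 * (protein_length_aa + 1)


print(cds_length_bp(276), cds_length_bp(293))  # 831 882
print(round(gravy("AVIL"), 3), round(aliphatic_index("AVIL"), 1))  # 3.575 292.5
```

Positive GRAVY values, as for all 14 CePIPs, indicate an overall hydrophobic protein.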

    To uncover their evolutionary relationships, an unrooted phylogenetic tree was constructed using the full-length protein sequences of the CePIPs together with 11 OsPIPs and 13 AtPIPs. As shown in Fig. 1a, these proteins clustered into two main groups, corresponding to PIP1 and PIP2 as previously defined[10,49], each of which appears to have evolved into several subgroups. Compared with PIP1s, PIP2s possess a relatively shorter N-terminus but an extended C-terminus with one conserved S residue (Supplemental Fig. S1). Interestingly, many gene repeats were detected, most of which appear to be species-specific, i.e., AtPIP1;1/-1;2/-1;3/-1;4/-1;5, AtPIP2;1/-2;2/-2;3/-2;4/-2;5/-2;6, AtPIP2;7/-2;8, OsPIP1;1/-1;2/-1;3, OsPIP2;1/-2;4/-2;5, OsPIP2;2/-2;3, CePIP1;1/-1;2, CePIP2;2/-2;3, CePIP2;4/-2;5/-2;6/-2;7, and CePIP2;9/-2;10/-2;11, reflecting the occurrence of more than one lineage-specific whole-genome duplication (WGD) after the species diverged[50,51]. In Arabidopsis, which experienced three WGDs (i.e., γ, β, and α) after the split from the monocot clade[52], AtPIP1;5 in the PIP1 group first gave rise to AtPIP1;1 via the γ WGD shared by all core eudicots[50], which later gave rise to AtPIP1;3, -1;4, and -1;2 via the β and α WGDs; AtPIP2;1 in the PIP2 group first gave rise to AtPIP2;6 via the γ WGD, and these two later generated AtPIP2;2 and -2;5 via the α WGD (Supplemental Table S1). In rice, which also experienced three WGDs (i.e., τ, σ, and ρ) after the split from the eudicot clade[51], OsPIP1;2 and -2;3 gave rise to OsPIP1;1 and -2;2, respectively, via the Poaceae-specific ρ WGD. Additionally, tandem, proximal, transposed, and dispersed duplications also contributed to gene expansion in these two species (Supplemental Table S1).

    Figure 1.  Structural and phylogenetic analysis of PIPs in C. esculentus, O. sativa, and A. thaliana. (a) Shown is an unrooted phylogenetic tree resulting from full-length PIPs with MEGA6 (maximum likelihood method and bootstrap of 1,000 replicates), where the distance scale denotes the number of amino acid substitutions per site. (b) Shown are the exon-intron structures. (c) Shown is the distribution of conserved motifs among PIPs, where different motifs are represented by different color blocks as indicated and the same color block in different proteins indicates a certain motif. (At: A. thaliana; Ce: C. esculentus; PIP: plasma membrane intrinsic protein; Os: O. sativa).

    Analysis of gene structures revealed that all CePIP and AtPIP genes possess three introns and four exons in the CDS, in contrast to the frequent loss of certain introns in rice, including OsPIP1;2, -1;3, -2;1, -2;3, -2;4, -2;5, -2;6, -2;7, and -2;8 (Fig. 1b). The positions of the three introns are highly conserved: they are located in sequences encoding LB (three residues before the first NPA), LD (one residue before the conserved L involved in gating), and LE (18 residues after the second NPA), respectively (Supplemental Fig. S1). The intron length of CePIP genes is highly variable, ranging from 109–993 bp, 115–1,745 bp, and 95–866 bp for the three introns, respectively. By contrast, exon length is less variable: Exons 2 and 3 are invariant at 296 bp and 141 bp, respectively, whereas Exons 1 and 4 range from 277–343 bp and 93–132 bp and determine the lengths of the N-terminus (longer in PIP1s) and the C-terminus (longer in PIP2s), respectively (Fig. 1b). Correspondingly, their protein structures were shown to be highly conserved, and six (i.e., Motifs 1–6) of the 15 motifs identified are broadly present. Among them, Motifs 3, 2, 6, 1, and 4 constitute the conserved MIP domain. In contrast to the single Motif 5 present in most PIP2s, all PIP1s possess two sequential copies of Motif 5, the first of which is located in the extended N-terminus. In CePIP2;3 and OsPIP2;7, Motif 5 is replaced by Motif 13; in CePIP2;2, it is replaced by two copies of Motif 15; and no significant motif was detected in this region of CePIP2;10. PIP1s and PIP2s usually feature Motifs 9 and 7 at the C-terminus, respectively, though the latter is replaced by Motif 12 in CePIP2;6 and OsPIP2;8. PIP2s usually feature Motif 8 at the N-terminus, though it is replaced by Motif 14 in CePIP2;2 and -2;3 or by Motif 11 in CePIP2;10 and -2;11 (Fig. 1c).
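Because Exons 2 and 3 are invariant, all variation in CDS length must come from Exons 1 and 4. A quick arithmetic check in Python, using only numbers stated in the text and Table 1 (the helper name `terminal_exons_bp` is introduced here for illustration), makes this explicit:

```python
EXON2_BP, EXON3_BP = 296, 141  # invariant exon lengths reported in the text


def terminal_exons_bp(cds_bp):
    """Combined length of exons 1 and 4 implied by a CDS length."""
    return cds_bp - EXON2_BP - EXON3_BP


shortest, longest = terminal_exons_bp(831), terminal_exons_bp(882)  # Table 1 CDS range
print(shortest, longest)  # 394 445

# Both values fall within the sums of the reported exon 1 (277-343 bp)
# and exon 4 (93-132 bp) ranges, so the stated figures are mutually consistent.
print(277 + 93 <= shortest and longest <= 343 + 132)  # True
```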

    As shown in Fig. 2a, gene localization of CePIPs revealed three gene clusters, i.e., CePIP2;2/-2;3 on Scf30, CePIP2;4/-2;5/-2;6/-2;7 on Scf46, and CePIP2;10/-2;11 on Scf27, which were defined as tandem repeats owing to their high sequence similarity and neighboring locations. The nucleotide identities of these duplicate pairs vary from 70.5% to 91.2%, and the Ks values range from 0.0971 to 1.2778 (Table 2), implying that they arose at different times. According to intra-species synteny analysis, two duplicate pairs, i.e., CePIP1;1/-1;2 and CePIP2;2/-2;4, are located within syntenic blocks (Fig. 2b) and were thus defined as WGD repeats. Among them, CePIP1;1/-1;2 has a Ks value comparable to those of CePIP2;2/-2;3, CePIP1;1/-1;3, and CePIP2;4/-2;8 (1.2522 vs 1.2287–1.2778), whereas CePIP2;2/-2;4 has a relatively higher Ks value of 1.5474 (Table 2), implying an earlier origin or faster evolution of the latter pair. While CePIP1;1/-1;3 and CePIP2;1/-2;8 were characterized as transposed repeats, CePIP2;1/-2;2, CePIP2;9/-2;10, and CePIP2;8/-2;10 were characterized as dispersed repeats (Fig. 2a). The Ks values of the three dispersed repeats vary from 0.8591 to 3.0117 (Table 2), implying distinct times of origin.

    Figure 2.  Duplication events of CePIP genes and synteny analysis within and between C. esculentus, O. sativa, and A. thaliana. (a) Duplication events detected in tigernut. Serial numbers are indicated at the top of each scaffold, and the scale is in Mb. Duplicate pairs identified in this study are connected using lines in different colors, i.e., tandem (shown in green), transposed (shown in purple), dispersed (shown in gold), and WGD (shown in red). (b) Synteny analysis within and between C. esculentus, O. sativa, and A. thaliana. (c) Synteny analysis within and between C. esculentus, C. cristatella, R. breviuscula, and J. effusus. Shown are PIP-encoding chromosomes/scaffolds and only syntenic blocks that contain PIP genes are marked, i.e., red and purple for intra- and inter-species, respectively. (At: A. thaliana; Cc: C. cristatella; Ce: C. esculentus; Je: J. effusus; Mb: megabase; PIP: plasma membrane intrinsic protein; Os: O. sativa; Rb: R. breviuscula; Scf: scaffold; WGD: whole-genome duplication).
    Table 2.  Sequence identity and evolutionary rate of homologous PIP gene pairs identified in C. esculentus. Ks and Ka were calculated using PAML.
    Duplicate 1 | Duplicate 2 | Identity (%) | Ka | Ks | Ka/Ks
    CePIP1;1 | CePIP1;3 | 78.70 | 0.0750 | 1.2287 | 0.0610
    CePIP1;2 | CePIP1;1 | 77.20 | 0.0894 | 1.2522 | 0.0714
    CePIP2;1 | CePIP2;4 | 74.90 | 0.0965 | 1.7009 | 0.0567
    CePIP2;3 | CePIP2;2 | 70.50 | 0.1819 | 1.2778 | 0.1424
    CePIP2;4 | CePIP2;2 | 66.50 | 0.2094 | 1.5474 | 0.1353
    CePIP2;5 | CePIP2;4 | 87.30 | 0.0225 | 0.4948 | 0.0455
    CePIP2;6 | CePIP2;5 | 84.90 | 0.0545 | 0.5820 | 0.0937
    CePIP2;7 | CePIP2;6 | 78.70 | 0.0894 | 1.0269 | 0.0871
    CePIP2;8 | CePIP2;4 | 72.90 | 0.1401 | 1.2641 | 0.1109
    CePIP2;9 | CePIP2;10 | 76.40 | 0.1290 | 0.8591 | 0.1502
    CePIP2;10 | CePIP2;8 | 64.90 | 0.2432 | 3.0117 | 0.0807
    CePIP2;11 | CePIP2;10 | 91.20 | 0.0562 | 0.0971 | 0.5783
    Ce: C. esculentus; Ka: nonsynonymous substitution rate; Ks: synonymous substitution rate; PIP: plasma membrane intrinsic protein.
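The Ka/Ks column of Table 2 is simply the ratio of the two codeml rates, conventionally read as a selection signal (well below 1 suggests purifying selection, about 1 neutral evolution, above 1 positive selection). The sketch below recomputes it for three rows copied from the table; the function `ka_ks` is illustrative and does not reproduce the codeml estimation itself.

```python
# Three rows copied from Table 2: (duplicate 1, duplicate 2, Ka, Ks).
ROWS = [
    ("CePIP1;1", "CePIP1;3", 0.0750, 1.2287),
    ("CePIP2;5", "CePIP2;4", 0.0225, 0.4948),
    ("CePIP2;11", "CePIP2;10", 0.0562, 0.0971),
]


def ka_ks(ka, ks):
    """Ratio of nonsynonymous to synonymous substitution rates (omega)."""
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


for dup1, dup2, ka, ks in ROWS:
    ratio = ka_ks(ka, ks)
    # By convention, omega < 1 suggests purifying selection.
    verdict = "purifying" if ratio < 1 else "not purifying"
    print(f"{dup1}/{dup2}: Ka/Ks = {ratio:.4f} ({verdict})")
```

Recomputing from the rounded table values reproduces the reported ratios to within the last digit (e.g., 0.0562/0.0971 is about 0.5788 vs 0.5783 in the table), presumably because the published ratios were calculated from unrounded rates. All pairs in Table 2 have Ka/Ks well below 1.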

    According to inter-species synteny analysis, six of the 14 CePIP genes have syntelogs in rice, in 1:1, 1:2, and 2:2 relationships (i.e., CePIP1;1 vs OsPIP1;3, CePIP1;3 vs OsPIP1;2/-1;1, CePIP2;1 vs OsPIP2;4, CePIP2;2/-2;4 vs OsPIP2;3/-2;2, and CePIP2;8 vs OsPIP2;6), in striking contrast to the single syntelog pair found in Arabidopsis (i.e., CePIP1;2 vs AtPIP1;2). Correspondingly, only OsPIP1;2 in rice has syntelogs in Arabidopsis, i.e., AtPIP1;3 and -1;4 (Fig. 2b). These results are consistent with the close taxonomic relationship between tigernut and rice[50,51], both members of Poales, and also imply lineage-specific evolution after their divergence.

    As described above, phylogenetic and syntenic analyses suggest that the last common ancestor of tigernut and rice most likely possessed only two PIP1s and three PIP2s. However, it is not clear whether the gene expansion observed in tigernut is species-specific or Cyperaceae-specific. To address this issue, recently available genomes were used to identify PIP subfamily genes from C. cristatella, R. breviuscula, and J. effusus, yielding 15, 13, and nine members, respectively. Interestingly, in contrast to the many tandem repeats found in Cyperaceae species, only one pair of tandem repeats (i.e., JePIP2;3 and -2;4) was identified in J. effusus, a close outgroup to Cyperaceae in the family Juncaceae[36,37]. Homology analysis identified a total of 12 orthogroups, in which JePIP genes belong to PIP1A (JePIP1;1), PIP1B (JePIP1;2), PIP1C (JePIP1;3), PIP2A (JePIP2;1), PIP2B (JePIP2;2), PIP2F (JePIP2;3 and -2;4), PIP2G (JePIP2;5), and PIP2H (JePIP2;6) (Table 3). Further intra-species synteny analysis revealed that JePIP1;1/-1;2 and JePIP2;2/-2;3 are located within syntenic blocks, consistent with CePIP1;1/-1;2, CePIP2;2/-2;4, CcPIP1;1/-1;2, CcPIP2;3/-2;4, RbPIP1;1/-1;2, and RbPIP2;2/-2;5 (Fig. 2c), implying that PIP1A/PIP1B and PIP2B/PIP2D were derived from WGDs that occurred sometime before the Cyperaceae-Juncaceae divergence. After the split from Juncaceae, tandem duplications occurred frequently in Cyperaceae, and PIP2B/PIP2C and PIP2D/PIP2E/PIP2F are retained in most Cyperaceae plants examined in this study. By contrast, species-specific expansion was also observed, i.e., CePIP2;4/-2;5, CePIP2;10/-2;11, CcPIP1;2/-1;3, CcPIP2;4/-2;5, CcPIP2;8/-2;9, CcPIP2;10/-2;11, RbPIP2;3/-2;4, and RbPIP2;9/-2;10 (Table 3 & Fig. 2c).

    Table 3.  Twelve proposed orthogroups based on comparison of representative plant species.
    Orthogroup | C. esculentus | C. cristatella | R. breviuscula | J. effusus | O. sativa | A. thaliana
    PIP1A | CePIP1;1 | CcPIP1;1 | RbPIP1;1 | JePIP1;1 | OsPIP1;3 | AtPIP1;1, AtPIP1;2, AtPIP1;3, AtPIP1;4, AtPIP1;5
    PIP1B | CePIP1;2 | CcPIP1;2, CcPIP1;3 | RbPIP1;2 | JePIP1;2 | – | –
    PIP1C | CePIP1;3 | CcPIP1;4 | RbPIP1;3 | JePIP1;3 | OsPIP1;1, OsPIP1;2 | –
    PIP2A | CePIP2;1 | CcPIP2;1 | RbPIP2;1 | JePIP2;1 | OsPIP2;1, OsPIP2;4, OsPIP2;5 | AtPIP2;1, AtPIP2;2, AtPIP2;3, AtPIP2;4, AtPIP2;5, AtPIP2;6
    PIP2B | CePIP2;2 | CcPIP2;2 | RbPIP2;2 | JePIP2;2 | OsPIP2;2, OsPIP2;3 | –
    PIP2C | CePIP2;3 | CcPIP2;3 | RbPIP2;3, RbPIP2;4 | – | – | –
    PIP2D | CePIP2;4, CePIP2;5 | CcPIP2;4, CcPIP2;5 | RbPIP2;5 | – | – | –
    PIP2E | CePIP2;5 | CcPIP2;5 | RbPIP2;6 | – | – | –
    PIP2F | CePIP2;6 | CcPIP2;6 | – | – | – | –
    PIP2G | CePIP2;7 | CcPIP2;7 | RbPIP2;7 | JePIP2;3, JePIP2;4 | – | –
    PIP2H | CePIP2;8 | CcPIP2;8, CcPIP2;9 | RbPIP2;8 | JePIP2;5 | OsPIP2;6 | AtPIP2;7, AtPIP2;8
    PIP2I | CePIP2;9, CePIP2;10, CePIP2;11 | CcPIP2;10, CcPIP2;11 | RbPIP2;9, RbPIP2;10 | JePIP2;6 | OsPIP2;7, OsPIP2;8 | –
    At: A. thaliana; Cc: C. cristatella; Ce: C. esculentus; Je: J. effusus; Os: O. sativa; Rb: R. breviuscula; PIP: plasma membrane intrinsic protein.

    Tissue-specific expression profiles of CePIP genes were investigated using transcriptome data available for young leaf, mature leaf, sheath, root, rhizome, shoot apex, and tuber. As shown in Fig. 3a, CePIP genes were most highly expressed in roots, followed by sheaths; expression was moderate in tubers, young leaves, rhizomes, and mature leaves, and low in shoot apices. In most tissues, CePIP1;1, -2;1, and -2;8 were the three dominant members, contributing more than 90% of total CePIP transcripts. By contrast, in rhizome, these three members accounted for about 80% of total transcripts, and together with CePIP1;3 and -2;4 contributed up to 96%; in root, CePIP1;1, -1;3, -2;4, and -2;7 accounted for about 84% of total transcripts, and together with CePIP2;1 and -2;8 contributed up to 94%. Based on their expression patterns, CePIP genes could be divided into five main clusters: Cluster I includes CePIP1;1, -2;1, and -2;8, which were constitutively and highly expressed in all tissues examined; Cluster II includes CePIP2;2, -2;9, and -2;10, which were lowly expressed in all tissues tested; Cluster III includes CePIP1;2 and -2;11, which were preferentially expressed in young leaf and sheath; Cluster IV includes CePIP1;3 and -2;4, which were predominantly expressed in root and rhizome; and Cluster V includes the remaining genes (i.e., CePIP2;3, -2;5, -2;6, and -2;7), which were typically expressed in root (Fig. 3a). Collectively, these results imply that most duplicate pairs have diverged in expression and that three members (i.e., CePIP1;1, -2;1, and -2;8) have evolved to be constitutively co-expressed in most tissues.
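The percentages above are shares of the summed CePIP expression within a tissue. A minimal sketch of that calculation follows; the FPKM values are hypothetical and are not the published data. Within a single sample, shares computed from FPKM equal those computed from TPM, because TPM is FPKM rescaled by a per-sample constant.

```python
def share_of_total(fpkm_by_gene, members):
    """Fraction of the summed expression contributed by the listed genes."""
    total = sum(fpkm_by_gene.values())
    return sum(fpkm_by_gene[gene] for gene in members) / total


# Hypothetical FPKM values for one tissue (not the published data).
tissue_fpkm = {
    "CePIP1;1": 400.0, "CePIP2;1": 300.0, "CePIP2;8": 220.0,
    "CePIP1;3": 50.0, "CePIP2;4": 30.0,
}
dominant = ["CePIP1;1", "CePIP2;1", "CePIP2;8"]
print(f"{share_of_total(tissue_fpkm, dominant):.0%}")  # 92%
```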

    Figure 3.  Expression profiles of CePIP genes in various tissues, at different stages of leaf development, and in mature leaves over a diurnal cycle. (a) Tissue-specific expression profiles of 14 CePIP genes. The heatmap was generated using an R package with row-based standardization. The color scale represents log2-transformed FPKM values, where blue indicates low expression and red indicates high expression. (b) Expression profiles of CePIP1;1, -2;1, and -2;8 at different stages of leaf development. (c) Expression profiles of CePIP1;1, -2;1, and -2;8 in mature leaves over a diurnal cycle. Bars indicate SD (N = 3), and uppercase letters indicate significant differences according to Duncan's multiple range test following one-way ANOVA (p < 0.01). (Ce: C. esculentus; FPKM: fragments per kilobase of exon per million fragments mapped; PIP: plasma membrane intrinsic protein)
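The row-based standardization mentioned in the heatmap legends can be sketched as follows: each gene's values are log2-transformed and then z-scored across samples, so colors show relative rather than absolute expression. The pseudocount of 1 and the use of the sample standard deviation (n − 1, as R's `sd()` and row scaling in heatmap functions such as pheatmap use) are assumptions; the legends do not state the exact settings.

```python
import math
from statistics import mean, stdev


def row_zscores(fpkm_row, pseudocount=1.0):
    """log2(FPKM + pseudocount) for one gene, then z-scored across its samples."""
    logged = [math.log2(v + pseudocount) for v in fpkm_row]
    mu, sd = mean(logged), stdev(logged)  # sample SD (n - 1)
    if sd == 0:
        return [0.0] * len(logged)  # constant row: nothing to standardize
    return [(x - mu) / sd for x in logged]


print(row_zscores([0, 1, 3]))  # [-1.0, 0.0, 1.0]
```

Because each row is standardized separately, a lowly expressed gene and a dominant one can show the same color range, which is why absolute levels are reported separately as FPKM.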

    As shown in Fig. 3a, transcriptome profiling showed that, compared with young leaves, CePIP1;2, -2;3, -2;7, -2;8, and -2;11 were significantly down-regulated in mature leaves, whereas CePIP1;3 and -2;1 were up-regulated. To confirm these results, the three dominant members, i.e., CePIP1;1, -2;1, and -2;8, were selected for qRT-PCR analysis at three representative stages, i.e., young, mature, and senescing leaves. As shown in Fig. 3b, in contrast to CePIP2;1, which exhibited a bell-shaped expression pattern peaking in mature leaves, transcripts of both CePIP1;1 and -2;8 gradually decreased during leaf development. These results were largely consistent with transcriptome profiling; the only difference is that CePIP1;1 was significantly down-regulated in mature leaves relative to young leaves. This discrepancy may be due to the different growth conditions used, i.e., greenhouse vs natural conditions.

    Diurnal expression patterns of CePIP1;1, -2;1, and -2;8 were also investigated in mature leaves, and the results are shown in Fig. 3c. Generally, transcript levels of all three genes during the day (8, 12, 16, and 20 h) were higher than those at night (24 and 4 h). During the day, both CePIP1;1 and -2;8 exhibited a unimodal expression pattern peaking at 12 h, whereas CePIP2;1 had two peaks (8 and 16 h) that did not differ significantly. Nevertheless, transcript levels of all three genes at 20 h (onset of night) were significantly lower than those at 8 h (onset of day) and at 12 h. At night, no significant difference was observed between the two time points for CePIP1;1 and -2;8, unlike CePIP2;1. Moreover, their transcript levels were comparable to those at 20 h (Fig. 3c).

    To reveal the expression patterns of CePIP genes during tuber development, three representative stages, i.e., 40 DAS (early swelling stage), 85 DAS (late swelling stage), and 120 DAS (mature stage), were first profiled using transcriptome data. As shown in Fig. 4a, apart from the rarely expressed CePIP1;2, -2;2, -2;9, and -2;10, most genes exhibited a bell-shaped expression pattern peaking at 85 DAS, in contrast to the gradual decrease of CePIP2;3 and -2;8. Notably, except for CePIP2;4, all genes were expressed at considerably lower levels at 120 DAS than at 40 DAS. For qRT-PCR confirmation of CePIP1;1, -2;1, and -2;8, seven stages were examined, i.e., 1, 5, 10, 15, 20, 25, and 35 DAI, representing initiation, five stages of swelling, and maturation as described before[32]. As shown in Fig. 4b, two peaks were observed for all three genes, though their patterns differed. For CePIP1;1, compared with the initiation stage (1 DAI), significant up-regulation was observed at the early swelling stage (5 DAI), followed by a gradual decrease apart from a second peak at 20 DAI, which differs from the transcriptome profile. For CePIP2;1, transcript levels first dropped sharply at 5 DAI, then gradually increased until 20 DAI, followed by a gradual decrease at the two late stages. The pattern of CePIP2;8 is similar to that of CePIP1;1, with two peaks at 5 and 20 DAI, the second significantly lower than the first. The difference is that the second peak of CePIP2;8 was significantly lower than the initiation stage. By contrast, the second peak (20 DAI) of CePIP2;1 was significantly higher than the first (1 DAI). Nevertheless, the expression patterns of both CePIP2;1 and -2;8 are highly consistent with transcriptome profiling.

    Figure 4.  Transcript and protein abundances of CePIP genes during tuber development. (a) Transcriptome-based expression profiling of 14 CePIP genes during tuber development. The heatmap was generated using the R package with row-based standardization. The color scale represents log2-transformed FPKM-normalized counts, where blue indicates low expression and red indicates high expression. (b) qRT-PCR-based expression profiling of CePIP1;1, -2;1, and -2;8 at seven representative stages of tuber development. (c) Relative protein abundance of CePIP1;1, -2;1, and -2;8 at three representative stages of tuber development. Bars indicate SD (N = 3), and uppercase letters indicate significant differences (one-way ANOVA followed by Duncan's multiple range test, p < 0.01). (Ce: C. esculentus; DAI: days after tuber initiation; DAS: days after sowing; FPKM: fragments per kilobase of exon per million fragments mapped; PIP: plasma membrane intrinsic protein).
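
    Row-based standardization, as mentioned in the Fig. 4a caption, rescales each gene's profile to mean 0 and standard deviation 1, so that the heatmap colors show when a gene peaks rather than how highly it is expressed. A minimal sketch follows, assuming a log2(FPKM + 1) transformation (the pseudocount of 1 is an assumption) and the sample standard deviation (n − 1 denominator), as used by R's scale(); the function name is hypothetical.

```python
import math
import statistics


def row_zscores(fpkm_by_gene, pseudocount=1.0):
    """Return gene -> z-scores of log2(FPKM + pseudocount) across samples (one row per gene)."""
    scaled = {}
    for gene, values in fpkm_by_gene.items():
        logged = [math.log2(v + pseudocount) for v in values]
        mean = statistics.mean(logged)
        sd = statistics.stdev(logged) if len(logged) > 1 else 0.0  # n - 1 denominator
        # A gene with a flat profile has no variation to display; map it to zeros.
        scaled[gene] = [0.0 if sd == 0 else (x - mean) / sd for x in logged]
    return scaled
```

Because each row is scaled independently, a barely expressed gene such as CePIP1;2 can still show saturated colors; the absolute FPKM values are needed to judge which members dominate.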

    Since protein abundance does not always agree with transcript level, the protein profiles of three dominant members (i.e., CePIP1;1, -2;1, and -2;8) during tuber development were further investigated. For this purpose, we first took advantage of available proteomic data from leaves, roots, and four stages of tubers (freshly harvested, dried, rehydrated for 48 h, and sprouted) to identify CePIP proteins. As shown in Supplemental Fig. S2, all three proteins were identified in both leaves and roots, whereas CePIP1;1 and -2;8 were also identified in at least one of the four tuber stages tested. Notably, all three proteins were considerably more abundant in roots, suggesting key roles in root water balance.

    To further characterize their profiles during tuber development, 4D-PRM-based protein quantification was conducted at three representative stages, i.e., 1, 25, and 35 DAI. As expected, all three proteins were identified and quantified. In contrast to the gradual decrease of CePIP2;8, both CePIP1;1 and -2;1 exhibited a bell-shaped pattern peaking at 25 DAI, although the difference between 1 and 25 DAI was not significant (Fig. 4c). These trends are largely consistent with the transcript patterns, although an opposite trend was observed for CePIP2;1 between the two early stages (Fig. 4b & Fig. 4c).
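
    The statement that protein trends largely follow transcript trends can be made explicit by comparing the direction of change of the two measurements between consecutive matched stages (here 1, 25, and 35 DAI). The sketch below is an illustration only: it compares directions of mean values without any significance test, and the added Pearson correlation is descriptive at best with only three stages. Function names and the input layout are hypothetical.

```python
import math


def step_directions(values):
    """+1 for a rise, -1 for a fall, 0 for no change between consecutive stages."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]


def trend_agreement(mrna, protein):
    """For each interval between consecutive stages, True if mRNA and protein move the same way."""
    return [m == p for m, p in zip(step_directions(mrna), step_directions(protein))]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A False for the first interval would correspond to the kind of opposite trend noted above for CePIP2;1 between the two early stages.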

    WoLF PSORT predicted CePIP1;1, -2;1, and -2;8 to localize to the cell membrane. To confirm this prediction, the subcellular localization vectors pNC-Cam1304-CePIP1;1, pNC-Cam1304-CePIP2;1, and pNC-Cam1304-CePIP2;8 were constructed. When transiently overexpressed in tobacco (N. benthamiana) leaves, the green fluorescence signals of all three constructs were confined to the cell membrane and closely coincided with the red fluorescence of the plasma membrane marker HbPIP2;3-RFP (Fig. 5).
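
    WoLF PSORT combines several sequence features to predict localization. A much simpler, independent sanity check for membrane proteins such as PIPs, which as aquaporins contain six transmembrane helices, is a hydropathy scan. The sketch below is a swapped-in technique, not the method used in this study: it averages Kyte-Doolittle hydropathy values over a 19-residue window and reports stretches whose average exceeds 1.6, the window and threshold Kyte and Doolittle suggested for spotting membrane-spanning segments. It is a rough screen, not a topology predictor, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return (centre_index, mean hydropathy) for every full window along seq.

    Non-standard residues (e.g. X) are scored 0.0, an assumption of this sketch.
    """
    seq = seq.upper()
    half = window // 2
    return [
        (i + half, sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window)
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge runs of consecutive window centres whose mean hydropathy exceeds threshold.

    Returns a list of (first_centre, last_centre) index pairs, 0-based.
    """
    segments, start, prev = [], None, None
    for centre, score in hydropathy_profile(seq, window):
        if score > threshold:
            if start is None:
                start = centre
        elif start is not None:
            segments.append((start, prev))
            start = None
        prev = centre
    if start is not None:
        segments.append((start, prev))
    return segments
```

Applied to a full-length CePIP protein sequence, a result of roughly six segments would be consistent with the expected aquaporin fold; it does not replace the experimental localization shown in Fig. 5.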

    Figure 5.  (a) Schematic diagram of overexpressing constructs, (b) subcellular localization analysis of CePIP1;1, -2;1, and -2;8 in N. benthamiana leaves. (35S: cauliflower mosaic virus 35S RNA promoter; Ce: C. esculentus; EGFP: enhanced green fluorescent protein; kb: kilobase; NOS: terminator of the nopaline synthase gene; RFP: red fluorescent protein; PIP: plasma membrane intrinsic protein).

    Water balance is particularly important for cell metabolism and enlargement, plant growth and development, and stress responses[2,19]. As their name suggests, AQPs have attracted considerable interest for their high water permeability, and plasma membrane-localized PIPs have been shown to play key roles in transmembrane water transport between cells[1,18]. The first AQP was discovered in human erythrocytes and named CHIP28 or AQP1, and its plant homolog was first characterized in Arabidopsis, where it is known as RD28, PIP2c, or AtPIP2;3[3,7,53]. To date, genome-wide identification of PIP genes has been reported in many plant species, including the two model plants Arabidopsis and rice[10,11,13–17,54–56]. By contrast, little information is available on Cyperaceae, the third-largest family within the monocot clade, with more than 5,600 species[57].

    Given the crucial roles of water balance in tuber development and crop production, this study used tigernut, a representative Cyperaceae plant that accumulates high amounts of oil in its underground tubers[28,30,32], to study PIP genes. Fourteen PIP genes, representing two phylogenetic groups (i.e., PIP1 and PIP2) or 12 orthogroups (i.e., PIP1A, PIP1B, PIP1C, PIP2A, PIP2B, PIP2C, PIP2D, PIP2E, PIP2F, PIP2G, PIP2H, and PIP2I), were identified in the tigernut genome. Although this family size is comparable to or smaller than the 13–21 members present in Arabidopsis, cassava (Manihot esculenta), rubber tree (Hevea brasiliensis), poplar (Populus trichocarpa), C. cristatella, R. breviuscula, banana (Musa acuminata), maize (Zea mays), sorghum (Sorghum bicolor), barley (Hordeum vulgare), and switchgrass (Panicum virgatum), it is larger than the four to 12 found in eelgrass (Zostera marina), Brachypodium distachyon, foxtail millet (Setaria italica), J. effusus, Aquilegia coerulea, papaya (Carica papaya), castor bean (Ricinus communis), and physic nut (Jatropha curcas) (Supplemental Table S4). Among them, A. coerulea represents a basal eudicot that did not experience the γ WGD shared by all core eudicots[50], whereas eelgrass is an early-diverging aquatic monocot that did not experience the τ WGD shared by all core monocots[56]. Interestingly, although both species possess two PIP1s and two PIP2s, they exhibit complex orthologous relationships of 1:1, 2:2, 1:0, and 0:1 (Supplemental Table S5). Whereas AcPIP1;1/AcPIP1;2/ZmPIP1;1/ZmPIP1;2 and ZmPIP2;1/AcPIP2;1 belong to PIP1A and PIP2A identified in this study, AcPIP2;2 and ZmPIP2;2 belong to PIP2H and PIP2I, respectively (Supplemental Table S5), implying that the last common ancestor of monocots and eudicots possessed only one PIP1 and two PIP2s, followed by clade-specific expansion. A good example of such expansion is the generation of AtPIP1;1 and -2;6 from AtPIP1;5 and -2;1, respectively, via the γ WGD[17].
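
    Orthologous relationships such as 1:1, 2:2, 1:0, and 0:1 follow directly from counting each species' members in each orthogroup. A minimal sketch, assuming orthogroup assignments are available as (gene, species, orthogroup) records; the record layout, gene identifiers, and function name are hypothetical.

```python
from collections import Counter


def copy_number_ratios(assignments, species_a, species_b):
    """Return orthogroup -> 'n:m', the member counts of species_a and species_b.

    assignments: iterable of (gene_id, species, orthogroup) tuples.
    Orthogroups lacking one species get a 0 on that side (e.g. '1:0').
    """
    counts = {}
    for _gene_id, species, orthogroup in assignments:
        counts.setdefault(orthogroup, Counter())[species] += 1
    return {
        og: f"{c[species_a]}:{c[species_b]}"  # Counter returns 0 for absent species
        for og, c in sorted(counts.items())
    }
```

Orthogroups reported as 0:0 contain neither of the two species being compared and can simply be filtered out.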

    In tigernut, expansion of the PIP subfamily was attributable to WGD (2), transposed (2), tandem (5), and dispersed (3) duplications. Notably, two transposed repeat pairs (i.e., CePIP1;1/-1;3 and CePIP2;1/-2;8) are shared with rice, implying an early origin, sometime after the split from the eudicot clade but before the Cyperaceae-Poaceae divergence. By contrast, two WGD repeat pairs (i.e., CePIP1;1/-1;2 and CePIP2;2/-2;4) are shared by C. cristatella, R. breviuscula, and J. effusus but not by rice or Arabidopsis, implying that they derived from WGDs that occurred after the Cyperaceae-Poaceae split but before the Cyperaceae-Juncaceae divergence. One candidate is the WGD described in C. littledalei[58], although its exact timing remains to be determined. Interestingly, compared with Arabidopsis (1) and rice (2), tandem/proximal duplications played a more important role in the expansion of PIP genes in tigernut (5) as well as in the other Cyperaceae species tested (5–6), and these duplications appear to be Cyperaceae-specific or even species-specific. These tandem repeats may play a role in the adaptive evolution of Cyperaceae species, as described in many plant species[14,41]. According to comparative genomics analyses, tandem duplicates experienced stronger selective pressure than genes formed by other modes (WGD, transposed duplication, and dispersed duplication) and evolved toward biased functional roles in plant self-defense[41].
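
    Assigning duplicate pairs to modes is usually done with synteny-aware tools, and the section does not name the tool used. The sketch below only illustrates the kind of priority rules involved, under hypothetical assumptions: WGD and transposed pairs are supplied from a separate synteny analysis, genes are described by (chromosome, gene rank) positions, adjacent genes count as tandem, genes up to 10 positions apart on the same chromosome count as proximal, and everything else is dispersed. The 10-gene cutoff, the priority order, and all names are assumptions.

```python
def classify_duplicate_pair(g1, g2, positions, wgd_pairs=frozenset(),
                            transposed_pairs=frozenset(), proximal_max=10):
    """Assign one duplicate gene pair to a duplication mode by priority.

    positions: gene -> (chromosome, rank of the gene along that chromosome).
    wgd_pairs / transposed_pairs: sets of frozenset({gene, gene}) from a
    separate synteny analysis (hypothetical inputs for this sketch).
    Priority: WGD > tandem > proximal > transposed > dispersed.
    """
    pair = frozenset((g1, g2))
    if pair in wgd_pairs:
        return "WGD"
    (chrom1, rank1), (chrom2, rank2) = positions[g1], positions[g2]
    if chrom1 == chrom2:
        gap = abs(rank1 - rank2)
        if gap == 1:
            return "tandem"
        if gap <= proximal_max:
            return "proximal"
    if pair in transposed_pairs:
        return "transposed"
    return "dispersed"
```

Because the text reports tandem and proximal duplicates together ("tandem/proximal"), the two labels returned here would be merged when tallying.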

    As observed in most species such as Arabidopsis[10,14–17], PIP genes in all the Cyperaceae and Juncaceae species examined in this study, i.e., tigernut, C. cristatella, R. breviuscula, and J. effusus, contain three introns at conserved positions. By contrast, PIP genes with zero to three introns are found not only in rice but also in other Poaceae species such as maize, sorghum, foxtail millet, switchgrass, B. distachyon, and barley[54,55], implying lineage- or species-specific evolution.
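
    Intron position conservation is typically judged by placing each intron on a protein alignment together with its phase (0, 1, or 2, i.e., whether it falls between codons or after the first or second base of a codon). A minimal sketch, assuming CDS exon lengths listed in transcript order and a gapped protein alignment row for the same gene; the function names are hypothetical.

```python
def intron_sites(cds_exon_lengths):
    """Return (cds_offset, phase) for each intron.

    cds_offset = coding nucleotides upstream of the intron; phase = cds_offset % 3
    (0: between codons, 1 or 2: inside a codon).
    """
    sites, offset = [], 0
    for length in cds_exon_lengths[:-1]:  # n coding exons -> n - 1 introns
        offset += length
        sites.append((offset, offset % 3))
    return sites


def intron_alignment_columns(aligned_protein, cds_exon_lengths):
    """Map each intron to (alignment column of residue cds_offset // 3, phase).

    For phase 0 this residue is the first one after the intron; otherwise it is
    the residue whose codon the intron interrupts. Gaps ('-') are skipped.
    """
    residue_to_column = [col for col, aa in enumerate(aligned_protein) if aa != "-"]
    return [(residue_to_column[offset // 3], phase)
            for offset, phase in intron_sites(cds_exon_lengths)]
```

Two genes share a conserved intron when they have the same (column, phase) pair, so three shared pairs across all members would correspond to the conserved three-intron structure described above.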

    Despite the extensive expansion of PIP genes (PIP2s in particular) in tigernut, even after its split from R. breviuscula, CePIP1;1, -2;1, and -2;8 represent the three dominant members in most tissues examined in this study, i.e., young leaf, mature leaf, sheath, rhizome, shoot apex, and tuber, although the situation in root is more complex. CePIP1;1 was characterized as a transposed repeat of CePIP1;3, the most highly expressed member in root. Moreover, its recent WGD repeat CePIP1;2 was expressed at low levels in most tissues tested, implying their divergence. The rice ortholog of CePIP1;1 is OsPIP1;3 (RWC-3), which is preferentially expressed in roots, stems, and leaves, in contrast to the constitutive expression of two recent WGD repeats, OsPIP1;1 (OsPIP1a) and -1;2[59–61]. Injecting OsPIP1;3 cRNA into Xenopus oocytes increased the osmotic water permeability 2–3-fold[60], although this activity is considerably lower than that of PIP2s such as OsPIP2;2 and -2;2[61–63]. Moreover, OsPIP1;3 plays a role in drought avoidance in upland rice, and its overexpression in lowland rice increased root osmotic hydraulic conductivity, leaf water potential, and relative cumulative transpiration at the end of a 10 h PEG treatment[64]. CePIP2;8 was characterized as a transposed repeat of CePIP2;1. Since their orthologs are present in both rice and Arabidopsis (Supplemental Table S3), the duplication more likely occurred before the monocot-eudicot split. Interestingly, their rice orthologs, i.e., OsPIP2;1 (OsPIP2a) and -2;6, respectively, are also constitutively expressed[61], implying conserved evolution with similar functions. When heterologously expressed in yeast, OsPIP2;1 exhibited high water transport activity[62,64–66]. Moreover, root hydraulic conductivity decreased approximately fourfold in OsPIP2;1 RNAi knock-down rice plants[64]. Although the water transport activity of OsPIP2;6 has not been tested, it has been shown to be an H2O2 transporter involved in resistance to rice blast[61]. Further work, especially transgenic studies, is needed to clarify the functions of these key CePIP genes.
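
    The 2–3-fold increase in osmotic water permeability cited above for OsPIP1;3 refers to the coefficient P_f obtained from the Xenopus oocyte swelling assay. For reference, P_f is commonly calculated from the initial rate of relative volume increase after transfer to a hypo-osmotic medium, as below; the notation is the standard one for this assay and is not taken from the cited study, whose exact protocol is not described here.

```latex
P_f = \frac{V_0 \,\dfrac{d(V/V_0)}{dt}}{S \, V_w \,\left(\mathrm{osm}_{\mathrm{in}} - \mathrm{osm}_{\mathrm{out}}\right)},
\qquad
\text{fold increase} = \frac{P_f^{\,\mathrm{cRNA}}}{P_f^{\,\mathrm{control}}}
```

Here V_0 and S are the initial oocyte volume and surface area, V_w = 18 cm³ mol⁻¹ is the molar volume of water, and osm_in − osm_out is the osmotic gradient across the membrane. A 2–3-fold increase therefore means that P_f of cRNA-injected oocytes was two to three times that of control oocytes (typically water-injected).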

    The leaf is a photosynthetic organ that regulates water loss through transpiration. In tigernut, PIP transcripts in leaves were mainly contributed by CePIP1;1, -2;1, and -2;8, implying key roles for these genes. During leaf development, CePIP1;1 and -2;8 transcripts decreased gradually across the three stages examined in this study (i.e., young, mature, and senescing), whereas CePIP2;1 peaked in mature leaves. Their high abundance in young leaves may be associated with cell elongation and enlargement at this stage, whereas the upregulation of CePIP2;1 in mature leaves may point to a role in photosynthesis[67]. Many CO2-permeable PIPs have been identified to date, e.g., AtPIP2;1, HvPIP2;1, HvPIP2;2, HvPIP2;3, HvPIP2;5, and SiPIP2;7[68–70]. Moreover, in mature leaves, CePIP1;1, -2;1, and -2;8 showed an apparent diurnal expression pattern, with higher expression during the day and peaks usually at noon, which reflects transpiration and the fact that PIP genes are usually induced by light[11,71–73]. In rice, OsPIP2;4 and -2;5 also showed a clear diurnal fluctuation in roots, peaking 3 h after the onset of light and dropping to a minimum 3 h after the onset of darkness[11]. Notably, further studies showed that the transient, dramatic induction of OsPIP2;5 around 2 h after the onset of light was triggered by transpirational demand rather than by the circadian rhythm[74].
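
    Statements such as "PIP transcripts in leaves were mainly contributed by CePIP1;1, -2;1, and -2;8" amount to ranking each member's share of the family's summed FPKM in a tissue. A minimal sketch; the 80% cumulative-share cutoff used to call members "dominant" is a hypothetical choice, not a criterion stated in this study, and the function names are likewise hypothetical.

```python
def expression_shares(fpkm_by_gene):
    """Return gene -> fraction of the family's total FPKM in one tissue or stage."""
    total = sum(fpkm_by_gene.values())
    if total == 0:
        return {gene: 0.0 for gene in fpkm_by_gene}
    return {gene: value / total for gene, value in fpkm_by_gene.items()}


def dominant_members(fpkm_by_gene, cumulative_cutoff=0.8):
    """Smallest set of top-ranked genes whose summed shares reach cumulative_cutoff."""
    if sum(fpkm_by_gene.values()) == 0:
        return []  # nothing is expressed, so no member dominates
    ranked = sorted(expression_shares(fpkm_by_gene).items(),
                    key=lambda item: item[1], reverse=True)
    chosen, running = [], 0.0
    for gene, share in ranked:
        chosen.append(gene)
        running += share
        if running >= cumulative_cutoff:
            break
    return chosen
```

Run per tissue, this makes the "dominant member" call reproducible and exposes cases such as root, where the ranking differs from other tissues.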

    As an oil-bearing tuber crop, tigernut is cultivated mainly for its underground tubers, whose development is highly dependent on water availability[32,75]. According to previous studies, the moisture content of immature tigernut tubers remains above 80.0%, followed by a seed-like dehydration process during maturation in which the water content drops below 50%[28,32]. Thus, water balance in developing tubers must be tightly regulated. As in leaves, the majority of PIP transcripts in tubers were contributed by CePIP1;1, -2;1, and -2;8, which was further confirmed at the protein level. Consistent with the trend of water content during tuber development, the mRNA and protein abundances of CePIP1;1, -2;1, and -2;8 in initiating and swelling tubers were considerably higher than those at the mature stage. The high abundances of CePIP1;1, -2;1, and -2;8 at the initiation stage reflect rapid cell division and elongation, whereas the upregulation of CePIP1;1 and -2;1 at the swelling stage is consistent with cell enlargement and active physiological metabolism such as rapid oil accumulation[28,30]. At the mature stage, the downregulation of PIP transcripts and proteins coincided with a significant drop in moisture content and was accompanied by significant accumulation of late embryogenesis-abundant proteins[23,32]. This situation is highly distinct from that in other tuber plants such as potato (Solanum tuberosum), which may contribute to the difference in desiccation resistance between the two species[32,76]. Notably, in one study, CePIP2;1 was not detected in any of the four tuber stages tested, i.e., freshly harvested, dried, rehydrated for 48 h, and sprouted tubers[23]. By contrast, in this study it was quantified at all three stages of tuber development examined, i.e., 1, 25, and 35 DAI (35 DAI corresponding to freshly harvested tubers), which represent initiation, swelling, and maturation. One possible reason is that the abundance of CePIP2;1 in mature tubers is too low to be quantified by nanoLC-MS/MS, which is less sensitive than the 4D-PRM approach used in this study[30,46]. Indeed, nanoLC-MS/MS-based proteomic analysis of 30 samples representing six tissues/stages identified only 2,257 distinct protein groups[23].
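
    The moisture thresholds quoted above (above 80.0% in immature tubers, below 50% after maturation) are percentages of water in the tissue. The section does not restate the formula; assuming the usual wet-basis definition (water mass divided by fresh mass), the calculation can be sketched as follows, with a hypothetical function name.

```python
def moisture_content_wet_basis(fresh_mass_g, dry_mass_g):
    """Percent water on a fresh-weight (wet) basis: 100 * (fresh - dry) / fresh."""
    if fresh_mass_g <= 0 or not 0 <= dry_mass_g <= fresh_mass_g:
        raise ValueError("expected fresh_mass_g > 0 and 0 <= dry_mass_g <= fresh_mass_g")
    return 100.0 * (fresh_mass_g - dry_mass_g) / fresh_mass_g
```

On a dry-weight basis (water mass divided by dry mass) the same tuber would give a different number, for example 80% on a wet basis corresponds to 400% on a dry basis, so the basis matters when comparing with other studies.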

    Taken together, our results suggest key roles for CePIP1;1, -2;1, and -2;8 in tuber water balance; however, the underlying mechanisms, e.g., posttranslational modifications, protein interaction patterns, and transcriptional regulators, require further study.

    To our knowledge, this is the first genome-wide characterization of PIP genes in tigernut, a representative Cyperaceae plant with oil-bearing tubers. The 14 CePIP genes, representing two phylogenetic groups or 12 orthogroups, outnumber those in the two model plants rice and Arabidopsis, and their expansion was mainly attributable to WGD and transposed/tandem duplications, some of which are lineage- or even species-specific. Among these genes, CePIP1;1, -2;1, and -2;8 have evolved into three dominant members that are constitutively expressed in most tissues, including leaf and tuber. In leaves, transcription of these three dominant members is subject to developmental and diurnal regulation, whereas in tubers, their mRNA and protein abundances are positively correlated with moisture content during tuber development. Moreover, subcellular localization analysis confirmed their plasma membrane localization. These findings not only provide valuable information for further uncovering the mechanism of tuber water balance but also lay a solid foundation for the genetic improvement of tigernut by regulating these key PIP members.

    The authors confirm contribution to the paper as follows: study conception and design, supervision: Zou Z; analysis and interpretation of results: Zou Z, Zheng Y, Xiao Y, Liu H, Huang J, Zhao Y; draft manuscript preparation: Zou Z, Zhao Y. All authors reviewed the results and approved the final version of the manuscript.

    All relevant data are available within the published article.

    This work was supported by the Hainan Province Science and Technology Special Fund (ZDYF2024XDNY171 and ZDYF2024XDNY156), China; the National Natural Science Foundation of China (32460342, 31971688 and 31700580), China; the Project of Sanya Yazhou Bay Science and Technology City (SCKJ-JYRC-2022-66), China. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    The authors declare that they have no conflict of interest.

    Supplemental Table S1 GenBank accession numbers of the 8 species used in the comparative analysis.
    Supplemental Table S2 GenBank accession numbers of the 26 species used in the phylogenetic analysis.
    Supplemental Table S3 Genes contained in the chloroplast genome sequences of XG, CKX and HZP.
    Supplemental Table S4 Scattered repetitive sequences in CKX, XG and HZP.
    Supplemental Table S5 Features of SSRs in HZP, XG and CKX.
    Supplemental Table S6 Nucleotide diversity (Pi) values in XG, CKX and HZP.
    Supplemental Table S7 Features of indels and SNPs in XG, CKX and HZP.
    Supplemental Table S8 PCR primers used for amplification of the candidate barcode regions.
    Supplemental Fig. S1 Phylogenetic trees of 27 Camellia species reconstructed from protein-coding genes using (A) maximum likelihood (ML) and (B) maximum parsimony (MP) methods.
    Supplemental Fig. S2 Phylogenetic trees of 27 Camellia species reconstructed from whole chloroplast genome sequences using (A) ML and (B) MP methods.
    [1]

    Zhu M, Shi T, Chen Y, Luo S, Leng T, et al. 2019. Prediction of fatty acid composition in camellia oil by 1H NMR combined with PLS regression. Food Chemistry 279:339−46

    doi: 10.1016/j.foodchem.2018.12.025

    [2]

    Liu L, Feng S, Chen T, Zhou L, Yuan M, et al. 2021. Quality assessment of Camellia oleifera oil cultivated in Southwest China. Separations 8(9):144

    doi: 10.3390/separations8090144

    [3]

    Zhang L, Wang L. 2021. Prospect and development status of oil-tea Camellia industry in China. China Oils Fats 46:6−9+27

    doi: 10.19902/j.cnki.zgyz.1003-7969.2021.06.002

    [4]

    Yu J, Yan H, Wu Y, Wang Y, Xia P. 2022. Quality evaluation of the oil of Camellia spp. Foods 11:2221

    doi: 10.3390/foods11152221

    [5]

    Chen Y. 2008. Oil tea camellia superior germplasm resources. China Forestry Publishing House: Beijing, China

    [6]

    Wang X, Huang L, Chen L, Yang W, Li Y, Ma Z. 2010. The investigation to the variety resources of oil tea plant in Wuzhishan of Hainan. Journal of Hunan Agricultural University (Natural Sciences) 36:1−4

    doi: 10.3724/SP.J.1238.2010.00001

    [7]

    Li S, Liu SL, Pei SY, Ning MM, Tang SQ. 2020. Genetic diversity and population structure of Camellia huana (Theaceae), a limestone species with narrow geographic range, based on chloroplast DNA sequence and microsatellite markers. Plant Diversity 42:343−50

    doi: 10.1016/j.pld.2020.06.003

    [8]

    Shi SH, Tang SQ, Cheng YQ, Qu LH, Hung-ta C. 1998. Phylogenetic relationships among eleven yellow-flowered camellia species based on random amplified polymorphic DNA. Journal of Systematics and Evolution 36:317

    [9]

    Vijayan K, Zhang WJ, Tsou CH. 2009. Molecular taxonomy of Camellia (Theaceae) inferred from nrITS sequences. American Journal of Botany 96:1348−60

    doi: 10.3732/ajb.0800205

    [10]

    Yang H, Wei CL, Liu HW, Wu JL, Li ZG, et al. 2016. Genetic divergence between Camellia sinensis and its wild relatives revealed via genome-wide SNPs from RAD sequencing. PLoS One 11:e0151424

    doi: 10.1371/journal.pone.0151424

    [11]

    Zhao DW, Yang JB, Yang SX, Kato K, Luo JP. 2014. Genetic diversity and domestication origin of tea plant Camellia taliensis (Theaceae) as revealed by microsatellite markers. BMC Plant Biology 14:14

    doi: 10.1186/1471-2229-14-14

    [12]

    Qin S, Rong J, Zhang W, Chen J. 2018. Cultivation history of Camellia oleifera and genetic resources in the Yangtze River Basin. Biodiversity Science 26:384−95

    doi: 10.17520/biods.2017254

    [13]

    Chang H, Ren S. 1998. Flora Reipublicae Popularis Sinicae, Tomus 49 (3), Theaceae (1): Theoideae. Beijing: Science Press.

    [14]

    Ming TL. 2000. Monograph of the Genus Camellia. Kunming: Yunnan Science and Technology Press.

    [15]

    Ming TL, Bartholomew B. 2007. Camellia. In Flora of China, eds. Wu CY, Raven PH, Hong DY. vol. 12. Beijing & St. Louis: Science Press & Missouri Botanical Garden Press. pp. 367−412.

    [16]

    Yao X, Huang Y. 2013. The resource and genetic diversity of Camellia meiocarpa Hu. Beijing, China: Science Press.

    [17]

    Fang Z, Li G, Gu Y, Wen C, Ye H, et al. 2022. Flavour analysis of different varieties of camellia seed oil and the effect of the refining process on flavour substances. LWT 170:114040

    doi: 10.1016/j.lwt.2022.114040

    [18]

    Jheng CF, Chen TC, Lin JY, Chen TC, Wu WL, et al. 2012. The comparative chloroplast genomic analysis of photosynthetic orchids and developing DNA markers to distinguish Phalaenopsis orchids. Plant Science 190:62−73

    doi: 10.1016/j.plantsci.2012.04.001

    [19]

    Li E, Liu K, Deng R, Gao Y, Liu X, et al. 2023. Insights into the phylogeny and chloroplast genome evolution of Eriocaulon (Eriocaulaceae). BMC Plant Biology 23:32

    doi: 10.1186/s12870-023-04034-z

    [20]

    Jiang D, Cai X, Gong M, Xia M, Xing H, et al. 2023. Complete chloroplast genomes provide insights into evolution and phylogeny of Zingiber (Zingiberaceae). BMC Genomics 24:30

    doi: 10.1186/s12864-023-09115-9

    [21]

    Glass SE, McCourt RM, Gottschalk SD, Lewis LA, Karol KG. 2023. Chloroplast genome evolution and phylogeny of the early-diverging charophycean green algae with a focus on the Klebsormidiophyceae and Streptofilum. Journal of Phycology 59:1133−46

    doi: 10.1111/jpy.13359

    [22]

    Wu B, Zhu J, Ma X, Jia J, Luo D, et al. 2023. Comparative analysis of switchgrass chloroplast genomes provides insights into identification, phylogenetic relationships and evolution of different ecotypes. Industrial Crops and Products 205:117570

    doi: 10.1016/j.indcrop.2023.117570

    [23]

    Cao Z, Yang L, Xin Y, Xu W, Li Q, et al. 2023. Comparative and phylogenetic analysis of complete chloroplast genomes from seven Neocinnamomum taxa (Lauraceae). Frontiers in Plant Science 14:1205051

    doi: 10.3389/fpls.2023.1205051

    [24]

    Chen J, Wang F, Zhao Z, Li M, Liu Z, et al. 2023. Complete chloroplast genomes and comparative analyses of three Paraphalaenopsis (Aeridinae, Orchidaceae) species. International Journal of Molecular Sciences 24:11167

    doi: 10.3390/ijms241311167

    [25]

    Xu XM, Liu DH, Zhu SX, Wang ZL, Wei Z, et al. 2023. Phylogeny of Trigonotis in China—with a special reference to its nutlet morphology and plastid genome. Plant Diversity 45:409−21

    doi: 10.1016/j.pld.2023.03.004

    [26]

    Liang H, Zhang Y, Deng J, Gao G, Ding C, et al. 2020. The complete chloroplast genome sequences of 14 Curcuma species: insights into genome evolution and phylogenetic relationships within zingiberales. Frontiers in Genetics 11:802

    doi: 10.3389/fgene.2020.00802

    [27]

    Chen Z, Liu Q, Xiao Y, Zhou G, Yu P, et al. 2023. Complete chloroplast genome sequence of Camellia sinensis: genome structure, adaptive evolution, and phylogenetic relationships. Journal of Applied Genetics 64:419−29

    doi: 10.1007/s13353-023-00767-7

    [28]

    Qiao D, Yang C, Guo Y. 2023. The complete chloroplast genome sequence of Camellia sinensis var sinensis cultivar 'FuDingDaBaiCha'. Mitochondrial DNA Part B 8:100−4

    doi: 10.1080/23802359.2022.2161327

    [29]

    Ran Z, Li Z, Xiao X, An M, Yan C. 2024. Complete chloroplast genomes of 13 species of sect. Tuberculata Chang (Camellia L.): Genomic features, comparative analysis, and phylogenetic relationships. BMC Genomics 25:108

    doi: 10.1186/s12864-024-09982-w

    [30]

    Luo H, Liao B, Li Y, Huang R, Zhang K, et al. 2023. Characterization of the complete chloroplast genome sequences and phylogenetic relationships of four oil-seed Camellia spp. and related taxa. bioRxiv In Press:2023.10.03.560681

    doi: 10.1101/2023.10.03.560681

    [31]

    Murray MG, Thompson WF. 1980. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Research 8:4321−26

    doi: 10.1093/nar/8.19.4321

    [32]

    Patel RK, Jain M. 2012. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7:e30619

    doi: 10.1371/journal.pone.0030619

    [33]

    Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, et al. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19:455−77

    doi: 10.1089/cmb.2012.0021

    [34]

    Shi L, Chen H, Jiang M, Wang L, Wu X, et al. 2019. CPGAVAS2, an integrated plastome sequence annotator and analyzer. Nucleic Acids Research 47:W65−W73

    doi: 10.1093/nar/gkz345

    [35]

    Librado P, Rozas J. 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451−52

    doi: 10.1093/bioinformatics/btp187

    [36]

    Katoh K, Misawa K, Kuma KI, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30:3059−66

    doi: 10.1093/nar/gkf436

    [37]

    Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, et al. 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Research 29:4633−42

    doi: 10.1093/nar/29.22.4633

    [38]

    Beier S, Thiel T, Münch T, Scholz U, Mascher M. 2017. MISA-web: a web server for microsatellite prediction. Bioinformatics 33:2583−85

    doi: 10.1093/bioinformatics/btx198

    [39]

    Katoh K, Toh H. 2008. Recent developments in the MAFFT multiple sequence alignment program. Briefings in Bioinformatics 9:286−98

    doi: 10.1093/bib/bbn013

    [40]

    Kalyaanamoorthy S, Minh BQ, Wong TKF, Von Haeseler A, Jermiin LS. 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods 14:587−89

    doi: 10.1038/nmeth.4285

    [41]

    Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312−13

    doi: 10.1093/bioinformatics/btu033

    [42]

    Kumar S, Stecher G, Tamura K. 2016. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Molecular Biology and Evolution 33:1870−74

    doi: 10.1093/molbev/msw054

    [43]

    Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, et al. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology 61:539−42

    doi: 10.1093/sysbio/sys029

    [44]

    Wei SJ, Liufu YQ, Zheng HW, Chen HL, Lai YC, et al. 2023. Using phylogenomics to untangle the taxonomic incongruence of yellow-flowered Camellia species (Theaceae) in China. Journal of Systematics and Evolution 61:748−63

    doi: 10.1111/jse.12915

    [45]

    Wang Y, Huang J, Xie N, Zhang D, Tong W, et al. 2023. The complete chloroplast genome sequence of Camellia atrothea (Ericales: Theaceae). Mitochondrial DNA Part B 8:536−40

    doi: 10.1080/23802359.2023.2204972

    [46]

    Kim KJ, Lee HL. 2005. Widespread occurrence of small inversions in the chloroplast genomes of land plants. Molecules & Cells 19:104−13

    doi: 10.1016/s1016-8478(23)13143-8

    [47]

    Wang RJ, Cheng CL, Chang CC, Wu CL, Su TM, et al. 2008. Dynamics and evolution of the inverted repeat-large single copy junctions in the chloroplast genomes of monocots. BMC Evolutionary Biology 8:36

    doi: 10.1186/1471-2148-8-36

    [48]

    Huang Y. 2013. Population genetic structure and interspecific introgressive hybridization between Camellia meiocarpa and C. oleifera. Chinese Journal of Applied Ecology 24:2345−52

    [49]

    Chen M, Zhang Y, Du Z, Kong X, Zhu X. 2023. Integrative metabolic and transcriptomic profiling in Camellia oleifera and Camellia meiocarpa uncover potential mechanisms that govern triacylglycerol degradation during seed desiccation. Plants 12:2591

    doi: 10.3390/plants12142591

    [50]

    Chen J, Guo Y, Hu X, Zhou K. 2022. Comparison of the chloroplast genome sequences of 13 oil-tea camellia samples and identification of an undetermined oil-tea camellia species from Hainan province. Frontiers in Plant Science 12:798581

    doi: 10.3389/fpls.2021.798581

    [51]

    Lin P, Yin H, Wang K, Gao H, Liu L, Yao X. 2022. Comparative genomic analysis uncovers the chloroplast genome variation and phylogenetic relationships of Camellia species. Biomolecules 12:1474

    doi: 10.3390/biom12101474

    [52]

    Yang JB, Tang M, Li HT, Zhang ZR, Li DZ. 2013. Complete chloroplast genome of the genus Cymbidium: lights into the species identification, phylogenetic implications and population genetic analyses. BMC Evolutionary Biology 13:84

    doi: 10.1186/1471-2148-13-84

    [53]

    Köhler M, Reginato M, Souza-Chies TT, Majure LC. 2020. Insights into chloroplast genome evolution across Opuntioideae (Cactaceae) reveals robust yet sometimes conflicting phylogenetic topologies. Frontiers in Plant Science 11:729

    doi: 10.3389/fpls.2020.00729

    [54]

    Yang JB, Yang SX, Li HT, Yang J, Li DZ. 2013. Comparative chloroplast genomes of Camellia species. PLoS One 8:e73053

    doi: 10.1371/journal.pone.0073053

    [55]

    Liu J. 2010. Collection and conservation on the genetic resources of camellia oleifera for the genetic affinity molecular identification. Master's thesis. Fujian Agriculture and Forestry University, China. www.dissertationtopic.net/doc/343404

    [56]

    Xie Y. 2013. Study on intraspecific type classification, evaluation and genetic relationships of Camellia meiocarpa. PhD thesis. Chinese Academy of Forestry, China. www.dissertationtopic.net/doc/1796220

    [57]

    Zhao DW, Hodkinson TR, Parnell JAN. 2023. Phylogenetics of global Camellia (Theaceae) based on three nuclear regions and its implications for systematics and evolutionary history. Journal of Systematics and Evolution 61:356−68

    doi: 10.1111/jse.12837

    [58]

    Zhuang R. 2008. Oil-Tea Camellia in China. Beijing: Science Press.

    [59]

    Patwardhan A, Ray S, Roy A. 2014. Molecular markers in phylogenetic studies - a review. Journal of Phylogenetics & Evolutionary Biology 2:131

    doi: 10.4172/2329-9002.1000131

    [60]

    Bachmann K. 1994. Molecular markers in plant ecology. New Phytologist 126:403−18

    doi: 10.1111/j.1469-8137.1994.tb04242.x

    [61]

    Jia J. 1996. Molecular germplasm diagnostics and molecular marker-assisted breeding. Scientia Agricultura Sinica 29:1−10

    [62]

    Luo C, Chen D, Cheng X, Liu H, Li Y, et al. 2018. SSR analysis of genetic relationship and classification in chrysanthemum germplasm collection. Horticultural Plant Journal 4:73−82

    doi: 10.1016/j.hpj.2018.01.003

    [63]

    Li B, Lin F, Huang P, Guo W, Zheng Y. 2020. Development of nuclear SSR and chloroplast genome markers in diverse Liriodendron chinense germplasm based on low-coverage whole genome sequencing. Biological Research 53:21

    doi: 10.1186/s40659-020-00289-0

  • Cite this article

    Liang H, Qi H, Wang Y, Sun X, Wang C, et al. 2024. Comparative chloroplast genome analysis of Camellia oleifera and C. meiocarpa: phylogenetic relationships, sequence variation and polymorphic markers. Tropical Plants 3: e023 doi: 10.48130/tp-0024-0022


Abstract: Tea-oil Camellia, a prominent woody oil crop, is a crucial source of edible oil, protein feed, and industrial raw materials. Notably, C. oleifera and C. meiocarpa have higher oil production and larger cultivation areas than other tea-oil Camellia species. However, the taxonomy of and phylogenetic relationship between these species remain elusive, complicating their commercial application. Here, we sequenced and analyzed the complete chloroplast genomes of these two species, compared them with those of related Camellia species, and developed chloroplast DNA markers to distinguish between them. The chloroplast genome of C. oleifera was 157,009 bp (HZP) in length, and those of C. meiocarpa were 156,549 bp (CKX) and 156,512 bp (XG). Comparative analysis revealed more pronounced chloroplast genome differences between HZP and CKX (or XG) than between CKX and XG. Likewise, the number and distribution of repetitive sequences and interspecific variants differed less between CKX and XG than between either of them and HZP. Phylogenetic analysis showed that C. meiocarpa was not closely related to C. oleifera. A total of 56 primer pairs were developed to test polymorphism among the accessions. After PCR and sequencing verification, variants were detected in the target sequences of 17 primer pairs. The chloroplast genome data and the newly developed markers are valuable for understanding phylogenetic relationships and assessing the genetic diversity of tea-oil Camellia germplasm resources.

    • Tea-oil Camellia refers to a group of plants in the genus Camellia (Theaceae) whose fruits have a high oil content and considerable cultivation value[1]. Tea oil is rich in unsaturated fatty acids, which make up around 90% of its fatty acids, a higher proportion than in olive oil[2]. This makes it a premium edible oil with significant health and medicinal benefits. Tea-oil Camellia is also regarded as one of the world's four major woody oil crops, along with Elaeis guineensis, Olea europaea and Cocos nucifera[3]. In China, approximately 30 Camellia species are collectively referred to as tea-oil Camellia[4]. Owing to its strong adaptability, long growth cycle, tolerance of infertile soils, and suitability for mountainous and hilly areas, tea-oil Camellia is a key woody oil crop actively promoted in China[5]. Tea-oil Camellia is currently cultivated on approximately 5.3 million hectares in China, mostly C. oleifera followed by C. meiocarpa, primarily in southern provinces such as Hunan, Jiangxi, Guangxi, Guangdong, Zhejiang and Fujian. In addition, Wang et al. found that C. oleifera and C. meiocarpa are distributed in tropical regions of China (Wuzhishan, Hainan)[6].

      Because of complex nuclear genomes, diverse ploidy levels, rich phenotypic variation, and interspecific hybridization, resolving the phylogeny of tea-oil Camellia is challenging. To clarify these relationships, researchers have applied morphological and molecular classification methods to the phylogenetic analysis of tea-oil Camellia species[7−11]. However, the phylogenetic relationships among tea-oil Camellia species remain controversial, for example, the relationship between C. meiocarpa and C. oleifera. When first described by Xiansu Hu, C. meiocarpa was treated as a separate species[12]. In Chang's taxonomic system, it was considered a variety of C. oleifera and named C. oleifera var. monosperma[13]. In Ming's taxonomic system[14] and the Flora of China[15], however, C. meiocarpa was regarded merely as a cultivated form of C. oleifera rather than a distinct species. It shares many basic characteristics with C. oleifera in its branches, leaves, flowers, and fruits, the main distinction being the smaller size of these organs in C. meiocarpa. Moreover, Yao & Huang used microsatellite markers to compare C. oleifera and C. meiocarpa and found low genetic differentiation between them, suggesting that frequent interspecific hybridization and gene introgression blur the genetic distinction between the two and supporting the view that C. meiocarpa is a variety of C. oleifera[16]. However, most producers and researchers still consider C. meiocarpa to differ significantly from C. oleifera in morphology and oil quality, affirming its status as a distinct species. These controversies complicate the breeding and production of tea-oil Camellia. Moreover, Camellia oil from C. meiocarpa is nutritionally superior to that from C. oleifera, and inferior products are often passed off as C. meiocarpa oil[17]. DNA markers developed through comparative genome analysis offer an effective way to differentiate the two[18].

      The chloroplast genome is highly conserved and, owing to its uniparental (maternal) inheritance, has been used extensively in classification and phylogenetic studies[19−22]. Its lack of recombination and its maternal transmission make it a valuable tool for tracing phylogenetic relationships in groups whose nuclear genomes are complex[23−25]. Compared with short genomic segments, the complete chloroplast genome contains far more genetic information, providing abundant variable loci for phylogenetic and taxonomic studies[26]. Despite this, the chloroplast genome of C. meiocarpa has not been reported, and no comparative chloroplast genomic analysis between C. oleifera and C. meiocarpa has been conducted[27−30].

      In this study, we report the complete chloroplast genome sequences of C. oleifera and C. meiocarpa and compare them with other tea-oil Camellia chloroplast genomes. Our objectives were to: 1) reconstruct the phylogenetic relationship between C. meiocarpa and C. oleifera; and 2) develop molecular markers to test polymorphism between these species. The results are expected to provide a theoretical foundation for variety identification, breeding, and resource utilization.

    • Fresh leaves of C. oleifera (HZP) were collected from Tianyang, Guangxi province (107.073836° E, 24.007963° N, 554 m). For C. meiocarpa, XG was collected from Sanjiang, Guangxi province (109.422086° E, 25.710639° N, 139 m), and CKX was obtained from the germplasm garden of the Guangxi Forestry Research Institute. Leaves were flash-frozen in liquid nitrogen and stored in an ultra-low-temperature freezer at −80 °C until use. Total DNA was extracted using a modified CTAB method[31]. Following the protocol provided by Illumina (San Diego, CA, USA), paired-end (PE) libraries were constructed from sheared, low-molecular-weight DNA fragments. The complete chloroplast genomes were sequenced on the Illumina NovaSeq platform using a PE150 strategy with a 350 bp insert size.

    • Adapter sequences and low-quality reads were removed from the raw reads using NGSQC Toolkit (v2.3.3) to obtain high-quality reads[32]. The chloroplast genomes were assembled using SPAdes v3.14[33] and annotated using cpGAVAS2 with manual correction[34]. The sequencing reads were then mapped to the C. luteoflora reference genome to validate the assemblies.
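The read-filtering step can be illustrated with a minimal Python sketch. It assumes Phred+33-encoded FASTQ qualities and a hypothetical rule that keeps a read only if at least 70% of its bases reach Q20; these thresholds and the function names are illustrative assumptions, not the NGSQC Toolkit settings used in this study, and adapter trimming is not shown.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def passes_quality(quality: str, min_q: int = 20, min_fraction: float = 0.7) -> bool:
    """Keep a read if at least `min_fraction` of its bases have Phred >= `min_q`.

    Both thresholds are illustrative assumptions.
    """
    scores = phred_scores(quality)
    if not scores:
        return False
    high = sum(1 for q in scores if q >= min_q)
    return high / len(scores) >= min_fraction


def filter_pairs(pairs):
    """Keep a read pair only if both mates pass, so R1 and R2 stay in sync.

    Each mate is a (sequence, quality) tuple.
    """
    return [(r1, r2) for r1, r2 in pairs if passes_quality(r1[1]) and passes_quality(r2[1])]
```

Filtering pairs jointly rather than mates independently keeps paired-end files synchronized, which downstream assemblers such as SPAdes expect.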

    • Eight tea-oil Camellia species from GenBank (Supplemental Table S1) were included in the comparative analysis. The mVISTA program (https://genome.lbl.gov/vista/mvista/submit.shtml) was run in Shuffle-LAGAN mode, with C. luteoflora as the reference, to visualize sequence identity across the chloroplast genomes. In addition, we compared IR expansion and contraction events among these accessions by analyzing the junctions between the IR, SSC, and LSC regions with the online tool CPJSdraw (https://github.com/xul962464/CPJSdraw).

      To identify mutational hotspots in HZP, XG and CKX, the chloroplast genomes were first aligned with MAFFT to identify mutations[36], and nucleotide diversity (Pi) was then calculated using DnaSP v5[35].
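Nucleotide diversity (Pi) is the average number of nucleotide differences per site between pairs of aligned sequences. The sketch below computes it for an alignment and along sliding windows. The 600 bp window and 200 bp step are assumed values chosen for illustration (the study does not state its DnaSP window settings), and skipping gapped columns is a simplification rather than DnaSP's exact handling.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Pi: mean number of pairwise nucleotide differences per site.

    Alignment columns containing a gap ('-') are skipped (a simplification).
    """
    columns = [col for col in zip(*seqs) if "-" not in col]
    pairs = list(combinations(range(len(seqs)), 2))
    if not columns or not pairs:
        return 0.0
    diffs = sum(1 for col in columns for i, j in pairs if col[i] != col[j])
    return diffs / (len(pairs) * len(columns))


def sliding_window_pi(seqs, window=600, step=200):
    """Return (window midpoint, Pi) pairs along an alignment of equal-length sequences."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        block = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(block)))
    return results
```

Peaks in the sliding-window output correspond to the mutational hotspots plotted in Fig. 5a.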

    • Forward (F), reverse (R), complement (C), and palindromic (P) repeats in the chloroplast genomes of HZP, XG, and CKX were identified using REPuter[37] with the following settings: (1) a Hamming distance of 3; (2) a minimum repeat size of 30 bp; and (3) a sequence identity of at least 90%. Simple sequence repeat (SSR) loci were identified using MISA[38], with minimum repeat numbers of 10, 6, 5, 5, 5, and 5 for mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide motifs, respectively.
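The four REPuter repeat types differ only in how the second copy relates to the first. The hypothetical helper below checks an equal-length pair of copies against each orientation, allowing up to three mismatches in line with the Hamming distance used here. It is an illustrative sketch of the definitions, not REPuter's algorithm, which searches for such pairs across the whole genome.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def repeat_types(first, second, max_mismatch=3):
    """List the REPuter-style types under which `second` is a copy of `first`."""
    first, second = first.upper(), second.upper()
    candidates = {
        "F": first,                              # forward: same sequence, same direction
        "R": first[::-1],                        # reverse: reversed, not complemented
        "C": first.translate(COMPLEMENT),        # complement: complemented, not reversed
        "P": first.translate(COMPLEMENT)[::-1],  # palindromic: reverse complement
    }
    return [
        kind for kind, seq in candidates.items()
        if len(seq) == len(second) and hamming(seq, second) <= max_mismatch
    ]
```

For example, `repeat_types("AACCGGTTAC", "GTAACCGGTT", 0)` returns `["P"]`, because the second copy is the exact reverse complement of the first.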

    • Phylogenetic analysis was carried out using the complete chloroplast genome sequences of HZP, XG, CKX, and 26 other Camellia species, with one Polyspora species as the outgroup (Supplemental Table S2). Sequences were aligned using MAFFT v7[39]. ModelFinder[40] was used with default settings to select the best-fit model, and maximum likelihood (ML) analysis was conducted in RAxML[41] with 1,000 bootstrap replicates. Maximum parsimony (MP) trees were inferred in MEGA7 with default parameters[42]. The Bayesian inference (BI) tree was inferred in MrBayes v3.2.7 using the Markov chain Monte Carlo (MCMC) method[43], running one million generations and sampling every 100 generations. The first 25% of sampled trees were discarded as burn-in, and a majority-rule consensus tree was constructed from the remaining trees.
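The Bayesian settings translate into a fixed number of retained trees, and the posterior probability (PP) of a clade is the fraction of retained trees that contain it. The sketch below shows both calculations, assuming exactly one sample per 100 generations (ignoring the generation-0 sample) and counting per run, since MrBayes runs two independent analyses by default. Trees are represented as sets of clades, and the toy taxa A−D are hypothetical and unrelated to the results.

```python
from collections import Counter


def retained_samples(ngen=1_000_000, samplefreq=100, burnin_frac=0.25):
    """Return (sampled, discarded, kept) tree counts for one MCMC run.

    Assumes exactly ngen // samplefreq samples (the generation-0 sample is ignored).
    """
    sampled = ngen // samplefreq
    discarded = int(sampled * burnin_frac)
    return sampled, discarded, sampled - discarded


def majority_rule_clades(trees, threshold=0.5):
    """Clades present in more than `threshold` of the post-burn-in trees.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    The returned frequency is the clade's posterior probability.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    n = len(trees)
    return {clade: count / n for clade, count in counts.items() if count / n > threshold}


if __name__ == "__main__":
    print(retained_samples())  # (10000, 2500, 7500)
    ab, abc, bc = frozenset("AB"), frozenset("ABC"), frozenset("BC")
    toy_trees = [{ab, abc}, {ab, abc}, {bc, abc}, {ab}]
    print(majority_rule_clades(toy_trees))
```

In the toy example, clades AB and ABC each appear in three of four trees (PP = 0.75) and are kept, while BC (PP = 0.25) is dropped from the consensus.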

    • Based on SNPs and Indels in the chloroplast genomes, polymorphic markers were designed to distinguish C. oleifera from C. meiocarpa. Each 25 µL PCR reaction contained 12.5 µL 2 × PCR Mix, 1 µL each of forward and reverse primer (10 pM each), 1 µL genomic DNA, and 9.5 µL ddH2O. Thermal cycling consisted of initial denaturation at 94 °C for 4 min; 35 cycles of 94 °C for 30 s, annealing at 50−58 °C (depending on the primer pair) for 30 s, and 72 °C for 30 s; and a final extension at 72 °C for 7 min. PCR products were sequenced for verification. To improve detection efficiency and reduce sequencing costs, products shorter than 800 bp were sequenced from one end (single-read sequencing), and products longer than 800 bp were sequenced from both ends (paired-end sequencing).
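The reaction recipe and the sequencing rule can be written down as a small planning helper. This sketch reads the primer volume as 1 µL of each primer, the reading under which the components sum to the stated 25 µL. The 10% pipetting excess, the choice to add template DNA to each tube separately, and the treatment of an amplicon of exactly 800 bp as paired-end are all assumptions for illustration.

```python
# Per-reaction volumes (µL) from the protocol, reading "1 µL forward and reverse
# primers" as 1 µL of each primer.
REACTION_UL = {
    "2x PCR Mix": 12.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "genomic DNA": 1.0,
    "ddH2O": 9.5,
}


def master_mix(n_reactions, excess=0.1):
    """Scale shared reagents for n reactions plus an assumed 10% excess.

    Template DNA is left out on the assumption that it is added to each tube.
    """
    factor = n_reactions * (1 + excess)
    return {
        reagent: round(vol * factor, 2)
        for reagent, vol in REACTION_UL.items()
        if reagent != "genomic DNA"
    }


def sequencing_strategy(amplicon_bp, cutoff_bp=800):
    """Single-read below 800 bp, paired-end otherwise (exactly 800 bp is an assumption)."""
    return "single-read" if amplicon_bp < cutoff_bp else "paired-end"


print(sum(REACTION_UL.values()))  # 25.0
```

For the target sizes reported for the 56 primer pairs (99 to 1,553 bp), the rule assigns the shortest amplicons to single-read and the longest to paired-end sequencing.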

    • The chloroplast genomes of C. meiocarpa (XG and CKX) and C. oleifera (HZP) had the typical circular quadripartite structure found in related plants (Table 1; Fig. 1)[44,45]. The genome sizes were 156,512 bp for XG and 156,549 bp for CKX, a difference of only 37 bp. Compared with C. oleifera (157,009 bp, HZP), they were 460 bp (CKX) and 497 bp (XG) shorter. Each genome comprises four distinct regions: the LSC (86,263 bp in CKX, 86,224 bp in XG, and 86,637 bp in HZP), the SSC (18,400 bp in CKX, 18,402 bp in XG, and 18,290 bp in HZP) and two IRs (25,943 bp in CKX, 25,943 bp in XG, and 26,041 bp in HZP). The overall GC content was nearly identical across the genomes: 37.32% in CKX, 37.33% in XG and 37.29% in HZP. GC content was unevenly distributed among regions: 35.33% in CKX, 35.34% in XG and 35.30% in HZP for the LSC; 30.58% in CKX, 30.57% in XG and 30.52% in HZP for the SSC; and 43.03% in CKX, 43.03% in XG and 42.99% in HZP for the IR regions (Table 1). These values indicate that the chloroplast genomes of tea-oil Camellia are highly conserved. Each of the three genomes encoded the same set of 133 functional genes, including 87 protein-coding genes, eight rRNA genes and 37 tRNA genes. A total of 18 genes were duplicated (Supplemental Table S3), including four rRNA genes (rrn16, rrn23, rrn4.5, and rrn5), two large-subunit ribosomal protein genes (rpl2 and rpl23), seven tRNA genes (trnA-UGC, trnI-CAU, trnI-GAU, trnL-CAA, trnN-GUU, trnR-ACG, and trnV-GAC), one NADH dehydrogenase subunit gene (ndhB) and three other genes (ycf2, ycf15, and ycf1).
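Because the plastome is quadripartite, each total in Table 1 should equal the LSC plus the SSC plus two copies of the IR. The short check below uses only the Table 1 values to confirm that the totals are internally consistent and to derive each C. meiocarpa genome's length difference from HZP.

```python
# (LSC, SSC, IR) lengths in bp for each accession, from Table 1.
REGIONS = {
    "CKX": (86_263, 18_400, 25_943),
    "XG": (86_224, 18_402, 25_943),
    "HZP": (86_637, 18_290, 26_041),
}


def plastome_length(lsc, ssc, ir):
    """Quadripartite structure: one LSC, one SSC, and two IR copies."""
    return lsc + ssc + 2 * ir


sizes = {name: plastome_length(*parts) for name, parts in REGIONS.items()}
shorter_than_hzp = {name: sizes["HZP"] - sizes[name] for name in ("CKX", "XG")}

print(sizes)             # {'CKX': 156549, 'XG': 156512, 'HZP': 157009}
print(shorter_than_hzp)  # {'CKX': 460, 'XG': 497}
```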

      Table 1.  Features of C. meiocarpa and C. oleifera chloroplast genomes.

      Genome feature | CKX | XG | HZP
      Genome size (bp) | 156,549 | 156,512 | 157,009
      LSC length (bp) | 86,263 | 86,224 | 86,637
      SSC length (bp) | 18,400 | 18,402 | 18,290
      IR length (bp) | 25,943 | 25,943 | 26,041
      Number of genes | 133 | 133 | 133
      Number of protein-coding genes | 87 | 87 | 87
      Number of pseudogenes | 2 | 2 | 2
      Number of tRNA genes | 37 | 37 | 37
      Number of rRNA genes | 8 | 8 | 8
      GC content in LSC (%) | 35.33 | 35.34 | 35.30
      GC content in SSC (%) | 30.58 | 30.57 | 30.52
      GC content in IR (%) | 43.03 | 43.03 | 42.99
      Total GC content (%) | 37.32 | 37.33 | 37.29
      GenBank accession | MZ151356 | MZ151355 | MZ151357

      Figure 1. 

      Chloroplast genome map of C. meiocarpa and C. oleifera.

      Table 2.  Features of repetitive sequences in C. meiocarpa and C. oleifera.

      Feature | HZP | XG | CKX
      Total number of repeats | 49 | 49 | 49
      Forward | 15 | 16 | 17
      Palindrome | 22 | 20 | 20
      Reverse | 9 | 9 | 9
      Complementary | 3 | 4 | 3
      Genes containing repeats | trnS-GCU, trnG-GCC, trnS-UGA, trnfM-CAU, ndhC, trnV-UACᵃ, petD, ycf2, ndhA, ycf1, rpoC2ᵇ, rpoBᶜ, trnA-UGCᵈ
      SSR loci (N) | 52 | 48 | 49
      P1 loci (N) | 51 | 47 | 49
      Pc loci (N) | 1 | 1 | 0
      SSRs in LSC | 40 | 35 | 36
      SSRs in IRA | 2 | 2 | 2
      SSRs in SSC | 8 | 9 | 9
      SSRs in IRB | 2 | 2 | 2
      ᵃ Specific to HZP; ᵇ, ᶜ specific to XG; ᵈ specific to CKX.
    • REPuter detected 49 dispersed repeats in each of HZP, XG, and CKX (Fig. 2a; Supplemental Table S4). Repeats of 15−19 bp were the most prevalent, followed by those of 20−24 bp (Fig. 2a); together, these two classes accounted for 77.55% of repeats in CKX and XG and 75.51% in HZP. The LSC region contained the most long repeats, accounting for 59.18% in each of CKX, XG, and HZP. No repeats of 25−29 bp or 35−39 bp were observed in the LSC region, whereas repeats of 30−34 bp occurred exclusively in the LSC region. The IRa region contained the next largest number of repeats, and the 25−29 bp and 35−39 bp repeats were found only in the IR regions. Four repeat types, forward, palindrome, reverse, and complementary, were identified in CKX, XG, and HZP (Table 2; Supplemental Table S4). Palindromic repeats were the most common, comprising 40.82% of repeats in both CKX and XG and 44.90% in HZP, while complementary repeats were the least frequent. Some repeats were located in genes, including trnS-GCU, trnG-GCC, trnS-UGA, trnfM-CAU, ndhC, trnV-UAC, petD, ycf2, ndhA, ycf1, rpoC2, rpoB, and trnA-UGC.

      Figure 2. 

      Comparison of repetitive sequences in C. meiocarpa and C. oleifera. (a) Number of dispersed repeats by length in different regions. (b) Number and distribution of simple sequence repeats (SSRs). (c) Frequency of A and T SSRs.

      The number and distribution of SSRs are shown in Fig. 2b and Supplemental Table S5. SSRs were defined as mononucleotide repeats with 10 or more units, dinucleotide repeats with six or more units, and tri- to hexanucleotide repeats with five or more units; 49 SSRs were found in each of CKX and XG, and 53 in HZP. Most were of the P1 type, with one C1 type identified in each of HZP and XG. The majority were located in the LSC region, representing 73.47% in CKX and XG and 75.47% in HZP. The SSC region contained fewer, and only four SSRs were located in the IR regions. All SSR loci were mononucleotide repeats composed of A or T (Fig. 2c). In CKX and XG, A and T repeats numbered 23 (46.94%) and 26 (53.06%), respectively, whereas in HZP they numbered 24 (45.28%) and 29 (54.72%). Mononucleotide repeats of 10−12 units were the most common, with 34 in CKX and XG (69.39%) and 39 in HZP (73.58%). These data reveal clear differences in repeat patterns between HZP and the CKX/XG genomes.
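The SSR thresholds above can be applied with a simple left-to-right scan. The sketch below is a simplified, MISA-like detector written for illustration, not MISA itself: it reports perfect repeats only, tries shorter motifs first at each position, skips motifs that are themselves repeats of a shorter unit (such as "AA" or "ATAT"), and does not merge neighbouring SSRs into compound loci as MISA does.

```python
# Minimum number of repeat units per motif length (the MISA thresholds used here).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Return (0-based start, motif, repeat count) for perfect SSRs.

    Simplified: shorter motifs are tried first at each position, and compound
    SSRs (neighbouring SSRs within a set distance) are not merged.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        hit = None
        for k, min_rep in MIN_REPEATS.items():
            motif = seq[i:i + k]
            if len(motif) < k or not is_primitive(motif):
                continue
            reps = 1
            # Count consecutive copies of the motif starting at position i.
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= min_rep:
                hit = (i, motif, reps)
                break
        if hit:
            hits.append(hit)
            i += hit[2] * len(hit[1])  # jump past the reported SSR
        else:
            i += 1
    return hits
```

For example, a run of 12 A's is reported as one mononucleotide SSR, while a run of nine A's falls below the threshold of 10 and is ignored.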

    • Although chloroplast genomes are highly conserved in structure and size, variation in their length is commonly attributed to shifts in the IR/SC junctions caused by the expansion and contraction of these boundary regions[46,47]. We compared the junction regions of the 11 tea-oil Camellia chloroplast genomes (Fig. 3). The arrangement of genes at each junction was consistent across these accessions. The gene rps19 spanned the LSC/IRb junction in all accessions, with 233 bp in the LSC and 46 bp in the IRb. By contrast, the rpl2 gene in the IRb region showed contraction, with the number of bases varying from 100 to 106 bp across accessions. The ycf1 gene straddled the SSC/IRa boundary with variable lengths: 4,541 to 4,653 bp in the SSC and 963 to 1,069 bp in the IRa. The ndhF gene showed contractions of 5 to 68 bp. In addition, the trnN gene in the IRa region was positioned 1,275 bp (CKX and XG) to 1,381 bp (HZP) from the SSC/IRa boundary, whereas trnH lay at the edge of the LSC region, only 1 bp from the IRa/LSC boundary. Overall, differences in IR expansion and contraction were more pronounced between HZP and CKX (or XG) than between CKX and XG, indicating distinct genomic differences among these tea-oil Camellia accessions.
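Boundary values such as the 233 bp of rps19 in the LSC and 46 bp in the IRb come from splitting a gene's coordinates at a region junction. The minimal sketch below shows that arithmetic with hypothetical coordinates. It assumes 1-based inclusive positions with the junction lying between positions `junction` and `junction + 1`, and it ignores strand orientation, which tools such as CPJSdraw also take into account when drawing junction maps.

```python
def split_at_junction(gene_start, gene_end, junction):
    """Bases of a gene on each side of a region junction.

    Coordinates are 1-based and inclusive; the junction lies between positions
    `junction` and `junction + 1`. Returns (bases_before, bases_after).
    """
    before = max(0, min(gene_end, junction) - gene_start + 1)
    after = max(0, gene_end - max(gene_start, junction + 1) + 1)
    return before, after


# Hypothetical 279 bp gene at positions 1,001-1,279 with a junction after 1,233.
print(split_at_junction(1_001, 1_279, 1_233))  # (233, 46)
```

A gene lying entirely on one side returns zero for the other side, which is how a gene "at the edge" of a region, such as trnH, can be distinguished from a gene that straddles the junction.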

      Figure 3. 

      Comparison of the Large single copy (LSC), Inverted repeat (IR), Small single copy (SSC) junction positions among 11 tea-oil Camellia species.

    • To explore sequence variation among chloroplast genomes, percent identity was plotted for the 11 tea-oil Camellia accessions using mVISTA, with C. luteoflora as the reference (Fig. 4). The SSC region was more divergent than the LSC and IR regions, and non-coding regions were more divergent than coding regions. Overall, the alignment revealed high sequence similarity among the accessions, and CKX and XG were more similar to each other than either was to HZP. To further characterize variation between C. meiocarpa (CKX and XG) and C. oleifera (HZP), we calculated nucleotide diversity (Pi) among them. Pi values were low, ranging from 0 to 0.011 with a mean of 0.0006 (Fig. 5a; Supplemental Table S6). The SSC region showed the highest variation (mean Pi 0.00142), followed by the LSC (mean Pi 0.00060), and the IRB the lowest (mean Pi 0.00013). The ycf1 gene had the most mutation sites (72, mean Pi 0.00294), and psbM had the highest mean Pi. Notably, 50 genes showed zero nucleotide diversity (Fig. 5b; Supplemental Table S6). We also detected 210 variants among the three chloroplast genomes, comprising 72 Indels and 138 SNPs (Table 3; Supplemental Table S7). Most Indels were 1 bp long (38.89% of all Indels), followed by 2 bp (16.67%), and only one 9 bp Indel was found. Among SNPs, G/A substitutions were the most frequent (26.09%), followed by C/T (23.19%), with C/G the least common (3.62%). Most variants occurred in intergenic regions (118 sites), and variants were also found in 21 genes, including accD, atpB, atpF, ccsA, clpP, infA, matK, ndhA and ycf1.
In total, 140 variant sites were located in the LSC, 49 in the SSC, 13 in the IRA and eight in the IRB. The gene ycf1 had the highest number of variants (22), while the intergenic regions between trnE-UUC and trnT-GGU and between petN and psbM contained the most variant sites. These findings highlight differences in genome organization and variability between C. oleifera (HZP) and C. meiocarpa (CKX and XG), suggesting distinct evolutionary trajectories.
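The SNP counts in Table 3 can also be grouped into transitions (purine to purine or pyrimidine to pyrimidine, here G/A and C/T) and transversions (all other types). The sketch below recomputes the reported proportions and derives a transition/transversion ratio from the Table 3 counts; the ratio is an illustrative derived summary, not a value reported in the study.

```python
# SNP counts by type among the three genomes, transcribed from Table 3.
SNP_COUNTS = {"G/A": 36, "C/T": 32, "A/C": 27, "G/T": 27, "C/G": 5, "T/A": 11}

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(snp_type):
    """A transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    a, b = snp_type.split("/")
    pair = {a, b}
    return pair <= PURINES or pair <= PYRIMIDINES


transitions = sum(n for t, n in SNP_COUNTS.items() if is_transition(t))
transversions = sum(n for t, n in SNP_COUNTS.items() if not is_transition(t))
total = transitions + transversions
proportions = {t: round(100 * n / total, 2) for t, n in SNP_COUNTS.items()}

print(transitions, transversions, total)       # 68 70 138
print(round(transitions / transversions, 2))   # 0.97
print(proportions["G/A"], proportions["C/G"])  # 26.09 3.62
```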

      Figure 4. 

      Identity plots comparing the chloroplast genomes of 11 Camellia accessions. The vertical scale indicates the percentage of identity, ranging from 50 to 100%. The horizontal axis indicates the coordinates within the chloroplast genome. Genome regions are color coded, including protein-coding, rRNA, tRNA, intron, and conserved non-coding sequences (CNS).

      Figure 5. 

      Nucleotide diversity (Pi) of the C. meiocarpa and C. oleifera chloroplast genomes: (a) sliding-window analysis of the whole genomes; (b) gene regions.

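      Table 3.  Indel and SNP types among three chloroplast genomes.

      Indel length (bp) | 1 | 2 | 3 | 4 | 5 | 6 | 9 | 10−20 | ≥21 | Total
      Number (N) | 28 | 12 | 5 | 2 | 10 | 5 | 1 | 6 | 3 | 72
      Proportion (%) | 38.89 | 16.67 | 6.94 | 2.78 | 13.89 | 6.94 | 1.39 | 8.33 | 4.17
      SNP type | G/A | C/T | A/C | G/T | C/G | T/A | Total
      Number (N) | 36 | 32 | 27 | 27 | 5 | 11 | 138
      Proportion (%) | 26.09 | 23.19 | 19.57 | 19.57 | 3.62 | 7.97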
    • In this study, chloroplast genome data from 27 published Camellia species (Theaceae) were combined to reconstruct phylogenetic trees and infer the relationships among tea-oil Camellia species, with two Polyspora species as outgroups (Fig. 6; Supplemental Fig. S1). In the tree based on protein-coding genes (Fig. 6a), CKX, XG, and HZP formed a cluster within one clade (PP = 1), grouping with C. japonica and C. chekiangoleosa in Group I (PP = 0.74). In the tree based on whole chloroplast genomes (Fig. 6b), CKX and XG clustered in one clade, which then grouped with C. japonica, C. chekiangoleosa and C. polyodonta (PP = 1); Group I and Group II together formed Clade A (PP = 1). Within Group II, C. oleifera (HZP) formed the basal clade, subsequently clustering with C. azalea, C. granthamiana, C. gauchowensis, C. vietnamensis, and C. suaveolens (PP = 1). The phylogenetic relationships within Camellia nevertheless remained complex; for instance, C. crapnelliana was the basal clade of Group I in Fig. 6a (PP = 1), yet appeared as sister to C. gigantocarpa in Fig. 6b (PP = 0.99). Nonetheless, the MP and ML trees (Supplemental Figs S1 & S2) consistently supported the conclusion that C. meiocarpa (CKX and XG) is not closely related to C. oleifera (HZP).

      Figure 6. 

      Phylogenetic tree reconstruction of 27 Camellia species based on (a) protein-coding genes and (b) whole chloroplast genome sequences by the Bayesian method.

    • Based on the above results, 56 primer pairs (Supplemental Table S8) were designed to cover as many polymorphic sites as possible. Their target sequences ranged from 99 to 1,553 bp and covered 128 polymorphic sites, including 89 SNPs and 39 Indels. Each primer pair spanned 1 to 9 polymorphic sites, with ZDJ78 covering the most (nine sites). Seven primer pairs, ZDJ05, ZDJ43, ZDJ64, ZDJ66, ZDJ67, ZDJ68, and ZDJ75, each covered four polymorphic sites. Twenty primer pairs (10 targeting genes and 10 targeting intergenic regions) each covered only one variant site, 10 SNPs and 10 Indels in total. The 56 primer pairs were distributed across 29 genes and 32 intergenic regions, with ycf1 carrying the most markers (five). The amplified regions were verified by Sanger sequencing, which confirmed that 17 primer pairs were effective for detecting polymorphic sites. For example, ZDJ76 detected three polymorphic sites (two SNPs and one Indel), ZDJ01 detected two (one SNP and one Indel), and ZDJ03, ZDJ15, ZDJ51, ZDJ54, ZDJ55, ZDJ59, ZDJ69, ZDJ72, ZDJ77, ZDJ80, ZDJ83 and ZDJ85 each detected one SNP. In addition, ZDJ45, ZDJ60 and ZDJ84 each detected one Indel (Table 4).
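The primer-placement procedure is not described beyond maximizing the number of covered sites. One simple way to do this, sketched below as an assumption rather than the authors' method, is a greedy search that repeatedly picks the span of bounded length (for example, the 1,553 bp upper end of the target sizes) containing the most remaining variant positions. Primer-binding constraints such as melting temperature, GC content, and keeping primers outside the variant span are not modeled.

```python
def densest_window(positions, max_len):
    """Return (first, last, n_sites) for the span of at most max_len bp with most sites."""
    positions = sorted(positions)
    best = (positions[0], positions[0], 1)
    lo = 0
    for hi, pos in enumerate(positions):
        # Shrink from the left until the span fits within max_len bases.
        while pos - positions[lo] + 1 > max_len:
            lo += 1
        n_sites = hi - lo + 1
        if n_sites > best[2]:
            best = (positions[lo], pos, n_sites)
    return best


def greedy_amplicons(positions, max_len, n_amplicons):
    """Greedily choose up to n_amplicons non-overlapping target spans."""
    remaining = sorted(positions)
    chosen = []
    for _ in range(n_amplicons):
        if not remaining:
            break
        first, last, n_sites = densest_window(remaining, max_len)
        chosen.append((first, last, n_sites))
        remaining = [p for p in remaining if not first <= p <= last]
    return chosen


# Hypothetical variant positions, not data from this study.
print(greedy_amplicons([100, 150, 180, 900, 2000, 2050], max_len=200, n_amplicons=3))
```

A greedy choice is not guaranteed to be globally optimal, but it favors high-density targets such as ycf1 first, in the spirit of covering many sites per primer pair.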

      Table 4.  The SNP and Indel in the targeted regions.

      Primer | Loci | SNP | Indel
      ZDJ01 | TCCACTATTT[C/A]AATTATAAAA | 1 | 0
      ZDJ01 | CAACCCATAA[C/-]CCATAAAAAT | 0 | 1
      ZDJ03 | CCCAAAAAAT[G/A]GATTTTGGTT | 1 | 0
      ZDJ15 | TCAATGGCCC[T/C]CCTACGTAGT | 1 | 0
      ZDJ45 | TCCCATATAT[T/-]AAATATTAAA | 0 | 1
      ZDJ51 | ATTGAAAGCT[A/G]GGATTTCTAG | 1 | 0
      ZDJ54 | AATCCTTGTT[T/G]CGGAGTCGAT | 1 | 0
      ZDJ55 | ACCAAAAAAT[A/C]TTTTTTGCTT | 1 | 0
      ZDJ59 | TTCATCTATT[T/C]CATGACCGGA | 1 | 0
      ZDJ60 | GACCAAGAAG[G/-]ATTCTCTTTC | 0 | 1
      ZDJ69 | ATAAAAAATT[A/T]CCCCCTGCAA | 1 | 0
      ZDJ72 | AAAATCATGT[G/A]TTGGTCCAGA | 1 | 0
      ZDJ76 | TTCAAAATGG[C/-]TTTCAAATTA | 0 | 1
      ZDJ76 | AAAGAATAGT[A/C]AATTTTTGCA | 1 | 0
      ZDJ76 | AGAATAATTT[G/T]AATCTTAAAA | 1 | 0
      ZDJ77 | GTATAACCCC[C/T]TTTTGCTTTC | 1 | 0
      ZDJ80 | TAAGAATGGG[G/T]GACGGTATTC | 1 | 0
      ZDJ83 | GAATTCTGTG[A/G]AAAGCCGTAT | 1 | 0
      ZDJ84 | AAGAGAATCC[T/-]TCTTGGTCGT | 0 | 1
      ZDJ85 | TCCGGTCATG[A/G]AATAGATGAA | 1 | 0
      In the Loci column, the allele to the left of the slash is from C. oleifera and the allele to the right is from C. meiocarpa; "-" denotes a missing base.
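The bracketed notation in Table 4 is easy to process programmatically. The sketch below transcribes the table, classifies each locus as an SNP or Indel, and, as a hypothetical screening helper not described in the study, reports which species' allele (with its 10 bp flanks) appears in a forward-strand sequencing read. Reverse-complemented reads and sequencing errors within the flanks are not handled.

```python
import re
from collections import Counter

# (primer, locus) rows transcribed from Table 4; the left allele is C. oleifera,
# the right allele is C. meiocarpa, and "-" marks a missing base.
LOCI = [
    ("ZDJ01", "TCCACTATTT[C/A]AATTATAAAA"), ("ZDJ01", "CAACCCATAA[C/-]CCATAAAAAT"),
    ("ZDJ03", "CCCAAAAAAT[G/A]GATTTTGGTT"), ("ZDJ15", "TCAATGGCCC[T/C]CCTACGTAGT"),
    ("ZDJ45", "TCCCATATAT[T/-]AAATATTAAA"), ("ZDJ51", "ATTGAAAGCT[A/G]GGATTTCTAG"),
    ("ZDJ54", "AATCCTTGTT[T/G]CGGAGTCGAT"), ("ZDJ55", "ACCAAAAAAT[A/C]TTTTTTGCTT"),
    ("ZDJ59", "TTCATCTATT[T/C]CATGACCGGA"), ("ZDJ60", "GACCAAGAAG[G/-]ATTCTCTTTC"),
    ("ZDJ69", "ATAAAAAATT[A/T]CCCCCTGCAA"), ("ZDJ72", "AAAATCATGT[G/A]TTGGTCCAGA"),
    ("ZDJ76", "TTCAAAATGG[C/-]TTTCAAATTA"), ("ZDJ76", "AAAGAATAGT[A/C]AATTTTTGCA"),
    ("ZDJ76", "AGAATAATTT[G/T]AATCTTAAAA"), ("ZDJ77", "GTATAACCCC[C/T]TTTTGCTTTC"),
    ("ZDJ80", "TAAGAATGGG[G/T]GACGGTATTC"), ("ZDJ83", "GAATTCTGTG[A/G]AAAGCCGTAT"),
    ("ZDJ84", "AAGAGAATCC[T/-]TCTTGGTCGT"), ("ZDJ85", "TCCGGTCATG[A/G]AATAGATGAA"),
]

VARIANT = re.compile(r"\[([ACGT-]+)/([ACGT-]+)\]")


def variant_kind(locus):
    """'Indel' if either allele is a gap, otherwise 'SNP'."""
    oleifera, meiocarpa = VARIANT.search(locus).groups()
    return "Indel" if "-" in (oleifera, meiocarpa) else "SNP"


def assign_species(read, locus):
    """Report which species' allele (with its flanks) occurs in a forward-strand read."""
    left, oleifera, meiocarpa, right = VARIANT.split(locus)
    alleles = {
        "C. oleifera": left + oleifera.replace("-", "") + right,
        "C. meiocarpa": left + meiocarpa.replace("-", "") + right,
    }
    matches = [species for species, motif in alleles.items() if motif in read]
    return matches[0] if len(matches) == 1 else "ambiguous"


print(Counter(variant_kind(locus) for _, locus in LOCI))  # Counter({'SNP': 15, 'Indel': 5})
```

The tally reproduces the 20 variant sites detected by the 17 validated primer pairs: 15 SNPs and 5 Indels.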
    • The taxonomic status and phylogenetic relationships of C. meiocarpa and C. oleifera remain hotly debated, which significantly affects germplasm innovation, the breeding of new varieties, and industrial development. In tea-oil Camellia production, C. meiocarpa has smaller fruits that often bear a single seed. Compared with C. oleifera, it has advantages such as thin fruit peel, high oil content, high seed extraction rate, strong adaptability, disease resistance, and relatively stable yield. C. meiocarpa currently occupies the second-largest cultivation area after C. oleifera, leading some researchers to recognize it as a distinct species[48,49]. In this study, reference-quality chloroplast genomes of C. meiocarpa and C. oleifera were assembled and annotated, revealing a typical quadripartite structure similar in size, gene content, and GC content to those of other tea-oil Camellia species[50,51]. This comparative genomic analysis provides new insights into the phylogeny of tea-oil Camellia, suggesting that despite their complex morphological classification, their chloroplast genomes are relatively conserved[52−54].

      Whether C. meiocarpa should be considered a variety of C. oleifera has remained controversial in previous studies[48,55,56]. Here, we aimed to clarify the relationship between C. meiocarpa and C. oleifera. Morphologically, features such as the number of seeds per fruit and the size of flowers, fruits, and leaves differentiate the two: C. meiocarpa generally has 1−3 seeds per fruit and smaller organs, whereas C. oleifera typically has four or more seeds per fruit. Cytologically, C. meiocarpa is tetraploid, while C. oleifera is hexaploid[12]. A recent phylogeny based on three nuclear regions placed C. meiocarpa with C. vietnamensis, separate from C. oleifera, which formed the basal clade[57]. Our chloroplast genome results show clear genomic differences, including a size difference of more than 450 bp between C. meiocarpa (XG and CKX) and C. oleifera (HZP). Analysis of genome structure and variant sites indicated that divergence between XG and CKX is smaller than between either of them and HZP. In the phylogenetic trees (Fig. 6; Supplemental Figs S1 & S2), C. meiocarpa and C. oleifera did not group together; instead, XG and CKX clustered closely, distinctly separate from HZP. Combining this evidence with morphological and cytological data, we support the view that C. meiocarpa is an independent species[58]. This conclusion should help taxonomists and breeders better understand and make innovative use of C. meiocarpa and C. oleifera, and should benefit the development of the Camellia oil industry.

    • In Camellia oil production, seedlings of C. meiocarpa and C. oleifera are hard to distinguish, and substituted or counterfeit seedlings can cause heavy losses in oil yield and quality. Molecular markers can help solve this problem by enabling rapid and accurate identification of specific polymorphisms[59,60]. Unlike classification systems based on morphological traits, molecular markers reveal genetic differences at the DNA level and are effective for assessing genetic diversity in breeding programs[61]. Among them, chloroplast DNA markers have proven especially useful for identifying and classifying complex species[62]. Chloroplast genome diversity provides the basis for developing polymorphic DNA markers[63]. However, no such markers had yet been developed for C. meiocarpa and C. oleifera, which seriously hampers tea-oil production and the appraisal of tea-oil Camellia germplasm resources. Although the chloroplast genomes of these species are relatively conserved, their numerous variants, such as SNPs and Indels, provide a rich source for marker development. In this study, 56 primer pairs were developed to test polymorphisms between the two species. PCR and sequencing showed that only 17 primer pairs amplified regions containing variants, and these markers can aid resource evaluation and the differentiation of C. meiocarpa from C. oleifera. These results provide a reference for the classification and evaluation of the two species and for practical production.

    • This study characterized the chloroplast genomes of C. meiocarpa and C. oleifera and compared them with those of related tea-oil Camellia species. Genome size, gene structure, and organization were conserved and consistent with previous studies of Camellia. Based on chloroplast genome evidence, we support the view first proposed by Xiansu Hu that C. meiocarpa is an independent species. The 17 newly developed primer pairs can be used for resource assessment in Camellia, facilitating molecular phylogenetic analysis, germplasm innovation and utilization, and production practice in tea-oil Camellia. This study provides high-quality chloroplast genomes and reliable molecular marker resources for future tea-oil Camellia research.

    • The authors confirm contribution to the paper as follows: study conception and design, project supervision: Zheng D; draft manuscript preparation: Liang H, Qi H; genome analysis and annotation: Liang H; sample collection and experiments: Qi H, Sun X, Wang C, Xia T, Chen J; data analysis: Wang Y, Ye H, Feng X, Xie S, Gao Y; manuscript revision: Zheng D, Liang H. All authors reviewed the results and approved the final version of the manuscript.

    • The three Camellia chloroplast genome sequences have been deposited in GenBank at the National Center for Biotechnology Information (NCBI) under accession numbers MZ151355 (XG), MZ151356 (CKX) and MZ151357 (HZP).

      • This study was supported by the Project of Sanya Yazhou Bay Science and Technology City (Grant No. SCKJ-JYRC-202258), the Southern Breeding Project of the Sanya National Southern Breeding Research Academy of the Chinese Academy of Agricultural Sciences (Grant No. YYLH10), the National Natural Science Foundation of China (31860082), the Hainan Province Science and Technology Special Fund (FW20230002), the Scientific and Technological Innovation Team of Hainan Academy of Agricultural Sciences (HAAS2023TDYD05), and the talent introduction project for initiating scientific research of Hainan Academy of Agricultural Sciences (HAAS2023RCQD13).

      • The authors declare that they have no conflict of interest.

      • Received 21 March 2024; Accepted 26 April 2024; Published online 24 July 2024


      • Supplemental Table S1 GenBank accession numbers of the 8 species used in the comparative analysis.
      • Supplemental Table S2 GenBank accession numbers of the 26 species used in the phylogenetic analysis.
      • Supplemental Table S3 Genes contained in the chloroplast genome sequences of XG, CKX and HZP.
      • Supplemental Table S4 Scattered repetitive sequences in CKX, XG and HZP.
      • Supplemental Table S5 Features of SSRs in HZP, XG and CKX.
      • Supplemental Table S6 Nucleotide diversity (Pi) values in XG, CKX and HZP.
      • Supplemental Table S7 Features of the indels and SNPs in XG, CKX and HZP.
      • Supplemental Table S8 PCR primers used to amplify the candidate barcode regions.
      • Supplemental Fig. S1 Phylogenetic trees of 27 Camellia species reconstructed from protein-coding genes using (A) maximum likelihood (ML) and (B) maximum parsimony (MP) methods.
      • Supplemental Fig. S2 Phylogenetic trees of 27 Camellia species reconstructed from whole chloroplast genome sequences using (A) maximum likelihood (ML) and (B) maximum parsimony (MP) methods.
      • Copyright: © 2024 by the author(s). Published by Maximum Academic Press on behalf of Hainan University. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
  • About this article
    Cite this article
    Liang H, Qi H, Wang Y, Sun X, Wang C, et al. 2024. Comparative chloroplast genome analysis of Camellia oleifera and C. meiocarpa: phylogenetic relationships, sequence variation and polymorphic markers. Tropical Plants 3: e023 doi: 10.48130/tp-0024-0022
