2023 Volume 3
Article Contents
About this article
ARTICLE   Open Access    

LncRNA evolution and DNA methylation variation participate in photosynthesis pathways of distinct lineages of Populus

More Information
  • During the independent process of evolution in plants, photosynthesis appears to have been under convergent evolution to adapt to specific selection pressure in their geographical regions. However, it is unclear how lncRNA regulation and DNA methylation are involved in the phenotypic convergence in distinct lineages. Here, we present a large-scale comparative study of lncRNA transcription profile and whole-genome bisulfite sequencing (WGBS) data in two unrelated Populus species, selected from three relatively overlapping geographical regions. The results indicated that 39.75% lncRNAs of Populus tomentosa were shown to have homologous sequences in the 46.99% lncRNA of Populus simonii. Evolutionary analysis revealed that lncRNAs showed a rapid gain rate in the Populus lineage. Furthermore, co-expression networks in two Populus species identified eight lncRNAs that have the potential to simultaneously cis- or trans-regulate eight photosynthetic-related genes. These photosynthetic lncRNAs and genes were predominantly expressed in accessions from the southern region, indicating a conserved spatial expression in photosynthetic pathways in Populus. We also detected that most lncRNA targeted photosynthetic genes hypomethylated in promoter regions of Southern accessions compared with Northern accessions. Geographical DMRs correlated with genetic SNP variations in photosynthetic genes among Populus from the three geographic regions, indicating that DNA methylation coordinated with lncRNAs in convergent evolution of photosynthesis in Populus. Our results shed light on the evolutionary forces acting on patterns of lncRNA and DNA methylation, and provide a better understanding of the genetic and epigenetic mechanism in photosynthetic convergence evolution.
  • Columnar cacti are plants of the Cactaceae family distributed across arid and semi-arid regions of America, with ecological, economic, and cultural value[1]. One trait that makes it possible for the columnar cactus to survive in the desert ecosystem is its thick epidermis covered by a hydrophobic cuticle, which limits water loss in dry conditions[1]. The cuticle is the external layer that covers the non-woody aerial organs of land plants. The careful control of cuticle biosynthesis could produce drought stress tolerance in relevant crop plants[2]. In fleshy fruits, the cuticle maintains adequate water content during fruit development on the plant and reduces water loss in fruit during postharvest[3]. Efforts to elucidate the molecular pathway of cuticle biosynthesis have been carried out for fleshy fruits such as tomato (Solanum lycopersicum)[4], apple (Malus domestica)[5], sweet cherry (Prunus avium)[6], mango (Mangifera indica)[7], and pear (Pyrus 'Yuluxiang')[8].

    The plant cuticle is formed by the two main layers cutin and cuticular waxes[3]. Cutin is composed mainly of oxygenated long-chain (LC) fatty acids (FA), which are synthesized by cytochrome p450 (CYP) enzymes. CYP family 86 subfamily A (CYP86A) enzymes carry out the terminal (ω) oxidation of LC-FA[9]. Then, CYP77A carries out the mid-chain oxidation to synthesize the main cutin monomers. In Arabidopsis, AtCYP77A4 and AtCYP77A6 carry out the synthesis of mid-chain epoxy and mid-chain dihydroxy LC-FA, respectively[10,11]. AtCYP77A6 is required for the cutin biosynthesis and the correct formation of floral surfaces[10]. The expression of CYP77A19 (KF410855) and CYP77A20 (KF410856) from potato (Solanum tuberosum) restored the petal cuticular impermeability in Arabidopsis null mutant cyp77a6-1, tentatively by the synthesis of cutin monomers[12]. In eggplant (Solanum torvum), the over-expression of StoCYP77A2 leads to resistance to Verticillium dahlia infection in tobacco plants[13]. Although the function of CYP77A2 in cutin biosynthesis has not yet been tested, gene expression analysis suggests that CaCYP77A2 (A0A1U8GYB0) could play a role in cutin biosynthesis during pepper fruit development[14].

    It has been hypothesized that the export of cuticle precursors is carried out by ATP binding cassette subfamily G (ABCG) transporters. ABCG11/WBC11, ABCG12, and ABCG13 are required for the load of cuticle lipids in Arabidopsis[1517], but ABCG13 function appears to be specific to the flower epidermis[18]. The overexpression of TsABCG11 (JQ389853) from Thellungiella salsugineum increases cuticle amounts and promotes tolerance to different abiotic stresses in Arabidopsis[19].

    Once exported, the cutin monomers are polymerized on the surface of epidermal cells. CD1 code for a Gly-Asp-Ser-Leu motif lipase/esterase (GDSL) from tomato required for the cutin formation through 2-mono(10,16-dihydroxyhexadecanoyl)glycerol esterification[20]. GDSL1 from tomato carries out the ester bond cross-links of cutin monomers located at the cuticle layers and is required for cuticle deposition in tomato fruits[21]. It has been shown that the transcription factor MIXTA-like reduces water loss in tomato fruits through the positive regulation of the expression of CYP77A2, ABCG11, and GDSL1[22]. Despite the relevant role of cuticles in maintaining cactus homeostasis in desert environments[1], the molecular mechanism of cuticle biosynthesis has yet to be described for cactus fruits.

    Stenocereus thurberi is a columnar cactus endemic from the Sonoran desert (Mexico), which produces an ovoid-globose fleshy fruit named sweet pitaya[23]. In its mature state, the pulp of sweet pitaya contains around 86% water with a high content of antioxidants and natural pigments such as betalains and phenolic compounds, which have nutraceutical and industrial relevance[23]. Due to the arid environment in which pitaya fruit grows, studying its molecular mechanism of cuticle biosynthesis can generate new insights into understanding species' adaptation mechanisms to arid environments. Nevertheless, sequences of transcripts from S. thurberi in public databases are scarce.

    RNA-sequencing technology (RNA-seq) allows the massive generation of almost all the transcripts from non-model plants, even if no complete assembled genome is available[24]. Recent advances in bioinformatic tools has improved our capacity to identify long non-coding RNA (lncRNA), which have been showed to play regulatory roles in relevant biological processes, such as the regulation of drought stress tolerance in plants[25], fruit development, and ripening[2629].

    In this study, RNA-seq data were obtained for the de novo assembly and characterization of the S. thurberi fruit peel transcriptome. As a first approach, three transcripts, StCYP77A, StABCG11, and StGDSL1, tentatively involved in cuticle biosynthesis, were identified and quantified during sweet pitaya fruit development. Due to no gene expression analysis having been carried out yet for S. thurberi, stably expressed constitutive genes were identified for the first time.

    Sweet pitaya fruits (S. thurberi) without physical damage were hand harvested from plants in a native conditions field located at Carbó, Sonora, México. They were collocated in a cooler containing dry ice and transported immediately to the laboratory. The superficial part of the peels (~1 mm deep) was removed carefully from the fruits using a scalpel. Peel samples from three fruits were pooled according to their tentative stage of development defined by their visual characteristics, frozen in liquid nitrogen, and pulverized to create a single biological replicate. Four samples belonging to four different plants were analyzed. All fruits harvested were close to the ripening stage. Samples named M1 and M2 were turning from green to ripe [~35−40 Days After Flowering (DAF)], whereas samples M3 and M4 were turning from ripe to overripe (~40−45 DAF).

    Total RNA was isolated from the peels through the Hot Borate method[30]. The concentration and purity of RNA were determined in a spectrophotometer Nanodrop 2000 (Thermo Fisher) by measuring the 260/280 and 260/230 absorbance ratios. RNA integrity was evaluated through electrophoresis in agarose gel 1% and a Bioanalyzer 2100 (Agilent). Pure RNA was sequenced in the paired-end mode in an Illumina NextSeq 500 platform at the University of Arizona Genetics Core Facility. Four RNA-seq libraries, each of them from each sample, were obtained, which include a total of 288,199,704 short reads with a length of 150 base pairs (bp). The resulting sequence data can be accessed at the Sequence Read Archive (SRA) repository of the NCBI through the BioProject ID PRJNA1030439. Libraries are named corresponding to the names of samples M1, M2, M3, and M4.

    FastQC software (www.bioinformatics.babraham.ac.uk/projects/fastqc) was used for short reads quality analysis. Short reads with poor quality were trimmed or eliminated by Trimmomatic (www.usadellab.org/cms/?page=trimmomatic) with a trailing and leading of 25, a sliding window of 4:25, and a minimum read length of 80 bp. A total of 243,194,888 reads with at least a 25 quality score on the Phred scale were used to carry out the de novo assembly by Trinity (https://github.com/trinityrnaseq/trinityrnaseq/wiki) with the following parameters: minimal k-mer coverage of 1, normalization of 50, and minimal transcript length of 200 bp.

    Removal of contaminating sequences and ribosomal RNA (rRNA) was carried out through SeqClean. To remove redundancy, transcripts with equal or more than 90% of identity were merged through CD-hit (www.bioinformatics.org/cd-hit/). Alignment and quantification in terms of transcripts per million (TPM) were carried out through Bowtie (https://bowtie-bio.sourceforge.net/index.shtml) and RSEM (https://github.com/deweylab/RSEM), respectively. Transcripts showing a low expression (TPM < 0.01) were discarded. Assembly quality was evaluated by calculating the parameters N50 value, mean transcript length, TransRate score, and completeness. The statistics of the transcriptome were determined by TrinityStats and TransRate (https://hibberdlab.com/transrate/). The transcriptome completeness was determined through a BLASTn alignment (E value < 1 × 10−3) by BUSCO (https://busco.ezlab.org/) against the database of conserved orthologous genes from Embryophyte.

    To predict the proteins tentatively coded in the S. thurberi transcriptome, the best homology match of the assembled transcripts was found by alignment to the Swiss-Prot, RefSeq, nr-NCBI, PlantTFDB, iTAK, TAIR, and ITAG databases using the BLAST algorithm with an E value threshold of 1 × 10−10 for the nr-NCBI database and of 1 × 10−5 for the others[3134]. An additional alignment was carried out to the protein databases of commercial fruits Persea americana, Prunus persica, Fragaria vesca, Citrus cinensis, and Vitis vinifera to proteins of the cactus Opuntia streptacantha, and the transcriptomes of the cactus Hylocereus polyrhizus, Pachycereus pringlei, and Selenicereus undatus. The list of all databases and the database websites of commercial fruits and cactus are provided in Supplementary Tables S1 & S2. The open reading frame (ORF) of the transcripts and the protein sequences tentative coded from the sweet pitaya transcriptome was predicted by TransDecoder (https://github.com/TransDecoder/TransDecoder/wiki), considering a minimal ORF length of 75 amino acids (aa). The search for protein domains was carried out by the InterPro database (www.ebi.ac.uk/interpro). Functional categorization was carried out by Blast2GO based on GO terms and KEGG metabolic pathways[35].

    LncRNA were identified based on the methods reported in previous studies[25,29,36]. Transcripts without homology to any protein from Swiss-Prot, RefSeq, nr-NCBI, PlantTFDB, iTAK, TAIR, ITAG, P. americana, P. persica, F. vesca, C. cinensis, V. vinifera, and O. streptacantha databases, without a predicted ORF longer than 75 aa, and without protein domains in the InterPro database were selected to identify tentative lncRNA.

    Transcripts coding for signal peptide or transmembrane helices were identified by SignalP (https://services.healthtech.dtu.dk/services/SignalP-6.0/) and TMHMM (https://services.healthtech.dtu.dk/services/TMHMM-2.0/), respectively, and discarded. Further, transcripts corresponding to other non-coding RNAs (ribosomal RNA and transfer RNA) were identified through Infernal by using the Rfam database[37] and discarded. The remaining transcripts were analyzed by CPC[38], and CPC2[39] to calculate their coding potential. Transcripts with a coding potential score lower than −1 for CPC and a coding probability lower than 0.1 for CPC2 were considered lncRNA. To characterize the identified lncRNA, the length and abundance of coding and lncRNA were calculated. Bowtie and RSEM were used to align and quantify raw counts, respectively. The edgeR package[40] was used to normalize raw count data in terms of counts per million (CPM) for both coding and lncRNA.

    To obtain the transcript's expression, the aligning of short reads and quantifying of transcripts were carried out through Bowtie and RSEM software, respectively. A differential expression analysis was carried out between the four libraries by edgeR package in R Studio. Only the transcripts with a count equal to or higher than 0.5 in at least one sample were retained for the analysis. Transcripts with log2 Fold Change (log2FC) between +1 and −1 and with a False Discovery Rate (FDR) lower than 0.05 were taken as not differentially expressed (NDE).

    For the identification of the tentative reference genes two strategies were carried out as described below: i) The NDE transcripts were aligned by BLASTn (E value < 1 × 10−5) to 43 constitutive genes previously reported in fruits from the cactus H. polyrhizus, S. monacanthus, and S. undatus[4143] to identify possible homologous constitutive genes in S. thurberi. Then, the homologous transcripts with the minimal coefficient of variation (CV) were selected; ii) For all the NDE transcripts, the percentile 95 value of the mean CPM and the percentile 5 value of the CV were used as filters to recover the most stably expressed transcripts, based on previous studies[44]. Finally, transcripts to be tested by quantitative reverse transcription polymerase chain reaction (qRT-PCR) were selected based on their homology and tentative biological function.

    The fruit harvesting was carried out as described above. Sweet pitaya fruit takes about 43 d to ripen, therefore, open flowers were tagged, and fruits with 10, 20, 30, 35, and 40 DAF were collected to cover the pitaya fruit development process (Supplementary Fig. S1). The superficial part of the peels (~1 mm deep) was removed carefully from the fruits using a scalpel. Peel samples from three fruits were pooled according to their stage of development defined by their DAF, frozen in liquid nitrogen, and pulverized to create a single biological replicate. One biological replicate consisted of peels from three fruits belonging to the same plant. Two to three biological replicates were evaluated for each developmental stage. Two technical replicates were analyzed for each biological replicate. RNA extraction, quantification, RNA purity, and RNA integrity analysis were carried out as described above.

    cDNA was synthesized from 100 ng of RNA by QuantiTect Reverse Transcription Kit (QIAGEN). Primers were designed using the PrimerQuest™, UNAFold, and OligoAnalyzer™ tools from Integrated DNA Technologies (www.idtdna.com/pages) and following the method proposed by Thornton & Basu[45]. Transcripts quantification was carried out in a QIAquant 96 5 plex according to the PowerUp™ SYBR™ Green Master Mix protocol (Applied Biosystems), with a first denaturation step for 2 min at 95 °C, followed by 40 cycles of denaturation step at 95 °C for 15 s, annealing and extension steps for 30 s at 60 °C.

    The Cycle threshold (Ct) values obtained from the qRT-PCR were analyzed through the algorithms BestKeeper, geNorm, NormFinder, and the delta Ct method[46]. RefFinder (www.ciidirsinaloa.com.mx/RefFinder-master/) was used to integrate the stability results and to find the most stable expressed transcripts in sweet pitaya fruit peel during development. The pairwise variation value (Vn/Vn + 1) was calculated through the geNorm algorithm in R Studio software[47].

    An alignment of 17 reported cuticle biosynthesis genes from model plants were carried out by BLASTx against the predicted proteins from sweet pitaya. Two additional alignments of 17 charaterized cuticle biosynthesis proteins from model plants against the transcripts and predicted proteins of sweet pitaya were carried out by tBLASTn and BLASTp, respectively. An E value threshold of 1 × 10−5 was used, and the unique best hits were recovered for all three alignments. The sequences of the 17 characterized cuticle biosynthesis genes and proteins from model plants are showed in Supplementary Table S3. The specific parameters and the unique best hits for all the alignments carried out are shown in Supplementary Tables S4S8.

    Cuticle biosynthesis-related transcripts tentatively coding for a cytochrome p450 family 77 subfamily A (CYP77A), a Gly-Asp-Ser-Leu motif lipase/esterase 1 (GDSL1), and an ATP binding cassette transporter subfamily G member 11 (ABCG11) were identified by best bi-directional hit according to the functional annotation described above. Protein-conserved domains, signal peptide, and transmembrane helix were predicted through InterProScan, SignalP 6.0, and TMHMM, respectively. Alignment of the protein sequences to tentative orthologous of other plant species was carried out by the MUSCLE algorithm[48]. A neighbor-joining (NJ) phylogenetic tree with a bootstrap of 1,000 replications was constructed by MEGA11[49].

    Fruit sampling, primer design, RNA extraction, cDNA synthesis, and transcript quantification were performed as described above. Relative expression was calculated according to the 2−ΔΔCᴛ method[50]. The sample corresponding to 10 DAF was used as the calibrator. The transcripts StEF1a, StTUA, StUBQ3, and StEF1a + StTUA were used as normalizer genes.

    Normality was assessed according to the Shapiro-Wilk test. Significant differences in the expression of the cuticle biosynthesis-related transcripts between fruit developmental stages were determined by one-way ANOVA based on a completely randomized sampling design and a Tukey honestly significant difference (HSD) test, considering a p-value < 0.05 as significant. Statistical analysis was carried out through the stats package in R Studio.

    RNA was extracted from the peels of ripe sweet pitaya fruits (S. thurberi) from plants located in the Sonoran Desert, Mexico. Four cDNA libraries were sequenced in an Illumina NextSeq 500 platform at the University of Arizona Genetics Core Facility. A total of 288,199,704 reads with 150 base pairs (bp) in length were sequenced in paired-end mode. After trimming, 243,194,888 (84.38%) cleaned short reads with at least 29 mean quality scores per read in the Phred scale and between 80 to 150 bp in length were obtained to carry out the assembly. After removing contaminating sequences, redundancy, and low-expressed transcripts, the assembly included 174,449 transcripts with an N50 value of 2,110 bp. Table 1 shows the different quality variables of the S. thurberi fruit peel transcriptome. BUSCO score showed that 85.4% are completed transcripts, although out of these, 37.2% were found to be duplicated. The resulting sequence data can be accessed at the SRA repository of the NCBI through the BioProject ID PRJNA1030439.

    Table 1.  Quality metrics of the Stenocereus thurberi fruit peel transcriptome.
    Metric Data
    Total transcripts 174,449
    N50 2,110
    Smallest transcript length (bp) 200
    Largest transcript length (bp) 19,114
    Mean transcript length (bp) 1,198.69
    GC (%) 41.33
    Total assembled bases 209,110,524
    TransRate score 0.05
    BUSCO score (%) C: 85.38 (S:48.22, D:37.16),
    F: 10.69, M: 3.93.
    Values were calculated through the TrinityStats function of Trinity and TransRate software. Completeness analysis was carried out through BUSCO by aligning the transcriptome to the Embryophyte database through BLAST with an E value threshold of 1 × 10−3. Complete (C), single (S), duplicated (D), fragmented (F), missing (M).
     | Show Table
    DownLoad: CSV

    A summary of the homology search in the main public protein database for the S. thurberi transcriptome is shown in Supplementary Table S1. From these databases, the higher homologous transcripts were found in RefSeq with 93,993 (53.87 %). Based on the E value distribution, for 41,685 (44%) and 68,853 (49%) of the hits, it was found a strong homology (E value lower than 1 × 10−50) to proteins in the Swiss-Prot and RefSeq databases, respectively (Supplementary Fig. S2a & b). On the other hand, 56,539 (52.34%) and 99,599 (71.11%) of the matches showed a percentage of identity higher than 60% in the Swiss-Prot and RefSeq databases, respectively (Supplementary Fig. S2c & d).

    Figure 1 shows the homology between transcripts from S. thurberi and proteins of commercial fruits, as well as proteins and transcripts of cacti. Transcripts from S. thurberi homologous to proteins from fruits of commercial interest avocado (P. americana), peach (P. persica), strawberry (F. vesca), orange (C. sinensis), and grapefruit (V. vinifera) ranged from 77,285 (44.30%) to 85,421 (48.96%), with 70,802 transcripts homologous to all the five fruit protein databases (Fig. 1a).

    Transcripts homologous to transcripts or proteins from the cactus dragon fruit (H. polyrhizus), prickly pear cactus (O. streptacantha), Mexican giant cardon (P. pringlei), and pitahaya (S. undatus) ranged from 76,238 (43.70%) to 114,933 (65.88%), with 64,009 transcripts homologous to all the four cactus databases (Fig. 1b). Further, out of the total of transcripts, 44,040 transcripts (25.25%) showed homology only to sequences from cactus, but not for model plants Arabidopsis, tomato, or the commercial fruits included in this study (Fig. 1c).

    Figure 1.  Venn diagram of the homology search results against model plants databases, commercial fruits, and cactus. The number in the diagram corresponds to the number of transcripts from S. thurberi homologous to sequences from that plant species. (a) Homologous to sequences from Fragaria vesca (Fa), Persea americana (Pa), Prunus persica (Pp), Vitis vinifera (Vv), and Citrus sinensis (Cs). (b) Homologous to sequences from Opuntia streptacantha (Of), Selenicereus undatus (Su), Hylocereus polyrhizus (Hp), and Pachycereus pringlei (Pap). (c) Homologous to sequences from Solanum lycopersicum (Sl), Arabidopsis thaliana (At), from the commercial fruits (Fa, Pa, Pp, Vv, and Cs), or the cactus included in this study (Of, Su, Hp, and Pap). Homologous searching was carried out by BLAST alignment (E value < 1 × 10−5). The Venn diagrams were drawn by ggVennDiagram in R Studio.

    A total of 45,970 (26.35%), 58,704 (33.65%), and 48,186 (27.65%) transcripts showed homology to transcription factors, transcriptional regulators, and protein kinases in the PlantTFDB, iTAK-TR, and iTAK-PK databases, respectively (Supplementary Tables S1, S9S11). For the PlantTFDB, the homologous transcripts belong to 57 transcriptional factors (TF) families (Fig. 2 & Supplementary Table S9), from which, the most frequent were the basic-helix-loop-helix (bHLH), myeloblastosis-related (MYB-related), NAM, ATAF, and CUC (NAC), ethylene responsive factor (ERF), and the WRKY domain families (WRKY) (Fig. 2).

    Figure 2.  Transcription factor (TF) families distribution of S. thurberi fruit peel transcriptome. The X-axis indicates the number of transcripts with hits to each TF family. Alignment to the PlantTFDB database by BLASTx was carried out with an E value threshold of 1 × 10−5. The bar graph was drawn by ggplot2 in R Studio.

    Based on the homology found and the functional domain searches, gene ontology terms (GO) were assigned to 68,559 transcripts (Supplementary Table S12). Figure 3 shows the top 20 GO terms assigned to the S. thurberi transcriptome, corresponding to the Biological Processes (BP) and Molecular Function (MF) categories. For BP, organic substance metabolic processes, primary metabolic processes, and cellular metabolic processes showed a higher number of transcripts (Supplementary Table S13). Further, for MF, organic cyclic compound binding, heterocyclic compound binding, and ion binding were the processes with the higher number of transcripts. S. thurberi transcripts were classified into 142 metabolic pathways from the KEGG database (Supplementary Table S14). The pathways with the higher number of transcripts recorded were pyruvate metabolism, glycerophospholipid metabolism, glycolysis/gluconeogenesis, and citrate cycle. Further, among the top 20 KEEG pathways, the cutin, suberin, and wax biosynthesis include more than 30 transcripts (Fig. 4).

    Figure 3.  Top 20 Gene Ontology (GO) terms assigned to the S. thurberi fruit peel transcriptome. Bars indicate the number of transcripts assigned to each GO term. Assignment of GO terms was carried out by Blast2GO with default parameters. BP and MF mean Biological Processes and Molecular Functions GO categories, respectively. The graph was drawn by ggplot2 in R Studio.
    Figure 4.  Top 20 KEGG metabolic pathways distribution in the S. thurberi fruit peel transcriptome. Bars indicate the number of transcripts assigned to each KEGG pathway. Assignment of KEGG pathways was carried out in the Blast2GO suite. The bar graph was drawn by ggplot2 in R Studio.

    Out of the total of transcripts, 43,391 (24.87%) were classified as lncRNA (Supplementary Tables S15 & S16). Figure 5 shows a comparison of the length (Fig. 5a) and expression (Fig. 5b) of lncRNA and coding RNA. Both length and expression values were higher in coding RNA than in lncRNA. In general, coding RNA ranged from 201 to 18,629 bp with a mean length of 1,507.18, whereas lncRNA ranged from 200 to 5,198 bp with a mean length of 481.51 (Fig. 5a). The higher expression values recorded from coding RNA and lncRNA were 12.83 and 9.45 log2(CPM), respectively (Fig. 5b).

    Figure 5.  Comparison of coding RNA and long non-coding RNA (lncRNA) from S. thurberi transcriptome. (a) Box plot of transcript length distribution. The Y-axis indicates the length of each transcript in base pairs. (b) Box plot of expression levels. The Y-axis indicates the log2 of the count per million of reads (log2(CPM)) recorded for each transcript. Expression levels were calculated by the edgeR package in R studio. (a), (b) The lines inside the boxes indicate the median. The higher and lower box limits represent the 75th and 25th percentiles, respectively. The box plots were drawn by ggplot2 in R Studio.

    To identify the transcripts without significant changes in expression between the four RNA-seq libraries, a differential expression analysis was carried out. Of the total of transcripts, 4,980 were not differentially expressed (NDE) at least in one paired comparison between the libraries (Supplementary Tables S17S20). Mean counts per million of reads (CPM) and coefficient of variation (CV)[44] were calculated for these NDE transcripts. Transcripts with a CV value lower than 0.113, corresponding with the percentile 5 of the CV, and a mean CPM higher than 1,138.06, corresponding with the percentile 95 of the mean CPM were used as filters to identify the most stably expressed transcripts (Supplementary Table S21). Based on its homology and its tentative biological function, five transcripts were selected to be tested as tentative reference genes. Besides, three NDE transcripts homologous to previously identified stable expressed reference genes in other species of cactus fruit[4143] were selected (Supplementary Table S22). Homology metrics for the eight tentative reference genes selected are shown in Supplementary Table S23. The primer sequences used to amplify the transcripts by qRT-PCR and their nucleotide sequence are shown in Supplementary Tables S24 & S25, respectively.

    The amplification specificity of the eight candidate reference genes determined by melting curves analysis is shown in Supplementary Fig. S3. For the eight tentative reference transcripts selected, the cycle threshold (Ct) values were recorded during sweet pitaya fruit development by qRT-PCR (Supplementary Table S26). The Ct values obtained ranged from 16.85 to 30.26 (Fig. 6a). Plastidic ATP/ADP-transporter (StTLC1) showed the highest Ct values with a mean of 27.34 (Supplementary Table S26). Polyubiquitin 3 (StUBQ3) showed the lowest Ct values in all five sweet pitaya fruit developmental stages (Fig. 6a).

    Figure 6.  Expression stability analysis of tentative reference genes. (a) Box plot of cycle threshold (Ct) distribution of candidate reference genes during sweet pitaya fruit development (10, 20, 30, 35, and 40 d after flowering). The black line inside the box indicates the median. The higher and lower box limits represent the 75th and 25th percentiles, respectively. (b) Bar chart of the geometric mean (geomean) of ranking values calculated by RefFinder for each tentative reference gene (X-axis). The lowest values indicate the best reference genes. (c) Bar chart of the pairwise variation analysis and determination of the optimal number of reference genes by the geNorm algorithm. A pairwise variation value lower than 0.15 indicates that the use of Vn/Vn + 1 reference genes is reliable for the accurate normalization of qRT-PCR data. The Ct data used in the analysis were calculated by qRT-PCR in a QIAquant 96 5 plex (QIAGEN) according to the manufacturer's protocol. The box plot and the bar graphs were drawn by ggplot2 and Excel programs, respectively. Abbreviations: Actin 7 (StACT7), alpha-tubulin (StTUA), elongation factor 1-alpha (StEF1a), COP1-interactive protein 1 (StCIP1), plasma membrane ATPase 4 (StPMA4), BEL1-like homeodomain protein 1 (StBLH1), polyubiquitin 3 (StUBQ3), and plastidic ATP/ADP-transporter (StTLC1).

    The best stability values calculated by NormFinder were 0.45, 0.51, 0.97, and 0.99, corresponding to the transcripts elongation factor 1-alpha (StEF1a), alpha-tubulin (StTUA), plastidic ATP/ADP-transporter (StTLC1), and actin 7 (StACT7), respectively (Supplementary Table S27). For BestKeeper, the most stable expressed transcripts were StUBQ3, StTUA, and StEF1a, with values of 0.72, 0.75, and 0.87, respectively. In the case of the delta Ct method[51], the transcripts StEF1a, StTUA, and StTLC1 showed the best stability.

    According to geNorm analysis, the most stable expressed transcripts were StTUA, StEF1a, StUBQ3, and StACT7, with values of 0.74, 0.74, 0.82, and 0.96, respectively. All the pairwise variation values (Vn/Vn + 1) were lower than 0.15, ranging from 0.019 for V2/V3 to 0.01 for V6/V7 (Fig. 6c). The V value of 0.019 obtained for V2/V3 indicates that the use of the best two reference genes StTUA and StEF1a is reliable enough for the accurate normalization of qRT-PCR data, therefore no third reference gene is required[47]. Except for BestKeeper analysis, StEF1a and StTUA were the most stable transcripts for all of the methods carried out in this study (Supplementary Table S27). The comprehensive ranking analysis indicates that StEF1a, followed by StTUA and StUBQ3, are the most stable expressed genes and are stable enough to be used as reference genes in qRT-PCR analysis during sweet pitaya fruit development (Fig. 6b).

    Three cuticle biosynthesis-related transcripts TRINITY_DN17030_c0_g1_i2, TRINITY_DN15394_c0_g1_i1, and TRINITY_DN23528_c1_g1_i1 tentatively coding for the enzymes cytochrome p450 family 77 subfamily A (CYP77A), Gly-Asp-Ser-Leu motif lipase/esterase 1 (GDSL1), and an ATP binding cassette transporter subfamily G member 11 (ABCG11/WBC11), respectively, were identified and quantified. The nucleotide sequence and predicted amino acid sequences of the three transcripts are shown in Supplementary File 1. The best homology match for StCYP77A (TRINITY_DN17030_c0_g1_i2) was for AtCYP77A4 (AT5G04660) from Arabidopsis and SmCYP77A2 (P37124) from eggplant (Solanum melongena) in the TAIR and Swiss-Prot databases, respectively (Supplementary Table S23).

    TransDecoder, InterPro, and TMHMM analysis showed that StCYP77A codes a polypeptide of 518 amino acids (aa) in length that comprises a cytochrome P450 E-class domain (IPR002401) and a transmembrane region (residues 10 to 32). The phylogenetic tree constructed showed that StCYP77A is grouped in a cluster with all the CYP77A2 proteins included in this analysis, being closer to CYP77A2 (XP_010694692) from B. vulgaris and Cgig2_012892 (KAJ8441854) from Carnegiea gigantean (Supplementary Fig. S4).

    StGDSL1 (TRINITY_DN15394_c0_g1_i1) alignment showed that it is homologous to a GDSL esterase/lipase from Arabidopsis (Q9LU14) and tomato (Solyc03g121180) (Supplementary Table S23). TransDecoder, InterPro, and SignalP analysis showed that StGDSL1 codes a polypeptide of 354 aa in length that comprises a GDSL lipase/esterase domain IPR001087 and a signal peptide with a cleavage site between position 25 and 26 (Supplementary Fig. S5).

    Supplementary Figure S6 shows the analysis carried out on the predicted amino acid sequence of StABCG11 (TRINITY_DN23528_c1_g1_i1). The phylogenetic tree constructed shows three clades corresponding to the ABCG13, ABCG12, and ABCG11 protein classes with bootstrap support ranging from 40% to 100% (Supplementary Fig. S6a). StABCG11 is grouped with all the ABCG11 transporters included in this study in a well-separated clade, being closely related to its tentative ortholog from C. gigantean Cgig2_004465 (KAJ8441854). InterPro and TMHMM results showed that the StABCG11 sequence contains an ABC-2 type transporter transmembrane domain (IPR013525; PF01061.27) with six transmembrane helices (Supplementary Fig. S6b).

    The predicted protein sequence of StABCG11 is 710 aa in length, holding the ATP binding domain (IPR003439; PF00005.30) and the P-loop containing nucleoside triphosphate hydrolase domain (IPR043926; PF19055.3) of the ABC transporters of the G family. Multiple sequence alignment shows that the Walker A and B motif sequence and the ABC signature[15] are conserved between the ABCG11 transporters from Arabidopsis, tomato, S. thurberi, and C. gigantean (Supplementary Fig. S6c).

    According to the results of the expression stability analysis (Fig. 6), four normalization strategies were tested to quantify the three cuticle biosynthesis-related transcripts during sweet pitaya fruit development. The four strategies consist of normalizing by StEF1a, StTUA, StUBQ3, or StEF1a+StTUA. Primer sequences used to quantify the transcripts StCYP77A (TRINITY_DN17030_c0_g1_i2), StGDSL1 (TRINITY_DN15394_c0_g1_i1), and StABCG11 (TRINITY_DN23528_c1_g1_i1) by qRT-PCR during sweet pitaya fruit development are shown in Supplementary Table S24.

    The three cuticle biosynthesis-related transcripts showed differences in expression during sweet pitaya fruit development (Supplementary Table S28). The same expression pattern was recorded for the three cuticle biosynthesis transcripts when normalization was carried out by StEF1a, StTUA, StUBQ3, or StEF1a + StTUA (Fig. 7). A higher expression of StCYP77A and StGDSL1 are shown at the 10 and 20 DAF, showing a decrease at 30, 35, and 40 DAF. StABCG11 showed a similar behavior, with a higher expression at 10 and 20 DAF and a reduction at 30 and 35 DAF. Nevertheless, unlike StCYP77A and StGDSL1, a significant increase at 40 DAF, reaching the same expression as compared with 10 DAF, is shown for StABCG11 (Fig. 7).

    Figure 7.  Expression analysis of cuticle biosynthesis-related transcripts StCYP77A, StGDSL1, and StABCG11 during sweet pitaya (Stenocereus thurberi) fruit development. Relative expression was calculated through the 2−ΔΔCᴛ method using elongation factor 1-alpha (StEF1a), alpha-tubulin (StTUA), polyubiquitin 3 (StUBQ3), or StEF1a + StTUA as normalizing genes at 10, 20, 30, 35, and 40 d after flowering (DAF). The Y-axis and error bars represent the mean of the relative expression ± standard error (n = 4−6) for each developmental stage in DAF. The Ct data for the analysis was recorded by qRT-PCR in a QIAquant 96 5 plex (QIAGEN) according to the manufacturer's protocol. The graph line was drawn by ggplot2 in R Studio. Abbreviations: cytochrome p450 family 77 subfamily A (StCYP77A), Gly-Asp-Ser-Leu motif lipase/esterase 1 (StGDSL1), and ATP binding cassette transporter subfamily G member 11 (StABCG11).

    Characteristics of a well-assembled transcriptome include an N50 value closer to 2,000 bp, a high percentage of conserved transcripts completely assembled (> 80%), and a high proportion of reads mapping back to the assembled transcripts[52]. In the present study, the first collection of 174,449 transcripts from S. thurberi fruit peel are reported. The generated transcriptome showed an N50 value of 2,110 bp, a TransRate score of 0.05, and a GC percentage of 41.33 (Table 1), similar to that reported for other de novo plant transcriptome assemblies[53]. According to BUSCO, 85.4% of the orthologous genes from the Embryophyta databases completely matched the S. thurberi transcriptome, and only 3.9% were missing (Table 1). These results show that the S. thurberi transcriptome generated is not fragmented, and it is helpful in predicting the sequence of almost all the transcripts expressed in sweet pitaya fruit peel[24].

    The percentage of transcripts homologous found, E values, and identity distribution (Supplementary Tables S1 & S2; Supplementary Fig. S2) were similar to that reported in the de novo transcriptome assembly for non-model plants and other cactus fruits[4143,54] and further suggests that the transcriptome assembled of S. thurberi peel is robust[52]. Of the total of transcripts, 70,802 were common to all the five commercial fruit protein databases included in this study, which is helpful for the search for conserved orthologous involved in fruit development and ripening (Fig. 2a). A total of 34,513 transcripts (20%) show homology only to sequences in the cactus's databases, but not in the others (Supplementary Tables S1 & S2; Fig. 1c). This could suggest that a significant conservation of sequences among plants of the Cactaceae family exists which most likely are to have a function in this species adaptation to desert ecosystems.

    To infer the biological functionality represented by the S. thurberi fruit peel transcriptome, gene ontology (GO) terms and KEGG pathways were assigned. Of the main metabolic pathways assigned, 'glycerolipid metabolism' and 'cutin, suberine, and wax biosynthesis' suggests an active cuticle biosynthesis in pitaya fruit peel (Fig. 4). In agreement with the above, the main GO terms assigned for the molecular function (MF) category were 'organic cyclic compound binding', 'transmembrane transporter activity', and 'lipid binding' (Fig. 3). For the biological processes (BP) category, the critical GO terms for the present research are 'cellular response to stimulus', 'response to stress', 'anatomical structure development', and 'transmembrane transport', which could suggest the active development of the fruit epidermis and cuticle biosynthesis for protection to stress.

    The most frequent transcription factors (TF) families found in S. thurberi transcriptome were NAC, WRKY, bHLH, ERF, and MYB-related (Fig. 2), which had been reported to play a function in the tolerance to abiotic stress in plants[55,56]. Although the role of NAC, WRKY, bHLH, ERF, and MYB TF in improving drought tolerance in relevant crop plants has been widely documented[57,58], their contribution to the adaptation of cactus to arid ecosystems has not yet been elucidated and further experimental pieces of evidence are needed.

    It has been reported that the heterologous expression of ERF TF from Medicago truncatula induces drought tolerance and cuticle wax biosynthesis in Arabidopsis leaf[59]. In tomato fruits, the gene SlMIXTA-like which encodes a MYB transcription factor avoids water loss through the positive regulation of genes related to the biosynthesis and transport of cuticle compounds[22]. Despite the relevant role of cuticles in maintaining cactus physiology in desert environments, experimental evidence showing the role of the different TF-inducing cuticle biosynthesis has yet to be reported for cactus fruits.

    Out of the transcripts, 43,391 were classified as lncRNA (Supplementary Tables S15 & S16). This is the first report of lncRNA identification for the species S. thurberi. In fruits, 3,679 lncRNA has been identified from tomato[26], 3,330 from peach (P. persica)[29], 3,857 from melon (Cucumis melo)[28], 2,505 from hot pepper (Capsicum annuum)[27], and 3,194 from pomegranate (Punica granatum)[36]. Despite the stringent criteria to classify the lncRNA of sweet pitaya fruit (S. thurberi), a higher number of lncRNAs are shown when compared with previous reports. This finding is most likely due to the higher level of redundancy found during the transcriptome analysis. To reduce this redundancy, further efforts to achieve the complete genome assembly of S. thurberi are needed.

    Previous studies showed that lncRNA is shorter and has lower expression levels than coding RNA[6062]. In agreement with those findings, both the length and expression values of lncRNA from S. thurberi were lower than coding RNA (Fig. 5). It has been suggested that lncRNA could be involved in the biosynthesis of cuticle components in cabbage[61] and pomegranate[36] and that they could be involved in the tolerance to water deficit through the regulation of cuticle biosynthesis in wild banana[60]. Nevertheless, the molecular mechanism by which lncRNA may regulate the cuticle biosynthesis in S. thurberi fruits has not yet been elucidated.

    A relatively constant level of expression characterizes housekeeping genes because they are involved in essential cellular functions. These genes are not induced under specific conditions such as biotic or abiotic stress. Because of this, they are very useful as internal reference genes for qRT-PCR data normalization[63]. Nevertheless, their expression could change depending on plant species, developmental stages, and experimental conditions[64]. Reliable reference genes for a specific experiment in a given species must be identified to carry out an accurate qRT-PCR data normalization[63]. An initial screening of the transcript expression pattern through RNA-seq improves the identification of stably expressed transcripts by qRT-PCR[44,64].

    Identification of stable expressed reference transcripts during fruit development has been carried out in blueberry (Vaccinium bracteatum)[65], kiwifruit (Actinidia chinensis)[66], peach (P. persica)[67], apple (Malus domestica)[68], and soursop (Annona muricata)[69]. These studies include the expression stability analysis through geNorm, NormFinder, and BestKeeper algorithms[68,69], some of which are supported in RNA-seq data[65,66]. Improvement of expression stability analysis by RNA-seq had led to the identification of non-previously reported reference genes with a more stable expression during fruit development than commonly known housekeeping genes in grapevine (V. vinifera)[44], pear (Pyrus pyrifolia and P. calleryana)[64], and pepper (C. annuum)[70].

    For fruits of the Cactaceae family, only a few studies identifying reliable reference genes have been reported[4143]. Mainly because gene expression analysis has not been carried out previously for sweet pitaya (S. thurberi), the RNA-seq data generated in this work along with geNorm, NormFinder, BestKeeper, and RefFinder algorithms were used to identify reliable reference genes. The comprehensive ranking analysis showed that out of the eight candidate genes tested, StEF1a followed by StTUA and StUBQ3 were the most stable (Fig. 6b). All the pairwise variation values (Vn/Vn + 1) were lower than 0.15 (Fig. 6c), which indicates that StEF1a, StTUA, and StUBQ3 alone or the use of StEF1a and StTUA together are reliable enough to normalize the gene expression data generated by qRT-PCR.

    The genes StEF1a, StTUA, and StUBQ3 are homologous to transcripts found in the cactus species known as dragonfruit (Hylocereus monacanthus and H. undatus)[41], which have been tested as tentative reference genes during fruit development. EF1a has been proposed as a reliable reference gene in the analysis of changes in gene expression of dragon fruit (H. monacanthus and H. undatus)[41], peach (P. persica)[67], apple (M. domestica)[68], and soursop (A. muricata)[69]. According to the expression stability analysis carried out in the present study (Fig. 6) four normalization strategies were designed. The same gene expression pattern was recorded for the three target transcripts evaluated when normalization was carried out by the genes StEF1a, StTUA, StUBQ3, or StEF1a + StTUA (Fig. 7). Further, these data indicates that these reference genes are reliable enough to be used in qRT-PCR experiments during fruit development of S. thurberi.

    The plant cuticle is formed by two main layers: the cutin, composed mainly of mid-chain oxygenated LC fatty acids, and the cuticular wax, composed mainly of very long-chain (VLC) fatty acids, and their derivates VLC alkanes, VLC primary alcohols, VLC ketones, VLC aldehydes, and VLC esters[3]. In Arabidopsis CYP77A4 and CYP77A6 catalyze the synthesis of midchain epoxy and hydroxy ω-OH long-chain fatty acids, respectively[10,11], which are the main components of fleshy fruit cuticle[3].

    The functional domain search carried out in the present study showed that StCYP77A comprises a cytochrome P450 E-class domain (IPR002401) and a membrane-spanning region from residues 10 to 32 (Supplementary Fig. S4). This membrane-spanning region has been previously characterized in CYP77A enzymes from A. thaliana and Brassica napus[11,71]. It suggests that the protein coded by StCYP77A could catalyze the oxidation of fatty acids embedded in the endoplasmic reticulum membrane of the epidermal cells of S. thurberi fruit. Phylogenetic analysis showed that StCYP77A was closer to proteins from its phylogenetic-related species B. vulgaris (BvCYP772; XP_010694692) and C. gigantea (Cgig2_012892) (Supplementary Fig. S4). StCYP77A, BvCYP77A2, and Cgig2_012892 were closer to SlCYP77A2 and SmCYP77A2 than to CYP77A4 and CYP77A6 proteins, suggesting that StCYP77A (TRINITY_DN17030_c0_g1_i2) could correspond to a CYP77A2 protein.

    Five CYP77A are present in the Arabidopsis genome, named CYP77A4, CYP77A5, CYP77A6, CYP77A7, and CYP77A9, but their role in cuticle biosynthesis has only been reported for CYP77A4 and CYP77A6[72]. It has been suggested that CYP77A2 from eggplant (S. torvum) could contribute to the defense against fungal phytopathogen infection by the synthesis of specific compounds[13]. In pepper fruit (C. annuum), the expression pattern of CYP77A2 (A0A1U8GYB0) and ABCG11 (LOC107862760) suggests a role of CYP77A2 and ABCG11 in cutin biosynthesis at the early stages of pepper fruit development[14].

    In the case of the protein encoded by StGDSL1 (354 aa), the length found in this work is similar to the length of its homologous from Arabidopsis (AT3G16370) and tomato (Solyc03g121180) (Supplementary Fig. S5). A GDSL1 protein named CD1 polymerizes midchain oxygenated ω-OH long-chain fatty acids to form the cutin polyester in the extracellular space of tomato fruit peel[20,21]. It has been suggested that the 25-amino acid N-signal peptide found in StGDSL1 (Supplementary Fig. S5), previously reported in GDSL1 from Arabidopsis, B. napus, and tomato, plays a role during the protein exportation to the extracellular space[21,73].

    A higher expression of StCYP77A, StGDSL1, and StABCG11 is shown at the 10 and 20 DAF of sweet pitaya fruit development (Fig. 7), suggesting the active cuticle biosynthesis at the early stages of sweet pitaya fruit development. In agreement with that, two genes coding for GDSL lipase/hydrolases from tomato named SGN-U583101 and SGN-U579520 are highly expressed in the early stages and during the expansion stages of tomato fruit development, but their expression decreases in later stages[74]. It has been shown that the expression of GDSL genes, like CD1 from tomato, is higher in growing fruit[20,21]. Like tomato, the increase in expression of StCYP77A and StGDSL1 shown in pitaya fruit development could be due to an increase in cuticle deposition during the expansion of the fruit epidermis[20].

    The phylogenetic analysis, the functional domains, and the six transmembrane helices found in the StABCG11 predicted protein (Supplementary Fig. S6), suggests that it is an ABCG plasma membrane transporter of sweet pitaya[15]. Indeed, an increased expression of StABCG11 at 40 DAF was recorded in the present study (Fig. 7). Further, this data strongly suggests that it could be playing a relevant role in the transport of cuticle components at the beginning and during sweet pitaya fruit ripening.

    In Arabidopsis, ABCG11 (WBC11) exports cuticular wax and cutin compounds from the plasma membrane[15,75]. It has been reported that a high expression of the ABC plasma membrane transporter from mango MiWBC11 correlates with a higher cuticle deposition during fruit development[7]. The expression pattern for StABCG11, StCYP77A, and StGDSL1 suggests a role of StABCG11 as a cutin compound transporter in the earlier stages of sweet pitaya fruit development (Fig. 7). Further, its increase at 40 DAF suggests that it could be transporting cuticle compounds other than oxygenated long-chain fatty acids, or long-chain fatty acids that are not synthesized by StCYP77A and StGDSL1 in the later stages of fruit development.

    Like sweet pitaya, during sweet cherry fruit (Prunus avium) development, the expression of PaWCB11, homologous to AtABCG11 (AT1G17840), increases at the earlier stages of fruit development decreases at the intermediate stages, and increases again at the later stages[76]. PaWCB11 expression correlated with cuticle membrane deposition at the earlier and intermediate stages of sweet cherry fruit development but not at the later[76]. The increased expression of StABCG11 found in the present study could be due to the increased transport of cuticular wax compounds, such as VLC fatty acids and their derivates, in the later stages of sweet pitaya development[15,75].

    Cuticular waxes make up the smallest amount of the fruit cuticle. Even so, they mainly contribute to the impermeability of the fruit's epidermis[3]. An increase in the transport of cuticular waxes at the beginning of the ripening stage carried out by ABCG transporters could be due to a greater need to avoid water loss and to maintain an adequate amount of water during the ripening of the sweet pitaya fruit. Nevertheless, further expression analysis of cuticular wax biosynthesis-related genes, complemented with chemical composition analysis of cuticles could contribute to elucidating the molecular mechanism of cuticle biosynthesis in cacti and their physiological contribution during fruit development.

    In this study, the transcriptome of the sweet pitaya (S. thurberi) fruit peel was assembled for the first time. The reference genes found here are a helpful tool for further gene expression analysis in sweet pitaya fruit. Transcripts tentatively involved in cuticle compound biosynthesis and transport are reported for the first time in sweet pitaya. The results suggest a relevant role of cuticle compound biosynthesis and transport at the early and later stages of fruit development. The information generated will help to improve the elucidation of the molecular mechanism of cuticle biosynthesis in S. thurberi and other cactus species in the future. Understanding the cuticle's physiological function in the adaptation of the Cactaceae family to harsh environmental conditions could help design strategies to increase the resistance of other species to face the increase in water scarcity for agricultural production predicted for the following years.

    The authors confirm contribution to the paper as follows: study conception and design: Tiznado-Hernández ME, Tafolla-Arellano JC, García-Coronado H, Hernández-Oñate MÁ; data collection: Tiznado-Hernández ME, Tafolla-Arellano JC, García-Coronado H, Hernández-Oñate MÁ; analysis and interpretation of results: Tiznado-Hernández ME, García-Coronado H, Hernández-Oñate MÁ, Burgara-Estrella AJ; draft manuscript preparation: Tiznado-Hernández ME, García-Coronado H. All authors reviewed the results and approved the final version of the manuscript.

    All data generated or analyzed during this study are included in this published article and its supplementary information files. The sequence data can be accessed at the Sequence Read Archive (SRA) repository of the NCBI through the BioProject ID PRJNA1030439.

    The authors wish to acknowledge the financial support of Consejo Nacional de Humanidades, Ciencias y Tecnologías de México (CONAHCYT) through project number 579: Elucidación del Mecanismo Molecular de Biosíntesis de Cutícula Utilizando como Modelo Frutas Tropicales. We appreciate the University of Arizona Genetics Core and Illumina for providing reagents and equipment for library sequencing. The author, Heriberto García-Coronado (CVU 490952), thanks the CONAHCYT (acronym in Spanish) for the Ph.D. scholarship assigned (749341). The author, Heriberto García-Coronado, thanks Dr. Edmundo Domínguez-Rosas for the technical support in bioinformatics for identifying long non-coding RNA.

  • The authors declare that they have no conflict of interest.

  • Supplemental Table S1 Phenotypic variation among ten accessions from S, NW and NE climate regions in Populus tomentosa and Populus simonii.
    Supplemental Table S2 RNA-sequencing data statistics.
    Supplemental Table S3 Phenotypic variation among ten accessions from S, NW and NE climate regions in Populus tomentosa and Populus simonii.
    Supplemental Table S4 Differentially expressed genes and lncRNAs in photosynthetic modules of Populus tomentosa and Populus simonii.
    Supplemental Table S5 Methylome data bisulfite conversion rate and sequencing statistics.
    Supplemental Table S6 Data description of methylation level for ten Populus tomentosa and Populus simonii accessions.
    Supplemental Table S7 The number of differentially methylated regions in genomic features in Populus tomentosa and Populus simonii.
    Supplemental Table S8 The number of differentially methylated regions in genomic features in Populus tomentosa and Populus simonii.
    Supplemental Table S9 Correlation analysis between differentially methylated region and local SNP in Populus tomentosa and Populus simonii .
    Supplemental Table S10 Interacting DMR and SNP in 2kb.
    Supplemental Fig. S1 Box plot of regulatory roles of lncRNAs in Populus tomentosa and Populus simonii.
    Supplemental Fig. S2 Gene Ontology enrichment analyses of Turquoise Module in Populus simonii.
    Supplemental Fig. S3 Geographical variation characterized by global DNA methylation patterns.
    Supplemental Fig. S4 The distribution of differentially methylated genes in Populus tomentosa and Populus simonii.
    Supplemental Fig. S5 Differentially methylated regions (DMRs) correlated with genetic variation.
    Data S1 The original information of lncRNAs of Populus tomentosa.
    Data S2 The original information of lncRNAs of Populus simonii.
    Data S3 The expression data of differentially expressed lncRNA of Populus tomentosa.
    Data S4 The expression data of differentially expressed gene of Populus tomentosa.
    Data S5 The expression data of differentially expressed lncRNA of Populus simonii.
    Data S6 The expression data of differentially expressed gene ofPopulus simoniif.
    Data S7 Conserved and Species-specific  lncRNA in P. tomentosa and P. simonii. .
    Data S8 GO term enrichment analyses of genes in MEbrown module in Populus tomentosa.
    Data S9 GO term enrichment analyses of genes in MEturquoise module in Populus simonii.
    Data S10 Polymorphisms around Differientially methylated regions of photosynthetic hubgenes used for linkage disequilibrium analysis of Populus tomentosa.
    Data S11 Polymorphisms around Differientially methylated regions of photosynthetic hubgenes used for linkage disequilibrium analysis of Populus simonii.
    Method S1 The process of RNA-sequencing.
    Method S2 The process to calculate DMRs.
  • [1]

    Adhikari K, Mendoza-Revilla J, Sohail A, Fuentes-Guajardo M, Lampert J, et al. 2019. A GWAS in Latin Americans highlights the convergent evolution of lighter skin pigmentation in Eurasia. Nature Communications 10:358

    doi: 10.1038/s41467-018-08147-0

    CrossRef   Google Scholar

    [2]

    Fontes CG, Fine PVA, Wittmann F, Bittencourt PRL, Piedade MTF, et al. 2020. Convergent evolution of tree hydraulic traits in Amazonian habitats: implications for community assemblage and vulnerability to drought. New Phytologist 228:106−20

    doi: 10.1111/nph.16675

    CrossRef   Google Scholar

    [3]

    Li B, Förster C, Robert CAM, Züst T, Hu L, et al. 2018. Convergent evolution of a metabolic switch between aphid and caterpillar resistance in cereals. Science Advances 4:eaat6797

    doi: 10.1126/sciadv.aat6797

    CrossRef   Google Scholar

    [4]

    Xu Y, Lei Y, Su Z, Zhao M, Zhang J, et al. 2021. A chromosome-scale Gastrodia elata genome and large-scale comparative genomic analysis indicate convergent evolution by gene loss in mycoheterotrophic and parasitic plants. The Plant Journal 108:1609−23

    doi: 10.1111/tpj.15528

    CrossRef   Google Scholar

    [5]

    Chen WK, Chen L, Zhang X, Yang N, Guo JH, et al. 2022. Convergent selection of a WD40 protein that enhances grain yield in maize and rice. Science 375:eabg7985

    doi: 10.1126/science.abg7985

    CrossRef   Google Scholar

    [6]

    Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114−20

    doi: 10.1093/bioinformatics/btu170

    CrossRef   Google Scholar

    [7]

    Kim ED, Sung S. 2012. Long noncoding RNA: unveiling hidden layer of gene regulatory networks. Trends in Plant Science 17:16−21

    doi: 10.1016/j.tplants.2011.10.008

    CrossRef   Google Scholar

    [8]

    Ariel F, Romero-Barrios N, Jégu T, Benhamed M, Crespi M. 2015. Battles and hijacks: noncoding transcription in plants. Trends in Plant Science 20:362−71

    doi: 10.1016/j.tplants.2015.03.003

    CrossRef   Google Scholar

    [9]

    Wierzbicki AT, Blevins T, Swiezewski S. 2021. Long Noncoding RNAs in Plants. Annual Review of Plant Biology 72:245−71

    doi: 10.1146/annurev-arplant-093020-035446

    CrossRef   Google Scholar

    [10]

    Wang Y, Fan X, Lin F, He G, Terzaghi W, et al. 2014. Arabidopsis noncoding RNA mediates control of photomorphogenesis by red light. PNAS 111:10359−64

    doi: 10.1073/pnas.1409457111

    CrossRef   Google Scholar

    [11]

    Yu J, Qiu K, Sun W, Yang T, Wu T, et al. 2022. A long noncoding RNA functions in high-light-induced anthocyanin accumulation in apple by activating ethylene synthesis. Plant Physiology 189:66−83

    doi: 10.1093/plphys/kiac049

    CrossRef   Google Scholar

    [12]

    Kutter C, Watt S, Stefflova K, Wilson MD, Goncalves A, et al. 2012. Rapid Turnover of Long Noncoding RNAs and the Evolution of Gene Expression. Plos Genetics 8:e1002841

    doi: 10.1371/journal.pgen.1002841

    CrossRef   Google Scholar

    [13]

    Liu J, Jung C, Xu J, Wang H, Deng S, et al. 2012. Genome-Wide Analysis Uncovers Regulation of Long Intergenic Noncoding RNAs in Arabidopsis. The Plant Cell 24:4333−45

    doi: 10.1105/tpc.112.102855

    CrossRef   Google Scholar

    [14]

    Necsulea A, Soumillon M, Warnefors M, Liechti A, Daish T, et al. 2014. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505:635−40

    doi: 10.1038/nature12943

    CrossRef   Google Scholar

    [15]

    Kulkarni S, Lied A, Kulkarni V, Rucevic M, Martin MP, et al. 2019. CCR5AS lncRNA variation differentially regulates CCR5, influencing HIV disease outcome. Nature Immunology 20:824−34

    doi: 10.1038/s41590-019-0406-1

    CrossRef   Google Scholar

    [16]

    Xiao S, Cao S, Huang Q, Xia L, Deng M, et al. 2019. The RNA N6-methyladenosine modification landscape of human fetal tissues. Nature Cell Biology 21:651−61

    doi: 10.1038/s41556-019-0315-4

    CrossRef   Google Scholar

    [17]

    Fang J, Lutz JA, Wang L, Shugart HH, Yan X. 2020. Using climate-driven leaf phenology and growth to improve predictions of gross primary productivity in North American forests. Global Change Biology 26:6974−88

    doi: 10.1111/gcb.15349

    CrossRef   Google Scholar

    [18]

    Zemach A, McDaniel IE, Silva P, Zilberman D. 2010. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328:916−19

    doi: 10.1126/science.1186366

    CrossRef   Google Scholar

    [19]

    Schmid MW, Heichinger C, Coman Schmid D, Guthörl D, Gagliardini V, et al. 2018. Contribution of epigenetic variation to adaptation in Arabidopsis. Nature Communications 9:4446

    doi: 10.1038/s41467-018-06932-5

    CrossRef   Google Scholar

    [20]

    Wei X, Song X, Wei L, Tang S, Sun J, et al. 2017. An epiallele of rice AK1 affects photosynthetic capacity. Journal of Integrative Plant Biology 59:158−63

    doi: 10.1111/jipb.12518

    CrossRef   Google Scholar

    [21]

    Vidalis A, Živković D, Wardenaar R, Roquis D, Tellier A, et al. 2016. Methylome evolution in plants. Genome Biology 17:264

    doi: 10.1186/s13059-016-1127-5

    CrossRef   Google Scholar

    [22]

    Matzke MA, Mosher RA. 2014. RNA-directed DNA methylation: an epigenetic pathway of increasing complexity. Nature Reviews Genetics 15:394−408

    doi: 10.1038/nrg3683

    CrossRef   Google Scholar

    [23]

    Xu W, Yang T, Wang B, Han B, Zhou H, et al. 2018. Differential expression networks and inheritance patterns of long non-coding RNAs in castor bean seeds. The Plant Journal 95:324−40

    doi: 10.1111/tpj.13953

    CrossRef   Google Scholar

    [24]

    Du Q, Yang X, Xie J, Quan M, Xiao L, et al. 2019. Time-specific and pleiotropic quantitative trait loci coordinately modulate stem growth in Populus. Plant Biotechnology Journal 17:608−24

    doi: 10.1111/pbi.13002

    CrossRef   Google Scholar

    [25]

    Ci D, Song Y, Du Q, Tian M, Han S, et al. 2016. Variation in genomic methylation in natural populations of Populus simonii is associated with leaf shape and photosynthetic traits. Journal of Experimental Botany 67:723−37

    doi: 10.1093/jxb/erv485

    CrossRef   Google Scholar

    [26]

    McKown AD, Guy RD, Klápště J, Geraldes A, Friedmann M, et al. 2014. Geographical and environmental gradients shape phenotypic trait variation and genetic structure in Populus trichocarpa. New Phytologist 201:1263−76

    doi: 10.1111/nph.12601

    CrossRef   Google Scholar

    [27]

    Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37:907−15

    doi: 10.1038/s41587-019-0201-4

    CrossRef   Google Scholar

    [28]

    Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, et al. 2011. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Development 25:1915−27

    doi: 10.1101/gad.17446611

    CrossRef   Google Scholar

    [29]

    Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139−40

    doi: 10.1093/bioinformatics/btp616

    CrossRef   Google Scholar

    [30]

    Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, et al. 2019. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20:278

    doi: 10.1186/s13059-019-1910-1

    CrossRef   Google Scholar

    [31]

    Kang Y, Yang D, Kong L, Hou M, Meng Y, et al. 2017. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Research 45:W12−W16

    doi: 10.1093/nar/gkx428

    CrossRef   Google Scholar

    [32]

    Sun L, Zhang Z, Bailey TL, Perkins AC, Tallack MR, et al. 2012. Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study. BMC Bioinformatics 13:331

    doi: 10.1186/1471-2105-13-331

    CrossRef   Google Scholar

    [33]

    Li A, Zhang J, Zhou Z. 2014. PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. BMC Bioinformatics 15:311

    doi: 10.1186/1471-2105-15-311

    CrossRef   Google Scholar

    [34]

    Rice P, Longden I, Bleasby A. 2000. EMBOSS: the European molecular biology open software suite. Trends Genet 16:276−77

    doi: 10.1016/S0168-9525(00)02024-2

    CrossRef   Google Scholar

    [35]

    Jia H, Osak M, Bogu GK, Stanton LW, Johnson R, et al. 2010. Genome-wide computational identification and manual annotation of human long noncoding RNA genes. RNA 16:1478−87

    doi: 10.1261/rna.1951310

    CrossRef   Google Scholar

    [36]

    Tian J, Song Y, Du Q, Yang X, Ci D, et al. 2016. Population genomic analysis of gibberellin-responsive long non-coding RNAs in Populus. Journal of Experimental Botany 67:2467−82

    doi: 10.1093/jxb/erw057

    CrossRef   Google Scholar

    [37]

    Tafer H, Amman F, Eggenhofer F, Stadler PF, Hofacker IL. 2011. Fast accessibility-based prediction of RNA-RNA interactions. Bioinformatics 27:1934−40

    doi: 10.1093/bioinformatics/btr281

    CrossRef   Google Scholar

    [38]

    Langfelder P, Horvath S. 2007. Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 1:54

    doi: 10.1186/1752-0509-1-54

    CrossRef   Google Scholar

    [39]

    Ye X, Wang S, Zhao X, Gao N, Wang Y, et al. 2022. Role of lncRNAs in cis- and trans-regulatory responses to salt in Populus trichocarpa. Plant Journal 110:978−93

    doi: 10.1111/tpj.15714

    CrossRef   Google Scholar

    [40]

    Csűös M. 2010. Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 26:1910−12

    doi: 10.1093/bioinformatics/btq315

    CrossRef   Google Scholar

    [41]

    Krueger F, Andrews SR. 2011. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27:1571−72

    doi: 10.1093/bioinformatics/btr167

    CrossRef   Google Scholar

    [42]

    Thorvaldsdóttir H, Robinson JT, Mesirov JP. 2013. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 14:178−92

    doi: 10.1093/bib/bbs017

    CrossRef   Google Scholar

    [43]

    Akalin A, Kormaksson M, Li S, Garrett-Bakelman FE, Figueroa ME, et al. 2012. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biology 13:87

    doi: 10.1186/gb-2012-13-10-r87

    CrossRef   Google Scholar

    [44]

    Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9:357−59

    doi: 10.1038/nmeth.1923

    CrossRef   Google Scholar

    [45]

    Tang S, Dong Y, Liang D, Zhang Z, Ye C, et al. 2015. Analysis of the drought stress-responsive transcriptome of black cottonwood (Populus trichocarpa) using deep RNA sequencing. Plant Molecular Biology Reporter 33:424−38

    doi: 10.1007/s11105-014-0759-4

    CrossRef   Google Scholar

    [46]

    Scoffoni C, Chatelet DS, Pasquet-kok J, Rawls M, Donoghue MJ, et al. 2016. Hydraulic basis for the evolution of photosynthetic productivity. Nature Plants 2−16072

    doi: 10.1038/nplants.2016.72

    CrossRef   Google Scholar

    [47]

    Lusk CH, Wright I, Reich PB. 2003. Photosynthetic differences contribute to competitive advantage of evergreen angiosperm trees over evergreen conifers in productive habitats. New Phytologist 160:329−36

    doi: 10.1046/j.1469-8137.2003.00879.x

    CrossRef   Google Scholar

    [48]

    Wang HV, Chekanova JA. 2017. Long Noncoding RNAs in Plants. In Long Non Coding RNA Biology. Advances in Experimental Medicine and Biology, ed. Rao M. vol 1008. Singapore: Springer. pp. 133−54. https://doi.org/10.1007/978-981-10-5203-3_5

    [49]

    Wang H, Niu QW, Wu HW, Liu J, Ye J, et al. 2015. Analysis of non-coding transcriptome in rice and maize uncovers roles of conserved lncRNAs associated with agriculture traits. The Plant Journal 84:404−16

    doi: 10.1111/tpj.13018

    CrossRef   Google Scholar

    [50]

    Golicz AA, Singh MB, Bhalla PL. 2018. The Long Intergenic Noncoding RNA (LincRNA) Landscape of the Soybean Genome. Plant Physiology 176:2133−47

    doi: 10.1104/pp.17.01657

    CrossRef   Google Scholar

    [51]

    Ganguly DR, Crisp PA, Eichten SR, Pogson BJ. 2017. The Arabidopsis DNA Methylome Is Stable under Transgenerational Drought Stress. Plant Physiology 175:1893−912

    doi: 10.1104/pp.17.00744

    CrossRef   Google Scholar

    [52]

    Shen Y, Zhang J, Liu Y, Liu S, Liu Z, et al. 2018. DNA methylation footprints during soybean domestication and improvement. Genome Biology 19:128

    doi: 10.1186/s13059-018-1516-z

    CrossRef   Google Scholar

    [53]

    Shi Y, Zhang X, Chang X, Yan M, Zhao H, et al. 2021. Integrated analysis of DNA methylome and transcriptome reveals epigenetic regulation of CAM photosynthesis in pineapple. BMC Plant Biology 21:14

    doi: 10.1186/s12870-020-02814-5

    CrossRef   Google Scholar

    [54]

    Ramírez Gonzales L, Shi L, Bergonzi SB, Oortwijn M, Franco-Zorrilla JM, et al. 2021. Potato CYCLING DOF FACTOR 1 and its lncRNA counterpart StFLORE link tuber development and drought response. The Plant Journal 105:855−69

    doi: 10.1111/tpj.15093

    CrossRef   Google Scholar

    [55]

    Yuan J, Li J, Yang Y, Tan C, Zhu Y, et al. 2018. Stress-responsive regulation of long non-coding RNA polyadenylation in Oryza sativa. The Plant Journal 93:814−27

    doi: 10.1111/tpj.13804

    CrossRef   Google Scholar

    [56]

    Quan M, Liu X, Xiao L, Chen P, Song F, et al. 2021. Transcriptome analysis and association mapping reveal the genetic regulatory network response to cadmium stress in Populus tomentosa. Journal of Experimental Botany 72:576−91

    doi: 10.1093/jxb/eraa434

    CrossRef   Google Scholar

    [57]

    Hezroni H, Ben-Tov Perry R, Meir Z, Housman G, Lubelsky Y, et al. 2017. A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes. Genome Biology 18:15

    doi: 10.1186/s13059-017-1293-0

    CrossRef   Google Scholar

    [58]

    Deng J, Kong W, Mou X, Wang S, Zeng W. 2018. Identifying novel candidate biomarkers of RCC based on WGCNA analysis. Personalized Medicine 15:381−94

    doi: 10.2217/pme-2017-0091

    CrossRef   Google Scholar

    [59]

    Ishii H, Yoshimura KI, Mori A. 2009. Convergence of leaf display and photosynthetic characteristics of understory Abies amabilis and Tsuga heterophylla in an old-growth forest in southwestern Washington State, USA. Tree Physiology 29:989−98

    doi: 10.1093/treephys/tpp040

    CrossRef   Google Scholar

    [60]

    van Bezouw RFHM, Keurentjes JJB, Harbinson J, Aarts MGM. 2019. Converging phenomics and genomics to study natural variation in plant photosynthetic efficiency. The Plant Journal 97:112−33

    doi: 10.1111/tpj.14190

    CrossRef   Google Scholar

    [61]

    Yang T, Ma H, Zhang J, Wu T, Song T, et al. 2019. Systematic identification of long noncoding RNAs expressed during light-induced anthocyanin accumulation in apple fruit. The Plant Journal 100:572−90

    doi: 10.1111/tpj.14470

    CrossRef   Google Scholar

    [62]

    Washietl S, Kellis M, Garber M. 2014. Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Research 24:616−28

    doi: 10.1101/gr.165035.113

    CrossRef   Google Scholar

    [63]

    Tao X, Li M, Zhao T, Feng S, Zhang H, et al. 2021. Neofunctionalization of a polyploidization-activated cotton long intergenic non-coding RNA DAN1 during drought stress regulation. Plant Physiology 186:2152−68

    doi: 10.1093/plphys/kiab179

    CrossRef   Google Scholar

    [64]

    Liu J, Last RL. 2017. A chloroplast thylakoid lumen protein is required for proper photosynthetic acclimation of plants under fluctuating light environments. PNAS 114:E8110−E8117

    doi: 10.1073/pnas.1712206114

    CrossRef   Google Scholar

    [65]

    Klimmek F, Sjödin A, Noutsos C, Leister D, Jansson S. 2006. Abundantly and rarely expressed Lhc protein genes exhibit distinct regulation patterns in plants. Plant Physiology 140:793−804

    doi: 10.1104/pp.105.073304

    CrossRef   Google Scholar

    [66]

    Peterson RB, Schultes NP. 2014. Light-harvesting complex B7 shifts the irradiance response of photosynthetic light-harvesting regulation in leaves of Arabidopsis thaliana. Journal of Plant Physiology 171:311−18

    doi: 10.1016/j.jplph.2013.09.007

    CrossRef   Google Scholar

    [67]

    Kawakatsu T, Huang SC, Jupe F, Sasaki E, Schmitz RJ, et al. 2016. Epigenomic diversity in a global collection of Arabidopsis thaliana Accessions. Cell 166:492−505

    doi: 10.1016/j.cell.2016.06.044

    CrossRef   Google Scholar

    [68]

    De Kort H, Panis B, Deforce D, Van Nieuwerburgh F, Honnay O. 2020. Ecological divergence of wild strawberry DNA methylation patterns at distinct spatial scales. Molecular Ecology 29:4871−81

    doi: 10.1111/mec.15689

    CrossRef   Google Scholar

    [69]

    Ariel F, Jegu T, Latrasse D, Romero-Barrios N, Christ A, et al. 2014. Noncoding transcription by alternative RNA polymerases dynamically regulates an auxin-driven chromatin loop. Molecular Cell 55:383−96

    doi: 10.1016/j.molcel.2014.06.011

    CrossRef   Google Scholar

    [70]

    Wang M, Yuan D, Tu L, Gao W, He Y, et al. 2015. Long noncoding RNAs and their proposed functions in fibre development of cotton (Gossypium spp.). New Phytologist 207:1181−97

    doi: 10.1111/nph.13429

    CrossRef   Google Scholar

    [71]

    Rakyan VK, Down TA, Balding DJ, Beck S. 2011. Epigenome-wide association studies for common human diseases. Nature Reviews Genetics 12:529−41

    doi: 10.1038/nrg3000

    CrossRef   Google Scholar

    [72]

    Lu W, Xiao L, Quan M, Wang Q, El-Kassaby YA, et al. 2020. Linkage-linkage disequilibrium dissection of the epigenetic quantitative trait loci (epiQTLs) underlying growth and wood properties in Populus. New Phytologist 225:1218−33

    doi: 10.1111/nph.16220

    CrossRef   Google Scholar

    [73]

    De Kort H, Toivainen T, Van Nieuwerburgh F, Andrés J, Hytönen TP, et al. 2022. Signatures of polygenic adaptation align with genome-wide methylation patterns in wild strawberry plants. New Phytologist 235:1501−14

    doi: 10.1111/nph.18225

    CrossRef   Google Scholar

    [74]

    Eichten SR, Briskine R, Song J, Li Q, Swanson-Wagner R, et al. 2013. Epigenetic and genetic influences on DNA methylation variation in maize populations. The Plant Cell 25:2783−97

    doi: 10.1105/tpc.113.114793

    CrossRef   Google Scholar

    [75]

    Xu G, Lyu J, Li Q, Liu H, Wang D, et al. 2020. Evolutionary and functional genomics of DNA methylation in maize domestication and improvement. Nature Communications 11:12

    doi: 10.1038/s41467-019-13875-y

    CrossRef   Google Scholar

    [76]

    Xiao L, Du Q, Fang Y, Quan M, Lu W, et al. 2021. Genetic architecture of the metabolic pathway of salicylic acid biosynthesis in Populus. Tree Physiology 41:2198−215

    doi: 10.1093/treephys/tpab068

    CrossRef   Google Scholar

    [77]

    Yang Z, Xu F, Wang H, Teschendorff AE, Xie F, et al. 2021. Pan-cancer characterization of long non-coding RNA and DNA methylation mediated transcriptional dysregulation. EBioMedicine 68:103399

    doi: 10.1016/j.ebiom.2021.103399

    CrossRef   Google Scholar

    [78]

    Yang Y, Chen L, Gu J, Zhang H, Yuan J, et al. 2017. Recurrently deregulated lncRNAs in hepatocellular carcinoma. Nature Communications 8:14421

    doi: 10.1038/ncomms14421

    CrossRef   Google Scholar

    [79]

    Marney CB, Anderson ES, Adnan M, Peng KL, Hu Y, et al. 2021. p53-intact cancers escape tumor suppression through loss of long noncoding RNA Dino. Cell Reports 35:109329

    doi: 10.1016/j.celrep.2021.109329

    CrossRef   Google Scholar

    [80]

    Di Ruscio A, Ebralidze AK, Benoukraf T, Amabile G, Goff LA, et al. 2013. DNMT1-interacting RNAs block gene-specific DNA methylation. Nature 503:371−76

    doi: 10.1038/nature12598

    CrossRef   Google Scholar

    [81]

    Zhou X, Jacobs TB, Xue LJ, Harding SA, Tsai CJ. 2015. Exploiting SNPs for biallelic CRISPR mutations in the outcrossing woody perennial Populus reveals 4-coumarate: CoA ligase specificity and redundancy. New Phytologist 208:298−301

    doi: 10.1111/nph.13470

    CrossRef   Google Scholar

    [82]

    Vojta A, Dobrinic P, Tadic V, Bockor L, Korac P, et al. 2016. Repurposing the CRISPR-Cas9 system for targeted DNA methylation. Nucleic Acids Research 44:5615−28

    doi: 10.1093/nar/gkw159

    CrossRef   Google Scholar

  • Cite this article

    Zhou J, Song F, He Y, Zhang W, Xiao L, et al. 2023. LncRNA evolution and DNA methylation variation participate in photosynthesis pathways of distinct lineages of Populus. Forestry Research 3:3 doi: 10.48130/FR-2023-0003
    Zhou J, Song F, He Y, Zhang W, Xiao L, et al. 2023. LncRNA evolution and DNA methylation variation participate in photosynthesis pathways of distinct lineages of Populus. Forestry Research 3:3 doi: 10.48130/FR-2023-0003

Figures(6)

Article Metrics

Article views(7276) PDF downloads(453)

ARTICLE   Open Access    

LncRNA evolution and DNA methylation variation participate in photosynthesis pathways of distinct lineages of Populus

Forestry Research  3 Article number: 3  (2023)  |  Cite this article

Abstract: During the independent process of evolution in plants, photosynthesis appears to have been under convergent evolution to adapt to specific selection pressure in their geographical regions. However, it is unclear how lncRNA regulation and DNA methylation are involved in the phenotypic convergence in distinct lineages. Here, we present a large-scale comparative study of lncRNA transcription profile and whole-genome bisulfite sequencing (WGBS) data in two unrelated Populus species, selected from three relatively overlapping geographical regions. The results indicated that 39.75% lncRNAs of Populus tomentosa were shown to have homologous sequences in the 46.99% lncRNA of Populus simonii. Evolutionary analysis revealed that lncRNAs showed a rapid gain rate in the Populus lineage. Furthermore, co-expression networks in two Populus species identified eight lncRNAs that have the potential to simultaneously cis- or trans-regulate eight photosynthetic-related genes. These photosynthetic lncRNAs and genes were predominantly expressed in accessions from the southern region, indicating a conserved spatial expression in photosynthetic pathways in Populus. We also detected that most lncRNA targeted photosynthetic genes hypomethylated in promoter regions of Southern accessions compared with Northern accessions. Geographical DMRs correlated with genetic SNP variations in photosynthetic genes among Populus from the three geographic regions, indicating that DNA methylation coordinated with lncRNAs in convergent evolution of photosynthesis in Populus. Our results shed light on the evolutionary forces acting on patterns of lncRNA and DNA methylation, and provide a better understanding of the genetic and epigenetic mechanism in photosynthetic convergence evolution.

    • Convergent evolution has been extensively reported in cases of humans[1], woody plants[2], cereal crops[3], and Orchidaceae plants[4]. Many morphological and physiological traits have been undergoing strong convergent selection, especially in plants. Most convergent traits are not linked to core metabolism[5], but photosynthesis is one of the exceptions[6]. Biochemical pathways involved in the capture of atmospheric carbon is more variable than sequestration of light energy. Thus, investigation of genetic and epigenetic regulatory mechanisms canalizing the convergent evolution in photosynthetic pathways, such as non-coding RNAs and DNA methylation is critical for plants' growth and development during long-term local environment adaptation.

      In recent years, long non-coding RNAs (lncRNAs) was regarded as one of the important regulators of gene expression of multiple biological processes[79]. Several studies have shown that lncRNAs involved in the photosynthesis process. For example, one Arabidopsis lncRNA HIDDEN TREASURE 1 (HID1) acts through PHYTOCHROMEINTERACTING FACTOR 3 (PIF3), which promotes photomorphogenesis in continuous red light[10]. In the anthocyanin-associated coloration pathway, lncRNA MdLNC610 upregulate the expression level of MdACO1 by increasing the ethylene production and anthocyanin levels under high-light treatment[11]. Moreover, lncRNAs evolve rapidly and are poorly conserved among distantly related species[12,13]. LncRNA evolutionary analysis provides great benefit to the understanding of the functions of lncRNAs and the evolution of regulatory networks. The function of ancient lncRNAs may regulate embryonic development and conserved lncRNAs in lncRNA families probably function in many fundamental processes[14]. However, how lncRNAs play roles of adaptative evolution in lineage plants, and to what extent can lncRNAs carry similar functions in photosynthetic pathways in plants, remains largely unknown.

      The function of lncRNAs in regulating gene expression can be affected by genetic and epigenetic variation[1517]. DNA methylation is one of the vital epigenetic modifications that is widespread in the genome of eukaryotes[18]. DNA methylation is heritable during the change of development or affected by environmental conditions. Their adaptive variation may directly evolve through adaptive responses to a changing environment or arise from adaptive genetic variation. Environment-induced epigenetic variation may be limited and restricted to certain regions of the genome during the inheritance[19]. The epiallele near the functional gene in the maintainenance of chloroplast structures participate in the regulation of many genes associated with photosynthesis processes[20]. Over long timescales, genetic variations affect DNA methylation pattern, and associated with segregating structural variants or with mutations in methyltransferase genes[21]. LncRNAs could guide DNA methylation and silence target genes, investigation of the DNA methylation status would enhance the understanding of the regulatory roles of lncRNAs[22]. The expression level of lncRNA was also tightly linked with DNA methylation during the plants development and adaptation to the environment[23]. Yet, more investigations are still needed to unearth the epigenetic variation underlying photosynthetic pathways to response to local adaptation and its lncRNA relevance in the related species.

      In this study, transcriptome analysis was performed to systematically identify lncRNAs and characterize their expression patterns in two unrelated Populus species, P. tomentosa and P. simonii. The convergent emergence or loss of photosynthetic phenotypes may facilitate adaptation to ecologically similar environments. The regulatory roles of the lncRNAs were investigated by co-expression between lncRNAs and their target genes enriched in photosynthetic pathways. Based on the evolutionary analysis of lncRNAs of nine diverse plant species including P. tomentosa, P. simonii, Populus trichocarpa, Salix purpurea, Arabidopsis thaliana, Glycine max, Oryza sativa, Zea mays, and Physcomitrella patens, we identified rapid evolvement even between closely related plants. Also, the potential DNA methylation and adaptive DNA sequences that can be subject to evolutionary divergence as methylation variation correlated with SNP variation. The transcription of lncRNAs target genes is fine-tuned through epigenetic modifications. As a result, photosynthetic genes are representative of adaptive evolution governed by the joint and complementary actions of lncRNAs and epigenetic processes.

    • Ten P. tomentosa accessions and ten P. simonii accessions were collected from their natural population clonal garden in Guan Xian County, Shandong Province, China (36°10′ N, 114°35′ E). The sampling accessions were selected from the Southern (S), Northwestern (NW), and Northeastern (NE) geographical regions according to their natural distribution[24,25]. In 2019, tree seedlings were planted with three replicates in the same location using the root segment technique.

      The measurement for photosynthetic traits were taken on a Li-COR 6400XT portable photosynthesis system (Lincoln, NE, USA). The leaf chamber conditions were: light intensity 1,000 μmol·m−2·s−1 PAR and flow 400 μmol·s−1. Only mature leaves of each plant were measured. All sampling was measured on clear, sunny days between 09:00 and 11:00 in June, 2019. The measurement was performed using three replications per individual. We measured net photosynthetic rate (Pn, µmol·m−2·s−1), conductance to H2O (Cond, mol·m−2·s−1), intercellular CO2 concentration (Ci, µmol·mol−1), and transpiration rate (Trmmol, µmol·m−2·s−1). Water use efficiency (WUE) was determined by photosynthetic rate over transpiration[26]. All measured leaves for each individual were collected, frozen in liquid nitrogen, and stored at −80 °C until use.

    • Total RNAs were isolated from leaves of both P. tomentosa and P. simonii samples using the Plant Qiagen RNeasy kit which were used for RNA-seq (Methods S1). The clean reads of P. tomentosa were mapped to the P. tomentosa reference genome, and the clean reads of P. simonii were mapped to P. trichocarpa reference genome v4.0 (www.phytozome.net) using Hisat2 version2.1.0[27] (Supplemental Table S2). FPKM (fragments per kilobase of transcript per million fragments) values were calculated by Cufflinks v2.1.1[28]. The edgeR software package[29] was employed to identify differentially expressed genes (DEGs) between pairs of samples from different geographical regions of P. tomentosa and P. simonii, respectively, with FDR ≤ 0.05 and fold change ≥ 1.

    • In this study, we integrated RNA-seq data sets of ten P. tomentosa and ten P. simonii, respectively. Each transcriptome was assembled separately by StringTie2[30] and merged by gffcompare, while the transcript with FPKM > 0.5, length > 200, coverage > 1 was filtered. Coding Potential Calculator2 (CPC2) software[31], Coding Noncoding Index (CNCI) software[32], and PLEK[33] were used to evaluate the coding potential of the remaining transcripts. All transcripts with CPC2 labeled as 'coding', or CNCI > 0, or PLEK scores > 0 were discarded. Finally, the class code 'u' refers to the long intergenic noncoding RNAs (lincRNAs), class code 'x' refers to long noncoding natural antisense transcripts (lncNAT), class code 'j' refers to the sense transcripts, and class code 'i' refers to the intronic transcripts. Differentially expressed lncRNAs were calculated in the same way as DEGs above. The GC contents of these lncRNAs were calculated with the GEECEE tool in EMBOSS[34].

      Homologous transcription of lncRNA between lncRNAs transcripts of P. tomentosa and P. simonii was performed using the BLASTN software. Alignments with E-value < 1e-5, coverage > 50%, identity > 80% were identified as Conserved lncRNA. Otherwise, lncRNAs were denoted as species-specific lncRNAs.

    • The potential target genes of lncRNAs were predicted via cis and trans analyses. PCGs around lncRNAs within 10 kb upstream or downstream in genome position were pointed as the potential cis-target genes[35,36]. The potential trans-targets in the Populus PCGs database was based on PCGs sequence complementarity and RNA duplex energy predictions. First, protein sequences of lncRNAs target PCGs in P. tomentosa and P. simonii were used as query sequences in BLASTN with E-value < 1e-5 and identity > 80% to identify homologs. Then RNAplex was used to screen lncRNA–PCGs (duplexes RNAplex -E-60) that exhibited complementary base pairing[37].

    • The WGCNA 1.70.3 package in R[38] was used to construct the unsigned co-expression network. One-step network construction and module detection method were adopted in both P. tomentosa and P. simonii with the following parameters: the minModuleSize was 100, and the cut height was 0.25. The soft power was 5 and 12 in P. tomentosa and P. simonii networks, respectively. To relate traits to the network, we calculated correlations between module eigengenes and the five photosynthetic traits.

    • FASTA sequences for the lncRNAs from nine plants were downloaded from CANTATAdb2.0 and NCBI. 2,990 lncRNAs of Populus trichocarpa from Ye et al.[39], 2003 lncRNAs from Physcomitrella patens NCBI annotation (www.ncbi.nlm.nih.gov/genome/?term=Physcomitrella+patens), 3,270 lncRNAs from Oryza sativa NCBI annotation (www.ncbi.nlm.nih.gov/genome/?term=Oryza+sativa), 5,355 lncRNAs from Zea mays NCBI annotation (www.ncbi.nlm.nih.gov/genome/?term=Zea+mays), the 4,070 predicted lncRNAs of Salix purpurea from CANTATdb 2.0 database, 3,365 lncRNAs from Glycine max NCBI annotation (www.ncbi.nlm.nih.gov/genome/?term=Glycine+max), and 3,480 lncRNAs from Arabidopsis thaliana NCBI annotation (www.ncbi.nlm.nih.gov/genome/?term=Arabidopsis+thaliana).

    • To gain insight into lncRNA evolution in plants, nine plants were used for comparisons. First, the phylogenetic tree was obtained via OrthoFinder. Single-copy lncRNAs were aligned using nucleotide sequence by MAFFT, and a species tree was built using IQTREE with the default parameters. r8s was performed to establish an ultrametric tree (chronogram) using species tree rooted with Physcomitrella patens. The birth, death, age, and ancestral contents of lncRNA families were assessed via COUNT software[40] using Dollo-Parsimony with default settings.

    • Total DNA of ten P. tomentosa accessions and ten P. simonii was extracted from the collected leaves. DNA extraction was performed using a DNase Plant Mini Kit (Qiagen China, Shanghai, China) for whole-genome bisulfite sequencing analysis. The libraries were sequenced on the Illumina HiSeq 4000, and the sequencing reads were filtered using Trimmomatic[6]. Paired-end reads of P. tomentosa and P. simonii genomes were aligned to P. tomentosa and P. trichocarpa V4.0 genome respectively using Bismark (version 0.16.1)[41] with the default parameters. Methylation cytosine sites with less than five methylated reads were removed. Integrative Genomics Viewer software[42] was used to visualize the DNA methylation Dataset. MethylKit were used to identified Differentially methylated regions (DMRs)[43] (Methods S2).

    • A total of ten accessions of P. tomentosa and P. simonii were sequenced on the Illumina GA II platform with an average depth of 15-fold genome coverage. The clean data were collected by removing low-quality reads (< 10% of nucleotides with quality < Q20). The paired-end data were aligned to the P. tomentosa and P. trichocarpa V4.0 reference genome using Bowtie 2 software with default parameters[44]. Samtools and Genome Analysis Toolkit (GATK) were used to perform single nucleotide polymorphisms (SNPs) calling. Low-quality SNPs that missing data ≥ 20% were filtered. 12,651,394 and 4,996,309 high-quality SNPs were retained for further analysis. To determine the relationship between DMRs and SNPs, we computed a one-sided permutation test between each pair of DMRs and SNPs within 2 kb upstream and downstream of each DMR. A DMR was determined to correlate with SNPs when there are at least three SNPs significant correlates with this DMR (one-sided permutation p-value < 0.01).

    • The shape of leaves were associated with photosynthetic abilities of plants and probably contribute to photosynthetic differences of different species[4547]. P. tomentosa and P. simonii accessions displayed considerable variation in size and shape of leaves (Fig. 1a). As there was considerable macroscopic variation in leaf characteristics, we next investigated whether differences observed in features affected photosynthetic performance. Interestingly, there were statistically significant differences in net photosynthetic rate (P = 2.18 × 10−4), conductance to H2O (P = 1.77 × 10−3), and intercellular CO2 concentration (P = 4.0 × 10−3) among two Populus. For all accessions, photosynthetic traits varied greatly especially in P. tomentosa, with coefficients of variation (CV) values ranging from 9.48% (Conductance to H2O) to 38.79% (Water use efficiency). Additionally, all five phenotypic traits showed significant differentiation among the geographical regions of both P. tomentosa and P. simonii (P < 0.05, post-test by LSD) (Supplemental Table S1). For example, the net photosynthetic rate of accessions from the Southern region in both Populus was significantly higher than those from the Northern regions (Fig. 1b). This showed that photosynthetic variation between accessions from geographical regions in P. tomentosa and P. simonii may be partly due to the underlying selective pressure in their environments[48]. Therefore, differences in transcripts expression and regulatory networks are critical to determining interspecific and intraspecific phenotypic variation.

      Figure 1. 

      Morphological and photosynthetic variation of Populus tomentosa and Populus simonii from different geographical regions. (a) Leaf size and shape of P. tomentosa and P. simonii from the Southern geographical region (S), the Northwestern geographical region, and the Northeastern geographical region (NE). Numbers represent accession number in its geographical region. Scale bar, 5 cm. (b) Photosynthetic traits of P. tomentosa (blue) and P. simonii (orange) from three geographical regions. Photosynthetic traits include net photosynthetic rate (Pn), conductance to H2O (Cond), intercellular CO2 (Ci), transpiration rate (Tr), and water use efficiency (WUE). Data represent means ± SE. *, p < 0.05; **, p < 0.01; ***, p < 0.001.

    • To obtain a comprehensive profile of lncRNAs in different Populus species, we assembled transcriptome using the strand-specific RNA-seq data from ten P. tomentosa accessions and ten P. simonii accessions. In total, we identified a total of 1,600 and 1,013 high-confidence lncRNAs in P. tomentosa and P. simonii, respectively (Data S1, S2). Four classes of lncRNAs were identified, and the majority of them were long intergenic noncoding RNAs (lincRNAs) and long noncod natural antisense transcripts (lncNATs) in both species (Fig. 2a). We then investigated the characters and expression profile of these lncRNAs between two species. The lncRNAs are unevenly distributed across the 19 chromosomes of both Populus species, and there was no difference in GC content between P. tomentosa (37.13%) and P. simonii (37.35%) for lncRNA (Fig. 2b). According to genomic locations, the lncRNAs of P. tomentosa range in length from 230 to 13,266 nucleotides(nt), with a median length of 2,009 nt that is significantly shorter than the median length (2,255 nt) of P. simonii (Fig. 2c). On average, the lncRNAs of P. simonii contain a significantly fewer exons than the P. tomentosa lncRNAs. As expected, most of the lncRNAs comprised fewer exons (> 50% consist of one exon) than PCGs (Fig. 2d).

      Figure 2. 

      Identification and characterization of lncRNAs in two Populus species. (a) Percentage distribution of different classifications of the total lncRNAs in Populus tomentosa and Populus simonii. (b) The GC content of lncRNAs and protein-coding genes (PCGs) in P. tomentosa and P. simonii. (c) Length density distributions of lncRNAs and PCGs. The x-axis indicates the log10-transformed sequence length and the y-axis indicates the density value. (d) Percentage distribution of exon numbers for PCGs and lncRNAs.

    • To evaluate lncRNA differences between Populus species and intraspecific variation, we analyzed the lncRNAs expression of each accession of both Populus species. The expression level of the lncRNAs from both species was lower than for the PCGs[28] (Fig. 3a). The overall expression levels of lncRNAs in P. simonii were lower than that of P. tomentosa. We next identified the differentially expressed lncRNAs and PCGs between geographical regions in both species (Data S3S6). Intriguingly, differentially expressed lncRNAs from P. tomentosa (41.25%−45.06%) and P. simonii (10.86%−17.47%) occupied a large proportion in their total lncRNAs, but PCGs had a lower expression variation ratio among geographical regions (P. tomentosa, 8.18%−12.20%; P. simonii, 3.88%−7.36%) (Fig. 3b).

      Figure 3. 

      Expression profiles and evolution of lncRNAs in Populus tomentosa and Populus simonii. (a) Box plot of expression levels of lncRNAs and protein-coding genes (PCGs) in P. tomentosa and P. simonii. Student t-test was used to calculate the p-value. *** p < 0.001. (b) The numbers of differentially expressed (DE) lncRNAs in P. tomentosa and P. simonii between accessions from the Southern geographical region and the Northwestern geographical region, the Southern geographical region and the Northeastern geographical region, the Northwestern geographical region and the Northeastern geographical region. (c) Phylogenetic tree and number of gene families displaying expansion (green) and contraction (red) among nine plant species. Branch lengths reflect evolutionary divergence times in million years (Myrs) inferred from timetrees. Numbers of gained (+) and lost (−) lncRNA families Myr–1 (in red) are indicated next to each branch. (d) The distribution of chromosomes (outer) and lncRNAs (inner) in P. tomentosa (blue) and P. simonii (red). The green lines in the inner rings show lnRNAs that were homoeologous in two Populus lineages. (e), (f) Scatter plots of Pearson correlation coefficient and p-value between the expressions of the lncRNAs and their target PCGs in P. tomentosa and P. simonii. The lncRNA-mRNA with correlation coefficient ≥ 0.6 (red) or ≤ −0.6 (blue), and p-value ≤ 0.05 are considered positive or negative pairs. For screen visualization, p-value were minus log10 transformed after a constant value (0.001) was added.

      LncRNAs are highly diverged at the nucleotide level among plant species but may have high sequence conservation at the intraspecies and interspecies levels. lncRNA orthologous pairs were identified through reciprocal best hits, and they were connected using the single-linkage clustering method to construct lncRNA families. The phylogenetic tree revealed that the evolution of the lncRNAs spaned around 277 Myrs (million years). We identified 1,033 lncRNA families with a total of 3,775 conserved lnRNAs. We then sought to investigate the birth and death rates and the ancestor lncRNA families during the plant evolution. Among these lncRNA families, the number of lncRNA families increased from 12 ancestral families to 35−557 families in all the plant species (Fig. 3c). Notably, terminal branches gained more families than internal branches, particularly in Salicaceae trees. The highest net gain rates in recent terminal branches in Salicaceae trees ranging from 1.65 to 5.02 families Myr−1, indicating a high rate of novel lncRNA families in forest trees. In addition, 482 (47.58%), 669 (22.39%), and 746 (18.33%) lncRNAs of P. tomentosa were found to be conservation in P. simonii, P. trichocarpa, and Salix purpurea, respectively. These results suggested that most lncRNAs were conserved between P. tomentosa and P. simonii despite rapid gene fractionation.

      To explore whether lncRNAs contribute to evolutionary pressures on plant photosynthesis, we compared conserved lncRNAs with species-specific lncRNAs in two Populus species. Using the reciprocal align features of BLASTN, there are 3,305 homoeologous lncRNA pairs between P. tomentosa and P. simonii (Fig. 3d; Data S7). We found that 39.75% lncRNA of P. tomentosa had homologous copies in the 46.99% lncRNAs of P. simonii. These results suggested that the vast majority of lncRNAs were species-specific or limited to closely related species.

      To identify genes potentially regulated by lncRNAs and the potential effects of lncRNAs, we identified 685 and 452 lncRNA-PCGs pairs in P. tomentosa and P. simonii, respectively. Expression analysis on the lncRNAs-PCGs pairs showed that 81.46% and 80.09% of them have a positive correlation (|rp| ≥ 0.6, P < 0.05) in two Populus species (Fig. 3e, f). On average, the rp between expression of P. tomentosa lncRNAs and their targets PCGs (0.48) was higher than that between adjacent PCGs pairs (0.38), which was similar to lncRNA-random PCG pairs (0.39). These correlations were much stronger than those of PCG-random PCG pairs (0.18) (Supplemental Fig. S1a). Moreover, the ratio of extreme expression correlation (|rp| > 0.8) for lncRNA-PCG pairs (25.84%) was higher than those of lncRNA-random PCG pairs (19.09%) and PCG-random PCG pairs (23.54%) (Supplemental Table S2). Similar results were observed in P. simonii (Supplemental Fig. S1b).

    • Although photosynthesis is one of the basic biochemical reactions of plants, they exhibit dramatic differences in multiple characteristics. Differences in gene expression and regulatory networks are critical for determining photosynthesis traits. We performed a weighted gene co-expression network analysis (WGCNA) on PCGs and lncRNAs. Accordingly, we obtained 16 and 13 distinctly expressed modules in P. tomentosa and P. simonii, respectively (Fig. 4a, b). The modules closely related to photosynthetic traits and GO term were of particular interest in this study. 'Photosynthesis, light reaction' (GO:0019684) and 'photosynthesis' (GO:0015979) GO terms were enriched in the subset of PCGs in MEbrown of P. tomentosa network and MEturquoise of P. simonii network (Fig. 4c; Supplemental Fig. S2; Data S8, S9). In P. tomentosa, the module 'Brown' comprised transcripts that were restrained in Northern regions (Fig. 4d). For seven genes enriched in photosynthetic terms, four hub genes including PtoPPL1, PtoLHCA1, PtoPnsb4, and PtoMPH2 were highlighted in the network due to their high eigengene connectivity (Supplemental Table S3). Among those hubgenes, we noted a cis-regulated and three trans-regulated lncRNAs, which were differentially expressed between distinct geographical regions (Fig 4e). In P. simonii, expression of lncRNAs and PCGs in module 'turquoise' was also highly expressed in the Southern geographical region. Three of the nine photosynthetic-enriched genes were highlighted as high eigengene connectivity, including PsiLHCB7, PsiPSBR, and PsiKUP1 (Supplemental Table S3). Gene PsiKUP1, a photosynthetic protein pathway gene that encodes a high-affinity potassium transporter, exhibited the highest expressed level in the S region as compared with NW and NE regions (Fig 4f). Meanwhile, we demonstrated that Psi_XLOC_022701 in the S region was more than two times higher expression than those accesions in the N region, indicating that Psi_XLOC_022701 may trans-regulate PsiKUP1.

      Figure 4. 

      Photosynthetic-associated modules in Populus tomentosa and Populus simonii. Statistical analysis of module–trait correlations in (a) P. tomentosa and (b) P. simonii. The rows and columns indicate the modules and traits, respectively. Cells are colored from blue to red according to the Pearson correlation coefficient in parentheses, and the star-marked cells indicate the highest significant association between the trait and its corresponding module. (c) Circos plot shows the enrichment and differentially expressed genes in each ontology of Module Brown in P. tomentosa. From the outer to the inner circle, is gene ontology (GO) id, number of genes and P -value, number of differentially expressed genes, and enrichment factors. (d) Box plot and beewarm plot of expression levels of lncRNAs and protein-coding genes (PCGs) in Module Brown from P. tomentosa and Module turquoise from P. simonii. Student t-test was used to calculate the p-value. *** p < 0.001, ** p < 0.01. (e) Scatter plots show correlation between the expression of PtoMPH2 and net photosynthetic rate, expression of Pto_XLOC_013503 and net photosynthetic rate, and between expression of PtoMPH2 and Pto_XLOC_013503. Pto_XLOC_013503 positively regulate a chloroplast thylakoid lumen protein, PtoMPH2. r, Pearson correlation coefficient; p, significance of the correlation between trait and gene expression. (e) Scatter plots show correlation between the expression of PtoMPH2 and net photosynthetic rate, expression of Pto_XLOC_013503 and net photosynthetic rate, and between expression of PtoMPH2 and Pto_XLOC_013503. r, Pearson correlation coefficient; p, significance of the correlation between trait and gene expression. (f) Scatter plots show correlation between the expression of PtoKUP1 and net photosynthetic rate, expression of Pto_XLOC_022701 and net photosynthetic rate, and between expression of PtoKUP1 and Pto_XLOC_022701. r, Pearson correlation coefficient; P, significance of the correlation between trait and gene expression.

    • Conserved lncRNAs across species can provide further information to demonstrate their possible functions and the processes[49,50]. Combining the results of co-expression analysis and the origination of lncRNAs in two poplars, we were able to update the putative interspecies and intraspecies expression of the lncRNAs involved in the photosynthesis pathway (Data S3). For P. tomentosa, Pto_XLOC_026190 was highly expressed in the S region (Fig. 5a) and trans-regulated Ptom.010.01955 (facilitates the assembly of the photosystem II supercomplexes, PtoPPL1) (Fig. 5b). The regional differentiation was potentially similar to the patterns of Psi_XLOC_011671 homologous lncRNA (Fig. 5c), which positively regulated Potri.007G061400 (encoding a light stimulus response gene, PsiNIP2) (Fig. 5d). Moreover, another Pto_XLOC_001831 that contained two homologous lncRNA pairs in P. simonii was predicted to target Ptom.012.00615 (encoding a subunit of the chloroplast NAD(P)H dehydrogenase complex, which involved in PSI cyclic electron transport, PtoPnsB4). We found that PtoPnsB4 was highly expressed in the NE region for P. tomentosa. Additionally, homologous lncRNAs of Pto_XLOC_001831 in P. simonii also targeted two photosynthetic-related genes including PsiPSBR and PsiLHCB7. This can be inferred that Pto_XLOC_001831 showed high similarity with Psi_XLOC_022416 and may have functions in the regulatory network of photosynthesis. Both lncRNAs were highly expressed in the S region compared to the NE region. These phenomena indicated that Pto_XLOC_001831 was an evolutionarily more important lncRNA than Pto_XLOC_026190 in P. tomentosa (Fig. 5e). It also suggested that the regulatory module is highly conserved across different poplar species and may be functionally maintained.

      Figure 5. 

      LncRNAs have positive regulatory roles for photosynthetic genes and working model of Psi_XLOC_022416. (a) Pto-XLOC_026190 and its regulatory gene (b) PtoPPL1. The RNA-seq coverage of the genes was extracted from the leaf transcriptomes of P. tomentosa accessions from the Southern geographical region (S), the Northwestern geographical region (NW), and the Northeastern geographical region (NE), and the numbers in the panel indicate the mapping read counts of junction reads or exonic reads. (c) Psi-XLOC_011671 and its regulatory gene (d) PtoNIP1. The RNA-seq coverage of the genes was extracted from the leaf transcriptomes of P. simonii accessions from the Southern geographical region (S), the Northwestern geographical region (NW), and the Northeastern geographical region (NE), and the numbers in the panel indicate the mapping read counts of junction reads or exonic reads. (e) A proposed working model of Psi_XLOC_022416 in regulating PsiPSBR and PsiLHCB7 expression to modulate Populus photosynthetic variation in different geographical regions.

    • Many studies have found that DNA methylation level correlated with the expression level of PCGs. The covariation of DNA methylation and other genetic factors causes phenotypic variation during plants' growth and development[5153]. To investigate the divergence of genomic DNA methylation on photosynthetic variation, we analyzed the difference in methylation patterns associated with photosynthetic variations in P. tomentosa and P. simonii accessions. More than 100 million cytosines were sequenced in each sample, a number sufficient for further analysis (Supplemental Table S4). P. tomentosa displayed an average of 66.31%, 46.85%, 3.75% methylation in CG, CHG, and CHH contexts, respectively. Correspondingly, P. simonii presented a mean level of 40.34%, 32.55%, and 4.81% methylation in CG, CHG, and CHH contexts, respectively. We detected distinct differences in CV among three contexts, with the lowest diversity for CG context and the highest for CHH context (Supplemental Table S5). Accessions from the same geographical regions were often closely correlated, especially in CG context (Supplemental Fig. S3a, S3b). These results indicated that both Populus accessions possessed significant methylation variability and could contribute to trait variation.

      We identified DMRs to further investigate the differences in DNA methylation between three geographical regions in P. tomentosa and P. simonii. Our analysis identified 51,892, 61,161, and 28,981 DMRs in S vs. NW, S vs.NE, and NW vs. NE in P. tomentosa, respectively (Supplemental Table S6). Hypo-DMRs accounted for 54.82%−82.04% of the total DMRs. Similarly, for P. simonii, 23,840 and 34,400 hyper-DMRs were found in NW and NE regions when compared with the accessions from S region, suggesting that the lower methylation in Southern regions in both P. tomentosa and P. simonii (Fig. 6a). It is noteworthy that, despite DMRs in intergenic regions, 16.05% and 16.65% of DMRs occurred in promoters, suggesting that geographical regions might affect DNA methylation within promoter regions (Supplemental Fig. S4a).

      Figure 6. 

      Geographical variation of DNA methylation and genetic variation affect the function of lncRNA. (a) Count of hyper and hypo differentially methylated regions (DMRs) in P. tomentosa and P. simonii from three geographical regions in three contexts. (b), (c) Integrative Genomics Viewer plots of WGBS tracks in accessions from the Southern geographical region (S), the Northwestern geographical region (NW), and the Northeastern geographical region (NE), as well as RNA-seq tracks over (b) PtoPnsB4 and (c) PtoLHCA1. The position of the DMRs is indicated by the pink shadow area. (d) Circos plot representing interaction DMR-SNP pairs. The circles show the structure of gene PtoPnsB4. Blue, green, purple, white, and yellow arcs represent promoter, untranslated region (UTR), exon, intron, and downstream region, respectively. Interior lines represent the pairwise interactions of DMR-SNP pairwise. Orange, red, and blue lines indicate intra-gene interactions of different DMRs between DMR-SNP pairs. (e) A proposed working model that DNA methylation variation participates in the regulation of Pto_XLOC_001831that modulates Populus convergence evolution of photosynthesis.

    • Based on the presence of lncRNAs and genes with DMRs (designated as differentially methylated genes, DMGs), we identified lncRNA targets that were related to DNA methylation variation. Abundant DMGs (70.00% and 52.40%) were identified in the total lncRNA target genes in P. tomentosa and P. simonii (Supplemental Fig. S4b), including 464 and 255 DEGs differentially expressed between geographical regions (Supplemental Fig. S4c). To further investigate whether the balance between DNA methylation and lncRNAs is responsible for the natural variation of photosynthetic traits, we investigated the expression and the methylation patterns among different geographical regions of two species. Our study found several lncRNA-targeted DMGs that hypomethylated in the promoter regions of southern accessions (Supplemental Table S7).

      To explore the genetic variation and DNA methylation that affect the function of lncRNA in regulating its target genes, we further analyzed the relationship between DMRs and SNP in P. tomentosa and P. simonii (Supplemental Table S8; Data S10, S11). We identified 339 strong correlated pairs (P < 0.01, r2 > 0.1) between DMR and SNPs range from 2 to 102 in P. tomentosa (Supplemental Fig. S5a). Among these signals, a large proportion of SNPs 60.5% were identified in flanking regions and intergenic regions, only 24.21% of correlated SNPs were from promoter sequences and 9.47% from exon sequences. In P.simonii, we found one DMR correlated with three SNPs distributed in the intron and promoter (Supplemental Fig. S5b). We detected a regulatory model that showed the function of DNA methylation and genetic variation affecting the effect of lncRNAs on gene expression. For example, Pto_XLOC_001831 was higher expression in S regions compared with NE, but was not differentially expressed in NW region. The promoter regions of its target, PtoPnsB4, showed that the accessions from the NE region had higher methylation levels than the accessions from the S. This hypermethylation in NE accessions inhibited the gene expression in PtoPnsB4 (Fig. 6b). Two CG DMRs in promoter and downstream flanking regions of PtoPnsB4 are strongly associated with one and three SNPs in the exon region (Fig. 6d). Interestingly, SNP-12-7790159 (TT) correlated with hypo-methylated DMR in the promoter region in S and NE geographical regions. In comparison to the accessions from the NW, SNP-12-7790159 caused a missense variant resulted in the substitution of Tyr to His. Missense variant in gene transcript together with promoter DNA methylation inhibited the function of Pto_XLOC_001831 of gene and caused a lower expression level in PtoPnsB4 in accessions from NW regions (Fig. 6e). In addition, the homologue of PtoLHCA1 in P. simonii was hypermethylated in the NE region compared with the S region in CHH context (Fig. 6c). This homologue encodes a component of the light-harvesting complex associated with photosystem I. These results indicated a general coherence of interspecific genetic and epigenetic modification in photosynthetic pathways in two Populus species.

    • Increasing numbers of progress have been made in elucidating important roles of plant non-coding RNAs due to their extensive abundance. LncRNA has evolved in multiple molecular mechanisms to survive abiotic stress, such as water stress[54], temperature extremities[55], salinity[39], and heavy metal toxicity[56], etc. This study covered two Populus species, P. tomentosa and P. simonii, representing Populus sections white poplar and Tacamahaca Spach, which have similar natural geographical distribution in China. Compared with PCGs, lncRNAs had fewer exons, shorter mean lengths of exons, and were less abundantly expressed across the conditions in two species (Fig. 2c, d), suggesting a similar characteristic among two Populus species. Furthermore, plants evolved different lncRNAs expression abundance in response to distinct geographical climates. PCGs and lncRNA pairs have shown a significant contribution of lncRNA in a strong regulatory manner on gene expression[49,55].

      It is long been confusing on the evolutionary conservation of lncRNAs, their high levels of sequence divergence make them hard to study. In stark contrast to PCGs, only a small portion of lncRNA sequences (1.8%−52.6%) is conserved across nine species. LncRNAs in Salicaceae trees lack known orthologs in species outside of monocot plants, indicating poor conservation of lncRNAs[57,58]. We use three closely related Populus species P. tomentosa, P. simonii, and P. tritrocarpa to minimize the effects of genomic sequence divergence. LncRNAs are more frequently gained than lost, and the highest net gain rate was identified in the recent terminal Populus species. These results suggested that lncRNA transcription evolved extremely rapidly between closely related plants. The transience of intergenic lncRNA transcription is mirrored by changes to selective pressures acting on their sequences.

    • Forest trees experienced photosynthetic divergence as a direct response to landscape processes and heterogeneity of habitat. The threshold of temperature and limitation of precipitation may vary substantially with local environmental conditions, which leads to heterogeneous responses in tree biological adaptation of tree growth. The convergence of leaf photosynthetic characteristics in distinct lineages may contribute to the persistence of species in the adjacent environment in forests or similar geographical environmental constraints. Insights into photosynthesis and plants' geographical distribution provide valuable information to investigate plant–environment interactions during their long historical evolution[59,60]. Molecular genetics studies have shown that lncRNAs involved in the precise control of light-mediated development. MLNC3.2 and MLNC4.6 are predicted as endogenous target mimic for miRNA to regulate the expression of the SPL2-like and SPL33 transcription factors during light-induced anthocyanin biosynthesis and involve photosynthesis[61]. Thus, studying photosynthesis-associated modules would be more informative. The co-expression network analysis in P. tomentosa and P. simonii showed that genes involved in 'photosynthesis, light reaction' and 'photosynthesis' were enriched in MEbrown from P. tomentosa co-expression network and MEturquoise from P. simonii co-expression network, respectively. LncRNA expressions from accessions of the south region involving photosynthetic pathways were higher than accessions from the northern region, suggesting a conserved spatial-induced expression of lncRNAs in plants[62,63]. For MEbrown module in P. tomentosa, we discovered a lncRNA Pto_XLOC_013503 was co-expressed with PtoMPH2, and the expression pattern varies among geographical regions. This indicates that the effect of genes and lncRNAs may differ among geographical regions in photosynthetic efficiency and affect growth acclimation under photo-inhibitory light and fluctuating light environments[64]. We also found that Psi_XLOC_022416 in P. simonii has transcriptional regulatory relationships with PsiLHCB7 which can be strongly expressed when light harvesting is limiting for plant growth[65]. AtLHCB7 is also associated with the threshold of light-saturated photosynthesis rate and irradiance threshold for induction of photoprotective non-photochemical quenching[66]. Intriguingly, functional orthologs were found in homolog pairs Pto_XLOC_001831-Psi_XLOC_022416. These homologs were not found outside three Salicaceae species but were shown to have similar functions in photosynthetic pathways. We note these homologs as 'functional orthologs', which may have similar functions but have a poor ancestral relationship.

      Epigenetic variation is tightly linked to environmental and fitness differences, implying its involvement in adaptive evolution[67,68]. In this study, Populus samples were well distinguished into three clusters by DNA methylation which were consistent with the origin of accessions. Interestingly, DNA methylation of accessions from the South was lower than those from the Northern accessions in both Populus species (Fig. 6c). Thus, DNA methylation is involved in the variation of Populus from different geographical regions. In photosynthetic genes, we found that PtoLHCA1, PtoPnsB4, and PsiLHCB7 were all hypomethylated in the Southern region, demonstrating that DNA methylation may act as a regulator in plants' light harvesting process. The differentiated expression patterns of these genes across the three geographic regions (Fig. 6d, e) imply that the transcriptional regulation of photosynthetic may also undergo DNA methylation variation creating P. tomentosa ecotypes.

    • LncRNA works as a regulator by recruiting DNA methyltransferases or demethylases to regulate the target gene transcription. Some lncRNAs are involved in chromatin modification and RNA-directed DNA methylation (RdDM)[69,70]. Theoretical and empirical data showed that the stress responsiveness to fitness traits is typically an interactive modification process of genetic and epigenetic, in which epigenetic signatures are deeply interwoven with DNA sequence polymorphism[71,72]. Drought stress-dependent flowering vigor in the same altitudinal gradient reinforces SNP–DMC associations in adaptive evolution[73]. Patterns of correlation between promising selected DMRs and nearby SNPs assign causality DMRs associated with the flowering time traits and are consistent with the idea that many DMRs are the result of genetic changes for maize[74,75]. In this study, strong SNP-DMR correlation pairs were found when DMRs were involved in epigenetic variation between geographical regions (Fig. 6d), particularly when the co-varying SNPs were in promoter regions and protein-coding regions. Functional variants of genomic regions may have experienced strong selection pressure responsible for local adaptation within the species' widespread natural distribution[24,76]. We reveal distinct types of regulation between lncRNA modulators and target genes that are operative either in one species or across species[77]. The deregulation of Pto_XLOC_001831 expression in NE was associated with alterations in DNA methylation and genetic variation[78]. A missense variant and hypermethylation in the promoter region participate in the regulation of gene expression. On the contrary, the genetic locus can encode a suppressor program that is enforced by the lncRNA independent of the protein product of the locus despite the modification of DNA methylation[79]. DNA methylation regulates lncRNA expression to determine the dysregulation of the gene. A lncRNA arising from the CEBPA gene locus could compete with DNA methyltransferases, which inhibits CEBPA gene methylation and facilitates CEBPA expression[80]. These results provided valuable candidate allelic genes for regional breeding programs to improve photosynthetic efficiency in Populus.

      Collectively, using two Populus species that contain accession from three geographical regions, our results suggest a meaningful functional role for lncRNA and DNA methylation variation in the photosynthetic convergent evolution of Populus. The comparison of geographical regions could inform on the adaptive potential of two closely related Populus species in the evolution process. However, further investigation is required to make conclusive statements concerning the evolutionary basis of DNA methylation with genetic variation. Further investigation of the mechanism underlying the recruitment of DNA methylation through lncRNAs to affect genome-wide patterns of gene regulation is warranted. Therefore, the gene editing technology of CRISPR-Cas9[81] and the methylation editing technology of CRISPR-dCas9[82] will help to determine the trigger for the deep-seated mechanism of naturally occurring lncRNAs and epigenetic variation and may provide a useful source of regulatory variation for tree improvement.

      • This work was supported by the Major Science and Technology project of Inner Mongolia Autonomous Region (No. 2021ZD0008), Natural Science Foundation of Beijing Municipality (No. 6212021), National Natural Science Foundation of China (No. 31872707), Key research and development project of Zhejiang Province (No. 2021C02054), Fok Ying Tung Education Foundation of China (No. 171020) and the 111 Project (No. B20050).

      • The authors declare that they have no conflict of interest.

      • Supplemental Table S1 Phenotypic variation among ten accessions from S, NW and NE climate regions in Populus tomentosa and Populus simonii.
      • Supplemental Table S2 RNA-sequencing data statistics.
      • Supplemental Table S3 Phenotypic variation among ten accessions from S, NW and NE climate regions in Populus tomentosa and Populus simonii.
      • Supplemental Table S4 Differentially expressed genes and lncRNAs in photosynthetic modules of Populus tomentosa and Populus simonii.
      • Supplemental Table S5 Methylome data bisulfite conversion rate and sequencing statistics.
      • Supplemental Table S6 Data description of methylation level for ten Populus tomentosa and Populus simonii accessions.
      • Supplemental Table S7 The number of differentially methylated regions in genomic features in Populus tomentosa and Populus simonii.
      • Supplemental Table S8 The number of differentially methylated regions in genomic features in Populus tomentosa and Populus simonii.
      • Supplemental Table S9 Correlation analysis between differentially methylated region and local SNP in Populus tomentosa and Populus simonii .
      • Supplemental Table S10 Interacting DMR and SNP in 2kb.
      • Supplemental Fig. S1 Box plot of regulatory roles of lncRNAs in Populus tomentosa and Populus simonii.
      • Supplemental Fig. S2 Gene Ontology enrichment analyses of Turquoise Module in Populus simonii.
      • Supplemental Fig. S3 Geographical variation characterized by global DNA methylation patterns.
      • Supplemental Fig. S4 The distribution of differentially methylated genes in Populus tomentosa and Populus simonii.
      • Supplemental Fig. S5 Differentially methylated regions (DMRs) correlated with genetic variation.
      • Data S1 The original information of lncRNAs of Populus tomentosa.
      • Data S2 The original information of lncRNAs of Populus simonii.
      • Data S3 The expression data of differentially expressed lncRNA of Populus tomentosa.
      • Data S4 The expression data of differentially expressed gene of Populus tomentosa.
      • Data S5 The expression data of differentially expressed lncRNA of Populus simonii.
      • Data S6 The expression data of differentially expressed gene ofPopulus simoniif.
      • Data S7 Conserved and Species-specific  lncRNA in P. tomentosa and P. simonii. .
      • Data S8 GO term enrichment analyses of genes in MEbrown module in Populus tomentosa.
      • Data S9 GO term enrichment analyses of genes in MEturquoise module in Populus simonii.
      • Data S10 Polymorphisms around Differientially methylated regions of photosynthetic hubgenes used for linkage disequilibrium analysis of Populus tomentosa.
      • Data S11 Polymorphisms around Differientially methylated regions of photosynthetic hubgenes used for linkage disequilibrium analysis of Populus simonii.
      • Method S1 The process of RNA-sequencing.
      • Method S2 The process to calculate DMRs.
      • Copyright: © 2023 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
    Figure (6)  References (82)
  • About this article
    Cite this article
    Zhou J, Song F, He Y, Zhang W, Xiao L, et al. 2023. LncRNA evolution and DNA methylation variation participate in photosynthesis pathways of distinct lineages of Populus. Forestry Research 3:3 doi: 10.48130/FR-2023-0003
    Zhou J, Song F, He Y, Zhang W, Xiao L, et al. 2023. LncRNA evolution and DNA methylation variation participate in photosynthesis pathways of distinct lineages of Populus. Forestry Research 3:3 doi: 10.48130/FR-2023-0003

Catalog

  • About this article

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return