ARTICLE   Open Access    

A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression

  • # Authors contributed equally: Qi Xie, Umair Ahmed

More Information
  • Received: 23 October 2023
    Revised: 07 April 2024
    Accepted: 07 May 2024
    Published online: 01 June 2024
    Forestry Research  4 Article number: e021 (2024)  |  Cite this article
  • Real-time quantitative reverse transcription polymerase chain reaction (RT-qPCR) plays a crucial role in relative gene expression analysis, and accurate normalization relies on suitable reference genes (RGs). In this study, a pipeline for identifying candidate RGs from publicly available stem-related RNA-Seq data of different Populus species under various developmental and abiotic stress conditions is presented. DESeq2's median of ratios yielded the smallest coefficient of variance (CV) values in a total of 292 RNA-Seq samples and was therefore chosen as the method for sample normalization. A total of 541 stably expressed genes were retrieved based on the CV values with a cutoff of 0.3. Universal gene-specific primer pairs were designed based on the consensus sequences of the orthologous genes of each Populus RG candidate. The expression levels of 12 candidate RGs and six reported RGs in stems under different abiotic stress conditions or in different Populus species were assessed by RT-qPCR. The expression stability of selected genes was further evaluated using ΔCt, geNorm, NormFinder, and BestKeeper. All candidate RGs were stably expressed in different experiments and conditions in Populus. A test dataset containing 117 RNA-Seq samples was then used to confirm the expression stability, six candidate RGs and three reported RGs met the requirement of CV ≤ 0.3. In summary, this study was to propose a systematic and optimized protocol for the identification of constitutively and stably expressed genes based on RNA-Seq data, and Potri.001G349400 (CNOT2) was identified as the best candidate RG suitable for gene expression studies in poplar stems.
  • Plants are continuously subjected to unpredictable environmental conditions and encounter a multitude of stressors throughout their growth and development, posing a significant challenge to global crop production and food security[1]. Heat and drought are undoubtedly the two most important stresses that have a huge impact on crops. Both elicit a wide array of biochemical, molecular, and physiological alterations and responses, impacting diverse cellular processes and ultimately influencing crop yield and quality[2].

    A primary physiological consequence of both stresses is the diminished photosynthetic capacity, partially resulting from the degradation of chlorophyll due to leaf senescence under stress conditions. Chlorophyll accumulation was diminished in numerous plants subjected to drought or heat stress conditions[3,4]. Various environmental stresses prompt excessive generation of reactive oxygen species (ROS), initiating oxidative damage that compromises lipids, and proteins, and poses a serious threat to cellular functions[2]. To mitigate oxidative stress and minimize damage, plants have developed various protective mechanisms to neutralize ROS. Several antioxidant enzymes, such as SOD, POD, and CAT, are integral to cellular antioxidative defense mechanisms. Additionally, antioxidants such as anthocyanins and proline serve as crucial ROS scavengers[5,6]. The elevation in temperature typically induces the transient synthesis of heat shock proteins (Hsps), which function as molecular chaperones in protecting proteins from denaturation and aggregation, with their activity primarily regulated at the transcriptional level by heat shock factors (Hsfs)[7]. The significance of Hsps and Hsfs in all organisms, including plants, has been assessed in various stress conditions that could disrupt cellular homeostasis and result in protein dysfunction[7]. Drought stress can also trigger the transcription of a suite of marker genes, including RD29A, RD29B, NCED3, AREB1, Rab18, etc., which assist plants in mitigating cellular damage during dehydration and bolstering their resilience to stress[810].

    Previous research efforts focusing on the regulatory control of stress-related genes have largely centered around protein-coding genes. In recent years, non-protein-coding transcripts have emerged as important regulatory factors in gene expression. Among them, long non-coding RNAs (lncRNAs) lncRNAs have been identified as implicated in various abiotic stresses[11,12]. LncRNAs are a class of non-coding RNAs (ncRNAs) exceeding 200 nucleotides in length. They possess minimal or no protein-coding potential[13]. In plants, lncRNAs are specifically transcribed by RNA polymerases Pol IV, Pol V, Pol II, and Pol III[14,15]. LncRNAs exhibit low abundance and display strong tissue and cellular expression specificity relative to mRNAs. Moreover, sequence conservation of lncRNAs is was very poor across different plant species[13,16,17]. The widespread adoption of high-throughput RNA sequencing technology has revealed lncRNAs as potential regulators of plant development and environmental responses. In cucumber, RNA-seq analysis has predicted 2,085 lncRNAs to be heat-responsive, with some potentially acting as competitive endogenous RNAs (ceRNAs) to execute their functions[18]. In radish, a strand-specific RNA-seq (ssRNA-seq) technique identified 169 lncRNAs that were differentially expressed following heat treatment[19]. In Arabidopsis, asHSFB2a, the natural antisense transcript of HSFB2a was massively induced upon heat stress and exhibited a counteracted expression trend relative to HSFB2a. Overexpression of asHSFB2a entirely suppressed the expression of HSFB2a and impacted the plant's response to heat stress[20]. For drought stress resistance, 244 lncRNAs were predicted in tomatoes to be drought responsive probably by interacting with miRNAs and mRNAs[21]. Under drought stress and rehydration, 477 and 706 lncRNAs were differentially expressed in drought-tolerant Brassica napus Q2 compared to drought-sensitive B. napus, respectively[22]. In foxtail millet and maize, 19 and 644 lncRNAs, respectively, were identified as drought-responsive[23,24]. Despite the identification of numerous lncRNAs by high-throughput sequencing, which suggests their potential involvement in various abiotic stress processes, only a minority have been experimentally validated for function.

    In our previous study, we characterized 1,229 differentially expressed (DE) lncRNAs in Chinese cabbage as heat-responsive, and subsequent bioinformatics analysis reduced this number to 81, which are more likely associated with heat resistance[25]. lnc000283 and lnc012465 were selected from among them for further functional investigation. The findings indicated that both lnc000283 and lnc012465 could be promptly induced by heat shock (HS). Overexpression of either lnc000283 or lnc012465 in Arabidopsis plants enhanced their capacity to tolerate heat stress. Additionally, both lnc000283 and lnc012465 conferred drought tolerance to transgenic Arabidopsis.

    The lncRNA sequences examined in this study were from Chiifu-401-42 Chinese cabbage and all Arabidopsis plants were of the Col-0 background. Transgenic plants expressing lnc000283 and lnc012465 were generated using the Agrobacterium tumefaciens-mediated floral dip method[26]. Single-copy and homozygous T3 plants were identified through genetic segregation on an agar medium supplemented with kanamycin. The T3 generation plants, or their homozygous progeny, were utilized in the experiments.

    For phenotypic assessment, Arabidopsis seeds were initially sown on filter paper moistened with ddH2O and placed in a 4 °C freezer for 2 d. Subsequently, they were evenly planted in nutrient-rich soil and transferred to a growth chamber operating a 16-h day/8-h night cycle, with day/night temperatures of 22 °C/18 °C and a light intensity of 250 μmol·m−2·s−1. After 10 d of growth, Arabidopsis plants with uniform growth were transferred to 50-hole plates. Arabidopsis plants grown in Petri dishes were firstly seed-sterilized and then sown on 1/2 MS medium supplemented with 10 g·L−1 sucrose. The seeds were then placed in a 4 °C refrigerator for 2 d in the dark before transferring them to a light incubator. The day/night duration was set to 16 h/8 h, the day/night temperature to 21 °C/18 °C, and the light intensity to 100 μmol·m−2·s−1.

    For heat treatment, 3-week-old seedlings were subjected to 38 °C for 4 d within a light incubator, subsequently transferred to their original growth conditions under the same light/dark cycles. For drought treatment, 3-week-old Arabidopsis seedlings were deprived of water for 10 d, followed by rehydration to facilitate a 2-d recovery period. Plants were photographed and surveyed both before and after treatment.

    The lncRNA sequences (lnc000283 and lnc012465) were chemically synthesized based on RNA-seq data, with restriction sites for BamH1 and Kpn1 engineered upstream and downstream. The resultant lncRNA constructs were subcloned into the pCambia2301 binary vector, incorporating a cauliflower mosaic virus (CaMV) 35S promoter. The recombinant vectors were transformed into Escherichia coli TOP10 competent cells (Clontech), incubated at 37 °C overnight, after which single clones were selected for PCR verification, and the confirmed positive colonies were submitted for sequencing. Following verification, the correct plasmids were introduced into A. tumefaciens strain GV3101 using the freeze-thaw method and subsequently transformed into Arabidopsis wild-type (Col) plants.

    To quantify the chlorophyll content, the aerial portions of wild-type and transgenic Arabidopsis plants, grown in Petri dishes were weighed, minced, and then subjected to boiling in 95% ethanol until fully decolorized. Aliquots of 200 μL from the extract were transferred to a 96-well plate and the absorbance at 663 nm and 645 nm was measured via spectrophotometry by a microplate reader (Multiskan GO, Thermo Scientific, Waltham, MA, USA). Three biological replicates were analyzed for WT and each transgenic line. Chlorophyll content was determined according to the formula of the Arnon method[27]: Chlorophyll a = (12.72A663 − 2.59A645) v/w, Chlorophyll b = (22.88A645 − 4.67A663) v/w, Total chlorophyll = (20.29A645 + 8.05A663) v/w.

    The quantification of anthocyanin was performed as follows: aerial parts of wild-type and transgenic Arabidopsis plants, cultivated in Petri dishes, were weighed and ground to powder in liquid nitrogen. Subsequently, the samples were incubated in 600 μL of acidified methanol (containing 1% HCl) at 70 °C for 1 h. Following this, 1 mL of chloroform was added, and the mixture was vigorously shaken to remove chlorophyll. The mixture was then centrifuged at 12,000 rpm for 5 min, after which the absorbance of the aqueous phase was determined at 535 nm using a spectrophotometer (Shimadzu, Kyoto, Japan). Three biological replicates were analyzed for WT and each transgenic line. The relative anthocyanin content was calculated according to anthocyanin concentration and extraction solution volume. One anthocyanin unit is defined as an absorption unit at a wavelength of 535 nm in 1 mL of extract solution. In the end, the quantity was normalized to the fresh weight of each sample.

    Three-week-old transgenic and WT A. thaliana plants, subjected to normal conditions or varying durations of heat or drought stress, were utilized for subsequent physiological assessments. All assays were performed in accordance with the method described by Chen & Zhang[28]. In brief, 0.1 g of fresh leaf tissue was homogenized in 500 μL of 100 mM PBS (pH 7.8) while chilled on ice. The homogenate was then centrifuged at 4 °C, and the resultant supernatant was employed for further analysis. For the determination of MDA content, 100 μL of the supernatant was combined with 500 μL of a 0.25% thiobarbituric acid (TBA) solution (which was prepared by dissolving 0.125 g of TBA in 5 mL of 1 mol·L−1 NaOH before being added to 45 mL of 10% TCA) and boiled for 15 min. Following a 5 min cooling period on ice, the absorbance was measured at 532 nm and 600 nm. The activity of POD was determined as follows: initially, 28 μL of 0.2% guaiacol and 19 μL of 30% H2O2 were sequentially added to 50 mL of 10mM PBS (pH 7.0), after thorough heating and mixing, 1 mL was transferred into a cuvette, then 50 μL of the supernatant was added to the cuvette and the absorbance at 470 nm was monitored every 15 s for 1 min. To determine the proline content, a reaction solution was prepared by mixing 3% sulfosalicylic acid, acetic acid, and 2.5% acidic ninhydrin in a ratio of 1:1:2, then 50 μL of the supernatant was added to 1 mL of the reaction solution, which was then subjected to a boiling water bath for 15 min (the solution turned red after the boiling water bath). Following cooling on ice, the absorbance at 520 nm was recorded. For the quantification of proline, an L-proline standard curve was prepared by dissolving 0, 5, 10, 15, 20, 25, and 30 μg of L-proline in 0.5 mL of ddH2O, followed by the addition of 1 mL of the reaction solution and measuring the absorbance at 520 nm. The proline content in the samples was then determined based on the L-proline standard curve.

    Total RNA was isolated from the aerial parts of Arabidopsis using the TaKaRa MiniBEST Plant RNA Extraction Kit, followed by purification and reverse transcription using the PrimeScript RT reagent Kit with gDNA Eraser (Takara). The cDNA product was diluted 10 times and real-time PCR was conducted in triplicate for each biological replicate using SYBR PCR Master Mix (Applied Biosystems) on the ABI 7500 system under the following conditions: 98 °C for 3 min, followed by 40 cycles of 98 °C for 2 s and 60 °C for 30 s. The relative expression levels of each gene were normalized against the transcript abundance of the endogenous control UBC30 (At5g56150) and calculated using the 2−ΔCᴛ method. The specific primers employed for qRT-PCR are detailed in Supplemental Table S1.

    In our prior investigation, dozens of lncRNAs associated with the heat stress response in Chinese cabbage were identified through informatics analysis. Two lncRNAs (lnc000283 and lnc012465) were chosen for genetic transformation in Arabidopsis to elucidate their functions comprehensively. Transcriptome data analysis indicated that the expression of lnc000283 and lnc012465 in Chinese cabbage were both induced by HS. To verify the accuracy, the expression patterns of lnc000283 and lnc012465 were confirmed through quantitative real-time PCR (qRT-PCR), and the results from qRT-PCR were consistent with those obtained from RNA-seq (Fig. 1a). The corresponding homologous genes in Arabidopsis were identified as CNT2088434 and CNT2088742, exhibiting sequence similarities of 88% and 87%, respectively (Supplemental Fig. S1). Subcellular localization predictions using the lnclocator database (www.csbio.sjtu.edu.cn/bioinf/lncLocator) suggested that both lncRNAs are localized within the nucleus (Supplemental Table S2). Bioinformatics analysis was conducted using the CPC tool (http://cpc.cbi.pku.edu.cn/) indicated that lnc000283 and lnc012465 are noncoding sequences, with coding probabilities of 0.0466805 and 0.0432148, respectively comparable to the well-characterized lncRNAs COLDAIR and Xist, but significantly lower than those of the protein-coding genes UBC10 and ACT2 (Fig. 1b).

    Figure 1.  Characteristics of lnc000283 and lnc012465. (a) Expression level of lnc000283 and lnc012465 in Chinese cabbage leaves treated at 38 °C at different time points, as determined by qRT-PCR and RNA-seq. CK is a representative plant before heating, and T1, T4, T8, and T12 denote plants that were subjected to 38 °C for 1, 4, 8, and 12 h, respectively. The expression levels were normalized to the expression level of Actin. (b) Analysis of coding potential for lnc000283 and lnc012465. The coding potential scores were calculated using the CPC program. UBC10 (At5g53300) and ACT2 (At3g18780) are positive controls that encode proteins. COLDAIR (HG975388) and Xist (L04961) serve as negative controls, exhibiting minimal protein-coding potential.

    To elucidate the role of lnc000283 and lnc012465 in response to abiotic stress, overexpression vectors were constructed for these lncRNAs, driven by the CaMV 35S promoter, and they were introduced into Arabidopsis thaliana (Col-0 ecotype). Through PCR identification and generational antibiotic screening, two homozygous positive lines for lnc012465 and lnc000283 were obtained. The relative expression levels of these lncRNAs were assessed using qRT-PCR (Fig. 2a). When plants were grown in 1/2 MS medium, with the consumption of nutrients, and reduction of water, the leaves of WT began to turn yellow, but the lnc000283 and lnc012465 overexpression lines developed a deep purple color of leaf veins (Fig. 2b). Examination of chlorophyll and anthocyanin contents in the plants revealed that both overexpression lines had higher levels of chlorophyll and anthocyanin compared to the WT, suggesting that the transgenic plants might possess enhanced resistance to nutritional or water stress (Fig. 2c, d).

    Figure 2.  Arabidopsis plants overexpressing lnc000283 and lnc012465 had higher anthocyanins and chlorophyll content. (a) The relative expression level of lnc000283 and lnc012465 in WT and different transgenic lines. UBC10 (At5g53300) was used as an internal control. Each value is mean ± sd (n = 3). (b) The phenotype of WT and Arabidopsis overexpressing lnc000283 or lnc012465 grown on 1/2 MS medium 50 d after sowing. The (c) anthocyanin and (d) chlorophyll content of WT and transgenic Arabidopsis overexpressing lnc000283 or lnc012465. The asterisks above the bars indicate statistical significance using Student's t-test (*, p < 0.05; **, p < 0.01).

    Given that lnc000283 and lnc012465 were highly induced by heat, the thermotolerance of the overexpressing (OE) plants were compared to that of the wild type. Arabidopsis plants were initially exposed to a an HS treatment at 38 °C for 4 d, followed by recovery at room temperature. The death caused by HS was processive. Post-severe HS challenge for 4 d, OE plants initially appeared similar to WT, but upon recovery, their leaves started to fold or curl, followed by a transition to yellow, white, and eventually drying out (Fig. 3a). OE lnc000283 and OE lnc012465 plants exhibited enhanced thermotolerance compared to WT, with lnc012465 showing particularly strong tolerance (Fig. 3a; Supplemental Fig. S2a). After 5 d of recovery, leaf coloration indicated that transgenic plants maintained a significantly higher percentage of green leaves and a lower percentage of bleached leaves compared to WT (Fig. 3b; Supplemental Fig. S2b). Under non-heat-stress conditions, WT and OE plants possessed comparable water content. However, following heat stress, the fresh-to-dry weight ratio of OE lnc000283 and lnc012465 lines was significantly greater than that of WT (Fig. 3c; Supplemental Fig. S2c). Abiotic stresses frequently trigger the production of excessive reactive oxygen species (ROS), which are believed to cause lipid peroxidation of membrane lipids, leading to damage to macromolecules. Leaf MDA content is commonly used as an indicator of lipid peroxidation under stress conditions; therefore, the MDA content in both transgenic and WT plants was assessed. Figure 3d shows that the MDA content in WT plants progressively increased after heat treatment, whereas in the two lines overexpressing lnc012465, the MDA content increased only slightly and remained significantly lower than that in WT at all time points. In plants overexpressing lnc000283, the MDA content did not significantly differ from that of WT before heat stress. However, after 4 d of heat treatment, the MDA content was significantly lower compared to WT (Supplemental Fig. S2d). The results suggested that the expression of both lnc012465 and lnc000283 can mitigate injury caused by membrane lipid peroxidation under heat-stress conditions. Peroxidase (POD) is a crucial antioxidant enzyme involved in ROS scavenging. Figure 3e and Supplemental Fig. S2e demonstrate that POD activity increased in both transgenic and WT plants after heat treatment. However, the increase in WT plants was modest, whereas OE lnc000283 and OE lnc012465 plants exhibited consistently higher POD activity. As anticipated, proline levels were induced in response to stress in all studied plants (Fig. 3f; Supplemental Fig. S2f). However, under normal conditions and 2 d post-heat stress treatment, the proline content in OE lnc000283 and OE lnc012465 plants did not exhibit significant changes compared to WT (Fig. 3f; Supplemental Fig. S2f). Moreover, after 4 d of heat stress, the proline content in OE lnc012465 lines was significantly lower than in WT, and the OE lnc000283 transgenic line 12-6 also showed a marked decrease in proline content compared to WT (Fig. 3f; Supplemental Fig. S2f). The results indicated that the thermotolerance of plants overexpressing either lnc000283 or lnc012465 was independent of proline accumulation.

    Figure 3.  Overexpressing lnc012465 lines are more tolerant to heat stress. (a) Phenotypes of WT and OE lnc012465 plants were assessed before and after exposure to heat stress. The heat treatment was applied to 25-day-old Arabidopsis plants. (b) The percentage of leaves with different colors in Arabidopsis after heat treatment and recovery for 5 d. (c) The fresh-to-dry weight ratio of Arabidopsis leaves was measured before and after 38 °C heat treatment. (d)−(f) depict the MDA content, POD activity, and proline content in Arabidopsis leaves at varying durations of heat stress. The asterisks above the bars indicate statistical significance using Student's t-test (*, p < 0.05; **, p < 0.01; ***, p < 0.001).

    To elucidate the molecular mechanisms by which lncRNAs enhance thermotolerance in Arabidopsis, the expression of the Hsf gene HsfA7a and three Hsps (Hsp25.3, Hsa32, and Hsp18.1-CI) in OE lnc000283, OE lnc012465, and WT Arabidopsis plants were investigated at various time points following heat treatment. As shown in Fig. 4 and Supplemental Fig. S3, both Hsf and Hsps exhibited a rapid response to heat stress with strong induction. Notably, the transcripts of HsfA7a and Hsp25.3 were significantly upregulated at 1 h after heat exposure, then experienced a sharp decrease. Hsa32 and Hsp18.1-CI were highly induced at 1 h and, unlike the other proteins, sustained high expression levels at 3 h (Fig. 4; Supplemental Fig. S3). At 1 h post-heat treatment, the transcript levels of Hsa32 and HsfA7a in OE lnc000283 did not significantly differ from those in WT. However, by 3 h, Hsa32 expression was roughly 50% of the WT level, while HsfA7a expression was approximately double that of WT (Supplemental Fig. S3). The overexpression of lnc000283 did not significantly affect the transcript level of Hsp25.3 at any of the tested time points. Notably, Hsp18.1-CI expression in both lines overexpressing lnc000283 was significantly induced at all three detection points post-heat treatment, reaching approximately 4-9-fold higher levels than in the WT (Supplemental Fig. S3). In Arabidopsis plants with elevated expression of lnc012465, the expression patterns of all Hsp and Hsf genes were similar to those in plants overexpressing lnc000283, with the notable exception of Hsa32. Unlike the WT, Hsa32 did not show a trend of down-regulation at 3 h post-heat treatment (Fig. 4). The findings suggest that the substantial induction of Hsp18.1-CI may play a role in enhancing the thermotolerance of Arabidopsis plants overexpressing lnc000283 and lnc012465.

    Figure 4.  The expression of HSF and HSP genes in lnc012465 overexpressing lines before and after different heat treatment times. Gene expression levels were quantified using RT-qPCR and normalized to UBC10 (At5g53300). Each value represents the mean ± standard deviation (n = 3). The asterisks above the bars indicate statistical significance using Student's t-test (*, p < 0.05; **, p < 0.01; ***, p < 0.001).

    Prior research has implicated a significant proportion of genes in conferring resistance to various abiotic stresses. To elucidate the functions of lnc000283 and lnc012465 more thoroughly, WT and transgenic plants were subjected to drought stress by depriving them of water for 9 d. It was noted that the majority of leaves in WT plants withered and dried, whereas the OE lnc000283 and OE lnc012465 plants exhibited reduced withering, with only a minority displaying dryness (Fig. 5a; Supplemental Fig. S4a). Eight days post-rewatering, a negligible fraction of WT seedlings exhibited recovery, whereas the overwhelming majority of transgenic plants regained vigorous growth (Fig. 5a; Supplemental Fig. S4a). The transgenic plants demonstrated a significantly higher survival rate compared to the WT plants. Following 9 d of water deficit treatment, less than 40% of the WT plants survived, whereas the OE 012465 lines 8-7 and 9-1 exhibited survival rates of 100% and 95%, respectively, and the OE 000283 lines 11-10 and 12-6 had survival rates of 87% each. (Fig. 5b; Supplemental Fig. S4b). Water loss serves as a critical metric for assessing plant drought tolerance, hence the fresh-to-dry weight ratio of excised leaves was assessed via desiccation analysis. Following 4 d of drought treatment, the fresh-to-dry weight ratio for WT plants was reduced to 43%, whereas for OE lnc000283 lines 11-10 and 12-6, it was reduced to 73% and 75%, respectively. For OE 012465 lines 8-7 and 9-1, the ratios were reduced to 67% and 62%, respectively (Fig. 5c; Supplemental Fig. S4c). The findings indicated that lnc000283 and lnc012465 endow the transgenic plants with drought tolerance.

    Figure 5.  Overexpressing lnc012465 lines are more tolerant to drought stress. (a) Phenotype of WT and OE lnc012465 plants before and after subjecting to drought stress. Drought treatment was carried out on 20-day-old Arabidopsis plants. (b) The percentage of leaves with different colors in Arabidopsis after heat treatment and recovery for 5 d. (c) The fresh weight to dry weight ratio of Arabidopsis leaves before and after undergoing 38 °C heat treatment. (d)−(f) MDA content, POD activity, and proline content in Arabidopsis leaves under different times of heat stress. The asterisks above the bars indicate statistical significance using Student's t-test (*, p < 0.05; **, p < 0.01; ***, p < 0.001)

    MDA content in leaves is a standard biomarker for assessing the extent of drought stress-induced damage. Prior to drought stress exposure, MDA levels in WT and transgenic plants were comparable. However, following 7 and 9 d of water deficit, the MDA content in the transgenic plants was markedly reduced compared to the WT, suggesting a less severe degree of membrane lipid peroxidation in the transgenic plants (Fig. 5d; Supplemental Fig. S4d). Oxidative stress frequently coincides with drought stress, hence the activity of POD was assessed to evaluate the ROS scavenging ability. The findings indicated that as the duration of drought treatment increased, POD activity progressively rose. Before drought exposure, the POD activity in lines 11-10 and 12-6 of OE 000283 was 2.4-fold and 2.2-fold higher than that of the WT, respectively (refer to Supplemental Fig. S4e). Following drought treatment, the POD activity in the transgenic lines remained significantly elevated compared to the wild type, although the enhancement was less pronounced than before the treatment (Supplemental Fig. S4e). In the OE 012465 plants, the POD activity in lines 8-7 and 9-1 significantly surpassed that of the wild type, with the discrepancy being more pronounced during drought stress (Fig. 5e). The proline content in WT and OE 000283 plants exhibited no significant differences before and after 7 d of treatment. However, after 9 d of drought, the proline content in OE 000283 plants was significantly lower compared to that in the WT (Supplemental Fig. S4f). OE 000465 plants showed no significant difference from the wild type before and after drought treatment (Fig. 5f). The findings were consistent with those under heat stress, indicating that the enhanced stress resistance due to the overexpression of lnc000283 and lnc012465 in Arabidopsis is not reliant on proline accumulation.

    Following drought stress treatment, the expression levels of drought-related genes such as RD29A, RD29B, NCED3, AREB1, and Rab18 were significantly elevated in plants overexpressing lnc000283 and lnc012465 compared to WT plants. These findings suggest that lnc000283 and lnc012465 modulate Arabidopsis drought tolerance by regulating the expression of genes associated with the drought stress response (Fig. 6; Supplemental Fig. S5).

    Figure 6.  The expression of drought-responsive genes in lnc012465 overexpressing lines before and after different drought treatment time. Gene expression levels were determined by qRT-PCR normalized against UBC10 (At5g53300). Each value is mean ± sd (n = 3). The asterisks above the bars indicate statistical significance using Student's t-test (*, p < 0.05; **, p < 0.01; ***, p < 0.001).

    The integrity of global food security is under threat due to the confluence of rapid population expansion and profound climatic shifts[29]. Amidst the shifting climatic landscape, heat and drought stress have emerged as primary limitations to crop yield and global food security. Understanding how plants detect stress cues and acclimate to challenging conditions is a pivotal biological inquiry. Moreover, enhancing plant resilience to stress is essential for maintaining agricultural productivity and fostering environmental sustainability[2]. Concurrently, the advancement of next-generation sequencing (NGS) technology has led to the identification of a substantial number of lncRNAs that participate in diverse stress responses, with functional analyses having been conducted on several of these molecules.[30] For instance, in the case of potatoes, the lncRNA StFLORE has been identified to modulate water loss through its interaction with the homologous gene StCDF1[31]. LncRNA TCONS_00021861 can activate the IAA biosynthetic pathway, thereby endowing rice with resistance to drought stress[32]. In wheat, the expression of TalnRNA27 and TalnRNA5 was upregulated in response to heat stress[33]. Our prior investigation identified a total of 81 lncRNAs in Chinese cabbage that engage in intricate interactions with their respective mRNA targets across various phases of heat treatment[25]. Two lncRNAs, lnc000283 and lnc012465, were chosen for subsequent functional analysis. Findings confirmed that these lncRNAs endow transgenic Arabidopsis plants with enhanced tolerance to both heat and drought, thereby offering novel resources for enhancing stress resistance through genetic engineering.

    Abiotic stresses frequently trigger the synthesis of anthocyanins, serving as natural antioxidants that mitigate oxidative damage by neutralizing surplus reactive oxygen species (ROS), thereby protecting plants from growth inhibition and cell death, allowing plants to adapt to abiotic stresses[34,35]. For instance, during chilling stress, the accumulation of anthocyanins within leaves can mitigate oxidative damage, thereby enhancing the photosynthetic rate[36]. Consequently, the level of abiotic stress tolerance can be inferred from the concentration of anthocyanins. The reduction of photosynthetic ability is one of the key physiological phenomena of stresses, which is partly due to the degradation of chlorophyll caused by leaf senescence during stress. The reduced accumulation of chlorophyll in the plants was seen in many plants when exposed to drought or heat stress conditions. The current investigation revealed that lncRNA-overexpressing plants cultivated in Petri dishes exhibited increased accumulation of both chlorophyll and anthocyanins in advanced growth phases, indicating that these transgenic plants, overexpressing lnc000283 and lnc012465, demonstrated enhanced stress tolerance and superior growth performance relative to WT (Fig. 2c, d).

    Upon exposure to heat stress, there is a marked induction of transcription for numerous genes that encode molecular chaperones in plants, with the vast majority of these genes contributing to the prevention of protein denaturation-related damage and the augmentation of thermotolerance[3739]. The present investigation identified multiple heat-inducible genes in plants overexpressing lnc000283 and lnc012465, as well as in WT (Fig. 4; Supplemental Fig. S3). The findings indicated that of the four HSP or HSF genes examined, Hsp18.1-CI exhibited a significantly greater abundance in both OE lnc000283 and OE lnc012465 plants compared to the WT following heat treatment for several days. Hsp18.1-CI, formerly referred to as Hsp18.2 has been the subject of investigation since 1989.[40] Following the fusion of the 5' region of Hsp18.2 in frame with the uidA gene of Escherichia coli, the activity of GUS, serving as the driver gene was observed to increase upon exposure to HS[40]. The Arabidopsis hsfA2 mutant exhibited diminished thermotolerance after heat acclimation, with the transcript levels of Hsp18.1-CI being substantially reduced compared to those in wild-type plants following a 4-h recovery period[41]. The findings revealed that the upregulation of Hsp18.1-CI protein is a critical mechanism by which plants achieve enhanced protection against heat stress in adverse environmental conditions, thereby bolstering their thermotolerance.

    Plants cultivated in natural settings are often subjected to concurrent multiple abiotic stresses, which can exacerbate threats to their routine physiological functions, growth, and developmental processes[42,43]. Elucidating the molecular mechanisms underlying plant responses to abiotic stress is crucial for the development of new crop varieties with enhanced tolerance to multiple abiotic stresses. Previous research has indicated that the overexpression of certain protein-coding genes can endow plants with resistance to a variety of abiotic stresses. For instance, tomatoes with robust expression of ShCML44 demonstrated significantly enhanced tolerance to drought, cold, and salinity stresses[44]. Overexpression of PeCBF4a in poplar plants confers enhanced tolerance to a range of abiotic stresses[45]. With respect to lncRNAs, transgenic Arabidopsis plants that overexpress lncRNA-DRIR displayed marked increased tolerance to salt and drought stresses compared to the wild-type[46]. In the present study, both overexpression lines of lnc000283 and lnc012465 exhibited resistance to heat and drought stresses, thereby contributing to the enhancement of plant resilience against multiple stresses (Figs 3, 5; Supplemental Figs S2, S4).

    The number of genes implicated in plant drought resistance is regulated by both ABA-dependent and ABA-independent pathways[47,48]. It is well established that the expression of RD29A exhibits a high level of responsiveness to drought stress, operating through both ABA-dependent and ABA-independent mechanisms[49]. RD29B, AREB1, and RAB18 are governed by an ABA-dependent regulatory pathway[10,49,50]. NCED3 is involved in ABA biosynthesis[51]. In the present study, the transcript levels of RD29A, RD29B, NCED3, AREB1, and RAB18 were significantly elevated in OE lnc000283 and OE lnc012465 plants compared to those in the WT plants (Fig. 6; Supplemental Fig. S5). The findings indicated that the drought tolerance imparted by OE lnc000283 and OE lnc012465 plants is contingent upon an ABA-dependent mechanism.

    Prior research has indicated that certain long non-coding RNAs (lncRNAs) can assume analogous roles across diverse biological contexts. For example, the lncRNA bra-miR156HG has been shown to modulate leaf morphology and flowering time in both B. campestris and Arabidopsis[52]. Heterogeneous expression of MSL-lncRNAs in Arabidopsis has been associated with the promotion of maleness, and similarly, it is implicated in the sexual lability observed in female poplars[53]. In the present study, lnc000283 and lnc012465 were induced by heat in Chinese cabbage, and their heterologous expression was found to confer heat tolerance in Arabidopsis. Additionally, sequences homologous to lnc000283 and lnc012465 were identified in Arabidopsis (Supplemental Fig. S1). The data suggest that these sequences may share a comparable function to that of heat-inducible sequences, potentially accounting for the conservation of lnc000283 and lnc012465'os functionality across various species.

    In conclusion, the functions of two heat-inducible lncRNAs, lnc000283 and lnc012465 have been elucidated. Transgenic Arabidopsis lines overexpressing these lncRNAs accumulated higher levels of anthocyanins and chlorophyll at a later stage of growth compared to the WT when grown on Petri dishes. Furthermore, under heat and drought stress conditions, these OE plants exhibited enhanced stress tolerance, with several genes related to the stress resistance pathway being significantly upregulated. Collectively, these findings offer novel insights for the development of new varieties with tolerance to multiple stresses.

    The authors confirm contribution to the paper as follows: study conception and supervision: Li N, Song X; experiment performing: Wang Y, Sun S; manuscript preparation and revision: Wang Y, Feng X, Li N. All authors reviewed the results and approved the final version of the manuscript.

    All data generated or analyzed during this study are included in this published article and its supplementary information files.

    This work was supported by the National Natural Science Foundation of China (32172583), the Natural Science Foundation of Hebei (C2021209019), the Natural Science Foundation for Distinguished Young Scholars of Hebei (C2022209010), and the Basic Research Program of Tangshan (22130231H).

  • The authors declare that they have no conflict of interest.

  • Supplemental Table S1 Source and basic information of RNA-Seq training dataset.
    Supplemental Table S2 Information of reported reference genes (RGs) in Populus.
    Supplemental Table S3 List of stress treatments for poplar 717.
    Supplemental Table S4 Source and basic information of RNA-Seq test datasets.
    Supplemental Table S5 Universal RT-qPCR primers for 12 candidate reference genes (RGs) and 6 reported RGs in Populus.
    Supplemental Dataset S1 The list of RNA-Seq data of poplar stem samples for the training dataset.
    Supplemental Dataset S2 Mapping statistics of RNA-Seq data of 298 samples in the training dataset. The read count of each sample and the mapping rate for different mapping cases. Samples with an alignment rate exceeding 70% or exhibiting low read counts are highlighted in red, and these were subsequently excluded from further analysis.
    Supplemental Dataset S3 Gene expression levels and coefficient of variance (CV) values for stably expressed genes of the training dataset. The read counts were normalized using the DESeq2 method. The stably expressed genes were defined by CV ≤ 0.3.
    Supplemental Dataset S4 The results of GO enrichment analysis for reported reference genes (RGs).
    Supplemental Dataset S5 Expression stability ranking of reference genes in different Populus cultivars and stress treatments.
    Supplemental Dataset S6 Photosynthetic characteristics of poplar leaves shade stress.
    Supplemental Dataset S7 Mapping statistics of RNA-Seq data for the test dataset. The read count of each sample and the mapping rate for different mapping cases.
    Supplemental Dataset S8 Gene expression levels and CV values of 18 reference genes in the test dataset.
  • [1]

    Deshpande D, Chhugani K, Chang Y, Karlsberg A, Loeffler C, et al. 2023. RNA-seq data science: From raw data to effective interpretation. Frontiers in Genetics 14:997383

    doi: 10.3389/fgene.2023.997383

    CrossRef   Google Scholar

    [2]

    Wang Z, Gerstein M, Snyder M. 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10:57−63

    doi: 10.1038/nrg2484

    CrossRef   Google Scholar

    [3]

    Ozsolak F, Milos PM. 2011. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics 12:87−98

    doi: 10.1038/nrg2934

    CrossRef   Google Scholar

    [4]

    Geraci F, Saha I, Bianchini M. 2020. Editorial: RNA-Seq analysis: methods, applications and challenges. Frontiers in Genetics 11:220

    doi: 10.3389/fgene.2020.00220

    CrossRef   Google Scholar

    [5]

    Marguerat S, Bähler J. 2010. RNA-seq: from technology to biology. Cellular and Molecular Life Sciences 67:569−79

    doi: 10.1007/s00018-009-0180-6

    CrossRef   Google Scholar

    [6]

    Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5:621−28

    doi: 10.1038/nmeth.1226

    CrossRef   Google Scholar

    [7]

    Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, et al. 2013. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Briefings in Bioinformatics 14:671−83

    doi: 10.1093/bib/bbs046

    CrossRef   Google Scholar

    [8]

    Zhao Y, Li MC, Konaté MM, Chen L, Das B, et al. 2021. TPM, FPKM, or normalized counts? A comparative study of quantification measures for the analysis of RNA-seq data from the NCI patient-derived models repository Journal of Translational Medicine 19:269

    doi: 10.1186/s12967-021-02936-w

    CrossRef   Google Scholar

    [9]

    Wang L, Xie W, Chen Y, Tang W, Yang J, et al. 2010. A dynamic gene expression atlas covering the entire life cycle of rice. The Plant Journal 61:752−66

    doi: 10.1111/j.1365-313X.2009.04100.x

    CrossRef   Google Scholar

    [10]

    Li G, Sun X, Zhu X, Wu B, Hong H, et al. 2023. Selection and validation of reference genes in virus-infected sweet potato plants. Genes 14:1477

    doi: 10.3390/genes14071477

    CrossRef   Google Scholar

    [11]

    Wang Q, Guo C, Yang S, Zhong Q, Tian J. 2023. Screening and verification of reference genes for analysis of gene expression in garlic (Allium sativum L.) under cold and drought stress. Plants 12:763

    doi: 10.3390/plants12040763

    CrossRef   Google Scholar

    [12]

    Ahmed U, Xie Q, Shi X, Zheng B. 2022. Development of reference genes for horticultural plants. Critical Reviews in Plant Sciences 41:190−208

    doi: 10.1080/07352689.2022.2084227

    CrossRef   Google Scholar

    [13]

    Panina Y, Germond A, Masui S, Watanabe TM. 2018. Validation of common housekeeping genes as reference for qPCR gene expression analysis during iPS reprogramming process. Scientific Reports 8:8716

    doi: 10.1038/s41598-018-26707-8

    CrossRef   Google Scholar

    [14]

    Bustin SA, Beaulieu JF, Huggett J, Jaggi R, Kibenge FSB, et al. 2010. MIQE précis: Practical implementation of minimum standard guidelines for fluorescence-based quantitative real-time PCR experiments. BMC Molecular Biology 11:74

    doi: 10.1186/1471-2199-11-74

    CrossRef   Google Scholar

    [15]

    Pfaffl MW, Tichopad A, Prgomet C, Neuvians TP. 2004. Determination of stable housekeeping genes, differentially regulated target genes and sample integrity: BestKeeper–Excel-based tool using pair-wise correlations. Biotechnology Letters 26:509−15

    doi: 10.1023/B:BILE.0000019559.84305.47

    CrossRef   Google Scholar

    [16]

    Huis R, Hawkins S, Neutelings G. 2010. Selection of reference genes for quantitative gene expression normalization in flax (Linum usitatissimum L.). BMC Plant Biology 10:71

    doi: 10.1186/1471-2229-10-71

    CrossRef   Google Scholar

    [17]

    Gutierrez L, Mauriat M, Guénin S, Pelloux J, Lefebvre JF, et al. 2008. The lack of a systematic validation of reference genes: a serious pitfall undervalued in reverse transcription-polymerase chain reaction (RT-PCR) analysis in plants. Plant Biotechnology Journal 6:609−18

    doi: 10.1111/j.1467-7652.2008.00346.x

    CrossRef   Google Scholar

    [18]

    Guénin S, Mauriat M, Pelloux J, Van Wuytswinkel O, Bellini C, et al. 2009. Normalization of qRT-PCR data: the necessity of adopting a systematic, experimental conditions-specific, validation of references. Journal of Experimental Botany 60:487−93

    doi: 10.1093/jxb/ern305

    CrossRef   Google Scholar

    [19]

    Thellin O, Zorzi W, Lakaye B, De Borman B, Coumans B, et al. 1999. Housekeeping genes as internal standards: use and limits. Journal of Biotechnology 75:291−95

    doi: 10.1016/S0168-1656(99)00163-7

    CrossRef   Google Scholar

    [20]

    Borges AF, Fonseca C, Ferreira RB, Lourenço AM, Monteiro S. 2014. Reference gene validation for quantitative RT-PCR during biotic and abiotic stresses in Vitis vinifera. PLoS One 9:e111399

    doi: 10.1371/journal.pone.0111399

    CrossRef   Google Scholar

    [21]

    Sun H, Li F, Ruan Q, Zhong X. 2016. Identification and validation of reference genes for quantitative real-time PCR studies in Hedera helix L. Plant Physiology and Biochemistry 108:286−94

    doi: 10.1016/j.plaphy.2016.07.022

    CrossRef   Google Scholar

    [22]

    Imai T, Ubi BE, Saito T, Moriguchi T. 2014. Evaluation of reference genes for accurate normalization of gene expression for real time-quantitative PCR in Pyrus pyrifolia using different tissue samples and seasonal conditions. PLoS One 9:e86492

    doi: 10.1371/journal.pone.0086492

    CrossRef   Google Scholar

    [23]

    Chen F, Song Y, Li X, Chen J, Mo L, et al. 2019. Genome sequences of horticultural plants: past, present, and future. Horticulture Research 6:112

    doi: 10.1038/s41438-019-0195-6

    CrossRef   Google Scholar

    [24]

    Zhao J, Yang F, Feng J, Wang Y, Lachenbruch B, et al. 2017. Genome-wide constitutively expressed gene analysis and new reference gene selection based on transcriptome data: a case study from poplar/canker disease interaction. Frontiers in Plant Science 8:1876

    doi: 10.3389/fpls.2017.01876

    CrossRef   Google Scholar

    [25]

    Chen Y, Luo B, Liu C, Zhang Z, Zhou C, et al. 2021. Identification of reliable reference genes for quantitative real-time PCR analysis of the Rhus chinensis Mill. leaf response to temperature changes. FEBS Open Bio 11:2763−73

    doi: 10.1002/2211-5463.13275

    CrossRef   Google Scholar

    [26]

    Brunner AM, Busov VB, Strauss SH. 2004. Poplar genome sequence: functional genomics in an ecologically dominant plant species. Trends in Plant Science 9:49−56

    doi: 10.1016/j.tplants.2003.11.006

    CrossRef   Google Scholar

    [27]

    Chao Q, Gao Z, Zhang D, Zhao B, Dong F, et al. 2019. The developmental dynamics of the Populus stem transcriptome. Plant Biotechnology Journal 17:206−19

    doi: 10.1111/pbi.12958

    CrossRef   Google Scholar

    [28]

    Wang J, Tian Y, Li J, Yang K, Xing S, et al. 2019. Transcriptome sequencing of active buds from Populus deltoides CL. and Populus × zhaiguanheibaiyang reveals phytohormones involved in branching. Genomics 111:700−9

    doi: 10.1016/j.ygeno.2018.04.007

    CrossRef   Google Scholar

    [29]

    Han X, An Y, Zhou Y, Liu C, Yin W, et al. 2020. Comparative transcriptome analyses define genes and gene modules differing between two Populus genotypes with contrasting stem growth rates. Biotechnology for Biofuels 13:139

    doi: 10.1186/s13068-020-01758-0

    CrossRef   Google Scholar

    [30]

    Shi R, Wang JP, Lin YC, Li Q, Sun Y, et al. 2017. Tissue and cell-type co-expression networks of transcription factors and wood component genes in Populus trichocarpa. Planta 245:927−38

    doi: 10.1007/s00425-016-2640-1

    CrossRef   Google Scholar

    [31]

    Yu L, Ma J, Niu Z, Bai X, Lei W, et al. 2017. Tissue-specific transcriptome analysis reveals multiple responses to salt stress in Populus euphratica seedlings. Genes 8:372

    doi: 10.3390/genes8120372

    CrossRef   Google Scholar

    [32]

    Sundell D, Street NR, Kumar M, Mellerowicz EJ, Kucukoglu M, et al. 2017. AspWood: high-spatial-resolution transcriptome profiles reveal uncharacterized modularity of wood formation in Populus tremula. The Plant Cell 29:1585−604

    doi: 10.1105/tpc.17.00153

    CrossRef   Google Scholar

    [33]

    Filichkin SA, Hamilton M, Dharmawardhana PD, Singh SK, Sullivan C, et al. 2018. Abiotic stresses modulate landscape of poplar transcriptome via alternative splicing, differential intron retention, and isoform ratio switching. Frontiers in Plant Science 9:5

    doi: 10.3389/fpls.2018.00005

    CrossRef   Google Scholar

    [34]

    Zinkgraf M, Gerttula S, Zhao S, Filkov V, Groover A. 2018. Transcriptional and temporal response of Populus stems to gravi-stimulation. Journal of Integrative Plant Biology 60:578−90

    doi: 10.1111/jipb.12645

    CrossRef   Google Scholar

    [35]

    Rogier O, Chateigner A, Amanzougarene S, Lesage-Descauses MC, Balzergue S, et al. 2018. Accuracy of RNAseq based SNP discovery and genotyping in Populus nigra. BMC Genomics 19:909

    doi: 10.1186/s12864-018-5239-z

    CrossRef   Google Scholar

    [36]

    Liao W, Ji L, Wang J, Chen Z, Ye M, et al. 2014. Identification of glutathione S-transferase genes responding to pathogen infestation in Populus tomentosa. Functional & Integrative Genomics 14:517−29

    doi: 10.1007/s10142-014-0379-y

    CrossRef   Google Scholar

    [37]

    Lu S, Li Q, Wei H, Chang MJ, Tunlaya-Anukit S, et al. 2013. Ptr-miR397a is a negative regulator of laccase genes affecting lignin content in Populus trichocarpa. Proceedings of the National Academy of Sciences of the United States of America 110:10848−53

    doi: 10.1073/pnas.1308936110

    CrossRef   Google Scholar

    [38]

    Felten J, Vahala J, Love J, Gorzsás A, Rüggeberg M, et al. 2018. Ethylene signaling induces gelatinous layers with typical features of tension wood in hybrid aspen. New Phytologist 218:999−1014

    doi: 10.1111/nph.15078

    CrossRef   Google Scholar

    [39]

    Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114−20

    doi: 10.1093/bioinformatics/btu170

    CrossRef   Google Scholar

    [40]

    Delhomme N, Mähler N, Schiffthaler B, Sundell D, Mannapperuma C, et al. 2014. Guidelines for RNA-Seq data analysis. EpiGeneSys Protocol 67:1−24

    Google Scholar

    [41]

    Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, et al. 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15−21

    doi: 10.1093/bioinformatics/bts635

    CrossRef   Google Scholar

    [42]

    Liao Y, Smyth GK, Shi W. 2013. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research 41:e108

    doi: 10.1093/nar/gkt214

    CrossRef   Google Scholar

    [43]

    Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139−40

    doi: 10.1093/bioinformatics/btp616

    CrossRef   Google Scholar

    [44]

    Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15:550

    doi: 10.1186/s13059-014-0550-8

    CrossRef   Google Scholar

    [45]

    Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323

    doi: 10.1186/1471-2105-12-323

    CrossRef   Google Scholar

    [46]

    Wang Y, Chen Y, Ding L, Zhang J, Wei J, et al. 2016. Validation of reference genes for gene expression by quantitative real-time RT-PCR in stem segments spanning primary to secondary growth in Populus tomentosa. PLoS One 11:e0157370

    doi: 10.1371/journal.pone.0157370

    CrossRef   Google Scholar

    [47]

    Yun T, Li J, Xu Y, Zhou A, Zong D, et al. 2019. Selection of reference genes for RT-qPCR analysis in the bark of Populus yunnanensis cuttings. Journal of Environmental Biology 40:584−91

    doi: 10.22438/jeb/40/3(SI)/Sp-24

    CrossRef   Google Scholar

    [48]

    Tang F, Chu L, Shu W, He X, Wang L, et al. 2019. Selection and validation of reference genes for quantitative expression analysis of miRNAs and mRNAs in Poplar. Plant Methods 15:35

    doi: 10.1186/s13007-019-0420-1

    CrossRef   Google Scholar

    [49]

    Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, et al. 2020. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Molecular Plant 13:1194−202

    doi: 10.1016/j.molp.2020.06.009

    CrossRef   Google Scholar

    [50]

    Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. Mega X: molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution 35:1547−49

    doi: 10.1093/molbev/msy096

    CrossRef   Google Scholar

    [51]

    Qu W, Zhou Y, Zhang Y, Lu Y, Wang X, et al. 2012. MFEprimer-2.0: a fast thermodynamics-based program for checking PCR primer specificity. Nucleic Acids Research 40:W205−W208

    doi: 10.1093/nar/gks552

    CrossRef   Google Scholar

    [52]

    Shi Q, Tian D, Wang J, Chen A, Miao Y, et al. 2023. Overexpression of miR390b promotes stem elongation and height growth in Populus. Horticulture Research 10:uhac258

    doi: 10.1093/hr/uhac258

    CrossRef   Google Scholar

    [53]

    Urbancsok J, Donev EN, Sivan P, van Zalen E, Barbut FR, et al. 2023. Flexure wood formation via growth reprogramming in hybrid aspen involves jasmonates and polyamines and transcriptional changes resembling tension wood development. New Phytologist 240:2312−34

    doi: 10.1111/nph.19307

    CrossRef   Google Scholar

    [54]

    Balasubramanian VK, Rivas-Ubach A, Winkler T, Mitchell H, Moran J, et al. 2023. Modulation of polar auxin transport identifies the molecular determinants of source-sink carbon relationships and sink strength in poplar. Tree Physiologytpad073

    doi: 10.1093/treephys/tpad073

    CrossRef   Google Scholar

    [55]

    Kong L, Song Q, Wei H, Wang Y, Lin M, et al. 2023. The AP2/ERF transcription factor PtoERF15 confers drought tolerance via JA-mediated signaling in Populus. New Phytologist 240:1848−67

    doi: 10.1111/nph.19251

    CrossRef   Google Scholar

    [56]

    Guo Y, Wang S, Yu K, Wang H, Xu H, et al. 2023. Manipulating microRNA miR408 enhances both biomass yield and saccharification efficiency in poplar. Nature Communications 14:4285

    doi: 10.1038/s41467-023-39930-3

    CrossRef   Google Scholar

    [57]

    Li M, Dong H, Li J, Dai X, Lin J, et al. 2023. PtrVCS2 regulates drought resistance by changing vessel morphology and stomatal closure in Populus trichocarpa. International Journal of Molecular Sciences 24:4458

    doi: 10.3390/ijms24054458

    CrossRef   Google Scholar

    [58]

    Ruijter JM, Ramakers C, Hoogaars WMH, Karlen Y, Bakker O, et al. 2009. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Research 37:e45

    doi: 10.1093/nar/gkp045

    CrossRef   Google Scholar

    [59]

    Yang C, Yuan X, Zhang J, Sun W, Liu Z, et al. 2020. Comprehensive transcriptome analysis of reference genes for fruit development of Euscaphis konishii. PeerJ 8:e8474

    doi: 10.7717/peerj.8474

    CrossRef   Google Scholar

    [60]

    Liang L, He Z, Yu H, Wang E, Zhang X, et al. 2020. Selection and validation of reference genes for gene expression studies in Codonopsis pilosula based on transcriptome sequence data. Scientific Reports 10:1362

    doi: 10.1038/s41598-020-58328-5

    CrossRef   Google Scholar

    [61]

    Zhu L, Yang C, You Y, Liang W, Wang N, et al. 2019. Validation of reference genes for qRT-PCR analysis in peel and flesh of six apple cultivars (Malus domestica) at diverse stages of fruit development. Scientia Horticulturae 244:165−71

    doi: 10.1016/j.scienta.2018.09.033

    CrossRef   Google Scholar

    [62]

    Lyu S, Yu Y, Xu S, Cai W, Chen G, et al. 2020. Identification of appropriate reference genes for normalizing miRNA expression in citrus infected by Xanthomonas citri subsp. citri. Genes 11:17

    doi: 10.3390/genes11010017

    CrossRef   Google Scholar

    [63]

    Galimba K, Tosetti R, Loerich K, Micheal L, Pabhakar S, et al. 2020. Identification of early fruit development reference genes in plum. PLoS One 15:e0230920

    doi: 10.1371/journal.pone.0230920

    CrossRef   Google Scholar

    [64]

    Luo M, Gao Z, Li H, Li Q, Zhang C, et al. 2018. Selection of reference genes for miRNA qRT-PCR under abiotic stress in grapevine. Scientific Reports 8:4444

    doi: 10.1038/s41598-018-22743-6

    CrossRef   Google Scholar

  • Cite this article

    Xie Q, Ahmed U, Qi C, Du K, Luo J, et al. 2024. A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression. Forestry Research 4: e021 doi: 10.48130/forres-0024-0017
    Xie Q, Ahmed U, Qi C, Du K, Luo J, et al. 2024. A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression. Forestry Research 4: e021 doi: 10.48130/forres-0024-0017

Figures(7)  /  Tables(1)

Article Metrics

Article views(3613) PDF downloads(453)

ARTICLE   Open Access    

A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression

Forestry Research  4 Article number: e021  (2024)  |  Cite this article

Abstract: Real-time quantitative reverse transcription polymerase chain reaction (RT-qPCR) plays a crucial role in relative gene expression analysis, and accurate normalization relies on suitable reference genes (RGs). In this study, a pipeline for identifying candidate RGs from publicly available stem-related RNA-Seq data of different Populus species under various developmental and abiotic stress conditions is presented. DESeq2's median of ratios yielded the smallest coefficient of variance (CV) values in a total of 292 RNA-Seq samples and was therefore chosen as the method for sample normalization. A total of 541 stably expressed genes were retrieved based on the CV values with a cutoff of 0.3. Universal gene-specific primer pairs were designed based on the consensus sequences of the orthologous genes of each Populus RG candidate. The expression levels of 12 candidate RGs and six reported RGs in stems under different abiotic stress conditions or in different Populus species were assessed by RT-qPCR. The expression stability of selected genes was further evaluated using ΔCt, geNorm, NormFinder, and BestKeeper. All candidate RGs were stably expressed in different experiments and conditions in Populus. A test dataset containing 117 RNA-Seq samples was then used to confirm the expression stability, six candidate RGs and three reported RGs met the requirement of CV ≤ 0.3. In summary, this study was to propose a systematic and optimized protocol for the identification of constitutively and stably expressed genes based on RNA-Seq data, and Potri.001G349400 (CNOT2) was identified as the best candidate RG suitable for gene expression studies in poplar stems.

    • RNA sequencing (RNA-Seq) technology has emerged as a powerful tool in transcriptomic studies, offering high accuracy, sensitivity, and resolution[1]. Unlike traditional methods, RNA-Seq does not rely on prior knowledge of specific RNA molecules, making it effective for identifying unknown RNAs[2]. In RNA-Seq, total RNA or specific RNA fragments are isolated from samples representing different biological conditions or replicated under similar conditions. Recent advancements in next-generation sequencing (NGS) technology have made RNA-Seq the preferred approach for gene expression studies, thanks to its cost-effectiveness and technological improvements. This sequence-based method has revolutionized transcriptome research, enabling various applications, including the analysis of strand-specific expression, the detection of transcript fusions and alternative splicing isoforms, and the characterization of unknown cell types (through single-cell RNA sequencing)[3,4].

      RNA-Seq also enables better discovery of differentially expressed genes (DEGs) in various biological tissues and growth conditions, and may be able to provide high genome coverage even for genes with low expression levels[5]. RNA-Seq analysis measures transcript abundance by quantifying the fragments generated and the number of reads corresponding to each transcript. Since the total RNA content in a sample is unknown, data normalization is essential. Normalization methods include individual normalization based on the total number of reads and transcript lengths in each sample, resulting in Reads Per Kilobase of exon per Million mapped reads (RPKM) or Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values[6]. Alternatively, normalization can be achieved using different methods such as the Trimmed-Mean of M-values (TMM), DESeq2's median of ratios, Transcript Per Million (TPM) and Upper Quartile (UQ). Similar to RPKM, TPM doesn't utilize read information from all samples for normalization[7,8].

      In terms of gene expression analysis, RNA-Seq experiments primarily focus on identifying DEGs in specific biological conditions. However, apart from DEGs, there are numerous genes known as constitutively expressed genes (CEGs) that exhibit consistent expression across different cells or developmental stages, regardless of environmental conditions. For instance, a study on rice revealed that 22.7% of transcripts were expressed by CEGs in 39 different rice tissues[9]. Surprisingly, recent studies suggest that CEGs exhibit variable expression under different conditions and are used as reference genes (RGs)[10,11]. An ideal RG is one that remains unaffected by any experimental condition, shows stable expression, has no pseudogenes, and has a mid-range of quantification cycles or Cq values (Cq = 15–25) in real-time quantitative reverse transcription polymerase chain reaction (RT-qPCR)[12]. So technically speaking, every RG is CEG but not every CEG is RG[13].

      RT-qPCR is a reliable technique for transcript detection and measurement. To ensure accurate RT-qPCR results, it is crucial to select suitable RGs for normalization, following the guidelines outlined in the MIQE (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) standards[14]. RGs should ideally display constant expression levels across various plant tissues, developmental stages, or physiological conditions. They should remain unaffected by external treatments and can be used without the need for stability validation[15]. However, studies focusing on RG validation and comprehensive exploration of transcriptome data in model plants have shown that the accuracy of endogenous controls can be significantly influenced by factors such as plant species, the specific cells/tissues/organs under investigation, and growth conditions[16]. Therefore, the selection of appropriate RGs is a critical step in normalizing RT-qPCR data, as an incorrect selection may lead to ambiguous or even incorrect results[17,18].

      Traditionally, RGs in many horticultural plants were selected from cellular housekeeping genes in the absence of large genomic datasets[12]. Examples include elongation factor-1α (EF-1A) in zucchini, actin in poplar, and ubiquitin conjugating-enzyme (UBC) in banana[12]. However, it has been observed that certain RGs exhibit significant expression variability among different conditions and tissue types[19]. Moreover, even within species, different RG candidates may show varying expression stability under specific experimental conditions or tissue types[20,21]. Therefore, it is crucial to validate the expression stability of candidate RGs before utilizing them for data normalization. Only those RG candidates that have undergone rigorous validation for expression stability can be considered reliable RGs for specific conditions or tissue types[22]. Several statistical tools, such as BestKeeper, geNorm, and NormFinder, are commonly employed to identify the most suitable candidate RGs under specific experimental settings[12]. These tools aid in selecting RGs that exhibit minimal expression variability across various conditions or tissues.

      Forest trees and horticultural plants encompass a diverse range of species, many of which are still in the early stages of genomic and functional genomic research. The emergence of RNA-Seq technology has significantly advanced gene annotation, expression profiling, and functional studies in these plants. The availability of extensive RNA-Seq data sets provides valuable resources for selecting suitable RGs across different species and under various experimental conditions[2325]. Poplar, a widely distributed tree species with significant applications in wood production, environmental protection, and urban greening, serves as an important model plant for woody species[26]. Extensive research has been conducted on poplar stem growth and development, leading to the generation of large-scale transcriptome data sets[2729]. In this study, the objective is to retrieve and assess the quality of publicly available transcriptome data sets related to poplar stem tissues. Subsequently, we aim to predict and evaluate the best candidate RGs for gene expression analysis in stems of various Populus species. The expression stability of these candidate RGs under different stress conditions will be validated using RT-qPCR, leading to the identification of reliable RGs specific to poplar stems. Furthermore, based on this case study, we intend to establish a comprehensive pipeline for the development of RGs for accurate gene expression normalization using RNA-Seq data.

    • The workflow for identifying universal RGs within a genus based on RNA-Seq data is shown in Fig. 1, using RNA-Seq data downloaded from public databases for poplar stem gene expression.

      Figure 1. 

      Workflow for selecting novel reference genes (RGs) based on RNA-Seq data.

    • To obtain the RNA-Seq data of poplar stems, a comprehensive literature search was conducted to identify relevant studies that reported RNA-Seq experiments on poplar stem samples. A dataset comprising 298 RNA-seq samples focusing on poplar stem research was compiled as the training dataset for this study (Supplemental Dataset 1). These samples were derived from 10 distinct experimental groups, each representing different biological aspects of poplar stem. The groups included experiments examining different internodes of poplar[27], various stem tissues such as phloem, xylem, fiber cells, and ray cells[30,31], the entire developmental stage spanning from phloem to xylem[32], abiotic stress treatments such as low temperature, heat, drought, and salt[33], gravity stimulation by placing plants horizontally[34], different growth areas[35], biological stress treatments[36], overexpression of miR397a[37], and transgenic lines treated with ACC (1-aminocyclopropane-1-carboxylic acid)[38] (Supplemental Table S1). Accession numbers associated with the identified studies were then obtained. Subsequently, the RNA-Seq data corresponding to these accession numbers were downloaded from the National Center for Biotechnology Information (NCBI: www.ncbi.nlm.nih.gov) or EMBL's European Bioinformatics Institute (EMBL-EBI: www.ebi.ac.uk) databases. Only RNA-Seq data from poplar stem samples with a minimum of two biological replicates were included in the analysis (Supplemental Table S1).

    • The RNA-Seq data used in this study were obtained from sub-databases of NCBI, including GenBank, GEO, and SRA, as well as from EMBL-EBI, which includes ArrayExpress and ENA (European Nucleotide Archive). For the SRA data, the sratoolkit software v2.9.2 (https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/) was employed to convert the SRA data into FASTQ format before subsequent analysis. The default parameters were used for single-end (SE) sequencing data during the conversion process, while additional parameters (--split-3) were added for paired-end (PE) sequencing data to appropriately handle the data. The data downloaded from the EMBL-EBI database were already in FASTQ format and could be directly used for subsequent analysis.

      The raw data quality was assessed using FastQC v0.10.1 (www.bioinformatics.babraham.ac.uk/projects/fastqc). Subsequently, Trimmomatic v0.32[39] was applied for the removal of adaptor contaminants and low-quality reads. The parameter settings for SE and PE sequencing data were as follows: TruSeq3-SE-2.fa:2:30:10 LEADING:3 SLIDINGWINDOW:5:20 MINLEN:50 and TruSeq3-PE-2.fa:2:30:10 LEADING:3 SLIDINGWINDOW:5:20 MINLEN:50, respectively[40]. These steps ensured that the RNA-Seq data used in the subsequent analysis was of high quality and free from artifacts and noise.

      Following the trimming and filtering of the raw data to obtain clean reads, the clean reads were mapped to the P. trichocarpa v3.0 genome using STAR v2.3.0e_r291[41]. The mapping process utilized the '--readFilesCommand zcat --quantMode TranscriptomeSAM' settings. By mapping the clean reads to the reference genome, the total number of reads in each sample and the number of reads that successfully mapped to the genome were determined. The alignment rate, representing the proportion of mapped reads to the total number of reads, was calculated for each sample. Samples with an alignment rate exceeding 70% were retained for subsequent analysis.

      Upon mapping the clean reads to the genome, a BAM file was generated for each sample. The BAM files contained information about the alignment positions of the reads, enabling the determination of the relative expression levels of all genes in each sample. This step provided the necessary data for further analysis and characterization of gene expression patterns in the poplar stem samples.

    • Four normalization methods, including FPKM/RPKM, TPM, TMM, and DESeq2's median of ratios, were utilized to assess and compare the impact of different normalization approaches on the gene expression analysis and to identify the most appropriate method for the RNA-Seq datasets. The BAM files generated from the previous mapping step were analyzed using the featureCounts program, a part of the Subread package[42], to obtain the unnormalized raw count (RC) of each gene in each sample. For TMM normalization of the RCs, the edgeR package[43] was utilized, while DESeq2 package[44] was employed for DESeq2's normalization. To perform FPKM/RPKM and TPM normalization, the RSEM (RNA-Seq by Expectation Maximization) software[45] was used, applying default parameters. These normalization methods were applied to the RCs, resulting in normalized expression values that can be compared across samples.

      To compare the effects of FPKM/RPKM, TPM, TMM, and DESeq2's median of ratios on the normalization of RC, a bar graph was generated to depict the expression levels of all genes from the lower quartile to the upper quartile values for each method. The relative expression level of each gene was calculated as log2(value+1) to mitigate any outliers resulting from zero values. The Coefficient of Variance (CV) was then calculated for the relative expression level of each gene across all samples. To investigate the variability and consistency of gene expression across RC, FPKM/RPKM, TPM, TMM, and DESeq2's median of ratios methods, a box plot was utilized to visualize the CV distribution of the relative expression levels for all genes.

      Moreover, the CV of the relative expression levels of the reported RGs were used to assess the accuracy and reliability of RC, FPKM/RPKM, TPM, TMM, and DESeq2's median of ratios methods. The CV provides a measure of the variation in gene expression across different samples and allows us to assess the stability and consistency of the RGs under each normalization method.

      The reported RGs in poplar stems had been extensively collected through a literature review[24,4648]. After removing duplicate genes, the relative expression levels of 30 reported RGs were evaluated in all samples (Supplemental Table S2). The CV values and the average CV values of their relative expression levels before and after normalization were then calculated using the four different normalization methods. Boxplots were generated to illustrate the distribution of CV values and the means of CV values.

      After normalizing the data using four normalization methods, the distribution of relative expression levels as well as the distribution of CV values for all Populus genes (42,950 genes) were comprehensively analyzed. The CV distribution and the mean values for the relative expression levels of the 30 reported RGs were also examined. The optimal normalization method was selected based on the above analyses. The relative expression data of all samples were then processed using the selected normalization method for further analysis.

    • Genes with a CV value of ≤ 0.3 in their relative expression levels across all samples, as determined using the optimal normalization method, were considered to have stable expression. These genes were classified as stably expressed genes in this study. The distribution densities of the relative expression levels for all genes, the stably expressed genes, and the 30 reported RGs were statistically analyzed. Available Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation information for all genes in the P. trichocarpa v3.0 genome were used. To further investigate the functional enrichment of the stably expressed genes and the 30 reported RGs, GO, and KEGG enrichment analysis were performed using TBtools software[49].

      Based on the CV values of the relative expression levels of genes across all samples, the top 12 genes with the smallest CV values were selected as candidate RGs for this study. To further confirm the expression stability of the 12 candidate RGs, RT-qPCR experiments were performed and the results were compared with the six reported RGs with the smallest CV in relative expression levels.

    • Hybrid poplar 717 (P. tremula × P. alba clone INRA 717-1B4, hereafter referred to as poplar 717) plants were cultured in 250 mL glass bottles containing 35 mL of 1/2 Murashige and Skoog (MS) medium (Phytotech, Lenexa, USA) supplemented with 0.75% (w/v) agar and 3% (w/v) sucrose. The bottles were placed in a growth chamber with a photoperiod of 16 h of light and 8 h of darkness at a temperature of 25 °C for a duration of two months. The tissue-cultured plants were then transplanted into pots 8.5 cm in diameter and 14 cm high, filled with a soil mixture of three parts peat, two parts base soil, and one part vermiculite. The plants were grown in a greenhouse at a temperature of 28 ± 5 °C for three months. Five Populus cultivars including P. simonii, P. deltoides, hybrid poplar clone 717 (P. tremula × P. alba L.), P. alba and P. adenopoda were grown in the field of the campus of Huazhong Agricultural University in Wuhan, China.

    • Poplar 717 plants of uniform size (100−110 cm) were selected and subjected to various abiotic stresses (Supplemental Table S3). Salt stress was induced by exposing plants to 150 mM NaCl solution for 2 d (D2), 4 d (D4), and 6 d (D6). The plants were watered with the NaCl solution every 24 h. The drought stress treatment adopted the process of simulating the gradual drying of soil. Three groups of plants were withheld water for 5 d (D5), 10 d (D10), and 15 d (D15) to achieve mild, moderate, and severe drought conditions, respectively. The control group (D0) was watered regularly according to evaporative needs. Relative soil moisture was measured using a FieldScoutTM TDR 300 Soil Moisture Meter (Spectrum Technologies, USA). For shade stress, plants were covered by black nylon mesh to achieve 70%−80% shading. The time points for shade stress were D0, D5, D10 and D15. The light intensity at each time point was measured using a TM-205 Auto Ranging Lux/Fc Light Meter (TENMARS, Taiwan). In the above stress treatments, the physiological parameters such as net photosynthetic rate (Pn), stomatal conductance (gs), intercellular CO2 (Ci), and transpiration rate (Tr) of the poplar 717 plants were measured using a LI-6400XT portable photosynthesis system (LI-COR, USA). Gravity stress was induced by bending the plants to investigate tension wood and opposite wood at D2 and D14 time points. For each time point of each treatment, the number of sample replicates was three.

    • Primers for RT-qPCR experiments were designed to ensure gene specificity within species and generality among species, and the following primer design steps were performed. The primary transcript sequences of the RGs from P. trichocarpa were used to perform blastn searches (-evalue: 1e-5, -num_alignments: 5) to obtain their homologous transcripts in P. trichocarpa, P. euphratica, P. deltoides, P. tremula and P. alba. To select the most homologous genes for each RG, a phylogenetic analysis of RG and its homologs was performed using MEGA X[50] with a maximum likelihood method and a bootstrap value of 1,000. To obtain the consensus nucleotide sequence of each RG, multiple sequence alignment was then performed between homologous transcript sequences from different Populus species. Based on the consensus sequence, universal primer pairs for each RG were designed using Primer3 software (https://primer3.ut.ee/) with the following criteria: primer length of 18−27 nt, primer melting temperature (Tm) of 59−63 °C, and product size of 100−300 bp. Finally, mfeprimer-3.2.0 software[51] was used to perform gene-specific detection of the designed primers in the aforementioned five genomes.

    • After the stress treatments, samples were taken from the 9th internode (IN9) at the branch top of each plant. Of the field-grown poplar trees, the young (IN3) and mature stems (IN9) were collected from each cultivar. The samples were immediately placed in liquid nitrogen and stored at −80 °C for RT-qPCR analysis. All plants were sampled with three biological replicates.

      The total RNA was extracted from the young and mature stem tissues of poplar plants using the 2×CTAB method as previously described[52]. The quality and quantity of the extracted RNA was assessed using a NanoDropTM 2000 spectrophotometer (Thermo Scientific, USA). To generate complementary DNA (cDNA), the PrimeScriptTM RT Reagent Kit with gDNA Eraser (TaKaRa, Dalian, China) was used following the manufacturer's instructions. Reactions of cDNA synthesis were performed in 20 µL volumes. The resulting cDNA was diluted 20-fold with ultrapure water and served as a template for RT-qPCR analysis.

      For RT-qPCR, ChamQTM SYBR® qPCR Master Mix (High ROX Premixed) from Vazyme (Nanjing, China) was used following the manufacturer's instructions. The RT-qPCR was conducted on a Light Cycler® 480 instrument II from Roche (USA) using white 384-well plates. The gene expression levels of the candidate RGs were evaluated.

      To determine the amplification efficiency (E) and R2 value for all primer pairs, a 6-point standard curve was generated by performing 20× dilutions of cDNA. The amplification efficiency (E) was calculated as E = 10^(−1/slope of the standard curve), and the R2 value was determined. All RT-qPCR experiments were performed with three biological replicates and two technical replicates.

    • The expression stability of the candidate RGs and reported RGs were evaluated using Ref-Finder (www.leonxie.com/referencegene.php), which incorporates the delta-Ct, BestKeeper, Normfinder, and geNorm algorithms. The evaluation was based on the average Ct values obtained from RT-qPCR experiments. By considering the comprehensive geometric stability ranking, candidate RGs with the most stable expression were identified and selected as novel RGs for poplar stems.

      To evaluate the expression stability of the candidate RGs and reported RGs, a test dataset of 117 RNA-Seq samples was compiled, focusing on poplar stem studies (Supplemental Table S4). The samples came from nine different experimental groups recently uploaded to the public databases and represent various biological aspects of poplar stems, including flexure treatment[53], auxin treatment[54] and transgenic studies on ERF15[55], miR408[56] and PtrVCS2[57]. The test dataset was downloaded and processed as previously described for the training dataset. After normalizing the data by using the DESeq2's median of ratios method, distributions, and CV values of relative expression levels were used to assess the expression stability of the 12 candidate RGs and six reported RGs in the test dataset.

    • Data statistics and analysis were performed using the R programming language. Specifically, R language packages, such as ggplot and ggtree, were utilized for data visualization and plotting to create high-quality graphics and visualize data in a clear and informative manner.

    • Through a comprehensive review of relevant literature on poplar transcriptome sequencing, a dataset comprising 298 RNA-seq samples focusing on poplar stem research was compiled for this study (Supplemental Dataset 1). These samples provide a comprehensive representation of poplar stem development, enabling the identification and evaluation of candidate RGs. The 298 samples were aligned to the Populus trichocarpa v3.0 genome, resulting in 294 samples with an alignment rate exceeding 70% (Supplemental Dataset 2). Among these, samples K5_20 and K5_28 had alignment rates higher than 80% (84.80% and 84.49% respectively), but were excluded from further analysis due to their low read counts[41]. Consequently, transcriptome data from a total of 292 samples were retained for subsequent analysis in this study.

      In addition, a collection of 37 reported RGs in poplar stems was compiled. After removing duplicate genes based on their gene ID numbers, a set of 30 reported RGs was obtained for further analysis (Supplemental Table S2).

    • By analyzing the upper and lower quartile values of gene expression levels in the 292 samples before and after normalization, and presenting the results in bar graphs, the differences in gene expression levels among samples were reduced after normalization with RPKM/FPKM, TPM, TMM, and DESeq2's median of ratios compared to without normalization (RC). Specifically, the distribution of gene expression levels between samples became more similar and less variable after DESeq2 and TMM normalization, indicating that these methods performed better than RPKM/FPKM and TPM (Fig. 2a). Furthermore, the CV of the relative expression of all genes in the 292 samples were examined. After normalization with RPKM/FPKM, TPM, TMM, and DESeq2's median of ratios, the median CV values were 1.52, 1.51, 1.47, and 1.48, respectively, with TMM and DESeq2's median of ratios exhibiting more favorable effects compared to RPKM/FPKM and TPM (Fig. 2b). To evaluate the effect of different normalization methods, the changes in the expression levels of the 30 reported RGs were analyzed. The median and mean CVs for unnormalized RC were 0.90 and 0.97, respectively. After normalization with RPKM/FPKM, TPM, TMM, and DESeq2's median of ratios, the median CVs were 0.52, 0.52, 0.53, and 0.51, respectively; and the mean CVs were 0.58, 0.56, 0.57, and 0.55, respectively (Fig. 2c).

      Figure 2. 

      Comparison of normalization methods. RC (raw counts) represents raw counts without normalization, while FPKM, TPM, TMM, and DESeq2 are results after normalization using four different methods. (a) Distribution of counts for all genes in each sample before and after normalization using 292 samples. Bar plots are employed instead of box plots, with the upper limit of the bar representing the upper quartile and the lower limit representing the lower quartile of the data. (b) Box plot demonstrates the coefficient of variance (CV) of gene expression levels for all genes before and after normalization, utilizing data from 292 samples. (c) Box plot depicting the CV for 30 previously reported reference genes (RGs) after different normalizations across all 292 samples, with the central white point indicating the mean CV.

      DESeq2's median of ratios and TMM showed similar effects in normalizing the expression levels of the 30 reported RGs in poplar. However, DESeq2's median of ratios yielded the smallest CV values and was therefore chosen as the method for sample normalization in the subsequent steps.

    • To screen out the RGs suitable for various poplar species and experimental treatments, the RNA-Seq data of the 292 samples were further analyzed. After RC was normalized by the DESeq2's median of ratios method, the mean, standard deviation (SD), and CV of the relative expression levels of each gene were calculated. A total of 541 stably expressed genes (CV ≤ 0.3) were screened, including four reported RGs (PP2A-2, EIF-A, CDPK and ATPase) (Fig. 3a, Supplemental Dataset 3). The log values of the relative expression levels' mean of the 541 stably expressed genes ranged from 5.91 to 13.03; and for the 30 reported RGs, the log values ranged from 7.17 to 16.09 (Fig. 3b). This indicates that the stably expressed genes identified in this study have similar expression levels to the reported RGs. GO and KEGG enrichment analyses were performed on the stably expressed genes and the reported RGs. The results indicate that the stably expressed genes identified in this study share functional similarities with the reported RGs (Supplemental Dataset 4).

      Figure 3. 

      Comparative analysis between stably expressed genes and reported reference genes (RGs). This figure presents a comparison of expression patterns between stably expressed genes and previously reported RGs. (a) Venn diagram illustrates the overlap between stably expressed genes (with CV ≤ 0.3) identified from 292 RNA-Seq samples and the reported RGs. (b) The density distribution of gene expression levels is depicted for all genes in the Populus genome, stably expressed genes, and reported RGs. The X-axis represents log2(counts + 1), and the Y-axis represents gene density.

    • From the 541 stably expressed genes, 12 genes (Table 1) with the smallest CV values among the 292 samples were selected as candidate RGs, based on the normalization method of DESeq2's median of ratios. Comparing the stability of expression levels between these candidate RGs and the reported RGs, the former had CV values ranging from 0.16 to 0.20, and the latter had CV values ranging from 0.23 to 1.53. The expression of the candidate RGs showed better stabilities (lower CV values) than the reported RGs in the 292 samples (Fig. 4). Furthermore, using the six reported RGs with the smallest CV values as comparisons, the expression stability of 12 candidate RGs was verified by RT-qPCR analysis.

      Table 1.  Information of candidate reference genes (RGs) with the smallest coefficient of variance (CV) values among the 292 RNA-Seq samples.

      Gene IDGene nameDescriptionCV
      Potri.001G349400CNOT2CCR4-NOT transcription complex subunit 20.168
      Potri.002G157500RH8Similar to DEAD/DEAH box helicase0.172
      Potri.005G110600VPS35Vacuolar protein sorting-associated protein 350.173
      Potri.002G197600FIP37.1Similar to ARABIDOPSIS THALIANA FKBP12 INTERACTING PROTEIN 370.175
      Potri.013G070001NAUDP-glucose pyrophosphorylase0.186
      Potri.001G197400Pt-UBP6.2Similar to UBIQUITIN-SPECIFIC PROTEASE 60.190
      Potri.006G116700U2AF1Splicing factor U2AF 35 kDa subunit0.193
      Potri.008G111700NAPredicted hydrolases of HD superfamily0.194
      Potri.011G084400CUL4Similar to hypothetical protein0.194
      Potri.004G064400NASimilar to ankyrin protein kinase0.198
      Potri.008G217300Pt-CUL1.4Similar to cullin-like protein10.199
      Potri.003G045700Pt-ATRLI1.2Similar to RNase L inhibitor protein; putative0.202
      NA: not available. The estimated CVs were based on the normalization method of DESeq2's median of ratios.

      Figure 4. 

      Assessment of candidate and reported reference genes (RGs) stability based on RNA-Seq data. Box plots are employed to display the distribution of expression levels for 12 candidate RGs and 30 reported RGs. The left side shows the expression distribution of candidate RGs, while the right side shows the expression distribution of reported RGs. The small circles on the red lines represent the coefficient of variance (CV) for the gene expression levels.

    • To design universal primers suitable for different poplar species, the workflow illustrated in Fig. 5a was developed. The primary transcript sequences of 18 genes in P. trichocarpa was used for blastn to retrieve their homologous genes from P. trichocarpa, P. euphratica, P. deltoides, and poplar 717. For example, when examining the primary transcript sequence of Potri.005G10600, a total of 13 homologous transcripts were retrieved from blastn searches of the five poplar genomes. A phylogenetic tree was constructed using the primary transcript sequences of Potri.005G10600 and it's 13 homologues to identify the former's orthologs (Fig. 5b). Multiple sequence alignments were performed between Potri.005G10600 and its orthologs to obtain consensus sequences for primer design (Fig. 5c). The specificity of the primers on the five poplar genomes was verified using the mfeprimer-3.2.0 software to obtain universal primers for these species. Finally, RT-qPCR primers for all 18 RGs, including 12 candidate RGs and six reported RGs, were designed according to the method described above (Supplemental Table S5).

      Figure 5. 

      RT-qPCR primer design. This figure illustrates the process of designing RT-qPCR primers using Potri.005G110600 as a representative example. (a) Flowchart outlining the primer design process is presented. (b) Phylogenetic tree showing the target gene and homologous genes in five different Populus species, with color-coded gene names representing the most homologous genes to the target gene. (c) Multiple sequence alignment of transcript sequences for the target gene Potri.005G110600 and its homologous genes. The alignment highlights conserved positions in blue and non-conserved positions in white. The region indicated by red arrows corresponds to the primer binding site.

      To verify the performance of the designed primer pairs, RNA was extracted from the stems of poplar 717 tissue culture seedlings, and cDNA was synthesized through reverse transcription of RNA. The gene specificity and amplification efficiency of the designed primer pairs were then evaluated using RT-qPCR. The correlation coefficient (R2 values) of the 18 primer pairs in RT-qPCR amplification were all above 0.99, indicating that they had high gene specificity. The amplification efficiencies varied from 91.3% to 109.7%, all of which were in the reliable section from 90% to 115%[58]. These results indicated that the designed primers were suitable for further gene expression analysis.

    • Poplar 717 plants were subjected to different stress treatments (Supplemental Table S3) to verify the expression stability of the 12 candidate RGs and six reported RGs by RT-qPCR. The expression stability of these genes was evaluated and ranked using four commonly used stability evaluation programs: delta-Ct, BestKeeper, Normfinder, and geNorm (Supplemental Dataset 5). The physiological parameters, including net photosynthetic rate (Pn), stomatal conductance (gs), intercellular CO2 (Ci), and transpiration rate (Tr) were used to monitor the stress response of poplar plants to various treatments (Supplemental Dataset 6). The comprehensive ranking was generated by computing the geometric mean (geomean) of the gene rankings. The distribution of Ct values of RT-qPCR for all RGs was concentrated closely near the mean, indicating consistent and reliable expression levels of the candidate RGs and the reported RGs (Fig. 6a). Four candidate RGs, Potri.001G349400, Potri.005G110600, Potri.011G084400 and Potri.008G111700, exhibited the highest stability and ranked at the top under all stress conditions (Fig. 6a). To further verify that the candidate RGs in this study are suitable for comparison among different poplar species, young and mature stems (IN3 and IN9) collected from five different Populus cultivars including P. simonii, P. deltoides, poplar 717, P. alba, and P. adenopoda grown on the campus of Huazhong Agricultural University were used. And the stability of 18 RGs in these stems was analyzed. Potri.001G349400, Potri.011G084400, Potri.005G110600 and Potri.002G157500 were among the best for stability in the stems of these cultivars (Fig. 6b). Based on the above two experiments, Potri.001G349400, Potri.005G110600, and Potri.011G084400 showed the best stability compared to other candidate RGs and reported RGs.

      Figure 6. 

      Comprehensive analysis of expression stability and Ct value distribution of reference genes (RGs). This figure presents a comprehensive evaluation of the expression stability of 18 RGs (12 candidate RGs and six reported RGs) across diverse stress treatments and Populus species using Ref-finder, along with the distribution of Ct values for these 18 RGs. (a) Box plots showing the Ct value distribution and integrated stability ranking of the 18 RGs under different stress treatment experiments (Supplemental Table S5). (b) Box plots depicting the Ct value distribution and integrated stability ranking of the 18 RGs in stems of five different Populus cultivars including P. simonii, P. deltoides, poplar 717, P. alba, and P. adenopoda. The small circles on red line indicate the integrated stability ranking, with smaller values indicating greater stability. All RT-qPCR experiments were performed with three biological replicates and two technical replicates. Each dot of the boxplot represents the average of these six replicates.

      To further verify the stability of the candidate RGs and reported RGs, a test dataset was constructed using the latest RNA-Seq data recently uploaded to the public databases, which included nine sets of poplar stem-related experiments with a total of 117 samples (Supplemental Table S4, Supplemental Dataset 7). Based on the DESeq2's median of ratios normalization method, distributions and CV values of relative expression levels of the 18 RGs in the test dataset were assessed. In total, there were six candidate RGs and three reported RGs with CV values below 0.3 (Fig. 7, Supplemental Dataset 8). Among them, Potri.001G349400, Potri.002G197600 and Potri.002G157500 can be considered as novel RGs for poplar stem gene expression analysis, and their gene annotations are respectively CNOT2 (CCR4-NOT transcription complex subunit 2), FIP37.1 (similar to Arabidopsis FKBP12-interacting protein 37), and RH8 (DEAD/DEAH box RNA helicase-like 8). Potri.001G349400/CNOT2 ranked first in the training dataset, RT-qPCR validation and test dataset, showing the best expression stability in stem-related samples among all poplar genes. The universal primers used to amplify Potri.001G349400/CNOT2 were: 5'-GGTCTGACGAGCCAGCAAAGGGT-3' (forward) and 5'-CGCCGCTGCTCCTTGTGGT-3' (reverse).

      Figure 7. 

      Evaluation of candidate and reported reference genes (RGs) stability based on RNA-Seq data. Box plots are employed to display the distribution of expression levels for 12 candidate RGs and six reported RGs. The small circles on the red line represents the coefficient of variance (CV) for the gene expression levels, where smaller CV values signify greater stability of gene expression. The three genes considered as novel RGs for poplar stem gene expression analysis are shown in purple.

    • The selection of suitable RGs for RT-qPCR normalization is crucial in gene expression analysis. While previous studies have often relied on literature-reported RGs, there is a growing recognition of the need for systematic selection of stable RGs. In this study, the aim was to identify constitutively and stably expressed genes that can serve as reliable internal controls for RT-qPCR experiments. To achieve this, both novel and reported RGs were retrieved from transcriptome datasets (Supplemental Table S1) of Populus at the genus level. These datasets provided a comprehensive view of gene expression under different developmental stages and abiotic stress conditions. Based on the CV values of gene expression, 12 novel candidate RGs and six reported RGs that demonstrated stable expression across these conditions were selected (Fig. 4). Furthermore, these novel RG candidate genes were evaluated using the latest poplar stem-related RNA-Seq data from public databases (Fig. 7) and three of them were suggested for poplar stem-related research, Potri.001G349400/CNOT2, Potri.002G197600/FIP37.1 and Potri.002G157500/RH8. By systematically selecting RGs based on their expression stability, the present study provides researchers with an important resource for gene expression analysis in the stems of Populus and potentially other plant species. These stable RGs can serve as reliable internal controls for RT-qPCR experiments, enabling more accurate and robust gene expression studies.

      Normalization methods play a crucial role in identifying internal RGs based on transcriptome data. In this study, the efficacy of several methods commonly used to identify RGs based on RNA-Seq data were evaluated. We compared four common normalization methods: FPKM, TPM, TMM, and DESeq2's median of ratios. Variation among the data is typically evaluated using metrics such as MFC (mean fold change), SD, and interquartile of expression level. CV value is a valuable metric for assessing the variability of gene expression levels relative to their mean, and it has been widely used to identify suitable RGs from transcriptome datasets[59,60]. In many studies, the top 1,000 expressed transcripts with the lowest CV values in contrasting environments are selected as stably expressed genes. The threshold for CV values is often set at < 16% or < 30%. Alternatively, some studies choose genes with a low CV of logarithmically transformed RPKM or transcript copy numbers, typically with a threshold of < 4%. The log2 (normalized values) and average CV values obtained by the five methods were evaluated. The results showed that the DESeq2's median of ratios and TMM normalization methods provided higher consistency compared to RC, FPKM, and TPM. DESeq2's median of ratios yielded the smallest CV values when tested with the reported RGs, and was therefore chosen as the normalization method for RNA-Seq data in this study. This approach ensures reliable and accurate normalization, resulting in more robust gene expression analysis.

      Comprehensive analysis of RNA-Seq data obtained from different Populus species under different conditions facilitated the identification of highly reliable and broadly applicable candidate RGs. Only four out of 30 previously reported RGs have been identified as stably expressed genes across diverse conditions and environments based on their CV values (Fig. 3a). In addition to CV values, the expression levels of transcripts are also important considerations when selecting RGs from transcriptomic data. It is generally preferred to choose transcripts with higher expression levels as internal controls for gene quantification due to reasons of efficiency and accuracy[24]. In this study, both the stably expressed genes and the reported RGs generally had high gene expression levels (Fig. 3b). KEGG enrichment showed that most of these genes were involved in various key biological processes related to gene expression (Supplemental Dataset 4), providing further insights into the functional relevance of these genes. Only four reported RGs, PP2A-2, EIF4A, CDPK, and ATPase, exhibited CV values below 30%, while all the stably expressed genes had CV values < 30% (Figs 3a & 4). This indicates that the newly selected candidate genes have great potential as RGs.

      In many previous studies, the selection of suitable RGs for gene expression studies are often limited to specific species and conditions[6164]. This study was aimed to evaluate the novel and reported RGs at the genus level of Populus. Therefore, it is crucial to design primers that not only exhibit gene specificity within a particular species but also are universal among different Populus species. To ensure primer versatility, we employed an integrative approach to design gene-specific primers to amplify orthologous RGs from multiple species (Fig. 5). Consensus sequences based on multiple sequence alignments of candidate RGs and their orthologs from different Populus species were used for primer design. Subsequently, the gene specificity of the primers in five Populus genomes were verified using mfeprimer-3.2.0. This approach allows for consistent and reliable gene expression analysis across multiple Populus species, facilitating broader applicability of universal primer pairs within the genus.

      By employing a combination of bioinformatics tools and analysis methods, a set of candidate RGs with high stability and applicability were successfully identified in various experimental conditions. The use of RefFinder, which integrates popular stability evaluation algorithms such as geNorm, NormFinder, BestKeeper, and delta-Ct method, allowed us to comprehensively evaluate the performance of the 18 tested genes in RT-qPCR analysis. All these RGs exhibited high stability in both young and mature stems of various Populus cultivars, as well as under different stress treatments in poplar 717 (Fig. 6). To further test the applicability of the candidate RGs in this study, their expression stability was evaluated using the latest RNA-Seq data from public databases (Supplemental Table S4, Fig. 7). The functional relevance of the best-performing RG, Potri.001G349400/CNOT2 needs to be highlighted. CNOT2 is a core member of the Carbon catabolite repression4 (Ccr4)–NOT complex, which plays a crucial role in transcriptional regulation. In addition, Potri.002G197600/FIP37.1 and Potri.002G157500/RH8 can also be considered as novel RGs for poplar stem gene expression analysis based on their good performance in different tests in this study. In the analysis of the training and the test datasets, only two reported RGs (EIF4A and ATPase) had CV values below 0.3 in both cases. Therefore, most reported RGs are not suitable for gene expression analysis in poplar stems. The novel RGs developed in this study have strong applicability for gene expression analysis in poplar stems, but whether they apply to other tissues requires analysis based on relevant RNA-Seq data. The species involved in RNA-Seq and RT-qPCR in this study are relatively common in current poplar research and are also widely represented in the genus Populus, which is very useful for increasing the applicability of these RGs. When studying gene expression in other Populus species not included in this study, performing some amount of RNA-Seq might help improve the applicability of these RGs.

      It is not cost-effective to perform RT-qPCR to test all reported RGs for each new experiment. As gene expression data based on RNA-Seq continue to accumulate from diverse experiments and species, the approach employed in this study holds promise for integrating and mining such data in a meaningful way. Identification of stable and reliable RGs will facilitate accurate and standardized gene expression analysis across different conditions and species.

    • In conclusion, the current study presents a novel methodology for selecting stably expressed genes from transcriptome expression data, which offers several advantages over traditional approaches. By combining expression stability analysis and RT-qPCR validation, we successfully identified a set of novel and stable RGs for Populus stems. This methodology is targeted, convenient, and efficient, enabling the identification of new and more reliable RGs. The analytical method we developed can be applied to other plant genera and will greatly help researchers compare gene expression patterns in different species within the same genus. Furthermore, the identified RGs for stem development in Populus will serve as valuable tools for studying gene expression dynamics during wood formation in plants.

    • The authors confirm contribution to the paper as follows: study conception and design: Zheng B, Shi X; bioinformatics analysis: Xie Q; conducting the experiments: Ahmed U; participating in the experiments: Qi C, Du K, Luo J, Wang P; draft manuscript preparation: Xie Q, Ahmed U; manuscript revision: Shi X, Zheng B, Xie Q, Ahmed U. All authors reviewed the results and approved the final version of the manuscript.

    • All data generated or analyzed during this study are included in this published article and its supplementary information files.

      • This study was supported by the National Natural Science Foundation of China (32171821, 32271834 and 32371908), Hubei Provincial Natural Science Foundation of China (2022CFB234), and the Fundamental Research Funds for the Central Universities (2662022YLYJ009). Thanks to Huazhong Agricultural University for providing PhD scholarships. Thanks to the Chinese Postdoctoral Science Foundation for providing the postdoctoral research funding (320554).

      • The authors declare that they have no conflict of interest.

      • # Authors contributed equally: Qi Xie, Umair Ahmed

      • Copyright: © 2024 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
    Figure (7)  Table (1) References (64)
  • About this article
    Cite this article
    Xie Q, Ahmed U, Qi C, Du K, Luo J, et al. 2024. A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression. Forestry Research 4: e021 doi: 10.48130/forres-0024-0017
    Xie Q, Ahmed U, Qi C, Du K, Luo J, et al. 2024. A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression. Forestry Research 4: e021 doi: 10.48130/forres-0024-0017

Catalog

  • About this article

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return