ARTICLE   Open Access    

A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression

  • # Authors contributed equally: Qi Xie, Umair Ahmed

More Information
  • Received: 23 October 2023
    Revised: 07 April 2024
    Accepted: 07 May 2024
    Published online: 01 June 2024
    Forestry Research  4 Article number: e021 (2024)  |  Cite this article
  • Real-time quantitative reverse transcription polymerase chain reaction (RT-qPCR) plays a crucial role in relative gene expression analysis, and accurate normalization relies on suitable reference genes (RGs). In this study, a pipeline for identifying candidate RGs from publicly available stem-related RNA-Seq data of different Populus species under various developmental and abiotic stress conditions is presented. DESeq2's median of ratios yielded the smallest coefficient of variance (CV) values in a total of 292 RNA-Seq samples and was therefore chosen as the method for sample normalization. A total of 541 stably expressed genes were retrieved based on the CV values with a cutoff of 0.3. Universal gene-specific primer pairs were designed based on the consensus sequences of the orthologous genes of each Populus RG candidate. The expression levels of 12 candidate RGs and six reported RGs in stems under different abiotic stress conditions or in different Populus species were assessed by RT-qPCR. The expression stability of selected genes was further evaluated using ΔCt, geNorm, NormFinder, and BestKeeper. All candidate RGs were stably expressed in different experiments and conditions in Populus. A test dataset containing 117 RNA-Seq samples was then used to confirm the expression stability, six candidate RGs and three reported RGs met the requirement of CV ≤ 0.3. In summary, this study was to propose a systematic and optimized protocol for the identification of constitutively and stably expressed genes based on RNA-Seq data, and Potri.001G349400 (CNOT2) was identified as the best candidate RG suitable for gene expression studies in poplar stems.
  • Rice (Oryza sativa L.) is a world staple crop that feeds over half of the world's population[1,2]. In recent years, high-temperature events are becoming more frequent and intensive as a result of global warming, which can severely affect rice grain yield and quality[3,4]. During flowering and grain filling stages, high-temperature stress can result in a significant reduction in seed setting rate and influence amylose content, starch fine structure, functional properties and chalkiness degree of rice[57]. Transcriptome and proteome analysis in rice endosperm have also been used to demonstrate the differences in high-temperature environments at gene and protein expression levels[810]. In addition, as an important post-translational modification, protein phosphorylation has proven to be involved in the regulation of starch metabolism in response to high-temperature stress[11]. However, little is known about whether protein ubiquitination regulates seed development under high-temperature stress.

    Ubiquitination is another form of post-translational modification that plays key roles in diverse cellular processes[12]. Several reports have described the functions of ubiquitination in rice defense responses based on ubiquitome analysis. Liu et al.[13] investigated relationships between ubiquitination and salt-stress responses in rice seedlings using a gel-based shotgun proteomic analysis and revealed the potential important role of protein ubiquitination in salt tolerance in rice. Xie et al.[14] identified 861 peptides with ubiquitinated lysines in 464 proteins in rice leaf cells by combining highly sensitive immune affinity purification and high resolution liquid chromatography-tandem mass spectrometry (LC-MS/MS). These ubiquitinated proteins regulated a wide range of processes, including response to stress stimuli. A later study revealed the relationships between ubiquitination status and activation of rice defense responses, and generated an in-depth quantitative proteomic catalog of ubiquitination in rice seedlings after chitin and flg22 treatments, providing useful information for understanding how the ubiquitination system regulates the defense responses upon pathogen attack[15]. Although many studies have shown that ubiquitination plays improtant roles in the heat response of plant[16,17], there has been little systematic discussion on the ubiquitome of rice endosperm in the context of global climate change.

    In this study, we examine the high-temperature induced ubiquitination change in two rice varieties with different starch qualities, through a label-free quantitative ubiquitome analysis. This study provides a comprehensive view of the function of ubiquitination in high-temperature response of rice developing seed, which will shed new light on the improvement of rice grain quality under heat stress.

    Two indica rice varieties with different starch quality, 9311 and Guangluai4 (GLA4), were used as materials. 9311 is a heat-sensitive variety, which displays low amylose content with good starch quality; while GLA4 is known to be the parental variety of HT54, an indica breeding line with heat tolerance, and thus GLA4 is possibly heat tolerant, which shows high amylose content with poor starch quality[18,19]. Rice growth conditions, sample treatment and collection were conducted as previously described[11].

    Husk, pericarp and embryo were detached from immature rice grains on ice[20]. Rice endosperm was then ground with liquid nitrogen, and the cell powder was sonicated 10 rounds of 10 s sonication and 15 s off-sonication on ice in lysis buffer (6 M Guanidine hydrochloride, pH 7.8−8.0, 0.1% Protease Inhibitor Cocktail) using a high intensity ultrasonic processor. Following lysis, the suspension was centrifuged at 14,000 g for 40 min at 4 °C to remove the debris. The supernatant was collected, and the protein concentration was estimated using BCA assay (Pierce BCA Protein assay kit, Thermo Fisher Scientific, Waltham, MA, USA) before further analysis.

    The protein mixture was reduced by DTT with the final concentration of 10 mM at 37 °C for 1 h, alkylated by iodoacetamide with a final concentration of 50 mM at room temperature in the dark for 0.5 h, and digested by trypsin (1:50) at 37 °C for 16 h. Then the sample was diluted by adding trifluoroacetic acid (TFA) to the final concentration of 0.1%. The enzymatic peptides were desalted on a Sep-Pak C18 cartridge (Waters, Milford, MA, USA), concentrated by lyophilization and reconstituted in precooled IAP buffer (50 mM MOPS-NaOH PH 7.2, 10 mM Na2HPO4, 50 mM NaCl) for further analysis.

    The peptides solution was incubated with prewashed K-ε-GG antibody beads (PTMScan Ubiquitin Remnant Motif (K-ε-GG) Kit), and gently shaken at 4 °C for 1.5 h. The suspension was centrifuged at 2,000 g for 30 s, and the supernatant was removed. The Anti-K-ε-GG antibody beads were washed with IAP Buffer three times and with ddH2O three times. The peptides were eluted from the beads with 0.15% trifluoroacetic acid (TFA). Finally, the eluted fractions were combined and desalted with C18 Stage Tips.

    LC-MS/MS analysis were performed using the methods of Pang et al.[11]. Raw mass spectrometric data were analyzed with MaxQuant software (version 1.3.0.5) and were compared with the indica rice protein sequence database (Oryza sativa subsp. indica-ASM465v1). Parameters were set according to Pang et al.[11]. All measurements were obtained from three separate biological replicates.

    Quantification of the modified peptides was performed using the label-free quantification (LFQ) algorithm[11]. Differentially ubiquitinated sites (proteins) in response to high-temperature were identified by Student's t-test (p < 0.05, log2(fold-change) > 1) with at least two valid values in any condition or the ubiquitination sites that exhibited valid values in one condition (at least two of three replicates) and none in the other.

    Subcellular localization was performed using CELLO database (http://cello.life.nctu.edu.tw). Gene Ontology (GO) annotation proteome was derived from the AgriGO (http://bioinfo.cau.edu.cn/agriGO/). The differential metabolic profiles were visualized with MapMan software (version 3.6.0RC1). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation was performed by using KEGG Automatic Annotation Server (KAAS) software. A p-value of < 0.05 was used as the threshold of significant enrichment. SWISS-MODEL was used to generate the tertiary structure of GBSSI (SWISS-MODEL, http://swissmodel.expasy.org/). The figures were annotated with Adobe Illustrator (Adobe Systems, San Jose, CA, USA).

    To elucidate how high-temperature stress influences rice developing endosperm at the ubiquitination level, a label-free analysis was performed to quantify ubiquitome from two indica rice varieties under normal (9311-C and GLA4-C) and high-temperature conditions (9311-H and GLA4-H). The distribution of mass error of all the identified peptides was near zero and most of them (71.5%) were between -1 and 1 ppm, suggesting that the mass accuracy of the MS data fits the requirement (Fig. 1a). Meanwhile, the length of most peptides distributed between 8 and 42, ensuring that sample preparation reached standard conditions (Fig. 1b).

    Figure 1.  Characteristics of the ubiquitinated proteome of rice endosperm and QC validation of MS data. (a) Mass error distribution of all identified ubiquitinated peptides. (b) Peptide length distribution. (c) Frequency distribution of ubiquitinated proteins according to the number of ubiquitination sites identified.

    In all endosperm samples, a total of 437 ubiquitinated peptides were identified from 246 ubiquitinated proteins, covering 488 quantifiable ubiquitinated sites (Supplemental Table S1). Among the ubiquitinated proteins, 60.6% had only one ubiquitinated lysine site, and 18.7%, 8.1%, 5.3%, or 7.3% had two, three, four, or five and more ubiquitinated sites, respectively. In addition, four proteins (1.6%, BGIOSGA004052, BGIOSGA006533, BGIOSGA006780, BGIOSGA022241) were ubiquitinated at 10 or more lysine sites (Fig. 1c, Supplemental Table S1). The proteins BGIOSGA008317 had the most ubiquitination sites with the number of 16. It was noted that besides ubiquitin, two related ubiquitin-like proteins NEDD8 and ISG15 also contain C-terminal di-Gly motifs generated by trypsin cleavage, and the modifications of these three proteins cannot be distinguished by MS[21]. Here, the di-Gly-modified proteome therefore represents a composite of proteins modified by these three proteins. However, the sites from NEDD8 or ISG15 modifications were limited because they mediate only a few reactions in cells[21].

    To better understand the lysine ubiquitome changes in rice endosperm induced by high-temperature, we performed a Gene Ontology (GO) functional annotation analysis on all identified ubiquitinated proteins (Fig. 2a). In the biological process GO category, 'metabolic process' and 'cellular process' were mainly enriched, accounting for 75.1% and 74.1% of ubiquitinated proteins, respectively. In addition, 34.6% proteins were associated with 'response to stimulu', emphasizing the regulatory role of ubiquitination modification in response to high-temperature stress. From the cellular component perspective, ubiquitinated proteins were mainly associated with 'cellular anatomical entity' (99.4%), 'intracellular' (84.4%) and 'protein-containing complex' (29.9%). The molecular function category result suggested that these proteins were largely involved in 'binding' (62.7%), 'catalytic activity' (43.4%) and 'structural molecule activity' (16.5%). Furthermore, subcellular location annotation information indicated that 34.7%−39.4% proteins were located in the cytoplasm, and other were mostly located in the nucleus (23.5%−27.7%), plasma membrane (9.4%−11.4%), and chloroplast (9.6%−12.8%) (Fig. 2b). It is noteworthy that the ubiquitinated proteins located in the cytoplasm were decreased in high-temperature environments in both varieties.

    Figure 2.  Analysis of ubiquitinated proteins and motifs. (a) Gene ontology (GO) functional characterization of ubiquitinated proteins. (b) Subcellular localization of ubiquitinated proteins. From the inside out, the ring represents 9311-C, 9311-H, GLA4-C and GLA4-H, respectively. (c) Motif enrichment analysis of ubiquitinated proteins.

    The following two significantly enriched motifs from all of the identified ubiquitinated sites were identified using MoDL analysis: [A/S]xKub and Kubxx[E/Q/R/V]x[E/G/L/P/Q/R/Y], which covered 84 and 100 sequences, respectively (Fig. 2c). Further analysis showed that the conserved alanine (A) and glutamic acid (E) were included in upstream and downstream of the ubiquitinated lysine sites in rice endosperm. A similar phenomenon also occurred in rice leaf[14], wheat leaf[22], and petunia[23], indicating that alanine (A) and glutamic acid (E) were likely to be the specific amino acids in conserved ubiquitination motifs in plants. Additionally, serine (S) was enriched at the position -2 (upstream) of the ubiquitinated lysine, while various amino acids such as arginine (R), glutamic acid (E), glutamine (Q), valine (V) were found at positions +3 and +5 (downstream).

    To detect possible changes in rice endosperm ubiquitome attributable to high-temperature stress, we then performed LFQ analysis on all quantifiable ubiquitination sites within our dataset. As shown in Fig. 3a, more ubiquitinated proteins, peptides and sites were detected in the treatment groups (9311-H and GLA4-H), suggesting that exposure to high-temperature stress may increase the ubiquitination events in rice endosperm. Only 282 common ubiquitinated sites in 158 proteins were quantifiable for all sample groups due to reversible ubiquitination induced by high-temperature (Fig. 3b). Principal component analysis (PCA) showed that three repeats of each sample clustered together, and four groups were clearly separated (Fig. 3c). Furthermore, the differentially expression profiles of ubiquitination sites (proteins) in 9311 and GLA4 under high-temperature stress were depicted to further understand the possible changes (Fig. 3d). Where LFQ values were missing, the data were filtered to identify those ubiquitination sites with a consistent presence/absence expression pattern. These analyses yielded 89 ubiquitination sites that were only present in 9311-H and six that were only present in 9311-C (Fig. 3d, Supplemental Table S2). Similarly, 51 differentially expressed ubiquitination sites were present in GLA4-H and 13 ubiquitination sites only occurred in GLA4-C (Fig. 3d & Supplemental Table S3). Beyond that, a total of 113 and 50 significantly changed ubiquitination sites (p < 0.05, log2(fold-change) >1) were screened out in 9311 and GLA4, respectively (Fig. 3d, Supplemental Tables S4 & S5). For subsequent comparative analysis, the ubiquitination expression profiles with consistent presence/absence and ubiquitination sites with significant differences in statistical testing were combined and named as 9311-Up, 9311-Down, GLA4-Up, and GLA4-Down, respectively (Fig. 3d). The number of significantly up-regulated ubiquitination sites was far greater than down-regulated ubiquitination sites in both 9311 and GLA4 varieties. These findings indicated that high temperature not only induced the occurrence of ubiquitination sites, but also significantly upregulated the intensity of ubiquitination. Beyond that, the magnitude of the up-regulation in 9311 was higher than that in GLA4 (Fig. 3b & d), indicating that the ubiquitination modification of heat-sensitive varieties was more active than heat-resistant varieties in response to high-temperature stress.

    Figure 3.  A temperature regulated rice endosperm ubiquitome. (a) The number of ubiquitinated proteins, peptides and sites detected in four group samples. (b) Venn diagram of ubiquitination sites (proteins) detected in four group samples. (c) PCA based on ubiquitination intensity across all four sample groups with three biological repetitions. (d) Differentially expression profiles of ubiquitination sites (proteins) in 9311 and GLA4 under high-temperature stress. The expression profiles of selected ubiquitination sites (p < 0.05, log2(fold-change) >1) were normalized using the Z-score and presented in a heatmap.

    To further investigate the ubiquitination regulatory pattern under high temperature stress in two varieties, four groups with significantly regulated sites were analyzed. There were 37 ubiquitination sites showed the same regulatory trend in 9311 and GLA4, accounting for 17.8% and 32.5% of the total differentially expressed sites in 9311 and GLA4, respectively. Among them, 36 ubiquitination sites were upregulated and one site was downregulated (Fig. 4a). In addition, 159 upregulated ubiquitination sites and three downregulated sites were only present in 9311, while 53 upregulated sites and 15 downregulated sites were only present in GLA4. Moreover, nine ubiquitination sites showed opposite regulatory trends in 9311 and GLA4. A similar regulatory trend of ubiquitination proteins is shown in Fig. 4b. It is noted that some proteins had both upregulated and downregulated ubiquitination sites (Supplemental Tables S6 & S7), indicating that significant differences in ubiquitination were, to some extent, independent of protein abundance.

    Figure 4.  Comparison of differentially ubiquitinated sites and proteins in 9311 and GLA4 under high-temperature stress.

    To understand the function of ubiquitination in response to the high-temperature stress of rice endosperm, we conducted GO enrichment-based clustering analysis of the differentially ubiquitinated proteins in 9311 and GLA4 at high temperature, respectively (Fig. 5). In the biological process category of 9311, proteins were relatively enriched in the carbohydrate metabolic process, polysaccharide metabolic process, starch biosynthetic process, cellular macromolecule localization, protein localization, intracellular transport, and phosphorylation (Fig. 5). For the molecular function analysis, we found that the proteins related to kinase activity, nucleotidyltransferase activity, phosphotransferase activity, and nutrient reservoir activity were enriched (Fig. 5). The two principal cellular components were intrinsic component of membrane and integral component of membrane (Fig. 5). There was no significantly enriched GO term in the GLA4 group due to the dataset containing relatively few proteins, and thus, further enrichment analysis was conducted on the proteins that were common to both varieties. The results showed that proteins were over-represented in carbon metabolism, including starch biosynthesis and metabolism, glucan biosynthesis and metabolism, and polysaccharide biosynthesis and metabolism (Fig. 5), indicating the importance of carbohydrate synthesis and metabolism in the ubiquitination regulatory network.

    Figure 5.  Enrichment analysis of differentially expressed ubiquitinated proteins based on Gene Ontology (GO) terms.

    To identify pathways which were differentially ubiquitinated under high-temperature stress, the KEGG pathway-based clustering analysis was conducted. The results showed that the differentially ubiquitinated proteins in both 9311 and GLA4 were mostly abundant in the pathways of carbohydrate metabolism, starch and sucrose metabolism, folding, sorting and degradation, translation, ribosome, and protein processing in endoplasmic reticulum (Fig. 6a). In the 9311 group, the pathways of carbohydrate metabolism, starch and sucrose metabolism, glycosyltransferases, glycolysis, and energy metabolism were enriched in the differentially ubiquitinated proteins (Fig. 6b); while there was no significantly enriched KEGG pathway in the GLA4 group. We further found the proteins that were common to both varieties were only significantly enriched in the starch and sucrose metabolism pathways (p = 0.04). The ubiquitination proteins involved in the starch and sucrose metabolism mainly include: sucrose hydrolysis (SUS, FK, UGPase), and starch synthesis (AGPase, GBSSI, BEI, BEIIb, PUL, PHO1), which are discussed below.

    Figure 6.  KEGG classification and enrichment analysis of differentially ubiquitinated proteins. (a) Number of differentially ubiquitinated proteins based on KEGG classification in 9311 and GLA4. (b) KEGG enrichment analysis of differentially ubiquitinated proteins in 9311.

    Although many reports have described specific examples of ubiquitination in rice defense responses[13,15,16], our knowledge on global changes in the developing endosperm ubiquitome under high-temperature stress is still lacking. In this study, a label-free quantitative proteomic analysis of ubiquitination was applied to examine the high-temperature induced ubiquitination change of two indica rice varieties (9311 and GLA4) with distinct starch quality. We identified many new lysine modification sites on proteins involved in various pathways, highlighting the complexity of the ubiquitination-mediated regulatory system in high-temperature stress responses in rice.

    Heat shock proteins accumulate under various stresses and play important roles in plant defenses against abiotic stresses[24,25]. Research has shown that a number of heat shock proteins were prominent in the rice ubiquitome network, of which OsHSP71.1, and OsHSP82A showed increased ubiquitination levels under chitin and flg22 treatment[15]. Here, seven lysine residues on five heat shock proteins possessed ubiquitination modification in rice endosperm. Three sites (BGIOSGA011420-K78, BGIOSGA026764-K99, BGIOSGA029594-K106) showed significant up-regulation in 9311 under high-temperature stress, while in GLA4, the ubiquitination level of BGIOSGA011420-K78 was down-regulated. This differential ubiquitination of heat-tolerant and heat-sensitive varieties provided a basis for studying the regulation of post-translational modification of heat shock proteins under high-temperature stress, despite the regulatory role of those heat shock proteins being still unclear.

    Transcription factors (TFs) play an essential role in the regulation of gene expression. A total of three transcription factors were identified in the ubiquitination dataset, of which two were NAC family members. As one of the largest plant-specific TF families, NAC is involved in the responses to abiotic and biotic stresses[26]. The ubiquitination modification of K173 in BGIOSGA018048 was specifically expressed in the 9311-H group, which may affect the stress resistance level in high-temperature environments. In addition, two sites K148 and K149 of ERF TF family member BGIOSGA024035 was downregulated in GLA4 under high-temperature stress. This differential ubiquitination was likely to affect the expression of related genes regulated by this transcription factor.

    The results of GO and KEGG enrichment analysis of differential ubiquitination proteins indicated that the sucrose and starch metabolic pathway was largely affected by ubiquitination regulation under high-temperature stress (Figs 5 & 6). The ubiquitination sites involved in sucrose and starch metabolism are listed in Table 1. To assess how high-temperature stress affects the crucial pathway, the significantly differential ubiquitination sites in 9311 and GLA4 were displayed in the heatmap of specific proteins (Fig. 7).

    Table 1.  Ubiquitination sites related to sucrose and starch metabolism in rice endosperm.
    Gene nameAnnotationProtein entryModification site(s)
    SUS1Sucrose synthase 1BGIOSGA010570K172, K177
    SUS2Sucrose synthase 2BGIOSGA021739K160, K165, K176, K804
    SUS3Sucrose synthase 3BGIOSGA026140K172, K177, K541, K544, K588
    FKFructokinaseBGIOSGA027875K143
    UGPaseUDP-glucose pyrophosphorylaseBGIOSGA031231K27, K150, K303, K306
    AGPS1ADP-glucose pyrophosphorylase small subunit 1BGIOSGA030039K94, K464, K484
    AGPS2ADP-glucose pyrophosphorylase small subunit 2BGIOSGA027135K106, K132, K385, K403, K406, K476, K496
    AGPL2ADP-glucose pyrophosphorylase large subunit 2BGIOSGA004052K41, K78, K134, K191, K227, K254, K316, K338, K394, K396, K463, K508, K513
    AGPL3ADP-glucose pyrophosphorylase large subunit 3BGIOSGA017490K509
    GBSSIGranule bound starch synthase IBGIOSGA022241K130, K173, K177, K181, K192, K258, K371, K381, K385, K399, K462, K517, K530, K549, K571, K575
    BEIStarch branching enzyme IBGIOSGA020506K103, K108, K122
    BEIIbStarch branching enzyme IIbBGIOSGA006344K134
    PULStarch debranching enzyme:PullulanaseBGIOSGA015875K230, K330, K431, K736, K884
    PHO1Plastidial phosphorylaseBGIOSGA009780K277, K445, K941
     | Show Table
    DownLoad: CSV
    Figure 7.  Sucrose and starch pathway at the ubiquitination levels in rice endosperm under high-temperature stress.

    In cereal endosperm, sucrose is the substrate for the biosynthesis of starch. The formation of glucose 1-phosphate (G1P, used in starch synthesis, see below) from sucrose requires a series of enzymes[27]. Here, we found that sucrose synthase 1 (SUS1), SUS2, SUS3, fructokinase (FK), and UDP-glucose pyrophosphorylase (UGPase) were ubiquitinated among all sample groups (Table 1, Fig. 7). In the ubiquitome of seedling and leaf in japonica rice, ubiquitination sites have been found in SUS1, SUS2, UGPase, and FK, which were related to sucrose hydrolysis[14,15]. SUS catalyzed the process of cleaving sucrose into UDP-glucose (UDPG) and fructose. Two ubiquitination sites, K172 and K177, were identified in SUS1 in rice endosperm, which were also found in rice leaves[14]. A total of four ubiquitination sites were identified in SUS2, two of which were also reported in rice seedling and leaf, indicating the conservation of the lysine residues in different rice tissues. It was noted that all four ubiquitination sites in SUS2 were upregulated in high-temperature environments, although the regulated sites of 9311 and GLA4 were different. In 9311, the ubiquitination levels of K160, K174, and K804 were increased, while GLA4 was only upregulated in K176. The ubiquitination sites K541, K544, and K588 in SUS3 were screened from developing rice seeds for the first time. In addition, SUS3 had two completely overlapping sites K172 and K177 with SUS1, and it was difficult to determine which enzymes the two sites belonged to. The ubiquitination levels of SUS3-K541 and SUS3-K544 in 9311 significantly increased in high-temperature environments, while there was no significant difference in the ubiquitination level of SUS3 in GLA4. Overall, the ubiquitination sites of SUS in rice endosperm were located in the functional domain except for SUS2-K804, reflecting the importance of ubiquitination regulation in SUS.

    UGPase catalyses the conversion of glucose 1-phosphate and UTP into UDPG[28]. Research has shown that the mutation of UGPase gene lead to chalky endosperm[29]. As shown in Table 1, four ubiquitination sites K27, K150, K303, and K306 were identified in rice endosperm, which were completely inconsistent with the seven ubiquitination sites in rice seedlings and two in leaves[14,15], reflecting the tissue specificity. We speculated that the UGPase with different modification sites may play different regulatory roles in metabolic pathways in different tissues. Under high-temperature stress, the ubiquitination level of UGPase-K27 was 8.1-fold up-regulated. Liao et al.[10] demonstrated that the expression of UDPase was down-regulated in both heat-tolerant and heat-sensitive rice lines under high temperature conditions, which could reasonably explain the significant up-regulation of UGPase-K27 ubiquitination level. The ubiquitination site K143 of FK was also reported in seedling tissues[15].

    The AGPase reaction represents the first committed step of starch biosynthesis[27]. A total of 22 lysine ubiquitination sites were identified in four AGPase subunits (AGPL2, AGPL3, AGPS1, AGPS2). AGPL2 had 13 ubiquitination sites, of which six were located in NTP_transferase domain, including K254, K338, K191, K134, K227, and K316. High-temperature stress resulted in an increase in the ubiquitination level of K254 in both 9311 and GLA4, and significant upregulation of K508 and K513 in 9311, as well as K191, K227, and K316 in GLA4. In contrast, AGPL2-K394 were significantly downregulated in GLA4. AGPL3 contained one ubiquitination site K509, and the modification level of AGPL3-K509 was up-regulated in high-temperature environments in 9311. AGPS1 had one specific ubiquitination site K464 and another two sites K94 and K484 that completely overlapped with AGPS2-K106 and AGPS2-K496, respectively. The modification levels of K464 and K484 significantly increased in high-temperature environments in 9311, and K94 was significantly up-regulated in both varieties. There were seven ubiquitination sites in AGPS2 in rice endosperm, which were different with the sites found in rice leaves[14]. In addition to the two sites that overlapped with AGPS1, AGPS2 had another two ubiquitination sites (K406 and K132) that upregulated in high-temperature environments.

    Amylose content is one of the key determinants that strongly influence rice grain quality[30]. The biosynthesis of amylose requires the catalytic effect of granule-bound starch synthase I (GBSSI)[30,31]. Here, a total of 16 ubiquitination sites were identified in GBSSI (Table 1, Fig. 8a). Among these ubiquitination sites, six lysine residues (K130, K173, K177, K181, K192, K258) were located in glycosyltransferase 5 (GT5) domain, and three sites (K399, K462, K517) were located in GT1 domain (Fig. 6), indicating the important role of ubiquitination regulation of GBSSI. Under high-temperature stress, the ubiquitination levels of six sites (K130, K177, K399, K381, K385, K549) increased in two indica rice varieties, while one sites (K258) showed downregulation in 9311 (Fig. 8a). Numerous studies had described that the amylose content was reduced under high-temperature stress in rice[5,7], which might be due to the degradation of GBSSI proteins caused by the increased significantly up-regulated ubiquitination sites. These ubiquitination sites identified in rice GBSSI with significant differences under high-temperature stress were expected to become a new breakthrough point for the improvement of starch quality.

    Figure 8.  Structure of GBSSI. (a) Domain structure of GBSSI and ubiquitination sites with significant differences in response to high-temperature stress. (b) 3D model of GBSSI and the relationship between ubiquitination sites K462 and ADP, SO4 (salt bridge or hydrogen bond).

    To further determine the regulatory role of the ubiquitination sites in GBSSI, SWISS-MODEL was used to predict 3D structural model. As shown in Fig. 8b, GBSSI had three SO4 (sulfate ions) and one ADP ligand. These ligands interact with GBSSI through hydrogen bonds and salt bridges. Three sites, K447, R458, and K462, were associated with SO4 through salt bridges, while G100, N265, Q412, K462, and Q493 interact with the hydrogen bonds of ADP in GBSSI[32,33]. Based on this finding, it can be reasonably inferred that the K462 site with ubiquitination modification located in the GT1 domain played an important role in the interaction between GBSSI, SO4, and ADP. An in-depth investigation was necessary to gain a more comprehensive understanding of the regulatory function of ubiquitination modification at GBSSI-K462, although there was no significant difference in the ubiquitination level under high-temperature stress.

    Amylopectin, the major component of starch, is synthesized by the coordinated action of multiple enzymes including soluble starch synthase (SSs), starch branching enzyme (BEs), starch debranching enzyme (DBEs), and phosphorylases (PHOs or Phos) with ADPG as a substrate. In this study, ubiquitination sites were detected in BEs, DBEs, and Phos.

    BEs, covering two isoforms, BEI and BEII, are responsible for catalyzing the formation of α-1,6-glucosidic linkages of amylopectin[34]. There were three ubiquitination sites (K103, K108, and K122) identified in BEI (Fig. 9a). K122 was the first amino acid in the carbohydrate-binding module 48 (CBM48) domain. Sequence alignment analyses of BEs from eight plants revealed K122 was conserved among all plants' BEI (Fig. 9a), suggesting a high probability of the functional effects of ubiquitination modification at this site. In high-temperature environments, ubiquitination levels of K108 and K122 were significantly up-regulated in 9311, while no significantly regulated ubiquitination sites of BEI were observed in GLA4. Only one ubiquitination site, K134, was found in BEIIb (Fig. 9a). The ubiquitination levels showed a slightly upward trend with no significant differences in high-temperature environments in both varieties. These changes could be one of the reasons for increased gelatinization temperature and relative crystallinity of rice starch in response to high-temperature[5].

    Figure 9.  Domain structure of (a) BEs, (b) PUL and (c) Pho1 as well as their ubiquitination sites with significant differences in response to heat stress. Residues in red indicate the ubiquitination site. Non-ubiquitinated residues are shown in dark grey.

    DBEs consists of isoamylase (ISA) and pullulanase (PUL) with catalytic function for hydrolyzing α-1,6-glucosic linkages[35]. In the present study, we found that only PUL was ubiquitinated in rice endosperm (Fig. 9b). Among five ubiquitination sites (K230, K330, K432, K736, and K884) identified in PUL, K230 was located in the PULN2 domain, while K330 was in the CBM48 domain. Under high-temperature stress, K330 showed completely opposite regulatory trends in two cultivars. In addition, the ubiquitination level of K884, located in the DUF3372 domain, was significantly up-regulated in 9311. Previous study has reported that the expression of PUL was significantly up-regulated in 9311 under high-temperature stress, while GLA4 showed down-regulation in PUL abundance[11]. Consequently, there might be two possible functions of these ubiquitination sites. One possibility is that ubiquitination sites were unrelated to protein degradation; instead, they regulated the biosynthesis of amylopectin by affecting other functions of the protein. Secondly, ubiquitination sites were associated with protein degradation, and the levels of ubiquitination modification were based on protein abundance, resulting in a completely consistent regulation of ubiquitination modification and protein abundance under high-temperature stress.

    PHOs, including two types, Pho1/PHO1 and Pho2/PHO2, are responsible for the transfer of glucosyl units from Glc-1-P to the non-reducing end of a-1,4-linked glucan chains[36]. Pho1 is a temperature-dependent enzyme and considered crucial not only during the maturation of amylopectin but also in the initiation process of starch synthesis[37,38]. The three ubiquitination sites (K277, K445, K941) identified in Pho1 were located in two phosphorylase domains. We found that two sites, Pho1-K277 and Pho1-K445, were only ubiquitinated in high-temperature environments in 9311 and GLA4, respectively. Pang et al.[11] has demonstrated that the protein abundance of Pho1 decreased under high-temperature stress, especially in GLA4. Satoh et al.[38] reported that the functional activity of Pho1 was weakened under conditions of high temperature and its function might be complement by one or more other factors. Hence, these ubiquitination modifications that specifically occurred in high-temperature environments might be related to the degradation of Pho1 proteins.

    As a factory for protein synthesis in cells, the ribosome is an extremely crucial structure in the cell[39]. It has been proven that multiple ribosomal subunits were abundantly ubiquitinated in Arabidopsis and wheat[22]. In the present study, 57 ubiquitination sites involving 33 ribosome subunits were identified in 40S and 60S ribosome complexes in rice. Under high-temperature stress, the ubiquitination levels of some sites were significantly upregulated or downregulated, implying that ubiquitination of ribosomal proteins is likely to be an important regulatory mechanism in high-temperature response in rice endosperm. The results of GO and KEGG enrichment analysis indicated that the ribosome system was one of the most active systems for ubiquitination regulation under high-temperature stress. We speculated that the ubiquitin-proteasome system might be involved in the removal of subunits or entire ribosomes that were improperly folded in high-temperature environments. As shown in Fig. 10, the S10e, L18Ae, S27Ae, L9e, S3e, S28e, S20e, and S2e subunits were significantly up-regulated in 9311, while L13e subunits showed a completely opposite regulatory trend at the ubiquitination sites K81 and K88. In GLA4, the ubiquitination levels of S10e, S27Ae, L10Ae, L9e, S3e, S2e, and L4e showed a significant increase, while the ubiquitination level of L17e was significantly down-regulated under high-temperature stress. A total of seven ubiquitination sites involving S10e, S27Ae, L9e, S3e, and S2e subunits were jointly up-regulated in both two varieties. These sites might be related to the degradation of improperly folded ribosome subunits under high-temperature stress, while other ubiquitination sites with variety specificity might be associated with ribosomal function.

    Figure 10.  Ribosome system at the ubiquitination levels in rice endosperm under high-temperature stress. Grey shadings represent ubiquitinated proteins with no significant differences under heat stress. Red and blue shadings indicate up-regulated and down-regulated ubiquitinated proteins, respectively. Orange shading displays a combination of up- and down-regulated ubiquitinated sites in the same ubiquitinated protein.

    In conclusion, this study provides the first comprehensive view of the ubiquitome in rice developing endosperm, and demonstrated that ubiquitination has diverse functions in the high-temperature response of rice endosperm by modulating various cellular processes, especially the sucrose and starch metabolism. Comparative analysis of the temperature-induced ubiquitination status revealed some similarities and more interesting differences between 9311 and GLA4. These differences might be the reason for the different qualities formation of the two indica rice varieties, which could provide potential genetic resources for the improvement of the heat resistance in rice. Considering the diversity of ubiquitination modification, it is worthwhile to further validate and explore the function and regulatory mechanism of the key targets and key pathways. The findings provide valuable insights into the role of ubiquitination in response to high-temperature stress and lay a foundation for further functional analysis of lysine ubiquitination in rice.

    The authors confirm contribution to the paper as follows: study conception and design: Bao J, Pang Y; data collection: Pang Y; analysis and interpretation of results: Pang Y; draft manuscript preparation: Ying Y; Revised manuscript preparation: Ying Y, Pang Y, Bao J. All authors reviewed the results and approved the final version of the manuscript.

    The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

    This work was financially supported by the AgroST Project (NK2022050102) and Zhejiang Provincial Natural Science Foundation (LZ21C130003).

  • The authors declare that they have no conflict of interest.

  • Supplemental Table S1 Source and basic information of RNA-Seq training dataset.
    Supplemental Table S2 Information of reported reference genes (RGs) in Populus.
    Supplemental Table S3 List of stress treatments for poplar 717.
    Supplemental Table S4 Source and basic information of RNA-Seq test datasets.
    Supplemental Table S5 Universal RT-qPCR primers for 12 candidate reference genes (RGs) and 6 reported RGs in Populus.
    Supplemental Dataset S1 The list of RNA-Seq data of poplar stem samples for the training dataset.
    Supplemental Dataset S2 Mapping statistics of RNA-Seq data of 298 samples in the training dataset. The read count of each sample and the mapping rate for different mapping cases. Samples with an alignment rate exceeding 70% or exhibiting low read counts are highlighted in red, and these were subsequently excluded from further analysis.
    Supplemental Dataset S3 Gene expression levels and coefficient of variance (CV) values for stably expressed genes of the training dataset. The read counts were normalized using the DESeq2 method. The stably expressed genes were defined by CV ≤ 0.3.
    Supplemental Dataset S4 The results of GO enrichment analysis for reported reference genes (RGs).
    Supplemental Dataset S5 Expression stability ranking of reference genes in different Populus cultivars and stress treatments.
    Supplemental Dataset S6 Photosynthetic characteristics of poplar leaves shade stress.
    Supplemental Dataset S7 Mapping statistics of RNA-Seq data for the test dataset. The read count of each sample and the mapping rate for different mapping cases.
    Supplemental Dataset S8 Gene expression levels and CV values of 18 reference genes in the test dataset.
  • [1]

    Deshpande D, Chhugani K, Chang Y, Karlsberg A, Loeffler C, et al. 2023. RNA-seq data science: From raw data to effective interpretation. Frontiers in Genetics 14:997383

    doi: 10.3389/fgene.2023.997383

    CrossRef   Google Scholar

    [2]

    Wang Z, Gerstein M, Snyder M. 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10:57−63

    doi: 10.1038/nrg2484

    CrossRef   Google Scholar

    [3]

    Ozsolak F, Milos PM. 2011. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics 12:87−98

    doi: 10.1038/nrg2934

    CrossRef   Google Scholar

    [4]

    Geraci F, Saha I, Bianchini M. 2020. Editorial: RNA-Seq analysis: methods, applications and challenges. Frontiers in Genetics 11:220

    doi: 10.3389/fgene.2020.00220

    CrossRef   Google Scholar

    [5]

    Marguerat S, Bähler J. 2010. RNA-seq: from technology to biology. Cellular and Molecular Life Sciences 67:569−79

    doi: 10.1007/s00018-009-0180-6

    CrossRef   Google Scholar

    [6]

    Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5:621−28

    doi: 10.1038/nmeth.1226

    CrossRef   Google Scholar

    [7]

    Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, et al. 2013. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Briefings in Bioinformatics 14:671−83

    doi: 10.1093/bib/bbs046

    CrossRef   Google Scholar

    [8]

    Zhao Y, Li MC, Konaté MM, Chen L, Das B, et al. 2021. TPM, FPKM, or normalized counts? A comparative study of quantification measures for the analysis of RNA-seq data from the NCI patient-derived models repository Journal of Translational Medicine 19:269

    doi: 10.1186/s12967-021-02936-w

    CrossRef   Google Scholar

    [9]

    Wang L, Xie W, Chen Y, Tang W, Yang J, et al. 2010. A dynamic gene expression atlas covering the entire life cycle of rice. The Plant Journal 61:752−66

    doi: 10.1111/j.1365-313X.2009.04100.x

    CrossRef   Google Scholar

    [10]

    Li G, Sun X, Zhu X, Wu B, Hong H, et al. 2023. Selection and validation of reference genes in virus-infected sweet potato plants. Genes 14:1477

    doi: 10.3390/genes14071477

    CrossRef   Google Scholar

    [11]

    Wang Q, Guo C, Yang S, Zhong Q, Tian J. 2023. Screening and verification of reference genes for analysis of gene expression in garlic (Allium sativum L.) under cold and drought stress. Plants 12:763

    doi: 10.3390/plants12040763

    CrossRef   Google Scholar

    [12]

    Ahmed U, Xie Q, Shi X, Zheng B. 2022. Development of reference genes for horticultural plants. Critical Reviews in Plant Sciences 41:190−208

    doi: 10.1080/07352689.2022.2084227

    CrossRef   Google Scholar

    [13]

    Panina Y, Germond A, Masui S, Watanabe TM. 2018. Validation of common housekeeping genes as reference for qPCR gene expression analysis during iPS reprogramming process. Scientific Reports 8:8716

    doi: 10.1038/s41598-018-26707-8

    CrossRef   Google Scholar

    [14]

    Bustin SA, Beaulieu JF, Huggett J, Jaggi R, Kibenge FSB, et al. 2010. MIQE précis: Practical implementation of minimum standard guidelines for fluorescence-based quantitative real-time PCR experiments. BMC Molecular Biology 11:74

    doi: 10.1186/1471-2199-11-74

    CrossRef   Google Scholar

    [15]

    Pfaffl MW, Tichopad A, Prgomet C, Neuvians TP. 2004. Determination of stable housekeeping genes, differentially regulated target genes and sample integrity: BestKeeper–Excel-based tool using pair-wise correlations. Biotechnology Letters 26:509−15

    doi: 10.1023/B:BILE.0000019559.84305.47

    CrossRef   Google Scholar

    [16]

    Huis R, Hawkins S, Neutelings G. 2010. Selection of reference genes for quantitative gene expression normalization in flax (Linum usitatissimum L.). BMC Plant Biology 10:71

    doi: 10.1186/1471-2229-10-71

    CrossRef   Google Scholar

    [17]

    Gutierrez L, Mauriat M, Guénin S, Pelloux J, Lefebvre JF, et al. 2008. The lack of a systematic validation of reference genes: a serious pitfall undervalued in reverse transcription-polymerase chain reaction (RT-PCR) analysis in plants. Plant Biotechnology Journal 6:609−18

    doi: 10.1111/j.1467-7652.2008.00346.x

    CrossRef   Google Scholar

    [18]

    Guénin S, Mauriat M, Pelloux J, Van Wuytswinkel O, Bellini C, et al. 2009. Normalization of qRT-PCR data: the necessity of adopting a systematic, experimental conditions-specific, validation of references. Journal of Experimental Botany 60:487−93

    doi: 10.1093/jxb/ern305

    CrossRef   Google Scholar

    [19]

    Thellin O, Zorzi W, Lakaye B, De Borman B, Coumans B, et al. 1999. Housekeeping genes as internal standards: use and limits. Journal of Biotechnology 75:291−95

    doi: 10.1016/S0168-1656(99)00163-7

    CrossRef   Google Scholar

    [20]

    Borges AF, Fonseca C, Ferreira RB, Lourenço AM, Monteiro S. 2014. Reference gene validation for quantitative RT-PCR during biotic and abiotic stresses in Vitis vinifera. PLoS One 9:e111399

    doi: 10.1371/journal.pone.0111399

    CrossRef   Google Scholar

    [21]

    Sun H, Li F, Ruan Q, Zhong X. 2016. Identification and validation of reference genes for quantitative real-time PCR studies in Hedera helix L. Plant Physiology and Biochemistry 108:286−94

    doi: 10.1016/j.plaphy.2016.07.022

    CrossRef   Google Scholar

    [22]

    Imai T, Ubi BE, Saito T, Moriguchi T. 2014. Evaluation of reference genes for accurate normalization of gene expression for real time-quantitative PCR in Pyrus pyrifolia using different tissue samples and seasonal conditions. PLoS One 9:e86492

    doi: 10.1371/journal.pone.0086492

    CrossRef   Google Scholar

    [23]

    Chen F, Song Y, Li X, Chen J, Mo L, et al. 2019. Genome sequences of horticultural plants: past, present, and future. Horticulture Research 6:112

    doi: 10.1038/s41438-019-0195-6

    CrossRef   Google Scholar

    [24]

    Zhao J, Yang F, Feng J, Wang Y, Lachenbruch B, et al. 2017. Genome-wide constitutively expressed gene analysis and new reference gene selection based on transcriptome data: a case study from poplar/canker disease interaction. Frontiers in Plant Science 8:1876

    doi: 10.3389/fpls.2017.01876

    CrossRef   Google Scholar

    [25]

    Chen Y, Luo B, Liu C, Zhang Z, Zhou C, et al. 2021. Identification of reliable reference genes for quantitative real-time PCR analysis of the Rhus chinensis Mill. leaf response to temperature changes. FEBS Open Bio 11:2763−73

    doi: 10.1002/2211-5463.13275

    CrossRef   Google Scholar

    [26]

    Brunner AM, Busov VB, Strauss SH. 2004. Poplar genome sequence: functional genomics in an ecologically dominant plant species. Trends in Plant Science 9:49−56

    doi: 10.1016/j.tplants.2003.11.006

    CrossRef   Google Scholar

    [27]

    Chao Q, Gao Z, Zhang D, Zhao B, Dong F, et al. 2019. The developmental dynamics of the Populus stem transcriptome. Plant Biotechnology Journal 17:206−19

    doi: 10.1111/pbi.12958

    CrossRef   Google Scholar

    [28]

    Wang J, Tian Y, Li J, Yang K, Xing S, et al. 2019. Transcriptome sequencing of active buds from Populus deltoides CL. and Populus × zhaiguanheibaiyang reveals phytohormones involved in branching. Genomics 111:700−9

    doi: 10.1016/j.ygeno.2018.04.007

    CrossRef   Google Scholar

    [29]

    Han X, An Y, Zhou Y, Liu C, Yin W, et al. 2020. Comparative transcriptome analyses define genes and gene modules differing between two Populus genotypes with contrasting stem growth rates. Biotechnology for Biofuels 13:139

    doi: 10.1186/s13068-020-01758-0

    CrossRef   Google Scholar

    [30]

    Shi R, Wang JP, Lin YC, Li Q, Sun Y, et al. 2017. Tissue and cell-type co-expression networks of transcription factors and wood component genes in Populus trichocarpa. Planta 245:927−38

    doi: 10.1007/s00425-016-2640-1

    CrossRef   Google Scholar

    [31]

    Yu L, Ma J, Niu Z, Bai X, Lei W, et al. 2017. Tissue-specific transcriptome analysis reveals multiple responses to salt stress in Populus euphratica seedlings. Genes 8:372

    doi: 10.3390/genes8120372

    CrossRef   Google Scholar

    [32]

    Sundell D, Street NR, Kumar M, Mellerowicz EJ, Kucukoglu M, et al. 2017. AspWood: high-spatial-resolution transcriptome profiles reveal uncharacterized modularity of wood formation in Populus tremula. The Plant Cell 29:1585−604

    doi: 10.1105/tpc.17.00153

    CrossRef   Google Scholar

    [33]

    Filichkin SA, Hamilton M, Dharmawardhana PD, Singh SK, Sullivan C, et al. 2018. Abiotic stresses modulate landscape of poplar transcriptome via alternative splicing, differential intron retention, and isoform ratio switching. Frontiers in Plant Science 9:5

    doi: 10.3389/fpls.2018.00005

    CrossRef   Google Scholar

    [34]

    Zinkgraf M, Gerttula S, Zhao S, Filkov V, Groover A. 2018. Transcriptional and temporal response of Populus stems to gravi-stimulation. Journal of Integrative Plant Biology 60:578−90

    doi: 10.1111/jipb.12645

    CrossRef   Google Scholar

    [35]

    Rogier O, Chateigner A, Amanzougarene S, Lesage-Descauses MC, Balzergue S, et al. 2018. Accuracy of RNAseq based SNP discovery and genotyping in Populus nigra. BMC Genomics 19:909

    doi: 10.1186/s12864-018-5239-z

    CrossRef   Google Scholar

    [36]

    Liao W, Ji L, Wang J, Chen Z, Ye M, et al. 2014. Identification of glutathione S-transferase genes responding to pathogen infestation in Populus tomentosa. Functional & Integrative Genomics 14:517−29

    doi: 10.1007/s10142-014-0379-y

    CrossRef   Google Scholar

    [37]

    Lu S, Li Q, Wei H, Chang MJ, Tunlaya-Anukit S, et al. 2013. Ptr-miR397a is a negative regulator of laccase genes affecting lignin content in Populus trichocarpa. Proceedings of the National Academy of Sciences of the United States of America 110:10848−53

    doi: 10.1073/pnas.1308936110

    CrossRef   Google Scholar

    [38]

    Felten J, Vahala J, Love J, Gorzsás A, Rüggeberg M, et al. 2018. Ethylene signaling induces gelatinous layers with typical features of tension wood in hybrid aspen. New Phytologist 218:999−1014

    doi: 10.1111/nph.15078

    CrossRef   Google Scholar

    [39]

    Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114−20

    doi: 10.1093/bioinformatics/btu170

    CrossRef   Google Scholar

    [40]

    Delhomme N, Mähler N, Schiffthaler B, Sundell D, Mannapperuma C, et al. 2014. Guidelines for RNA-Seq data analysis. EpiGeneSys Protocol 67:1−24

    Google Scholar

    [41]

    Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, et al. 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15−21

    doi: 10.1093/bioinformatics/bts635

    CrossRef   Google Scholar

    [42]

    Liao Y, Smyth GK, Shi W. 2013. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research 41:e108

    doi: 10.1093/nar/gkt214

    CrossRef   Google Scholar

    [43]

    Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139−40

    doi: 10.1093/bioinformatics/btp616

    CrossRef   Google Scholar

    [44]

    Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15:550

    doi: 10.1186/s13059-014-0550-8

    CrossRef   Google Scholar

    [45]

    Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323

    doi: 10.1186/1471-2105-12-323

    CrossRef   Google Scholar

    [46]

    Wang Y, Chen Y, Ding L, Zhang J, Wei J, et al. 2016. Validation of reference genes for gene expression by quantitative real-time RT-PCR in stem segments spanning primary to secondary growth in Populus tomentosa. PLoS One 11:e0157370

    doi: 10.1371/journal.pone.0157370

    CrossRef   Google Scholar

    [47]

    Yun T, Li J, Xu Y, Zhou A, Zong D, et al. 2019. Selection of reference genes for RT-qPCR analysis in the bark of Populus yunnanensis cuttings. Journal of Environmental Biology 40:584−91

    doi: 10.22438/jeb/40/3(SI)/Sp-24

    CrossRef   Google Scholar

    [48]

    Tang F, Chu L, Shu W, He X, Wang L, et al. 2019. Selection and validation of reference genes for quantitative expression analysis of miRNAs and mRNAs in Poplar. Plant Methods 15:35

    doi: 10.1186/s13007-019-0420-1

    CrossRef   Google Scholar

    [49]

    Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, et al. 2020. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Molecular Plant 13:1194−202

    doi: 10.1016/j.molp.2020.06.009

    CrossRef   Google Scholar

    [50]

    Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. Mega X: molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution 35:1547−49

    doi: 10.1093/molbev/msy096

    CrossRef   Google Scholar

    [51]

    Qu W, Zhou Y, Zhang Y, Lu Y, Wang X, et al. 2012. MFEprimer-2.0: a fast thermodynamics-based program for checking PCR primer specificity. Nucleic Acids Research 40:W205−W208

    doi: 10.1093/nar/gks552

    CrossRef   Google Scholar

    [52]

    Shi Q, Tian D, Wang J, Chen A, Miao Y, et al. 2023. Overexpression of miR390b promotes stem elongation and height growth in Populus. Horticulture Research 10:uhac258

    doi: 10.1093/hr/uhac258

    CrossRef   Google Scholar

    [53]

    Urbancsok J, Donev EN, Sivan P, van Zalen E, Barbut FR, et al. 2023. Flexure wood formation via growth reprogramming in hybrid aspen involves jasmonates and polyamines and transcriptional changes resembling tension wood development. New Phytologist 240:2312−34

    doi: 10.1111/nph.19307

    CrossRef   Google Scholar

    [54]

    Balasubramanian VK, Rivas-Ubach A, Winkler T, Mitchell H, Moran J, et al. 2023. Modulation of polar auxin transport identifies the molecular determinants of source-sink carbon relationships and sink strength in poplar. Tree Physiologytpad073

    doi: 10.1093/treephys/tpad073

    CrossRef   Google Scholar

    [55]

    Kong L, Song Q, Wei H, Wang Y, Lin M, et al. 2023. The AP2/ERF transcription factor PtoERF15 confers drought tolerance via JA-mediated signaling in Populus. New Phytologist 240:1848−67

    doi: 10.1111/nph.19251

    CrossRef   Google Scholar

    [56]

    Guo Y, Wang S, Yu K, Wang H, Xu H, et al. 2023. Manipulating microRNA miR408 enhances both biomass yield and saccharification efficiency in poplar. Nature Communications 14:4285

    doi: 10.1038/s41467-023-39930-3

    CrossRef   Google Scholar

    [57]

    Li M, Dong H, Li J, Dai X, Lin J, et al. 2023. PtrVCS2 regulates drought resistance by changing vessel morphology and stomatal closure in Populus trichocarpa. International Journal of Molecular Sciences 24:4458

    doi: 10.3390/ijms24054458

    CrossRef   Google Scholar

    [58]

    Ruijter JM, Ramakers C, Hoogaars WMH, Karlen Y, Bakker O, et al. 2009. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Research 37:e45

    doi: 10.1093/nar/gkp045

    CrossRef   Google Scholar

    [59]

    Yang C, Yuan X, Zhang J, Sun W, Liu Z, et al. 2020. Comprehensive transcriptome analysis of reference genes for fruit development of Euscaphis konishii. PeerJ 8:e8474

    doi: 10.7717/peerj.8474

    CrossRef   Google Scholar

    [60]

    Liang L, He Z, Yu H, Wang E, Zhang X, et al. 2020. Selection and validation of reference genes for gene expression studies in Codonopsis pilosula based on transcriptome sequence data. Scientific Reports 10:1362

    doi: 10.1038/s41598-020-58328-5

    CrossRef   Google Scholar

    [61]

    Zhu L, Yang C, You Y, Liang W, Wang N, et al. 2019. Validation of reference genes for qRT-PCR analysis in peel and flesh of six apple cultivars (Malus domestica) at diverse stages of fruit development. Scientia Horticulturae 244:165−71

    doi: 10.1016/j.scienta.2018.09.033

    CrossRef   Google Scholar

    [62]

    Lyu S, Yu Y, Xu S, Cai W, Chen G, et al. 2020. Identification of appropriate reference genes for normalizing miRNA expression in citrus infected by Xanthomonas citri subsp. citri. Genes 11:17

    doi: 10.3390/genes11010017

    CrossRef   Google Scholar

    [63]

    Galimba K, Tosetti R, Loerich K, Micheal L, Pabhakar S, et al. 2020. Identification of early fruit development reference genes in plum. PLoS One 15:e0230920

    doi: 10.1371/journal.pone.0230920

    CrossRef   Google Scholar

    [64]

    Luo M, Gao Z, Li H, Li Q, Zhang C, et al. 2018. Selection of reference genes for miRNA qRT-PCR under abiotic stress in grapevine. Scientific Reports 8:4444

    doi: 10.1038/s41598-018-22743-6

    CrossRef   Google Scholar

  • Cite this article

    Xie Q, Ahmed U, Qi C, Du K, Luo J, et al. 2024. A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression. Forestry Research 4: e021 doi: 10.48130/forres-0024-0017
    Xie Q, Ahmed U, Qi C, Du K, Luo J, et al. 2024. A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression. Forestry Research 4: e021 doi: 10.48130/forres-0024-0017

Figures(7)  /  Tables(1)

Article Metrics

Article views(3608) PDF downloads(453)

ARTICLE   Open Access    

A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression

Forestry Research  4 Article number: e021  (2024)  |  Cite this article

Abstract: Real-time quantitative reverse transcription polymerase chain reaction (RT-qPCR) plays a crucial role in relative gene expression analysis, and accurate normalization relies on suitable reference genes (RGs). In this study, a pipeline for identifying candidate RGs from publicly available stem-related RNA-Seq data of different Populus species under various developmental and abiotic stress conditions is presented. DESeq2's median of ratios yielded the smallest coefficient of variance (CV) values in a total of 292 RNA-Seq samples and was therefore chosen as the method for sample normalization. A total of 541 stably expressed genes were retrieved based on the CV values with a cutoff of 0.3. Universal gene-specific primer pairs were designed based on the consensus sequences of the orthologous genes of each Populus RG candidate. The expression levels of 12 candidate RGs and six reported RGs in stems under different abiotic stress conditions or in different Populus species were assessed by RT-qPCR. The expression stability of selected genes was further evaluated using ΔCt, geNorm, NormFinder, and BestKeeper. All candidate RGs were stably expressed in different experiments and conditions in Populus. A test dataset containing 117 RNA-Seq samples was then used to confirm the expression stability, six candidate RGs and three reported RGs met the requirement of CV ≤ 0.3. In summary, this study was to propose a systematic and optimized protocol for the identification of constitutively and stably expressed genes based on RNA-Seq data, and Potri.001G349400 (CNOT2) was identified as the best candidate RG suitable for gene expression studies in poplar stems.

    • RNA sequencing (RNA-Seq) technology has emerged as a powerful tool in transcriptomic studies, offering high accuracy, sensitivity, and resolution[1]. Unlike traditional methods, RNA-Seq does not rely on prior knowledge of specific RNA molecules, making it effective for identifying unknown RNAs[2]. In RNA-Seq, total RNA or specific RNA fragments are isolated from samples representing different biological conditions or replicated under similar conditions. Recent advancements in next-generation sequencing (NGS) technology have made RNA-Seq the preferred approach for gene expression studies, thanks to its cost-effectiveness and technological improvements. This sequence-based method has revolutionized transcriptome research, enabling various applications, including the analysis of strand-specific expression, the detection of transcript fusions and alternative splicing isoforms, and the characterization of unknown cell types (through single-cell RNA sequencing)[3,4].

      RNA-Seq also enables better discovery of differentially expressed genes (DEGs) in various biological tissues and growth conditions, and may be able to provide high genome coverage even for genes with low expression levels[5]. RNA-Seq analysis measures transcript abundance by quantifying the fragments generated and the number of reads corresponding to each transcript. Since the total RNA content in a sample is unknown, data normalization is essential. Normalization methods include individual normalization based on the total number of reads and transcript lengths in each sample, resulting in Reads Per Kilobase of exon per Million mapped reads (RPKM) or Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values[6]. Alternatively, normalization can be achieved using different methods such as the Trimmed-Mean of M-values (TMM), DESeq2's median of ratios, Transcript Per Million (TPM) and Upper Quartile (UQ). Similar to RPKM, TPM doesn't utilize read information from all samples for normalization[7,8].

      In terms of gene expression analysis, RNA-Seq experiments primarily focus on identifying DEGs in specific biological conditions. However, apart from DEGs, there are numerous genes known as constitutively expressed genes (CEGs) that exhibit consistent expression across different cells or developmental stages, regardless of environmental conditions. For instance, a study on rice revealed that 22.7% of transcripts were expressed by CEGs in 39 different rice tissues[9]. Surprisingly, recent studies suggest that CEGs exhibit variable expression under different conditions and are used as reference genes (RGs)[10,11]. An ideal RG is one that remains unaffected by any experimental condition, shows stable expression, has no pseudogenes, and has a mid-range of quantification cycles or Cq values (Cq = 15–25) in real-time quantitative reverse transcription polymerase chain reaction (RT-qPCR)[12]. So technically speaking, every RG is CEG but not every CEG is RG[13].

      RT-qPCR is a reliable technique for transcript detection and measurement. To ensure accurate RT-qPCR results, it is crucial to select suitable RGs for normalization, following the guidelines outlined in the MIQE (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) standards[14]. RGs should ideally display constant expression levels across various plant tissues, developmental stages, or physiological conditions. They should remain unaffected by external treatments and can be used without the need for stability validation[15]. However, studies focusing on RG validation and comprehensive exploration of transcriptome data in model plants have shown that the accuracy of endogenous controls can be significantly influenced by factors such as plant species, the specific cells/tissues/organs under investigation, and growth conditions[16]. Therefore, the selection of appropriate RGs is a critical step in normalizing RT-qPCR data, as an incorrect selection may lead to ambiguous or even incorrect results[17,18].

      Traditionally, RGs in many horticultural plants were selected from cellular housekeeping genes in the absence of large genomic datasets[12]. Examples include elongation factor-1α (EF-1A) in zucchini, actin in poplar, and ubiquitin conjugating-enzyme (UBC) in banana[12]. However, it has been observed that certain RGs exhibit significant expression variability among different conditions and tissue types[19]. Moreover, even within species, different RG candidates may show varying expression stability under specific experimental conditions or tissue types[20,21]. Therefore, it is crucial to validate the expression stability of candidate RGs before utilizing them for data normalization. Only those RG candidates that have undergone rigorous validation for expression stability can be considered reliable RGs for specific conditions or tissue types[22]. Several statistical tools, such as BestKeeper, geNorm, and NormFinder, are commonly employed to identify the most suitable candidate RGs under specific experimental settings[12]. These tools aid in selecting RGs that exhibit minimal expression variability across various conditions or tissues.

      Forest trees and horticultural plants encompass a diverse range of species, many of which are still in the early stages of genomic and functional genomic research. The emergence of RNA-Seq technology has significantly advanced gene annotation, expression profiling, and functional studies in these plants. The availability of extensive RNA-Seq data sets provides valuable resources for selecting suitable RGs across different species and under various experimental conditions[2325]. Poplar, a widely distributed tree species with significant applications in wood production, environmental protection, and urban greening, serves as an important model plant for woody species[26]. Extensive research has been conducted on poplar stem growth and development, leading to the generation of large-scale transcriptome data sets[2729]. In this study, the objective is to retrieve and assess the quality of publicly available transcriptome data sets related to poplar stem tissues. Subsequently, we aim to predict and evaluate the best candidate RGs for gene expression analysis in stems of various Populus species. The expression stability of these candidate RGs under different stress conditions will be validated using RT-qPCR, leading to the identification of reliable RGs specific to poplar stems. Furthermore, based on this case study, we intend to establish a comprehensive pipeline for the development of RGs for accurate gene expression normalization using RNA-Seq data.

    • The workflow for identifying universal RGs within a genus based on RNA-Seq data is shown in Fig. 1, using RNA-Seq data downloaded from public databases for poplar stem gene expression.

      Figure 1. 

      Workflow for selecting novel reference genes (RGs) based on RNA-Seq data.

    • To obtain the RNA-Seq data of poplar stems, a comprehensive literature search was conducted to identify relevant studies that reported RNA-Seq experiments on poplar stem samples. A dataset comprising 298 RNA-seq samples focusing on poplar stem research was compiled as the training dataset for this study (Supplemental Dataset 1). These samples were derived from 10 distinct experimental groups, each representing different biological aspects of poplar stem. The groups included experiments examining different internodes of poplar[27], various stem tissues such as phloem, xylem, fiber cells, and ray cells[30,31], the entire developmental stage spanning from phloem to xylem[32], abiotic stress treatments such as low temperature, heat, drought, and salt[33], gravity stimulation by placing plants horizontally[34], different growth areas[35], biological stress treatments[36], overexpression of miR397a[37], and transgenic lines treated with ACC (1-aminocyclopropane-1-carboxylic acid)[38] (Supplemental Table S1). Accession numbers associated with the identified studies were then obtained. Subsequently, the RNA-Seq data corresponding to these accession numbers were downloaded from the National Center for Biotechnology Information (NCBI: www.ncbi.nlm.nih.gov) or EMBL's European Bioinformatics Institute (EMBL-EBI: www.ebi.ac.uk) databases. Only RNA-Seq data from poplar stem samples with a minimum of two biological replicates were included in the analysis (Supplemental Table S1).

    • The RNA-Seq data used in this study were obtained from sub-databases of NCBI, including GenBank, GEO, and SRA, as well as from EMBL-EBI, which includes ArrayExpress and ENA (European Nucleotide Archive). For the SRA data, the sratoolkit software v2.9.2 (https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/) was employed to convert the SRA data into FASTQ format before subsequent analysis. The default parameters were used for single-end (SE) sequencing data during the conversion process, while additional parameters (--split-3) were added for paired-end (PE) sequencing data to appropriately handle the data. The data downloaded from the EMBL-EBI database were already in FASTQ format and could be directly used for subsequent analysis.

      The raw data quality was assessed using FastQC v0.10.1 (www.bioinformatics.babraham.ac.uk/projects/fastqc). Subsequently, Trimmomatic v0.32[39] was applied for the removal of adaptor contaminants and low-quality reads. The parameter settings for SE and PE sequencing data were as follows: TruSeq3-SE-2.fa:2:30:10 LEADING:3 SLIDINGWINDOW:5:20 MINLEN:50 and TruSeq3-PE-2.fa:2:30:10 LEADING:3 SLIDINGWINDOW:5:20 MINLEN:50, respectively[40]. These steps ensured that the RNA-Seq data used in the subsequent analysis was of high quality and free from artifacts and noise.

      Following the trimming and filtering of the raw data to obtain clean reads, the clean reads were mapped to the P. trichocarpa v3.0 genome using STAR v2.3.0e_r291[41]. The mapping process utilized the '--readFilesCommand zcat --quantMode TranscriptomeSAM' settings. By mapping the clean reads to the reference genome, the total number of reads in each sample and the number of reads that successfully mapped to the genome were determined. The alignment rate, representing the proportion of mapped reads to the total number of reads, was calculated for each sample. Samples with an alignment rate exceeding 70% were retained for subsequent analysis.

      Upon mapping the clean reads to the genome, a BAM file was generated for each sample. The BAM files contained information about the alignment positions of the reads, enabling the determination of the relative expression levels of all genes in each sample. This step provided the necessary data for further analysis and characterization of gene expression patterns in the poplar stem samples.

    • Four normalization methods, including FPKM/RPKM, TPM, TMM, and DESeq2's median of ratios, were utilized to assess and compare the impact of different normalization approaches on the gene expression analysis and to identify the most appropriate method for the RNA-Seq datasets. The BAM files generated from the previous mapping step were analyzed using the featureCounts program, a part of the Subread package[42], to obtain the unnormalized raw count (RC) of each gene in each sample. For TMM normalization of the RCs, the edgeR package[43] was utilized, while DESeq2 package[44] was employed for DESeq2's normalization. To perform FPKM/RPKM and TPM normalization, the RSEM (RNA-Seq by Expectation Maximization) software[45] was used, applying default parameters. These normalization methods were applied to the RCs, resulting in normalized expression values that can be compared across samples.

      To compare the effects of FPKM/RPKM, TPM, TMM, and DESeq2's median of ratios on the normalization of RC, a bar graph was generated to depict the expression levels of all genes from the lower quartile to the upper quartile values for each method. The relative expression level of each gene was calculated as log2(value+1) to mitigate any outliers resulting from zero values. The Coefficient of Variance (CV) was then calculated for the relative expression level of each gene across all samples. To investigate the variability and consistency of gene expression across RC, FPKM/RPKM, TPM, TMM, and DESeq2's median of ratios methods, a box plot was utilized to visualize the CV distribution of the relative expression levels for all genes.

      Moreover, the CV of the relative expression levels of the reported RGs were used to assess the accuracy and reliability of RC, FPKM/RPKM, TPM, TMM, and DESeq2's median of ratios methods. The CV provides a measure of the variation in gene expression across different samples and allows us to assess the stability and consistency of the RGs under each normalization method.

      The reported RGs in poplar stems had been extensively collected through a literature review[24,4648]. After removing duplicate genes, the relative expression levels of 30 reported RGs were evaluated in all samples (Supplemental Table S2). The CV values and the average CV values of their relative expression levels before and after normalization were then calculated using the four different normalization methods. Boxplots were generated to illustrate the distribution of CV values and the means of CV values.

      After normalizing the data using four normalization methods, the distribution of relative expression levels as well as the distribution of CV values for all Populus genes (42,950 genes) were comprehensively analyzed. The CV distribution and the mean values for the relative expression levels of the 30 reported RGs were also examined. The optimal normalization method was selected based on the above analyses. The relative expression data of all samples were then processed using the selected normalization method for further analysis.

    • Genes with a CV value of ≤ 0.3 in their relative expression levels across all samples, as determined using the optimal normalization method, were considered to have stable expression. These genes were classified as stably expressed genes in this study. The distribution densities of the relative expression levels for all genes, the stably expressed genes, and the 30 reported RGs were statistically analyzed. Available Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation information for all genes in the P. trichocarpa v3.0 genome were used. To further investigate the functional enrichment of the stably expressed genes and the 30 reported RGs, GO, and KEGG enrichment analysis were performed using TBtools software[49].

      Based on the CV values of the relative expression levels of genes across all samples, the top 12 genes with the smallest CV values were selected as candidate RGs for this study. To further confirm the expression stability of the 12 candidate RGs, RT-qPCR experiments were performed and the results were compared with the six reported RGs with the smallest CV in relative expression levels.

    • Hybrid poplar 717 (P. tremula × P. alba clone INRA 717-1B4, hereafter referred to as poplar 717) plants were cultured in 250 mL glass bottles containing 35 mL of 1/2 Murashige and Skoog (MS) medium (Phytotech, Lenexa, USA) supplemented with 0.75% (w/v) agar and 3% (w/v) sucrose. The bottles were placed in a growth chamber with a photoperiod of 16 h of light and 8 h of darkness at a temperature of 25 °C for a duration of two months. The tissue-cultured plants were then transplanted into pots 8.5 cm in diameter and 14 cm high, filled with a soil mixture of three parts peat, two parts base soil, and one part vermiculite. The plants were grown in a greenhouse at a temperature of 28 ± 5 °C for three months. Five Populus cultivars including P. simonii, P. deltoides, hybrid poplar clone 717 (P. tremula × P. alba L.), P. alba and P. adenopoda were grown in the field of the campus of Huazhong Agricultural University in Wuhan, China.

    • Poplar 717 plants of uniform size (100−110 cm) were selected and subjected to various abiotic stresses (Supplemental Table S3). Salt stress was induced by exposing plants to 150 mM NaCl solution for 2 d (D2), 4 d (D4), and 6 d (D6). The plants were watered with the NaCl solution every 24 h. The drought stress treatment adopted the process of simulating the gradual drying of soil. Three groups of plants were withheld water for 5 d (D5), 10 d (D10), and 15 d (D15) to achieve mild, moderate, and severe drought conditions, respectively. The control group (D0) was watered regularly according to evaporative needs. Relative soil moisture was measured using a FieldScoutTM TDR 300 Soil Moisture Meter (Spectrum Technologies, USA). For shade stress, plants were covered by black nylon mesh to achieve 70%−80% shading. The time points for shade stress were D0, D5, D10 and D15. The light intensity at each time point was measured using a TM-205 Auto Ranging Lux/Fc Light Meter (TENMARS, Taiwan). In the above stress treatments, the physiological parameters such as net photosynthetic rate (Pn), stomatal conductance (gs), intercellular CO2 (Ci), and transpiration rate (Tr) of the poplar 717 plants were measured using a LI-6400XT portable photosynthesis system (LI-COR, USA). Gravity stress was induced by bending the plants to investigate tension wood and opposite wood at D2 and D14 time points. For each time point of each treatment, the number of sample replicates was three.

    • Primers for RT-qPCR experiments were designed to ensure gene specificity within species and generality among species, and the following primer design steps were performed. The primary transcript sequences of the RGs from P. trichocarpa were used to perform blastn searches (-evalue: 1e-5, -num_alignments: 5) to obtain their homologous transcripts in P. trichocarpa, P. euphratica, P. deltoides, P. tremula and P. alba. To select the most homologous genes for each RG, a phylogenetic analysis of RG and its homologs was performed using MEGA X[50] with a maximum likelihood method and a bootstrap value of 1,000. To obtain the consensus nucleotide sequence of each RG, multiple sequence alignment was then performed between homologous transcript sequences from different Populus species. Based on the consensus sequence, universal primer pairs for each RG were designed using Primer3 software (https://primer3.ut.ee/) with the following criteria: primer length of 18−27 nt, primer melting temperature (Tm) of 59−63 °C, and product size of 100−300 bp. Finally, mfeprimer-3.2.0 software[51] was used to perform gene-specific detection of the designed primers in the aforementioned five genomes.

    • After the stress treatments, samples were taken from the 9th internode (IN9) at the branch top of each plant. Of the field-grown poplar trees, the young (IN3) and mature stems (IN9) were collected from each cultivar. The samples were immediately placed in liquid nitrogen and stored at −80 °C for RT-qPCR analysis. All plants were sampled with three biological replicates.

      The total RNA was extracted from the young and mature stem tissues of poplar plants using the 2×CTAB method as previously described[52]. The quality and quantity of the extracted RNA was assessed using a NanoDropTM 2000 spectrophotometer (Thermo Scientific, USA). To generate complementary DNA (cDNA), the PrimeScriptTM RT Reagent Kit with gDNA Eraser (TaKaRa, Dalian, China) was used following the manufacturer's instructions. Reactions of cDNA synthesis were performed in 20 µL volumes. The resulting cDNA was diluted 20-fold with ultrapure water and served as a template for RT-qPCR analysis.

      For RT-qPCR, ChamQTM SYBR® qPCR Master Mix (High ROX Premixed) from Vazyme (Nanjing, China) was used following the manufacturer's instructions. The RT-qPCR was conducted on a Light Cycler® 480 instrument II from Roche (USA) using white 384-well plates. The gene expression levels of the candidate RGs were evaluated.

      To determine the amplification efficiency (E) and R2 value for all primer pairs, a 6-point standard curve was generated by performing 20× dilutions of cDNA. The amplification efficiency (E) was calculated as E = 10^(−1/slope of the standard curve), and the R2 value was determined. All RT-qPCR experiments were performed with three biological replicates and two technical replicates.

    • The expression stability of the candidate RGs and reported RGs were evaluated using Ref-Finder (www.leonxie.com/referencegene.php), which incorporates the delta-Ct, BestKeeper, Normfinder, and geNorm algorithms. The evaluation was based on the average Ct values obtained from RT-qPCR experiments. By considering the comprehensive geometric stability ranking, candidate RGs with the most stable expression were identified and selected as novel RGs for poplar stems.

      To evaluate the expression stability of the candidate RGs and reported RGs, a test dataset of 117 RNA-Seq samples was compiled, focusing on poplar stem studies (Supplemental Table S4). The samples came from nine different experimental groups recently uploaded to the public databases and represent various biological aspects of poplar stems, including flexure treatment[53], auxin treatment[54] and transgenic studies on ERF15[55], miR408[56] and PtrVCS2[57]. The test dataset was downloaded and processed as previously described for the training dataset. After normalizing the data by using the DESeq2's median of ratios method, distributions, and CV values of relative expression levels were used to assess the expression stability of the 12 candidate RGs and six reported RGs in the test dataset.

    • Data statistics and analysis were performed using the R programming language. Specifically, R language packages, such as ggplot and ggtree, were utilized for data visualization and plotting to create high-quality graphics and visualize data in a clear and informative manner.

    • Through a comprehensive review of relevant literature on poplar transcriptome sequencing, a dataset comprising 298 RNA-seq samples focusing on poplar stem research was compiled for this study (Supplemental Dataset 1). These samples provide a comprehensive representation of poplar stem development, enabling the identification and evaluation of candidate RGs. The 298 samples were aligned to the Populus trichocarpa v3.0 genome, resulting in 294 samples with an alignment rate exceeding 70% (Supplemental Dataset 2). Among these, samples K5_20 and K5_28 had alignment rates higher than 80% (84.80% and 84.49% respectively), but were excluded from further analysis due to their low read counts[41]. Consequently, transcriptome data from a total of 292 samples were retained for subsequent analysis in this study.

      In addition, a collection of 37 reported RGs in poplar stems was compiled. After removing duplicate genes based on their gene ID numbers, a set of 30 reported RGs was obtained for further analysis (Supplemental Table S2).

    • By analyzing the upper and lower quartile values of gene expression levels in the 292 samples before and after normalization, and presenting the results in bar graphs, the differences in gene expression levels among samples were reduced after normalization with RPKM/FPKM, TPM, TMM, and DESeq2's median of ratios compared to without normalization (RC). Specifically, the distribution of gene expression levels between samples became more similar and less variable after DESeq2 and TMM normalization, indicating that these methods performed better than RPKM/FPKM and TPM (Fig. 2a). Furthermore, the CV of the relative expression of all genes in the 292 samples were examined. After normalization with RPKM/FPKM, TPM, TMM, and DESeq2's median of ratios, the median CV values were 1.52, 1.51, 1.47, and 1.48, respectively, with TMM and DESeq2's median of ratios exhibiting more favorable effects compared to RPKM/FPKM and TPM (Fig. 2b). To evaluate the effect of different normalization methods, the changes in the expression levels of the 30 reported RGs were analyzed. The median and mean CVs for unnormalized RC were 0.90 and 0.97, respectively. After normalization with RPKM/FPKM, TPM, TMM, and DESeq2's median of ratios, the median CVs were 0.52, 0.52, 0.53, and 0.51, respectively; and the mean CVs were 0.58, 0.56, 0.57, and 0.55, respectively (Fig. 2c).

      Figure 2. 

      Comparison of normalization methods. RC (raw counts) represents raw counts without normalization, while FPKM, TPM, TMM, and DESeq2 are results after normalization using four different methods. (a) Distribution of counts for all genes in each sample before and after normalization using 292 samples. Bar plots are employed instead of box plots, with the upper limit of the bar representing the upper quartile and the lower limit representing the lower quartile of the data. (b) Box plot demonstrates the coefficient of variance (CV) of gene expression levels for all genes before and after normalization, utilizing data from 292 samples. (c) Box plot depicting the CV for 30 previously reported reference genes (RGs) after different normalizations across all 292 samples, with the central white point indicating the mean CV.

      DESeq2's median of ratios and TMM showed similar effects in normalizing the expression levels of the 30 reported RGs in poplar. However, DESeq2's median of ratios yielded the smallest CV values and was therefore chosen as the method for sample normalization in the subsequent steps.

    • To screen out the RGs suitable for various poplar species and experimental treatments, the RNA-Seq data of the 292 samples were further analyzed. After RC was normalized by the DESeq2's median of ratios method, the mean, standard deviation (SD), and CV of the relative expression levels of each gene were calculated. A total of 541 stably expressed genes (CV ≤ 0.3) were screened, including four reported RGs (PP2A-2, EIF-A, CDPK and ATPase) (Fig. 3a, Supplemental Dataset 3). The log values of the relative expression levels' mean of the 541 stably expressed genes ranged from 5.91 to 13.03; and for the 30 reported RGs, the log values ranged from 7.17 to 16.09 (Fig. 3b). This indicates that the stably expressed genes identified in this study have similar expression levels to the reported RGs. GO and KEGG enrichment analyses were performed on the stably expressed genes and the reported RGs. The results indicate that the stably expressed genes identified in this study share functional similarities with the reported RGs (Supplemental Dataset 4).

      Figure 3. 

      Comparative analysis between stably expressed genes and reported reference genes (RGs). This figure presents a comparison of expression patterns between stably expressed genes and previously reported RGs. (a) Venn diagram illustrates the overlap between stably expressed genes (with CV ≤ 0.3) identified from 292 RNA-Seq samples and the reported RGs. (b) The density distribution of gene expression levels is depicted for all genes in the Populus genome, stably expressed genes, and reported RGs. The X-axis represents log2(counts + 1), and the Y-axis represents gene density.

    • From the 541 stably expressed genes, 12 genes (Table 1) with the smallest CV values among the 292 samples were selected as candidate RGs, based on the normalization method of DESeq2's median of ratios. Comparing the stability of expression levels between these candidate RGs and the reported RGs, the former had CV values ranging from 0.16 to 0.20, and the latter had CV values ranging from 0.23 to 1.53. The expression of the candidate RGs showed better stabilities (lower CV values) than the reported RGs in the 292 samples (Fig. 4). Furthermore, using the six reported RGs with the smallest CV values as comparisons, the expression stability of 12 candidate RGs was verified by RT-qPCR analysis.

      Table 1.  Information of candidate reference genes (RGs) with the smallest coefficient of variance (CV) values among the 292 RNA-Seq samples.

      Gene IDGene nameDescriptionCV
      Potri.001G349400CNOT2CCR4-NOT transcription complex subunit 20.168
      Potri.002G157500RH8Similar to DEAD/DEAH box helicase0.172
      Potri.005G110600VPS35Vacuolar protein sorting-associated protein 350.173
      Potri.002G197600FIP37.1Similar to ARABIDOPSIS THALIANA FKBP12 INTERACTING PROTEIN 370.175
      Potri.013G070001NAUDP-glucose pyrophosphorylase0.186
      Potri.001G197400Pt-UBP6.2Similar to UBIQUITIN-SPECIFIC PROTEASE 60.190
      Potri.006G116700U2AF1Splicing factor U2AF 35 kDa subunit0.193
      Potri.008G111700NAPredicted hydrolases of HD superfamily0.194
      Potri.011G084400CUL4Similar to hypothetical protein0.194
      Potri.004G064400NASimilar to ankyrin protein kinase0.198
      Potri.008G217300Pt-CUL1.4Similar to cullin-like protein10.199
      Potri.003G045700Pt-ATRLI1.2Similar to RNase L inhibitor protein; putative0.202
      NA: not available. The estimated CVs were based on the normalization method of DESeq2's median of ratios.

      Figure 4. 

      Assessment of candidate and reported reference genes (RGs) stability based on RNA-Seq data. Box plots are employed to display the distribution of expression levels for 12 candidate RGs and 30 reported RGs. The left side shows the expression distribution of candidate RGs, while the right side shows the expression distribution of reported RGs. The small circles on the red lines represent the coefficient of variance (CV) for the gene expression levels.

    • To design universal primers suitable for different poplar species, the workflow illustrated in Fig. 5a was developed. The primary transcript sequences of 18 genes in P. trichocarpa was used for blastn to retrieve their homologous genes from P. trichocarpa, P. euphratica, P. deltoides, and poplar 717. For example, when examining the primary transcript sequence of Potri.005G10600, a total of 13 homologous transcripts were retrieved from blastn searches of the five poplar genomes. A phylogenetic tree was constructed using the primary transcript sequences of Potri.005G10600 and it's 13 homologues to identify the former's orthologs (Fig. 5b). Multiple sequence alignments were performed between Potri.005G10600 and its orthologs to obtain consensus sequences for primer design (Fig. 5c). The specificity of the primers on the five poplar genomes was verified using the mfeprimer-3.2.0 software to obtain universal primers for these species. Finally, RT-qPCR primers for all 18 RGs, including 12 candidate RGs and six reported RGs, were designed according to the method described above (Supplemental Table S5).

      Figure 5. 

      RT-qPCR primer design. This figure illustrates the process of designing RT-qPCR primers using Potri.005G110600 as a representative example. (a) Flowchart outlining the primer design process is presented. (b) Phylogenetic tree showing the target gene and homologous genes in five different Populus species, with color-coded gene names representing the most homologous genes to the target gene. (c) Multiple sequence alignment of transcript sequences for the target gene Potri.005G110600 and its homologous genes. The alignment highlights conserved positions in blue and non-conserved positions in white. The region indicated by red arrows corresponds to the primer binding site.

      To verify the performance of the designed primer pairs, RNA was extracted from the stems of poplar 717 tissue culture seedlings, and cDNA was synthesized through reverse transcription of RNA. The gene specificity and amplification efficiency of the designed primer pairs were then evaluated using RT-qPCR. The correlation coefficient (R2 values) of the 18 primer pairs in RT-qPCR amplification were all above 0.99, indicating that they had high gene specificity. The amplification efficiencies varied from 91.3% to 109.7%, all of which were in the reliable section from 90% to 115%[58]. These results indicated that the designed primers were suitable for further gene expression analysis.

    • Poplar 717 plants were subjected to different stress treatments (Supplemental Table S3) to verify the expression stability of the 12 candidate RGs and six reported RGs by RT-qPCR. The expression stability of these genes was evaluated and ranked using four commonly used stability evaluation programs: delta-Ct, BestKeeper, Normfinder, and geNorm (Supplemental Dataset 5). The physiological parameters, including net photosynthetic rate (Pn), stomatal conductance (gs), intercellular CO2 (Ci), and transpiration rate (Tr) were used to monitor the stress response of poplar plants to various treatments (Supplemental Dataset 6). The comprehensive ranking was generated by computing the geometric mean (geomean) of the gene rankings. The distribution of Ct values of RT-qPCR for all RGs was concentrated closely near the mean, indicating consistent and reliable expression levels of the candidate RGs and the reported RGs (Fig. 6a). Four candidate RGs, Potri.001G349400, Potri.005G110600, Potri.011G084400 and Potri.008G111700, exhibited the highest stability and ranked at the top under all stress conditions (Fig. 6a). To further verify that the candidate RGs in this study are suitable for comparison among different poplar species, young and mature stems (IN3 and IN9) collected from five different Populus cultivars including P. simonii, P. deltoides, poplar 717, P. alba, and P. adenopoda grown on the campus of Huazhong Agricultural University were used. And the stability of 18 RGs in these stems was analyzed. Potri.001G349400, Potri.011G084400, Potri.005G110600 and Potri.002G157500 were among the best for stability in the stems of these cultivars (Fig. 6b). Based on the above two experiments, Potri.001G349400, Potri.005G110600, and Potri.011G084400 showed the best stability compared to other candidate RGs and reported RGs.

      Figure 6. 

      Comprehensive analysis of expression stability and Ct value distribution of reference genes (RGs). This figure presents a comprehensive evaluation of the expression stability of 18 RGs (12 candidate RGs and six reported RGs) across diverse stress treatments and Populus species using Ref-finder, along with the distribution of Ct values for these 18 RGs. (a) Box plots showing the Ct value distribution and integrated stability ranking of the 18 RGs under different stress treatment experiments (Supplemental Table S5). (b) Box plots depicting the Ct value distribution and integrated stability ranking of the 18 RGs in stems of five different Populus cultivars including P. simonii, P. deltoides, poplar 717, P. alba, and P. adenopoda. The small circles on red line indicate the integrated stability ranking, with smaller values indicating greater stability. All RT-qPCR experiments were performed with three biological replicates and two technical replicates. Each dot of the boxplot represents the average of these six replicates.

      To further verify the stability of the candidate RGs and reported RGs, a test dataset was constructed using the latest RNA-Seq data recently uploaded to the public databases, which included nine sets of poplar stem-related experiments with a total of 117 samples (Supplemental Table S4, Supplemental Dataset 7). Based on the DESeq2's median of ratios normalization method, distributions and CV values of relative expression levels of the 18 RGs in the test dataset were assessed. In total, there were six candidate RGs and three reported RGs with CV values below 0.3 (Fig. 7, Supplemental Dataset 8). Among them, Potri.001G349400, Potri.002G197600 and Potri.002G157500 can be considered as novel RGs for poplar stem gene expression analysis, and their gene annotations are respectively CNOT2 (CCR4-NOT transcription complex subunit 2), FIP37.1 (similar to Arabidopsis FKBP12-interacting protein 37), and RH8 (DEAD/DEAH box RNA helicase-like 8). Potri.001G349400/CNOT2 ranked first in the training dataset, RT-qPCR validation and test dataset, showing the best expression stability in stem-related samples among all poplar genes. The universal primers used to amplify Potri.001G349400/CNOT2 were: 5'-GGTCTGACGAGCCAGCAAAGGGT-3' (forward) and 5'-CGCCGCTGCTCCTTGTGGT-3' (reverse).

      Figure 7. 

      Evaluation of candidate and reported reference genes (RGs) stability based on RNA-Seq data. Box plots are employed to display the distribution of expression levels for 12 candidate RGs and six reported RGs. The small circles on the red line represents the coefficient of variance (CV) for the gene expression levels, where smaller CV values signify greater stability of gene expression. The three genes considered as novel RGs for poplar stem gene expression analysis are shown in purple.

    • The selection of suitable RGs for RT-qPCR normalization is crucial in gene expression analysis. While previous studies have often relied on literature-reported RGs, there is a growing recognition of the need for systematic selection of stable RGs. In this study, the aim was to identify constitutively and stably expressed genes that can serve as reliable internal controls for RT-qPCR experiments. To achieve this, both novel and reported RGs were retrieved from transcriptome datasets (Supplemental Table S1) of Populus at the genus level. These datasets provided a comprehensive view of gene expression under different developmental stages and abiotic stress conditions. Based on the CV values of gene expression, 12 novel candidate RGs and six reported RGs that demonstrated stable expression across these conditions were selected (Fig. 4). Furthermore, these novel RG candidate genes were evaluated using the latest poplar stem-related RNA-Seq data from public databases (Fig. 7) and three of them were suggested for poplar stem-related research, Potri.001G349400/CNOT2, Potri.002G197600/FIP37.1 and Potri.002G157500/RH8. By systematically selecting RGs based on their expression stability, the present study provides researchers with an important resource for gene expression analysis in the stems of Populus and potentially other plant species. These stable RGs can serve as reliable internal controls for RT-qPCR experiments, enabling more accurate and robust gene expression studies.

      Normalization methods play a crucial role in identifying internal RGs based on transcriptome data. In this study, the efficacy of several methods commonly used to identify RGs based on RNA-Seq data were evaluated. We compared four common normalization methods: FPKM, TPM, TMM, and DESeq2's median of ratios. Variation among the data is typically evaluated using metrics such as MFC (mean fold change), SD, and interquartile of expression level. CV value is a valuable metric for assessing the variability of gene expression levels relative to their mean, and it has been widely used to identify suitable RGs from transcriptome datasets[59,60]. In many studies, the top 1,000 expressed transcripts with the lowest CV values in contrasting environments are selected as stably expressed genes. The threshold for CV values is often set at < 16% or < 30%. Alternatively, some studies choose genes with a low CV of logarithmically transformed RPKM or transcript copy numbers, typically with a threshold of < 4%. The log2 (normalized values) and average CV values obtained by the five methods were evaluated. The results showed that the DESeq2's median of ratios and TMM normalization methods provided higher consistency compared to RC, FPKM, and TPM. DESeq2's median of ratios yielded the smallest CV values when tested with the reported RGs, and was therefore chosen as the normalization method for RNA-Seq data in this study. This approach ensures reliable and accurate normalization, resulting in more robust gene expression analysis.

      Comprehensive analysis of RNA-Seq data obtained from different Populus species under different conditions facilitated the identification of highly reliable and broadly applicable candidate RGs. Only four out of 30 previously reported RGs have been identified as stably expressed genes across diverse conditions and environments based on their CV values (Fig. 3a). In addition to CV values, the expression levels of transcripts are also important considerations when selecting RGs from transcriptomic data. It is generally preferred to choose transcripts with higher expression levels as internal controls for gene quantification due to reasons of efficiency and accuracy[24]. In this study, both the stably expressed genes and the reported RGs generally had high gene expression levels (Fig. 3b). KEGG enrichment showed that most of these genes were involved in various key biological processes related to gene expression (Supplemental Dataset 4), providing further insights into the functional relevance of these genes. Only four reported RGs, PP2A-2, EIF4A, CDPK, and ATPase, exhibited CV values below 30%, while all the stably expressed genes had CV values < 30% (Figs 3a & 4). This indicates that the newly selected candidate genes have great potential as RGs.

      In many previous studies, the selection of suitable RGs for gene expression studies are often limited to specific species and conditions[6164]. This study was aimed to evaluate the novel and reported RGs at the genus level of Populus. Therefore, it is crucial to design primers that not only exhibit gene specificity within a particular species but also are universal among different Populus species. To ensure primer versatility, we employed an integrative approach to design gene-specific primers to amplify orthologous RGs from multiple species (Fig. 5). Consensus sequences based on multiple sequence alignments of candidate RGs and their orthologs from different Populus species were used for primer design. Subsequently, the gene specificity of the primers in five Populus genomes were verified using mfeprimer-3.2.0. This approach allows for consistent and reliable gene expression analysis across multiple Populus species, facilitating broader applicability of universal primer pairs within the genus.

      By employing a combination of bioinformatics tools and analysis methods, a set of candidate RGs with high stability and applicability were successfully identified in various experimental conditions. The use of RefFinder, which integrates popular stability evaluation algorithms such as geNorm, NormFinder, BestKeeper, and delta-Ct method, allowed us to comprehensively evaluate the performance of the 18 tested genes in RT-qPCR analysis. All these RGs exhibited high stability in both young and mature stems of various Populus cultivars, as well as under different stress treatments in poplar 717 (Fig. 6). To further test the applicability of the candidate RGs in this study, their expression stability was evaluated using the latest RNA-Seq data from public databases (Supplemental Table S4, Fig. 7). The functional relevance of the best-performing RG, Potri.001G349400/CNOT2 needs to be highlighted. CNOT2 is a core member of the Carbon catabolite repression4 (Ccr4)–NOT complex, which plays a crucial role in transcriptional regulation. In addition, Potri.002G197600/FIP37.1 and Potri.002G157500/RH8 can also be considered as novel RGs for poplar stem gene expression analysis based on their good performance in different tests in this study. In the analysis of the training and the test datasets, only two reported RGs (EIF4A and ATPase) had CV values below 0.3 in both cases. Therefore, most reported RGs are not suitable for gene expression analysis in poplar stems. The novel RGs developed in this study have strong applicability for gene expression analysis in poplar stems, but whether they apply to other tissues requires analysis based on relevant RNA-Seq data. The species involved in RNA-Seq and RT-qPCR in this study are relatively common in current poplar research and are also widely represented in the genus Populus, which is very useful for increasing the applicability of these RGs. When studying gene expression in other Populus species not included in this study, performing some amount of RNA-Seq might help improve the applicability of these RGs.

      It is not cost-effective to perform RT-qPCR to test all reported RGs for each new experiment. As gene expression data based on RNA-Seq continue to accumulate from diverse experiments and species, the approach employed in this study holds promise for integrating and mining such data in a meaningful way. Identification of stable and reliable RGs will facilitate accurate and standardized gene expression analysis across different conditions and species.

    • In conclusion, the current study presents a novel methodology for selecting stably expressed genes from transcriptome expression data, which offers several advantages over traditional approaches. By combining expression stability analysis and RT-qPCR validation, we successfully identified a set of novel and stable RGs for Populus stems. This methodology is targeted, convenient, and efficient, enabling the identification of new and more reliable RGs. The analytical method we developed can be applied to other plant genera and will greatly help researchers compare gene expression patterns in different species within the same genus. Furthermore, the identified RGs for stem development in Populus will serve as valuable tools for studying gene expression dynamics during wood formation in plants.

    • The authors confirm contribution to the paper as follows: study conception and design: Zheng B, Shi X; bioinformatics analysis: Xie Q; conducting the experiments: Ahmed U; participating in the experiments: Qi C, Du K, Luo J, Wang P; draft manuscript preparation: Xie Q, Ahmed U; manuscript revision: Shi X, Zheng B, Xie Q, Ahmed U. All authors reviewed the results and approved the final version of the manuscript.

    • All data generated or analyzed during this study are included in this published article and its supplementary information files.

      • This study was supported by the National Natural Science Foundation of China (32171821, 32271834 and 32371908), Hubei Provincial Natural Science Foundation of China (2022CFB234), and the Fundamental Research Funds for the Central Universities (2662022YLYJ009). Thanks to Huazhong Agricultural University for providing PhD scholarships. Thanks to the Chinese Postdoctoral Science Foundation for providing the postdoctoral research funding (320554).

      • The authors declare that they have no conflict of interest.

      • # Authors contributed equally: Qi Xie, Umair Ahmed

      • Copyright: © 2024 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
    Figure (7)  Table (1) References (64)
  • About this article
    Cite this article
    Xie Q, Ahmed U, Qi C, Du K, Luo J, et al. 2024. A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression. Forestry Research 4: e021 doi: 10.48130/forres-0024-0017
    Xie Q, Ahmed U, Qi C, Du K, Luo J, et al. 2024. A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression. Forestry Research 4: e021 doi: 10.48130/forres-0024-0017

Catalog

  • About this article

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return