A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression

Qi Xie; Umair Ahmed; Cheng Qi; Kebing Du; Jie Luo; Pengcheng Wang; Bo Zheng; Xueping Shi

doi:10.48130/forres-0024-0017

2024 Volume 4

1. National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan 430070, China
2. College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan 430070, China
3. Poplar Research Center, Huazhong Agricultural University, Wuhan 430070, China
4. Hubei Engineering Technology Research Center for Forestry Information, Huazhong Agricultural University, Wuhan 430070, China
^# Authors contributed equally: Qi Xie, Umair Ahmed

Corresponding author: xpshi@mail.hzau.edu.cn

Received: 23 October 2023
Revised: 07 April 2024
Accepted: 07 May 2024
Published online: 01 June 2024
Forestry Research 4, Article number: e021 (2024)

Abstract

Real-time quantitative reverse transcription polymerase chain reaction (RT-qPCR) plays a crucial role in relative gene expression analysis, and accurate normalization relies on suitable reference genes (RGs). In this study, a pipeline is presented for identifying candidate RGs from publicly available stem-related RNA-Seq data of different Populus species under various developmental and abiotic stress conditions. DESeq2's median of ratios yielded the smallest coefficient of variation (CV) values across a total of 292 RNA-Seq samples and was therefore chosen as the method for sample normalization. A total of 541 stably expressed genes were retrieved using a CV cutoff of 0.3. Universal gene-specific primer pairs were designed based on the consensus sequences of the orthologous genes of each Populus RG candidate. The expression levels of 12 candidate RGs and six reported RGs in stems under different abiotic stress conditions or in different Populus species were assessed by RT-qPCR. The expression stability of the selected genes was further evaluated using ΔCt, geNorm, NormFinder, and BestKeeper. All candidate RGs were stably expressed in different experiments and conditions in Populus. A test dataset containing 117 RNA-Seq samples was then used to confirm the expression stability; six candidate RGs and three reported RGs met the requirement of CV ≤ 0.3. In summary, this study proposes a systematic and optimized protocol for the identification of constitutively and stably expressed genes based on RNA-Seq data, and Potri.001G349400 (CNOT2) was identified as the best candidate RG for gene expression studies in poplar stems.
- Reference genes,
- Populus,
- Gene expression,
- Transcriptome,
- RT-qPCR,
- Abiotic stress,
- Stem development

Supplementary information

Supplemental Table S1 Source and basic information of RNA-Seq training dataset.
Supplemental Table S2 Information of reported reference genes (RGs) in Populus.
Supplemental Table S3 List of stress treatments for poplar 717.
Supplemental Table S4 Source and basic information of RNA-Seq test datasets.
Supplemental Table S5 Universal RT-qPCR primers for 12 candidate reference genes (RGs) and 6 reported RGs in Populus.
Supplemental Dataset S1 The list of RNA-Seq data of poplar stem samples for the training dataset.
Supplemental Dataset S2 Mapping statistics of RNA-Seq data of 298 samples in the training dataset. The read count of each sample and the mapping rate for different mapping cases. Samples with an alignment rate below 70% or exhibiting low read counts are highlighted in red, and these were subsequently excluded from further analysis.
Supplemental Dataset S3 Gene expression levels and coefficient of variation (CV) values for stably expressed genes of the training dataset. The read counts were normalized using the DESeq2 method. The stably expressed genes were defined by CV ≤ 0.3.
Supplemental Dataset S4 The results of GO enrichment analysis for reported reference genes (RGs).
Supplemental Dataset S5 Expression stability ranking of reference genes in different Populus cultivars and stress treatments.
Supplemental Dataset S6 Photosynthetic characteristics of poplar leaves under shade stress.
Supplemental Dataset S7 Mapping statistics of RNA-Seq data for the test dataset. The read count of each sample and the mapping rate for different mapping cases.
Supplemental Dataset S8 Gene expression levels and CV values of 18 reference genes in the test dataset.

Rights and permissions
Copyright: © 2024 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1] Deshpande D, Chhugani K, Chang Y, Karlsberg A, Loeffler C, et al. 2023. RNA-seq data science: From raw data to effective interpretation. Frontiers in Genetics 14:997383 doi: 10.3389/fgene.2023.997383
[2] Wang Z, Gerstein M, Snyder M. 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10:57−63 doi: 10.1038/nrg2484
[3] Ozsolak F, Milos PM. 2011. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics 12:87−98 doi: 10.1038/nrg2934
[4] Geraci F, Saha I, Bianchini M. 2020. Editorial: RNA-Seq analysis: methods, applications and challenges. Frontiers in Genetics 11:220 doi: 10.3389/fgene.2020.00220
[5] Marguerat S, Bähler J. 2010. RNA-seq: from technology to biology. Cellular and Molecular Life Sciences 67:569−79 doi: 10.1007/s00018-009-0180-6
[6] Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5:621−28 doi: 10.1038/nmeth.1226
[7] Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, et al. 2013. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Briefings in Bioinformatics 14:671−83 doi: 10.1093/bib/bbs046
[8] Zhao Y, Li MC, Konaté MM, Chen L, Das B, et al. 2021. TPM, FPKM, or normalized counts? A comparative study of quantification measures for the analysis of RNA-seq data from the NCI patient-derived models repository. Journal of Translational Medicine 19:269 doi: 10.1186/s12967-021-02936-w
[9] Wang L, Xie W, Chen Y, Tang W, Yang J, et al. 2010. A dynamic gene expression atlas covering the entire life cycle of rice. The Plant Journal 61:752−66 doi: 10.1111/j.1365-313X.2009.04100.x
[10] Li G, Sun X, Zhu X, Wu B, Hong H, et al. 2023. Selection and validation of reference genes in virus-infected sweet potato plants. Genes 14:1477 doi: 10.3390/genes14071477
[11] Wang Q, Guo C, Yang S, Zhong Q, Tian J. 2023. Screening and verification of reference genes for analysis of gene expression in garlic (Allium sativum L.) under cold and drought stress. Plants 12:763 doi: 10.3390/plants12040763
[12] Ahmed U, Xie Q, Shi X, Zheng B. 2022. Development of reference genes for horticultural plants. Critical Reviews in Plant Sciences 41:190−208 doi: 10.1080/07352689.2022.2084227
[13] Panina Y, Germond A, Masui S, Watanabe TM. 2018. Validation of common housekeeping genes as reference for qPCR gene expression analysis during iPS reprogramming process. Scientific Reports 8:8716 doi: 10.1038/s41598-018-26707-8
[14] Bustin SA, Beaulieu JF, Huggett J, Jaggi R, Kibenge FSB, et al. 2010. MIQE précis: Practical implementation of minimum standard guidelines for fluorescence-based quantitative real-time PCR experiments. BMC Molecular Biology 11:74 doi: 10.1186/1471-2199-11-74
[15] Pfaffl MW, Tichopad A, Prgomet C, Neuvians TP. 2004. Determination of stable housekeeping genes, differentially regulated target genes and sample integrity: BestKeeper–Excel-based tool using pair-wise correlations. Biotechnology Letters 26:509−15 doi: 10.1023/B:BILE.0000019559.84305.47
[16] Huis R, Hawkins S, Neutelings G. 2010. Selection of reference genes for quantitative gene expression normalization in flax (Linum usitatissimum L.). BMC Plant Biology 10:71 doi: 10.1186/1471-2229-10-71
[17] Gutierrez L, Mauriat M, Guénin S, Pelloux J, Lefebvre JF, et al. 2008. The lack of a systematic validation of reference genes: a serious pitfall undervalued in reverse transcription-polymerase chain reaction (RT-PCR) analysis in plants. Plant Biotechnology Journal 6:609−18 doi: 10.1111/j.1467-7652.2008.00346.x
[18] Guénin S, Mauriat M, Pelloux J, Van Wuytswinkel O, Bellini C, et al. 2009. Normalization of qRT-PCR data: the necessity of adopting a systematic, experimental conditions-specific, validation of references. Journal of Experimental Botany 60:487−93 doi: 10.1093/jxb/ern305
[19] Thellin O, Zorzi W, Lakaye B, De Borman B, Coumans B, et al. 1999. Housekeeping genes as internal standards: use and limits. Journal of Biotechnology 75:291−95 doi: 10.1016/S0168-1656(99)00163-7
[20] Borges AF, Fonseca C, Ferreira RB, Lourenço AM, Monteiro S. 2014. Reference gene validation for quantitative RT-PCR during biotic and abiotic stresses in Vitis vinifera. PLoS One 9:e111399 doi: 10.1371/journal.pone.0111399
[21] Sun H, Li F, Ruan Q, Zhong X. 2016. Identification and validation of reference genes for quantitative real-time PCR studies in Hedera helix L. Plant Physiology and Biochemistry 108:286−94 doi: 10.1016/j.plaphy.2016.07.022
[22] Imai T, Ubi BE, Saito T, Moriguchi T. 2014. Evaluation of reference genes for accurate normalization of gene expression for real time-quantitative PCR in Pyrus pyrifolia using different tissue samples and seasonal conditions. PLoS One 9:e86492 doi: 10.1371/journal.pone.0086492
[23] Chen F, Song Y, Li X, Chen J, Mo L, et al. 2019. Genome sequences of horticultural plants: past, present, and future. Horticulture Research 6:112 doi: 10.1038/s41438-019-0195-6
[24] Zhao J, Yang F, Feng J, Wang Y, Lachenbruch B, et al. 2017. Genome-wide constitutively expressed gene analysis and new reference gene selection based on transcriptome data: a case study from poplar/canker disease interaction. Frontiers in Plant Science 8:1876 doi: 10.3389/fpls.2017.01876
[25] Chen Y, Luo B, Liu C, Zhang Z, Zhou C, et al. 2021. Identification of reliable reference genes for quantitative real-time PCR analysis of the Rhus chinensis Mill. leaf response to temperature changes. FEBS Open Bio 11:2763−73 doi: 10.1002/2211-5463.13275
[26] Brunner AM, Busov VB, Strauss SH. 2004. Poplar genome sequence: functional genomics in an ecologically dominant plant species. Trends in Plant Science 9:49−56 doi: 10.1016/j.tplants.2003.11.006
[27] Chao Q, Gao Z, Zhang D, Zhao B, Dong F, et al. 2019. The developmental dynamics of the Populus stem transcriptome. Plant Biotechnology Journal 17:206−19 doi: 10.1111/pbi.12958
[28] Wang J, Tian Y, Li J, Yang K, Xing S, et al. 2019. Transcriptome sequencing of active buds from Populus deltoides CL. and Populus × zhaiguanheibaiyang reveals phytohormones involved in branching. Genomics 111:700−9 doi: 10.1016/j.ygeno.2018.04.007
[29] Han X, An Y, Zhou Y, Liu C, Yin W, et al. 2020. Comparative transcriptome analyses define genes and gene modules differing between two Populus genotypes with contrasting stem growth rates. Biotechnology for Biofuels 13:139 doi: 10.1186/s13068-020-01758-0
[30] Shi R, Wang JP, Lin YC, Li Q, Sun Y, et al. 2017. Tissue and cell-type co-expression networks of transcription factors and wood component genes in Populus trichocarpa. Planta 245:927−38 doi: 10.1007/s00425-016-2640-1
[31] Yu L, Ma J, Niu Z, Bai X, Lei W, et al. 2017. Tissue-specific transcriptome analysis reveals multiple responses to salt stress in Populus euphratica seedlings. Genes 8:372 doi: 10.3390/genes8120372
[32] Sundell D, Street NR, Kumar M, Mellerowicz EJ, Kucukoglu M, et al. 2017. AspWood: high-spatial-resolution transcriptome profiles reveal uncharacterized modularity of wood formation in Populus tremula. The Plant Cell 29:1585−604 doi: 10.1105/tpc.17.00153
[33] Filichkin SA, Hamilton M, Dharmawardhana PD, Singh SK, Sullivan C, et al. 2018. Abiotic stresses modulate landscape of poplar transcriptome via alternative splicing, differential intron retention, and isoform ratio switching. Frontiers in Plant Science 9:5 doi: 10.3389/fpls.2018.00005
[34] Zinkgraf M, Gerttula S, Zhao S, Filkov V, Groover A. 2018. Transcriptional and temporal response of Populus stems to gravi-stimulation. Journal of Integrative Plant Biology 60:578−90 doi: 10.1111/jipb.12645
[35] Rogier O, Chateigner A, Amanzougarene S, Lesage-Descauses MC, Balzergue S, et al. 2018. Accuracy of RNAseq based SNP discovery and genotyping in Populus nigra. BMC Genomics 19:909 doi: 10.1186/s12864-018-5239-z
[36] Liao W, Ji L, Wang J, Chen Z, Ye M, et al. 2014. Identification of glutathione S-transferase genes responding to pathogen infestation in Populus tomentosa. Functional & Integrative Genomics 14:517−29 doi: 10.1007/s10142-014-0379-y
[37] Lu S, Li Q, Wei H, Chang MJ, Tunlaya-Anukit S, et al. 2013. Ptr-miR397a is a negative regulator of laccase genes affecting lignin content in Populus trichocarpa. Proceedings of the National Academy of Sciences of the United States of America 110:10848−53 doi: 10.1073/pnas.1308936110
[38] Felten J, Vahala J, Love J, Gorzsás A, Rüggeberg M, et al. 2018. Ethylene signaling induces gelatinous layers with typical features of tension wood in hybrid aspen. New Phytologist 218:999−1014 doi: 10.1111/nph.15078
[39] Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114−20 doi: 10.1093/bioinformatics/btu170
[40] Delhomme N, Mähler N, Schiffthaler B, Sundell D, Mannapperuma C, et al. 2014. Guidelines for RNA-Seq data analysis. EpiGeneSys Protocol 67:1−24
[41] Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, et al. 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15−21 doi: 10.1093/bioinformatics/bts635
[42] Liao Y, Smyth GK, Shi W. 2013. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research 41:e108 doi: 10.1093/nar/gkt214
[43] Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139−40 doi: 10.1093/bioinformatics/btp616
[44] Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15:550 doi: 10.1186/s13059-014-0550-8
[45] Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323 doi: 10.1186/1471-2105-12-323
[46] Wang Y, Chen Y, Ding L, Zhang J, Wei J, et al. 2016. Validation of reference genes for gene expression by quantitative real-time RT-PCR in stem segments spanning primary to secondary growth in Populus tomentosa. PLoS One 11:e0157370 doi: 10.1371/journal.pone.0157370
[47] Yun T, Li J, Xu Y, Zhou A, Zong D, et al. 2019. Selection of reference genes for RT-qPCR analysis in the bark of Populus yunnanensis cuttings. Journal of Environmental Biology 40:584−91 doi: 10.22438/jeb/40/3(SI)/Sp-24
[48] Tang F, Chu L, Shu W, He X, Wang L, et al. 2019. Selection and validation of reference genes for quantitative expression analysis of miRNAs and mRNAs in Poplar. Plant Methods 15:35 doi: 10.1186/s13007-019-0420-1
[49] Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, et al. 2020. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Molecular Plant 13:1194−202 doi: 10.1016/j.molp.2020.06.009
[50] Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. MEGA X: molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution 35:1547−49 doi: 10.1093/molbev/msy096
[51] Qu W, Zhou Y, Zhang Y, Lu Y, Wang X, et al. 2012. MFEprimer-2.0: a fast thermodynamics-based program for checking PCR primer specificity. Nucleic Acids Research 40:W205−W208 doi: 10.1093/nar/gks552
[52] Shi Q, Tian D, Wang J, Chen A, Miao Y, et al. 2023. Overexpression of miR390b promotes stem elongation and height growth in Populus. Horticulture Research 10:uhac258 doi: 10.1093/hr/uhac258
[53] Urbancsok J, Donev EN, Sivan P, van Zalen E, Barbut FR, et al. 2023. Flexure wood formation via growth reprogramming in hybrid aspen involves jasmonates and polyamines and transcriptional changes resembling tension wood development. New Phytologist 240:2312−34 doi: 10.1111/nph.19307
[54] Balasubramanian VK, Rivas-Ubach A, Winkler T, Mitchell H, Moran J, et al. 2023. Modulation of polar auxin transport identifies the molecular determinants of source-sink carbon relationships and sink strength in poplar. Tree Physiology tpad073 doi: 10.1093/treephys/tpad073
[55] Kong L, Song Q, Wei H, Wang Y, Lin M, et al. 2023. The AP2/ERF transcription factor PtoERF15 confers drought tolerance via JA-mediated signaling in Populus. New Phytologist 240:1848−67 doi: 10.1111/nph.19251
[56] Guo Y, Wang S, Yu K, Wang H, Xu H, et al. 2023. Manipulating microRNA miR408 enhances both biomass yield and saccharification efficiency in poplar. Nature Communications 14:4285 doi: 10.1038/s41467-023-39930-3
[57] Li M, Dong H, Li J, Dai X, Lin J, et al. 2023. PtrVCS2 regulates drought resistance by changing vessel morphology and stomatal closure in Populus trichocarpa. International Journal of Molecular Sciences 24:4458 doi: 10.3390/ijms24054458
[58] Ruijter JM, Ramakers C, Hoogaars WMH, Karlen Y, Bakker O, et al. 2009. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Research 37:e45 doi: 10.1093/nar/gkp045
[59] Yang C, Yuan X, Zhang J, Sun W, Liu Z, et al. 2020. Comprehensive transcriptome analysis of reference genes for fruit development of Euscaphis konishii. PeerJ 8:e8474 doi: 10.7717/peerj.8474
[60] Liang L, He Z, Yu H, Wang E, Zhang X, et al. 2020. Selection and validation of reference genes for gene expression studies in Codonopsis pilosula based on transcriptome sequence data. Scientific Reports 10:1362 doi: 10.1038/s41598-020-58328-5
[61] Zhu L, Yang C, You Y, Liang W, Wang N, et al. 2019. Validation of reference genes for qRT-PCR analysis in peel and flesh of six apple cultivars (Malus domestica) at diverse stages of fruit development. Scientia Horticulturae 244:165−71 doi: 10.1016/j.scienta.2018.09.033
[62] Lyu S, Yu Y, Xu S, Cai W, Chen G, et al. 2020. Identification of appropriate reference genes for normalizing miRNA expression in citrus infected by Xanthomonas citri subsp. citri. Genes 11:17 doi: 10.3390/genes11010017
[63] Galimba K, Tosetti R, Loerich K, Micheal L, Pabhakar S, et al. 2020. Identification of early fruit development reference genes in plum. PLoS One 15:e0230920 doi: 10.1371/journal.pone.0230920
[64] Luo M, Gao Z, Li H, Li Q, Zhang C, et al. 2018. Selection of reference genes for miRNA qRT-PCR under abiotic stress in grapevine. Scientific Reports 8:4444 doi: 10.1038/s41598-018-22743-6

About this article

Cite this article

Xie Q, Ahmed U, Qi C, Du K, Luo J, et al. 2024. A protocol for identifying universal reference genes within a genus based on RNA-Seq data: a case study of poplar stem gene expression. Forestry Research 4: e021 doi: 10.48130/forres-0024-0017


Introduction

RNA sequencing (RNA-Seq) technology has emerged as a powerful tool in transcriptomic studies, offering high accuracy, sensitivity, and resolution^[1]. Unlike traditional methods, RNA-Seq does not rely on prior knowledge of specific RNA molecules, making it effective for identifying unknown RNAs^[2]. In RNA-Seq, total RNA or specific RNA fragments are isolated from samples representing different biological conditions or replicated under similar conditions. Recent advancements in next-generation sequencing (NGS) technology have made RNA-Seq the preferred approach for gene expression studies, thanks to its cost-effectiveness and technological improvements. This sequence-based method has revolutionized transcriptome research, enabling various applications, including the analysis of strand-specific expression, the detection of transcript fusions and alternative splicing isoforms, and the characterization of unknown cell types (through single-cell RNA sequencing)^[3,4].

RNA-Seq also enables better discovery of differentially expressed genes (DEGs) in various biological tissues and growth conditions, and can provide high genome coverage even for genes with low expression levels^[5]. RNA-Seq analysis measures transcript abundance by quantifying the fragments generated and the number of reads corresponding to each transcript. Since the total RNA content in a sample is unknown, data normalization is essential. One class of methods normalizes each sample individually based on its total number of reads and the transcript lengths, resulting in Reads Per Kilobase of exon per Million mapped reads (RPKM) or Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values^[6]. Alternatively, normalization can be achieved using methods such as the Trimmed Mean of M-values (TMM), DESeq2's median of ratios, Transcripts Per Million (TPM), and Upper Quartile (UQ). Like RPKM, TPM does not use read information from other samples for normalization^[7,8].
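To make the within-sample methods concrete, the following is a minimal sketch of RPKM and TPM for a single sample. The function names, the toy counts, and the use of the summed counts as a stand-in for the total number of mapped reads are illustrative assumptions, not part of the pipeline described in this article.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads (one sample).

    Assumption: the sum of `counts` stands in for the total mapped reads.
    """
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million (one sample): length-normalize first, then scale."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r * 1e6 / scale for r in rates]


# Hypothetical toy sample: three genes whose counts are proportional to length.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]

print(rpkm(counts, lengths_bp))
print(tpm(counts, lengths_bp))
```

Both functions use only values from the sample itself, which is why neither corrects for composition differences between samples. TPM values always sum to one million within a sample, whereas RPKM totals can differ from sample to sample.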

In terms of gene expression analysis, RNA-Seq experiments primarily focus on identifying DEGs in specific biological conditions. However, apart from DEGs, there are numerous genes known as constitutively expressed genes (CEGs) that exhibit consistent expression across different cells or developmental stages, regardless of environmental conditions. For instance, a study on rice revealed that 22.7% of transcripts were expressed by CEGs in 39 different rice tissues^[9]. CEGs are commonly used as reference genes (RGs), yet recent studies suggest that some of them exhibit variable expression under different conditions^[10,11]. An ideal RG remains unaffected by any experimental condition, shows stable expression, has no pseudogenes, and has a mid-range quantification cycle (Cq) value (Cq = 15–25) in real-time quantitative reverse transcription polymerase chain reaction (RT-qPCR)^[12]. Technically speaking, every RG is a CEG, but not every CEG is an RG^[13].

RT-qPCR is a reliable technique for transcript detection and measurement. To ensure accurate RT-qPCR results, it is crucial to select suitable RGs for normalization, following the guidelines outlined in the MIQE (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) standards^[14]. RGs should ideally display constant expression levels across various plant tissues, developmental stages, or physiological conditions. Ideally, they would also remain unaffected by external treatments, so that they could be used without the need for stability validation^[15]. However, studies focusing on RG validation and comprehensive exploration of transcriptome data in model plants have shown that the accuracy of endogenous controls can be significantly influenced by factors such as plant species, the specific cells/tissues/organs under investigation, and growth conditions^[16]. Therefore, the selection of appropriate RGs is a critical step in normalizing RT-qPCR data, as an incorrect selection may lead to ambiguous or even incorrect results^[17,18].
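The effect of an unstable RG on normalized results can be illustrated with the widely used 2^(−ΔΔCq) calculation. This is a minimal sketch only: the method is not named in this section, it assumes 100% amplification efficiency for both genes, and the function name and Cq values are hypothetical.

```python
def relative_expression(cq_target_treated, cq_ref_treated,
                        cq_target_control, cq_ref_control):
    """Fold change of a target gene by the 2^(-ddCq) method.

    Assumption: both amplicons double every cycle (100% efficiency).
    """
    d_treated = cq_target_treated - cq_ref_treated
    d_control = cq_target_control - cq_ref_control
    return 2 ** -(d_treated - d_control)


# Hypothetical Cq values: the target drops from 24 to 22 cycles under treatment.
stable_rg = relative_expression(22.0, 20.0, 24.0, 20.0)    # RG unchanged
unstable_rg = relative_expression(22.0, 21.0, 24.0, 20.0)  # RG shifts by one cycle

print(stable_rg, unstable_rg)
```

In this toy case a one-cycle shift in the RG alone doubles the apparent fold change of the target, which is the kind of distortion that stability validation is meant to prevent.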

Traditionally, RGs in many horticultural plants were selected from cellular housekeeping genes in the absence of large genomic datasets^[12]. Examples include elongation factor-1α (EF-1A) in zucchini, actin in poplar, and ubiquitin-conjugating enzyme (UBC) in banana^[12]. However, it has been observed that certain RGs exhibit significant expression variability among different conditions and tissue types^[19]. Moreover, even within a single species, different RG candidates may show varying expression stability under specific experimental conditions or tissue types^[20,21]. Therefore, it is crucial to validate the expression stability of candidate RGs before utilizing them for data normalization. Only those RG candidates that have undergone rigorous validation for expression stability can be considered reliable RGs for specific conditions or tissue types^[22]. Several statistical tools, such as BestKeeper, geNorm, and NormFinder, are commonly employed to identify the most suitable candidate RGs under specific experimental settings^[12]. These tools aid in selecting RGs that exhibit minimal expression variability across various conditions or tissues.

Forest trees and horticultural plants encompass a diverse range of species, many of which are still in the early stages of genomic and functional genomic research. The emergence of RNA-Seq technology has significantly advanced gene annotation, expression profiling, and functional studies in these plants. The availability of extensive RNA-Seq data sets provides valuable resources for selecting suitable RGs across different species and under various experimental conditions^[23−25]. Poplar, a widely distributed tree species with significant applications in wood production, environmental protection, and urban greening, serves as an important model plant for woody species^[26]. Extensive research has been conducted on poplar stem growth and development, leading to the generation of large-scale transcriptome data sets^[27−29]. In this study, the objective is to retrieve and assess the quality of publicly available transcriptome data sets related to poplar stem tissues. Subsequently, we aim to predict and evaluate the best candidate RGs for gene expression analysis in stems of various Populus species. The expression stability of these candidate RGs under different stress conditions will be validated using RT-qPCR, leading to the identification of reliable RGs specific to poplar stems. Furthermore, based on this case study, we intend to establish a comprehensive pipeline for the development of RGs for accurate gene expression normalization using RNA-Seq data.

Discussion

The selection of suitable RGs for RT-qPCR normalization is crucial in gene expression analysis. While previous studies have often relied on literature-reported RGs, there is a growing recognition of the need for systematic selection of stable RGs. In this study, the aim was to identify constitutively and stably expressed genes that can serve as reliable internal controls for RT-qPCR experiments. To achieve this, both novel and reported RGs were retrieved from transcriptome datasets (Supplemental Table S1) of Populus at the genus level. These datasets provided a comprehensive view of gene expression under different developmental stages and abiotic stress conditions. Based on the CV values of gene expression, 12 novel candidate RGs and six reported RGs that demonstrated stable expression across these conditions were selected (Fig. 4). Furthermore, these novel RG candidates were evaluated using the latest poplar stem-related RNA-Seq data from public databases (Fig. 7), and three of them are suggested for poplar stem-related research: Potri.001G349400/CNOT2, Potri.002G197600/FIP37.1, and Potri.002G157500/RH8. By systematically selecting RGs based on their expression stability, the present study provides researchers with an important resource for gene expression analysis in the stems of Populus and potentially other plant species. These stable RGs can serve as reliable internal controls for RT-qPCR experiments, enabling more accurate and robust gene expression studies.

Normalization methods play a crucial role in identifying internal RGs based on transcriptome data. In this study, the efficacy of several methods commonly used to identify RGs based on RNA-Seq data was evaluated. We compared four common normalization methods (FPKM, TPM, TMM, and DESeq2's median of ratios) alongside unnormalized read counts (RC). Variation among the data is typically evaluated using metrics such as mean fold change (MFC), standard deviation (SD), and interquartile range of expression level. The CV value is a valuable metric for assessing the variability of gene expression levels relative to their mean, and it has been widely used to identify suitable RGs from transcriptome datasets^[59,60]. In many studies, the top 1,000 expressed transcripts with the lowest CV values in contrasting environments are selected as stably expressed genes. The threshold for CV values is often set at < 16% or < 30%. Alternatively, some studies choose genes with a low CV of logarithmically transformed RPKM or transcript copy numbers, typically with a threshold of < 4%. The log₂ (normalized values) and average CV values obtained by the five approaches were evaluated. The results showed that DESeq2's median of ratios and TMM provided higher consistency than RC, FPKM, and TPM. DESeq2's median of ratios yielded the smallest CV values when tested with the reported RGs, and was therefore chosen as the normalization method for RNA-Seq data in this study. This choice supports reliable normalization and, in turn, more robust gene expression analysis.
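The combination of median-of-ratios normalization and CV filtering can be sketched in a few lines. This is an illustrative re-implementation of the published median-of-ratios idea in plain Python, not the DESeq2 code used in the study; the toy count matrix, the function names, and the use of the sample standard deviation in the CV are assumptions.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors. `counts` is a list of genes,
    each a list of raw read counts with one entry per sample."""
    n_samples = len(counts[0])
    # Only genes with non-zero counts in every sample have a usable geometric mean.
    usable = [gene for gene in counts if all(c > 0 for c in gene)]
    log_geo_means = [sum(math.log(c) for c in gene) / n_samples for gene in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(gene[j]) - m for gene, m in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def coefficient_of_variation(values):
    """CV = SD / mean (sample SD is an assumption of this sketch)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1,
# and the last gene is additionally up-regulated in sample 2.
counts = [[10, 20], [100, 200], [50, 100], [30, 180]]

factors = size_factors(counts)
normalized = [[c / f for c, f in zip(gene, factors)] for gene in counts]
stable = [i for i, gene in enumerate(normalized)
          if coefficient_of_variation(gene) <= 0.3]  # cutoff used in this study

print(factors)
print(stable)
```

After normalization, the first three toy genes have a CV close to zero and pass the 0.3 cutoff, while the fourth is filtered out. Because the size factor is a median across genes, a minority of strongly regulated genes does not distort it.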

Comprehensive analysis of RNA-Seq data obtained from different Populus species under different conditions facilitated the identification of highly reliable and broadly applicable candidate RGs. Only four out of 30 previously reported RGs were identified as stably expressed genes across diverse conditions and environments based on their CV values (Fig. 3a). In addition to CV values, the expression levels of transcripts are also important considerations when selecting RGs from transcriptomic data. It is generally preferred to choose transcripts with higher expression levels as internal controls for gene quantification, for reasons of efficiency and accuracy^[24]. In this study, both the stably expressed genes and the reported RGs generally had high gene expression levels (Fig. 3b). KEGG enrichment showed that most of these genes were involved in various key biological processes related to gene expression (Supplemental Dataset S4), providing further insights into the functional relevance of these genes. Only four reported RGs, PP2A-2, EIF4A, CDPK, and ATPase, exhibited CV values below 30%, while all the stably expressed genes had CV values < 30% (Figs 3a & 4). This indicates that the newly selected candidate genes have great potential as RGs.

In many previous studies, the selection of suitable RGs for gene expression studies is often limited to specific species and conditions^[61−64]. This study aimed to evaluate the novel and reported RGs at the genus level of Populus. Therefore, it is crucial to design primers that are not only gene-specific within a particular species but also universal among different Populus species. To ensure primer versatility, we employed an integrative approach to design gene-specific primers that amplify orthologous RGs from multiple species (Fig. 5). Consensus sequences based on multiple sequence alignments of candidate RGs and their orthologs from different Populus species were used for primer design. Subsequently, the gene specificity of the primers in five Populus genomes was verified using mfeprimer-3.2.0. This approach allows for consistent and reliable gene expression analysis across multiple Populus species, facilitating broader applicability of universal primer pairs within the genus.
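The role of the consensus sequence in universal primer design can be sketched as follows. This is a simplified illustration, not the procedure used in this study: it assumes the orthologs are already aligned to equal length, marks any column that is not identical across all sequences as `N`, and lists fully conserved windows where a universal primer could be placed. The function names and toy sequences are hypothetical, and real primer design also has to consider melting temperature, GC content, and specificity.

```python
def consensus(aligned, ambiguous="N"):
    """Strict column-wise consensus of equal-length aligned sequences.

    A column keeps its base only if all sequences agree and none has a gap.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        bases = {base.upper() for base in column}
        if len(bases) == 1 and "-" not in bases:
            out.append(bases.pop())
        else:
            out.append(ambiguous)
    return "".join(out)


def conserved_windows(cons, size, ambiguous="N"):
    """List (start, subsequence) for every window free of ambiguous positions."""
    return [
        (start, cons[start:start + size])
        for start in range(len(cons) - size + 1)
        if ambiguous not in cons[start:start + size]
    ]


# Toy alignment of three hypothetical orthologs (illustrative only).
toy_alignment = ["ATGCCGTA", "ATGCTGTA", "ATGC-GTA"]
toy_consensus = consensus(toy_alignment)
windows = conserved_windows(toy_consensus, size=3)
```

In the toy alignment, the fifth column differs between orthologs, so any primer overlapping it would not be universal. Only windows that avoid that column are reported.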

By employing a combination of bioinformatics tools and analysis methods, a set of candidate RGs with high stability and applicability was successfully identified across various experimental conditions. The use of RefFinder, which integrates popular stability evaluation algorithms such as geNorm, NormFinder, BestKeeper, and the delta-Ct method, allowed us to comprehensively evaluate the performance of the 18 tested genes in RT-qPCR analysis. All these RGs exhibited high stability in both young and mature stems of various Populus cultivars, as well as under different stress treatments in poplar 717 (Fig. 6). To further test the applicability of the candidate RGs in this study, their expression stability was evaluated using the latest RNA-Seq data from public databases (Supplemental Table S4, Fig. 7). The functional relevance of the best-performing RG, Potri.001G349400/CNOT2, is worth highlighting. CNOT2 is a core member of the Carbon catabolite repression4 (Ccr4)–NOT complex, which plays a crucial role in transcriptional regulation. In addition, Potri.002G197600/FIP37.1 and Potri.002G157500/RH8 can also be considered as novel RGs for poplar stem gene expression analysis based on their good performance in different tests in this study. In the analysis of the training and the test datasets, only two reported RGs (EIF4A and ATPase) had CV values below 0.3 in both cases. Therefore, most reported RGs are not suitable for gene expression analysis in poplar stems. The novel RGs developed in this study have strong applicability for gene expression analysis in poplar stems, but whether they apply to other tissues requires analysis based on relevant RNA-Seq data. The species involved in RNA-Seq and RT-qPCR in this study are relatively common in current poplar research and are also widely represented in the genus Populus, which increases the applicability of these RGs. When studying gene expression in Populus species not included in this study, generating RNA-Seq data for those species may help confirm and improve the applicability of these RGs.
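As an illustration of how one of the stability algorithms mentioned above ranks genes, the comparative delta-Ct approach can be sketched in a few lines. This is a simplified sketch and does not reproduce RefFinder's internal calculations. The function name and the Ct values are hypothetical. For each gene, the score is the mean standard deviation of its pairwise Ct differences with every other gene across samples, and a lower score means higher stability.

```python
import statistics


def delta_ct_stability(ct):
    """Mean SD of pairwise Ct differences for each gene (lower = more stable).

    ct: dict mapping gene name -> list of Ct values, in the same sample order.
    """
    scores = {}
    for gene, values in ct.items():
        sds = []
        for other, other_values in ct.items():
            if other == gene:
                continue
            # Delta-Ct between the two genes in every sample.
            diffs = [a - b for a, b in zip(values, other_values)]
            sds.append(statistics.stdev(diffs))
        scores[gene] = statistics.fmean(sds)
    return scores


# Hypothetical Ct values for three genes in three samples (illustrative only).
toy_ct = {
    "geneA": [20.0, 21.0, 22.0],
    "geneB": [25.0, 26.0, 27.0],
    "geneC": [18.0, 22.0, 17.0],
}
stability = delta_ct_stability(toy_ct)
ranking = sorted(stability, key=stability.get)
```

In this toy dataset, `geneA` and `geneB` vary in parallel across samples, so their pairwise difference is constant and both score better than `geneC`, whose Ct fluctuates independently.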

It is not cost-effective to perform RT-qPCR to test all reported RGs for each new experiment. As gene expression data based on RNA-Seq continue to accumulate from diverse experiments and species, the approach employed in this study holds promise for integrating and mining such data in a meaningful way. Identification of stable and reliable RGs will facilitate accurate and standardized gene expression analysis across different conditions and species.

Conclusions

In conclusion, the current study presents a novel methodology for selecting stably expressed genes from transcriptome expression data, which offers several advantages over traditional approaches. By combining expression stability analysis and RT-qPCR validation, we successfully identified a set of novel and stable RGs for Populus stems. This methodology is targeted, convenient, and efficient, enabling the identification of new and more reliable RGs. The analytical method we developed can be applied to other plant genera and will greatly help researchers compare gene expression patterns in different species within the same genus. Furthermore, the identified RGs for stem development in Populus will serve as valuable tools for studying gene expression dynamics during wood formation in plants.

Author contributions

The authors confirm contribution to the paper as follows: study conception and design: Zheng B, Shi X; bioinformatics analysis: Xie Q; conducting the experiments: Ahmed U; participating in the experiments: Qi C, Du K, Luo J, Wang P; draft manuscript preparation: Xie Q, Ahmed U; manuscript revision: Shi X, Zheng B, Xie Q, Ahmed U. All authors reviewed the results and approved the final version of the manuscript.

Gene ID	Gene name	Description	CV
Potri.001G349400	CNOT2	CCR4-NOT transcription complex subunit 2	0.168
Potri.002G157500	RH8	Similar to DEAD/DEAH box helicase	0.172
Potri.005G110600	VPS35	Vacuolar protein sorting-associated protein 35	0.173
Potri.002G197600	FIP37.1	Similar to ARABIDOPSIS THALIANA FKBP12 INTERACTING PROTEIN 37	0.175
Potri.013G070001	NA	UDP-glucose pyrophosphorylase	0.186
Potri.001G197400	Pt-UBP6.2	Similar to UBIQUITIN-SPECIFIC PROTEASE 6	0.190
Potri.006G116700	U2AF1	Splicing factor U2AF 35 kDa subunit	0.193
Potri.008G111700	NA	Predicted hydrolases of HD superfamily	0.194
Potri.011G084400	CUL4	Similar to hypothetical protein	0.194
Potri.004G064400	NA	Similar to ankyrin protein kinase	0.198
Potri.008G217300	Pt-CUL1.4	Similar to cullin-like protein1	0.199
Potri.003G045700	Pt-ATRLI1.2	Similar to RNase L inhibitor protein; putative	0.202
NA: not available. The estimated CVs were based on the normalization method of DESeq2's median of ratios.
