Haplotype-resolved genome assembly of poplar line NL895 provides a valuable tree genomic resource

Jie Luo; Yan Wang; Zihui Li; Ziwei Wang; Xu Cao; Nian Wang

doi:10.48130/forres-0024-0013

2024 Volume 4

ARTICLE Open Access

1. College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan 430070, China
2. Jiangsu Key Laboratory of Sericultural Biology and Biotechnology, College of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang 212013, Jiangsu, China
3. Key Laboratory of Silkworm and Mulberry Genetic Improvement, Ministry of Agriculture and Rural Affairs, Chinese Academy of Agricultural Sciences, Sericultural Research Institute, Zhenjiang 212013, Jiangsu, China

Corresponding author: wangn@mail.hzau.edu.cn

Received: 15 November 2023
Revised: 13 March 2024
Accepted: 07 April 2024
Published online: 23 April 2024
Forestry Research 4, Article number: e015 (2024)

Abstract

Poplar line NL895 is a widely cultivated elite line with the potential to become a model plant for poplar research. However, the lack of genomic resources has hindered its use as a major plant material in poplar studies. In this study, we provide a high-quality genome assembly for poplar line NL895 generated with PacBio single-molecule real-time (SMRT) sequencing and high-throughput chromosome conformation capture (Hi-C) technology. The raw diploid assembly of NL895 comprised 606 contigs with a total size of ~815 Mb, and the monoploid assembly comprised 246 contigs with a total size of ~412 Mb. Haplotype-resolved chromosomes were also generated for the diploid genome. The monoploid, diploid, and haplotype-resolved genomes all showed more than 97% completeness, and they can substantially improve mapping efficiency in RNA-Seq analysis. A comprehensive comparison of the two haplotype genomes showed that the heterozygosity of NL895 is much higher than that of other poplar lines. NL895 also harbors more genomic variants and greater gene diversity, and its haplotype-specific genes showed more variable expression patterns. These characteristics are likely associated with the strong heterosis of poplar line NL895. Allele-specific expression (ASE) was also investigated, and many alleles showed biased expression in different tissues or under different environmental conditions. Taken together, the genome sequence of NL895 is a valuable tree genomic resource that should greatly facilitate studies in poplar.
- Poplar,
- NL895,
- Genome assembly,
- Haplotype-resolved genome,
- Heterozygosity,
- Allele specific expression (ASE)

Supplementary information

Supplemental Table S1 Summary of sequence data used for genome assembly.
Supplemental Table S2 Summary of poplar NL895 purged genome assembly with different software.
Supplemental Table S3 Summary of sequence data used for gene prediction.
Supplemental Table S4 Mapping rate for RNA-Seq mapped onto 3 genomes.
Supplemental Table S5 Genomic variants on NL895 diploid genome.
Supplemental Table S6 The categories of genomic variants in NL895 diploid genome.
Supplemental Table S7 Effects of genomic variants on genes in NL895 diploid genome.
Supplemental Table S8 Details of effects caused by genomic variants on genes in NL895 diploid genome.
Supplemental Table S9 Heterozygous variants in NL895 and 88 poplar lines.
Supplemental Table S10 Gene pairs in NL895 diploid genome.
Supplemental Table S11 Motif numbers in promoter.
Supplemental Fig. S1 Distribution of 17-bp Kmers.
Supplemental Fig. S2 Hi-C interaction matrix heatmap of NL895 diploid genome.

Rights and permissions
Copyright: © 2024 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Zhang B, Zhu W, Diao S, Wu X, Lu J, et al. 2019. The poplar pangenome provides insights into the evolutionary history of the genus. Communications Biology 2:215 doi: 10.1038/s42003-019-0474-7
[2]	Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596−604 doi: 10.1126/science.11286
[3]	Yates TB, Feng K, Zhang J, Singan V, Jawdy SS, et al. 2021. The ancient salicoid genome duplication event: a platform for reconstruction of de novo gene evolution in Populus trichocarpa. Genome Biology and Evolution 13:evab198 doi: 10.1093/gbe/evab198
[4]	Schiffthaler B, Delhomme N, Bernhardsson C, Jenkins J, Jansson S, et al. 2019. An improved genome assembly of the European aspen Populus tremula. bioRxiv doi: 10.1101/805614
[5]	Lin YC, Wang J, Delhomme N, Schiffthaler B, Sundström G, et al. 2018. Functional and evolutionary genomic inferences in Populus through genome and population sequencing of American and European aspen. Proceedings of the National Academy of Sciences of the United States of America 115:E10970−E10978 doi: 10.1073/pnas.1801437115
[6]	Liu S, Wang Z, Shi T, Dan X, Zhang Y, et al. 2023. Chromosomal-level genome assembly of Populus adenopoda. bioRxiv doi: 10.1101/2023.07.11.548479
[7]	Long Z, Sang Y, Feng J, Shi T, Dan X, et al. 2023. Chromosomal-level genome assembly of Populus lasiocarpa. bioRxiv doi: 10.1101/2023.07.11.548483
[8]	Zhang S, Wu Z, Ma D, Zhai J, Han X, et al. 2022. Chromosome-scale assemblies of the male and female Populus euphratica genomes reveal the molecular basis of sex determination and sexual dimorphism. Communications Biology 5:1186 doi: 10.1038/s42003-022-04145-7
[9]	Ma T, Wang J, Zhou G, Yue Z, Hu Q, et al. 2013. Genomic insights into salt adaptation in a desert poplar. Nature Communications 4:2797 doi: 10.1038/ncomms3797
[10]	Li C, Xing H, Li C, Ren Y, Li H, et al. 2022. Chromosome-scale genome assembly provides insights into the molecular mechanisms of tissue development of Populus wilsonii. Communications Biology 5:1125 doi: 10.1038/s42003-022-04106-0
[11]	Qiu D, Bai S, Ma J, Zhang L, Shao F, et al. 2019. The genome of Populus alba × Populus tremula var. glandulosa clone 84K. DNA Research 26:423−31 doi: 10.1093/dnares/dsz020
[12]	Huang X, Chen S, Peng X, Bae EK, Dai X, et al. 2021. An improved draft genome sequence of hybrid Populus alba × Populus glandulosa. Journal of Forestry Research 32:1663−72 doi: 10.1007/s11676-020-01235-2
[13]	Bai S, Wu H, Zhang J, Pan Z, Zhao W, et al. 2021. Genome assembly of Salicaceae Populus deltoides (Eastern Cottonwood) I-69 based on nanopore sequencing and Hi-C technologies. Journal of Heredity 112:303−10 doi: 10.1093/jhered/esab010
[14]	Li Y, Wang D, Wang W, Yang W, Gao J, et al. 2023. A chromosome-level Populus qiongdaoensis genome assembly provides insights into tropical adaptation and a cryptic turnover of sex determination. Molecular Ecology 32:1366−80 doi: 10.1111/mec.16566
[15]	Zhang Z, Chen Y, Zhang J, Ma X, Li Y, et al. 2020. Improved genome assembly provides new insights into genome evolution in a desert poplar (Populus euphratica). Molecular Ecology Resources 20:781−94 doi: 10.1111/1755-0998.13142
[16]	An X, Gao K, Chen Z, Li J, Yang X, et al. 2022. High quality haplotype-resolved genome assemblies of Populus tomentosa Carr., a stabilized interspecific hybrid species widespread in Asia. Molecular Ecology Resources 22:786−802 doi: 10.1111/1755-0998.13507
[17]	Chen S, Yu Y, Wang X, Wang S, Zhang T, et al. 2023. Chromosome-level genome assembly of a triploid poplar Populus alba 'Berolinensis'. Molecular Ecology Resources 23:1092−107 doi: 10.1111/1755-0998.13770
[18]	Ma J, Wan D, Duan B, Bai X, Bai Q, et al. 2019. Genome sequence and genetic transformation of a widely distributed and cultivated poplar. Plant Biotechnology Journal 17:451−60 doi: 10.1111/pbi.12989
[19]	Liu Y, Wang X, Zeng Q. 2019. De novo assembly of white poplar genome and genetic diversity of white poplar population in Irtysh River basin in China. Science China Life Sciences 62:609−18 doi: 10.1007/s11427-018-9455-2
[20]	Bae EK, Kang MJ, Lee SJ, Park EJ, Kim KT. 2023. Chromosome-level genome assembly of the Asian aspen Populus davidiana Dode. Scientific Data 10:431 doi: 10.1038/s41597-023-02350-5
[21]	Yang W, Wang K, Zhang J, Ma J, Liu J, et al. 2017. The draft genome sequence of a desert tree Populus pruinosa. GigaScience 6:gix075 doi: 10.1093/gigascience/gix075
[22]	Chen Z, Ai F, Zhang J, Ma X, Yang W, et al. 2020. Survival in the Tropics despite isolation, inbreeding and asexual reproduction: insights from the genome of the world's southernmost poplar (Populus ilicifolia). The Plant Journal 103:430−42 doi: 10.1111/tpj.14744
[23]	Zhou R, Jenkins JW, Zeng Y, Shu S, Jang H, et al. 2023. Haplotype-resolved genome assembly of Populus tremula × P. alba reveals aspen-specific megabase satellite DNA. The Plant Journal 116:1003−17 doi: 10.1111/tpj.16454
[24]	Wu H, Yao D, Chen Y, Yang W, Zhao W, et al. 2020. De novo genome assembly of Populus simonii further supports that Populus simonii and Populus trichocarpa belong to different sections. G3 Genes|Genomes|Genetics 10:455−66 doi: 10.1534/g3.119.400913
[25]	Shen L, Ding C, Zhang W, Zhang T, Li Z, et al. 2023. The Populus koreana genome provides insights into the biosynthesis of plant aroma. Industrial Crops and Products 197:116453 doi: 10.1016/j.indcrop.2023.116453
[26]	Zhang Y, Tian Y, Ding S, Lv Y, Samjhana W, et al. 2020. Growth, carbon storage, and optimal rotation in poplar plantations: a case study on clone and planting spacing effects. Forests 11:842 doi: 10.3390/f11080842
[27]	Zhang Y, Yang X, Cao P, Xiao Z, Zhan C, et al. 2020. The bZIP53–IAA4 module inhibits adventitious root development in Populus. Journal of Experimental Botany 71:3485−98 doi: 10.1093/jxb/eraa096
[28]	Luo J, Nvsvrot T, Wang N. 2021. Comparative transcriptomic analysis uncovers conserved pathways involved in adventitious root formation in poplar. Physiology and Molecular Biology of Plants 27:1903−18 doi: 10.1007/s12298-021-01054-7
[29]	Cai G, Zhang Y, Huang L, Wang N. 2023. Uncovering the role of PdePrx12 peroxidase in enhancing disease resistance in poplar trees. Journal of Fungi 9:410 doi: 10.3390/jof9040410
[30]	Yang X, Zhang K, Nvsvrot T, Zhang Y, Cai G, et al. 2022. Phosphate (Pi) stress-responsive transcription factors PdeWRKY6 and PdeWRKY65 regulate the expression of PdePHT1;9 to modulate tissue Pi concentration in poplar. The Plant Journal 111:1753−67 doi: 10.1111/tpj.15922
[31]	Luo J, Xia W, Cao P, Xiao Z, Zhang Y, et al. 2019. Integrated transcriptome analysis reveals plant hormones jasmonic acid and salicylic acid coordinate growth and defense responses upon fungal infection in poplar. Biomolecules 9:12 doi: 10.3390/biom9010012
[32]	Gui J, Luo L, Zhong Y, Sun J, Umezawa T, et al. 2019. Phosphorylation of LTF1, an MYB transcription factor in Populus, acts as a sensory switch regulating lignin biosynthesis in wood cells. Molecular Plant 12:1325−37 doi: 10.1016/j.molp.2019.05.008
[33]	Li R, Wang Z, Wang J, Li L. 2023. Combining single-cell RNA sequencing with spatial transcriptome analysis reveals dynamic molecular maps of cambium differentiation in the primary and secondary growth of trees. Plant Communications 4:100665 doi: 10.1016/j.xplc.2023.100665
[34]	Zhang Y, Cai G, Zhang K, Sun H, Huang L, et al. 2024. PdeERF114 recruits PdeWRKY75 to regulate callus formation in poplar by modulating the accumulation of H₂O₂ and the relaxation of cell walls. New Phytologist 241:732−46 doi: 10.1111/nph.19349
[35]	Zhang Y, Xiao Z, Zhan C, Liu M, Xia W, et al. 2019. Comprehensive analysis of dynamic gene expression and investigation of the roles of hydrogen peroxide during adventitious rooting in poplar. BMC Plant Biology 19:99 doi: 10.1186/s12870-019-1700-7
[36]	Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114−20 doi: 10.1093/bioinformatics/btu170
[37]	Wang Y, Huang J, Li E, Xu S, Zhan Z, et al. 2022. Phylogenomics and biogeography of Populus based on comprehensive sampling reveal deep-level relationships and multiple intercontinental dispersals. Frontiers in Plant Science 13:813177 doi: 10.3389/fpls.2022.813177
[38]	Liu B, Shi Y, Yuan J, Hu X, Zhang H, et al. 2013. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv 35:62−67 doi: 10.48550/arXiv.1308.2012
[39]	Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, et al. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research 27:722−36 doi: 10.1101/gr.215087.116
[40]	Cheng HY, Concepcion GT, Feng XW, Zhang HW, Li H. 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18:170−75 doi: 10.1038/s41592-020-01056-5
[41]	Xiao C, Chen Y, Xie S, Chen K, Wang Y, et al. 2017. MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nature Methods 14:1072−74 doi: 10.1038/nmeth.4432
[42]	Ruan J, Li H. 2020. Fast and accurate long-read assembly with wtdbg2. Nature Methods 17:155−58 doi: 10.1038/s41592-019-0669-3
[43]	Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, et al. 2014. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9:e112963 doi: 10.1371/journal.pone.0112963
[44]	Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, et al. 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356:92−95 doi: 10.1126/science.aal3327
[45]	Dudchenko O, Shamim MS, Batra SS, Durand NC, Musial NT, et al. 2018. The Juicebox Assembly Tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. bioRxiv doi: 10.1101/254797
[46]	Seppey M, Manni M, Zdobnov EM. 2019. BUSCO: Assessing genome assembly and annotation completeness. In Gene Prediction, ed. Kollmar M. Volume 1962. New York, NY: Humana. pp. 227–45. https://doi.org/10.1007/978-1-4939-9173-0_14
[47]	Pan L, Liu M, Kang Y, Mei X, Hu G, et al. 2023. Comprehensive genomic analyses of Vigna unguiculata provide insights into population differentiation and the genetic basis of key agricultural traits. Plant Biotechnology Journal 21:1426−39 doi: 10.1111/pbi.14047
[48]	Nie C, Zhang Y, Zhang X, Xia W, Sun H, et al. 2023. Genome assembly, resequencing and genome-wide association analyses provide novel insights into the origin, evolution and flower colour variations of flowering cherry. The Plant Journal 114:519−33 doi: 10.1111/tpj.16151
[49]	Luo J, Ren W, Cai G, Huang L, Shen X, et al. 2022. The chromosome-scale genome sequence of Triadica sebifera provides insight into fatty acids and anthocyanin biosynthesis. Communications Biology 5:786 doi: 10.1038/s42003-022-03751-9
[50]	Stanke M, Keller O, Gunduz I, Hayes A, Waack S, et al. 2006. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research 34:W435−W439 doi: 10.1093/nar/gkl200
[51]	Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, et al. 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biology 9:R7 doi: 10.1186/gb-2008-9-1-r7
[52]	Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, et al. 2003. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31:5654−66 doi: 10.1093/nar/gkg770
[53]	Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37:907−15 doi: 10.1038/s41587-019-0201-4
[54]	Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139−40 doi: 10.1093/bioinformatics/btp616
[55]	McCormick RF, Truong SK, Mullet JE. 2015. RIG: recalibration and interrelation of genomic sequence data with the GATK. G3 Genes|Genomes|Genetics 5:655−65 doi: 10.1534/g3.115.017012
[56]	Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, et al. 2018. MUMmer4: a fast and versatile genome alignment system. PLoS Computational Biology 14:e1005944 doi: 10.1371/journal.pcbi.1005944
[57]	O'Donnell S, Fischer G. 2020. MUM&Co: accurate detection of all SV types through whole-genome alignment. Bioinformatics 36:3242−43 doi: 10.1093/bioinformatics/btaa115
[58]	Yousaf A, Liu J, Ye S, Chen H. 2021. Current progress in evolutionary comparative genomics of great apes. Frontiers in Genetics 12:657468 doi: 10.3389/fgene.2021.657468
[59]	Gregg C, Zhang J, Butler JE, Haig D, Dulac C. 2010. Sex-specific parent-of-origin allelic expression in the mouse brain. Science 329:682−85 doi: 10.1126/science.1190831
[60]	Zhang J, Zhang W, Ji F, Qiu J, Song X, et al. 2020. A high-quality walnut genome assembly reveals extensive gene expression divergences after whole-genome duplication. Plant Biotechnology Journal 18:1848−50 doi: 10.1111/pbi.13350
[61]	Liu J, Shi C, Shi C, Li W, Zhang Q, et al. 2020. The chromosome-based rubber tree genome provides new insights into spurge genome evolution and rubber biosynthesis. Molecular Plant 13:336−50 doi: 10.1016/j.molp.2019.10.017
[62]	He L, Jia K, Zhang R, Wang Y, Shi T, et al. 2021. Chromosome-scale assembly of the genome of Salix dunnii reveals a male-heterogametic sex determination system on chromosome 7. Molecular Ecology Resources 21:1966−82 doi: 10.1111/1755-0998.13362
[63]	Yang YZ, Cuenca J, Wang N, Liang ZC, Sun HH, et al. 2020. A key 'foxy' aroma gene is regulated by homology-induced promoter indels in the iconic juice grape 'Concord'. Horticulture Research 7:67 doi: 10.1038/s41438-020-0304-6
[64]	Wu X, Liu Y, Zhang Y, Gu R. 2021. Advances in research on the mechanism of heterosis in plants. Frontiers in Plant Science 12:745726 doi: 10.3389/fpls.2021.745726
[65]	Liu S, Zhang L, Sang Y, Lai Q, Zhang X, et al. 2022. Demographic history and natural selection shape patterns of deleterious mutation load and barriers to introgression across Populus genome. Molecular Biology and Evolution 39:msac008 doi: 10.1093/molbev/msac008
[66]	Ma T, Wang K, Hu Q, Xi Z, Wan D, et al. 2017. Ancient polymorphisms and divergence hitchhiking contribute to genomic islands of divergence within a poplar species complex. Proceedings of the National Academy of Sciences of the United States of America 115:E236−E243 doi: 10.1073/pnas.1713288114
[67]	Wang M, Zhang L, Zhang Z, Li M, Wang D, et al. 2020. Phylogenomics of the genus Populus reveals extensive interspecific gene flow and balancing selection. New Phytologist 225:1370−82 doi: 10.1111/nph.16215
[68]	Liu N, Du Y, Warburton ML, Xiao Y, Yan J. 2021. Phenotypic plasticity contributes to maize adaptation and heterosis. Molecular Biology and Evolution 38:1262−75 doi: 10.1093/molbev/msaa283
[69]	Blum A. 2013. Heterosis, stress, and the environment: a possible road map towards the general improvement of crop yield. Journal of Experimental Botany 64:4829−37 doi: 10.1093/jxb/ert289
[70]	López-Maury L, Marguerat S, Bähler J. 2008. Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nature Reviews Genetics 9:583−93 doi: 10.1038/nrg2398
[71]	Ho WC, Zhang J. 2019. Genetic gene expression changes during environmental adaptations tend to reverse plastic changes even after the correction for statistical nonindependence. Molecular Biology and Evolution 36:1847−48 doi: 10.1093/molbev/msz073
[72]	Nvsvrot T, Yang X, Zhang Y, Huang L, Cai G, et al. 2023. The PdeWRKY65-UGT75L28 gene module negatively regulates lignin biosynthesis in poplar petioles. Industrial Crops and Products 191:115937 doi: 10.1016/j.indcrop.2022.115937
[73]	Liu M, Huang L, Zhang Y, Yan Z, Wang N. 2022. Overexpression of PdeGATA3 results in a dwarf phenotype in poplar by promoting the expression of PdeSTM and altering the content of gibberellins. Tree Physiology 42:2614−26 doi: 10.1093/treephys/tpac086
[74]	Xiao Z, Zhang Y, Liu M, Zhan C, Yang X, et al. 2020. Coexpression analysis of a large-scale transcriptome identified a calmodulin-like protein regulating the development of adventitious roots in poplar. Tree Physiology 40:1405−19 doi: 10.1093/treephys/tpaa078
[75]	Luo J, Liang Z, Wu M, Mei L. 2019. Genome-wide identification of BOR genes in poplar and their roles in response to various environmental stimuli. Environmental and Experimental Botany 164:101−13 doi: 10.1016/j.envexpbot.2019.04.006
[76]	Xia W, Yu H, Cao P, Luo J, Wang N. 2017. Identification of TIFY family genes and analysis of their expression profiles in response to phytohormone treatments and infection in poplar. Frontiers in Plant Science 8:493 doi: 10.3389/fpls.2017.00493
[77]	Zhang L, Liu M, Qiao G, Jiang J, Jiang Y, et al. 2013. Transgenic poplar "NL895" expressing CpFATB gene shows enhanced tolerance to drought stress. Acta Physiologiae Plantarum 35:603−13 doi: 10.1007/s11738-012-1101-0
[78]	Chen Y, Yuan B, Wei Z, Chen X, Chen Y, et al. 2018. The ion homeostasis and ROS scavenging responses in 'NL895' poplar plantlet organs under in vitro salinity stress. In Vitro Cellular & Developmental Biology-Plant 54:318−31 doi: 10.1007/s11627-018-9896-z

About this article

Cite this article

Luo J, Wang Y, Li Z, Wang Z, Cao X, et al. 2024. Haplotype-resolved genome assembly of poplar line NL895 provides a valuable tree genomic resource. Forestry Research 4: e015 doi: 10.48130/forres-0024-0013

Introduction

Poplar (Populus) is a genus of fast-growing trees comprising ca. 30 species that are widely distributed across the Northern Hemisphere. Its relatively small genome size, ease of Agrobacterium-mediated genetic transformation, and capacity for vegetative propagation make Populus a model for tree biology studies^[1]. The first poplar genome, that of Populus trichocarpa, was released in 2006. To date, with rapid advances in sequencing and bioinformatics methods, ca. 20 poplar genome resources have been released, including P. trichocarpa^[2,3], P. tremula^[4,5], P. tremuloides Michx.^[5], P. adenopoda^[6], P. lasiocarpa^[7], P. euphratica^[8,9], P. wilsonii^[10], P. alba × P. tremula var. glandulosa (clone 84K)^[11,12], P. deltoides cultivar I-69^[13], P. qiongdaoensis^[14], P. euphratica^[15], P. tomentosa Carr.^[16], Populus alba 'Berolinensis'^[17], P. alba var. pyramidalis^[18], P. alba^[19], P. davidiana Dode^[20], P. pruinosa^[21], P. ilicifolia^[22], P. tremula × P. alba (INRA 717-1B4)^[23], P. simonii^[24] and P. koreana^[25]. The released poplar genomes have helped uncover the molecular mechanisms of extreme environmental adaptation, sex determination, biosynthesis of secondary metabolites, tissue development, and allopolyploidization effects on heterosis in poplar^{[8−10,14,15,17,22,25]}.

Plants with high genomic heterozygosity often display heterosis, which confers high productivity as well as strong abiotic stress resistance. Recently, several haplotype-resolved genome assemblies have become available for hybrid poplars^{[11,12,16,18,23]}. For instance, sequence differences between the subgenomes of P. alba var. pyramidalis in key stress-response genes contribute to its stress resistance^[18]. The haplotype-resolved assemblies of P. tremula × P. alba uncovered the aspen-specific megabase satellite DNA PtaM147 repeats^[23]. Furthermore, the phased assembly of 84K (P. alba × P. tremula var. glandulosa) revealed transcriptional bias between its two subgenomes, with genes from the P. tremula subgenome being dominantly expressed^[11]. In P. tomentosa, the low recombination rate among the subgenomes, caused by low sexual fertility, may contribute to its high productivity and strong adaptation as 'fixed heterosis'^[16]. However, how heterozygosity shapes genomic features in poplar needs to be further explored.

Poplar NL895 (P. deltoides × P. euramericana) is a fast-growing hybrid poplar that plays an important role in timber production and carbon sequestration in south China^[26]. Owing to its economic importance, many studies have used poplar NL895 to uncover the molecular mechanisms of adventitious root formation, nutrient uptake, growth, wood formation, callus formation, and disease resistance^[27−33], which makes NL895 a potential model for genetic research on trees. Thus, a chromosome-scale, haplotype-resolved assembly of NL895 is urgently needed to facilitate molecular studies in poplar line NL895 and to help understand the molecular basis of tree growth and development. Furthermore, the high heterozygosity between the haplotypes of NL895 will shed light on heterosis in woody plants. In this study, the raw diploid assembly of NL895 comprised 606 contigs with a total size of ~815 Mb, and the monoploid assembly comprised 246 contigs with a total size of ~412 Mb. The two haplotype genomes were also generated and both showed high quality. Using these genomic data, the heterozygosity of NL895 was comprehensively investigated. We found that NL895 harbors more genomic variants, greater gene diversity, and more variable gene expression patterns. These three characteristics are likely associated with the strong heterosis of poplar line NL895. Taken together, the genome sequence of NL895 provided in this work is a valuable tree genomic resource that should greatly facilitate studies in poplar.

Materials and methods

Plant materials

The poplar line NL895 was used as plant material in this study. NL895 is a cultivar generated through the cross between P. deltoides Bartr. cv. I-69 and P. × euramericana cv. I-45. Fresh leaves of tissue culture plants of NL895 were collected for DNA isolation and high-throughput chromosome conformation capture (Hi-C) library construction. For RNA preparations, different tissues including leaves, petioles, stems, shoots, and roots were also collected from tissue culture plants of NL895. DNA and RNA were isolated according to procedures reported in our previous studies^[34,35].

Genome sequence generation
Paired-end (PE) reads were sequenced on the BGISEQ-500 platform. Raw reads were filtered using Trimmomatic (version 0.39)^[36] with default parameters. Continuous Long Reads (CLR) of NL895 were generated using PacBio single-molecule real-time (SMRT) sequencing technology. Isoform sequencing (Iso-Seq) of NL895 mRNA was also conducted on the PacBio SMRT platform. The long reads were called and filtered using SMRT tools (version 9.0.0.92188) with default parameters. To compare genomic variants between NL895 and other poplar lines, public data (Project ID: PRJNA687326) were retrieved from the SRA database. The PRJNA687326 project collected 103 poplar accessions and performed whole-genome resequencing to uncover the phylogenetic relationships and biogeographic history of Populus^[37].

K-mer estimation, genome assembly and genome quality assessment
K-mer frequencies were calculated with 'kmerfreq' implemented in the GCE software^[38] with the parameter '-k 17'. Genome assembly of NL895 was performed with several assemblers, including canu (Version: v1.9)^[39], hifiasm (Version: 0.19.5-r590)^[40], mecat2 (Update 20190304)^[41] and wtdbg (Version: 1.1.006)^[42], according to the manuals of these pipelines. The best genome assembly was selected and polished using pilon^[43] (version 1.23) with default parameters. The polished assembly was then purged with the purge_dups pipeline (Version: 1.2.5) to generate the monoploid assembly. The un-purged assembly was regarded as the diploid genome assembly. Pseudo-chromosomes were constructed for both the diploid and monoploid assemblies using Hi-C data with the 3d-DNA (Version: 180922)^[44] and Juicebox (Version: 1.11.08)^[45] pipelines. Chromosome numbers were assigned according to sequence similarity with the P. trichocarpa genome^[2,3]. Genome assemblies were assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO)^[46] analysis (lineage dataset: embryophyta_odb10; creation date: 2020-09-10; number of species: 50; number of BUSCOs: 1614).
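As an illustration of the k-mer step, the following minimal sketch counts 17-mers in a set of reads and derives a rough genome size as the total number of k-mer occurrences divided by the depth of the main peak in the depth histogram. It is a simplified stand-in and not the model implemented in GCE; the function names, the `min_depth` error cutoff, and the use of the tallest peak are assumptions made for illustration. In a highly heterozygous genome the tallest peak may be the heterozygous (half-depth) peak, so the chosen peak would need to be checked against the histogram.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count every k-mer (forward strand only) across the reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def depth_histogram(kmer_counts):
    """Map depth -> number of distinct k-mers observed at that depth."""
    return Counter(kmer_counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Return (estimated size, peak depth) from a depth histogram.

    Assumption: k-mers seen fewer than min_depth times are sequencing
    errors, and the tallest remaining histogram bin is the main peak.
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth, peak_depth


if __name__ == "__main__":
    toy_reads = ["ACGTACGTACGTACGTACGTAC", "CGTACGTACGTACGTACGTACG"]
    hist = depth_histogram(count_kmers(toy_reads, k=17))
    print(sorted(hist.items()))
```

Real read sets are far too large for an in-memory `Counter`; dedicated k-mer counters exist for that reason, and the sketch is only meant to make the arithmetic explicit.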

Repeat sequence identification, gene prediction, and functional annotation
Repeat sequences of the two sets of genomes were annotated with a combination of homology-based and de novo approaches. The software programs RepeatModeler 2.0, LTR Finder v. 1.0.6, and RepeatMasker v.4.0.5 were used to perform this analysis^[47−49]. All parameters and pipelines were set according to our previous genome analyses^[47−49]. The repeat-masked monoploid and diploid genomes of NL895 were used for gene prediction. Gene prediction was performed with three different strategies: ab initio, homology-based, and transcriptome-based prediction. Briefly, AUGUSTUS (Version: 3.4.0)^[50] was used to perform the ab initio prediction. The Exonerate pipelines^[51] (Version: exonerate-2.4.0) were used to perform the homology-based prediction. Proteins from five plants (A. thaliana, V. vinifera, O. sativa, P. trichocarpa and J. regia) were used as the homology database for this analysis. RNA-Seq and Iso-Seq data of NL895 were used to perform the transcriptome-based prediction with the Program to Assemble Spliced Alignments^[52] (Version: PASApipeline.v2.4.1). Finally, all three lines of gene structure evidence were integrated with the EVidenceModeler^[51] pipeline (Version: EVM1.1.1). Default parameters were set for all three analyses, and these pipelines were identical to those in our previous genome analyses^[47−49].

Gene expression analysis
A total of 60 RNA-Seq datasets generated from different tissues/treatments of NL895 in our previous studies^[31,34] were collected for gene expression analysis. To compare mapping rates among the genomes of P. trichocarpa^[2,3] and the monoploid and diploid NL895, clean RNA-Seq reads were mapped onto these three genomes using Hisat2 with default parameters^[53]. For allele-specific expression (ASE) analysis, clean RNA-Seq reads were mapped onto the diploid genome of NL895 using the Hisat2^[53] pipeline with the number of allowed mismatches set to zero. Differential gene expression and reads per kilobase per million mapped reads (RPKM) analyses were performed using the edgeR (Version: 4.3)^[54] software with default parameters.
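For reference, the textbook RPKM definition normalises a raw read count by gene length (in kilobases) and by sequencing depth (in millions of mapped reads). The sketch below implements only that definition. It is not the edgeR routine, which additionally applies its own library-size normalisation, and the function name and example numbers are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads (textbook form)."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # count / (length / 1e3) / (library / 1e6) == count * 1e9 / (length * library)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads, 2,000 bp long, in a library of 20 million mapped reads.
value = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=20_000_000)
```

In this hypothetical case the gene receives 250 reads per kilobase, which is then divided by a depth of 20 million reads.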

Genomic variants identification
For SNP/Indel identification among accessions, a standard GATK (Version: 4.2.2.0)^[55] pipeline was employed for different poplar lines. Briefly, PE reads of 103 poplar accessions from PRJNA687326 were mapped onto the 'A' chromosomes of the diploid genome of NL895. PE reads of NL895 were also included in this analysis. PCR duplicates among the PE reads of each sample were then marked and removed. Subsequently, SNPs for each sample were identified. Finally, SNPs/Indels were filtered with the parameters 'QUAL <30.0 || QD < 2.0 || FS > 60.0 || SOR > 4.0'.
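The logic of the hard-filter expression can be sketched in Python as follows. This assumes that the expression flags variants to be removed, as is conventional for GATK hard filters, and that every variant carries all four annotations. It does not call GATK; the function name and the variant records are hypothetical.

```python
def fails_hard_filter(ann):
    """True if a variant matches 'QUAL <30.0 || QD < 2.0 || FS > 60.0 || SOR > 4.0'."""
    return (
        ann["QUAL"] < 30.0   # low call quality
        or ann["QD"] < 2.0   # low quality relative to depth
        or ann["FS"] > 60.0  # strand bias (Fisher's exact test)
        or ann["SOR"] > 4.0  # strand bias (symmetric odds ratio)
    )


# Hypothetical variant annotations, for illustration only.
variants = [
    {"id": "snp1", "QUAL": 55.0, "QD": 12.3, "FS": 1.2, "SOR": 0.8},
    {"id": "snp2", "QUAL": 25.0, "QD": 12.3, "FS": 1.2, "SOR": 0.8},
    {"id": "snp3", "QUAL": 80.0, "QD": 1.5, "FS": 1.2, "SOR": 0.8},
    {"id": "snp4", "QUAL": 80.0, "QD": 10.0, "FS": 75.0, "SOR": 5.0},
]
kept_ids = [v["id"] for v in variants if not fails_hard_filter(v)]
```

Because the four conditions are joined by a logical OR, a variant is discarded as soon as any single annotation crosses its threshold.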

For identification of SNPs/Indels between the two haplotypes of the NL895 diploid genome, each chromosome pair was first aligned, and variants including SNPs and Indels were called within the collinearity blocks. Briefly, the two haplotype genomes of NL895 were first aligned with the mummer (Version: 4.0.0)^[56] pipeline with the parameters '-l 15000 -i 80 -o 80'. Then, the best alignment between the two haplotypes was identified. Finally, the 'show-snps' program implemented in mummer (Version: 4.0.0) was used to generate SNPs/Indels between the two haplotypes.

Gene pair identification and structural variation (SV) analysis
For gene pair analysis, the two haplotype genomes of NL895 were first aligned with the mummer (Version: 4.0.0)^[56] pipeline with the parameters '-l 15000 -i 80 -o 80'. A reciprocal blast was also performed for proteins of the two haplotype genomes. Two genes in the 19 chromosome pairs were defined as alleles if they showed the top similarity to each other and were located in the same collinearity block. Structural variations (SVs) between the two haplotypes of the NL895 genome were identified using the MUMandCo (Version: 3.8)^[57] pipeline with default parameters.
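The allele definition above combines two conditions: reciprocal best similarity and shared collinearity. A minimal Python sketch of that rule is given below. It assumes that the blast results have already been reduced to (query, subject, score) tuples and that each gene has been assigned a collinearity block identifier. All names and records are hypothetical and do not reproduce the scripts used in this study.

```python
def best_hits(hits):
    """Return {query: subject} keeping the highest-scoring subject per query."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def allelic_pairs(a_vs_b, b_vs_a, block_of):
    """Pairs that are reciprocal best hits and lie in the same collinearity block."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    pairs = []
    for gene_a, gene_b in a_best.items():
        block_a = block_of.get(gene_a)
        same_block = block_a is not None and block_a == block_of.get(gene_b)
        if b_best.get(gene_b) == gene_a and same_block:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)


# Hypothetical hits between haplotype A and haplotype B proteins.
a_vs_b = [("A1", "B1", 900), ("A1", "B2", 300), ("A2", "B2", 800), ("A3", "B1", 500)]
b_vs_a = [("B1", "A1", 900), ("B2", "A2", 800), ("B2", "A1", 300)]
block_of = {"A1": "blk1", "B1": "blk1", "A2": "blk2", "B2": "blk3"}
pairs = allelic_pairs(a_vs_b, b_vs_a, block_of)
```

In this toy input, A2 and B2 are reciprocal best hits but fall in different blocks, and A3 hits B1 without being B1's best hit, so only A1 and B1 qualify as an allelic pair. Genes left unpaired by such a rule correspond to the haplotype-specific class.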

Discussion

The poplar line NL895 is an elite cultivar grown along the Yangtze River. Compared with other poplar lines, NL895 shows high heterosis. According to the genome analysis in this study, at least three lines of evidence can explain this heterosis. First, the genome of NL895 showed higher heterozygosity. Both K-mer and SNP analyses revealed that the heterozygosity of NL895 is ca. 2.75%. This value is much higher than that of an ordinary poplar line (Fig. 3d), as well as those of several other trees, such as 1.0% for walnut, 1.60% to 1.62% for rubber tree and 0.79% for willow^[60−62]. In our previous study, an elite cultivar of American grapes was also found to harbor ca. 2.70% heterozygosity^[63]. Generally, individual plants with higher heterozygosity in a population tend to show elite performance and higher heterosis^[64]. However, high heterozygosity of poplar genomes can bring deleterious mutations and genomic islands of divergence within a poplar species^[65,66]. These phenomena would make NL895 unsuitable as a parent for producing elite offspring. Therefore, for this elite cultivar, asexual propagation such as cuttage would be the best strategy for generating new plants in large numbers.

Second, the NL895 genome has higher gene diversity. There are 88,687 genes in the diploid genome of NL895. Among these genes, 30,892 allelic pairs (69.67%) and 26,093 haplotype-specific genes were predicted. Usually, the two genes in one allelic pair function identically or similarly. Thus, this indicates that 56,985 (30,892 + 26,093) genes show different functions. This number is much higher than the 49,520 genes in the monoploid genome. More functional genes would contribute to heterosis in the poplar line NL895. Previously, phylogenomics of the genus Populus revealed extensive interspecific gene flow and balancing selection^[67]. The genomes of the parents of NL895 are also heterozygous, and they may have obtained functional genes from other Populus species. This would further increase gene diversity in the genome of NL895.
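The arithmetic behind these figures can be retraced with a short Python check. It uses only the counts reported in this paragraph and assumes that the 69.67% refers to the share of diploid genes that fall within allelic pairs; the variable names are illustrative.

```python
diploid_genes = 88_687
allelic_pair_count = 30_892
haplotype_specific = 26_093
monoploid_genes = 49_520

# Each allelic pair contributes two genes to the diploid gene set.
genes_in_pairs = 2 * allelic_pair_count
pct_in_pairs = round(100 * genes_in_pairs / diploid_genes, 2)

# One functional unit per allelic pair plus every haplotype-specific gene.
distinct_function_genes = allelic_pair_count + haplotype_specific
excess_over_monoploid = distinct_function_genes - monoploid_genes
```

Under that assumption the paired genes account for 69.67% of the diploid gene set, and the 56,985 functionally distinct genes exceed the monoploid gene count by 7,465.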

Third, varied gene expression patterns of haplotype-specific genes allowed NL895 to adapt better to different environments. A higher capacity for environmental adaptation can usually bring higher heterosis, and this phenomenon has already been reported in some studies^[68,69]. Varied gene expression could be the major driver of plant adaptation^[70,71]. In NL895, there are 26,093 haplotype-specific genes and their average RPKM is ca. 5.4 among the 60 RNA-Seq samples. Although their relative expression levels are much lower than those of alleles, the coefficient of variation (CV) of RPKM for these 26,093 haplotype-specific genes is much higher. These data suggest that the haplotype-specific genes show much higher variation in gene expression than alleles. In other words, haplotype-specific genes showed very high expression in some environments and very low expression in others. This large variation in gene expression would give NL895 a higher capacity for environmental adaptation. Taken together, a comprehensive analysis of the high-quality genome uncovered possible causes of the high heterosis of NL895. Thus, the high-quality genome resource provides insight into heterosis in NL895.
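The CV comparison can be illustrated with a small Python sketch. The CV is taken here as the standard deviation divided by the mean of a gene's RPKM values across samples. Whether the population or the sample standard deviation was used in this study is not stated, so the population form is an assumption, and the expression values below are invented for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / m


# Hypothetical RPKM values of two genes across four samples.
steady_gene = [10.0, 11.0, 9.0, 10.0]     # similar expression everywhere
condition_gene = [0.5, 20.0, 0.1, 1.0]    # high in one sample, low elsewhere

cv_steady = coefficient_of_variation(steady_gene)
cv_condition = coefficient_of_variation(condition_gene)
```

The second gene has the lower mean expression but the larger CV, which is the pattern described above for haplotype-specific genes relative to alleles.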

The poplar line NL895 is mainly grown in South China. In this region, the spring and summer seasons are generally characterized by high temperatures or high humidity. Some other lines that are widely used in poplar studies, such as line 717 (P. tremula × P. alba), shanxin yang (P. davidiana × P. bolleana), P. tomentosa, and black cottonwood (P. trichocarpa), cannot grow well in the fields of South China. Poplar line NL895 performs well in this region both in the field and in the laboratory. More importantly, this line grows fast and is easily transformed. In the past decade, several studies have used NL895 as the major plant material, and this line can potentially become a model plant for poplar biology^[27,28,30,34,35,72−78]. However, the lack of genome resources hindered the use of this line. The haplotype-resolved genome sequence for NL895 provided in this study would greatly facilitate studies using NL895 as the major plant material.

Conclusions

In this study, we generated a high-quality genome assembly for the poplar hybrid NL895. Both monoploid and diploid genomes were released. Assessment of these two genome assemblies with different parameters showed that both are of high quality. Using these genome resources, we found that NL895 harbors more genomic variants, more gene diversity, and more variable gene expression patterns. These three characteristics may contribute to the high heterosis of poplar line NL895. Taken together, the haplotype-resolved genome sequence for NL895 is a valuable tree genomic resource and it would greatly facilitate studies in poplar.

Author contributions

The authors confirm contribution to the paper as follows: conducting the experiments: Luo J, Wang Y, Li Z, Wang Z, Cao X, Wang N; writing and editing the manuscript: Wang N, Luo J; organizing and supervising the whole project: Wang N, Luo J. All authors reviewed the results and approved the final version of the manuscript.

Features	Monoploid	Diploid	Haplotype A	Haplotype B
Assembly length (bp)	412,628,918	815,138,040	N/A	N/A
Contig N50 Length (bp)	14,829,479	13,599,823	N/A	N/A
Shortest sequence length (bp)	2,548	2,548	N/A	N/A
Longest sequence length (bp)	26,037,430	26,037,430	N/A	N/A
Total number of contigs	242	606	N/A	N/A
Final genome size (bp)	404,381,870	824,114,582*	377,691,676	378,545,799
GC content (%)	34.38	33.91	33.49	33.47
Protein-coding gene number	49,520	88,687	41,561	41,660
Transcript number	61,532	101,353	48,090	47,169
Average of gene length (bp)	3,218.7	3,150.6	3,231.7	3,188.5
Average of mRNA length (bp)	1,860.1	1,728.8	1,771.5	1,16.2
Average of CDS length (bp)	1,329.3	1,349.2	1,359.6	1,359.4
Exon number	321,605	519,987	252,863	244,402
Intron number	260,073	418,634	204,773	197,233
Average of exon length (bp)	355.9	337	336.9	331.2
Average of intron length (bp)	415.7	406.8	408	408.2
* This size includes ca. 67 Mb of scaffolds that could not be assigned to either haplotype; the parameters below were calculated based on 791,466,253 bp.
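The contig N50 reported in the table above is conventionally defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal Python sketch of that definition follows; the function name and the toy contig lengths are hypothetical, and the assembly statistics in the table were not produced with this code.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half of the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical): total 54, so half is 27.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

For the toy input, the three longest contigs (10, 9 and 8) sum to 27, exactly half of the total, so the N50 is 8.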

Type	Monoploid number	Monoploid length (bp)	Monoploid percent (%)	Diploid number	Diploid length (bp)	Diploid percent (%)
LINEs	4,935	4,193,674	1.04	9,572	7,934,717	0.96
LTR elements	82,685	57,109,054	14.12	167,625	112,791,927	13.69
DNA transposons	18,295	13,945,655	3.45	37,091	27,637,781	3.35
Rolling-circles	1,504	869,410	0.21	2,934	1,669,040	0.2
Unclassified	314,000	108,124,463	26.74	647,665	221,061,595	26.82
Low complexity	20,817	992,939	0.25	43,019	2,076,941	0.25
Simple repeats	118,555	4,603,428	1.14	242,702	9,687,811	1.18
Total	560,791	189,838,623	46.95	1,112,622	382,859,812	46.46
