Gap-free genome of Durio zibethinus cv. Chuongbo

Shenghao Wang; Junyu Zhang; Guilian Guo; Zhidong Li; Fei Chen; Wenquan Wang

doi:10.48130/tp-0026-0003

2026 Volume 5

ARTICLE Open Access

1. National Key Laboratory for Tropical Crop Breeding, Sanya Institute of Breeding and Multiplication, Hainan University, Sanya 572025, China
2. College of Tropical Agriculture and Forestry, Hainan University, Danzhou 571737, China

Corresponding authors: feichen@hainanu.edu.cn (Chen F); wangwenquan@itbb.org.cn (Wang W)

Received: 08 December 2025
Revised: 23 January 2026
Accepted: 12 February 2026
Published online: 13 March 2026
Tropical Plants 5, Article number: e004 (2026)

Highlights

The first gap-free genome assembly of Durio zibethinus cv. Chuongbo was generated by integrating PacBio HiFi, Oxford Nanopore, and Hi-C sequencing data.

Protein sequences of D. zibethinus cv. Chuongbo were compared with those of 12 other species to evaluate gene family expansion and contraction.

Comparative genomic analyses dated the divergence between D. zibethinus cv. Chuongbo and Herrania umbratica to 35 million years ago.

A total of 38 TERT genes were identified across 27 Malvaceae species.
Abstract

Durian (Durio zibethinus L.) is a tropical fruit of substantial nutritional and economic value from the family Malvaceae. Several genome assemblies for durian have been reported previously, but these assemblies contain gaps that have restricted their completeness and hindered their practical utility for downstream research. Here, we present the first gap-free genome assembly of Durio zibethinus cv. Chuongbo by integrating PacBio HiFi, Oxford Nanopore, and Hi-C sequencing data. The assembled genome is 824.78 Mb across 28 chromosomes, with a scaffold N50 of 30.88 Mb and 44,024 protein-coding genes. Comparative genomic analyses dated the divergence between D. zibethinus cv. Chuongbo and Herrania umbratica to 35 million years ago, and between two durian cultivars to 2 million years ago. D. zibethinus cv. Chuongbo exhibits substantial gene family expansion and a high abundance of species-specific genes, reflecting key genomic innovations underlying its unique biological traits. Additionally, comparative analysis of the TERT gene family across 27 Malvaceae species uncovered strong evolutionary constraints that maintain a predominant single-copy configuration, with two copies identified in several Gossypium taxa. This high-quality, gap-free genome provides a foundational resource for elucidating genome architecture, gene evolution, and the molecular basis of unique traits in durian and related Malvaceae species.

Keywords
- Durio zibethinus cv. Chuongbo
- Gap-free genome
- Evolution
- TERT
- Malvaceae

Supplementary information

Supplementary Fig. S1 Self-alignment dot plots of the Chromosome 1 centromere. The plots shown are a systematic sample (windows 100-104, 500-504, 1000-1004, 1500-1504, 2000-2004) and uniformly show the long parallel diagonals diagnostic of centromeric tandem repeats.
Supplementary Table S1 Flow Cytometric Analysis.
Supplementary Table S2 Summary of HiFi, ONT, and Hi-C reads results.
Supplementary Table S3 Length of each pseudo-chromosome by Hi-C.
Supplementary Table S4 Statistics of the updated scaffold-level assembly after Hi-C sequencing.
Supplementary Table S5 Statistics of HiFi Reads and ONT Reads Alignment Results.
Supplementary Table S6 Telomere motif counts.
Supplementary Table S7 Centromere motif counts.
Supplementary Table S8 Repeat sequence statistics.
Supplementary Table S9 Gene models and Gene function annotation.
Supplementary Table S10 The TERT gene family in Malvaceae.
Supplementary Table S11 Conserved motif sequences of the TERT gene family in Malvaceae.

Rights and permissions
Copyright: © 2026 by the author(s). Published by Maximum Academic Press on behalf of Hainan University. This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0); see https://creativecommons.org/licenses/by/4.0/.

References

[1]	Thorogood CJ, Ghazalli MN, Siti-Munirah MY, Nikong D, Kusuma YWC, et al. 2022. The king of fruits. Plants, People, Planet 4:538−547 doi: 10.1002/ppp3.10288
[2]	Shearman JR, Sonthirod C, Naktang C, Sangsrakru D, Yoocha T, et al. 2020. Assembly of the durian chloroplast genome using long PacBio reads. Scientific Reports 10(1):15980 doi: 10.1038/s41598-020-73549-4
[3]	Teh BT, Lim K, Yong CH, Ng CCY, Rao SR, et al. 2017. The draft genome of tropical fruit durian (Durio zibethinus). Nature Genetics 49(11):1633−1641 doi: 10.1038/ng.3972
[4]	Nawae W, Naktang C, Charoensri S, U-thoomporn S, Narong N, et al. 2023. Resequencing of durian genomes reveals large genetic variations among different cultivars. Frontiers in Plant Science 14:1137077 doi: 10.3389/fpls.2023.1137077
[5]	Li W, Chen X, Yu J, Zhu Y. 2024. Upgraded durian genome reveals the role of chromosome reshuffling during ancestral karyotype evolution, lignin biosynthesis regulation, and stress tolerance. Science China Life Sciences 67(6):1266−1279 doi: 10.1007/s11427-024-2580-3
[6]	Ji X, Zhong Y, Zheng D, Xie S, Shi M, et al. 2025. Chromosome-scale haploid genome assembly of Durio zibethinus KanYao. Scientific Data 12(1):384 doi: 10.1038/s41597-025-04656-y
[7]	Peska V, Garcia S. 2020. Origin, diversity, and evolution of telomere sequences in plants. Frontiers in Plant Science 11:117 doi: 10.3389/fpls.2020.00117
[8]	Shay JW, Wright WE. 2019. Telomeres and telomerase: three decades of progress. Nature Reviews Genetics 20(5):299−309 doi: 10.1038/s41576-019-0099-1
[9]	Zakian VA. 2012. Telomeres: the beginnings and ends of eukaryotic chromosomes. Experimental Cell Research 318(12):1456−1460 doi: 10.1016/j.yexcr.2012.02.015
[10]	Lan L, Hu H, Jia Y, Zhang X, Jia M, et al. 2025. Tips for improving genome annotation quality. Genomics Communications 2:e005 doi: 10.48130/gcomm-0025-0006
[11]	Zhou Y, Zhang J, Xiong X, Cheng Z, Chen F. 2022. De novo assembly of plant complete genomes. Tropical Plants 1:7 doi: 10.48130/tp-2022-0007
[12]	Porebski S, Bailey LG, Baum BR. 1997. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Molecular Biology Reporter 15:8−15 doi: 10.1007/BF02772108
[13]	Dellaporta SL, Wood J, Hicks JB. 1983. A plant DNA minipreparation: version II. Plant Molecular Biology Reporter 1:19−21 doi: 10.1007/BF02712670
[14]	Marçais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764−770 doi: 10.1093/bioinformatics/btr011
[15]	Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, et al. 2017. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33(14):2202−2204 doi: 10.1093/bioinformatics/btx153
[16]	Feng X, Cheng H, Portik D, Li H. 2022. Metagenome assembly of high-fidelity long reads with hifiasm-meta. Nature Methods 19(6):671−674 doi: 10.1038/s41592-022-01478-3
[17]	Guan D, McCarthy SA, Wood J, Howe K, Wang Y, et al. 2020. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36(9):2896−2898 doi: 10.1093/bioinformatics/btaa025
[18]	Hu J, Fan J, Sun Z, Liu S. 2020. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36:2253−2255 doi: 10.1093/bioinformatics/btz891
[19]	Durand NC, Shamim MS, Machol I, Rao SSP, Huntley MH, et al. 2016. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems 3:95−98 doi: 10.1016/j.cels.2016.07.002
[20]	Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, et al. 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356(6333):92−95 doi: 10.1126/science.aal3327
[21]	Robinson JT, Turner D, Durand NC, Thorvaldsdóttir H, Mesirov JP, et al. 2018. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Systems 6(2):256−258.e1 doi: 10.1016/j.cels.2018.01.001
[22]	Xu M, Guo L, Gu S, Wang O, Zhang R, et al. 2020. TGS-GapCloser: a fast and accurate gap closer for large genomes with low coverage of error-prone long reads. GigaScience 9(9):giaa094 doi: 10.1093/gigascience/giaa094
[23]	Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34(18):3094−3100 doi: 10.1093/bioinformatics/bty191
[24]	Manni M, Berkeley MR, Seppey M, Zdobnov EM. 2021. BUSCO: assessing genomic data quality and beyond. Current Protocols 1:e323 doi: 10.1002/cpz1.323
[25]	Rhie A, Walenz BP, Koren S, Phillippy AM. 2020. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21(1):245 doi: 10.1186/s13059-020-02134-9
[26]	Nevers Y, Warwick Vesztrocy A, Rossier V, Train CM, Altenhoff A, et al. 2025. Quality assessment of gene repertoire annotations with OMArk. Nature Biotechnology 43(1):124−133 doi: 10.1038/s41587-024-02147-w
[27]	Lin Y, Ye C, Li X, Chen Q, Wu Y, et al. 2023. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Horticulture Research 10(8):uhad127 doi: 10.1093/hr/uhad127
[28]	Lan MF, Wang XY, Zhang XC. 2026. CentriVision: an integrated platform for multiscale centromere analysis in plants. Plant Communications 7(2):101689 doi: 10.1016/j.xplc.2025.101689
[29]	Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, et al. 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences of the United States of America 117:9451−9457 doi: 10.1073/pnas.1921046117
[30]	Tarailo-Graovac M, Chen N. 2009. Using RepeatMasker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics 4:4.10.1−4.10.14 doi: 10.1002/0471250953.bi0410s25
[31]	Kim D, Langmead B, Salzberg SL. 2015. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12:357−360 doi: 10.1038/nmeth.3317
[32]	Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, et al. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078−2079 doi: 10.1093/bioinformatics/btp352
[33]	Gabriel L, Brůna T, Hoff KJ, Ebel M, Lomsadze A, et al. 2024. BRAKER3: fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Genome Research 34(5):769−777 doi: 10.1101/gr.278090.123
[34]	Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, et al. 2017. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Molecular Biology and Evolution 34:2115−2122 doi: 10.1093/molbev/msx148
[35]	Blum M, Andreeva A, Florentino LC, Chuguransky SR, Grego T, et al. 2025. InterPro: the protein sequence classification resource in 2025. Nucleic Acids Research 53(D1):D444−D456 doi: 10.1093/nar/gkae1082
[36]	Sayers EW, Beck J, Bolton EE, Brister JR, Chan J, et al. 2025. Database resources of the national center for biotechnology information in 2025. Nucleic Acids Research 53(D1):D20−D29 doi: 10.1093/nar/gkae979
[37]	The UniProt Consortium. 2017. UniProt: the universal protein knowledgebase. Nucleic Acids Research 45(D1):D158−D169 doi: 10.1093/nar/gkw1099
[38]	Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, et al. 2021. Pfam: the protein families database in 2021. Nucleic Acids Research 49:D412−D419 doi: 10.1093/nar/gkaa913
[39]	Nawrocki EP, Eddy SR. 2013. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29(22):2933−2935 doi: 10.1093/bioinformatics/btt509
[40]	Ontiveros-Palacios N, Cooke E, Nawrocki EP, Triebel S, Marz M, et al. 2025. Rfam 15: RNA families database in 2025. Nucleic Acids Research 53(D1):D258−D267 doi: 10.1093/nar/gkae1023
[41]	Chan PP, Lin BY, Mak AJ, Lowe TM. 2021. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Research 49(16):9077−9096 doi: 10.1093/nar/gkab688
[42]	Emms DM, Kelly S. 2019. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20:238 doi: 10.1186/s13059-019-1832-y
[43]	Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32:268−274 doi: 10.1093/molbev/msu300
[44]	Sanderson MJ. 2003. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19(2):301−302 doi: 10.1093/bioinformatics/19.2.301
[45]	Kumar S, Stecher G, Suleski M, Hedges SB. 2017. TimeTree: a resource for timelines, timetrees, and divergence times. Molecular Biology and Evolution 34(7):1812−1819 doi: 10.1093/molbev/msx116
[46]	Mendes FK, Vanderpool D, Fulton B, Hahn MW. 2020. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics 36:5516−5518 doi: 10.1093/bioinformatics/btaa1022
[47]	Bardou P, Mariette J, Escudié F, Djemiel C, Klopp C. 2014. jvenn: an interactive Venn diagram viewer. BMC Bioinformatics 15:293 doi: 10.1186/1471-2105-15-293
[48]	Finn RD, Clements J, Eddy SR. 2011. HMMER web server: interactive sequence similarity searching. Nucleic Acids Research 39:W29−W37 doi: 10.1093/nar/gkr367
[49]	Edgar RC. 2022. Muscle5: high-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nature Communications 13(1):6968 doi: 10.1038/s41467-022-34630-w
[50]	Kumar S, Stecher G, Suleski M, Sanderford M, Sharma S, et al. 2024. MEGA12: molecular evolutionary genetic analysis version 12 for adaptive and green computing. Molecular Biology and Evolution 41(12):msae263 doi: 10.1093/molbev/msae263
[51]	Bailey TL, Johnson J, Grant CE, Noble WS. 2015. The MEME suite. Nucleic Acids Research 43:W39−W49 doi: 10.1093/nar/gkv416
[52]	Li H, Durbin R. 2024. Genome assembly in the telomere-to-telomere era. Nature Reviews Genetics 25(9):658−670 doi: 10.1038/s41576-024-00718-w
[53]	Thuronyi BW, Koblan LW, Levy JM, Yeh WH, Zheng C, et al. 2019. Continuous evolution of base editors with expanded target compatibility and improved activity. Nature Biotechnology 37:1070−1079 doi: 10.1038/s41587-019-0193-0
[54]	Yang Y, Du W, Li Y, Lei J, Pan W. 2025. Recent advances and challenges in de novo genome assembly. Genomics Communications 2:e014 doi: 10.48130/gcomm-0025-0015
[55]	Husin NA, Rahman S, Karunakaran R, Bhore SJ. 2018. A review on the nutritional, medicinal, molecular and genome attributes of Durian (Durio zibethinus L.), the King of fruits in Malaysia. Bioinformation 14(6):265−270 doi: 10.6026/97320630014265
[56]	Wang P, Wang F. 2023. A proposed metric set for evaluation of genome assembly quality. Trends in Genetics 39(3):175−186 doi: 10.1016/j.tig.2022.10.005
[57]	Prihatini R, Anggraeni L, Hadiati S, Pramanik D, Nugroho K, et al. 2025. Genomic research on the king of fruit (Durio spp.): a systematic literature review. Genetic Resources and Crop Evolution 72:7619−7638 doi: 10.1007/s10722-025-02430-y
[58]	Wang T, Duan S, Xu C, Wang Y, Zhang X, et al. 2023. Pan-genome analysis of 13 Malus accessions reveals structural and sequence variations associated with fruit traits. Nature Communications 14(1):7377 doi: 10.1038/s41467-023-43270-7
[59]	Cao S, Sawettalake N, Shen L. 2025. Lactuca super-pangenome provides insights into lettuce genome evolution and domestication. Nature Communications 16(1):7257 doi: 10.1038/s41467-025-62641-w
[60]	Fajkus P, Peška V, Fajkus J, Sýkorová E. 2021. Origin and fates of TERT gene copies in polyploid plants. International Journal of Molecular Sciences 22(4):1783 doi: 10.3390/ijms22041783

About this article

Cite this article

Wang S, Zhang J, Guo G, Li Z, Chen F, et al. 2026. Gap-free genome of Durio zibethinus cv. Chuongbo. Tropical Plants 5: e004 doi: 10.48130/tp-0026-0003

Introduction

Durian (Durio zibethinus L.), a tropical plant belonging to the genus Durio in the family Malvaceae, originated in Borneo and Sumatra^[1]. It is widely cultivated in Southeast Asian countries such as Malaysia, Brunei, and Thailand^[2], with popular cultivars including Durio zibethinus cv. Musang King, Durio zibethinus cv. Monthong, and Durio zibethinus cv. KanYao. The fruit peel varies in color from green to brown, and the edible flesh consists of arils that range in hue from white and pale yellow to golden yellow. The first draft genome of durian, approximately 738 Mb in size, was assembled using PacBio long reads with Chicago and high-throughput chromosome conformation capture (Hi-C) scaffolding^[3]. With the advancement of third-generation sequencing technologies, multiple cultivars have since been sequenced. Nawae et al. generated chromosome-level genome assemblies for Durio zibethinus cv. Kradumthong, Durio zibethinus cv. Monthong, and Durio zibethinus cv. Puangmanee, with assembled sizes of 832.7, 762.6, and 821.6 Mb, respectively, and annotations covering 95.7%, 92.4%, and 92.7% of the embryophyta core proteins^[4]. Li et al. integrated Illumina, PacBio HiFi, and Oxford Nanopore Technologies (ONT) ultra-long reads to generate a genome assembly of 777.8 Mb, which was further anchored to 28 chromosomes using Hi-C data, yielding a chromosome-level assembly of 730.67 Mb. This assembly had a contig N50 of 14.23 Mb and a scaffold N50 of 26.20 Mb, with 38,728 protein-coding genes annotated^[5]. Ji et al. employed Illumina, PacBio HiFi, ONT reads, and Hi-C data to assemble a contiguous chromosome-level haploid genome of D. zibethinus cv. KanYao^[6]. While 19 chromosomes were assembled gap-free, nine chromosomes still contained residual gaps.

Telomeres are evolutionarily conserved fundamental structures in plant genomes, typically composed of short, tandemly repeated minisatellite sequences^[7]. Telomerase is a ribonucleoprotein complex consisting of two core components: the telomerase RNA component and telomerase reverse transcriptase (TERT)^[8]. TERT encodes a critical subunit of the telomerase complex, which synthesizes telomeric DNA at chromosome ends. This process compensates for the progressive shortening of telomeres during cell division and plays an indispensable role in maintaining chromosomal stability^[9]. While TERT family genes have been extensively identified and characterized in various plant species, they have not yet been systematically identified or analyzed in durian.

Genome annotation faces common challenges, including only partial conservation of sequence patterns, highly variable intron lengths, inconsistent intergenic distances, prevalent alternative splicing, transposable element (TE) insertions, and the presence of pseudogenes^[10]. In the durian genome, these challenges are compounded by its inherently high repetitive content and the numerous resulting assembly gaps. Together, they hinder accurate gene model prediction, comprehensive variant detection, and in-depth exploration of functional elements within repetitive sequences. Therefore, to overcome these limitations and provide a foundational resource for reliable studies in species evolution, population genetics, and functional genomics, generating a telomere-to-telomere (T2T) genome assembly is crucial^[11]. In recent years, T2T genomes have been successfully assembled for multiple species, including diverse plants such as Arabidopsis thaliana, Oryza sativa, Vitis vinifera, Zea mays, Brassica rapa, Citrullus lanatus, and Solanum lycopersicum. These achievements provide both a methodological blueprint and empirical support for tackling the challenges of fully assembling the complex durian genome. However, currently available durian genomes are not only fragmented but also primarily represent major tropical cultivars. In contrast, D. zibethinus cv. Chuongbo exhibits dwarf stature and enhanced cold tolerance, traits that are crucial for expanding durian cultivation. Thus, obtaining a high-quality, complete genome for D. zibethinus cv. Chuongbo addresses a key technical gap in durian genomics and enables the elucidation of the genetic basis of its adaptive traits, directly providing targets for molecular breeding.

By integrating PacBio HiFi reads, Oxford Nanopore ultra-long reads, and Hi-C chromatin conformation capture technology, we report the first gap-free genome of D. zibethinus cv. Chuongbo. This gap-free genome will serve as a reference resource of unprecedented precision, facilitating functional genomics, evolutionary studies, and the dissection of the genetic basis underlying key agronomic traits in durian.

Materials and methods

Materials and sequencing

Plant materials of the diploid durian cultivar D. zibethinus cv. Chuongbo were procured from Lingshui, Hainan (China). The plant is dwarf, cold-resistant, has delicate flesh with a milky aroma, and is suitable for cultivation in Hainan. Fresh young leaves were immediately frozen in liquid nitrogen and stored at −80 °C for DNA extraction. For PacBio HiFi sequencing, high-molecular-weight genomic DNA was extracted from 0.5 g of fresh young leaves using a CTAB^[12] method and purified with a QIAGEN genomic DNA kit (cat. 13323). After quality control, the DNA was sheared, size-selected (> 15 kb) using a PippinHT system, and used to construct a SMRTbell library (SMRTbell Prep Kit 3.0). Sequencing was performed on the PacBio Revio platform to generate HiFi reads. For Oxford Nanopore ultra-long sequencing, high-molecular-weight genomic DNA was separately extracted from 0.5 g of fresh young leaves using an SDS^[13] method to maximize the recovery of ultra-long fragments. The DNA was purified and subjected to quality control, including visual inspection, agarose gel electrophoresis, NanoDrop spectrophotometry, and Qubit fluorometry. Following this, target DNA fragments were size-selected using a BluePippin system. A sequencing library was prepared by ligating adapters using the SQK-LSK109 kit. The final library was quantified with Qubit and sequenced on an Oxford Nanopore PromethION platform. To generate Hi-C libraries, chromatin from 0.5 g of formaldehyde-cross-linked fresh young leaves was digested with DpnII, and the ends were filled in with biotinylated nucleotides before proximity ligation. The DNA was then sheared to 300−700 bp, and interaction fragments were captured using streptavidin beads. Library quality was verified by Qubit 3.0, Agilent 2100 Bioanalyzer, and qPCR. Qualified libraries were sequenced on the MGI platform with paired-end 150 bp reads.

To capture a comprehensive transcriptomic profile across different tissues of D. zibethinus cv. Chuongbo, fresh samples of root, stem, leaf, flower, and fruit were collected, immediately snap-frozen in liquid nitrogen, and stored at −80 °C to preserve RNA integrity. Total RNA was then extracted from these tissues using the RNeasy Plant Mini Kit (Qiagen, Germany) for subsequent RNA sequencing (RNA-Seq) analysis.

Estimation of durian genome size
The genome size of durian was estimated by flow cytometric analysis using maize as an internal reference standard. Briefly, nuclei isolated from durian leaf tissues were mixed with maize nuclei in a defined ratio and co-stained with propidium iodide (PI). The mixed nuclear suspension was then analyzed on a BD FACSCalibur flow cytometer. Samples were excited with a 488 nm blue laser, and PI fluorescence intensity was measured using an appropriate emission filter. The genome size of durian was calculated from the ratio of the mean fluorescence intensities of durian and maize nuclei, using the known genome size of maize (approximately 2.3 Gb) as the reference.
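
The ratio calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the peak fluorescence values are hypothetical and are not taken from the study, and the sketch assumes that both peaks correspond to nuclei at the same cell-cycle stage so that the fluorescence ratio is proportional to the genome-size ratio.

```python
def estimate_genome_size_mb(sample_mean_fl, reference_mean_fl, reference_size_mb):
    """Scale the reference genome size by the sample/reference PI fluorescence ratio."""
    if sample_mean_fl <= 0 or reference_mean_fl <= 0:
        raise ValueError("mean fluorescence intensities must be positive")
    return reference_size_mb * sample_mean_fl / reference_mean_fl


# Hypothetical peak means in arbitrary units (NOT measured values from the paper).
durian_peak = 68.0
maize_peak = 200.0

# Maize reference size of about 2.3 Gb, as stated in the text.
size_mb = estimate_genome_size_mb(durian_peak, maize_peak, 2300.0)
print(f"Estimated genome size: {size_mb:.0f} Mb")
```

The estimate is only as good as the reference value, so the choice of standard and its assumed genome size should be reported alongside the result.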

The k-mer analysis was performed as part of a comprehensive genome survey. Initially, PacBio HiFi reads were filtered to retain sequences with a minimum length of 1,000 bp, and an average Phred quality score ≥ Q20. Following this, Jellyfish v2.2.10^[14] was employed to conduct a frequency distribution analysis with k-mer size set to 21. Subsequently, GenomeScope v2.0^[15] was used to estimate the genome size, heterozygosity, and duplication rate.
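
To make the survey step concrete, the following Python sketch shows a read filter and a k-mer multiplicity histogram of the kind that Jellyfish produces and GenomeScope consumes. It is a toy illustration, not the tools' implementation: the function names are hypothetical, the "average Phred quality" is assumed here to be the arithmetic mean of per-base scores, and k-mers are counted in canonical form (the lexicographically smaller of a k-mer and its reverse complement).

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def passes_filter(seq, quals, min_len=1000, min_mean_q=20):
    """Keep reads of at least min_len bases whose mean Phred score is >= min_mean_q."""
    if len(seq) < min_len or not quals:
        return False
    return sum(quals) / len(quals) >= min_mean_q


def kmer_spectrum(reads, k=21):
    """Map multiplicity -> number of distinct canonical k-mers with that multiplicity."""
    counts = Counter()
    for seq in reads:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


# Tiny demonstration with k = 3 so the result can be checked by hand.
print(kmer_spectrum(["ACGTAC"], k=3))
```

In a real survey the histogram would be computed over all filtered reads with k = 21, and the position of the main peak would give the sequencing depth used to infer genome size.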

Genome assembly and quality assessment
The durian genome was assembled and analyzed using an integrated pipeline that combines long-read sequencing and chromatin conformation capture technologies. The initial assembly was performed with Hifiasm v0.16.1^[16], utilizing both the filtered PacBio HiFi reads and the processed ONT reads. To ensure the purity of the initial assembly, the resulting contigs were screened against the NCBI non-redundant nucleotide (nt) database to identify and exclude any potential non-plant sequences (e.g., bacteria or fungi). Duplications were then removed using Purge_dups v1.2.6^[17] with the -2 -T parameters to obtain a non-redundant assembly. Subsequently, the assembled contigs were polished iteratively using NextPolish v1.4.1^[18] with the raw HiFi and ONT reads as references to enhance base-level accuracy. The polished contigs were scaffolded into chromosome-level assemblies using Hi-C data. The Hi-C data were processed through the Juicer v1.6^[19] pipeline to generate chromatin interaction matrices. These matrices were used by the 3D-DNA v180114^[20] software to perform chromatin conformation-guided assembly, anchoring, ordering, and orienting contigs onto chromosomes. The preliminary chromosomal models were manually reviewed and adjusted in the Juicebox v1.11.08^[21] assembly visualization tool, where the contact maps were used to correct mis-joins and orientations, resulting in a high-quality chromosome-scale genome. However, this assembly still contained 18 gaps (represented by 'N's) within the sequences. To address this, we performed a gap-closing step using TGS-GapCloser v1.2.1^[22], leveraging both HiFi reads and ONT reads. This process systematically filled the sequence gaps, effectively bridging intervals caused by complex repeats or regions of low coverage, and ultimately yielded a continuous, gap-free durian genome assembly.
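
Two of the quantities used to judge the assembly above, the number of gaps (runs of 'N') and the N50, are easy to compute directly from a FASTA file. The Python sketch below is a minimal illustration with hypothetical function names and a toy sequence; it is not the procedure used by any of the tools named in the text.

```python
import re


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def count_gaps(seq):
    """Count gaps, where each maximal run of N (or n) is one gap."""
    return len(re.findall(r"[Nn]+", seq))


def n50(lengths):
    """Smallest length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy example (hypothetical sequences, not durian data).
toy = ">chr1\nACGTNNNNACGT\nNNAC\n>chr2\nACGT\n"
seqs = parse_fasta(toy)
print({name: count_gaps(seq) for name, seq in seqs.items()})
print(n50([len(seq) for seq in seqs.values()]))
```

For a gap-free assembly, `count_gaps` should return 0 for every chromosome, and the scaffold N50 and contig N50 are computed from the same function applied to scaffold and contig lengths, respectively.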

A comprehensive quality assessment of the final genome assembly was conducted from multiple perspectives. Mapping rates of both ONT and HiFi reads to the final assembly were calculated using Minimap2 v2.1^[23] to evaluate data utilization and assembly inclusiveness. The completeness of the genome assembly was assessed with BUSCO v5.7.1^[24] (Benchmarking Universal Single-Copy Orthologs). The consensus quality value (QV) was evaluated using Merqury v1.3^[25] to estimate sequence accuracy. Annotation quality was validated using OMArk^[26], which assesses proteome completeness, consistency, and contamination relative to conserved gene families.
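
As a reading aid for the consensus QV, the sketch below converts between a Phred-scaled quality value and the corresponding per-base error rate, using the standard relation QV = −10·log10(error rate). The function names are hypothetical, and the sketch covers only this conversion; it does not reproduce how Merqury estimates the error rate from k-mers.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability (0 < p <= 1) to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# A QV of 40 corresponds to roughly one error per 10,000 bases.
rate = qv_to_error_rate(40)
print(f"QV 40 -> about 1 error per {round(1 / rate):,} bases")
```

Because the scale is logarithmic, each increase of 10 in QV corresponds to a tenfold reduction in the estimated error rate.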

Telomere and centromere detection
For structural annotation of the genome, quarTeT^[27] was used to scan chromosomal termini, identifying the canonical telomeric repeat (CCCTAAA). The same tool was used to search for candidate centromeric repeat sequences across the genome. The distribution of these candidate sequences was then visualized with CentriVision v1.0.1 (minlength = 10, windows = 4,000)^[28]. The approximate boundaries of the centromeric regions were inferred from the frequency distribution of the candidate repeats along each chromosome.
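
The idea of scanning chromosome ends for the telomeric repeat can be illustrated with a short Python sketch. This is not quarTeT's algorithm: the function names, the window size, and the toy sequence are assumptions made for illustration. The sketch counts the motif CCCTAAA and its reverse complement TTTAGGG within a fixed window at each end of a sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def telomere_counts(seq, motif="CCCTAAA", window=10000):
    """Count non-overlapping motif hits (either strand) in the first and last `window` bases."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    rc = reverse_complement(motif)
    head, tail = seq[:window], seq[-window:]
    return {
        "start": head.count(motif) + head.count(rc),
        "end": tail.count(motif) + tail.count(rc),
    }


# Toy chromosome (hypothetical): three repeats at the start, two at the end.
toy_chrom = "CCCTAAA" * 3 + "ACGT" * 10 + "TTTAGGG" * 2
print(telomere_counts(toy_chrom, window=21))
```

A chromosome with high counts at both ends would be a candidate for a telomere-to-telomere sequence, whereas a count near zero at one end suggests a missing or truncated telomere.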

Gene prediction and annotation
For repetitive sequence analysis, RepeatModeler v2.0.3^[29] was used to construct a de novo repeat library, and RepeatMasker v4.1.2^[30] was then employed to identify repetitive sequences. For coding gene prediction, HISAT2 v2.1.0^[31] was used to align all transcriptome data to the genome. The resulting SAM files were converted to BAM format using SAMtools v1.22^[32]. Subsequently, BRAKER3 v3.0.3^[33] was used for de novo gene prediction; it automatically trains species-specific parameters and annotates gene structures by integrating transcriptomic alignments (from root, stem, leaf, flower, and fruit tissues) and protein homology evidence from five closely related Malvaceae species: Theobroma cacao (GCF_000208745.1), Hibiscus cannabinus (GCA_047302245.1), Gossypium arboreum (GCF_025698485.1), Corchorus olitorius (GCA_001974825.2), and Bombax ceiba (https://figshare.com/articles/dataset/Genome_of_B_ceiba_and_C_pentandra/21708509). For functional annotation, the predicted protein-coding genes were queried against the eggNOG^[34], InterPro^[35], NR^[36], Swiss-Prot^[37], and Pfam^[38] databases. The program cmscan in Infernal^[39] was used to identify ribosomal RNA (rRNA), small nuclear RNA (snRNA), and microRNA (miRNA) sequences using the Rfam database^[40]. tRNAscan-SE^[41] was used to predict transfer RNA (tRNA) sequences.

Genome evolution analysis
The protein sequences of 12 other species were extracted from public databases. The sequences for the following nine species were obtained from the NCBI databases under the provided accession numbers: Oryza sativa (GCF_001433935.1), Arabidopsis thaliana (GCF_000001735.4), Vitis vinifera (GCF_000003745.3), Corchorus olitorius (GCA_001974825.2), Corchorus capsularis (GCA_001974805.1), Theobroma cacao (GCF_000208745.1), Herrania umbratica (GCF_002168275.1), Gossypium barbadense (GCA_008761655.1), and Gossypium raimondii (GCA_000327365.1). Additionally, the protein sequences for Bombax ceiba and Ceiba pentandra were sourced from the figshare repository (https://figshare.com/articles/dataset/Genome_of_B_ceiba_and_C_pentandra/21708509), while those for Durio zibethinus cv. KanYao were obtained from another figshare dataset (https://figshare.com/articles/dataset/Durian_genome_annotation/25237591). Orthologous gene families were clustered using OrthoFinder v2.5.5^[42] under default parameters. A maximum-likelihood phylogenetic tree was then constructed from the aligned single-copy genes with IQ-TREE v2.2.3^[43], employing 1,000 ultrafast bootstrap replicates. Divergence times were estimated using r8s v1.81^[44] in conjunction with calibration points obtained from the TimeTree^[45] website (www.timetree.org). The fossil calibration points used were O. sativa vs A. thaliana at 142.1−163.5 million years ago (MYA), A. thaliana vs V. vinifera at 109.8−124.4 MYA, C. olitorius vs T. cacao at 19.1−59.4 MYA, and B. ceiba vs H. umbratica at 30.3−42.0 MYA. Gene family expansion and contraction analyses were performed with CAFE5^[46]. The gene families of T. cacao, H. umbratica, B. ceiba, C. pentandra, and D. zibethinus cv. Chuongbo were compared using jvenn (http://jvenn.toulouse.inra.fr/app/example.html)^[47]. The protein sequences of durian-specific gene families were screened for GO and KEGG enrichment analyses.
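
The two selections mentioned above, single-copy gene families for tree building and species-specific families for enrichment analysis, amount to simple filters over a table of gene counts per orthogroup and species. The Python sketch below shows the logic on an in-memory table; the orthogroup identifiers, species labels, and counts are hypothetical, and the functions are an illustration rather than part of OrthoFinder or jvenn.

```python
def single_copy_orthogroups(counts, species):
    """Orthogroups with exactly one gene in every listed species."""
    return sorted(
        og for og, per_species in counts.items()
        if all(per_species.get(sp, 0) == 1 for sp in species)
    )


def species_specific_orthogroups(counts, target):
    """Orthogroups with genes in the target species and in no other species."""
    return sorted(
        og for og, per_species in counts.items()
        if per_species.get(target, 0) > 0
        and all(n == 0 for sp, n in per_species.items() if sp != target)
    )


# Hypothetical gene counts per orthogroup (NOT results from the study).
species = ["Chuongbo", "T_cacao", "B_ceiba"]
counts = {
    "OG0001": {"Chuongbo": 1, "T_cacao": 1, "B_ceiba": 1},
    "OG0002": {"Chuongbo": 3, "T_cacao": 1, "B_ceiba": 2},
    "OG0003": {"Chuongbo": 2, "T_cacao": 0, "B_ceiba": 0},
    "OG0004": {"Chuongbo": 0, "T_cacao": 1, "B_ceiba": 1},
}

print(single_copy_orthogroups(counts, species))
print(species_specific_orthogroups(counts, "Chuongbo"))
```

Note that "species-specific" is always relative to the set of species compared, so the same family can be specific in a five-species comparison and shared in a larger one.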

TERT gene family analysis and comparison
The characteristic protein domain of the TERT family (PF12009) was downloaded from the Pfam database (http://pfam.xfam.org). An initial hidden Markov model (HMM) profile was built using the retrieved domain sequence. Potential TERT homologs were searched against the genome assemblies and annotated proteomes of the 27 Malvaceae species using HMMER v3.3.2^[48]. The screened sequences were aligned with ClustalW, and a refined HMM was built from these verified sequences with hmmbuild. Finally, 38 TERT family members were identified. All TERT protein sequences were aligned with MUSCLE v5.3^[49], and the alignment was imported into MEGA v12.0.13^[50]. The Neighbor-Joining (NJ) phylogenetic tree was constructed with 1,000 bootstrap replicates and the Maximum Composite Likelihood model. The TERT family protein motifs were analyzed using MEME v5.5.7^[51].
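
Once candidate TERT genes have been identified, summarizing copy number per species is a simple tally. The Python sketch below illustrates this step on a made-up list of (species, gene) hits; the identifiers are hypothetical, the function names are not from any tool named above, and the toy data do not reproduce the 38 genes reported in the study.

```python
from collections import Counter


def copies_per_species(hits):
    """Count distinct gene identifiers per species from (species, gene_id) pairs."""
    return Counter(species for species, _gene in set(hits))


def copy_number_distribution(per_species):
    """Map copy number -> number of species with that many copies."""
    return Counter(per_species.values())


# Hypothetical hits (NOT data from the study); one duplicate row is included on purpose.
hits = [
    ("Durio_zibethinus", "gene_A"),
    ("Theobroma_cacao", "gene_B"),
    ("Gossypium_hirsutum", "gene_C"),
    ("Gossypium_hirsutum", "gene_D"),
    ("Gossypium_hirsutum", "gene_D"),
]

per_species = copies_per_species(hits)
print(dict(per_species))
print(dict(copy_number_distribution(per_species)))
```

A distribution dominated by the value 1 corresponds to the predominantly single-copy configuration described in the abstract.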

Genome feature	D. zibethinus cv. Chuongbo	D. zibethinus cv. KanYao^[5]
Ploidy	2n = 56	2n = 56
Estimated genome size (Mb)	790	808.9
Assembled genome size (Mb)	824.78	777.8
Genomic heterozygosity (%)	1.08	1.4
Largest contig (Mb)	40.32	35.2
Contig N50 (Mb)	21.64	14.23
Number of scaffolds	28	111
Largest scaffold (Mb)	46.4	36.3
Scaffold N50 (Mb)	30.88	22.7
Repeat sequence content (%)	62.18	60.85
GC content (%)	33.3	32.69
Number of genes	44,024	38,728
Gaps	0	83
QV	40.0	37.5
Genome BUSCOs (%)	99.0	99.06
Completeness OMArk (%)	97.1	94.45
