Chromosome-level genomes of seeded and seedless date plum based on third-generation DNA sequencing and Hi-C analysis

Weitao Mao; Guoxin Yao; Shangde Wang; Lei Zhou; Guosong Chen; Ningguang Dong; Guanglong Hu

doi:10.48130/FR-2021-0009

2021 Volume 1

ARTICLE Open Access

1. Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture, Beijing Engineering Research Center for Deciduous Fruit Trees, Beijing Academy of Forestry and Pomology Sciences, Beijing 100093, China
2. Hubei Key Laboratory of Quality Control of Characteristic Fruits and Vegetables, College of Life Science and Technology, Hubei Engineering University, Xiaogan 432000, China
3. School of Life Science, Hubei University, Wuhan 430062, China
4. Hubei Key Laboratory of Food Crop Germplasm and Genetic Improvement, Food Crops Institute, Hubei Academy of Agricultural Sciences, Wuhan 430072, China
5. Beijing XinTaoYuan Commerce & Trading Co., Ltd., Beijing 101215, China
# These authors contributed equally: Weitao Mao, Guoxin Yao

Corresponding authors: dongng@sina.com; hglcau@gmail.com

Received: 25 March 2021
Accepted: 14 May 2021
Published online: 27 May 2021
Forestry Research 1, Article number: 9 (2021)

An Author Correction to this article was published on 19 April 2023: http://doi.org/10.48130/FR-2023-0010.

Abstract

Diospyros lotus L. (Date plum) is an important tree species that produces fruit with a high nutritional value. An accurate chromosomal assembly of a species facilitates research on chromosomal evolution and functional gene mapping. In this study, we assembled the first chromosome-level genomes of seeded and seedless D. lotus using Illumina short reads, PacBio long reads, and Hi-C technology. The assembled genomes comprising 15 chromosomes were 617.66 and 647.31 Mb in size, with a scaffold N50 of 40.72 and 42.67 Mb for the seedless and seeded D. lotus, respectively. A BUSCO analysis revealed that the seedless and seeded D. lotus genomes were 91.53% and 91.60% complete, respectively. Additionally, 20,689 (95.4%) and 22,844 (98.5%) protein-coding genes in the seedless and seeded D. lotus genomes were annotated, respectively. Comparisons of the chromosomes between genomes revealed inversions and translocations on chromosome 8 and inversions on chromosome 11. We identified 490 and 424 gene families that expanded in the seedless and seeded D. lotus, respectively. The enriched pathways among these gene families included the estrogen signaling pathway, the MAPK signaling pathway, and biosynthetic pathways for flavonoids, monoterpenoids, and glucosinolates. Moreover, we constructed the first Diospyros genome database (http://www.persimmongenome.cn). On the basis of our data, we developed the first high-quality annotated D. lotus reference genomes, which will be useful for genomic studies on persimmon and for clarifying the molecular mechanisms underlying important traits. Comparisons between the seeded and seedless D. lotus genomes may also elucidate the molecular basis of seedlessness.
- Diospyros lotus
- genome assembly
- seedlessness

Supplementary information

Supplemental Fig. S1 Length distribution comparison on total gene, CDS, exon, and intron of annotated gene models of the Seedless Diospyros lotus with other closely related species.
Supplemental Fig. S2 Length distribution comparison on total gene, CDS, exon, and intron of annotated gene models of the Seeded Diospyros lotus with other closely related species.
Supplemental Fig. S3 Gene family expansion and contraction analysis of 19 species. Green marks the number of expanding genes, and red marks the number of contracting genes.
Supplemental Table S1 Seedless Diospyros lotus Hi-C assisted assembly statistics.
Supplemental Table S2 Seeded Diospyros lotus Hi-C assisted assembly statistics.
Supplemental Table S3 Mapping rate of Illumina reads to Diospyros lotus genomes assembly.
Supplemental Table S4 Mapping rate of PacBio reads to Diospyros lotus genome assembly.
Supplemental Table S5 BUSCO notation assessment of the Diospyros lotus genomes.
Supplemental Table S6 Statistics of homozygous and heterozygous rates.
Supplemental Table S7 Classification of repetitive elements in Seedless Diospyros lotus genome.
Supplemental Table S8 Classification of repetitive elements in Seeded Diospyros lotus genome.
Supplemental Table S9 Seedless Diospyros lotus genome non-coding RNA annotation statistics.
Supplemental Table S10 Seeded Diospyros lotus genome non-coding RNA annotation statistics.
Supplemental Table S11 Gene family clustering of Diospyros lotus and 17 other species.
Supplemental Table S12 Seedless Diospyros lotus expansion gene KEGG enriched (p < 0.05).
Supplemental Table S13 Seedless Diospyros lotus expansion gene GO enriched (p < 0.05).
Supplemental Table S14 Seedless Diospyros lotus contraction gene KEGG enriched (p < 0.05).
Supplemental Table S15 Seedless Diospyros lotus contraction gene GO enriched (p < 0.05).
Supplemental Table S16 Seeded Diospyros lotus expansion gene KEGG enriched (p < 0.05).
Supplemental Table S17 Seeded Diospyros lotus expansion gene GO enriched (p < 0.05).
Supplemental Table S18 Seeded Diospyros lotus contraction gene KEGG enriched (p < 0.05).
Supplemental Table S19 Seeded Diospyros lotus contraction gene GO enriched (p < 0.05).

Rights and permissions
Copyright: © 2023 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.


Cite this article

Mao W, Yao G, Wang S, Zhou L, Chen G, et al. 2021. Chromosome-level genomes of seeded and seedless date plum based on third-generation DNA sequencing and Hi-C analysis. Forestry Research 1:9 doi: 10.48130/FR-2021-0009

Figures(7) / Tables(3)

Library type	Seedless Diospyros lotus (W01)			Seeded Diospyros lotus (Yz01)
Library type	Library size (bp)	Clean data (Gb)	Coverage (×)	Library size (bp)	Clean data (Gb)	Coverage (×)
Illumina	350	80.99	119.53	350	79.21	114.98
PacBio	20,000	92.10	103.29	20,000	133.51	166.98
Hi-C	350	86.96	−	350	107.87	−

Parameter	Seedless Diospyros lotus (W01)		Seeded Diospyros lotus (Yz01)
Parameter	Contig length (bp)	Contig number	Contig length (bp)	Contig number
N90	561,232	228	537,928	279
N80	1,144,354	151	1,078,450	194
N70	1,625,012	106	1,450,541	143
N60	2,258,638	73	2,059,392	106
N50	3,006,748	49	2,463,960	77
Total length	617,662,490	−	647,313,630	−
Number (≥ 100 bp)	−	706	−	743
Number (≥ 2 kb)	−	691	−	734
Max length	16,262,241	−	14,842,567	−
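
The N50 to N90 rows above pair a contig length with a contig number. As a rough illustration of how such values are usually derived, the sketch below computes Nx and the matching contig count from a list of contig lengths. It assumes the table follows the common convention (Nx is the length of the shortest contig in the smallest set of longest contigs that covers at least x% of the assembly); the function name `nx` and the toy lengths are hypothetical and are not taken from the article or its pipeline.

```python
def nx(lengths, x=50):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx: length of the shortest contig in the smallest set of longest
        contigs whose combined length is at least x% of the total.
    Lx: number of contigs in that set.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point threshold issues.
        if running * 100 >= total * x:
            return length, count


# Hypothetical toy assembly (total length 300), not data from the article.
toy = [80, 70, 50, 40, 30, 20, 10]
print(nx(toy, 50))  # (70, 2): 80 + 70 reaches half of 300
print(nx(toy, 90))  # (30, 5): five longest contigs reach 270
```

Applied to the real contig lengths of an assembly, the same function would be expected to return pairs of the kind shown in the table, for example a length and a count for N50, provided the assembly statistics were computed under this convention.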

Type		Seedless Diospyros lotus (W01)		Seeded Diospyros lotus (Yz01)
Type		Number	Percent (%)	Number	Percent (%)
Total		21,684	−	23,193	−
Annotated		20,689	95.41	22,844	98.50
	InterPro	17,473	80.58	20,037	86.39
	GO	12,161	56.08	14,066	60.65
	KEGG ALL	20,547	94.76	22,750	98.09
	KEGG KO	8,435	38.90	9,812	42.31
	Swissprot	15,057	69.44	16,896	72.85
	TrEMBL	20,587	94.94	22,790	98.26
	TF	1,560	7.19	1,572	6.78
	Pfam	17,064	78.69	19,709	84.98
	NR	20,607	95.03	22,794	98.28
	KOG	17,990	82.96	20,208	87.13
Unannotated		995	4.59	349	1.50
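
The percentages in the annotation table can be rechecked from the gene counts. The short sketch below does this for the "Annotated" and "Unannotated" rows, assuming each percentage is the count divided by the "Total" row and rounded to two decimals; the helper name `percent` and the dictionary layout are illustrative choices, not part of the article.

```python
def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)


# Counts copied from the annotation table: (annotated, unannotated, total).
genomes = {
    "Seedless (W01)": (20689, 995, 21684),
    "Seeded (Yz01)": (22844, 349, 23193),
}

for name, (annotated, unannotated, total) in genomes.items():
    # Annotated and unannotated genes should add up to the total.
    assert annotated + unannotated == total
    print(name, percent(annotated, total), percent(unannotated, total))
```

Under that assumption the recomputed values agree with the table (95.41% and 4.59% for the seedless genome, 98.50% and 1.50% for the seeded genome).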
